Abstracts

1. Robust Isometric Non-Rigid Structure-from-Motion
Shaifali Parashar, Adrien Bartoli, Daniel Pizarro
Abstract: Non-Rigid Structure-from-Motion (NRSfM) reconstructs a deformable 3D object from the correspondences established between monocular 2D images. Current NRSfM methods lack statistical robustness, which is the ability to cope with correspondence errors. This prevents the use of automatically established correspondences, which are prone to errors, thereby strongly limiting the scope of NRSfM. We propose a three-step automatic pipeline to solve NRSfM robustly by exploiting isometry. Step 1 computes the optical flow from correspondences, step 2 reconstructs each 3D point's normal vector using multiple reference images and integrates them to form surfaces with the best reference, and step 3 rejects the 3D points that break isometry in their local neighborhood. Importantly, each step is designed to discard or flag erroneous correspondences. Our contributions include the robustification of optical flow by warp estimation, new fast analytic solutions to local normal reconstruction and their robustification, and a new scale-independent measure of 3D local isometric coherence. Experimental results show that our robust NRSfM method consistently outperforms existing methods on both synthetic and real datasets.

2. Torch-Points3D: A Modular Multi-Task Framework for Reproducible Deep Learning on 3D Point Clouds
Thomas Chaton, Nicolas Chaulet, Sofiane Horache, Loic Landrieu
Abstract: We introduce Torch-Points3D, an open-source framework designed to facilitate the use of deep networks on 3D data. Its modular design, efficient implementation, and user-friendly interfaces make it a relevant tool for research and productization alike. Beyond multiple quality-of-life features, our goal is to standardize a higher level of transparency and reproducibility in 3D deep learning research, and to lower its barrier to entry. In this paper, we present the design principles of Torch-Points3D, as well as extensive benchmarks of multiple state-of-the-art algorithms and inference schemes across several datasets and tasks. The modularity of Torch-Points3D allows us to design fair and rigorous experimental protocols in which all methods are evaluated under the same conditions. The Torch-Points3D repository: this https URL

3. Hyperspectral Unmixing via Nonnegative Matrix Factorization with Handcrafted and Learnt Priors
Min Zhao, Tiande Gao, Jie Chen, Wei Chen
Abstract: Nowadays, nonnegative matrix factorization (NMF) based methods have been widely applied to blind spectral unmixing. Introducing proper regularizers to NMF is crucial for mathematically constraining the solutions and physically exploiting spectral and spatial properties of images. Generally, properly handcrafting regularizers and solving the associated complex optimization problem are non-trivial tasks. In our work, we propose an NMF-based unmixing framework which jointly uses a handcrafted regularizer and a regularizer learnt from data. We plug in learnt priors of abundances, where the associated subproblem can be addressed using various image denoisers, and we apply an l_{2,1}-norm regularizer to the abundance matrix to promote sparse unmixing results. The proposed framework is flexible and extendable. Experiments on both synthetic data and real airborne data confirm the effectiveness of our method.
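To make the l_{2,1}-norm regularizer mentioned in this abstract concrete, here is a minimal sketch of the norm and its standard row-wise shrinkage (proximal) operator. It assumes the abundance matrix is stored as a list of rows with one row per endmember; that layout, the function names, and the step size `tau` are illustrative assumptions and are not taken from the paper.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows of `matrix` (a list of lists)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def prox_l21(matrix, tau):
    """Row-wise shrinkage: the proximal operator of tau * l21_norm.

    Rows whose Euclidean norm is at most `tau` are set to zero, which is
    what makes this penalty promote row sparsity.
    """
    result = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        # Assumed convention: an all-zero row stays zero.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        result.append([scale * v for v in row])
    return result
```

Under the assumed layout, a zeroed row corresponds to an endmember that is inactive across all pixels, which is the usual reading of "sparse unmixing" with this penalty.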

4. GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering
Alex Trevithick, Bo Yang
Abstract: We present a simple yet powerful implicit neural function that can represent and render arbitrarily complex 3D scenes in a single network only from 2D observations. The function models 3D scenes as a general radiance field, which takes a set of posed 2D images as input, constructs an internal representation for each 3D point of the scene, and renders the corresponding appearance and geometry of any 3D point viewed from an arbitrary angle. The key to our approach is to explicitly integrate the principle of multi-view geometry to obtain the internal representations from observed 2D views, ensuring that the learned implicit representations are meaningful and multi-view consistent. In addition, we introduce an effective neural module to learn general features for each pixel in 2D images, allowing the constructed internal 3D representations to be remarkably general as well. Extensive experiments demonstrate the superiority of our approach.

5. A Novel ANN Structure for Image Recognition
Shilpa Mayannavar, Uday Wali, V M Aparanji
Abstract: The paper presents Multi-layer Auto Resonance Networks (ARN), a new neural model, for image recognition. Neurons in ARN, called Nodes, latch on to an incoming pattern and resonate when the input is within their 'coverage.' Resonance allows the neuron to be noise tolerant and tunable. Coverage of nodes gives them an ability to approximate the incoming pattern. The latching characteristics of ARN allow it to respond to episodic events without disturbing the existing trained network. These networks are capable of addressing problems in varied fields but have not been sufficiently explored. Implementation of an image classification and identification system using a two-layer ARN is discussed in this paper. Recognition accuracy of 94% has been achieved on the MNIST dataset with only two layers of neurons and just 50 samples per numeral, making it useful in computing at the edge of cloud infrastructure.

6. Table Structure Recognition using Top-Down and Bottom-Up Cues
Sachin Raja, Ajoy Mondal, C. V. Jawahar
Abstract: Tables are information-rich structured objects in document images. While significant work has been done in localizing tables as graphic objects in document images, only limited attempts exist on table structure recognition. Most existing literature on structure recognition depends on extraction of meta-features from the PDF document or on optical character recognition (OCR) models to extract low-level layout features from the image. However, these methods fail to generalize well because of the absence of meta-features or errors made by the OCR when there is a significant variance in table layouts and text organization. In our work, we focus on tables that have complex structures, dense content, and varying layouts with no dependency on meta-features and/or OCR. We present an approach for table structure recognition that combines cell detection and interaction modules to localize the cells and predict their row and column associations with other detected cells. We incorporate structural constraints as additional differential components to the loss function for cell detection. We empirically validate our method on the publicly available real-world datasets: ICDAR-2013, ICDAR-2019 (cTDaR) archival, UNLV, SciTSR, SciTSR-COMP, TableBank, and PubTabNet. Our attempt opens up a new direction for table structure recognition by combining top-down (table cell detection) and bottom-up (structure recognition) cues in visually understanding the tables.

7. Incorporating planning intelligence into deep learning: A planning support tool for street network design
Zhou Fang, Ying Jin, Tianren Yang
Abstract: Deep learning applications in shaping ad hoc planning proposals are limited by the difficulty in integrating professional knowledge about cities with artificial intelligence. We propose a novel, complementary use of deep neural networks and planning guidance to automate street network generation that can be context-aware, example-based and user-guided. The model tests suggest that the incorporation of planning knowledge (e.g., road junctions and neighborhood types) in the model training leads to a more realistic prediction of street configurations. Furthermore, the new tool provides both professional and lay users with an opportunity to systematically and intuitively explore benchmark proposals for comparisons and further evaluations.

8. Uncertainty-Aware Few-Shot Image Classification
Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen, Shih-Fu Chang
Abstract: Few-shot image classification aims to learn to recognize new categories from limited labelled data. Recently, metric-learning-based approaches, which classify a query sample by finding the nearest prototype from the support set based on feature similarities, have been widely investigated. For few-shot classification, the calculated similarity of a query-support pair depends on both the query and the support. The network has different confidences/uncertainty on the calculated similarities of the different pairs, and there is observation noise on the similarity. Understanding and modeling the uncertainty on the similarity could promote better exploitation of the limited samples in optimization. However, this is still underexplored in few-shot learning. In this work, we propose Uncertainty-Aware Few-Shot (UAFS) image classification by modeling the uncertainty of the similarities of query-support pairs and performing uncertainty-aware optimization. Particularly, we design a graph-based model to jointly estimate the uncertainty of similarities between a query and the prototypes in the support set. We optimize the network based on the modeled uncertainty by converting the observed similarity to a probabilistic similarity distribution to be robust to observation noise. Extensive experiments show that our proposed method brings significant improvements on top of a strong baseline and achieves state-of-the-art performance.
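The nearest-prototype step that this abstract takes as its starting point can be sketched as follows. This illustrates only the metric-learning baseline, not the proposed uncertainty model. Using the mean of the support features as the prototype and cosine similarity as the metric are assumptions made for this example, as are all function names.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(query, support):
    """Return (label of the most similar prototype, all similarities).

    `support` maps each class label to the feature vectors of its
    labelled examples; one prototype is built per class.
    """
    prototypes = {label: mean_vector(vs) for label, vs in support.items()}
    sims = {label: cosine(query, p) for label, p in prototypes.items()}
    return max(sims, key=sims.get), sims
```

Each value in `sims` depends on both the query and the support set, which is the quantity whose uncertainty the abstract proposes to model.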

9. Real Time Face Recognition Using Convoluted Neural Networks
Rohith Pudari, Sunil Bhutada, Sai Pavan Mudavath
Abstract: Face recognition is the process of identifying people by their faces; it has various applications such as authentication systems, surveillance systems and law enforcement. Convolutional Neural Networks have proved to be the best for facial recognition. We detect faces using the Core ML API and process the extracted face through a Core ML model, which is trained to recognize specific persons. The dataset is created by converting face videos of the persons to be recognized into hundreds of images per person, which are then used for training and validation of the model to provide accurate real-time results.

10. Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer
Mahdi Ghorbani, Fahimeh Fooladgar, Shohreh Kasaei
Abstract: Deep neural network architectures have attained remarkable improvements in scene understanding tasks. Utilizing an efficient model is one of the most important constraints for limited-resource devices. Recently, several compression methods have been proposed to diminish the heavy computational burden and memory consumption. Among them, the pruning and quantizing methods exhibit a critical drop in performance by compressing the model parameters. In contrast, knowledge distillation methods improve the performance of compact models by focusing on training lightweight networks with the supervision of cumbersome networks. In the proposed method, knowledge distillation is performed within the network by constructing multiple branches over the primary stream of the model, known as the self-distillation method. Therefore, an ensemble of sub-neural network models is proposed to transfer knowledge among themselves with the knowledge distillation policies as well as an adversarial learning strategy. Hence, the proposed ensemble of sub-models is trained adversarially against a discriminator model. Besides, their knowledge is transferred within the ensemble by four different loss functions. The proposed method has been applied to both lightweight image classification and encoder-decoder architectures to boost the performance of small and compact models without incurring extra computational overhead at inference. Extensive experimental results on the main challenging datasets show that the proposed network outperforms the primary model in terms of accuracy at the same number of parameters and computational cost. The obtained results show that the proposed model achieves significant improvement over earlier self-distillation methods. The effectiveness of the proposed models has also been illustrated in the encoder-decoder model.
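As background for the knowledge distillation idea referred to in this abstract, here is a minimal sketch of the widely used temperature-softened distillation loss. The abstract does not specify its four loss functions, so this generic loss, the temperature value, and the function names are assumptions for illustration rather than the paper's formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the common convention that keeps the
    gradient scale comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In a self-distillation setting, a loss of this kind could be evaluated between branches of the same network instead of between two separate models.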

11. Controllable Continuous Gaze Redirection
Weihao Xia, Yujiu Yang, Jing-Hao Xue, Wensen Feng
Abstract: In this work, we present interpGaze, a novel framework for controllable gaze redirection that achieves both precise redirection and continuous interpolation. Given two gaze images with different attributes, our goal is to redirect the eye gaze of one person into any gaze direction depicted in the reference image or to generate continuous intermediate results. To accomplish this, we design a model including three cooperative components: an encoder, a controller and a decoder. The encoder maps images into a well-disentangled and hierarchically-organized latent space. The controller adjusts the magnitudes of latent vectors to the desired strength of corresponding attributes by altering a control vector. The decoder converts the desired representations from the attribute space to the image space. To facilitate covering the full space of gaze directions, we introduce a high-quality gaze image dataset with a large range of directions, which also benefits researchers in related areas. Extensive experimental validation and comparisons to several baseline methods show that the proposed interpGaze outperforms state-of-the-art methods in terms of image quality and redirection precision.

12. Background Learnable Cascade for Zero-Shot Object Detection
Ye Zheng, Ruoran Huang, Chuanqi Han, Xi Huang, Li Cui
Abstract: Zero-shot detection (ZSD) is crucial to large-scale object detection with the aim of simultaneously localizing and recognizing unseen objects. There remain several challenges for ZSD, including reducing the ambiguity between background and unseen objects as well as improving the alignment between visual and semantic concepts. In this work, we propose a novel framework named Background Learnable Cascade (BLC) to improve ZSD performance. The major contributions of BLC are as follows: (i) we propose a multi-stage cascade structure named Cascade Semantic R-CNN to progressively refine the visual-semantic alignment for ZSD; (ii) we develop the semantic information flow structure and directly add it between each stage in Cascade Semantic R-CNN to further improve the semantic feature learning; (iii) we propose the background learnable region proposal network (BLRPN) to learn an appropriate word vector for the background class and use this learned vector in Cascade Semantic R-CNN. This design makes the "Background Learnable" and reduces the confusion between background and unseen classes. Our extensive experiments show that BLC obtains significant performance improvements on MS-COCO over state-of-the-art methods.

13. Contralaterally Enhanced Networks for Thoracic Disease Detection
Gangming Zhao, Chaowei Fang, Guanbin Li, Licheng Jiao, Yizhou Yu
Abstract: Identifying and locating diseases in chest X-rays is very challenging, due to the low visual contrast between normal and abnormal regions, and distortions caused by other overlapping tissues. An interesting phenomenon is that there exist many similar structures in the left and right parts of the chest, such as ribs, lung fields and bronchial tubes. This kind of similarity can be used to identify diseases in chest X-rays, according to the experience of board-certified radiologists. Aiming to improve the performance of existing detection methods, we propose a deep end-to-end module to exploit the contralateral context information for enhancing feature representations of disease proposals. First of all, under the guidance of the spine line, a spatial transformer network is employed to extract local contralateral patches, which can provide valuable context information for disease proposals. Then, we build up a specific module, based on both additive and subtractive operations, to fuse the features of the disease proposal and the contralateral patch. Our method can be integrated into both fully and weakly supervised disease detection frameworks. It achieves 33.17 AP50 on a carefully annotated private chest X-ray dataset which contains 31,000 images. Experiments on the NIH chest X-ray dataset indicate that our method achieves state-of-the-art performance in weakly-supervised disease localization.

14. gundapusunil at SemEval-2020 Task 8: Multimodal Memotion Analysis
Sunil Gundapu, Radhika Mamidi
Abstract: Recent technological advancements in the Internet and social media usage have resulted in the evolution of faster and more efficient platforms of communication. These platforms include visual, textual and speech mediums and have brought a unique social phenomenon called Internet memes. Internet memes are in the form of images with witty, catchy, or sarcastic text descriptions. In this paper, we present a multi-modal sentiment analysis system using deep neural networks combining Computer Vision and Natural Language Processing. Our aim is different from the normal sentiment analysis goal of predicting whether a text expresses positive or negative sentiment; instead, we aim to classify the Internet meme as positive, negative, or neutral, identify the type of humor expressed and quantify the extent to which a particular effect is being expressed. Our system was developed using CNN and LSTM and outperformed the baseline score.

15. Real-time Mask Detection on Google Edge TPU
Keondo Park, Wonyoung Jang, Woochul Lee, Kisung Nam, Kihong Seong, Kyuwook Chai, Wen-Syan Li
Abstract: After the COVID-19 outbreak, it has become important to automatically detect whether people are wearing masks in order to reduce the risk to front-line workers. In addition, processing user data locally is a great way to address both privacy and network bandwidth issues. In this paper, we present a lightweight model for detecting whether people in a particular area wear masks, which can also be deployed on the Coral Dev Board, a commercially available development board containing the Google Edge TPU. Our approach combines an object detection network based on MobileNetV2 plus SSD with a quantization scheme for integer-only hardware. As a result, the lighter model on the Edge TPU has a significantly lower latency, which is more appropriate for real-time execution, while maintaining accuracy comparable to that of a floating-point device.
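The phrase "quantization scheme for integer-only hardware" in this abstract can be illustrated with a minimal sketch of affine (scale and zero-point) quantization to unsigned 8-bit integers. This is a generic illustration and not the authors' pipeline; the function names and the example value range are assumptions.

```python
def quant_params(x_min, x_max, num_bits=8):
    """Choose a scale and zero point mapping [x_min, x_max] onto integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to contain 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a real value to a clamped unsigned integer code."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to its approximate real value."""
    return scale * (q - zero_point)
```

For values inside the calibrated range, the round-trip error is bounded by half a quantization step, which is why accuracy can stay close to that of the floating-point model.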

16. Long-distance tiny face detection based on enhanced YOLOv3 for unmanned system
Jia-Yi Chang, Yan-Feng Lu, Ya-Jun Liu, Bo Zhou, Hong Qiao
Abstract: Remote tiny face detection in unmanned systems is challenging work. The detector cannot obtain sufficient contextual semantic information due to the relatively long distance. The resulting poor fine-grained features make face detection less accurate and robust. To solve the problem of long-distance detection of tiny faces, we propose an enhanced network model (YOLOv3-C) based on the YOLOv3 algorithm for unmanned platforms. In this model, we bring in multi-scale features from feature pyramid networks and perform feature fusion to adjust the prediction feature map of the output, which improves the sensitivity of the entire algorithm to tiny target faces. The enhanced model improves the accuracy of tiny face detection in the cases of long distances and high-density crowds. The experimental evaluation results demonstrate the superior performance of the proposed YOLOv3-C in comparison with other relevant detectors in remote tiny face detection. It is worth mentioning that our proposed method achieves performance comparable to the state-of-the-art YOLOv4 [1] in tiny face detection tasks.

17. A deep learning based interactive sketching system for fashion images design
Yao Li, Xianggang Yu, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu
Abstract: In this work, we propose an interactive system to design diverse high-quality garment images from fashion sketches and texture information. The major challenge behind this system is to generate high-quality and detailed texture according to the user-provided texture information. Prior works mainly use the texture patch representation and try to map a small texture patch to a whole garment image, and are hence unable to generate high-quality details. In contrast, inspired by intrinsic image decomposition, we decompose this task into texture synthesis and shading enhancement. In particular, we propose a novel bi-colored edge texture representation to synthesize textured garment images and a shading enhancer to render shading based on the grayscale edges. The bi-colored edge representation provides simple but effective texture cues and color constraints, so that the details can be better reconstructed. Moreover, with the rendered shading, the synthesized garment image becomes more vivid.

18. Generating Novel Glyph without Human Data by Learning to Communicate
Seung-won Park
Abstract: In this paper, we present Neural Glyph, a system that generates novel glyphs without any training data. The generator and the classifier are trained to communicate via visual symbols as a medium, which forces the generator to come up with a set of distinctive symbols. Our method results in glyphs that resemble human-made glyphs, which may imply that the visual appearances of existing glyphs can be attributed to constraints of communication via writing. Important tricks that enable this framework are described and the code is made available.

19. Learning 3D Face Reconstruction with a Pose Guidance Network
Pengpeng Liu, Xintong Han, Michael Lyu, Irwin King, Jia Xu
Abstract: We present a self-supervised learning approach to learning monocular 3D face reconstruction with a pose guidance network (PGN). First, we unveil the bottleneck of pose estimation in prior parametric 3D face learning methods, and propose to utilize 3D face landmarks for estimating pose parameters. With our specially designed PGN, our model can learn from both faces with fully labeled 3D landmarks and unlimited unlabeled in-the-wild face images. Our network is further augmented with a self-supervised learning scheme, which exploits face geometry information embedded in multiple frames of the same person, to alleviate the ill-posed nature of regressing 3D face geometry from a single image. These three insights yield a single approach that combines the complementary strengths of parametric model learning and data-driven learning techniques. We conduct a rigorous evaluation on the challenging AFLW2000-3D, Florence and FaceWarehouse datasets, and show that our method outperforms the state-of-the-art for all metrics.

20. Deep Sequence Learning for Video Anticipation: From Discrete and Deterministic to Continuous and Stochastic
Sadegh Aliakbarian
Abstract: Video anticipation is the task of predicting one/multiple future representation(s) given limited, partial observation. This is a challenging task due to the fact that, given limited observation, the future representation can be highly ambiguous. Based on the nature of the task, video anticipation can be considered from two viewpoints: the level of detail and the level of determinism in the predicted future. In this research, we start from anticipating a coarse representation of a deterministic future and then move towards predicting continuous and fine-grained future representations of a stochastic process. An example of the former is video action anticipation, in which we are interested in predicting one action label given a partially observed video, and an example of the latter is forecasting multiple diverse continuations of human motion given a partially observed one. In particular, in this thesis, we make several contributions to the literature of video anticipation...

21. Robust Instance Tracking via Uncertainty Flow
Jianing Qian, Junyu Nan, Siddharth Ancha, Brian Okorn, David Held
Abstract: Current state-of-the-art trackers often fail due to distractors and large object appearance changes. In this work, we explore the use of dense optical flow to improve tracking robustness. Our main insight is that, because flow estimation can also have errors, we need to incorporate an estimate of flow uncertainty for robust tracking. We present a novel tracking framework which combines appearance and flow uncertainty information to track objects in challenging scenarios. We experimentally verify that our framework improves tracking robustness, leading to new state-of-the-art results. Further, our experimental ablations show the importance of flow uncertainty for robust tracking.

22. DeepStreet: A deep learning powered urban street network generation module
Zhou Fang, Tianren Yang, Ying Jin
Abstract: In countries experiencing unprecedented waves of urbanization, there is a need for rapid and high quality urban street design. Our study presents a novel deep learning powered approach, DeepStreet (DS), for automatic street network generation that can be applied to the urban street design with local characteristics. DS is driven by a Convolutional Neural Network (CNN) that enables the interpolation of streets based on the areas of immediate vicinity. Specifically, the CNN is firstly trained to detect, recognize and capture the local features as well as the patterns of the existing street network sourced from the OpenStreetMap. With the trained CNN, DS is able to predict street networks' future expansion patterns within the predefined region conditioned on its surrounding street networks. To test the performance of DS, we apply it to an area in and around the Eixample area in the City of Barcelona, a well known example in the fields of urban and transport planning with iconic grid like street networks in the centre and irregular road alignments farther afield. The results show that DS can (1) detect and self cluster different types of complex street patterns in Barcelona; (2) predict both gridiron and irregular street and road networks. DS proves to have a great potential as a novel tool for designers to efficiently design the urban street network that well maintains the consistency across the existing and newly generated urban street network. Furthermore, the generated networks can serve as a benchmark to guide the local plan-making especially in rapidly developing cities.

23. Deep Learning Superpixel Semantic Segmentation with Transparent Initialization and Sparse Encoder
Zhiwei Xu, Thalaiyasingam Ajanthan, Richard Hartley
Abstract: Even though deep learning greatly improves the performance of semantic segmentation, its success lies mainly in the central areas of objects, without accurate edges. As superpixels are a popular and effective auxiliary for preserving object edges, in this paper we jointly learn semantic segmentation with trainable superpixels. We achieve this by adding fully-connected layers with transparent initialization and an efficient logit uniformization with a sparse encoder. Specifically, the proposed transparent initialization preserves the effects of learned parameters from pretrained networks, one for semantic segmentation and the other for superpixels, by a linear data recovery. This avoids a significant loss increase when using the pretrained networks, which can otherwise be caused by an inappropriate parameter initialization of the added layers. Meanwhile, consistent assignments to all pixels in each superpixel can be guaranteed by the logit uniformization with a sparse encoder. This sparse encoder with sparse matrix operations substantially improves training efficiency by reducing the large computational complexity arising from indexing pixels by superpixels. We demonstrate the effectiveness of our proposal, with transparent initialization and the sparse encoder, on semantic segmentation on the PASCAL VOC 2012 dataset with enhanced labeling on the object edges. Moreover, the proposed transparent initialization can also be used to jointly finetune multiple pretrained networks or a deeper pretrained network on other tasks.
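The idea of giving every pixel in a superpixel the same class scores can be sketched as follows. This is only an illustration under an assumption: the abstract does not say how the uniformization is computed, so the sketch simply averages the logits inside each superpixel. The function name `uniformize_logits` and the flat per-pixel layout are hypothetical.

```python
def uniformize_logits(logits, superpixel_ids):
    """Replace each pixel's class logits by the mean over its superpixel.

    logits: list of per-pixel lists of class logits.
    superpixel_ids: list giving the superpixel id of each pixel.
    Averaging is an assumed rule, chosen only for illustration.
    """
    sums = {}
    counts = {}
    for pixel_logits, sp in zip(logits, superpixel_ids):
        if sp not in sums:
            sums[sp] = [0.0] * len(pixel_logits)
            counts[sp] = 0
        for c, value in enumerate(pixel_logits):
            sums[sp][c] += value
        counts[sp] += 1
    means = {sp: [s / counts[sp] for s in sums[sp]] for sp in sums}
    # Every pixel of a superpixel now carries identical logits,
    # so the argmax class is consistent within the superpixel.
    return [list(means[sp]) for sp in superpixel_ids]
```

Because all pixels of a superpixel share one logit vector after this step, their predicted labels cannot disagree.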

24. Once Quantized for All: Progressively Searching for Quantized Efficient Models
Mingzhu Shen, Feng Liang, Chuming Li, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang
Abstract: Automatic search of Quantized Neural Networks has attracted a lot of attention. However, the existing quantization-aware Neural Architecture Search (NAS) approaches inherit a two-stage search-retrain schema, which is not only time-consuming but also adversely affected by the unreliable ranking of architectures during the search. To avoid the undesirable effect of the search-retrain schema, we present Once Quantized for All (OQA), a novel framework that searches for quantized efficient models and deploys their quantized weights at the same time without additional post-processing. While supporting a huge architecture search space, our OQA can produce a series of ultra-low bit-width (e.g. 4/3/2 bit) quantized efficient models. A progressive bit inheritance procedure is introduced to support ultra-low bit-width. Our discovered model family, OQANets, achieves a new state-of-the-art (SOTA) on quantized efficient models compared with various quantization methods and bit-widths. In particular, OQA2bit-L achieves 64.0% ImageNet Top-1 accuracy, outperforming its 2-bit counterpart EfficientNet-B0@QKD by a large margin of 14% using 30% less computation budget. Code is available at this https URL.
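To make the notion of bit-width concrete, here is a minimal sketch of uniform symmetric quantization of a weight list to a given number of bits. It is a generic illustration and not the OQA procedure; the function name `quantize_uniform` and the max-abs scaling rule are assumptions.

```python
def quantize_uniform(weights, bits):
    """Round weights onto a signed uniform grid and map them back to floats.

    Generic fake-quantization sketch (assumed max-abs scaling),
    not the method of the paper.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1            # 1 for 2 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax                # distance between adjacent levels
    quantized = []
    for w in weights:
        level = round(w / scale)          # nearest integer level
        level = max(-qmax, min(qmax, level))  # clamp to the representable range
        quantized.append(level * scale)
    return quantized
```

With 2 bits the grid has only three levels (negative, zero, positive), which shows why ultra-low bit-widths are hard to train without care.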

25. MMGSD: Multi-Modal Gaussian Shape Descriptors for Correspondence Matching in 1D and 2D Deformable Objects
Aditya Ganapathi, Priya Sundaresan, Brijen Thananjeyan, Ashwin Balakrishna, Daniel Seita, Ryan Hoque, Joseph E. Gonzalez, Ken Goldberg
Abstract: We explore learning pixelwise correspondences between images of deformable objects in different configurations. Traditional correspondence matching approaches such as SIFT, SURF, and ORB can fail to provide sufficient contextual information for fine-grained manipulation. We propose Multi-Modal Gaussian Shape Descriptor (MMGSD), a new visual representation of deformable objects which extends ideas from dense object descriptors to predict all symmetric correspondences between different object configurations. MMGSD is learned in a self-supervised manner from synthetic data and produces correspondence heatmaps with measurable uncertainty. In simulation, experiments suggest that MMGSD can achieve an RMSE of 32.4 and 31.3 for square cloth and braided synthetic nylon rope respectively. The results demonstrate an average of 47.7% improvement over a provided baseline based on contrastive learning, symmetric pixel-wise contrastive loss (SPCL), as opposed to MMGSD which enforces distributional continuity.

26. Targeted Attention Attack on Deep Learning Models in Road Sign Recognition
Xinghao Yang, Weifeng Liu, Shengli Zhang, Wei Liu, Dacheng Tao
Abstract: Real-world traffic sign recognition is an important step towards building autonomous vehicles, most of which are highly dependent on Deep Neural Networks (DNNs). Recent studies demonstrated that DNNs are surprisingly susceptible to adversarial examples. Many attack methods have been proposed to understand and generate adversarial examples, such as gradient-based, score-based, decision-based, and transfer-based attacks. However, most of these algorithms are ineffective in real-world road sign attack, because (1) iteratively learning perturbations for each frame is not realistic for a fast-moving car and (2) most optimization algorithms traverse all pixels equally without considering their diverse contributions. To alleviate these problems, this paper proposes the targeted attention attack (TAA) method for real-world road sign attack. Specifically, we make the following contributions: (1) we leverage the soft attention map to highlight the important pixels and skip the zero-contribution areas, which also helps to generate natural perturbations; (2) we design an efficient universal attack that optimizes a single perturbation/noise based on a set of training images under the guidance of the pre-trained attention map; (3) we design a simple objective function that can be easily optimized; (4) we evaluate the effectiveness of TAA on real-world data sets. Experimental results validate that the TAA method improves the attack success rate (by nearly 10%) and reduces the perturbation loss (by about a quarter) compared with the popular RP2 method. Additionally, our TAA also provides good properties, e.g., transferability and generalization capability. We provide code and data to ensure reproducibility: this https URL.

27. Deep-Masking Generative Network: A Unified Framework for Background Restoration from Superimposed Images
Xin Feng, Wenjie Pei, Zihui Jia, David Zhang, Guangming Lu
Abstract: Restoring the clean background from superimposed images containing a noisy layer is the common crux of a classical category of image restoration tasks such as image reflection removal, image deraining and image dehazing. These tasks are typically formulated and tackled individually due to the diverse and complicated appearance patterns of noise layers within the image. In this work we present the Deep-Masking Generative Network (DMGN), which is a unified framework for background restoration from superimposed images and is able to cope with different types of noise. Our proposed DMGN follows a coarse-to-fine generative process: a coarse background image and a noise image are first generated in parallel, then the noise image is further leveraged to refine the background image to achieve a higher-quality background image. In particular, we design the novel Residual Deep-Masking Cell as the core operating unit for our DMGN to enhance the effective information and suppress the negative information during image generation by learning a gating mask to control the information flow. By iteratively employing this Residual Deep-Masking Cell, our proposed DMGN is able to generate both a high-quality background image and a noisy image progressively. Furthermore, we propose a two-pronged strategy to effectively leverage the generated noise image as contrasting cues to facilitate the refinement of the background image. Extensive experiments across three typical tasks for image background restoration, including image reflection removal, image rain streak removal and image dehazing, show that our DMGN consistently outperforms state-of-the-art methods specifically designed for each single task.

28. Addressing the Real-world Class Imbalance Problem in Dermatology
Wei-Hung Weng, Jonathan Deaton, Vivek Natarajan, Gamaleldin F. Elsayed, Yuan Liu
Abstract: Class imbalance is a common problem in medical diagnosis, causing a standard classifier to be biased towards the common classes and perform poorly on the rare classes. This is especially true for dermatology, a specialty with thousands of skin conditions, many of which are rare in the real world. Motivated by recent advances, we explore few-shot learning methods as well as conventional class imbalance techniques for the skin condition recognition problem and propose an evaluation setup to fairly assess the real-world utility of such approaches. We find that few-shot learning methods are not as performant as conventional class imbalance techniques, but combining the two approaches using a novel ensemble improves model performance, especially for rare classes. We conclude that the ensemble can be useful for addressing the class imbalance problem, yet progress here can be further accelerated by the use of real-world evaluation setups for benchmarking new methods.

29. Large Scale Indexing of Generic Medical Image Data using Unbiased Shallow Keypoints and Deep CNN Features
L. Chauvin, M. Ben Lazreg, J.B. Carluer, W. Wells, M. Toews
Abstract: We propose a unified appearance model accounting for traditional shallow (i.e. 3D SIFT keypoints) and deep (i.e. CNN output layers) image feature representations, encoding respectively specific, localized neuroanatomical patterns and rich global information into a single indexing and classification framework. A novel Bayesian model combines shallow and deep features based on an assumption of conditional independence and is validated by experiments indexing specific family members and general group categories in 3D MRI neuroimage data of 1010 subjects from the Human Connectome Project, including twins and non-twin siblings. A novel domain adaptation strategy is presented, transforming deep CNN vector elements into binary class-informative descriptors. A GPU-based implementation of all processing is provided. State-of-the-art performance is achieved in large-scale neuroimage indexing in terms of computational complexity, accuracy in identifying family members, and sex classification.
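The conditional-independence assumption mentioned above has a simple generic form: the class posterior is proportional to the prior times the likelihood of each feature type. The sketch below shows that textbook fusion rule only; it is not the paper's model, and the function name `fuse_posteriors`, the class labels, and the numbers in the usage note are hypothetical.

```python
def fuse_posteriors(prior, lik_shallow, lik_deep):
    """Fuse two feature likelihoods per class under conditional independence.

    posterior(c) is proportional to prior(c) * p(shallow | c) * p(deep | c).
    All arguments are dicts keyed by class label.
    """
    unnormalized = {
        c: prior[c] * lik_shallow[c] * lik_deep[c] for c in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all classes have zero probability")
    return {c: value / total for c, value in unnormalized.items()}
```

For example, with equal priors for two hypothetical classes "sibling" and "unrelated", shallow likelihoods 0.8 and 0.2, and deep likelihoods 0.6 and 0.4, the fused posterior for "sibling" is 0.24 / 0.28, about 0.857, which is stronger than either feature type alone.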

30. Refinement of Predicted Missing Parts Enhance Point Cloud Completion
Alexis Mendoza, Alexander Apaza, Ivan Sipiran, Cristian Lopez
Abstract: Point cloud completion is the task of predicting complete geometry from partial observations using a point set representation for a 3D shape. Previous approaches propose neural networks to directly estimate the whole point cloud through encoder-decoder models fed by the incomplete point set. By predicting the complete model, the current methods compute redundant information because the output also contains the known incomplete input geometry. This paper proposes an end-to-end neural network architecture that focuses on computing the missing geometry and merging the known input and the predicted point cloud. Our method is composed of two neural networks: the missing part prediction network and the merging-refinement network. The first module focuses on extracting information from the incomplete input to infer the missing geometry. The second module merges both point clouds and improves the distribution of the points. Our experiments on the ShapeNet dataset show that our method outperforms the state-of-the-art methods in point cloud completion. The code of our methods and experiments is available at this https URL.

31. Fast Fourier Transformation for Optimizing Convolutional Neural Networks in Object Recognition
Varsha Nair, Moitrayee Chatterjee, Neda Tavakoli, Akbar Siami Namin, Craig Snoeyink
Abstract: This paper proposes to use a Fast Fourier Transformation-based U-Net (a refined fully convolutional network) and to perform image convolution in neural networks. Leveraging the Fast Fourier Transformation, it reduces the image convolution costs involved in Convolutional Neural Networks (CNNs) and thus reduces the overall computational costs. The proposed model identifies the object information from the images. We apply the Fast Fourier transform algorithm on an image data set to obtain more accessible information about the image data, before segmenting them through the U-Net architecture. More specifically, we implement the FFT-based convolutional neural network to improve the training time of the network. The proposed approach was applied to the publicly available Broad Bioimage Benchmark Collection (BBBC) dataset. Our model demonstrated improvement in training time during convolution from 600-700 ms/step to 400-500 ms/step. We evaluated the accuracy of our model using the Intersection over Union (IoU) metric, showing significant improvements.
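The cost saving rests on the convolution theorem: a convolution becomes an element-wise product after a Fourier transform. The following is a minimal 1D sketch of that identity using only the standard library; it is not the paper's network code, and the helper names `fft`, `fft_convolve` and `direct_convolve` are illustrative. For images the same identity is applied along both axes.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    The inverse transform is left unnormalized (caller divides by len(x)).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Linear convolution computed as a product in the frequency domain."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:
        size *= 2  # zero-pad to a power of two so circular wrap-around is avoided
    fa = fft(list(signal) + [0] * (size - len(signal)))
    fb = fft(list(kernel) + [0] * (size - len(kernel)))
    product = [a * b for a, b in zip(fa, fb)]
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]


def direct_convolve(signal, kernel):
    """Reference O(n*m) convolution used to check the FFT version."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

The direct version costs on the order of n times m multiplications, while the FFT route costs on the order of n log n, which is where the saving for large kernels comes from.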

32. Ensemble Hyperspectral Band Selection for Detecting Nitrogen Status in Grape Leaves
Ryan Omidi, Ali Moghimi, Alireza Pourreza, Mohamed El-Hadedy Aly, Anas Salah Eddin
Abstract: The large data size and dimensionality of hyperspectral data demand complex processing and data analysis. Multispectral data do not suffer the same limitations, but are normally restricted to blue, green, red, red edge, and near infrared bands. This study aimed to identify the optimal set of spectral bands for nitrogen detection in grape leaves using ensemble feature selection on hyperspectral data from over 3,000 leaves from 150 Flame Seedless table grapevines. Six machine learning base rankers were included in the ensemble: random forest, LASSO, SelectKBest, ReliefF, SVM-RFE, and chaotic crow search algorithm (CCSA). The pipeline identified less than 0.45% of the bands as most informative about grape nitrogen status. The selected violet, yellow-orange, and shortwave infrared bands lie outside the typical blue, green, red, red edge, and near infrared bands of commercial multispectral cameras, so the potential improvement in remote sensing of nitrogen in grapevines offered by a customized multispectral sensor centered at the selected bands is promising and worth further investigation. The proposed pipeline may also be used for application-specific multispectral sensor design in domains other than agriculture.
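One common way to combine several base rankers into an ensemble ranking is mean-rank (Borda-style) aggregation, sketched below. The abstract does not state which aggregation rule the authors use, so this rule, the function name `aggregate_rankings`, and the band names in the usage note are all assumptions for illustration.

```python
def aggregate_rankings(rankings):
    """Combine several best-first rankings of the same features.

    Each feature's score is the sum of its positions across rankers;
    a lower total means a better consensus rank. Ties are broken by
    feature name so the result is deterministic.
    """
    features = rankings[0]
    totals = {f: 0 for f in features}
    for ranking in rankings:
        if set(ranking) != set(features):
            raise ValueError("all rankers must rank the same features")
        for position, feature in enumerate(ranking):
            totals[feature] += position
    return sorted(features, key=lambda f: (totals[f], f))
```

For three hypothetical rankers that order bands as (band_a, band_b, band_c), (band_b, band_a, band_c) and (band_a, band_c, band_b), the position totals are 1, 3 and 5, so the consensus order is band_a, band_b, band_c.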

33. Efficient Real-Time Radial Distortion Correction for UAVs
Marcus Valtonen Örnhag, Patrik Persson, Mårten Wadenbäck, Kalle Åström, Anders Heyden
Abstract: In this paper we present a novel algorithm, running in real time, for onboard radial distortion correction for unmanned aerial vehicles (UAVs) equipped with an inertial measurement unit (IMU). This approach makes calibration procedures redundant, thus allowing optics to be exchanged extemporaneously. By utilizing the IMU data, the cameras can be aligned with the gravity direction. This allows us to work with fewer degrees of freedom, and opens the way to further intrinsic calibration. We propose a fast and robust minimal solver for simultaneously estimating the focal length, radial distortion profile and motion parameters from homographies. The proposed solver is tested on both synthetic and real data, and performs better than or on par with state-of-the-art methods relying on pre-calibration procedures.

34. Unsupervised 3D Brain Anomaly Detection
Jaime Simarro Viana, Ezequiel de la Rosa, Thijs Vande Vyvere, David Robben, Diana M. Sima
Abstract: Anomaly detection (AD) is the identification of data samples that do not fit a learned data distribution. As such, AD systems can help physicians to determine the presence, severity, and extension of a pathology. Deep generative models, such as Generative Adversarial Networks (GANs), can be exploited to capture anatomical variability. Consequently, any outlier (i.e., sample falling outside of the learned distribution) can be detected as an abnormality in an unsupervised fashion. By using this method, we can not only detect expected or known lesions, but we can even unveil previously unrecognized biomarkers. To the best of our knowledge, this study exemplifies the first AD approach that can efficiently handle volumetric data and detect 3D brain anomalies in one single model. Our proposal is a volumetric and high-detail extension of the 2D f-AnoGAN model obtained by combining a state-of-the-art 3D GAN with refinement training steps. In experiments using non-contrast computed tomography images from traumatic brain injury (TBI) patients, the model detects and localizes TBI abnormalities with an area under the ROC curve of ~75%. Moreover, we test the potential of the method for detecting other anomalies such as low quality images, preprocessing inaccuracies, artifacts, and even the presence of post-operative signs (such as a craniectomy or a brain shunt). The method has potential for rapidly labeling abnormalities in massive imaging datasets, as well as identifying new biomarkers.
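The area under the ROC curve quoted above has a direct probabilistic reading: it is the chance that a randomly chosen abnormal sample receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it that way for a small list of scores; the function name `roc_auc` and the toy numbers in the usage note are illustrative and are not data from the paper.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of abnormal (1) and normal (0) samples.

    Counts the fraction of abnormal/normal pairs in which the abnormal
    sample has the higher score; tied pairs count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With scores 0.9, 0.8, 0.3, 0.2 and labels 1, 0, 1, 0, three of the four abnormal/normal pairs are ordered correctly, giving an AUC of 0.75.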

35. PathoNet: Deep learning assisted evaluation of Ki-67 and tumor infiltrating lymphocytes (TILs) as prognostic factors in breast cancer; A large dataset and baseline
Farzin Negahbani, Rasool Sabzi, Bita Pakniyat Jahromi, Fatemeh Movahedi, Mahsa Kohandel Shirazi, Shayan Majidi, Dena Firouzabadi, Amirreza Dehganian
Abstract: The nuclear protein Ki-67 and tumor-infiltrating lymphocytes (TILs) have been introduced as prognostic factors in predicting tumor progression and its treatment response. The value of the Ki-67 index and TILs in the approach to heterogeneous tumors such as breast cancer (BC), known as the most common cancer in women worldwide, has been highlighted in the literature. Due to the indeterminable and subjective nature of Ki-67 as well as TILs scoring, automated methods using machine learning, specifically approaches based on deep learning, have attracted attention. Yet, deep learning methods need considerable annotated data. In the absence of publicly available benchmarks for BC Ki-67 stained cell detection and further annotated classification of cells, we propose SHIDC-BC-Ki-67 as a dataset for the aforementioned purpose. We also introduce a novel pipeline and a backend, namely PathoNet, for Ki-67 immunostained cell detection and classification and simultaneous determination of the intratumoral TILs score. Further, we show that despite facing challenges, our proposed backend, PathoNet, outperforms the state-of-the-art methods proposed to date in the harmonic mean measure.
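The abstract does not define its "harmonic mean measure". Assuming it refers to the harmonic mean of precision and recall (the F1 score), which is the usual choice for detection tasks, it can be computed from detection counts as in this sketch. The function name `f1_score` and the counts in the usage note are hypothetical.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall from detection counts.

    Returns 0.0 when nothing was detected correctly.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For instance, 8 correct detections with 2 false alarms and 8 misses give a precision of 0.8 and a recall of 0.5; their harmonic mean is about 0.615, lower than the arithmetic mean of 0.65 because the harmonic mean penalizes the weaker of the two.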

36. LaND: Learning to Navigate from Disengagements
Gregory Kahn, Pieter Abbeel, Sergey Levine
Abstract: Consistently testing autonomous mobile robots in real-world scenarios is a necessary aspect of developing autonomous navigation systems. Each time the human safety monitor disengages the robot's autonomy system due to the robot performing an undesirable maneuver, the autonomy developers gain insight into how to improve the autonomy system. However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate. We present a reinforcement learning approach for learning to navigate from disengagements, or LaND. LaND learns a neural network model that predicts which actions lead to disengagements given the current sensory observation, and then at test time plans and executes actions that avoid disengagements. Our results demonstrate that LaND can successfully learn to navigate in diverse, real-world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches. Videos, code, and other material are available on our website: this https URL

37. Smooth Variational Graph Embeddings for Efficient Neural Architecture Search
Jovita Lukasik, David Friede, Arber Zela, Heiner Stuckenschmidt, Frank Hutter, Margret Keuper
Abstract: In this paper, we propose an approach to neural architecture search (NAS) based on graph embeddings. NAS has previously been addressed using discrete, sampling-based methods, which are computationally expensive, as well as differentiable approaches, which come at lower cost but enforce stronger constraints on the search space. The proposed approach leverages the advantages of both by building a smooth variational neural architecture embedding space, in which we evaluate a structural subset of architectures at training time using the predicted performance, while still being able to extrapolate from this subspace at inference time. We evaluate the proposed approach in the context of two common search spaces, the graph structure defined by the ENAS approach and the NAS-Bench-101 search space, and improve over the state of the art in both.

38. Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation
Bo Li, Yezhen Wang, Shanghang Zhang, Dongsheng Li, Trevor Darrell, Kurt Keutzer, Han Zhao
Abstract: The success of supervised learning hinges on the assumption that the training and test data come from the same underlying distribution, which is often not valid in practice due to potential distribution shift. In light of this, most existing methods for unsupervised domain adaptation focus on achieving domain-invariant representations and small source domain error. However, recent works have shown that this is not sufficient to guarantee good generalization on the target domain and, in fact, is provably detrimental under label distribution shift. Furthermore, in many real-world applications it is often feasible to obtain a small amount of labeled data from the target domain and use it to facilitate model training with source data. Inspired by the above observations, in this paper we propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA). First, we provide a finite sample bound for both classification and regression problems under Semi-DA. The bound suggests a principled way to obtain target generalization, i.e. by aligning both the marginal and conditional distributions across domains in feature space. Motivated by this, we then introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks. Finally, extensive experiments are conducted on both classification and regression tasks, which demonstrate that LIRR consistently achieves state-of-the-art performance and significant improvements compared with methods that only learn invariant representations or invariant risks.

39. Baseline and Triangulation Geometry in a Standard Plenoptic Camera
Christopher Hahne, Amar Aggoun, Vladan Velisavljevic, Susanne Fiebig, Matthias Pesch
Abstract: In this paper, we demonstrate light field triangulation to determine depth distances and baselines in a plenoptic camera. Advances in micro lenses and image sensors have enabled plenoptic cameras to capture a scene from different viewpoints with sufficient spatial resolution. While object distances can be inferred from disparities in a stereo viewpoint pair using triangulation, this concept remains ambiguous when applied in the case of plenoptic cameras. We present a geometrical light field model allowing the triangulation to be applied to a plenoptic camera in order to predict object distances or specify baselines as desired. It is shown that distance estimates from our novel method match those of real objects placed in front of the camera. Additional benchmark tests with optical design software further validate the model's accuracy, with deviations of less than ±0.33% for several main lens types and focus settings. A variety of applications in the automotive and robotics fields can benefit from this estimation model.
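For the ordinary stereo case that the abstract contrasts with, triangulation reduces to the well-known relation Z = f * B / d for a rectified camera pair, where f is the focal length in pixels, B the baseline and d the disparity in pixels. The sketch below shows only this classic two-view relation, not the paper's plenoptic model; the function name and the numbers in the usage note are illustrative.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Object distance for a rectified stereo pair: Z = f * B / d.

    focal_length_px: focal length expressed in pixels.
    baseline_m: distance between the two viewpoints in metres.
    disparity_px: horizontal pixel shift of the object between views.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px
```

With a focal length of 1000 pixels, a baseline of 0.1 m and a disparity of 20 pixels, the object lies 5 m away. Halving the baseline halves the disparity for the same object, which is why the small baselines inside a single camera make depth estimation sensitive to disparity errors.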

40. Attaining Real-Time Super-Resolution for Microscopic Images Using GAN
Vibhu Bhatia, Yatender Kumar
Abstract: In the last few years, several deep learning models, especially Generative Adversarial Networks, have received a lot of attention for the task of Single Image Super-Resolution (SISR). These methods focus on building an end-to-end framework which produces a high-resolution (SR) image from a given low-resolution (LR) image in a single step to achieve state-of-the-art performance. This paper focuses on improving an existing deep-learning-based method to perform Super-Resolution Microscopy in real time using a standard GPU. For this, we first propose a tiling strategy, which takes advantage of the parallelism provided by a GPU to speed up the network training process. Further, we suggest simple changes to the architecture of the generator and the discriminator of SRGAN. Subsequently, we compare the quality and the running time of the outputs produced by our model, opening up applications in different areas like low-end benchtop and even mobile microscopy. Finally, we explore the possibility of using the trained network to produce high-resolution (HR) outputs for different domains.
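A tiling strategy, in its simplest form, cuts a large image into fixed-size patches that can be processed as one batch and then stitched back together. The abstract gives no details of the authors' scheme, so the sketch below assumes non-overlapping tiles with smaller tiles at the borders; the function names `split_into_tiles` and `merge_tiles` are hypothetical.

```python
def split_into_tiles(image, tile_size):
    """Cut a 2D image (list of rows) into non-overlapping tiles.

    Returns a list of (top, left, tile) tuples; tiles on the right and
    bottom borders may be smaller than tile_size.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    tiles = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tile = [row[left:left + tile_size]
                    for row in image[top:top + tile_size]]
            tiles.append((top, left, tile))
    return tiles


def merge_tiles(tiles, height, width):
    """Stitch tiles produced by split_into_tiles back into one image."""
    image = [[None] * width for _ in range(height)]
    for top, left, tile in tiles:
        for r, row in enumerate(tile):
            image[top + r][left:left + len(row)] = row
    return image
```

Since each tile is independent, the tiles can be stacked into a batch and passed through the network in parallel; in a super-resolution setting the merge step would additionally scale the tile offsets by the upsampling factor.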

41. Conditional GAN for Prediction of Glaucoma Progression with Macular Optical Coherence Tomography
Osama N. Hassan, Serhat Sahin, Vahid Mohammadzadeh, Xiaohe Yang, Navid Amini, Apoorva Mylavarapu, Jack Martinyan, Tae Hong, Golnoush Mahmoudinezhad, Daniel Rueckert, Kouros Nouri-Mahdavi, Fabien Scalzo
Abstract: The estimation of glaucoma progression is a challenging task as the rate of disease progression varies among individuals in addition to other factors such as measurement variability and the lack of standardization in defining progression. Structural tests, such as thickness measurements of the retinal nerve fiber layer or the macula with optical coherence tomography (OCT), are able to detect anatomical changes in glaucomatous eyes. Such changes may be observed before any functional damage. In this work, we built a generative deep learning model using the conditional GAN architecture to predict glaucoma progression over time. The patient's OCT scan is predicted from three or two prior measurements. The predicted images demonstrate high similarity with the ground truth images. In addition, our results suggest that OCT scans obtained from only two prior visits may actually be sufficient to predict the next OCT scan of the patient after six months.

42. Sickle-cell disease diagnosis support selecting the most appropriate machine learning method: Towards a general and interpretable approach for cell morphology analysis from microscopy images
Nataša Petrović, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Manuel González-Hidalgo
Abstract: In this work, we propose an approach to select the classification method and features, based on the state of the art, with the best performance for diagnostic support through peripheral blood smear images of red blood cells. In our case, we used samples from patients with sickle-cell disease, but the approach can be generalized to other study cases. To trust the behavior of the proposed system, we also analyzed its interpretability. We pre-processed and segmented microscopic images to ensure high feature quality. We applied the methods used in the literature to extract the features from blood cells and the machine learning methods to classify their morphology. Next, we searched for their best parameters from the resulting data in the feature extraction phase. Then, we found the best parameters for every classifier using randomized and grid search. For the sake of scientific progress, we published the parameters for each classifier, the implemented code library, and the confusion matrices with the raw data, and we used the public erythrocytesIDB dataset for validation. We also defined how to select the most important features for classification to decrease the complexity and the training time, and for interpretability purposes in opaque models. Finally, comparing the best-performing classification methods with the state of the art, we obtained better results even with interpretable model classifiers.

43. Linear Mode Connectivity in Multitask and Continual Learning
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu, Hassan Ghasemzadeh
Abstract: Continual (sequential) training and multitask (simultaneous) training often attempt to solve the same overall objective: to find a solution that performs well on all considered tasks. The main difference is in the training regimes, where continual learning can only have access to one task at a time, which for neural networks typically leads to catastrophic forgetting. That is, the solution found for a subsequent task does not perform well on the previous ones anymore. However, the relationship between the different minima that the two training regimes arrive at is not well understood. What sets them apart? Is there a local structure that could explain the difference in performance achieved by the two different schemes? Motivated by recent work showing that different minima of the same task are typically connected by very simple curves of low error, we investigate whether multitask and continual solutions are similarly connected. We empirically find that indeed such connectivity can be reliably achieved and, more interestingly, it can be done by a linear path, conditioned on having the same initialization for both. We thoroughly analyze this observation and discuss its significance for the continual learning process. Furthermore, we exploit this finding to propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution. We show that our method outperforms several state-of-the-art continual learning algorithms on various vision benchmarks.
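The idea of two solutions being connected by a linear path of low error can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' code: parameters are plain lists of floats, `loss_fn` is any callable returning a scalar, and the "barrier" definition used here (highest loss on the path minus the higher endpoint loss) is one simple choice made for this example.

```python
def interpolate(w_a, w_b, alpha):
    """Point on the straight line between two parameter vectors."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def loss_barrier(loss_fn, w_a, w_b, steps=11):
    """Highest loss along the linear path minus the higher endpoint loss.

    Illustrative definition (an assumption of this sketch). A value near
    zero suggests the two solutions are linearly connected under loss_fn.
    """
    path_losses = [
        loss_fn(interpolate(w_a, w_b, i / (steps - 1))) for i in range(steps)
    ]
    return max(path_losses) - max(path_losses[0], path_losses[-1])


if __name__ == "__main__":
    # Convex toy loss: no barrier between any two points.
    convex = lambda w: sum(x * x for x in w)
    print(loss_barrier(convex, [1.0, 0.0], [0.0, 1.0]))

    # Double-well toy loss: the two minima at -1 and +1 are separated.
    double_well = lambda w: (w[0] ** 2 - 1.0) ** 2
    print(loss_barrier(double_well, [-1.0], [1.0]))
```

For real networks, the same check would interpolate every weight tensor and evaluate the loss on held-out data for each task; that part is omitted here.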

44. Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting
Vincent Le Guen, Yuan Yin, Jérémie Dona, Ibrahim Ayed, Emmanuel de Bézenac, Nicolas Thome, Patrick Gallinari
Abstract: Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard approaches based on physical modeling tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framework, a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models. It consists of decomposing the dynamics into two components: a physical component accounting for the dynamics for which we have some prior knowledge, and a data-driven component accounting for errors of the physical model. The learning problem is carefully formulated such that the physical model explains as much of the data as possible, while the data-driven component only describes information that cannot be captured by the physical model, no more, no less. This not only provides the existence and uniqueness for this decomposition, but also ensures interpretability and benefits generalization. Experiments made on three important use cases, each representative of a different family of phenomena, i.e. reaction-diffusion equations, wave equations and the non-linear damped pendulum, show that APHYNITY can efficiently leverage approximate physical models to accurately forecast the evolution of the system and correctly identify relevant physical parameters.

45. Rethinking the Extraction and Interaction of Multi-Scale Features for Vessel Segmentation
Yicheng Wu, Chengwei Pan, Shuqi Wang, Ming Zhang, Yong Xia, Yizhou Yu
Abstract: Analyzing the morphological attributes of blood vessels plays a critical role in the computer-aided diagnosis of many cardiovascular and ophthalmologic diseases. Although extensively studied, segmentation of blood vessels, particularly thin vessels and capillaries, remains challenging, mainly due to the lack of an effective interaction between local and global features. In this paper, we propose a novel deep learning model called PC-Net to segment retinal vessels and major arteries in 2D fundus images and 3D computed tomography angiography (CTA) scans, respectively. In PC-Net, the pyramid squeeze-and-excitation (PSE) module introduces spatial information to each convolutional block, boosting its ability to extract more effective multi-scale features, and the coarse-to-fine (CF) module replaces the conventional decoder to enhance the details of thin vessels and process hard-to-classify pixels again. We evaluated our PC-Net on the Digital Retinal Images for Vessel Extraction (DRIVE) database and an in-house 3D major artery (3MA) database against several recent methods. Our results not only demonstrate the effectiveness of the proposed PSE module and CF module, but also suggest that our proposed PC-Net sets a new state of the art in the segmentation of retinal vessels (AUC: 98.31%) and major arteries (AUC: 98.35%) on both databases, respectively.

46. WHO 2016 subtyping and automated segmentation of glioma using multi-task deep learning
Sebastian R. van der Voort, Fatih Incekara, Maarten M.J. Wijnenga, Georgios Kapsas, Renske Gahrmann, Joost W. Schouten, Rishi Nandoe Tewarie, Geert J. Lycklama, Philip C. De Witt Hamer, Roelant S. Eijgelaar, Pim J. French, Hendrikus J. Dubbink, Arnaud J.P.E. Vincent, Wiro J. Niessen, Martin J. van den Bent, Marion Smits, Stefan Klein
Abstract: Accurate characterization of glioma is crucial for clinical decision making. A delineation of the tumor is also desirable in the initial decision stages but is a time-consuming task. Leveraging the latest GPU capabilities, we developed a single multi-task convolutional neural network that uses the full 3D, structural, pre-operative MRI scans to predict the IDH mutation status, the 1p/19q co-deletion status, and the grade of a tumor, while simultaneously segmenting the tumor. We trained our method using the largest, most diverse patient cohort to date, containing 1508 glioma patients from 16 institutes. We tested our method on an independent dataset of 240 patients from 13 different institutes, and achieved an IDH-AUC of 0.90, a 1p/19q-AUC of 0.85, a grade-AUC of 0.81, and a mean whole tumor DICE score of 0.84. Thus, our method non-invasively predicts multiple, clinically relevant parameters and generalizes well to the broader clinical population.

47. Weaponizing Unicodes with Deep Learning -- Identifying Homoglyphs with Weakly Labeled Data
Perry Deng, Cooper Linsky, Matthew Wright
Abstract: Visually similar characters, or homoglyphs, can be used to perform social engineering attacks or to evade spam and plagiarism detectors. It is thus important to understand the capabilities of an attacker to identify homoglyphs -- particularly ones that have not been previously spotted -- and leverage them in attacks. We investigate a deep-learning model using embedding learning, transfer learning, and augmentation to determine the visual similarity of characters and thereby identify potential homoglyphs. Our approach uniquely takes advantage of weak labels that arise from the fact that most characters are not homoglyphs. Our model drastically outperforms the Normalized Compression Distance approach on pairwise homoglyph identification, for which we achieve an average precision of 0.97. We also present the first attempt at clustering homoglyphs into sets of equivalence classes, which is more efficient than pairwise information for security practitioners to quickly look up homoglyphs or to normalize confusable string encodings. To measure clustering performance, we propose a metric (mBIOU) building on the classic Intersection-Over-Union (IOU) metric. Our clustering method achieves 0.592 mBIOU, compared to 0.430 for the naive baseline. We also use our model to predict over 8,000 previously unknown homoglyphs, and find good early indications that many of these may be true positives. Source code and list of predicted homoglyphs are uploaded to GitHub: this https URL
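The classic Intersection-Over-Union metric that mBIOU builds on can be sketched for two clusters of characters. This is a minimal illustration of plain set IoU only; the abstract does not define mBIOU, so it is not reproduced here. The function name and the empty-set convention are assumptions of this example.

```python
def iou(cluster_a, cluster_b):
    """Intersection-over-Union of two collections treated as sets."""
    a, b = set(cluster_a), set(cluster_b)
    union = a | b
    if not union:
        # Assumed convention: two empty clusters count as identical.
        return 1.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    predicted = {"o", "0", "O"}      # characters a model grouped together
    reference = {"0", "O", "Q"}      # a reference equivalence class
    print(iou(predicted, reference))
```

An IoU of 1.0 means the predicted cluster matches the reference class exactly, and 0.0 means they share no characters.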

[arXiv papers] Computer Vision and Pattern Recognition 2020-10-12