[arXiv Papers] Computer Vision and Pattern Recognition 2020-06-11

Contents

1. MultiResolution Attention Extractor for Small Object Detection [PDF] Abstract
2. Simple and effective localized attribute representations for zero-shot learning [PDF] Abstract
3. Dataset Condensation with Gradient Matching [PDF] Abstract
4. Recent Advances in 3D Object and Hand Pose Estimation [PDF] Abstract
5. Separable Four Points Fundamental Matrix [PDF] Abstract
6. Deep Learning with Attention Mechanism for Predicting Driver Intention at Intersection [PDF] Abstract
7. DisCont: Self-Supervised Visual Attribute Disentanglement using Context Vectors [PDF] Abstract
8. Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging [PDF] Abstract
9. WasteNet: Waste Classification at the Edge for Smart Bins [PDF] Abstract
10. Geometry-Aware Segmentation of Remote Sensing Images via implicit height estimation [PDF] Abstract
11. Searching Learning Strategy with Reinforcement Learning for 3D Medical Image Segmentation [PDF] Abstract
12. Embedding Task Knowledge into 3D Neural Networks via Self-supervised Learning [PDF] Abstract
13. Image Enhancement and Object Recognition for Night Vision Surveillance [PDF] Abstract
14. 3D Human Mesh Regression with Dense Correspondence [PDF] Abstract
15. Object Detection in the DCT Domain: is Luminance the Solution? [PDF] Abstract
16. Diagnosing Rarity in Human-Object Interaction Detection [PDF] Abstract
17. Estimating semantic structure for the VQA answer space [PDF] Abstract
18. Real-time single image depth perception in the wild with handheld devices [PDF] Abstract
19. Unique Faces Recognition in Videos [PDF] Abstract
20. Delta Descriptors: Change-Based Place Representation for Robust Visual Localization [PDF] Abstract
21. Rendering Natural Camera Bokeh Effect with Deep Learning [PDF] Abstract
22. TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model [PDF] Abstract
23. H3DNet: 3D Object Detection Using Hybrid Geometric Primitives [PDF] Abstract
24. IMUTube: Automatic extraction of virtual on-body accelerometry from video for human activity recognition [PDF] Abstract
25. 3D geometric moment invariants from the point of view of the classical invariant theory [PDF] Abstract
26. Agrupamento de Pixels para o Reconhecimento de Faces [PDF] Abstract
27. Scalable Backdoor Detection in Neural Networks [PDF] Abstract
28. A survey on deep hashing for image retrieval [PDF] Abstract
29. Deep Learning for Change Detection in Remote Sensing Images: Comprehensive Review and Meta-Analysis [PDF] Abstract
30. Condensing Two-stage Detection with Automatic Object Key Part Discovery [PDF] Abstract
31. CNN-Based Semantic Change Detection in Satellite Imagery [PDF] Abstract
32. Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [PDF] Abstract
33. A gaze driven fast-forward method for first-person videos [PDF] Abstract
34. Tree Annotations in LiDAR Data Using Point Densities and Convolutional Neural Networks [PDF] Abstract
35. 3D Point Cloud Feature Explanations Using Gradient-Based Methods [PDF] Abstract
36. Dual-stream Maximum Self-attention Multi-instance Learning [PDF] Abstract
37. MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views [PDF] Abstract
38. Off-the-shelf sensor vs. experimental radar -- How much resolution is necessary in automotive radar classification? [PDF] Abstract
39. Standardised convolutional filtering for radiomics [PDF] Abstract
40. Dialog Policy Learning for Joint Clarification and Active Learning Queries [PDF] Abstract
41. D-VPnet: A Network for Real-time Dominant Vanishing Point Detection in Natural Scenes [PDF] Abstract
42. SAL++: Sign Agnostic Learning with Derivatives [PDF] Abstract
43. Open-Narrow-Synechiae Anterior Chamber Angle Classification in AS-OCT Sequences [PDF] Abstract
44. A Hybrid Framework for Matching Printing Design Files to Product Photos [PDF] Abstract
45. MeshWalker: Deep Mesh Understanding by Random Walks [PDF] Abstract
46. Towards Good Practices for Data Augmentation in GAN Training [PDF] Abstract
47. mEBAL: A Multimodal Database for Eye Blink Detection and Attention Level Estimation [PDF] Abstract
48. ComboNet: Combined 2D & 3D Architecture for Aorta Segmentation [PDF] Abstract
49. Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps [PDF] Abstract
50. Neural Network Activation Quantization with Bitwise Information Bottlenecks [PDF] Abstract
51. Multi-spectral Facial Landmark Detection [PDF] Abstract
52. Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection [PDF] Abstract
53. A Note on Deepfake Detection with Low-Resources [PDF] Abstract
54. Breaking the Limits of Remote Sensing by Simulation and Deep Learning for Flood and Debris Flow Mapping [PDF] Abstract
55. Reconstruction and Quantification of 3D Iris Surface for Angle-Closure Glaucoma Detection in Anterior Segment OCT [PDF] Abstract
56. Physically constrained short-term vehicle trajectory forecasting with naive semantic maps [PDF] Abstract
57. Smooth Proxy-Anchor Loss for Noisy Metric Learning [PDF] Abstract
58. A Survey on Generative Adversarial Networks: Variants, Applications, and Training [PDF] Abstract
59. Over-crowdedness Alert! Forecasting the Future Crowd Distribution [PDF] Abstract
60. Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? [PDF] Abstract
61. GAP++: Learning to generate target-conditioned adversarial examples [PDF] Abstract
62. Towards an Intrinsic Definition of Robustness for a Classifier [PDF] Abstract
63. PNL: Efficient Long-Range Dependencies Extraction with Pyramid Non-Local Module for Action Recognition [PDF] Abstract
64. SEKD: Self-Evolving Keypoint Detection and Description [PDF] Abstract
65. Detection of Makeup Presentation Attacks based on Deep Face Representations [PDF] Abstract
66. Learning Shared Filter Bases for Efficient ConvNets [PDF] Abstract
67. Single Image Deraining via Scale-space Invariant Attention Neural Network [PDF] Abstract
68. Can Synthetic Data Improve Object Detection Results for Remote Sensing Images? [PDF] Abstract
69. RGB-D-E: Event Camera Calibration for Fast 6-DOF Object Tracking [PDF] Abstract
70. Rethinking Classification Loss Designs for Person Re-identification with a Unified View [PDF] Abstract
71. Pixel-Wise Motion Deblurring of Thermal Videos [PDF] Abstract
72. A Self-supervised Approach for Adversarial Robustness [PDF] Abstract
73. What Matters in Unsupervised Optical Flow [PDF] Abstract
74. Reposing Humans by Warping 3D Features [PDF] Abstract
75. Probabilistic Semantic Mapping for Urban Autonomous Driving Applications [PDF] Abstract
76. Skinning a Parameterization of Three-Dimensional Space for Neural Network Cloth [PDF] Abstract
77. DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation [PDF] Abstract
78. Novel Perception Algorithmic Framework For Object Identification and Tracking In Autonomous Navigation [PDF] Abstract
79. A systematic review on the role of artificial intelligence in sonographic diagnosis of thyroid cancer: Past, present and future [PDF] Abstract
80. To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs [PDF] Abstract
81. Applying Deep-Learning-Based Computer Vision to Wireless Communications: Methodologies, Opportunities, and Challenges [PDF] Abstract
82. Dual-level Semantic Transfer Deep Hashing for Efficient Social Image Retrieval [PDF] Abstract
83. Deep Learning-based Aerial Image Segmentation with Open Data for Disaster Impact Assessment [PDF] Abstract
84. Resolution-Enhanced MRI-Guided Navigation of Spinal Cellular Injection Robot [PDF] Abstract
85. Supervised Learning of Sparsity-Promoting Regularizers for Denoising [PDF] Abstract
86. A Deep Learning-Based Method for Automatic Segmentation of Proximal Femur from Quantitative Computed Tomography Images [PDF] Abstract
87. Can artificial intelligence (AI) be used to accurately detect tuberculosis (TB) from chest x-ray? A multiplatform evaluation of five AI products used for TB screening in a high TB-burden setting [PDF] Abstract
88. DcardNet: Diabetic Retinopathy Classification at Multiple Depths Based on Structural and Angiographic Optical Coherence Tomography [PDF] Abstract
89. Pruning neural networks without any data by iteratively conserving synaptic flow [PDF] Abstract
90. Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges [PDF] Abstract
91. Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image [PDF] Abstract
92. A t-distribution based operator for enhancing out of distribution robustness of neural network classifiers [PDF] Abstract
93. A Comparative Study on Early Detection of COVID-19 from Chest X-Ray Images [PDF] Abstract
94. UMLS-ChestNet: A deep convolutional neural network for radiological findings, differential diagnoses and localizations of COVID-19 in chest x-rays [PDF] Abstract
95. What takes the brain so long: Object recognition at the level of minimal images develops for up to seconds of presentation time [PDF] Abstract
96. A Review of Automatically Diagnosing COVID-19 based on Scanning Image [PDF] Abstract
97. Super-resolution Variational Auto-Encoders [PDF] Abstract
98. Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [PDF] Abstract
99. Orientation Attentive Robot Grasp Synthesis [PDF] Abstract
100. High Tissue Contrast MRI Synthesis Using Multi-Stage Attention-GAN for Glioma Segmentation [PDF] Abstract
101. Deep learning to estimate the physical proportion of infected region of lung for COVID-19 pneumonia with CT image set [PDF] Abstract
102. Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT [PDF] Abstract
103. Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation [PDF] Abstract
104. Big GANs Are Watching You: Towards Unsupervised Object Segmentation with Off-the-Shelf Generative Models [PDF] Abstract
105. KiU-Net: Towards Accurate Segmentation of Biomedical Images using Over-complete Representations [PDF] Abstract
106. Approximate Inverse Reinforcement Learning from Vision-based Imitation Learning [PDF] Abstract

Abstracts

1. MultiResolution Attention Extractor for Small Object Detection [PDF] Back to Contents
  Fan Zhang, Licheng Jiao, Lingling Li, Fang Liu, Xu Liu
Abstract: Small objects are difficult to detect because of their low resolution and small size. Existing small object detection methods mainly focus on data preprocessing or on narrowing the differences between large and small objects. Inspired by the human visual "attention" mechanism, we exploit two feature extraction methods to mine the most useful information of small objects. Both methods are based on multiresolution feature extraction. We initially design and explore the soft attention method, but we find that its convergence speed is slow. We then present the second method, an attention-based feature interaction method called the MultiResolution Attention Extractor (MRAE), which shows significant improvement as a generic feature extractor in small object detection. After each building block in the vanilla feature extractor, we append a small network to generate attention weights, followed by a weighted-sum operation to obtain the final attention maps. Our attention-based feature extractor achieves 2.0 times the AP of the "hard" attention counterpart (plain architecture) on the COCO small object detection benchmark, proving that MRAE can capture useful location and contextual information through adaptive learning.
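
As a concrete illustration of the weighted-sum step described above, the sketch below fuses feature maps from several resolutions with learned attention weights. It is a minimal PyTorch sketch under our own assumptions (module names, pooling-based scoring); it is not the authors' architecture.

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        """Toy attention-weighted fusion: a small network scores each
        resolution's feature map, and a weighted sum yields the final
        attention map. Illustrative only, not the MRAE architecture."""
        def __init__(self, channels, num_resolutions):
            super().__init__()
            self.score = nn.ModuleList(
                nn.Sequential(nn.AdaptiveAvgPool2d(1),
                              nn.Conv2d(channels, 1, kernel_size=1))
                for _ in range(num_resolutions))

        def forward(self, feats):  # feats: list of (N, C, H, W) tensors
            logits = torch.cat([s(f) for s, f in zip(self.score, feats)], dim=1)
            weights = torch.softmax(logits, dim=1)       # (N, R, 1, 1)
            stacked = torch.stack(feats, dim=1)          # (N, R, C, H, W)
            return (weights.unsqueeze(2) * stacked).sum(dim=1)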

2. Simple and effective localized attribute representations for zero-shot learning [PDF] Back to Contents
  Shiqi Yang, Kai Wang, Luis Herranz, Joost van de Weijer
Abstract: Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their semantic descriptions. Some recent papers have shown the importance of localized features together with fine-tuning the feature extractor to obtain discriminative and transferable features. However, these methods require complex attention or part detection modules to perform explicit localization in the visual space. In contrast, in this paper we propose localizing representations in the semantic/attribute space, with a simple but effective pipeline where localization is implicit. Focusing on attribute representations, we show that our method obtains state-of-the-art performance on the CUB and SUN datasets, and also achieves competitive results on the AWA2 dataset, outperforming generally more complex methods with explicit localization in the visual space. Our method can be implemented easily and can serve as a new baseline for zero-shot learning.

3. Dataset Condensation with Gradient Matching [PDF] Back to Contents
  Bo Zhao, Konda Reddy Mopuri, Hakan Bilen
Abstract: Efficient training of deep neural networks is an increasingly important problem in the era of sophisticated architectures and large-scale datasets. This paper proposes a training set synthesis technique, called Dataset Condensation, that learns to produce a small set of informative samples for training deep neural networks from scratch in a small fraction of the required computational cost on the original data while achieving comparable results. We rigorously evaluate its performance in several computer vision benchmarks and show that it significantly outperforms the state-of-the-art methods. Finally we show promising applications of our method in continual learning and domain adaptation.
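
The gradient-matching objective can be sketched in a few lines: the synthetic samples are optimized so that the gradients they induce in the network track those induced by real data. The squared-distance form and helper below are our simplification; the paper's exact distance measure and training loop may differ.

    import torch

    def gradient_matching_loss(model, loss_fn, real_batch, syn_batch):
        """Distance between gradients induced by real and synthetic data.
        Minimizing this w.r.t. the synthetic images condenses the dataset."""
        xr, yr = real_batch
        xs, ys = syn_batch
        params = tuple(model.parameters())
        g_real = torch.autograd.grad(loss_fn(model(xr), yr), params)
        # create_graph=True keeps the graph so the loss can backprop to xs
        g_syn = torch.autograd.grad(loss_fn(model(xs), ys), params,
                                    create_graph=True)
        return sum(((gr.detach() - gs) ** 2).sum()
                   for gr, gs in zip(g_real, g_syn))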

4. Recent Advances in 3D Object and Hand Pose Estimation [PDF] Back to Contents
  Vincent Lepetit
Abstract: 3D object and hand pose estimation have huge potential for Augmented Reality, enabling tangible interfaces, natural interfaces, and blurring the boundaries between the real and virtual worlds. In this chapter, we present recent developments in 3D object and hand pose estimation using cameras, and discuss their abilities and limitations as well as the possible future development of the field.

5. Separable Four Points Fundamental Matrix [PDF] Back to Contents
  Gil Ben-Artzi
Abstract: We present an approach for the computation of the fundamental matrix based on epipolar homography decomposition. We analyze the geometrical meaning of the decomposition-based representation and show that it guarantees a minimal number of RANSAC samples, on the condition that four correspondences are on an image line. Experiments on real-world image pairs show that our approach successfully recovers such four correspondences, provides accurate results and requires a very small number of RANSAC iterations.

6. Deep Learning with Attention Mechanism for Predicting Driver Intention at Intersection [PDF] Back to Contents
  Abenezer Girma, Seifemichael Amsalu, Abrham Workineh, Mubbashar Khan, Abdollah Homaifar
Abstract: In this paper, a method for predicting a driver's intention near a road intersection is proposed. Our approach uses a deep bidirectional Long Short-Term Memory (LSTM) network with an attention mechanism, built on a hybrid-state system (HSS) framework. As intersections are considered one of the major sources of road accidents, predicting a driver's intention at an intersection is crucial. Our method uses sequence-to-sequence modeling with an attention mechanism to effectively exploit temporal information from time-series vehicular data, including velocity and yaw-rate. The model then predicts ahead of time whether the target vehicle/driver will go straight, stop, or take a right or left turn. The performance of the proposed approach is evaluated on a naturalistic driving dataset, and the results show that our method achieves high accuracy and outperforms other methods. The proposed solution is promising for application in advanced driver assistance systems (ADAS) and as part of the active safety system of autonomous vehicles.
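
A minimal sketch of the kind of model the abstract describes: a bidirectional LSTM over time-series vehicle data (velocity and yaw-rate) with soft attention pooling over time, classifying the four intentions. All sizes and layer choices are our guesses, not the paper's configuration.

    import torch
    import torch.nn as nn

    class IntentionLSTM(nn.Module):
        """Bidirectional LSTM + attention over (velocity, yaw-rate) series,
        predicting {straight, stop, left turn, right turn}."""
        def __init__(self, hidden=64, classes=4):
            super().__init__()
            self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                                batch_first=True, bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)
            self.head = nn.Linear(2 * hidden, classes)

        def forward(self, x):                       # x: (batch, time, 2)
            h, _ = self.lstm(x)                     # (batch, time, 2*hidden)
            a = torch.softmax(self.attn(h), dim=1)  # weights over time steps
            return self.head((a * h).sum(dim=1))    # class logits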

7. DisCont: Self-Supervised Visual Attribute Disentanglement using Context Vectors [PDF] Back to Contents
  Sarthak Bhagat, Vishaal Udandarao, Shagun Uppal
Abstract: Disentangling the underlying feature attributes within an image with no prior supervision is a challenging task. Models that can disentangle attributes well provide greater interpretability and control. In this paper, we propose a self-supervised framework DisCont to disentangle multiple attributes by exploiting the structural inductive biases within images. Motivated by the recent surge in contrastive learning paradigms, our model bridges the gap between self-supervised contrastive learning algorithms and unsupervised disentanglement. We evaluate the efficacy of our approach, both qualitatively and quantitatively, on four benchmark datasets.

8. Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging [PDF] Back to Contents
  Yeqi Bai, Tao Ma, Lipo Wang, Zhenjie Zhang
Abstract: While deep learning technologies are now capable of generating realistic images that confuse humans, research efforts are turning to the synthesis of images for more concrete and application-specific purposes. Facial image generation based on vocal characteristics from speech is one such important yet challenging task. It is the key enabler of influential use cases of image generation, especially for business in public security and entertainment. Existing solutions to the speech2face problem render limited image quality and fail to preserve facial similarity, due to the lack of quality datasets for training and of appropriate integration of vocal features. In this paper, we investigate these key technical challenges and propose Speech Fusion to Face, or SF2F in short, attempting to address the issues of facial image quality and the poor connection between the vocal feature domain and modern image generation models. By adopting new strategies for the data model and training, we demonstrate a dramatic performance boost over the state-of-the-art solution, doubling the recall of individual identity and lifting the quality score from 15 to 19 based on the mutual information score with a VGGFace classifier.

9. WasteNet: Waste Classification at the Edge for Smart Bins [PDF] Back to Contents
  Gary White, Christian Cabrera, Andrei Palade, Fan Li, Siobhan Clarke
Abstract: Smart Bins have become popular in smart cities and campuses around the world. These bins have a compaction mechanism that increases the bins' capacity, as well as automated real-time collection notifications. In this paper, we propose WasteNet, a waste classification model based on convolutional neural networks that can be deployed on a low-power device at the edge of the network, such as a Jetson Nano. The problem of segregating waste is a big challenge for many countries around the world. Automated waste classification at the edge allows for fast, intelligent decisions in smart bins without needing access to the cloud. Waste is classified into six categories: paper, cardboard, glass, metal, plastic and other. Our model achieves a 97% prediction accuracy on the test dataset. This level of classification accuracy will help to alleviate some common smart bin problems, such as recycling contamination, where different types of waste become mixed with recycling waste, causing the bin to be contaminated. It also makes the bins more user-friendly, as citizens do not have to worry about disposing of their rubbish in the correct bin; the smart bin will be able to make the decision for them.

10. Geometry-Aware Segmentation of Remote Sensing Images via implicit height estimation [PDF] Back to Contents
  Xiang Li, Yi Fang
Abstract: Convolutional neural networks have made significant breakthroughs in the field of remote sensing and greatly advanced the performance of the semantic segmentation of remote sensing images. Recent studies have shown the benefits of using additional elevation data (e.g., DSM) for enhancing the performance of the semantic segmentation of aerial images. However, previous methods mostly adopt 3D elevation information as additional inputs. While in many real-world applications, one does not have the corresponding DSM information at hand and the spatial resolution of acquired DSM images usually do not match the aerial images. To alleviate this data constraint and also take the advantage of 3D elevation information, in this paper, we propose a geometry-aware segmentation model that achieves accurate semantic segmentation of aerial images via implicit height estimation. Instead of using a single-stream encoder-decoder network for semantic labeling, we design a separate decoder branch to predict the height map and use the DSM images as side supervision to train this newly designed decoder branch. With the newly designed decoder branch, our model can distill the 3D geometric features from 2D appearance features under the supervision of ground truth DSM images. Moreover, we develop a new geometry-aware convolution module that fuses the 3D geometric features from the height decoder branch and the 2D contextual features from the semantic segmentation branch. The fused feature embeddings can produce geometry-aware segmentation maps with enhanced performance. Experiments on ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method for the semantic segmentation of aerial images. Our proposed model achieves remarkable performance on both datasets without using any hand-crafted features or post-processing.
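
The two-branch design reads naturally as a shared encoder with a segmentation head and a height head, the latter supervised by DSM images only at training time. The sketch below is a deliberately minimal stand-in (a single conv encoder) to show the structure, not the paper's network.

    import torch.nn as nn

    class GeometryAwareNet(nn.Module):
        """Shared encoder with a semantic head and an implicit-height head.
        Placeholder layers; only the two-decoder structure matters here."""
        def __init__(self, num_classes):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
            self.seg_head = nn.Conv2d(64, num_classes, 1)  # semantic logits
            self.height_head = nn.Conv2d(64, 1, 1)         # height map

        def forward(self, x):
            f = self.encoder(x)
            return self.seg_head(f), self.height_head(f)

Training would combine a cross-entropy loss on the segmentation logits with a regression loss (e.g. L1) between the height head and the DSM; at test time only the segmentation output is needed, so no DSM is required.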

11. Searching Learning Strategy with Reinforcement Learning for 3D Medical Image Segmentation [PDF] Back to Contents
  Dong Yang, Holger Roth, Ziyue Xu, Fausto Milletari, Ling Zhang, Daguang Xu
Abstract: Deep neural network (DNN) based approaches have been widely investigated and deployed in medical image analysis. For example, fully convolutional neural networks (FCN) achieve the state-of-the-art performance in several applications of 2D/3D medical image segmentation. Even the baseline neural network models (U-Net, V-Net, etc.) have been proven to be very effective and efficient when the training process is set up properly. Nevertheless, to fully exploit the potentials of neural networks, we propose an automated searching approach for the optimal training strategy with reinforcement learning. The proposed approach can be utilized for tuning hyper-parameters, and selecting necessary data augmentation with certain probabilities. The proposed approach is validated on several tasks of 3D medical image segmentation. The performance of the baseline model is boosted after searching, and it can achieve comparable accuracy to other manually-tuned state-of-the-art segmentation approaches.

12. Embedding Task Knowledge into 3D Neural Networks via Self-supervised Learning [PDF] Back to Contents
  Jiuwen Zhu, Yuexiang Li, Yifan Hu, S. Kevin Zhou
Abstract: Deep learning highly relies on the amount of annotated data. However, annotating medical images is extremely laborious and expensive. To this end, self-supervised learning (SSL), as a potential solution for deficient annotated data, attracts increasing attention from the community. However, SSL approaches often design a proxy task that is not necessarily related to the target task. In this paper, we propose a novel SSL approach for 3D medical image classification, namely Task-related Contrastive Prediction Coding (TCPC), which embeds task knowledge into the training of 3D neural networks. The proposed TCPC first locates initial candidate lesions via supervoxel estimation using simple linear iterative clustering. Then, we extract features from the sub-volumes cropped around potential lesion areas and construct a calibrated contrastive predictive coding scheme for self-supervised learning. Extensive experiments are conducted on public and private datasets. The experimental results demonstrate the effectiveness of embedding lesion-related prior knowledge into neural networks for 3D medical image classification.

13. Image Enhancement and Object Recognition for Night Vision Surveillance [PDF] Back to Contents
  Aashish Bhandari, Aayush Kafle, Pranjal Dhakal, Prateek Raj Joshi, Dinesh Baniya Kshatri
Abstract: Object recognition is a critical part of any surveillance system. It is a matter of utmost concern to identify intruders and foreign objects in the area under surveillance. The performance of a surveillance system using a traditional camera is vastly superior in daylight compared to night. The main problem for surveillance during the night is that the objects captured by traditional cameras have low contrast against the background, because of the absence of ambient light in the visible spectrum. For this reason, the image is taken in low-light conditions using an infrared camera and is enhanced to obtain an image with higher contrast using different enhancement algorithms based on the spatial domain. The enhanced image is then sent to the classification process. The classification is done using a convolutional neural network followed by a fully connected layer of neurons. The accuracies of classification after implementing the different enhancement algorithms are compared in this paper.

14. 3D Human Mesh Regression with Dense Correspondence [PDF] Back to Contents
  Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang
Abstract: Estimating a 3D mesh of the human body from a single 2D image is an important task with many applications, such as augmented reality and human-robot interaction. However, prior works reconstruct the 3D mesh from global image features extracted by a convolutional neural network (CNN), in which the dense correspondences between the mesh surface and the image pixels are missing, leading to suboptimal solutions. This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes dense correspondence between the mesh and the local image features in the UV space (i.e. a 2D space used for texture mapping of the 3D mesh). DecoMR first predicts a pixel-to-surface dense correspondence map (i.e., an IUV image), with which we transfer local features from the image space to the UV space. The transferred local image features are then processed in the UV space to regress a location map, which is well aligned with the transferred features. Finally, we reconstruct the 3D human mesh from the regressed location map with a predefined mapping function. We also observe that the existing discontinuous UV map is unfriendly to the learning of the network. Therefore, we propose a novel UV map that maintains most of the neighboring relations on the original mesh surface. Experiments demonstrate that our proposed local feature alignment and continuous UV map outperform existing 3D-mesh-based methods on multiple public benchmarks. Code will be made available at this https URL

15. Object Detection in the DCT Domain: is Luminance the Solution? [PDF] Back to Contents
  Benjamin Deguerre, Clement Chatelain, Gilles Gasso
Abstract: Object detection in images has reached unprecedented performance. The state-of-the-art methods rely on deep architectures that extract salient features and predict bounding boxes enclosing the objects of interest. These methods essentially run on RGB images. However, RGB images are often compressed by the acquisition devices for storage purposes and transfer efficiency. Hence, they must be decompressed for object detectors. To gain in efficiency, this paper proposes to take advantage of the compressed representation of images to carry out object detection usable under constrained-resource conditions. Specifically, we focus on JPEG images and propose a thorough analysis of detection architectures newly designed in regard to the peculiarities of the JPEG norm. This leads to a $1.7\times$ speed-up in comparison with a standard RGB-based architecture, while only reducing the detection performance by 5.5%. Additionally, our empirical findings demonstrate that only part of the compressed JPEG information, namely the luminance component, may be required to match the detection accuracy of the full-input methods.

16. Diagnosing Rarity in Human-Object Interaction Detection [PDF] Back to Contents
  Mert Kilickaya, Arnold Smeulders
Abstract: Human-object interaction (HOI) detection is a core task in computer vision. The goal is to localize all human-object pairs and recognize their interactions. An interaction defined by a <verb, noun> tuple leads to a long-tailed visual recognition challenge, since many combinations are rarely represented. The performance of the proposed models is limited, especially for the tail categories, but little has been done to understand the reason. To that end, in this paper, we propose to diagnose rarity in HOI detection. We propose a three-step strategy, namely Detection, Identification and Recognition, where we carefully analyse the limiting factors by studying state-of-the-art models. Our findings indicate that the detection and identification steps are altered by interaction signals such as occlusion and relative location, as a result limiting recognition accuracy.

17. Estimating semantic structure for the VQA answer space [PDF] Back to Contents
  Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf
Abstract: Since its appearance, Visual Question Answering (VQA, i.e. answering a question posed over an image), has always been treated as a classification problem over a set of predefined answers. Despite its convenience, this classification approach poorly reflects the semantics of the problem limiting the answering to a choice between independent proposals, without taking into account the similarity between them (e.g. equally penalizing for answering cat or German shepherd instead of dog). We address this issue by proposing (1) two measures of proximity between VQA classes, and (2) a corresponding loss which takes into account the estimated proximity. This significantly improves the generalization of VQA models by reducing their language bias. In particular, we show that our approach is completely model-agnostic since it allows consistent improvements with three different VQA models. Finally, by combining our method with a language bias reduction approach, we report SOTA-level performance on the challenging VQAv2-CP dataset.
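
One simple way to "take into account the similarity" between answers, as the abstract puts it, is a loss weighted by answer proximity: wrong but semantically close answers incur a smaller penalty. The sketch below assumes a precomputed similarity matrix (e.g. from answer embeddings); it is our illustration, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def proximity_aware_loss(logits, target, similarity):
        """logits: (batch, answers); target: (batch,) answer indices;
        similarity: (answers, answers) tensor with values in [0, 1].
        Penalizes predicted mass in proportion to its distance from
        the ground-truth answer."""
        probs = F.softmax(logits, dim=-1)
        dissim = 1.0 - similarity[target]   # (batch, answers)
        return (probs * dissim).sum(dim=-1).mean()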

18. Real-time single image depth perception in the wild with handheld devices [PDF] Back to Contents
  Filippo Aleotti, Giulio Zaccaroni, Luca Bartolomei, Matteo Poggi, Fabio Tosi, Stefano Mattoccia
Abstract: Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image represents the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit its practical deployment: i) low reliability when deployed in the wild, and ii) demanding resource requirements to achieve real-time performance, often not compatible with such devices. Therefore, in this paper, we deeply investigate these issues, showing how they are both addressable by adopting appropriate network design and training strategies -- also outlining how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time depth-aware augmented reality and image blurring with smartphones in the wild.

19. Unique Faces Recognition in Videos [PDF] Back to Contents
  Jiahao Huo, Terence L van Zyl
Abstract: This paper tackles face recognition in videos employing metric learning methods and similarity ranking models. The paper compares the use of a Siamese network with contrastive loss and a Triplet network with triplet loss, implementing the following architectures: the Google/Inception architecture, a 3D Convolutional Network (C3D), and a 2-D Long Short-Term Memory (LSTM) Recurrent Neural Network. We make use of still images and sequences from videos for training the networks, and compare the performances of the above architectures. The dataset used was the YouTube Face Database, designed for investigating the problem of face recognition in videos. The contribution of this paper is two-fold. First, the experiments establish that 3-D convolutional networks and 2-D LSTMs with contrastive loss on image sequences do not outperform the Google/Inception architecture with contrastive loss in top-$n$ rank face retrievals with still images; however, the 3-D convolutional networks and the 2-D LSTM with triplet loss outperform the Google/Inception architecture with triplet loss in top-$n$ rank face retrievals on the dataset. Second, a Support Vector Machine (SVM) was used in conjunction with the CNNs' learned feature representations for facial identification. The results show that the feature representation learned with triplet loss is significantly better for n-shot facial identification than that learned with contrastive loss. The most useful feature representations for facial identification are from the 2-D LSTM with triplet loss. The experiments show that learning spatio-temporal features from video sequences is beneficial for facial recognition in videos.
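
For reference, the triplet loss the paper trains with has a standard form: pull an anchor face toward a same-identity example and push it away from a different identity by a margin. A minimal version (the margin value is our choice):

    import torch
    import torch.nn.functional as F

    def triplet_loss(anchor, positive, negative, margin=0.2):
        """Standard triplet loss on embedding batches of shape (N, D)."""
        d_pos = F.pairwise_distance(anchor, positive)
        d_neg = F.pairwise_distance(anchor, negative)
        return torch.clamp(d_pos - d_neg + margin, min=0).mean()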

20. Delta Descriptors: Change-Based Place Representation for Robust Visual Localization [PDF] Back to Contents
  Sourav Garg, Ben Harwood, Gaurangi Anand, Michael Milford
Abstract: Visual place recognition is challenging because there are so many factors that can cause the appearance of a place to change, from day-night cycles to seasonal change to atmospheric conditions. In recent years a large range of approaches have been developed to address this challenge, including deep-learnt image descriptors, domain translation, and sequential filtering, all with shortcomings including generality and velocity-sensitivity. In this paper we propose a novel descriptor derived from tracking changes in any learned global descriptor over time, dubbed Delta Descriptors. Delta Descriptors mitigate the offsets induced in the original descriptor matching space in an unsupervised manner by considering temporal differences across places observed along a route. Like all other approaches, Delta Descriptors have a shortcoming: volatility on a frame-to-frame basis, which can be overcome by combining them with sequential filtering methods. Using two benchmark datasets, we first demonstrate the high performance of Delta Descriptors in isolation, before showing new state-of-the-art performance when combined with sequence-based matching. We also present results demonstrating the approach working with a second, different underlying descriptor type, and two other beneficial properties of Delta Descriptors in comparison to existing techniques: their increased inherent robustness to variations in camera motion, and a reduced rate of performance degradation as dimensional reduction is applied. Source code will be released upon publication.
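
The change-based idea is easy to state in code: smooth the sequence of global descriptors along the route, take differences across a fixed window, and re-normalize. The window size and smoothing below are our assumptions, not the paper's settings.

    import numpy as np

    def delta_descriptors(descriptors, window=5):
        """descriptors: (num_frames, dim) global descriptors along a route.
        Returns L2-normalized differences of smoothed descriptors."""
        d = np.asarray(descriptors, dtype=np.float64)
        kernel = np.ones(window) / window
        smoothed = np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="same"), 0, d)
        delta = smoothed[window:] - smoothed[:-window]
        return delta / (np.linalg.norm(delta, axis=1, keepdims=True) + 1e-12)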

21. Rendering Natural Camera Bokeh Effect with Deep Learning [PDF] Back to Contents
  Andrey Ignatov, Jagruti Patel, Radu Timofte
Abstract: Bokeh is an important artistic effect used to highlight the main object of interest on the photo by blurring all out-of-focus areas. While DSLR and system camera lenses can render this effect naturally, mobile cameras are unable to produce shallow depth-of-field photos due to a very small aperture diameter of their optics. Unlike the current solutions simulating bokeh by applying Gaussian blur to image background, in this paper we propose to learn a realistic shallow focus technique directly from the photos produced by DSLR cameras. For this, we present a large-scale bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR with 50mm f/1.8 lenses. We use these images to train a deep learning model to reproduce a natural bokeh effect based on a single narrow-aperture image. The experimental results show that the proposed approach is able to render a plausible non-uniform bokeh even in case of complex input data with multiple objects. The dataset, pre-trained models and codes used in this paper are available on the project website.
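
For contrast, the "current solution" the abstract mentions, Gaussian-blurring the image background, fits in a few lines with OpenCV; the learned approach replaces exactly this step. The focus_mask is assumed to come from a subject segmentation step.

    import cv2
    import numpy as np

    def naive_bokeh(image, focus_mask, ksize=31):
        """Blur everything outside the subject. focus_mask: float matte in
        [0, 1], 1 = in-focus subject. This is the baseline, not the paper's
        learned rendering."""
        blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)
        mask = focus_mask[..., None].astype(np.float32)
        return (mask * image + (1.0 - mask) * blurred).astype(image.dtype)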

22. TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model [PDF] Back to Contents
  Bo Pang, Yizhuo Li, Yifan Zhang, Muchen Li, Cewu Lu
Abstract: Multi-object tracking is a fundamental vision problem that has been studied for a long time. As deep learning brings excellent performance to object detection algorithms, Tracking by Detection (TBD) has become the mainstream tracking framework. Despite the success of TBD, this two-step method is too complicated to train in an end-to-end manner and induces many challenges as well, such as insufficient exploration of video spatial-temporal information, vulnerability when facing object occlusion, and excessive reliance on detection results. To address these challenges, we propose a concise end-to-end model, TubeTK, which only needs one-step training, by introducing the "bounding-tube" to indicate temporal-spatial locations of objects in a short video clip. TubeTK provides a novel direction for multi-object tracking, and we demonstrate its potential to solve the above challenges without bells and whistles. We analyze the performance of TubeTK on several MOT benchmarks and provide empirical evidence to show that TubeTK has the ability to overcome occlusions to some extent without any ancillary technologies like Re-ID. Compared with other methods that adopt private detection results, our one-stage end-to-end model achieves state-of-the-art performance even though it adopts no ready-made detection results. We hope that the proposed TubeTK model can serve as a simple but strong alternative for the video-based MOT task. The code and models are available at this https URL.

23. H3DNet: 3D Object Detection Using Hybrid Geometric Primitives [PDF] Back to Contents
  Zaiwei Zhang, Bo Sun, Haitao Yang, Qixing Huang
Abstract: We introduce H3DNet, which takes a colorless 3D point cloud as input and outputs a collection of oriented object bounding boxes (or BBs) and their semantic labels. The critical idea of H3DNet is to predict a hybrid set of geometric primitives, i.e., BB centers, BB face centers, and BB edge centers. We show how to convert the predicted geometric primitives into object proposals by defining a distance function between an object and the geometric primitives. This distance function enables continuous optimization of object proposals, and its local minima provide high-fidelity object proposals. H3DNet then utilizes a matching and refinement module to classify object proposals into detected objects and fine-tune the geometric parameters of the detected objects. The hybrid set of geometric primitives not only provides more accurate signals for object detection than a single type of geometric primitive, but also provides an overcomplete set of constraints on the resulting 3D layout. Therefore, H3DNet can tolerate outliers in the predicted geometric primitives. Our model achieves state-of-the-art 3D detection results on two large datasets with real 3D scans, ScanNet and SUN RGB-D.

24. IMUTube: Automatic extraction of virtual on-body accelerometry from video for human activity recognition [PDF] Back to Contents
  Hyeokhyen Kwon, Catherine Tong, Harish Haresamudram, Yan Gao, Gregory D. Abowd, Nicholas D. Lane, Thomas Ploetz
Abstract: The lack of large-scale, labeled data sets impedes progress in developing robust and generalized predictive models for on-body sensor-based human activity recognition (HAR). Labeled data in human activity recognition is scarce and hard to come by, as sensor data collection is expensive, and the annotation is time-consuming and error-prone. To address this problem, we introduce IMUTube, an automated processing pipeline that integrates existing computer vision and signal processing techniques to convert videos of human activity into virtual streams of IMU data. These virtual IMU streams represent accelerometry at a wide variety of locations on the human body. We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets. Our initial results are very promising, but the greater promise of this work lies in a collective approach by the computer vision, signal processing, and activity recognition communities to extend this work in ways that we outline. This should lead to on-body, sensor-based HAR becoming yet another success story in large-dataset breakthroughs in recognition.

25. 3D geometric moment invariants from the point of view of the classical invariant theory [PDF] Back to Contents
  Leonid Bedratyuk
Abstract: The aim of this paper is to clarify the connection between 3D geometric moment invariants and invariant theory, by considering the description of 3D geometric moment invariants as a problem of classical invariant theory. Using the remarkable fact that the groups $SO(3)$ and $SL(2)$ are locally isomorphic, we reduce the problem of deriving 3D geometric moment invariants to a well-known problem of classical invariant theory. We give a precise statement of the computation of 3D geometric moment invariants, introduce the notion of algebras of simultaneous 3D geometric moment invariants, and prove that they are isomorphic to the algebras of joint $SL(2)$-invariants of several binary forms. To simplify the calculation of the invariants, we pass from an action of the Lie group $SO(3)$ to an action of its Lie algebra $\mathfrak{sl}_2$. The author hopes that the results will be useful to researchers in the fields of image analysis and pattern recognition.
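
For readers outside the area, the objects involved are the standard 3D geometric moments: for a density $\rho$, the moment of order $p+q+r$ is

    $m_{pqr} = \iiint x^p y^q z^r \, \rho(x,y,z) \, dx \, dy \, dz,$

and a moment invariant is a polynomial in the $m_{pqr}$ whose value is unchanged under rotations of the coordinate frame by elements of $SO(3)$. (This is the textbook definition, not the paper's derivation.)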

26. Agrupamento de Pixels para o Reconhecimento de Faces [PDF] Back to Contents
  Tiago Buarque Assunção de Carvalho
Abstract: This research starts with the observation that face recognition suffers only a low impact from significant image shrinkage. To explain this fact, we propose the Pixel Clustering methodology. It defines regions in the image whose pixels are very similar to each other. We extract features from each region. We used three face databases in the experiments. We noticed that 512 is the maximum number of features needed for high-accuracy image recognition. The proposed method is also robust, even if it uses only a few classes from the training set.
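
A minimal sketch of the Pixel Clustering idea as the abstract states it: group similar pixels into regions, then describe the image by per-region statistics. Clustering on intensity plus position with k-means, and using region means as features, are our assumptions, not necessarily the paper's choices.

    import numpy as np
    from sklearn.cluster import KMeans

    def region_features(gray_image, n_regions=32):
        """gray_image: (H, W) array. Returns one mean-intensity feature per
        pixel cluster; a real system could add more per-region statistics."""
        h, w = gray_image.shape
        ys, xs = np.mgrid[0:h, 0:w]
        feats = np.stack([gray_image.ravel(), ys.ravel(), xs.ravel()],
                         axis=1).astype(np.float64)
        labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(feats)
        return np.array([feats[labels == k, 0].mean()
                         for k in range(n_regions)])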

27. Scalable Backdoor Detection in Neural Networks [PDF] Back to Contents
  Haripriya Harikumar, Vuong Le, Santu Rana, Sourangshu Bhattacharya, Sunil Gupta, Svetha Venkatesh
Abstract: Recently, it has been shown that deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. Current backdoor detection methods fail to achieve good detection performance and are computationally expensive. In this paper, we propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and which is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.

28. A survey on deep hashing for image retrieval [PDF] Back to Contents
  Xiaopeng Zhang
Abstract: Hashing has been widely used in approximate nearest neighbor search for large-scale database retrieval, owing to its computational and storage efficiency. Deep hashing, which devises convolutional neural network architectures to exploit and extract the semantic information or features of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated, and I conclude that there are three main directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of the existing hashing methods, I propose a Shadow Recurrent Hashing (SRH) method as an attempt. Specifically, I devise a CNN architecture to extract the semantic features of images, and design a loss function to encourage similar images to be projected close to each other. To this end, I propose a concept: the shadow of the CNN output. During the optimization process, the CNN output and its shadow guide each other so as to achieve the optimal solution as much as possible. Several experiments on the CIFAR-10 dataset show the satisfying performance of SRH.
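
The efficiency argument in the first sentence comes down to this: once images are binary codes, retrieval is a Hamming-distance ranking. A minimal sketch (codes kept as 0/1 arrays; a real system would pack them into bits):

    import numpy as np

    def hamming_rank(query_code, db_codes):
        """query_code: (bits,) 0/1 array; db_codes: (n, bits) 0/1 array.
        Returns database indices sorted by Hamming distance, nearest first."""
        dists = np.count_nonzero(db_codes != query_code, axis=1)
        return np.argsort(dists)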

29. Deep Learning for Change Detection in Remote Sensing Images: Comprehensive Review and Meta-Analysis [PDF] Back to Contents
  Lazhar Khelifi, Max Mignotte
Abstract: Deep learning (DL) algorithms have come to be considered the methodology of choice for remote-sensing image analysis over the past few years. Owing to its effective applications, deep learning has also been introduced for automatic change detection, where it has achieved great success. The present study attempts to provide a comprehensive review and a meta-analysis of the recent progress in this subfield. Specifically, we first introduce the fundamentals of the deep learning methods that are frequently adopted for change detection. Secondly, we present the details of the meta-analysis conducted to examine the status of change detection DL studies. Then, we focus on deep learning-based change detection methodologies for remote sensing images by giving a general overview of the existing methods. Specifically, these deep learning-based methods are classified into three groups: fully supervised learning-based methods, fully unsupervised learning-based methods, and transfer learning-based techniques. As a result of these investigations, promising new directions were identified for future research. This study will contribute in several ways to our understanding of deep learning for change detection and will provide a basis for further research.

30. Condensing Two-stage Detection with Automatic Object Key Part Discovery [PDF] Back to Contents
  Zhe Chen, Jing Zhang, Dacheng Tao
Abstract: Modern two-stage object detectors generally require excessively large models for their detection heads to achieve high accuracy. To address this problem, we propose that the model parameters of two-stage detection heads can be condensed and reduced by concentrating on object key parts. To this end, we first introduce an automatic object key part discovery task to make neural networks discover representative sub-parts in each foreground object. With these discovered key parts, we then decompose the object appearance modeling into a key part modeling process and a global modeling process for detection. Key part modeling encodes fine and detailed features from the discovered key parts, and global modeling encodes rough and holistic object characteristics. In practice, such decomposition allows us to significantly abridge model parameters without sacrificing much detection accuracy. Experiments on popular datasets illustrate that our proposed technique consistently maintains the original performance while discarding around 50% of the model parameters of common two-stage detection heads, with the performance deteriorating by only 1.5% when discarding around 96% of the original model parameters. The code will shortly be released to the public through GitHub.

31. CNN-Based Semantic Change Detection in Satellite Imagery [PDF] Back to Contents
  Ananya Gupta, Elisabeth Welburn, Simon Watson, Hujun Yin
Abstract: Timely disaster risk management requires accurate road maps and prompt damage assessment. Currently, this is done by volunteers manually marking satellite imagery of affected areas but this process is slow and often error-prone. Segmentation algorithms can be applied to satellite images to detect road networks. However, existing methods are unsuitable for disaster-struck areas as they make assumptions about the road network topology which may no longer be valid in these scenarios. Herein, we propose a CNN-based framework for identifying accessible roads in post-disaster imagery by detecting changes from pre-disaster imagery. Graph theory is combined with the CNN output for detecting semantic changes in road networks with OpenStreetMap data. Our results are validated with data of a tsunami-affected region in Palu, Indonesia acquired from DigitalGlobe.

32. Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [PDF] 返回目录
  Rajeev Yasarla, Vishwanath A. Sindagi, Vishal M. Patel
Abstract: Recent CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality. However, these methods are limited in the sense that they can be trained only on fully labeled data. Due to various challenges in obtaining real world fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data and hence, generalize poorly to real-world images. The use of real-world data in training image deraining networks is relatively less explored in the literature. We propose a Gaussian Process-based semi-supervised learning framework which enables the network in learning to derain using synthetic dataset while generalizing better using unlabeled real-world images. Through extensive experiments and ablations on several challenging datasets (such as Rain800, Rain200H and DDN-SIRR), we show that the proposed method, when trained on limited labeled data, achieves on-par performance with fully-labeled training. Additionally, we demonstrate that using unlabeled real-world images in the proposed GP-based framework results in superior performance as compared to existing methods. Code is available at: this https URL
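
To see how a Gaussian Process can supply supervision for unlabeled data, here is a toy GP regression in a latent feature space (shapes, kernel and hyperparameters are assumptions, not the paper's exact formulation): labeled vectors yield posterior means that act as pseudo-labels for unlabeled ones, with posterior variances indicating how much to trust them.

```python
# Toy GP pseudo-labeling with an RBF kernel; all sizes are placeholders.
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

rng = np.random.default_rng(0)
Z_lab = rng.normal(size=(20, 8))                    # latents of labeled synthetic data
y_lab = Z_lab[:, 0] + 0.1 * rng.normal(size=20)     # stand-in for labeled targets
Z_unl = rng.normal(size=(5, 8))                     # latents of unlabeled real images

K = rbf(Z_lab, Z_lab) + 1e-2 * np.eye(20)           # kernel matrix with noise term
K_star = rbf(Z_unl, Z_lab)
pseudo_y = K_star @ np.linalg.solve(K, y_lab)       # GP posterior mean
var = rbf(Z_unl, Z_unl).diagonal() - np.einsum(
    "ij,ji->i", K_star, np.linalg.solve(K, K_star.T))  # posterior variance
print(pseudo_y.round(2), var.round(3))              # low variance -> trust pseudo-label more
```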

33. A gaze driven fast-forward method for first-person videos [PDF] 返回目录
  Alan Carvalho Neves, Michel Melo Silva, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento
Abstract: The growing data sharing and life-logging cultures are driving an unprecedented increase in the amount of unedited First-Person Videos. In this paper, we address the problem of accessing relevant information in First-Person Videos by creating an accelerated version of the input video and emphasizing the important moments to the recorder. Our method is based on an attention model driven by gaze and visual scene analysis that provides a semantic score of each frame of the input video. We performed several experimental evaluations on publicly available First-Person Videos datasets. The results show that our methodology can fast-forward videos emphasizing moments when the recorder visually interact with scene components while not including monotonous clips.
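
A hedged sketch of what score-driven fast-forwarding can look like (the mapping from score to frame skip is an assumption; in the paper the per-frame scores come from gaze and visual scene analysis):

```python
# Toy adaptive fast-forward: higher semantic scores yield smaller frame skips.
import numpy as np

def fast_forward(scores, min_skip=1, max_skip=8):
    """scores: per-frame relevance in [0, 1]; returns indices of kept frames."""
    kept, i = [], 0
    while i < len(scores):
        kept.append(i)
        skip = int(round(max_skip - (max_skip - min_skip) * scores[i]))
        i += max(skip, min_skip)
    return kept

scores = np.concatenate([np.full(40, 0.05), np.full(20, 0.9), np.full(40, 0.05)])
print(fast_forward(scores))  # monotonous head/tail sampled sparsely, interaction densely
```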

34. Tree Annotations in LiDAR Data Using Point Densities and Convolutional Neural Networks [PDF] 返回目录
  Ananya Gupta, Jonathan Byrne, David Moloney, Simon Watson, Hujun Yin
Abstract: LiDAR provides highly accurate 3D point clouds. However, data needs to be manually labelled in order to provide subsequent useful information. Manual annotation of such data is time consuming, tedious and error prone, and hence in this paper we present three automatic methods for annotating trees in LiDAR data. The first method requires high density point clouds and uses certain LiDAR data attributes for the purpose of tree identification, achieving almost 90% accuracy. The second method uses a voxel-based 3D Convolutional Neural Network on low density LiDAR datasets and is able to identify most large trees accurately but struggles with smaller ones due to the voxelisation process. The third method is a scaled version of the PointNet++ method and works directly on outdoor point clouds and achieves an F-score of 82.1% on the ISPRS benchmark dataset, comparable to the state-of-the-art methods but with increased efficiency.

35. 3D Point Cloud Feature Explanations Using Gradient-Based Methods [PDF] 返回目录
  Ananya Gupta, Simon Watson, Hujun Yin
Abstract: Explainability is an important factor to drive user trust in the use of neural networks for tasks with material impact. However, most of the work done in this area focuses on image analysis and does not take into account 3D data. We extend the saliency methods that have been shown to work on image data to deal with 3D data. We analyse the features in point clouds and voxel spaces and show that edges and corners in 3D data are deemed as important features while planar surfaces are deemed less important. The approach is model-agnostic and can provide useful information about learnt features. Driven by the insight that 3D data is inherently sparse, we visualise the features learnt by a voxel-based classification network and show that these features are also sparse and can be pruned relatively easily, leading to more efficient neural networks. Our results show that the Voxception-ResNet model can be pruned down to 5% of its parameters with negligible loss in accuracy.
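
The underlying attribution step is close to standard vanilla-gradient saliency. A generic sketch for 3D point inputs (the toy untrained model below is a placeholder, not one of the paper's networks):

```python
# Vanilla gradient saliency on a point cloud: importance = gradient magnitude.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 10))  # toy per-point net

points = torch.randn(1024, 3, requires_grad=True)   # one cloud of 1024 xyz points
class_scores = model(points).max(0).values          # crude global max-pool readout
class_scores[3].backward()                          # gradient of one class score

saliency = points.grad.norm(dim=1)                  # per-point importance
print(saliency.topk(5).indices)  # with a trained model, edge/corner points tend to dominate
```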

36. Dual-stream Maximum Self-attention Multi-instance Learning [PDF] 返回目录
  Bin Li, Kevin W. Eliceiri
Abstract: Multi-instance learning (MIL) is a form of weakly supervised learning where a single class label is assigned to a bag of instances while the instance-level labels are not available. Training classifiers to accurately determine the bag label and instance labels is a challenging but critical task in many practical scenarios, such as computational histopathology. Recently, MIL models fully parameterized by neural networks have become popular due to the high flexibility and superior performance. Most of these models rely on attention mechanisms that assign attention scores across the instance embeddings in a bag and produce the bag embedding using an aggregation operator. In this paper, we propose a dual-stream maximum self-attention MIL model (DSMIL) parameterized by neural networks. The first stream deploys a simple MIL max-pooling while the top-activated instance embedding is determined and used to obtain self-attention scores across instance embeddings in the second stream. Different from most of the previous methods, the proposed model jointly learns an instance classifier and a bag classifier based on the same instance embeddings. The experimental results show that our method achieves superior performance compared to the best MIL methods and demonstrates state-of-the-art performance on benchmark MIL datasets.
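
A compact sketch of the dual-stream head described above (dimensions and layer choices are assumptions): stream one max-pools instance scores; stream two attends every instance to the top-activated ("critical") one to build the bag embedding, so both classifiers share the same instance embeddings.

```python
# Hypothetical dual-stream MIL head in the spirit of DSMIL.
import torch
import torch.nn as nn

class DualStreamMIL(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.inst_cls = nn.Linear(dim, 1)    # instance classifier (stream 1)
        self.q = nn.Linear(dim, dim)         # query projection
        self.v = nn.Linear(dim, dim)         # value projection
        self.bag_cls = nn.Linear(dim, 1)     # bag classifier (stream 2)

    def forward(self, H):                    # H: (num_instances, dim) embeddings
        inst_scores = self.inst_cls(H).squeeze(1)           # (N,)
        crit = H[inst_scores.argmax()]                      # critical instance
        attn = torch.softmax(self.q(H) @ self.q(crit), 0)   # similarity to critical
        bag_emb = attn @ self.v(H)                          # attention-pooled bag embedding
        return inst_scores.max(), self.bag_cls(bag_emb)     # scores from both streams

mil = DualStreamMIL()
print(mil(torch.randn(50, 128)))
```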

37. MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views [PDF] 返回目录
  Ke Chen, Ryan Oldja, Nikolai Smolyanskiy, Stan Birchfield, Alexander Popov, David Wehr, Ibrahim Eden, Joachim Pehserl
Abstract: Autonomous driving requires the inference of actionable information such as detecting and classifying objects, and determining the drivable space. To this end, we present a two-stage deep neural network (MVLidarNet) for multi-class object detection and drivable segmentation using multiple views of a single LiDAR point cloud. The first stage processes the point cloud projected onto a perspective view in order to semantically segment the scene. The second stage then processes the point cloud (along with semantic labels from the first stage) projected onto a bird's eye view, to detect and classify objects. Both stages are simple encoder-decoders. We show that our multi-view, multi-stage, multi-class approach is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input, in challenging scenes with more than one hundred vehicles and pedestrians at a time. The system operates efficiently at 150 fps on an embedded GPU designed for a self-driving car, including a postprocessing step to maintain identities over time. We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.

38. Off-the-shelf sensor vs. experimental radar -- How much resolution is necessary in automotive radar classification? [PDF] 返回目录
  Nicolas Scheiner, Ole Schumann, Florian Kraus, Nils Appenrodt, Jürgen Dickmann, Bernhard Sick
Abstract: Radar-based road user detection is an important topic in the context of autonomous driving applications. The resolution of conventional automotive radar sensors results in a sparse data representation which is tough to refine during subsequent signal processing. On the other hand, a new sensor generation is waiting in the wings for its application in this challenging field. In this article, two sensors of different radar generations are evaluated against each other. The evaluation criterion is the performance on moving road user object detection and classification tasks. To this end, two data sets originating from an off-the-shelf radar and a high resolution next generation radar are compared. Special attention is given on how the two data sets are assembled in order to make them comparable. The utilized object detector consists of a clustering algorithm, a feature extraction module, and a recurrent neural network ensemble for classification. For the assessment, all components are evaluated both individually and, for the first time, as a whole. This allows for indicating where overall performance improvements have their origin in the pipeline. Furthermore, the generalization capabilities of both data sets are evaluated and important comparison metrics for automotive radar object detection are discussed. Results show clear benefits of the next generation radar. Interestingly, those benefits do not actually occur due to better performance at the classification stage, but rather because of the vast improvements at the clustering stage.

39. Standardised convolutional filtering for radiomics [PDF] 返回目录
  Adrien Depeursinge, Vincent Andrearczyk, Philip Whybra, Joost van Griethuysen, Henning Müller, Roger Schaer, Martin Vallières, Alex Zwanenburg
Abstract: The Image Biomarker Standardisation Initiative (IBSI) aims to improve reproducibility of radiomics studies by standardising the computational process of extracting image biomarkers (features) from images. We have previously established reference values for 169 commonly used features, created a standard radiomics image processing scheme, and developed reporting guidelines for radiomic studies. However, several aspects are not standardised. Here we present a preliminary version of a reference manual on the use of convolutional image filters in radiomics. Filters, such as wavelets or Laplacian of Gaussian filters, play an important part in emphasising specific image characteristics such as edges and blobs. Features derived from filter response maps have been found to be poorly reproducible. This reference manual forms the basis of ongoing work on standardising convolutional filters in radiomics, and will be updated as this work progresses.
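
For concreteness, this is what one such convolutional filter looks like in practice: a Laplacian-of-Gaussian response map computed with SciPy (the reference manual itself is implementation-agnostic; the volume, scale and spacing below are placeholders):

```python
# Laplacian-of-Gaussian filtering of a toy volume; radiomics features are
# then computed on the response map rather than on the original image.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
image = rng.normal(size=(64, 64, 64))       # stand-in for a CT/MR volume
sigma_mm, spacing_mm = 3.0, 1.0             # filter scale vs. voxel spacing
log_map = ndimage.gaussian_laplace(image, sigma=sigma_mm / spacing_mm)
print(float(log_map.mean()), float(log_map.var()))  # inputs to downstream features
```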

40. Dialog Policy Learning for Joint Clarification and Active Learning Queries [PDF] 返回目录
  Aishwarya Padmakumar, Raymond J. Mooney
Abstract: Intelligent systems need to be able to recover from mistakes, resolve uncertainty, and adapt to novel concepts not seen during training. Dialog interaction can enable this by the use of clarifications for correction and resolving uncertainty, and active learning queries to learn new concepts encountered during operation. Prior work on dialog systems has either focused on exclusively learning how to perform clarification/information seeking, or to perform active learning. In this work, we train a hierarchical dialog policy to jointly perform both clarification and active learning in the context of an interactive language-based image retrieval task motivated by an on-line shopping application, and demonstrate that jointly learning dialog policies for clarification and active learning is more effective than the use of static dialog policies for one or both of these functions.

41. D-VPnet: A Network for Real-time Dominant Vanishing Point Detection in Natural Scenes [PDF] 返回目录
  Yin-Bo Liu, Ming Zeng, Qing-Hao Meng
Abstract: As an important part of linear perspective, vanishing points (VPs) provide useful clues for mapping objects from 2D photos to 3D space. Existing methods are mainly focused on extracting structural features such as lines or contours and then clustering these features to detect VPs. However, these techniques suffer from ambiguous information due to the large number of line segments and contours detected in outdoor environments. In this paper, we present a new convolutional neural network (CNN) to detect dominant VPs in natural scenes, i.e., the Dominant Vanishing Point detection Network (D-VPnet). The key component of our method is the feature line-segment proposal unit (FLPU), which can be directly utilized to predict the location of the dominant VP. Moreover, the model also uses the two main parallel lines as an aid to determine the position of the dominant VP. The proposed method was tested using a public dataset and a Parallel Line based Vanishing Point (PLVP) dataset. The experimental results suggest that our approach outperforms state-of-the-art methods in detection accuracy under various conditions while running in real time at 115 fps.

42. SAL++: Sign Agnostic Learning with Derivatives [PDF] 返回目录
  Matan Atzmon, Yaron Lipman
Abstract: Learning 3D geometry directly from raw data, such as point clouds, triangle soups, or un-oriented meshes is still a challenging task that feeds many downstream computer vision and graphics applications. In this paper, we introduce SAL++: a method for learning implicit neural representations of shapes directly from such raw data. We build upon the recent sign agnostic learning (SAL) approach and generalize it to include derivative data in a sign agnostic manner. In more detail, given the unsigned distance function to the input raw data, we suggest a novel sign agnostic regression loss, incorporating both pointwise values and gradients of the unsigned distance function. Optimizing this loss leads to a signed implicit function solution, the zero level set of which is a high quality, valid manifold approximation to the input 3D data. We demonstrate the efficacy of SAL++ shape space learning from two challenging datasets: ShapeNet that contains inconsistent orientation and non-manifold meshes, and D-Faust that contains raw 3D scans (triangle soups). On both these datasets, we present state-of-the-art results.
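
One plausible form of a sign-agnostic regression loss with derivatives, written under assumptions (the exact loss and its weighting are defined in the paper): the value term matches |f| to the unsigned distance u, and the gradient term matches the gradient of f to that of u up to a sign.

```python
# Hypothetical sign-agnostic loss on values and gradients of the unsigned distance.
import torch

def sign_agnostic_loss(f_vals, f_grads, u_vals, u_grads, lam=0.1):
    value_term = (f_vals.abs() - u_vals).abs().mean()     # | |f(x)| - u(x) |
    g_plus = (f_grads - u_grads).norm(dim=1)              # gradient match, sign +
    g_minus = (f_grads + u_grads).norm(dim=1)             # gradient match, sign -
    grad_term = torch.minimum(g_plus, g_minus).mean()     # agnostic to the sign
    return value_term + lam * grad_term

f_vals, u_vals = torch.randn(256), torch.rand(256)        # u is unsigned (>= 0)
f_grads, u_grads = torch.randn(256, 3), torch.randn(256, 3)
print(sign_agnostic_loss(f_vals, f_grads, u_vals, u_grads))
```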

43. Open-Narrow-Synechiae Anterior Chamber Angle Classification in AS-OCT Sequences [PDF] 返回目录
  Huaying Hao, Huazhu Fu, Yanwu Xu, Jianlong Yang, Fei Li, Xiulan Zhang, Jiang Liu, Yitian Zhao
Abstract: Anterior chamber angle (ACA) classification is a key step in the diagnosis of angle-closure glaucoma in Anterior Segment Optical Coherence Tomography (AS-OCT). Existing automated analysis methods focus on a binary classification system (i.e., open angle or angle-closure) in a 2D AS-OCT slice. However, clinical diagnosis requires a more discriminating ACA three-class system (i.e., open, narrow, or synechiae angles) for the benefit of clinicians who seek to better understand the progression of the spectrum of angle-closure glaucoma types. To address this, we propose a novel sequence multi-scale aggregation deep network (SMA-Net) for open-narrow-synechiae ACA classification based on an AS-OCT sequence. In our method, a Multi-Scale Discriminative Aggregation (MSDA) block is utilized to learn the multi-scale representations at slice level, while a ConvLSTM is introduced to study the temporal dynamics of these representations at sequence level. Finally, a multi-level loss function is used to combine the slice-based and sequence-based losses. The proposed method is evaluated across two AS-OCT datasets. The experimental results show that the proposed method outperforms existing state-of-the-art methods in applicability, effectiveness, and accuracy. We believe this work to be the first attempt to classify ACAs into open, narrow, or synechiae grading types using AS-OCT sequences.

44. A Hybrid Framework for Matching Printing Design Files to Product Photos [PDF] 返回目录
  Alper Kaplan, Erdem Akagunduz
Abstract: We propose a real-time image matching framework, which is hybrid in the sense that it uses both hand-crafted features and deep features obtained from a well-tuned deep convolutional network. The matching problem we concentrate on is specific to a certain application, namely matching printing designs to product photos. Printing designs are any kind of template image files created using a design tool, and are thus perfect image signals. However, photographs of a printed product suffer from many unwanted effects, such as uncontrolled shooting angle, uncontrolled illumination, occlusions, printing deficiencies in color, camera noise, optic blur, et cetera. For this purpose, we create an image set that includes printing designs and corresponding product photo pairs in collaboration with an actual printing facility. Using this image set, we benchmark various hand-crafted and deep features for matching performance and propose a framework in which deep learning is utilized with the highest contribution, while retaining real-time operation on an ordinary desktop computer.

45. MeshWalker: Deep Mesh Understanding by Random Walks [PDF] 返回目录
  Alon Lahav, Ayellet Tal
Abstract: Most attempts to represent 3D shapes for deep learning have focused on volumetric grids, multi-view images and point clouds. In this paper we look at the most popular representation of 3D shapes in computer graphics - a triangular mesh - and ask how it can be utilized within deep learning. The few attempts to answer this question propose to adapt convolutions & pooling to suit Convolutional Neural Networks (CNNs). This paper proposes a very different approach, termed MeshWalker, to learn the shape directly from a given mesh. The key idea is to represent the mesh by random walks along the surface, which "explore" the mesh's geometry and topology. Each walk is organized as a list of vertices, which in some manner imposes regularity on the mesh. The walk is fed into a Recurrent Neural Network (RNN) that "remembers" the history of the walk. We show that our approach achieves state-of-the-art results for two fundamental shape analysis tasks: shape classification and semantic segmentation. Furthermore, even a very small number of examples suffices for learning. This is highly important, since large datasets of meshes are difficult to acquire.
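
A sketch of the walk-generation step (simplified: the paper additionally avoids revisiting vertices where possible, and the RNN that consumes the walks is omitted):

```python
# Toy random walk over mesh vertices built from a triangle list.
import random

faces = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]   # toy triangle soup
adj = {}
for a, b, c in faces:                                   # vertex adjacency from faces
    for u, v in ((a, b), (b, c), (a, c)):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

def random_walk(adj, length=8, seed=0):
    random.seed(seed)
    walk = [random.choice(list(adj))]
    for _ in range(length - 1):
        walk.append(random.choice(sorted(adj[walk[-1]])))  # hop to any neighbor
    return walk  # an ordered vertex list that "explores" geometry and topology

print(random_walk(adj))
```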

46. Towards Good Practices for Data Augmentation in GAN Training [PDF] 返回目录
  Ngoc-Trung Tran, Viet-Hung Tran, Ngoc-Bao Nguyen, Trung-Kien Nguyen, Ngai-Man Cheung
Abstract: Recent successes in Generative Adversarial Networks (GAN) have affirmed the importance of using more data in GAN training. Yet it is expensive to collect data in many domains such as medical applications. Data Augmentation (DA) has been applied in these applications. In this work, we first argue that the classical DA approach could mislead the generator to learn the distribution of the augmented data, which could be different from that of the original data. We then propose a principled framework, termed Data Augmentation Optimized for GAN (DAG), to enable the use of augmented data in GAN training to improve the learning of the original distribution. We provide theoretical analysis to show that using our proposed DAG aligns with the original GAN in minimizing the JS divergence w.r.t. the original distribution and it leverages the augmented data to improve the learnings of discriminator and generator. The experiments show that DAG improves various GAN models. Furthermore, when DAG is used in some GAN models, the system establishes state-of-the-art Fréchet Inception Distance (FID) scores.
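
As a hedged illustration of the DAG idea (the paper attaches per-augmentation discriminator heads with shared lower layers; this sketch reuses one discriminator for brevity), the real/fake losses can be computed under several augmentations and averaged:

```python
# Toy discriminator loss averaged over augmentations, DAG-style.
import torch
import torch.nn.functional as F

def dag_d_loss(D, real, fake, augments):
    losses = []
    for aug in augments:                     # e.g. identity, flip, rotation...
        d_real, d_fake = D(aug(real)), D(aug(fake.detach()))
        losses.append(
            F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    return torch.stack(losses).mean()        # average across augmentations

D = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))
real, fake = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
augments = [lambda x: x, lambda x: torch.flip(x, dims=[3])]  # identity + horizontal flip
print(dag_d_loss(D, real, fake, augments))
```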

47. mEBAL: A Multimodal Database for Eye Blink Detection and Attention Level Estimation [PDF] 返回目录
  Roberto Daza, Aythami Morales, Julian Fierrez, Ruben Tolosana
Abstract: This work presents mEBAL, a multimodal database for eye blink detection and attention level estimation. The eye blink frequency is related to cognitive activity, and automatic detectors of eye blinks have been proposed for many tasks including attention level estimation, analysis of neuro-degenerative diseases, deception recognition, driver fatigue detection, or face anti-spoofing. However, most existing databases and algorithms in this area are limited to experiments involving only a few hundred samples and individual sensors like face cameras. The proposed mEBAL improves previous databases in terms of acquisition sensors and samples. In particular, three different sensors are simultaneously considered: Near Infrared (NIR) and RGB cameras to capture the face gestures and an Electroencephalography (EEG) band to capture the cognitive activity of the user and blinking events. Regarding the size of mEBAL, it comprises 6,000 samples and the corresponding attention level from 38 different students while conducting a number of e-learning tasks of varying difficulty. In addition to presenting mEBAL, we also include preliminary experiments on: i) eye blink detection using Convolutional Neural Networks (CNN) with the facial images, and ii) attention level estimation of the students based on their eye blink frequency.

48. ComboNet: Combined 2D & 3D Architecture for Aorta Segmentation [PDF] 返回目录
  Orhan Akal, Zhigang Peng, Gerardo Hermosillo Valadez
Abstract: 3D segmentation with deep learning, if trained at full resolution, is the ideal way of achieving the best accuracy. Unlike in 2D, 3D segmentation generally does not have sparse outliers and prevents leakage to surrounding soft tissues; at the very least, it is generally more consistent than 2D segmentation. However, GPU memory is generally the bottleneck for such an application. Thus, most of the 3D segmentation applications handle sub-sampled input instead of full resolution, which comes with the cost of losing precision at the boundary. In order to maintain precision at the boundary and prevent sparse outliers and leakage, we designed ComboNet. ComboNet is designed in an end-to-end fashion with three sub-network structures. The first two are parallel: a 2D UNet with full resolution and a 3D UNet with four-times sub-sampled input. The last stage is the concatenation of the 2D and 3D outputs along with a full-resolution input image, which is followed by two convolution layers with either 2D or 3D convolutions. With ComboNet we have achieved 92.1% Dice accuracy for aorta segmentation. With ComboNet, we have observed up to 2.3% improvement in Dice accuracy as opposed to a 2D UNet with the full-resolution input image.

49. Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps [PDF] 返回目录
  Xiaolin Zhang, Yunchao Wei, Yi Yang, Fei Wu
Abstract: Recently, remarkable progress has been made in weakly supervised object localization (WSOL) to promote object localization maps. The common practice for evaluating these maps is indirect and coarse, i.e., obtaining tight bounding boxes that cover high-activation regions and calculating intersection-over-union (IoU) scores between the predicted and ground-truth boxes. This measurement can evaluate the ability of localization maps to some extent, but we argue that the maps should be measured directly and delicately, i.e., by comparing the maps pixel-wise with the ground-truth object masks. To enable direct evaluation, we annotate pixel-level object masks on the ILSVRC validation set. We propose to use IoU-Threshold curves for evaluating the real quality of localization maps. Beyond the amended evaluation metric and annotated object masks, this work also introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision. We propose a two-stage approach to generate the localization maps by simply comparing the similarity of point-wise features between the high-activation and the rest of the pixels. Based on the predicted localization maps, we explore estimating object boundaries on a very large dataset. A hard-negative suppression loss is proposed for obtaining fine boundaries. We conduct extensive experiments on the ILSVRC and CUB benchmarks. In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC. The code and the annotated masks are released at this https URL.
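
The proposed direct evaluation is straightforward to express. A minimal sketch with toy data (the threshold grid and toy shapes are assumptions):

```python
# IoU-Threshold curve: IoU of a thresholded localization map vs. the GT mask.
import numpy as np

def iou_threshold_curve(loc_map, gt_mask, thresholds=np.linspace(0.1, 0.9, 9)):
    curve = []
    for t in thresholds:
        pred = loc_map >= t
        inter = np.logical_and(pred, gt_mask).sum()
        union = np.logical_or(pred, gt_mask).sum()
        curve.append(inter / max(union, 1))
    return np.array(curve)

yy, xx = np.mgrid[0:64, 0:64]
gt = (yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2                      # toy object mask
loc = np.exp(-((yy - 30) ** 2 + (xx - 34) ** 2) / (2 * 12.0 ** 2))  # blurry localization map
print(iou_threshold_curve(loc, gt).round(2))   # map quality across all thresholds
```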

50. Neural Network Activation Quantization with Bitwise Information Bottlenecks [PDF] 返回目录
  Xichuan Zhou, Kui Liu, Cong Shi, Haijun Liu, Ji Liu
Abstract: Recent research on the information bottleneck sheds new light on the continuing attempts to open the black box of neural signal encoding. Inspired by the problem of lossy signal compression for wireless communication, this paper presents a Bitwise Information Bottleneck approach for quantizing and encoding neural network activations. Based on rate-distortion theory, the Bitwise Information Bottleneck attempts to determine the most significant bits in the activation representation by assigning and approximating the sparse coefficient associated with each bit. Given the constraint of a limited average code rate, the information bottleneck minimizes the rate-distortion for optimal activation quantization in a flexible layer-by-layer manner. Experiments on ImageNet and other datasets show that, by minimizing the quantization rate-distortion of each layer, the neural network with information bottlenecks achieves state-of-the-art accuracy with low-precision activation. Meanwhile, by reducing the code rate, the proposed method can improve memory and computational efficiency by over six times compared with the deep neural network with standard single-precision representation. Codes will be available on GitHub when the paper is accepted: this https URL.

51. Multi-spectral Facial Landmark Detection [PDF] 返回目录
  Jin Keong, Xingbo Dong, Zhe Jin, Khawla Mallat, Jean-Luc Dugelay
Abstract: Thermal face image analysis is favorable in certain circumstances, for example, illumination-sensitive applications like nighttime surveillance, and access control where privacy preservation is demanded. However, thermal face image analysis remains insufficiently studied and calls for attention in responding to industry requirements. Detecting facial landmark points is important for many face analysis tasks, such as face recognition, 3D face reconstruction, and facial expression recognition. In this paper, we propose a robust neural-network-enabled facial landmark detection method, namely Deep Multi-Spectral Learning (DMSL). Briefly, DMSL consists of two sub-models, i.e., face boundary detection and landmark coordinate detection. Such an architecture demonstrates the capability of detecting facial landmarks on both visible and thermal images. In particular, the proposed DMSL model is robust in facial landmark detection where the face is partially occluded or facing different directions. The experiment conducted on Eurecom's visible and thermal paired database shows the superior performance of DMSL over the state-of-the-art for thermal facial landmark detection. In addition, we have annotated a thermal face dataset with the respective facial landmarks for the purpose of experimentation.

52. Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection [PDF] 返回目录
  Qingdong He, Zhengning Wang, Hao Zeng, Yijun Liu, Shuaicheng Liu, Bing Zeng
Abstract: 3D object detection has become an emerging task in autonomous driving scenarios. Previous works process 3D point clouds using either projection-based or voxel-based models. However, both approaches contain some drawbacks. The voxel-based methods lack semantic information, while the projection-based methods suffer from numerous spatial information loss when projected to different views. In this paper, we propose the Stereo RGB and Deeper LIDAR (SRDL) framework which can utilize semantic and spatial information simultaneously such that the performance of network for 3D object detection can be improved naturally. Specifically, the network generates candidate boxes from stereo pairs and combines different region-wise features using a deep fusion scheme. The stereo strategy offers more information for prediction compared with prior works. Then, several local and global feature extractors are stacked in the segmentation module to capture richer deep semantic geometric features from point clouds. After aligning the interior points with fused features, the proposed network refines the prediction in a more accurate manner and encodes the whole box in a novel compact method. The decent experimental results on the challenging KITTI detection benchmark demonstrate the effectiveness of utilizing both stereo images and point clouds for 3D object detection.

53. A Note on Deepfake Detection with Low-Resources [PDF] 返回目录
  Piotr Kawa, Piotr Syga
Abstract: Deepfakes are videos that include changes, quite often substituting the face of a portrayed individual with a different face using neural networks. Even though the technology gained its popularity as a carrier of jokes and parodies, it poses a serious threat to one's security, via biometric impersonation or besmearing. In this paper we present two methods that allow detecting Deepfakes for a user without significant computational power. In particular, we enhance MesoNet by replacing the original activation functions, allowing a nearly 1% improvement as well as increased consistency of the results. Moreover, we introduce and verify a new activation function, Pish, which at the cost of a slight time overhead allows even higher consistency. Additionally, we present preliminary results of a Deepfake detection method based on Local Feature Descriptors (LFD), which allows setting up the system even faster and without resorting to GPU computation. Our method achieved an Equal Error Rate of 0.28, with both accuracy and recall exceeding 0.7.

54. Breaking the Limits of Remote Sensing by Simulation and Deep Learning for Flood and Debris Flow Mapping [PDF] 返回目录
  Naoto Yokoya, Kazuki Yamanoi, Wei He, Gerald Baier, Bruno Adriano, Hiroyuki Miura, Satoru Oishi
Abstract: We propose a framework that estimates inundation depth (maximum water level) and debris-flow-induced topographic deformation from remote sensing imagery by integrating deep learning and numerical simulation. A water and debris flow simulator generates training data for various artificial disaster scenarios. We show that regression models based on Attention U-Net and LinkNet architectures trained on such synthetic data can predict the maximum water level and topographic deformation from a remote sensing-derived change detection map and a digital elevation model. The proposed framework has an inpainting capability, thus mitigating the false negatives that are inevitable in remote sensing image analysis. Our framework breaks the limits of remote sensing and enables rapid estimation of inundation depth and topographic deformation, essential information for emergency response, including rescue and relief activities. We conduct experiments with both synthetic and real data for two disaster events that caused simultaneous flooding and debris flows and demonstrate the effectiveness of our approach quantitatively and qualitatively.

55. Reconstruction and Quantification of 3D Iris Surface for Angle-Closure Glaucoma Detection in Anterior Segment OCT [PDF] 返回目录
  Jinkui Hao, Huazhu Fu, Yanwu Xu, Yan Hu, Fei Li, Xiulan Zhang, Jiang Liu, Yitian Zhao
Abstract: Precise characterization and analysis of iris shape from Anterior Segment OCT (AS-OCT) are of great importance in facilitating diagnosis of angle-closure-related diseases. Existing methods focus solely on analyzing structural properties identified from the 2D slice, while accurate characterization of morphological changes of the iris shape in 3D AS-OCT may additionally reveal the risk of disease progression. In this paper, we propose a novel framework for reconstruction and quantification of the 3D iris surface from AS-OCT imagery. We consider it to be the first work to detect angle-closure glaucoma by means of 3D representation. An iris segmentation network with a wavelet refinement block (WRB) is first proposed to generate the initial shape of the iris from a single AS-OCT slice. The 3D iris surface is then reconstructed using a guided optimization method with Poisson-disk sampling. Finally, a set of surface-based features are extracted, which are used in detecting angle-closure glaucoma. Experimental results demonstrate that our method is highly effective in iris segmentation and surface reconstruction. Moreover, we show that the 3D-based representation achieves better performance in angle-closure glaucoma detection than does the 2D-based feature.

56. Physically constrained short-term vehicle trajectory forecasting with naive semantic maps [PDF] 返回目录
  Albert Dulian, John C. Murray
Abstract: Urban environments manifest a high level of complexity, and therefore it is of vital importance for safety systems embedded within autonomous vehicles (AVs) to be able to accurately predict the short-term future motion of nearby agents. This problem can be further understood as generating a sequence of future coordinates for a given agent based on its past motion data, e.g. position, velocity, and acceleration. Whilst current approaches demonstrate plausible results, they have a propensity to neglect a scene's physical constraints. In this paper we propose a model based on a combination of the CNN and LSTM encoder-decoder architecture that learns to extract relevant road features from semantic maps as well as the general motion of agents, and uses this learned representation to predict their short-term future trajectories. We train and validate the model on a publicly available dataset that provides data from urban areas, allowing us to examine it in challenging and uncertain scenarios. We show that our model is not only capable of anticipating future motion whilst taking into consideration road boundaries, but can also effectively and precisely predict trajectories for a longer time horizon than it was initially trained for.
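
A compact sketch of a CNN-plus-LSTM encoder of the kind described (all sizes, inputs and the regression head are assumptions): semantic-map features and past-motion features are fused and decoded into future coordinates.

```python
# Hypothetical map-CNN + motion-LSTM trajectory regressor.
import torch
import torch.nn as nn

class TrajNet(nn.Module):
    def __init__(self, horizon=12):
        super().__init__()
        self.map_enc = nn.Sequential(nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> 16
        self.motion_enc = nn.LSTM(input_size=6, hidden_size=32, batch_first=True)
        self.head = nn.Linear(16 + 32, horizon * 2)    # future (x, y) offsets
        self.horizon = horizon

    def forward(self, sem_map, past):  # sem_map: (B,3,H,W); past: (B,T,6) pos/vel/acc
        m = self.map_enc(sem_map)
        _, (h, _) = self.motion_enc(past)
        fused = torch.cat([m, h[-1]], dim=1)
        return self.head(fused).view(-1, self.horizon, 2)

net = TrajNet()
print(net(torch.randn(2, 3, 64, 64), torch.randn(2, 8, 6)).shape)  # (2, 12, 2)
```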

57. Smooth Proxy-Anchor Loss for Noisy Metric Learning [PDF] 返回目录
  Carlos Roig, David Varas, Issey Masuda, Juan Carlos Riveiro, Elisenda Bou-Balust
Abstract: Many industrial applications use Metric Learning as a way to circumvent scalability issues when designing systems with a high number of classes. Because of this, this field of research is attracting a lot of interest from the academic and non-academic communities. Such industrial applications require large-scale datasets, which are usually generated with web data and, as a result, often contain a high number of noisy labels. While Metric Learning systems are sensitive to noisy labels, this is usually not tackled in the literature, that relies on manually annotated datasets. In this work, we propose a Metric Learning method that is able to overcome the presence of noisy labels using our novel Smooth Proxy-Anchor Loss. We also present an architecture that uses the aforementioned loss with a two-phase learning procedure. First, we train a confidence module that computes sample class confidences. Second, these confidences are used to weight the influence of each sample for the training of the embeddings. This results in a system that is able to provide robust sample embeddings. We compare the performance of the described method with current state-of-the-art Metric Learning losses (proxy-based and pair-based), when trained with a dataset containing noisy labels. The results showcase an improvement of 2.63 and 3.29 in Recall@1 with respect to MultiSimilarity and Proxy-Anchor Loss respectively, proving that our method outperforms the state-of-the-art of Metric Learning in noisy labeling conditions.
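
A hedged sketch of plugging per-sample confidences into a Proxy-Anchor-style loss (hyperparameters, normalization and the confidence module are placeholders; the paper defines the exact Smooth Proxy-Anchor Loss): noisy samples receive low confidence and therefore contribute little to the pull/push terms.

```python
# Confidence-weighted Proxy-Anchor-style loss (simplified normalization).
import torch
import torch.nn.functional as F

def weighted_proxy_anchor(emb, labels, proxies, conf, alpha=32, delta=0.1):
    sim = F.normalize(emb) @ F.normalize(proxies).t()    # (N, C) cosine similarities
    loss = 0.0
    for c in range(proxies.size(0)):
        pos, neg = labels == c, labels != c
        if pos.any():
            loss += torch.log1p(
                (conf[pos] * torch.exp(-alpha * (sim[pos, c] - delta))).sum())
        loss += torch.log1p(
            (conf[neg] * torch.exp(alpha * (sim[neg, c] + delta))).sum())
    return loss / proxies.size(0)

emb, proxies = torch.randn(64, 128), torch.randn(10, 128)
labels, conf = torch.randint(0, 10, (64,)), torch.rand(64)  # conf from a trained module
print(weighted_proxy_anchor(emb, labels, proxies, conf))
```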

58. A Survey on Generative Adversarial Networks: Variants, Applications, and Training [PDF] 返回目录
  Abdul Jabbar, Xi Li, Bourahla Omar
Abstract: Generative models have gained considerable attention in the field of unsupervised learning via a new and practical framework called Generative Adversarial Networks (GAN), due to their outstanding data generation capability. Many GAN models have been proposed, and several practical applications have emerged in various domains of computer vision and machine learning. Despite GANs' excellent success, there are still obstacles to stable training. The problems are due to Nash equilibrium, internal covariate shift, mode collapse, vanishing gradients, and lack of proper evaluation metrics. Therefore, stable training is a crucial issue in different applications for the success of GANs. Herein, we survey several training solutions proposed by different researchers to stabilize GAN training. We survey (i) the original GAN model and its modified classical versions, (ii) a detailed analysis of various GAN applications in different domains, and (iii) a detailed study of the various GAN training obstacles as well as training solutions. Finally, we discuss several new issues as well as research outlines for the topic.

59. Over-crowdedness Alert! Forecasting the Future Crowd Distribution [PDF] 返回目录
  Yuzhen Niu, Weifeng Shi, Wenxi Liu, Shengfeng He, Jia Pan, Antoni B. Chan
Abstract: In recent years, vision-based crowd analysis has been studied extensively due to its practical applications in real world. In this paper, we formulate a novel crowd analysis problem, in which we aim to predict the crowd distribution in the near future given sequential frames of a crowd video without any identity annotations. Studying this research problem will benefit applications concerned with forecasting crowd dynamics. To solve this problem, we propose a global-residual two-stream recurrent network, which leverages the consecutive crowd video frames as inputs and their corresponding density maps as auxiliary information to predict the future crowd distribution. Moreover, to strengthen the capability of our network, we synthesize scene-specific crowd density maps using simulated data for pretraining. Finally, we demonstrate that our framework is able to predict the crowd distribution for different crowd scenarios and we delve into applications including predicting future crowd count, forecasting high-density region, etc.

60. Roses Are Red, Violets Are Blue... but Should Vqa Expect Them To? [PDF] 返回目录
  Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf
Abstract: To be reliable on rare events is an important requirement for systems based on machine learning. In this work we focus on Visual Question Answering (VQA), where, in spite of recent efforts, datasets remain imbalanced, causing shortcomings of current models: tendencies to overly exploit dataset biases and struggles to generalise to unseen associations of concepts. We focus on a systemic evaluation of model error distributions and address fundamental questions: How is the prediction error distributed? What is the prediction accuracy on infrequent vs. frequent concepts? In this work, we design a new benchmark based on a fine-grained reorganization of the GQA dataset [1], which allows to precisely answer these questions. It introduces distributions shifts in both validation and test splits, which are defined on question groups and are thus tailored to each question. We performed a large-scale study and we experimentally demonstrate that several state-of-the-art VQA models, even those specifically designed for bias reduction, fail to address questions involving infrequent concepts. Furthermore, we show that the high accuracy obtained on the frequent concepts alone is mechanically increasing overall accuracy, covering up the true behavior of current VQA models.

61. GAP++: Learning to generate target-conditioned adversarial examples [PDF] 返回目录
  Xiaofeng Mao, Yuefeng Chen, Yuhong Li, Yuan He, Hui Xue
Abstract: Adversarial examples are perturbed inputs which pose a serious threat to machine learning models. Finding these perturbations is such a hard task that iterative traversal methods are usually required. For computational efficiency, recent works use adversarial generative networks to directly model the distribution of either universal or image-dependent perturbations. However, these methods generate perturbations that depend only on input images. In this work, we propose a more general-purpose framework which infers target-conditioned perturbations dependent on both the input image and the target label. Different from previous single-target attack models, our model can conduct target-conditioned attacks by learning the relations of attack targets and the semantics in images. Using extensive experiments on the MNIST and CIFAR10 datasets, we show that our method achieves superior performance compared with single-target attack models and obtains high fooling rates with small perturbation norms.

62. Towards an Intrinsic Definition of Robustness for a Classifier [PDF] 返回目录
  Théo Giraudon, Vincent Gripon, Matthias Löwe, Franck Vermet
Abstract: The robustness of classifiers has become a question of paramount importance in the past few years. Indeed, it has been shown that state-of-the-art deep learning architectures can easily be fooled with imperceptible changes to their inputs. Therefore, finding good measures of robustness of a trained classifier is a key issue in the field. In this paper, we point out that averaging the radius of robustness of samples in a validation set is a statistically weak measure. We propose instead to weight the importance of samples depending on their difficulty. We motivate the proposed score by a theoretical case study using logistic regression, where we show that the proposed score is independent of the choice of the samples it is evaluated upon. We also empirically demonstrate the ability of the proposed score to measure robustness of classifiers with little dependence on the choice of samples in more complex settings, including deep convolutional neural networks and real datasets.
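
A toy numerical illustration of the argument (the weighting scheme below is an assumption for illustration, not the paper's theoretically derived score): up-weighting hard samples when averaging robustness radii gives a different picture than the plain mean.

```python
# Difficulty-weighted average of robustness radii vs. the naive mean.
import numpy as np

rng = np.random.default_rng(0)
radius = rng.exponential(0.5, size=1000)   # robustness radius per validation sample
margin = rng.uniform(0, 1, size=1000)      # classifier margin as a difficulty proxy

w = 1.0 - margin                           # hard (low-margin) samples weigh more
naive_score = radius.mean()
weighted_score = (w * radius).sum() / w.sum()
print(round(float(naive_score), 3), round(float(weighted_score), 3))
```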

63. PNL: Efficient Long-Range Dependencies Extraction with Pyramid Non-Local Module for Action Recognition [PDF] 返回目录
  Yuecong Xu, Haozhi Cao, Jianfei Yang, Kezhi Mao, Jianxiong Yin, Simon See
Abstract: Capturing long-range spatiotemporal dependencies plays an essential role in improving video features for action recognition. The non-local block, inspired by non-local means, is designed to address this challenge and has shown excellent performance. However, the non-local block brings a significant increase in computation cost to the original network. It also lacks the ability to model regional correlation in videos. To address the above limitations, we propose the Pyramid Non-Local (PNL) module, which extends the non-local block by incorporating regional correlation at multiple scales through a pyramid-structured module. This extension upscales the effectiveness of non-local operation by attending to the interaction between different regions. Empirical results prove the effectiveness and efficiency of our PNL module, which achieves state-of-the-art performance of 83.09% on the Mini-Kinetics dataset, with decreased computation cost compared to the non-local block.

64. SEKD: Self-Evolving Keypoint Detection and Description [PDF] 返回目录
  Yafei Song, Ling Cai, Jia Li, Yonghong Tian, Mingyang Li
Abstract: Inspired by the recent successes of deep neural networks (DNNs) on a variety of vision tasks, researchers have attempted to utilize them to learn novel local features from images. However, existing DNN-based algorithms have not achieved comparably remarkable progress, which can be partly attributed to insufficient utilization of the interactive characteristics between the local feature detector and descriptor. To alleviate these difficulties, we emphasize two desired properties, i.e., repeatability and reliability, to simultaneously summarize the inherent and interactive characteristics of the local feature detector and descriptor. Guided by these properties, a self-supervised framework, namely self-evolving keypoint detection and description (SEKD), is proposed to learn an advanced local feature model from unlabeled natural images. Additionally, to guarantee performance, novel training strategies have also been dedicatedly designed to minimize the gap between the learned feature and its desired properties. We benchmark the proposed method on homography estimation, relative pose estimation, and structure-from-motion tasks. Extensive experimental results demonstrate that the proposed method outperforms popular hand-crafted and DNN-based methods by remarkable margins. Ablation studies also verify the effectiveness of each critical training strategy. We will release our code along with the trained model publicly.

65. Detection of Makeup Presentation Attacks based on Deep Face Representations [PDF] 返回目录
  Christian Rathgeb, Pawel Drozdowski, Christoph Busch
Abstract: Facial cosmetics have the ability to substantially alter the facial appearance, which can negatively affect the decisions of a face recognition system. In addition, it was recently shown that the application of makeup can be abused to launch so-called makeup presentation attacks. In such attacks, the attacker might apply heavy makeup in order to achieve the facial appearance of a target subject for the purpose of impersonation. In this work, we assess the vulnerability of a COTS face recognition system to makeup presentation attacks employing the publicly available Makeup Induced Face Spoofing (MIFS) database. It is shown that makeup presentation attacks might seriously impact the security of the face recognition system. Further, we propose an attack detection scheme which distinguishes makeup presentation attacks from genuine authentication attempts by analysing differences in deep face representations obtained from potential makeup presentation attacks and corresponding target face images. The proposed detection system employs a machine learning-based classifier, which is trained with synthetically generated makeup presentation attacks utilizing a generative adversarial network for facial makeup transfer in conjunction with image warping. Experimental evaluations conducted using the MIFS database reveal a detection equal error rate of 0.7% for the task of separating genuine authentication attempts from makeup presentation attacks.

66. Learning Shared Filter Bases for Efficient ConvNets [PDF] 返回目录
  Daeyeon Kim, Woochul Kang
Abstract: Modern convolutional neural networks (ConvNets) achieve state-of-the-art performance for many computer vision tasks. However, such high performance requires millions of parameters and high computational costs. Recently, inspired by the iterative structure of modern ConvNets, such as ResNets, parameter sharing among repetitive convolution layers has been proposed to reduce the number of parameters. However, naive sharing of convolution filters poses many challenges such as overfitting and vanishing/exploding gradients. Furthermore, parameter sharing often increases computational complexity due to additional operations. In this paper, we propose to exploit the linear structure of convolution filters for effective and efficient sharing of parameters among iterative convolution layers. Instead of sharing convolution filters themselves, we hypothesize that a filter basis of linearly decomposed convolution layers is a more effective unit for sharing parameters, since a filter basis is an intrinsic and reusable building block constituting diverse high-dimensional convolution filters. The representation power and peculiarity of individual convolution layers are further increased by adding a small number of layer-specific non-shared components to the filter basis. We show empirically that enforcing orthogonality on shared filter bases can mitigate the difficulty of training shared parameters. Experimental results show that our approach achieves significant reductions both in model parameters and computational costs while maintaining competitive, and often better, performance than non-shared baseline networks.
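
A PyTorch sketch of the idea follows; this is not the authors' code, and the basis size, number of private filters, and exact form of the orthogonality penalty are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BasisConv2d(nn.Module):
        # each layer's filters = linear combination of a shared filter basis
        # plus a few layer-specific (non-shared) private filters
        def __init__(self, basis, out_ch, n_private=2):
            super().__init__()
            self.basis = basis                              # shared (B, in_ch, k, k)
            n_basis, in_ch, k, _ = basis.shape
            self.coeff = nn.Parameter(0.1 * torch.randn(out_ch, n_basis + n_private))
            self.private = nn.Parameter(0.1 * torch.randn(n_private, in_ch, k, k))

        def forward(self, x):
            filters = torch.cat([self.basis, self.private], dim=0)
            # combine basis + private filters into this layer's conv weights
            weight = torch.einsum('ob,bikl->oikl', self.coeff, filters)
            return F.conv2d(x, weight, padding=weight.shape[-1] // 2)

    def orthogonality_penalty(basis):
        # encourage mutually orthogonal basis filters, which the paper reports
        # eases the training of shared parameters
        flat = basis.flatten(1)
        gram = flat @ flat.t()
        return ((gram - torch.eye(gram.shape[0], device=basis.device)) ** 2).sum()

    # one basis shared across several iterative layers:
    shared = nn.Parameter(0.1 * torch.randn(8, 64, 3, 3))
    layers = nn.ModuleList(BasisConv2d(shared, out_ch=64) for _ in range(4))

Note that PyTorch deduplicates the shared parameter when iterating model parameters, so a standard optimizer updates the basis once per step regardless of how many layers reference it.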

67. Single Image Deraining via Scale-space Invariant Attention Neural Network [PDF] 返回目录
  Bo Pang, Deming Zhai, Junjun Jiang, Xianming Liu
Abstract: Image enhancement from degradation by rain artifacts plays a critical role in outdoor visual computing systems. In this paper, we tackle the notion of scale, which deals with visual changes in the appearance of rain streaks with respect to the camera. Specifically, we revisit multi-scale representation via scale-space theory, and propose to represent the multi-scale correlation in the convolutional feature domain, which is more compact and robust than the pixel domain. Moreover, to improve the modeling ability of the network, we do not treat the extracted multi-scale features equally, but design a novel scale-space invariant attention mechanism to help the network focus on parts of the features. In this way, we summarize the most activated presence of feature maps as the salient features. Extensive experimental results on synthetic and real rainy scenes demonstrate the superior performance of our scheme over the state-of-the-art methods.

68. Can Synthetic Data Improve Object Detection Results for Remote Sensing Images? [PDF] 返回目录
  Weixing Liu, Jun Liu, Bin Luo
Abstract: Deep learning approaches require enough training samples to perform well, but it is a challenge to collect enough real training data and label them manually. In this letter, we propose the use of realistic synthetic data with a wide distribution to improve the performance of remote sensing image aircraft detection. Specifically, to increase the variability of synthetic data, we randomly set the parameters during rendering, such as the size of the instance and the class of background images. In order to make the synthetic images more realistic, we then refine the synthetic images at the pixel level using CycleGAN with real unlabeled images. We also fine-tune the model with a small amount of real data, to obtain a higher accuracy. Experiments on NWPU VHR-10, UCAS-AOD and DIOR datasets demonstrate that the proposed method can be applied for augmenting insufficient real data.

69. RGB-D-E: Event Camera Calibration for Fast 6-DOF Object Tracking [PDF] 返回目录
  Etienne Dubeau, Mathieu Garon, Benoit Debaque, Raoul de Charette, Jean-François Lalonde
Abstract: Augmented reality devices require multiple sensors to perform various tasks such as localization and tracking. Currently, popular cameras are mostly frame-based (e.g. RGB and Depth), which impose a high data bandwidth and power usage. With the necessity for low-power and more responsive augmented reality systems, using solely frame-based sensors imposes limits on the various algorithms that need high-frequency data from the environment. As such, event-based sensors have become increasingly popular due to their low power, bandwidth and latency, as well as their very high-frequency data acquisition capabilities. In this paper, we propose, for the first time, to use an event-based camera to increase the speed of 3D object tracking in 6 degrees of freedom. This application requires handling very high object speeds to convey compelling AR experiences. To this end, we propose a new system which combines a recent RGB-D sensor (Kinect Azure) with an event camera (DAVIS346). We develop a deep learning approach, which combines an existing RGB-D network with a novel event-based network in a cascade fashion, and demonstrate that our approach significantly improves the robustness of a state-of-the-art frame-based 6-DOF object tracker using our RGB-D-E pipeline.

70. Rethinking Classification Loss Designs for Person Re-identification with a Unified View [PDF] 返回目录
  Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen, Shih-Fu Chang
Abstract: Person Re-identification (ReID) aims at matching a person of interest across images. In convolutional neural networks (CNNs) based approaches, loss design plays a role of metric learning which guides the feature learning process to pull closer features of the same identity and to push far apart features of different identities. In recent years, the combination of classification loss and triplet loss achieves superior performance and is predominant in ReID. In this paper, we rethink these loss functions within a generalized formulation and argue that triplet-based optimization can be viewed as a two-class subsampling classification, which performs classification over two sampled categories based on instance similarities. Furthermore, we present a case study which demonstrates that increasing the number of simultaneously considered instance classes significantly improves the ReID performance, since it is aligned better with the ReID test/inference process. With the multi-class subsampling classification incorporated, we provide a strong baseline which achieves the state-of-the-art performance on the benchmark person ReID datasets. Finally, we propose a new meta prototypical N-tuple loss for more efficient multi-class subsampling classification. We aim to inspire more new loss designs in the person ReID field.
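
A hedged PyTorch sketch of the multi-class subsampling classification idea, assuming learnable per-identity prototype vectors and dot-product similarities; the function and argument names are illustrative, and the paper's meta prototypical N-tuple loss is not reproduced exactly.

    import torch
    import torch.nn.functional as F

    def subsampled_classification_loss(feats, labels, prototypes, n_sampled=32):
        # classify each embedding against the prototypes of a random subset of
        # identity classes that always contains the identities in the batch;
        # n_sampled = 2 recovers a triplet-like two-class formulation
        n_classes = prototypes.shape[0]
        pos = labels.unique()
        perm = torch.randperm(n_classes, device=feats.device)
        extra = perm[~torch.isin(perm, pos)][:max(n_sampled - pos.numel(), 0)]
        sampled = torch.cat([pos, extra])
        logits = feats @ prototypes[sampled].t()      # similarity to sampled classes
        # remap ground-truth labels into positions within the sampled subset
        remap = {c.item(): i for i, c in enumerate(sampled)}
        target = torch.tensor([remap[c.item()] for c in labels], device=feats.device)
        return F.cross_entropy(logits, target)

Increasing n_sampled moves the objective from the triplet-like two-class case toward full classification, mirroring the paper's observation that more simultaneously considered classes helps.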

71. Pixel-Wise Motion Deblurring of Thermal Videos [PDF] 返回目录
  Manikandasriram Srinivasan Ramanagopal, Zixu Zhang, Ram Vasudevan, Matthew Johnson-Roberson
Abstract: Uncooled microbolometers can enable robots to see in the absence of visible illumination by imaging the "heat" radiated from the scene. Despite this ability to see in the dark, these sensors suffer from significant motion blur. This has limited their application on robotic systems. As described in this paper, this motion blur arises due to the thermal inertia of each pixel. This has meant that traditional motion deblurring techniques, which rely on identifying an appropriate spatial blur kernel to perform spatial deconvolution, are unable to reliably perform motion deblurring on thermal camera images. To address this problem, this paper formulates reversing the effect of thermal inertia at a single pixel as a Least Absolute Shrinkage and Selection Operator (LASSO) problem, which we can solve rapidly using a quadratic programming solver. By leveraging sparsity and a high frame rate, this pixel-wise LASSO formulation is able to recover motion-deblurred frames of thermal videos without using any spatial information. To compare its quality against state-of-the-art visible-camera-based deblurring methods, this paper evaluated the performance of a family of pre-trained object detectors on a set of images restored by different deblurring algorithms. All evaluated object detectors performed systematically better on images restored by the proposed algorithm than on those restored by the other tested, state-of-the-art methods.
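
A simplified NumPy/scikit-learn sketch of the per-pixel inversion, with loudly stated assumptions: a first-order thermal model with a known time constant tau, the L1 penalty placed directly on the recovered signal for brevity (the paper exploits sparsity via a high frame rate), and scikit-learn's coordinate-descent Lasso standing in for the paper's quadratic-programming solver.

    import numpy as np
    from sklearn.linear_model import Lasso

    def deblur_pixel(measurements, tau, dt, lam=1e-3):
        # model the microbolometer pixel as a first-order low-pass filter with
        # time constant tau, so measurements b = A x for an exponential-decay
        # mixing matrix A; recover the true irradiance x by L1-regularised fit
        b = np.asarray(measurements, dtype=float)
        n = len(b)
        k = np.arange(n)
        alpha = dt / tau
        expo = np.maximum(k[:, None] - k[None, :], 0)
        A = np.tril(alpha * (1.0 - alpha) ** expo)   # b[i] = sum_{j<=i} A[i,j] x[j]
        model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
        model.fit(A, b)
        return model.coef_

Because the inversion is independent per pixel, a full frame can be processed embarrassingly in parallel, which is what makes the spatial-information-free formulation practical.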

72. A Self-supervised Approach for Adversarial Robustness [PDF] 返回目录
  Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli
Abstract: Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems e.g., for classification, segmentation and object detection. The vulnerability of DNNs against such attacks can prove a major roadblock towards their real-world deployment. Transferability of adversarial examples demand generalizable defenses that can provide cross-task protection. Adversarial training that enhances robustness by modifying target model's parameters lacks such generalizability. On the other hand, different input processing based defenses fall short in the face of continuously evolving attacks. In this paper, we take the first step to combine the benefits of both approaches and propose a self-supervised adversarial training mechanism in the input space. By design, our defense is a generalizable approach and provides significant robustness against unseen adversarial attacks (e.g., by reducing the success rate of the translation-invariant ensemble attack from 82.6% to 31.9% in comparison to the previous state-of-the-art). It can be deployed as a plug-and-play solution to protect a variety of vision systems, as we demonstrate for the case of classification, segmentation and detection. Code is available at: this https URL.

73. What Matters in Unsupervised Optical Flow [PDF] 返回目录
  Rico Jonschkowski, Austin Stone, Jonathan T. Barron, Ariel Gordon, Kurt Konolige, Anelia Angelova
Abstract: We systematically compare and analyze a set of key components in unsupervised optical flow to identify which photometric loss, occlusion handling, and smoothness regularization is most effective. Alongside this investigation we construct a number of novel improvements to unsupervised flow models, such as cost volume normalization, stopping the gradient at the occlusion mask, encouraging smoothness before upsampling the flow field, and continual self-supervision with image resizing. By combining the results of our investigation with our improved model components, we are able to present a new unsupervised flow technique that significantly outperforms the previous unsupervised state-of-the-art and performs on par with supervised FlowNet2 on the KITTI 2015 dataset, while also being significantly simpler than related approaches.
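
For reference, a minimal PyTorch sketch of the two standard ingredients the study compares, an occlusion-masked photometric loss and first-order smoothness. Flows are assumed to be (N, 2, H, W) tensors with channel 0 holding x-displacements; the paper's further components (cost volume normalization, self-supervision with resizing) are omitted.

    import torch
    import torch.nn.functional as F

    def warp(img, flow):
        # backward-warp img with per-pixel flow using a normalised sampling grid
        n, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
        base = torch.stack((xs, ys), dim=-1).float().to(img.device)  # (h, w, 2), xy
        tgt = base + flow.permute(0, 2, 3, 1)                        # (n, h, w, 2)
        tx = 2 * tgt[..., 0] / (w - 1) - 1                           # to [-1, 1]
        ty = 2 * tgt[..., 1] / (h - 1) - 1
        return F.grid_sample(img, torch.stack((tx, ty), dim=-1), align_corners=True)

    def unsup_flow_loss(im1, im2, flow12, occ_mask, w_smooth=0.05):
        # photometric error only where occ_mask marks pixels visible in both frames
        photo = (warp(im2, flow12) - im1).abs().mean(1, keepdim=True)
        photo = (photo * occ_mask).sum() / occ_mask.sum().clamp(min=1)
        # first-order smoothness of the flow field
        dx = (flow12[..., 1:] - flow12[..., :-1]).abs().mean()
        dy = (flow12[..., 1:, :] - flow12[..., :-1, :]).abs().mean()
        return photo + w_smooth * (dx + dy)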

74. Reposing Humans by Warping 3D Features [PDF] 返回目录
  Markus Knoche, István Sárándi, Bastian Leibe
Abstract: We address the problem of reposing an image of a human into any desired novel pose. This conditional image-generation task requires reasoning about the 3D structure of the human, including self-occluded body parts. Most prior works are either based on 2D representations or require fitting and manipulating an explicit 3D body mesh. Based on the recent success in deep learning-based volumetric representations, we propose to implicitly learn a dense feature volume from human images, which lends itself to simple and intuitive manipulation through explicit geometric warping. Once the latent feature volume is warped according to the desired pose change, the volume is mapped back to RGB space by a convolutional decoder. Our state-of-the-art results on the DeepFashion and the iPER benchmarks indicate that dense volumetric human representations are worth investigating in more detail.

75. Probabilistic Semantic Mapping for Urban Autonomous Driving Applications [PDF] 返回目录
  David Paz, Hengyuan Zhang, Qinru Li, Hao Xiang, Henrik Christensen
Abstract: Recent advancement in statistical learning and computational ability has enabled autonomous vehicle technology to develop at a much faster rate and become widely adopted. While many of the architectures previously introduced are capable of operating under highly dynamic environments, many of these are constrained to smaller-scale deployments and require constant maintenance due to the scalability costs associated with high-definition (HD) maps. HD maps provide critical information for self-driving cars to drive safely. However, traditional approaches for creating HD maps involve tedious manual labeling. As an attempt to tackle this problem, we fuse 2D image semantic segmentation with pre-built point cloud maps collected from a relatively inexpensive 16-channel LiDAR sensor to construct a local probabilistic semantic map in bird's eye view that encodes static landmarks such as roads, sidewalks, crosswalks, and lanes in the driving environment. Experiments on data collected in an urban environment show that this model can be extended to automatically incorporate road features into HD maps, with potential future work directions.
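
A small NumPy sketch of the probabilistic fusion step, under assumptions: points have already been projected to bird's-eye-view cells with flat integer ids and carry per-class probabilities from the 2D segmentation network. The log-odds update is the standard Bayesian rule for occupancy-style maps, not necessarily the authors' exact formulation.

    import numpy as np

    def update_semantic_map(log_odds, cell_ids, class_probs, eps=1e-6):
        # accumulate per-class log-odds evidence into BEV grid cells;
        # log_odds: (n_cells, n_classes), cell_ids: (n_points,),
        # class_probs: (n_points, n_classes) from the segmentation network
        p = np.clip(class_probs, eps, 1.0 - eps)
        np.add.at(log_odds, cell_ids, np.log(p) - np.log(1.0 - p))
        return log_odds

    def map_posterior(log_odds):
        # convert accumulated log-odds back to per-class probabilities
        return 1.0 / (1.0 + np.exp(-log_odds))

Accumulating in log-odds space makes repeated observations of the same cell commutative and cheap, which suits maps built incrementally over many drives.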

76. Skinning a Parameterization of Three-Dimensional Space for Neural Network Cloth [PDF] 返回目录
  Jane Wu, Zhenglin Geng, Hui Zhou, Ronald Fedkiw
Abstract: We present a novel learning framework for cloth deformation by embedding virtual cloth into a tetrahedral mesh that parametrizes the volumetric region of air surrounding the underlying body. In order to maintain this volumetric parameterization during character animation, the tetrahedral mesh is constrained to follow the body surface as it deforms. We embed the cloth mesh vertices into this parameterization of three-dimensional space in order to automatically capture much of the nonlinear deformation due to both joint rotations and collisions. We then train a convolutional neural network to recover ground truth deformation by learning cloth embedding offsets for each skeletal pose. Our experiments show significant improvement over learning cloth offsets from body surface parameterizations, both quantitatively and visually, with prior state of the art having a mean error five standard deviations higher than ours. Moreover, our results demonstrate the efficacy of a general learning paradigm where high-frequency details can be embedded into low-frequency parameterizations.

77. DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation [PDF] 返回目录
  Debesh Jha, Michael A. Riegler, Dag Johansen, Pål Halvorsen, Håvard D. Johansen
Abstract: Semantic image segmentation is the process of labeling each pixel of an image with its corresponding class. An encoder-decoder based approach, like U-Net and its variants, is a popular strategy for solving medical image segmentation tasks. To improve the performance of U-Net on various segmentation tasks, we propose a novel architecture called DoubleU-Net, which is a combination of two U-Net architectures stacked on top of each other. The first U-Net uses a pre-trained VGG-19 as the encoder, which has already learned features from ImageNet and can be transferred to another task easily. To capture more semantic information efficiently, we added another U-Net at the bottom. We also adopt Atrous Spatial Pyramid Pooling (ASPP) to capture contextual information within the network. We have evaluated DoubleU-Net using four medical segmentation datasets, covering various imaging modalities such as colonoscopy, dermoscopy, and microscopy. Experiments on the MICCAI 2015 segmentation challenge, the CVC-ClinicDB, the 2018 Data Science Bowl challenge, and the Lesion boundary segmentation datasets demonstrate that the DoubleU-Net outperforms U-Net and the baseline models. Moreover, DoubleU-Net produces more accurate segmentation masks, especially in the case of the CVC-ClinicDB and MICCAI 2015 segmentation challenge datasets, which have challenging images such as smaller and flat polyps. These results show the improvement over the existing U-Net model. The encouraging results, produced on various medical image segmentation datasets, show that DoubleU-Net can be used as a strong baseline for both medical image segmentation and cross-dataset evaluation testing to measure the generalizability of Deep Learning (DL) models.

78. Novel Perception Algorithmic Framework For Object Identification and Tracking In Autonomous Navigation [PDF] 返回目录
  Suryansh Saxena, Isaac K Isukapati
Abstract: This paper introduces a novel perception framework that has the ability to identify and track objects in an autonomous vehicle's field of view. The proposed algorithms don't require any training for achieving this goal. The framework makes use of ego-vehicle's pose estimation and a KD-Tree-based segmentation algorithm to generate object clusters. In turn, using a VFH technique, the geometry of each identified object cluster is translated into a multi-modal PDF and a motion model is initiated with every new object cluster for the purpose of robust spatio-temporal tracking. The methodology further uses statistical properties of high-dimensional probability density functions and Bayesian motion model estimates to identify and track objects from frame to frame. The effectiveness of the methodology is tested on a KITTI dataset. The results show that the median tracking accuracy is around 91%, with an end-to-end computational time of 153 milliseconds.
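
A minimal NumPy/SciPy sketch of KD-tree-based Euclidean clustering of the kind the abstract describes; the radius and minimum cluster size are illustrative values, not the paper's.

    import numpy as np
    from scipy.spatial import cKDTree

    def euclidean_cluster(points, radius=0.5, min_size=10):
        # flood-fill points into clusters by fixed-radius neighbour connectivity,
        # using a KD-tree for fast radius queries; points: (n, 3) array
        tree = cKDTree(points)
        labels = np.full(len(points), -1, dtype=int)
        cluster = 0
        for seed in range(len(points)):
            if labels[seed] != -1:
                continue
            frontier = [seed]
            labels[seed] = cluster
            while frontier:
                idx = frontier.pop()
                for nb in tree.query_ball_point(points[idx], radius):
                    if labels[nb] == -1:
                        labels[nb] = cluster
                        frontier.append(nb)
            cluster += 1
        # discard clusters below the minimum size (label -1 = unclustered)
        sizes = np.bincount(labels)
        labels[sizes[labels] < min_size] = -1
        return labels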

79. A systematic review on the role of artificial intelligence in sonographic diagnosis of thyroid cancer: Past, present and future [PDF] 返回目录
  Fatemeh Abdolali, Atefeh Shahroudnejad, Abhilash Rakkunedeth Hareendranathan, Jacob L Jaremko, Michelle Noga, Kumaradevan Punithakumar
Abstract: Thyroid cancer is common worldwide, with a rapid increase in prevalence across North America in recent years. While most patients present with palpable nodules through physical examination, a large number of small and medium-sized nodules are detected by ultrasound examination. Suspicious nodules are then sent for biopsy through fine needle aspiration. Since biopsies are invasive and sometimes inconclusive, various research groups have tried to develop computer-aided diagnosis systems. Earlier approaches along these lines relied on clinically relevant features that were manually identified by radiologists. With the recent success of artificial intelligence (AI), various new methods are being developed to identify these features in thyroid ultrasound automatically. In this paper, we present a systematic review of state-of-the-art on AI application in sonographic diagnosis of thyroid cancer. This review follows a methodology-based classification of the different techniques available for thyroid cancer diagnosis. With more than 50 papers included in this review, we reflect on the trends and challenges of the field of sonographic diagnosis of thyroid malignancies and potential of computer-aided diagnosis to increase the impact of ultrasound applications on the future of thyroid cancer diagnosis. Machine learning will continue to play a fundamental role in the development of future thyroid cancer diagnosis frameworks.

80. To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs [PDF] 返回目录
  Arnab Kumar Mondal, Himanshu Asnani, Parag Singla, Prathosh AP
Abstract: Regularized Auto-Encoders (AE) form a rich class of methods within the landscape of neural generative models. They effectively model the joint-distribution between the data and a latent space using an Encoder-Decoder combination, with regularization imposed in terms of a prior over the latent space. Despite their advantages such as stability in training, the performance of AE based models has not reached that of the other models such as GANs. While several reasons including the presence of conflicting terms in the objective, distributional choices imposed on the Encoder and the Decoder, and dimensionality of the latent space have been identified as possible causes for the suboptimal performance, the role of the regularization (prior distribution) imposed has not been studied systematically. Motivated by this, we examine the effect of the latent prior on the generation quality of the AE models in this paper. We show that there is no single fixed prior which is optimal for all data distributions, given a Gaussian Decoder. Further, with finite data, we show that there exists a bias-variance trade-off that comes with prior imposition. As a remedy, we optimize a generalized ELBO objective, with an additional state space over the latent prior. We implicitly learn this flexible prior jointly with the AE training using an adversarial learning technique, which facilitates operation on different points of the bias-variance curve. Our experiments on multiple datasets show that the proposed method is the new state-of-the-art for AE based generative models.

81. Applying Deep-Learning-Based Computer Vision to Wireless Communications: Methodologies, Opportunities, and Challenges [PDF] 返回目录
  Yu Tian, Gaofeng Pan, Mohamed-Slim Alouini
Abstract: Deep learning (DL) has achieved great success in the computer vision (CV) field, and the related techniques have been widely used in security, healthcare, remote sensing, etc. On the other hand, visual data is ubiquitous in our daily life and is easily generated by prevalent, low-cost cameras. Therefore, DL-based CV can be explored to obtain and forecast useful information about objects, e.g., their number, locations, distribution, motion, etc. Intuitively, DL-based CV can facilitate and improve the design of wireless communications, especially in dynamic network scenarios. However, so far, such works are rare in the existing literature. The primary purpose of this article is therefore to introduce ideas for applying DL-based CV in wireless communications, bringing some novel degrees of freedom to both theoretical research and engineering applications. To illustrate how DL-based CV can be applied in wireless communications, an example of applying DL-based CV to a millimeter wave (mmWave) system is given, realizing optimal mmWave multiple-input and multiple-output (MIMO) beamforming in mobile scenarios. In this example, we propose a framework to predict future beam indices from previously observed beam indices and images of street views by using ResNet, 3-dimensional ResNext, and a long short-term memory network. Experimental results show that our frameworks can achieve much higher accuracy than the baseline method, and that visual data can help significantly improve the performance of the MIMO beamforming system. Finally, we discuss the opportunities and challenges of applying DL-based CV in wireless communications.

82. Dual-level Semantic Transfer Deep Hashing for Efficient Social Image Retrieval [PDF] 返回目录
  Lei Zhu, Hui Cui, Zhiyong Cheng, Jingjing Li, Zheng Zhang
Abstract: Social networks store and disseminate a tremendous amount of user-shared images. Deep hashing is an efficient indexing technique to support large-scale social image retrieval, due to its deep representation capability, fast retrieval speed and low storage cost. In particular, unsupervised deep hashing scales well, as it does not require any manually labelled data for training. However, owing to the lack of label guidance, existing methods suffer from severe semantic shortage when optimizing a large amount of deep neural network parameters. Differently, in this paper, we propose a Dual-level Semantic Transfer Deep Hashing (DSTDH) method to alleviate this problem within a unified deep hash learning framework. Our model aims to learn semantically enhanced deep hash codes by specifically exploiting the user-generated tags associated with the social images. Specifically, we design a complementary dual-level semantic transfer mechanism to efficiently discover the potential semantics of tags and seamlessly transfer them into binary hash codes. On the one hand, instance-level semantics are directly preserved into hash codes from the associated tags, with adverse noise removed. Besides, an image-concept hypergraph is constructed for indirectly transferring the latent high-order semantic correlations of images and tags into hash codes. Moreover, the hash codes are obtained simultaneously with the deep representation learning by the discrete hash optimization strategy. Extensive experiments on two public social image retrieval datasets validate the superior performance of our method compared with state-of-the-art hashing methods. The source codes of our method can be obtained at this https URL

83. Deep Learning-based Aerial Image Segmentation with Open Data for Disaster Impact Assessment [PDF] 返回目录
  Ananya Gupta, Simon Watson, Hujun Yin
Abstract: Satellite images are an extremely valuable resource in the aftermath of natural disasters such as hurricanes and tsunamis where they can be used for risk assessment and disaster management. In order to provide timely and actionable information for disaster response, in this paper a framework utilising segmentation neural networks is proposed to identify impacted areas and accessible roads in post-disaster scenarios. The effectiveness of pretraining with ImageNet on the task of aerial image segmentation has been analysed and performances of popular segmentation models compared. Experimental results show that pretraining on ImageNet usually improves the segmentation performance for a number of models. Open data available from OpenStreetMap (OSM) is used for training, forgoing the need for time-consuming manual annotation. The method also makes use of graph theory to update road network data available from OSM and to detect the changes caused by a natural disaster. Extensive experiments on data from the 2018 tsunami that struck Palu, Indonesia show the effectiveness of the proposed framework. ENetSeparable, with 30% fewer parameters compared to ENet, achieved comparable segmentation results to that of the state-of-the-art networks.

84. Resolution-Enhanced MRI-Guided Navigation of Spinal Cellular Injection Robot [PDF] 返回目录
  Daniel Enrique Martinez, Waiman Meinhold, John Oshinski, Ai-Ping Hu, Jun Ueda
Abstract: This paper presents a method of navigating a surgical robot beyond the resolution of magnetic resonance imaging (MRI) by using a resolution enhancement technique enabled by high-precision piezoelectric actuation. The surgical robot was specifically designed for injecting stem cells into the spinal cord. This particular therapy can be performed in a shorter time by using an MRI-compatible robotic platform than by using a manual needle positioning platform. The imaging resolution of fiducial markers attached to the needle guide tubing was enhanced by reconstructing a high-resolution image from multiple images with sub-pixel movements of the robot. The parallel-plane direct-drive needle positioning mechanism positioned the needle guide with a spatial precision that is two orders of magnitude higher than the typical MRI resolution of up to 1 mm. The reconstructed resolution-enhanced images were used to navigate the robot with a precision that would not have been possible using standard MRI. Experiments were conducted to verify the effectiveness of the proposed enhanced-resolution image-guided intervention.
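
A toy NumPy sketch of the underlying principle, under stated assumptions: the sub-pixel shifts commanded by the robot are known exactly, and a simple shift-and-add reconstruction stands in for whatever reconstruction the authors actually used.

    import numpy as np

    def shift_and_add_sr(images, shifts_px, factor):
        # place each low-resolution frame onto a fine grid at its known
        # sub-pixel offset (dy, dx in [0, 1) pixels) and average overlaps
        h, w = images[0].shape
        acc = np.zeros((h * factor, w * factor))
        cnt = np.zeros_like(acc)
        ys, xs = np.mgrid[0:h, 0:w] * factor
        for img, (dy, dx) in zip(images, shifts_px):
            oy = min(int(round(dy * factor)), factor - 1)
            ox = min(int(round(dx * factor)), factor - 1)
            acc[ys + oy, xs + ox] += img
            cnt[ys + oy, xs + ox] += 1
        return acc / np.maximum(cnt, 1)

The recoverable resolution gain is bounded by how precisely the shifts are known, which is why the sub-millimetre repeatability of the piezoelectric stage matters more than the camera itself.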

85. Supervised Learning of Sparsity-Promoting Regularizers for Denoising [PDF] 返回目录
  Michael T. McCann, Saiprasad Ravishankar
Abstract: We present a method for supervised learning of sparsity-promoting regularizers for image denoising. Sparsity-promoting regularization is a key ingredient in solving modern image reconstruction problems; however, the operators underlying these regularizers are usually either designed by hand or learned from data in an unsupervised way. The recent success of supervised learning (mainly convolutional neural networks) in solving image reconstruction problems suggests that it could be a fruitful approach to designing regularizers. As a first experiment in this direction, we propose to denoise images using a variational formulation with a parametric, sparsity-promoting regularizer, where the parameters of the regularizer are learned to minimize the mean squared error of reconstructions on a training set of (ground truth image, measurement) pairs. Training involves solving a challenging bilevel optimization problem; we derive an expression for the gradient of the training loss using Karush-Kuhn-Tucker conditions and provide an accompanying gradient descent algorithm to minimize it. Our experiments on a simple synthetic denoising problem show that the proposed method can learn an operator that outperforms well-known regularizers (total variation, DCT-sparsity, and unsupervised dictionary learning) and collaborative filtering. While the approach we present is specific to denoising, we believe that it can be adapted to the whole class of inverse problems with linear measurement models, giving it applicability to a wide range of image reconstruction problems.

86. A Deep Learning-Based Method for Automatic Segmentation of Proximal Femur from Quantitative Computed Tomography Images [PDF] 返回目录
  Chen Zhao, Joyce H. Keyak, Jinshan Tang, Tadashi S. Kaneko, Sundeep Khosla, Shreyasee Amin, Elizabeth J. Atkinson, Lan-Juan Zhao, Michael J. Serou, Chaoyang Zhang, Hui Shen, Hong-Wen Deng, Weihua Zhou
Abstract: Purpose: Proximal femur image analyses based on quantitative computed tomography (QCT) provide a method to quantify bone density and evaluate osteoporosis and fracture risk. We aim to develop a deep-learning-based method for automatic proximal femur segmentation. Methods and Materials: We developed a 3D image segmentation method based on V-Net, an end-to-end fully convolutional neural network (CNN), to extract the proximal femur from QCT images automatically. The proposed V-Net methodology adopts a compound loss function, which includes a Dice loss and an L2 regularizer. We performed experiments to evaluate the effectiveness of the proposed segmentation method. In the experiments, a QCT dataset which included 397 QCT subjects was used. For the QCT image of each subject, the ground truth for the proximal femur was delineated by a well-trained scientist. During the experiments, for the entire cohort and then for male and female subjects separately, 90% of the subjects were used in 10-fold cross-validation for training and internal validation, and to select the optimal parameters of the proposed models; the rest of the subjects were used to evaluate the performance of the models. Results: Visual comparison demonstrated high agreement between the model prediction and the ground truth contours of the proximal femur portion of the QCT images. In the entire cohort, the proposed model achieved a Dice score of 0.9815, a sensitivity of 0.9852 and a specificity of 0.9992. In addition, an R2 score of 0.9956 (p < 0.001) was obtained when comparing the volumes measured by our model prediction with the ground truth. Conclusion: This method shows great promise for clinical application to QCT and QCT-based finite element analysis of the proximal femur for evaluating osteoporosis and hip fracture risk.
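
Since the abstract names the compound loss explicitly, a short PyTorch sketch may help; the L2 weight and the Dice smoothing constant below are assumed values, not the paper's.

    import torch

    def compound_loss(logits, target, model, l2_weight=1e-5, smooth=1.0):
        # soft Dice loss on predicted probabilities plus an L2 penalty on the
        # network weights, as in the described compound objective
        p = torch.sigmoid(logits).flatten(1)
        t = target.flatten(1).float()
        dice = (2 * (p * t).sum(1) + smooth) / (p.sum(1) + t.sum(1) + smooth)
        l2 = sum((w ** 2).sum() for w in model.parameters())
        return (1 - dice.mean()) + l2_weight * l2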

87. Can artificial intelligence (AI) be used to accurately detect tuberculosis (TB) from chest x-ray? A multiplatform evaluation of five AI products used for TB screening in a high TB-burden setting [PDF] 返回目录
  Zhi Zhen Qin, Shahriar Ahmed, Mohammad Shahnewaz Sarker, Kishor Paul, Ahammad Shafiq Sikder Adel, Tasneem Naheyan, Sayera Banu, Jacob Creswell
Abstract: Powered by artificial intelligence (AI), particularly deep neural networks, computer-aided detection (CAD) tools can be trained to recognize TB-related abnormalities on chest radiographs, thereby screening large numbers of people and reducing the pressure on healthcare professionals. Addressing the lack of studies comparing the performance of different products, we evaluated five AI software platforms specific to TB: CAD4TB (v6), InferReadDR (v2), Lunit INSIGHT for Chest Radiography (v4.9.0), JF CXR-1 (v2), and qXR (v3), on an unseen dataset of chest X-rays collected at three TB screening centers in Dhaka, Bangladesh. The 23,566 individuals included in the study all received a CXR read by a group of three Bangladeshi board-certified radiologists. A sample of CXRs was re-read by US board-certified radiologists. Xpert was used as the reference standard. All five AI platforms significantly outperformed the human readers. The areas under the receiver operating characteristic curves are qXR: 0.91 (95% CI: 0.90-0.91), Lunit INSIGHT CXR: 0.89 (95% CI: 0.88-0.89), InferReadDR: 0.85 (95% CI: 0.84-0.86), JF CXR-1: 0.85 (95% CI: 0.84-0.85), CAD4TB: 0.82 (95% CI: 0.81-0.83). We also propose a new analytical framework that evaluates a screening and triage test and informs threshold selection through a tradeoff between cost efficiency and ability to triage. Further, we assessed the performance of the five AI algorithms across subgroups of age, use case, and prior TB history, and found that the threshold scores performed differently across different subgroups. The positive results of our evaluation indicate that these AI products can be useful screening and triage tools for active case finding in high TB-burden regions.

88. DcardNet: Diabetic Retinopathy Classification at Multiple Depths Based on Structural and Angiographic Optical Coherence Tomography [PDF] 返回目录
  Pengxiao Zang, Liqin Gao, Tristan T. Hormel, Jie Wang, Qisheng You, Thomas S. Hwang, Yali Jia
Abstract: Optical coherence tomography (OCT) and its angiography (OCTA) have several advantages for the early detection and diagnosis of diabetic retinopathy (DR). However, automated, complete DR classification frameworks based on both OCT and OCTA data have not been proposed. In this study, a densely and continuously connected neural network with adaptive rate dropout (DcardNet) is proposed to fulfill a DR classification framework using en face OCT and OCTA. The proposed network outputs three separate classification depths on each case based on the International Clinical Diabetic Retinopathy scale. At the highest level the network classifies scans as referable or non-referable for DR. The second depth classifies the eye as non-DR, non-proliferative DR (NPDR), or proliferative DR (PDR). The last depth classifies the case as no DR, mild and moderate NPDR, severe NPDR, and PDR. We used 10-fold cross-validation with 10% of the data to assess the performance of our network. The overall classification accuracies of the three depths were 95.7%, 85.0%, and 71.0% respectively.

89. Pruning neural networks without any data by iteratively conserving synaptic flow [PDF] 返回目录
  Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli
Abstract: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.9 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that data must be used to quantify which synapses are important.
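
The scoring step is simple enough to sketch; the following is a condensed PyTorch version of the published algorithm's core (batch-norm handling and the iterative prune-recompute loop are omitted).

    import torch

    @torch.no_grad()
    def synflow_scores(model, input_shape):
        # SynFlow saliency: take |weights| everywhere to linearise the network,
        # push an all-ones input through it, and score every parameter by
        # |theta * d(total output)/d theta|, its "synaptic flow"
        model.eval()
        signs = [p.sign() for p in model.parameters()]
        for p in model.parameters():
            p.abs_()
        with torch.enable_grad():
            out = model(torch.ones(1, *input_shape)).sum()
            grads = torch.autograd.grad(out, list(model.parameters()),
                                        allow_unused=True)
        scores = [(p * g).abs() if g is not None else torch.zeros_like(p)
                  for p, g in zip(model.parameters(), grads)]
        for p, s in zip(model.parameters(), signs):
            p.mul_(s)   # restore the original parameter signs; no data was used
        return scores

In the full method these scores are recomputed after each of many small pruning steps; this iterative conservation of flow is what prevents any single layer from being emptied, i.e. the layer-collapse failure mode the abstract describes.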

90. Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges [PDF] 返回目录
  Edgar Galván, Peter Mooney
Abstract: A variety of methods have been applied to the architectural configuration and learning or training of artificial deep neural networks (DNN). These methods play a crucial role in the success or failure of the DNN for most problems and applications. Evolutionary Algorithms (EAs) are gaining momentum as a computationally feasible method for the automated optimisation and training of DNNs. Neuroevolution is a term which describes these processes of automated configuration and training of DNNs using EAs. While many works exist in the literature, no comprehensive surveys currently exist focusing exclusively on the strengths and limitations of using neuroevolution approaches in DNNs. Prolonged absence of such surveys can lead to a disjointed and fragmented field preventing DNNs researchers potentially adopting neuroevolutionary methods in their own research, resulting in lost opportunities for improving performance and wider application within real-world deep learning problems. This paper presents a comprehensive survey, discussion and evaluation of the state-of-the-art works on using EAs for architectural configuration and training of DNNs. Based on this survey, the paper highlights the most pertinent current issues and challenges in neuroevolution and identifies multiple promising future research directions.

91. Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image [PDF] 返回目录
  Danny Driess, Jung-Su Ha, Marc Toussaint
Abstract: In this paper, we propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image. Typical TAMP problems are formalized by combining reasoning on a symbolic, discrete level (e.g. first-order logic) with continuous motion planning such as nonlinear trajectory optimization. Due to the great combinatorial complexity of possible discrete action sequences, a large number of optimization/motion planning problems have to be solved to find a solution, which limits the scalability of these approaches. To circumvent this combinatorial complexity, we develop a neural network which, based on an initial image of the scene, directly predicts promising discrete action sequences such that ideally only one motion planning problem has to be solved to find a solution to the overall TAMP problem. A key aspect is that our method generalizes to scenes with many and varying number of objects, although being trained on only two objects at a time. This is possible by encoding the objects of the scene in images as input to the neural network, instead of a fixed feature vector. Results show runtime improvements of several magnitudes. Video: this https URL

92. A t-distribution based operator for enhancing \\ out of distribution robustness of neural network classifiers [PDF] 返回目录
  Niccolò Antonello, Philip N. Garner
Abstract: Neural Network (NN) classifiers can assign extreme probabilities to samples that have not appeared during training (out-of-distribution samples), resulting in erroneous and unreliable predictions. One of the causes of this unwanted behaviour lies in the use of the standard softmax operator, which pushes the posterior probabilities to be either zero or unity, hence failing to model uncertainty. The statistical derivation of the softmax operator relies on the assumption that the distributions of the latent variables for a given class are Gaussian with known variance. However, it is possible to use different assumptions in the same derivation and obtain operators from other families of distributions as well. This allows derivation of novel operators with more favourable properties. Here, a novel operator is proposed that is derived using t-distributions, which are capable of providing a better description of uncertainty. It is shown that classifiers that adopt this novel operator can be more robust to out-of-distribution samples, often outperforming NNs that use the standard softmax operator. These enhancements can be reached with minimal changes to the NN architecture.
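
As an illustration of the general idea only (this is not the paper's derived operator), one can replace the exponential kernel of softmax with a polynomial-tailed Student-t kernel, so probabilities saturate far more slowly for extreme inputs:

    import numpy as np

    def t_softmax(logits, nu=5.0):
        # heavy-tailed alternative to softmax: an (unnormalised) Student-t
        # kernel (1 + z/nu)^(-(nu+1)/2) replaces exp(-z), so out-of-distribution
        # inputs no longer drive probabilities all the way to 0/1
        z = logits.max() - logits              # non-negative shifted "distances"
        kernel = (1.0 + z / nu) ** (-(nu + 1.0) / 2.0)
        return kernel / kernel.sum()

For logits [10, 0, -10], standard softmax gives roughly [0.99995, 0.00005, 0.00000], while t_softmax with nu=5 gives about [0.957, 0.035, 0.008], retaining visible uncertainty.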

93. A Comparative Study on Early Detection of COVID-19 from Chest X-Ray Images [PDF] 返回目录
  Mete Ahishali, Aysen Degerli, Mehmet Yamac, Serkan Kiranyaz, Muhammad E. H. Chowdhury, Khalid Hameed, Tahir Hamid, Rashid Mazhar, Moncef Gabbouj
Abstract: In this study, our first aim is to evaluate the ability of recent state-of-the-art Machine Learning techniques to early detect COVID-19 from plain chest X-ray images. Both compact classifiers and deep learning approaches are considered in this study. Furthermore, we propose a recent compact classifier, Convolutional Support Estimator Network (CSEN) approach for this purpose since it is well-suited for a scarce-data classification task. Finally, this study introduces a new benchmark dataset called Early-QaTa-COV19, which consists of 175 early-stage COVID-19 Pneumonia samples (very limited or no infection signs) labelled by the medical doctors and 1579 samples for control (normal) class. A detailed set of experiments show that the CSEN achieves the top (over 98.5%) sensitivity with over 96% specificity. Moreover, transfer learning over the deep CheXNet fine-tuned with the augmented data produces the leading performance among other deep networks with 97.14% sensitivity and 99.49% specificity.

94. UMLS-ChestNet: A deep convolutional neural network for radiological findings, differential diagnoses and localizations of COVID-19 in chest x-rays [PDF] 返回目录
  Germán González, Aurelia Bustos, José María Salinas, María de la Iglesia-Vaya, Joaquín Galant, Carlos Cano-Espinosa, Xavier Barber, Domingo Orozco-Beltrán, Miguel Cazorla, Antonio Pertusa
Abstract: In this work we present a method for the detection of radiological findings, their location and differential diagnoses from chest x-rays. Unlike prior works that focus on the detection of few pathologies, we use a hierarchical taxonomy mapped to the Unified Medical Language System (UMLS) terminology to identify 189 radiological findings, 22 differential diagnosis and 122 anatomic locations, including ground glass opacities, infiltrates, consolidations and other radiological findings compatible with COVID-19. We train the system on one large database of 92,594 frontal chest x-rays (AP or PA, standing, supine or decubitus) and a second database of 2,065 frontal images of COVID-19 patients identified by at least one positive Polymerase Chain Reaction (PCR) test. The reference labels are obtained through natural language processing of the radiological reports. On 23,159 test images, the proposed neural network obtains an AUC of 0.94 for the diagnosis of COVID-19. To our knowledge, this work uses the largest chest x-ray dataset of COVID-19 positive cases to date and is the first one to use a hierarchical labeling schema and to provide interpretability of the results, not only by using network attention methods, but also by indicating the radiological findings that have led to the diagnosis.

95. What takes the brain so long: Object recognition at the level of minimal images develops for up to seconds of presentation time [PDF] 返回目录
  Hanna Benoni, Daniel Harari, Shimon Ullman
Abstract: Rich empirical evidence has shown that visual object recognition in the brain is fast and effortless, with relevant brain signals reported to start as early as 80 ms. Here we study the time trajectory of the recognition process at the level of minimal recognizable images (termed MIRC). These are images that can be recognized reliably, but in which a minute change of the image (reduction by either size or resolution) has a drastic effect on recognition. Subjects were assigned to one of nine exposure conditions: 200, 500, 1000, 2000 ms with or without masking, as well as unlimited time. The subjects were not limited in time to respond after presentation. The results show that in the masked conditions, recognition rates develop gradually over an extended period, e.g. average of 18% for 200 ms exposure and 45% for 500 ms, increasing significantly with longer exposure even above 2 secs. When presented for unlimited time (until response), MIRC recognition rates were equivalent to the rates of full-object images presented for 50 ms followed by masking. What takes the brain so long to recognize such images? We discuss why processes involving eye-movements, perceptual decision-making and pattern completion are unlikely explanations. Alternatively, we hypothesize that MIRC recognition requires an extended top-down process complementing the feed-forward phase.

96. A Review of Automatically Diagnosing COVID-19 based on Scanning Image [PDF] 返回目录
  Delong Chen, Fan Liu, Zewen Li
Abstract: The COVID-19 pandemic has caused millions of infections. Due to the false-negative rate and the time cost of conventional RT-PCR tests, diagnosis based on X-ray images and Computed Tomography (CT) images has become widely adopted. Researchers in the computer vision area have therefore developed many automatic diagnosing models to help radiologists and promote diagnosing accuracy. In this paper, we present a review of these recently emerging automatic diagnosing models, covering 62 models published between February 14 and May 5, 2020. We analyze the models from the perspectives of preprocessing, feature extraction, classification, and evaluation. We then point out that domain adaptation in transfer learning and improved interpretability are possible future directions.

97. Super-resolution Variational Auto-Encoders [PDF] 返回目录
  Ioannis Gatopoulos, Maarten Stol, Jakub M. Tomczak
Abstract: The framework of variational autoencoders (VAEs) provides a principled method for jointly learning latent-variable models and corresponding inference models. However, the main drawback of this approach is the blurriness of the generated images. Some studies link this effect to the objective function, namely, the (negative) log-likelihood. Here, we propose to enhance VAEs by adding a random variable that is a downscaled version of the original image, while still using the log-likelihood function as the learning objective. Further, by providing the downscaled image as an input to the decoder, it can be used in a manner similar to super-resolution. We show empirically that the proposed approach performs comparably to VAEs in terms of the negative log-likelihood, while obtaining a better FID score.
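A minimal sketch of the idea, assuming a toy fully-connected model: a downscaled copy of the input is fed to the decoder alongside the latent code, so the decoder effectively learns to super-resolve the small image. The paper's model also learns a generative model over the downscaled image itself, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownscaleConditionedVAE(nn.Module):
    """Toy VAE whose decoder is conditioned on a downscaled copy of x."""
    def __init__(self, x_dim=32 * 32 * 3, y_dim=16 * 16 * 3, z_dim=64):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)        # q(z | x)
        self.dec = nn.Linear(z_dim + y_dim, x_dim)    # p(x | z, y)

    def forward(self, x):
        y = F.interpolate(x, scale_factor=0.5)        # downscaled image
        mu, logvar = self.enc(x.flatten(1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        x_hat = self.dec(torch.cat([z, y.flatten(1)], dim=1))
        recon = F.mse_loss(x_hat, x.flatten(1), reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl                             # negative ELBO

loss = DownscaleConditionedVAE()(torch.rand(8, 3, 32, 32))
```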

98. Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [PDF] 返回目录
  Miguel de Prado, Andrew Mundy, Rabia Saeed, Maurizio Denna, Nuria Pazos, Luca Benini
Abstract: The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN). Works have mainly focused on: i) efficient DNN architectures, ii) network optimisation techniques such as pruning and quantisation, iii) optimised algorithms to speed up the execution of the most computationally intensive layers, and iv) dedicated hardware to accelerate the data flow and computation. However, there is a lack of research on the combination of these methods, as the space of approaches becomes too large to test and obtain a globally optimised solution, which leads to suboptimal deployment in terms of latency, accuracy, and memory. In this work, we first detail and analyse the methods to improve the deployment of DNNs across the different levels of software optimisation. Building on this knowledge, we present an automated exploration framework to ease the deployment of DNNs for industrial applications, by automatically exploring the design space and learning an optimised solution that speeds up performance and reduces memory on embedded CPU platforms. The framework relies on a Reinforcement-Learning-based search that, combined with a deep learning inference framework, enables the deployment of DNN implementations to obtain empirical measurements on embedded AI applications. Thus, we present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms, achieving up to 4x improvement in performance and over 2x reduction in memory, with negligible loss in accuracy with respect to the BLAS floating-point implementation.
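As a rough illustration of automated design-space exploration, the toy sketch below hill-climbs over per-layer backend choices against a stubbed latency measurement. The backend names and cost model are placeholders; the actual framework uses a Reinforcement-Learning-based search driven by on-device measurements.

```python
import random

BACKENDS = ["reference", "gemm", "winograd", "sparse"]  # hypothetical names

def measure_latency(config):
    """Stub: in practice, deploy with this per-layer configuration and
    time it on the target CPU; here, a deterministic fake cost."""
    rng = random.Random(hash(tuple(config)))
    return sum(rng.uniform(1.0, 2.0) for _ in config)

def hill_climb(n_layers=10, iters=200):
    cfg = [random.choice(BACKENDS) for _ in range(n_layers)]
    best = measure_latency(cfg)
    for _ in range(iters):
        candidate = list(cfg)
        candidate[random.randrange(n_layers)] = random.choice(BACKENDS)
        lat = measure_latency(candidate)
        if lat < best:                  # keep only improvements
            cfg, best = candidate, lat
    return cfg, best

config, latency = hill_climb()
print(latency, config[:3])
```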

99. Orientation Attentive Robot Grasp Synthesis [PDF] 返回目录
  Nikolaos Gkanatsios, Georgia Chalvatzaki, Petros Maragos, Jan Peters
Abstract: Physical neighborhoods of grasping points in common objects may offer a wide variety of plausible grasping configurations. For a fixed center of a simple spherical object, for example, there is an infinite number of valid grasping orientations. Such structures create ambiguous and discontinuous grasp maps that confuse neural regressors. We perform a thorough investigation of the challenging Jacquard dataset and show that existing pixel-wise learning approaches are prone to box overlaps of drastically different orientations. We then introduce a novel augmented map representation that partitions the angle space into bins to allow for the co-occurrence of such orientations, and observe larger accuracy margins on the ground-truth grasp map reconstructions. On top of that, we build the ORientation AtteNtive Grasp synthEsis (ORANGE) framework, which jointly solves a bin classification problem and a real-value regression. The grasp synthesis is attentively supervised by combining discrete and continuous estimations into a single map. We provide experimental evidence by appending ORANGE to two existing unimodal architectures, boosting their performance to state-of-the-art levels on Jacquard (94.71%), surpassing all related works, even multimodal ones. Code is available at \url{this https URL}.
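The bin-plus-regression treatment of orientation can be sketched as follows: the angle space is partitioned into discrete bins, the network classifies the bin, and a residual within the bin is regressed. The number of bins, tensor shapes, and unit loss weighting are assumptions for illustration, not the paper's exact formulation.

```python
import math
import torch
import torch.nn.functional as F

N_BINS = 18
BIN_WIDTH = math.pi / N_BINS                   # angles live in [-pi/2, pi/2)

def encode_angle(theta):
    """Map angles to (bin index, within-bin residual in [0, 1))."""
    shifted = theta + math.pi / 2
    idx = (shifted / BIN_WIDTH).long().clamp(max=N_BINS - 1)
    residual = shifted / BIN_WIDTH - idx.float()
    return idx, residual

def orientation_loss(bin_logits, res_pred, theta_gt):
    idx, residual = encode_angle(theta_gt)
    cls = F.cross_entropy(bin_logits, idx)     # discrete part: which bin
    reg = F.mse_loss(res_pred.gather(1, idx[:, None]).squeeze(1), residual)
    return cls + reg                           # plus continuous residual

theta = torch.rand(4) * math.pi - math.pi / 2  # batch of ground-truth angles
loss = orientation_loss(torch.randn(4, N_BINS), torch.rand(4, N_BINS), theta)
```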

100. High Tissue Contrast MRI Synthesis Using Multi-Stage Attention-GAN for Glioma Segmentation [PDF] 返回目录
  Mohammad Hamghalam, Baiying Lei, Tianfu Wang
Abstract: Magnetic resonance imaging (MRI) provides varying tissue contrast images of internal organs based on a strong magnetic field. Despite the non-invasive advantage of MRI in frequent imaging, the low contrast of MR images in the target area makes tissue segmentation a challenging problem. This paper demonstrates the potential benefits of image-to-image translation techniques to generate synthetic high tissue contrast (HTC) images. Notably, we adopt a new cycle generative adversarial network (CycleGAN) with an attention mechanism to increase the contrast within underlying tissues. The attention block, as well as training on HTC images, guides our model to converge on certain tissues. To increase the resolution of HTC images, we employ a multi-stage architecture that focuses on one particular tissue as foreground and filters out the irrelevant background in each stage. This multi-stage structure also alleviates the common artifacts of synthetic images by decreasing the gap between source and target domains. We show the application of our method for synthesizing HTC images of brain MR scans, including glioma tumors. We also employ HTC MR images in both end-to-end and two-stage segmentation structures to confirm the effectiveness of these images. The experiments over three competitive segmentation baselines on the BraTS 2018 dataset indicate that incorporating the synthetic HTC images in the multi-modal segmentation framework improves the average Dice scores by 0.8%, 0.6%, and 0.5% on whole tumor, tumor core, and enhancing tumor, respectively, while eliminating one real MRI sequence from the segmentation procedure.
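One common way to realize such an attention mechanism in an image-to-image translator is to gate the generator output with a predicted soft mask, so that only attended tissue is altered and the rest of the image passes through unchanged. A minimal sketch with placeholder layer sizes (the paper's multi-stage CycleGAN is far larger):

```python
import torch
import torch.nn as nn

class AttentionGatedGenerator(nn.Module):
    def __init__(self, ch=1):
        super().__init__()
        self.translate = nn.Sequential(nn.Conv2d(ch, 16, 3, padding=1),
                                       nn.ReLU(),
                                       nn.Conv2d(16, ch, 3, padding=1))
        self.attend = nn.Sequential(nn.Conv2d(ch, 16, 3, padding=1),
                                    nn.ReLU(),
                                    nn.Conv2d(16, 1, 3, padding=1),
                                    nn.Sigmoid())

    def forward(self, x):
        a = self.attend(x)            # per-pixel attention weights in [0, 1]
        y = self.translate(x)         # candidate high-contrast translation
        return a * y + (1 - a) * x    # only attended regions are changed

out = AttentionGatedGenerator()(torch.rand(1, 1, 64, 64))
```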

101. Deep learning to estimate the physical proportion of infected region of lung for COVID-19 pneumonia with CT image set [PDF] 返回目录
  Wei Wu, Yu Shi, Xukun Li, Yukun Zhou, Peng Du, Shuangzhi Lv, Tingbo Liang, Jifang Sheng
Abstract: Utilizing computed tomography (CT) images to quickly estimate the severity of COVID-19 cases is one of the most straightforward and efficacious methods. Two tasks are studied in this paper. One is to segment the mask of the intact lung in the presence of pneumonia. The other is to generate masks of the regions infected by COVID-19. The masks of these two parts of the images are then converted to corresponding volumes to calculate the physical proportion of the infected region of the lung. A total of 129 CT image sets were collected and studied. The intrinsic Hounsfield values of the CT images were first utilized to generate an initial, rough version of the labeled masks for both the intact lung and the infected regions. The samples were then carefully adjusted and improved by two professional radiologists to generate the final training set and test benchmark. Two deep learning models were evaluated: UNet and 2.5D UNet. For the segmentation of infected regions, a deep-learning-based classifier was applied afterwards to remove unrelated blur-edged regions that were wrongly segmented, such as airways and blood vessels. For the segmented masks of the intact lung and the infected regions, the best method achieved mean Dice similarity coefficients of 0.972 and 0.757 on our test benchmark. For the overall proportion of the infected region of the lung, the final result showed 0.961 (Pearson's correlation coefficient) and 11.7% (mean absolute percent error). The instant proportion of infected regions of the lung can be used as visual evidence to assist clinical physicians in determining the severity of a case. Furthermore, a quantified report of infected regions can help predict the prognosis for COVID-19 cases that are scanned periodically within the treatment cycle.
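The final quantification step reduces to a volume ratio between the two predicted masks. A minimal sketch, assuming binary 3D masks and the voxel spacing from the CT header (all names illustrative):

```python
import numpy as np

def infected_proportion(lung_mask, infected_mask, spacing_mm=(1.0, 0.7, 0.7)):
    """Physical proportion of lung volume occupied by infected regions."""
    voxel_volume = float(np.prod(spacing_mm))            # mm^3 per voxel
    lung_volume = lung_mask.sum() * voxel_volume
    infected_volume = (infected_mask & lung_mask).sum() * voxel_volume
    return infected_volume / lung_volume

lung = np.zeros((10, 64, 64), dtype=bool); lung[:, 16:48, 16:48] = True
lesion = np.zeros_like(lung); lesion[4:6, 20:30, 20:30] = True
print(f"{infected_proportion(lung, lesion):.1%}")        # -> 2.0%
```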

102. Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT [PDF] 返回目录
  Shikha Chaganti, Abishek Balachandran, Guillaume Chabin, Stuart Cohen, Thomas Flohr, Bogdan Georgescu, Philippe Grenier, Sasa Grbic, Siqi Liu, François Mellot, Nicolas Murray, Savvas Nicolaou, William Parker, Thomas Re, Pina Sanelli, Alexander W. Sauter, Zhoubing Xu, Youngjin Yoo, Valentin Ziebandt, Dorin Comaniciu
Abstract: Purpose: To present a method that automatically segments and quantifies abnormal CT patterns commonly present in coronavirus disease 2019 (COVID-19), namely ground glass opacities and consolidations. Materials and Methods: In this retrospective study, the proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions, based on a dataset of 9749 chest CT volumes. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and the presence of high opacities, based on deep learning and deep reinforcement learning. The first pair of measures (PO, PHO) is global, while the second (LSS, LHOS) is lobe-wise. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions in Canada, Europe and the United States, collected between 2002 and the present (April 2020). Ground truth is established by manual annotations of lesions, lungs, and lobes. Correlation and regression analyses were performed to compare the predictions to the ground truth. Results: The Pearson correlation coefficient between method prediction and ground truth for COVID-19 cases was 0.92 for PO (P < .001), 0.97 for PHO (P < .001), 0.91 for LSS (P < .001), and 0.90 for LHOS (P < .001). 98 of 100 healthy controls had a predicted PO of less than 1%; 2 had between 1% and 2%. Automated processing time to compute the severity scores was 10 seconds per case, compared to the 30 minutes required for manual annotations. Conclusion: A new method segments regions of CT abnormalities associated with COVID-19 and computes (PO, PHO) as well as (LSS, LHOS) severity scores.
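A sketch of how such measures can be computed from segmented masks is given below. The per-lobe thresholds that map an opacity percentage to a 0-4 score are an assumption for illustration and may not match the paper's exact definitions.

```python
import numpy as np

def percentage_of_opacity(region_mask, lesion_mask):
    """PO: lesion volume as a percentage of the region's volume."""
    return 100.0 * (lesion_mask & region_mask).sum() / region_mask.sum()

def lobe_severity_score(lobe_masks, lesion_mask):
    """LSS: sum over lobes of a 0-4 score
    (0: 0%, 1: <25%, 2: <50%, 3: <75%, 4: >=75% opacity)."""
    thresholds = [1e-9, 25.0, 50.0, 75.0]        # assumed binning
    return sum(int(np.digitize(percentage_of_opacity(lobe, lesion_mask),
                               thresholds))
               for lobe in lobe_masks)           # 5 lobes in total

lobes = [np.random.rand(8, 32, 32) > 0.5 for _ in range(5)]
lesion = np.random.rand(8, 32, 32) > 0.9
print(lobe_severity_score(lobes, lesion))
```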

103. Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation [PDF] 返回目录
  Xiang Jiang, Qicheng Lao, Stan Matwin, Mohammad Havaei
Abstract: We present an approach for unsupervised domain adaptation---with a strong focus on practical considerations of within-domain class imbalance and between-domain class distribution shift---from a class-conditioned domain alignment perspective. Current methods for class-conditioned domain alignment aim to explicitly minimize a loss function based on pseudo-label estimations of the target domain. However, these methods suffer from pseudo-label bias in the form of error accumulation. We propose a method that removes the need for explicit optimization of model parameters from pseudo-labels directly. Instead, we present a sampling-based implicit alignment approach, where the sample selection procedure is implicitly guided by the pseudo-labels. Theoretical analysis reveals the existence of a domain-discriminator shortcut in misaligned classes, which is addressed by the proposed implicit alignment approach to facilitate domain-adversarial learning. Empirical results and ablation studies confirm the effectiveness of the proposed approach, especially in the presence of within-domain class imbalance and between-domain class distribution shift.
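The sampling-based implicit alignment can be illustrated as a batch sampler: the pseudo-labels never enter a loss term directly, they only control which target examples are drawn, so each batch is class-balanced under the current pseudo-labeling. A minimal sketch, with batch-composition parameters as assumptions:

```python
import random
from collections import defaultdict

def class_balanced_batch(pseudo_labels, classes_per_batch=8, per_class=4):
    """Return indices for one target-domain batch, balanced by pseudo-class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        by_class[label].append(idx)
    chosen = random.sample(list(by_class),
                           min(classes_per_batch, len(by_class)))
    batch = []
    for c in chosen:
        # sample with replacement so rare pseudo-classes still fill slots
        batch += random.choices(by_class[c], k=per_class)
    return batch

pseudo = [random.randrange(10) for _ in range(1000)]   # fake pseudo-labels
print(class_balanced_batch(pseudo)[:8])
```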

104. Big GANs Are Watching You: Towards Unsupervised Object Segmentation with Off-the-Shelf Generative Models [PDF] 返回目录
  Andrey Voynov, Stanislav Morozov, Artem Babenko
Abstract: Since collecting pixel-level ground-truth data is expensive, unsupervised visual understanding problems are currently an active research topic. In particular, several recent methods based on generative models have achieved promising results for object segmentation and saliency detection. However, since generative models are known to be unstable and sensitive to hyperparameters, the training of these methods can be challenging and time-consuming. In this work, we introduce an alternative, much simpler way to exploit generative models for unsupervised object segmentation. First, we explore the latent space of the BigBiGAN -- the state-of-the-art unsupervised GAN, whose parameters are publicly available. We demonstrate that object saliency masks for GAN-produced images can be obtained automatically with BigBiGAN. These masks are then used to train a discriminative segmentation model. Being very simple and easy to reproduce, our approach provides competitive performance on common benchmarks in the unsupervised scenario.

105. KiU-Net: Towards Accurate Segmentation of Biomedical Images using Over-complete Representations [PDF] 返回目录
  Jeya Maria Jose, Vishwanath Sindagi, Ilker Hacihaliloglu, Vishal M. Patel
Abstract: Due to its excellent performance, U-Net is the most widely used backbone architecture for biomedical image segmentation in recent years. However, in our studies, we observe a considerable performance drop when detecting smaller anatomical landmarks with blurred, noisy boundaries. We analyze this issue in detail and address it by proposing an over-complete architecture (Ki-Net) that involves projecting the data onto higher dimensions (in the spatial sense). This network, when augmented with U-Net, results in significant improvements when segmenting small anatomical landmarks and blurred noisy boundaries, while obtaining better overall performance. Furthermore, the proposed network has additional benefits such as faster convergence and fewer parameters. We evaluate the proposed method on the task of brain anatomy segmentation from 2D ultrasound (US) of preterm neonates, and achieve an improvement of around 4% in terms of DICE accuracy and Jaccard index compared to the standard U-Net, while outperforming the recent best methods by 2%. Code: this https URL.
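The over-complete idea can be contrasted with U-Net's usual under-complete encoder in a few lines: a Ki-Net block upsamples before convolving, so feature maps live at a higher spatial resolution than the input and receptive fields stay small. A minimal sketch with placeholder channel counts (the released code contains the real architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OvercompleteBlock(nn.Module):
    """Ki-Net style: project to higher spatial dimensions, then convolve."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)
        return F.relu(self.conv(x))

class UndercompleteBlock(nn.Module):
    """U-Net style: convolve, then downsample."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x):
        return F.max_pool2d(F.relu(self.conv(x)), 2)

x = torch.rand(1, 1, 64, 64)
print(OvercompleteBlock(1, 8)(x).shape)   # torch.Size([1, 8, 128, 128])
print(UndercompleteBlock(1, 8)(x).shape)  # torch.Size([1, 8, 32, 32])
```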

106. Approximate Inverse Reinforcement Learning from Vision-based Imitation Learning [PDF] 返回目录
  Keuntaek Lee, Bogdan Vlahov, Jason Gibson, James M. Rehg, Evangelos A. Theodorou
Abstract: In this work, we present a method for obtaining an implicit objective function for vision-based navigation. The proposed methodology relies on Imitation Learning, Model Predictive Control (MPC), and Deep Learning. We use Imitation Learning as a means to do Inverse Reinforcement Learning in order to create an approximate costmap generator for a visual navigation challenge. The resulting costmap is used in conjunction with a Model Predictive Controller for real-time control, and outperforms other state-of-the-art costmap generators combined with MPC in novel environments. The proposed process allows for simple training and robustness to out-of-sample data. We apply our method to the task of vision-based autonomous driving in multiple real and simulated environments, using the same costmap predictor weights throughout.
