Abstracts

1. GRCNN: Graph Recognition Convolutional Neural Network for Synthesizing Programs from Flow Charts
Lin Cheng, Zijiang Yang
Abstract: Program synthesis is the task of automatically generating programs from a user specification. In this paper, we present a framework that synthesizes programs from flow charts, which serve as accurate and intuitive specifications. To do so, we propose a deep neural network called GRCNN that recognizes graph structure from an image of the graph. GRCNN is trained end-to-end and predicts the edge and node information of the flow chart simultaneously. Experiments show that the accuracy of program synthesis is 66.4%, and the accuracies of edge and node recognition are 94.1% and 67.9%, respectively. On average, it takes about 60 milliseconds to synthesize a program.

2. LittleYOLO-SPP: A Delicate Real-Time Vehicle Detection Algorithm
Sri Jamiya S, Esther Rani P
Abstract: Real-time vehicle detection is a challenging and important task. Existing real-time vehicle detectors lack accuracy and speed. Real-time systems must detect and locate vehicles with high accuracy during criminal activities such as vehicle theft and road traffic violations. Detecting vehicles in complex scenes with occlusion is also extremely difficult. In this study, LittleYOLO-SPP, a lightweight deep neural network based on the YOLOv3-tiny network, is proposed to detect vehicles effectively in real time. The YOLOv3-tiny object detection network is improved by modifying its feature extraction network to increase the speed and accuracy of vehicle detection. The proposed network incorporates spatial pyramid pooling, which consists of pooling layers at different scales whose features are concatenated to enhance the network's learning capability. Mean square error (MSE) and Generalized IoU (GIoU) loss functions for bounding box regression are used to increase the performance of the network. The network is trained on vehicle classes such as car, bus, and truck from the PASCAL VOC 2007 and 2012 and MS COCO 2014 datasets. The LittleYOLO-SPP network detects vehicles in real time with high accuracy regardless of video frame and weather conditions. The improved network achieves a higher mAP of 77.44% on PASCAL VOC and 52.95% mAP on MS COCO datasets.
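
The GIoU term mentioned in the abstract above can be illustrated with a small sketch. This is the generic Generalized IoU for two axis-aligned boxes, not the paper's training code; the function names and the (x1, y1, x2, y2) box layout are assumptions made for illustration, and both boxes are assumed to have positive area.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area (hypothetical helper for illustration).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss form commonly used for box regression: 1 - GIoU, in [0, 2]."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the value stays informative for disjoint boxes: it becomes more negative as the boxes move apart, so the loss still provides a gradient signal.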

3. Age Gap Reducer-GAN for Recognizing Age-Separated Faces
Daksha Yadav, Naman Kohli, Mayank Vatsa, Richa Singh, Afzel Noore
Abstract: In this paper, we propose a novel algorithm for matching faces with temporal variations caused by age progression. The proposed generative adversarial network algorithm is a unified framework that combines facial age estimation and age-separated face verification. The key idea of this approach is to learn the age variations across time by conditioning the input image on the subject's gender and the target age group to which the face needs to be progressed. The loss function accounts for reducing the age gap between the original image and the generated face image as well as preserving the identity. Both visual fidelity and quantitative evaluations demonstrate the efficacy of the proposed architecture on different facial age databases for age-separated face recognition.

4. Transferred Fusion Learning using Skipped Networks
Vinayaka R Kamath, Vishal S, Varun M
Abstract: Identification of an entity that is of interest is prominent in any intelligent system. The visual intelligence of the model is enhanced when the capability of recognition is added. Several methods such as transfer learning and zero shot learning help to reuse the existing models or augment the existing model to achieve improved performance at the task of object recognition. Transferred fusion learning is one such mechanism that intends to use the best of both worlds and build a model that is capable of outperforming the models involved in the system. We propose a novel mechanism to amplify the process of transfer learning by introducing a student architecture where the networks learn from each other.

5. DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs
Yaxing Wang, Lu Yu, Joost van de Weijer
Abstract: Image-to-image translation has recently achieved remarkable results. But despite current success, it suffers from inferior performance when translations between classes require large shape changes. We attribute this to the high-resolution bottlenecks which are used by current state-of-the-art image-to-image methods. Therefore, in this work, we propose a novel deep hierarchical Image-to-Image Translation method, called DeepI2I. We learn a model by leveraging hierarchical features: (a) structural information contained in the shallow layers and (b) semantic information extracted from the deep layers. To enable the training of deep I2I models on small datasets, we propose a novel transfer learning method, that transfers knowledge from pre-trained GANs. Specifically, we leverage the discriminator of a pre-trained GANs (i.e. BigGAN or StyleGAN) to initialize both the encoder and the discriminator and the pre-trained generator to initialize the generator of our model. Applying knowledge transfer leads to an alignment problem between the encoder and generator. We introduce an adaptor network to address this. On many-class image-to-image translation on three datasets (Animal faces, Birds, and Foods) we decrease mFID by at least 35% when compared to the state-of-the-art. Furthermore, we qualitatively and quantitatively demonstrate that transfer learning significantly improves the performance of I2I systems, especially for small datasets. Finally, we are the first to perform I2I translations for domains with over 100 classes.

6. Where to drive: free space detection with one fisheye camera
Tobias Scheck, Adarsh Mallandur, Christian Wiede, Gangolf Hirtz
Abstract: The development in the field of autonomous driving goes hand in hand with ever new developments in the field of image processing and machine learning methods. In order to fully exploit the advantages of deep learning, it is necessary to have sufficient labeled training data available. This is especially not the case for omnidirectional fisheye cameras. As a solution, we propose in this paper to use synthetic training data based on Unity3D. A five-pass algorithm is used to create a virtual fisheye camera. This synthetic training data is evaluated for the application of free space detection for different deep learning network architectures. The results indicate that synthetic fisheye images can be used in deep learning context.

7. Dynamic Plane Convolutional Occupancy Networks
Stefan Lionar, Daniil Emtsev, Dusan Svilarkovic, Songyou Peng
Abstract: Learning-based 3D reconstruction using implicit neural representations has shown promising progress not only at the object level but also in more complicated scenes. In this paper, we propose Dynamic Plane Convolutional Occupancy Networks, a novel implicit representation pushing further the quality of 3D surface reconstruction. The input noisy point clouds are encoded into per-point features that are projected onto multiple 2D dynamic planes. A fully-connected network learns to predict plane parameters that best describe the shapes of objects or scenes. To further exploit translational equivariance, convolutional neural networks are applied to process the plane features. Our method shows superior performance in surface reconstruction from unoriented point clouds in ShapeNet as well as an indoor scene dataset. Moreover, we also provide interesting observations on the distribution of learned dynamic planes.

8. Learned Equivariant Rendering without Transformation Supervision
Cinjon Resnick, Or Litany, Hugo Larochelle, Joan Bruna, Kyunghyun Cho
Abstract: We propose a self-supervised framework to learn scene representations from video that are automatically delineated into objects and background. Our method relies on moving objects being equivariant with respect to their transformation across frames and the background being constant. After training, we can manipulate and render the scenes in real time to create unseen combinations of objects, transformations, and backgrounds. We show results on moving MNIST with backgrounds.

9. Finding Relevant Flood Images on Twitter using Content-based Filters
Björn Barz, Kai Schröter, Ann-Christin Kra, Joachim Denzler
Abstract: The analysis of natural disasters such as floods in a timely manner often suffers from limited data due to coarsely distributed sensors or sensor failures. At the same time, a plethora of information is buried in an abundance of images of the event posted on social media platforms such as Twitter. These images could be used to document and rapidly assess the situation and derive proxy-data not available from sensors, e.g., the degree of water pollution. However, not all images posted online are suitable or informative enough for this purpose. Therefore, we propose an automatic filtering approach using machine learning techniques for finding Twitter images that are relevant for one of the following information objectives: assessing the flooded area, the inundation depth, and the degree of water pollution. Instead of relying on textual information present in the tweet, the filter analyzes the image contents directly. We evaluate the performance of two different approaches and various features on a case-study of two major flooding events. Our image-based filter is able to enhance the quality of the results substantially compared with a keyword-based filter, improving the mean average precision from 23% to 53% on average.
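
The mean average precision figures quoted above summarize how well a filter ranks relevant images ahead of irrelevant ones. A minimal sketch of the standard, non-interpolated definition follows; the function names are hypothetical, the evaluation protocol of the paper may differ in detail, and each ranked list is assumed to contain all relevant items for its query.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list of 0/1 relevance flags (best rank first)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)
```

For example, a ranking with relevant images at positions 1 and 3 scores (1/1 + 2/3) / 2, while a ranking whose only relevant image sits at position 2 scores 1/2.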

10. Survey on 3D face reconstruction from uncalibrated images
Araceli Morales, Gemma Piella, Federico M. Sukno
Abstract: Recently, a lot of attention has been focused on the incorporation of 3D data into face analysis and its applications. Despite providing a more accurate representation of the face, 3D face images are more complex to acquire than 2D pictures. As a consequence, great effort has been invested in developing systems that reconstruct 3D faces from an uncalibrated 2D image. However, the 3D-from-2D face reconstruction problem is ill-posed, thus prior knowledge is needed to restrict the solutions space. In this work, we review 3D face reconstruction methods in the last decade, focusing on those that only use 2D pictures captured under uncontrolled conditions. We present a classification of the proposed methods based on the technique used to add prior knowledge, considering three main strategies, namely, statistical model fitting, photometry, and deep learning, and reviewing each of them separately. In addition, given the relevance of statistical 3D facial models as prior knowledge, we explain the construction procedure and provide a comprehensive list of the publicly available 3D facial models. After the exhaustive study of 3D-from-2D face reconstruction approaches, we observe that the deep learning strategy is rapidly growing since the last few years, matching its extension to that of the widespread statistical model fitting. Unlike the other two strategies, photometry-based methods have decreased in number since the required strong assumptions cause the reconstructions to be of more limited quality than those resulting from model fitting and deep learning methods. The review also identifies current gaps and suggests avenues for future research.

11. DeepSim: Semantic similarity metrics for learned image registration
Steffen Czolbe, Oswin Krause, Aasa Feragen
Abstract: We propose a semantic similarity metric for image registration. Existing metrics like Euclidean distance or normalized cross-correlation focus on aligning intensity values, which causes difficulties with low intensity contrast or noise. Our semantic approach learns dataset-specific features that drive the optimization of a learning-based registration model. Compared to existing unsupervised and supervised methods across multiple image modalities and applications, we achieve consistently high registration accuracy and faster convergence than the state of the art, and the learned invariance to noise gives smoother transformations on low-quality images.

12. A CNN-based Feature Space for Semi-supervised Incremental Learning in Assisted Living Applications
Tobias Scheck, Ana Perez Grassi, Gangolf Hirtz
Abstract: A Convolutional Neural Network (CNN) is sometimes confronted with objects of changing appearance (new instances) that exceed its generalization capability. This requires the CNN to incorporate new knowledge, i.e., to learn incrementally. In this paper, we are concerned with this problem in the context of assisted living. We propose using the feature space that results from the training dataset to automatically label problematic images that could not be properly recognized by the CNN. The idea is to exploit the extra information in the feature space for semi-supervised labeling and to employ the problematic images to improve the CNN's classification model. Among other benefits, the resulting semi-supervised incremental learning process improves the classification accuracy of new instances by 40%, as illustrated by extensive experiments.

13. Learning from THEODORE: A Synthetic Omnidirectional Top-View Indoor Dataset for Deep Transfer Learning
Tobias Scheck, Roman Seidel, Gangolf Hirtz
Abstract: Recent work on synthetic indoor datasets from perspective views has shown significant improvements in object detection results with Convolutional Neural Networks (CNNs). In this paper, we introduce THEODORE: a novel, large-scale indoor dataset containing 100,000 high-resolution diversified fisheye images with 14 classes. To this end, we create 3D virtual environments of living rooms, different human characters and interior textures. Besides capturing fisheye images from virtual environments, we create annotations for semantic segmentation, instance masks and bounding boxes for object detection tasks. We compare our synthetic dataset to state-of-the-art real-world datasets for omnidirectional images. Based on MS COCO weights, we show that our dataset is well suited for fine-tuning CNNs for object detection. Through the high generalization of our models, achieved by means of image synthesis and domain randomization, we reach an AP of up to 0.84 for the class person on the High-Definition Analytics dataset.

14. Invariant Deep Compressible Covariance Pooling for Aerial Scene Categorization
Shidong Wang, Yi Ren, Gerard Parr, Yu Guan, Ling Shao
Abstract: Learning discriminative and invariant feature representation is the key to visual image categorization. In this article, we propose a novel invariant deep compressible covariance pooling (IDCCP) to handle nuisance variations in aerial scene categorization. We consider transforming the input image according to a finite transformation group that consists of multiple confounding orthogonal matrices, such as the D4 group. Then, we adopt a Siamese-style network to transfer the group structure to the representation space, where we can derive a trivial representation that is invariant under the group action. A linear classifier trained with the trivial representation also inherits this invariance. To further improve the discriminative power of the representation, we extend the representation to the tensor space while imposing orthogonal constraints on the transformation matrix to effectively reduce feature dimensions. We conduct extensive experiments on the publicly released aerial scene image data sets and demonstrate the superiority of this method compared with state-of-the-art methods. In particular, using the ResNet architecture, our IDCCP model can reduce the dimension of the tensor representation by about 98% without sacrificing accuracy (i.e., <0.5%).
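
To make the idea of a representation that is invariant under the D4 group concrete, here is a minimal sketch of group averaging: a feature is computed for all eight rotated and flipped copies of a square image and the results are averaged. This is a generic illustration under stated assumptions, not the IDCCP architecture; the `feature` function is a hypothetical stand-in for a learned network, and images are plain nested lists.

```python
def rot90(img):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def d4_orbit(img):
    """All 8 images obtained from the D4 group: 4 rotations, each with and without a flip."""
    orbit = []
    current = [list(row) for row in img]
    for _ in range(4):
        orbit.append(current)
        orbit.append(flip(current))
        current = rot90(current)
    return orbit


def feature(img):
    """Hypothetical position-dependent feature; on its own it is NOT invariant."""
    n = len(img)
    by_row = sum(img[i][j] * (i + 1) for i in range(n) for j in range(n))
    by_col = sum(img[i][j] * (j + 1) ** 2 for i in range(n) for j in range(n))
    return [by_row, by_col]


def invariant_feature(img):
    """Average the feature over the D4 orbit; the result is invariant under D4."""
    feats = [feature(g) for g in d4_orbit(img)]
    return [sum(f[k] for f in feats) / len(feats) for k in range(len(feats[0]))]
```

Because transforming the input by any group element only permutes the orbit, the average is unchanged, which is the sense in which the averaged feature lies in the trivial representation.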

15. Noise Conscious Training of Non Local Neural Network powered by Self Attentive Spectral Normalized Markovian Patch GAN for Low Dose CT Denoising
Sutanu Bera, Prabir Kumar Biswas
Abstract: The explosive rise in the use of computed tomography (CT) imaging in medical practice has heightened public concern over the patient's associated radiation dose. However, reducing the radiation dose leads to increased noise and artifacts, which degrade the scan's interpretability. Consequently, an advanced image reconstruction algorithm to improve the diagnostic performance of low-dose CT has become a primary concern among researchers, which is challenging due to the ill-posedness of the problem. In recent times, deep learning-based techniques have emerged as a dominant method for low-dose CT (LDCT) denoising. However, some common bottlenecks still exist, which hinder deep learning-based techniques from delivering the best performance. In this study, we attempt to mitigate these problems with three novel contributions. First, we propose a novel convolutional module as the first attempt to utilize neighborhood similarity of CT images for denoising tasks. Our proposed module boosts the denoising by a significant margin. Next, we address the problem of non-stationarity of CT noise and introduce a new noise-aware mean square error loss for LDCT denoising. Moreover, this loss also alleviates the laborious effort required while training a CT denoising network using image patches. Lastly, we propose a novel discriminator function for CT denoising tasks. The conventional vanilla discriminator tends to overlook the fine structural details and focus on the global agreement. Our proposed discriminator leverages self-attention and pixel-wise GANs for restoring the diagnostic quality of LDCT images. Validated on the publicly available dataset of the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge, our method performed remarkably better than the existing state-of-the-art methods.

16. Zero-Pair Image to Image Translation using Domain Conditional Normalization
Samarth Shukla, Andrés Romero, Luc Van Gool, Radu Timofte
Abstract: In this paper, we propose an approach based on domain conditional normalization (DCN) for zero-pair image-to-image translation, i.e., translating between two domains which have no paired training data available but each have paired training data with a third domain. We employ a single generator which has an encoder-decoder structure and analyze different implementations of domain conditional normalization to obtain the desired target domain output. The validation benchmark uses RGB-depth pairs and RGB-semantic pairs for training and compares performance for the depth-semantic translation task. The proposed approaches improve in qualitative and quantitative terms over the compared methods, while using much fewer parameters. Code available at this https URL

17. FPGA: Fast Patch-Free Global Learning Framework for Fully End-to-End Hyperspectral Image Classification
Zhuo Zheng, Yanfei Zhong, Ailong Ma, Liangpei Zhang
Abstract: Deep learning techniques have provided significant improvements in hyperspectral image (HSI) classification. The current deep learning based HSI classifiers follow a patch-based learning framework by dividing the image into overlapping patches. As such, these methods are local learning methods, which have a high computational cost. In this paper, a fast patch-free global learning (FPGA) framework is proposed for HSI classification. In FPGA, an encoder-decoder based FCN is utilized to consider the global spatial information by processing the whole image, which results in fast inference. However, it is difficult to directly utilize the encoder-decoder based FCN for HSI classification as it always fails to converge due to the insufficiently diverse gradients caused by the limited training samples. To solve the divergence problem and maintain the abilities of FCN of fast inference and global spatial information mining, a global stochastic stratified sampling strategy is first proposed by transforming all the training samples into a stochastic sequence of stratified samples. This strategy can obtain diverse gradients to guarantee the convergence of the FCN in the FPGA framework. For a better design of FCN architecture, FreeNet, which is a fully end-to-end network for HSI classification, is proposed to maximize the exploitation of the global spatial information and boost the performance via a spectral attention based encoder and a lightweight decoder. A lateral connection module is also designed to connect the encoder and decoder, fusing the spatial details in the encoder and the semantic features in the decoder. The experimental results obtained using three public benchmark datasets suggest that the FPGA framework is superior to the patch-based framework in both speed and accuracy for HSI classification. Code has been made available at: this https URL.
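
The abstract above describes turning all training samples into a stochastic sequence of stratified samples. One plausible reading, sketched below, groups sample indices by class, shuffles within each class, and emits batches that contain a fixed number of samples from every class. The function name, the `per_class` parameter, and the wrap-around rule for small classes are assumptions for illustration and are not taken from the paper.

```python
import random
from math import ceil


def stratified_batches(labels, per_class, seed=0):
    """Split sample indices into shuffled batches with `per_class` samples of every class.

    Small classes wrap around, so their samples are reused across batches
    (and within a batch if per_class exceeds the class size).
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        rng.shuffle(indices)

    # Enough batches to visit every sample of the largest class at least once.
    n_batches = max(ceil(len(v) / per_class) for v in by_class.values())

    batches = []
    for b in range(n_batches):
        batch = []
        for indices in by_class.values():
            for k in range(per_class):
                batch.append(indices[(b * per_class + k) % len(indices)])
        batches.append(batch)

    # Randomize the order of the stratified batches themselves.
    rng.shuffle(batches)
    return batches
```

Each batch then sees every class, which is one way to obtain more diverse gradients than a single pass over the whole labeled image would give.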

18. A Hybrid Approach for 6DoF Pose Estimation
Rebecca König, Bertram Drost
Abstract: We propose a method for 6DoF pose estimation of rigid objects that uses a state-of-the-art deep learning based instance detector to segment object instances in an RGB image, followed by a point-pair based voting method to recover the object's pose. We additionally use an automatic method selection that chooses the instance detector and the training set as that with the highest performance on the validation set. This hybrid approach leverages the best of learning and classic approaches, using CNNs to filter highly unstructured data and cut through the clutter, and a local geometric approach with proven convergence for robust pose estimation. The method is evaluated on the BOP core datasets where it significantly exceeds the baseline method and is the best fast method in the BOP 2020 Challenge.

19. Progressive Spatio-Temporal Graph Convolutional Network for Skeleton-Based Human Action Recognition
Negar Heidari, Alexandros Iosifidis
Abstract: Graph convolutional networks (GCNs) have been very successful in skeleton-based human action recognition where the sequence of skeletons is modeled as a graph. However, most of the GCN-based methods in this area train a deep feed-forward network with a fixed topology that leads to high computational complexity and restricts their application in low computation scenarios. In this paper, we propose a method to automatically find a compact and problem-specific topology for spatio-temporal graph convolutional networks in a progressive manner. Experimental results on two widely used datasets for skeleton-based human action recognition indicate that the proposed method has competitive or even better classification performance compared to the state-of-the-art methods with much lower computational complexity.

20. Skeleton-based Relational Reasoning for Group Activity Analysis
Mauricio Perez, Jun Liu, Alex C. Kot
Abstract: Research on group activity recognition mostly relies on the standard two-stream approach (RGB and optical flow) for input features. Few works have explored explicit pose information, and none has used it directly to reason about the individuals' interactions. In this paper, we leverage skeleton information to learn the interactions between individuals directly from it. With our proposed method, GIRN, multiple relationship types are inferred from independent modules that describe the relations between joints pair by pair. In addition to the joint relations, we also experiment with the previously unexplored relationship between individuals and relevant objects (e.g. the volleyball). The individuals' distinct relations are then merged through an attention mechanism that gives more importance to those more relevant for distinguishing the group activity. We evaluate our method on the Volleyball dataset, obtaining results competitive with the state of the art even though we use a single modality, thereby demonstrating the potential of skeleton-based approaches for modeling multi-person interactions.

21. Semi-supervised Sparse Representation with Graph Regularization for Image Classification
Hongfeng Li
Abstract: Image classification is a challenging problem for computers in practice. A large number of methods can achieve satisfactory performance given sufficient labeled images. However, labeled images are still highly limited for certain image classification tasks. In contrast, lots of unlabeled images are available and easy to obtain. Therefore, making full use of the available unlabeled data can be a potential way to further improve the performance of current image classification methods. In this paper, we propose a discriminative semi-supervised sparse representation algorithm for image classification. In the algorithm, the classification process is combined with sparse coding to learn a data-driven linear classifier. To obtain discriminative predictions, the predicted labels are regularized with three graphs, i.e., the global manifold structure graph, the within-class graph and the between-classes graph. The constructed graphs are able to extract structure information included in both the labeled and unlabeled data. Moreover, the proposed method is extended to a kernel version for dealing with data that cannot be linearly classified. Accordingly, efficient algorithms are developed to solve the corresponding optimization problems. Experimental results on several challenging databases demonstrate that the proposed algorithm achieves excellent performance compared with related popular methods.

22. Self-supervised Segmentation via Background Inpainting
Isinsu Katircioglu, Helge Rhodin, Victor Constantin, Jörg Spörri, Mathieu Salzmann, Pascal Fua
Abstract: While supervised object detection and segmentation methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this when annotating data is prohibitively expensive, we introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera. At the heart of our approach lies the observation that object segmentation and background reconstruction are linked tasks, and that, for structured scenes, background regions can be re-synthesized from their surroundings, whereas regions depicting the moving object cannot. We encode this intuition into a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of the proposals, we develop a Monte Carlo-based training strategy that allows the algorithm to explore the large space of object proposals. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.

23. Scribble-Supervised Semantic Segmentation by Random Walk on Neural Representation and Self-Supervision on Neural Eigenspa
Zhiyi Pan, Peng Jiang, Changhe Tu
Abstract: Scribble-supervised semantic segmentation has gained much attention recently for its promising performance without high-quality annotations. Many approaches have been proposed. Typically, they handle this problem by introducing a well-labeled dataset from another related task, by turning to iterative refinement and post-processing with a graphical model, or by manipulating the scribble labels. This work aims to achieve semantic segmentation supervised directly by scribble labels, without auxiliary information or other intermediate manipulation. Specifically, we impose diffusion on the neural representation by random walk and consistency on the neural eigenspace by self-supervision, which forces the neural network to produce dense and consistent predictions over the whole dataset. The random walk embedded in the network computes a probabilistic transition matrix, with which the neural representation is diffused to be uniform. Moreover, given the probabilistic transition matrix, we apply self-supervision on its eigenspace for consistency in the image's main parts. In addition to comparisons on the common scribble dataset, we also conduct experiments on modified datasets that randomly shrink and even drop the scribbles on image objects. The results demonstrate the superiority of the proposed method and are even comparable to some fully supervised methods. The code and datasets are available at this https URL.
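
The diffusion idea in entry 23 can be illustrated with a small sketch: build a row-stochastic transition matrix from pairwise feature similarities, then multiply it with the features so that similar items are pulled together. The Gaussian affinity, the `sigma` parameter and the function names are assumptions made for illustration; the abstract does not specify how the paper constructs its transition matrix.

```python
import math

def transition_matrix(features, sigma=1.0):
    """Row-normalise a Gaussian affinity matrix into a random-walk transition matrix."""
    n = len(features)
    rows = []
    for i in range(n):
        # Affinity between item i and every item j (assumed Gaussian kernel).
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                     / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def diffuse(features, P, steps=1):
    """Apply the transition matrix `steps` times, averaging features over neighbours."""
    current = [list(f) for f in features]
    dim = len(features[0])
    for _ in range(steps):
        current = [
            [sum(P[i][j] * current[j][d] for j in range(len(current))) for d in range(dim)]
            for i in range(len(current))
        ]
    return current

# Two nearby feature vectors and one distant one (toy data).
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
P = transition_matrix(feats)
smoothed = diffuse(feats, P, steps=3)
```

After diffusion the two nearby vectors move closer to each other, while the distant one is almost unaffected because its transition probabilities to the others are negligible.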

24. Intentonomy: a Dataset and Study towards Human Intent Understanding
Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim
Abstract: An image is worth a thousand words, conveying information that goes beyond the mere visual content therein. In this paper, we study the intent behind social media images with an aim to analyze how visual information can facilitate recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes. These images are manually annotated with 28 intent categories derived from a social psychology taxonomy. We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contribute to human motive understanding. Based on our findings, we conduct further study to quantify the effect of attending to object and context classes as well as textual information in the form of hashtags when training an intent classifier. Our results quantitatively and qualitatively shed light on how visual and textual information can produce observable effects when predicting intent.

25. End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks
Alice Xue
Abstract: Current GAN-based art generation methods produce unoriginal artwork due to their dependence on conditional input. Here, we propose Sketch-And-Paint GAN (SAPGAN), the first model that generates Chinese landscape paintings from end to end, without conditional input. SAPGAN is composed of two GANs: SketchGAN for the generation of edge maps, and PaintGAN for the subsequent edge-to-painting translation. Our model is trained on a new dataset of traditional Chinese landscape paintings never before used for generative research. A 242-person Visual Turing Test study reveals that SAPGAN paintings are mistaken for human artwork with 55% frequency, significantly outperforming paintings from baseline GANs. Our work lays the groundwork for truly machine-original art generation.

26. Optimized Loss Functions for Object detection and Application on Nighttime Vehicle Detection
Shang Jiang, Haoran Qin, Bingli Zhang, Jieyu Zheng
Abstract: Loss functions are a crucial factor affecting the detection precision in object detection tasks. In this paper, we optimize both loss functions, for classification and for localization, simultaneously. Firstly, by multiplying the standard cross-entropy loss in the classification loss function by an IoU-based coefficient, a correlation between localization and classification is established. In existing studies this correlation is only applied to improve the localization accuracy for positive samples; this paper instead uses the correlation to obtain the truly hard negative samples and aims to decrease the misclassification rate for negative samples. In addition, a novel localization loss named MIoU is proposed by incorporating a Mahalanobis distance between the predicted box and the target box, which eliminates the gradient inconsistency problem in the DIoU loss and further improves the localization accuracy. Finally, extensive experiments on nighttime vehicle detection have been carried out on two datasets. Our results show that when training with the proposed loss functions, the detection performance can be markedly improved. The source code and trained models are available at this https URL.
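
To make the coupling between localization and classification in entry 26 concrete, the sketch below computes the IoU of two axis-aligned boxes and uses it to scale a cross-entropy term. The coefficient form `1 + alpha * IoU`, the parameter `alpha` and the function names are hypothetical; the abstract does not give the paper's actual coefficient, and the MIoU loss is not reproduced here.

```python
import math

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def weighted_cross_entropy(prob_true_class, overlap, alpha=1.0):
    """Cross entropy scaled by a hypothetical IoU-based coefficient (1 + alpha * IoU)."""
    return (1.0 + alpha * overlap) * -math.log(prob_true_class)

pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
overlap = iou(pred, target)               # intersection area 1, union area 7
loss = weighted_cross_entropy(0.5, overlap)
```

With this kind of weighting, a sample whose box overlaps the target more strongly contributes a larger classification loss, which is one simple way to tie the two objectives together.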

27. Automatic Open-World Reliability Assessment
Mohsen Jafarzadeh, Touqeer Ahmad, Akshay Raj Dhamija, Chunchun Li, Steve Cruz, Terrance E. Boult
Abstract: Image classification in the open world must handle out-of-distribution (OOD) images. Systems should ideally reject OOD images; otherwise these images are mapped onto known classes and reduce reliability. Using open-set classifiers that can reject OOD inputs can help. However, the optimal accuracy of open-set classifiers depends on the frequency of OOD data. Thus, for either standard or open-set classifiers, it is important to be able to determine when the world changes and increasing OOD inputs will result in reduced system reliability. However, during operations, we cannot directly assess accuracy as there are no labels. Thus, the reliability assessment of these classifiers must be done by human operators, which is made more complex because networks are not 100% accurate, so some failures are to be expected. To automate this process, we formalize the open-world recognition reliability problem and propose multiple automatic reliability assessment policies to address this new problem using only the distribution of reported scores/probability data. The distributional algorithms can be applied both to classic classifiers with SoftMax and to the open-world Extreme Value Machine (EVM) to provide automated reliability assessment. We show that all of the new algorithms significantly outperform detection using the mean of SoftMax.

28. Unsupervised Learning of Dense Visual Representations
Pedro O. Pinheiro, Amjad Almahairi, Ryan Y. Benmaleck, Florian Golemo, Aaron Courville
Abstract: Contrastive self-supervised learning has emerged as a promising approach to unsupervised visual representation learning. In general, these methods learn global (image-level) representations that are invariant to different views (i.e., compositions of data augmentation) of the same image. However, many visual understanding tasks require dense (pixel-level) representations. In this paper, we propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations. VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions. Specifically, this is achieved through pixel-level contrastive learning: matching features (that is, features that describe the same location of the scene in different views) should be close in an embedding space, while non-matching features should be far apart. VADeR provides a natural representation for dense prediction tasks and transfers well to downstream tasks. Our method outperforms ImageNet supervised pretraining (and strong unsupervised baselines) in multiple dense prediction tasks.
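
The pixel-level contrastive objective described in entry 28 (matching features close, non-matching features apart) is commonly written as an InfoNCE loss. The sketch below computes that loss for a single anchor feature; the cosine similarity, the `temperature` value and the toy vectors are assumptions for illustration, not details taken from the VADeR paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalise(v):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive similarity."""
    a = normalise(anchor)
    logits = [dot(a, normalise(positive)) / temperature]
    logits += [dot(a, normalise(n)) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Feature of one pixel in view A, the same scene location in view B, and two other pixels.
anchor = [1.0, 0.0]
matched = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
mismatched = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

The loss is small when the positive feature is the most similar candidate and large when a negative is more similar, which is the behaviour the abstract describes.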

29. ForestNet: Classifying Drivers of Deforestation in Indonesia using Deep Learning on Satellite Imagery
Jeremy Irvin, Hao Sheng, Neel Ramachandran, Sonja Johnson-Yu, Sharon Zhou, Kyle Story, Rose Rustowicz, Cooper Elsworth, Kemen Austin, Andrew Y. Ng
Abstract: Characterizing the processes leading to deforestation is critical to the development and implementation of targeted forest conservation and management policies. In this work, we develop a deep learning model called ForestNet to classify the drivers of primary forest loss in Indonesia, a country with one of the highest deforestation rates in the world. Using satellite imagery, ForestNet identifies the direct drivers of deforestation in forest loss patches of any size. We curate a dataset of Landsat 8 satellite images of known forest loss events paired with driver annotations from expert interpreters. We use the dataset to train and validate the models and demonstrate that ForestNet substantially outperforms other standard driver classification approaches. In order to support future research on automated approaches to deforestation driver classification, the dataset curated in this study is publicly available at this https URL.

30. A Self-supervised Learning System for Object Detection in Videos Using Random Walks on Graphs
Juntao Tan, Changkyu Song, Abdeslam Boularias
Abstract: This paper presents a new self-supervised system for learning to detect novel and previously unseen categories of objects in images. The proposed system receives as input several unlabeled videos of scenes containing various objects. The frames of the videos are segmented into objects using depth information, and the segments are tracked along each video. The system then constructs a weighted graph that connects sequences based on the similarities between the objects that they contain. The similarity between two sequences of objects is measured by using generic visual features, after automatically re-arranging the frames in the two sequences to align the viewpoints of the objects. The graph is used to sample triplets of similar and dissimilar examples by performing random walks. The triplet examples are finally used to train a siamese neural network that projects the generic visual features into a low-dimensional manifold. Experiments on three public datasets, YCB-Video, CORe50 and RGBD-Object, show that the projected low-dimensional features improve the accuracy of clustering unknown objects into novel categories, and outperform several recent unsupervised clustering techniques.

31. Fast & Slow Learning: Incorporating Synthetic Gradients in Neural Memory Controllers
Tharindu Fernando, Simon Denman, Sridha Sridharan, Clinton Fookes
Abstract: Neural Memory Networks (NMNs) have received increased attention in recent years compared to deep architectures that use a constrained memory. Despite their new appeal, the success of NMNs hinges on the ability of the gradient-based optimiser to perform incremental training of the NMN controllers, determining how to leverage their high capacity for knowledge retrieval. This means that while excellent performance can be achieved when the training data is consistent and well distributed, rare data samples are hard to learn from as the controllers fail to incorporate them effectively during model training. Drawing inspiration from the human cognition process, in particular the utilisation of neuromodulators in the human brain, we propose to decouple the learning process of the NMN controllers to allow them to achieve flexible, rapid adaptation in the presence of new information. This trait is highly beneficial for meta-learning tasks where the memory controllers must quickly grasp abstract concepts in the target domain, and adapt stored knowledge. This allows the NMN controllers to quickly determine which memories are to be retained and which are to be erased, and swiftly adapt their strategy to the new task at hand. Through both quantitative and qualitative evaluations on multiple public benchmarks, including classification and regression tasks, we demonstrate the utility of the proposed approach. Our evaluations not only highlight the ability of the proposed NMN architecture to outperform the current state-of-the-art methods, but also provide insights on how the proposed augmentations help achieve such superior results. In addition, we demonstrate the practical implications of the proposed learning strategy, where the feedback path can be shared among multiple neural memory networks as a mechanism for knowledge sharing.

32. Debugging Tests for Model Explanations
Julius Adebayo, Michael Muelly, Ilaria Liccardi, Been Kim
Abstract: We investigate whether post-hoc model explanations are effective for diagnosing model errors, that is, for model debugging. In response to the challenge of explaining a model's prediction, a vast array of explanation methods have been proposed. Despite increasing use, it is unclear if they are effective. To start, we categorize bugs, based on their source, into data, model, and test-time contamination bugs. For several explanation methods, we assess their ability to: detect spurious correlation artifacts (data contamination), diagnose mislabeled training examples (data contamination), differentiate between a (partially) re-initialized model and a trained one (model contamination), and detect out-of-distribution inputs (test-time contamination). We find that the methods tested are able to diagnose a spurious background bug, but not conclusively identify mislabeled training examples. In addition, a class of methods that modify the back-propagation algorithm is invariant to the higher-layer parameters of a deep network and is hence ineffective for diagnosing model contamination. We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, but instead rely primarily on model predictions. Taken together, our results provide guidance for practitioners and researchers turning to explanations as tools for model debugging.

33. Using GANs to Synthesise Minimum Training Data for Deepfake Generation
Simranjeet Singh, Rajneesh Sharma, Alan F. Smeaton
Abstract: There are many applications of Generative Adversarial Networks (GANs) in fields like computer vision, natural language processing, speech synthesis, and more. Undoubtedly the most notable results have been in the area of image synthesis and in particular in the generation of deepfake videos. While deepfakes have received much negative media coverage, they can be a useful technology in applications like entertainment, customer relations, or even assistive care. One problem with generating deepfakes is the requirement for a lot of image training data of the subject, which is not an issue if the subject is a celebrity for whom many images already exist. If there are only a small number of training images, then the quality of the deepfake will be poor. Some media reports have indicated that a good deepfake can be produced with as few as 500 images, but in practice, quality deepfakes require many thousands of images, which is one of the reasons why deepfakes of celebrities and politicians have become so popular. In this study, we exploit the ability of a GAN to produce images of an individual with variable facial expressions, which we then use to generate a deepfake. We observe that with such variability in the facial expressions of synthetic GAN-generated training images, and with a reduced quantity of them, we can produce near-realistic deepfake videos.

34. Collaborative Augmented Reality on Smartphones via Life-long City-scale Maps
Lukas Platinsky, Michal Szabados, Filip Hlasek, Ross Hemsley, Luca Del Pero, Andrej Pancik, Bryan Baum, Hugo Grimmett, Peter Ondruska
Abstract: In this paper we present the first published end-to-end production computer-vision system for powering city-scale shared augmented reality experiences on mobile devices. In doing so we propose a new formulation for an experience-based mapping framework as an effective solution to the key issues of city-scale SLAM scalability, robustness, map updates and all-time all-weather performance required by a production system. Furthermore, we propose an effective way of synchronising SLAM systems to deliver seamless real-time localisation of multiple edge devices at the same time, all in the presence of network latency and bandwidth limitations. The resulting system is deployed and tested at scale in San Francisco, where it delivers AR experiences in a mapped area of several hundred kilometers. To foster further development of this area we offer the data set to the public, constituting the largest of this kind to date.

35. Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos
Di Yang, Rui Dai, Yaohui Wang, Rupayan Mallick, Luca Minciullo, Gianpiero Francesca, Francois Bremond
Abstract: Taking advantage of human pose data for understanding human activities has attracted much attention recently. However, state-of-the-art pose estimators struggle to obtain high-quality 2D or 3D pose data due to occlusion, truncation and low resolution in real-world un-annotated videos. Hence, in this work, we propose 1) a Selective Spatio-Temporal Aggregation mechanism, named SST-A, that refines and smooths the keypoint locations extracted by multiple expert pose estimators, and 2) an effective weakly-supervised self-training framework which leverages the aggregated poses as pseudo ground truth instead of handcrafted annotations for real-world pose estimation. Extensive experiments are conducted to evaluate not only the upstream pose refinement but also the downstream action recognition performance on four datasets: Toyota Smarthome, NTU-RGB+D, Charades, and Kinetics-50. We demonstrate that the skeleton data refined by our Pose-Refinement system (SSTA-PRS) is effective at boosting various existing action recognition models, achieving competitive or state-of-the-art performance.

36. Vulnerability of the Neural Networks Against Adversarial Examples: A Survey
Rui Zhao
Abstract: With further development in the fields of computer vision, network security, natural language processing and so on, deep learning technology has gradually exposed certain security risks. The existing deep learning algorithms cannot effectively describe the essential characteristics of data, making them unable to give the correct result in the face of malicious input. Based on the current security threats faced by deep learning, this paper introduces the problem of adversarial examples in deep learning, sorts out the existing black-box and white-box attack and defense methods, and classifies them. It briefly describes the application of some adversarial examples in different scenarios in recent years, compares several defense technologies against adversarial examples, and finally summarizes the problems in this research field and the prospects for its future development. This paper introduces the common white-box attack methods in detail, and further compares the similarities and differences between black-box and white-box attacks. Correspondingly, the author also introduces the defense methods, and analyzes the performance of these methods against black-box and white-box attacks.

37. Transformers for One-Shot Visual Imitation
Sudeep Dasari, Abhinav Gupta
Abstract: Humans are able to seamlessly visually imitate others, by inferring their intentions and using past experience to achieve the same end goal. In other words, we can parse complex semantic knowledge from raw video and efficiently translate that into concrete motor control. Is it possible to give a robot this same capability? Prior research in robot imitation learning has created agents which can acquire diverse skills from expert human operators. However, expanding these techniques to work with a single positive example during test time is still an open challenge. Apart from control, the difficulty stems from mismatches between the demonstrator and robot domains. For example, objects may be placed in different locations (e.g. kitchen layouts are different in every house). Additionally, the demonstration may come from an agent with different morphology and physical appearance (e.g. a human), so one-to-one action correspondences are not available. This paper investigates techniques which allow robots to partially bridge these domain gaps, using their past experience. A neural network is trained to mimic ground truth robot actions given context video from another agent, and must generalize to unseen task instances when prompted with new videos during test time. We hypothesize that our policy representations must be both context driven and dynamics aware in order to perform these tasks. These assumptions are baked into the neural network using the Transformers attention mechanism and a self-supervised inverse dynamics loss. Finally, we experimentally determine that our method accomplishes a roughly 2x improvement in task success rate over prior baselines in a suite of one-shot manipulation tasks.

38. FAT: Training Neural Networks for Reliable Inference Under Hardware Faults
Ussama Zahid, Giulio Gambardella, Nicholas J. Fraser, Michaela Blott, Kees Vissers
Abstract: Deep neural networks (DNNs) are state-of-the-art algorithms for multiple applications, spanning from image classification to speech recognition. While providing excellent accuracy, they often have enormous compute and memory requirements. As a result, quantized neural networks (QNNs) are increasingly being adopted and deployed, especially on embedded devices, thanks to their high accuracy and because they have significantly lower compute and memory requirements than their floating point equivalents. QNN deployment is also being evaluated for safety-critical applications, such as automotive, avionics, medical or industrial. These systems require functional safety, guaranteeing failure-free behaviour even in the presence of hardware faults. In general, fault tolerance can be achieved by adding redundancy to the system, which further exacerbates the overall computational demands and makes it difficult to meet the power and performance requirements. In order to decrease the hardware cost of achieving functional safety, it is vital to explore domain-specific solutions which can exploit the inherent features of DNNs. In this work we present a novel methodology called fault-aware training (FAT), which includes error modeling during neural network (NN) training, to make QNNs resilient to specific fault models on the device. Our experiments show that by injecting faults in the convolutional layers during training, highly accurate convolutional neural networks (CNNs) can be trained which exhibit much better error tolerance than the original. Furthermore, we show that redundant systems built from QNNs trained with FAT achieve higher worst-case accuracy at lower hardware cost. This has been validated for numerous classification tasks including CIFAR10, GTSRB, SVHN and ImageNet.
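
The idea of injecting faults during training in entry 38 can be sketched with a toy fault model: flip one random bit in some of the quantized weights before they are used. The single-bit-flip model, the 8-bit unsigned representation and the function name are assumptions made for illustration; the abstract does not state which fault models the paper uses.

```python
import random

def inject_bit_flips(weights, fault_rate, bits=8, rng=None):
    """Flip one random bit in each unsigned `bits`-wide weight with probability `fault_rate`."""
    rng = rng or random.Random(0)
    faulty = []
    for w in weights:
        if rng.random() < fault_rate:
            w ^= 1 << rng.randrange(bits)  # XOR toggles exactly one bit
        faulty.append(w)
    return faulty

# Hypothetical 8-bit quantized weights of one layer.
clean = [12, 200, 7, 255, 0, 64]
noisy = inject_bit_flips(clean, fault_rate=0.5, rng=random.Random(42))
```

In a fault-aware training loop, a function like this would be applied to a layer's weights (or activations) on each forward pass so that the network learns parameters that tolerate the perturbation.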

39. Distorted image restoration using stacked adversarial network
Yi Gu, Yuting Gao, Jie Li, Chentao Wu, Weijia Jia
Abstract: Liquify is a common technique for distortion. Due to the uncertainty in the distortion variation, restoring distorted images produced by the liquify filter is a challenging task. Unlike existing methods, which are mainly designed for a specific single deformation, this paper aims at automatic distorted image restoration, which is characterized by seeking the appropriate warping of multitype and multi-scale distorted images. In this work, we propose a stacked adversarial framework with a novel coherent skip connection to directly predict the reconstruction mappings and represent high-dimensional features. Since the lack of an available benchmark hinders exploration, we contribute a distorted face dataset by reconstructing distortion mappings based on the CelebA dataset. We also introduce a novel method for generating synthesized data. We evaluate our method on the proposed benchmark quantitatively and qualitatively, and apply it to real-world images for validation.

40. Classification of COVID-19 in Chest CT Images using Convolutional Support Vector Machines
Umut Özkaya, Şaban Öztürk, Serkan Budak, Farid Melgani, Kemal Polat
Abstract: Purpose: Coronavirus 2019 (COVID-19), which emerged in Wuhan, China and affected the whole world, has cost the lives of thousands of people. Manual diagnosis is inefficient due to the rapid spread of this virus. For this reason, automatic COVID-19 detection studies are carried out with the support of artificial intelligence algorithms. Methods: In this study, a deep learning model that detects COVID-19 cases with high performance is presented. The proposed method is defined as Convolutional Support Vector Machine (CSVM) and can automatically classify Computed Tomography (CT) images. Unlike pre-trained Convolutional Neural Networks (CNNs) trained with the transfer learning method, the CSVM model is trained from scratch. To evaluate the performance of the CSVM method, the dataset is divided into two parts: training (75%) and testing (25%). The CSVM model consists of blocks containing three different numbers of SVM kernels. Results: When the performance of pre-trained CNNs and CSVM models is assessed, the CSVM (7x7, 3x3, 1x1) model shows the highest performance with 94.03% ACC, 96.09% SEN, 92.01% SPE, 92.19% PRE, 94.10% F1-Score, 88.15% MCC and 88.07% Kappa metric values. Conclusion: The proposed method is more effective than other methods. The experiments performed show that it can be an inspiration for combating COVID-19 and for future studies.
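The seven metrics listed in the results (ACC, SEN, SPE, PRE, F1-Score, MCC, Kappa) can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    acc = (tp + tn) / n                  # accuracy
    sen = tp / (tp + fn)                 # sensitivity (recall)
    spe = tn / (tn + fp)                 # specificity
    pre = tp / (tp + fp)                 # precision
    f1 = 2 * pre * sen / (pre + sen)     # harmonic mean of precision and recall
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )                                    # Matthews correlation coefficient
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (acc - p_e) / (1 - p_e)
    return {"ACC": acc, "SEN": sen, "SPE": spe, "PRE": pre,
            "F1": f1, "MCC": mcc, "Kappa": kappa}


# Hypothetical counts, for illustration only.
for name, value in binary_metrics(tp=90, fp=20, tn=80, fn=10).items():
    print(f"{name}: {value:.4f}")
```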

41. EvidentialMix: Learning with Combined Open-set and Closed-set Noisy Labels
Ragav Sachdeva, Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro
Abstract: The efficacy of deep learning depends on large-scale data sets that have been carefully curated with reliable data acquisition and annotation processes. However, acquiring such large-scale data sets with precise annotations is very expensive and time-consuming, and the cheap alternatives often yield data sets that have noisy labels. The field has addressed this problem by focusing on training models under two types of label noise: 1) closed-set noise, where some training samples are incorrectly annotated to a training label other than their known true class; and 2) open-set noise, where the training set includes samples that possess a true class that is (strictly) not contained in the set of known training labels. In this work, we study a new variant of the noisy label problem that combines open-set and closed-set noisy labels, and introduce a benchmark evaluation to assess the performance of training algorithms under this setup. We argue that such a problem is more general and better reflects the noisy label scenarios in practice. Furthermore, we propose a novel algorithm, called EvidentialMix, that addresses this problem, and compare its performance with the state-of-the-art methods for both closed-set and open-set noise on the proposed benchmark. Our results show that our method produces superior classification results and better feature representations than previous state-of-the-art methods. The code is available at this https URL.

42. Skin disease diagnosis with deep learning: a review
Hongfeng Li
Abstract: Skin cancer is one of the most threatening diseases worldwide. However, diagnosing skin cancer correctly is challenging. Recently, deep learning algorithms have achieved excellent performance on various tasks. In particular, they have also been applied to skin disease diagnosis. In this paper, we present a review of deep learning methods and their applications in skin disease diagnosis. We first introduce skin diseases and image acquisition methods in dermatology, and list several publicly available datasets for training and testing algorithms for skin disease diagnosis. Then, we introduce the concept of deep learning and review popular deep learning architectures. Thereafter, popular deep learning frameworks that facilitate the implementation of deep learning algorithms, as well as performance evaluation metrics, are presented. As an important part of this article, we then review the literature involving deep learning methods for skin disease diagnosis from several aspects according to the specific tasks. Additionally, we discuss the challenges faced in the area of skin disease diagnosis with deep learning and suggest possible future research directions. Finally, we summarize the article. The major purpose of this article is to provide a conceptual and systematic review of recent work on skin disease diagnosis with deep learning. Given the popularity of deep learning, there remain great challenges in the area, as well as opportunities that we can explore in the future.

43. Adversarial images for the primate brain
Li Yuan, Will Xiao, Gabriel Kreiman, Francis E.H. Tay, Jiashi Feng, Margaret S. Livingstone
Abstract: Deep artificial neural networks have been proposed as a model of primate vision. However, these networks are vulnerable to adversarial attacks, whereby introducing minimal noise can fool networks into misclassifying images. Primate vision is thought to be robust to such adversarial images. We evaluated this assumption by designing adversarial images to fool primate vision. To do so, we first trained a model to predict responses of face-selective neurons in macaque inferior temporal cortex. Next, we modified images, such as human faces, to match their model-predicted neuronal responses to a target category, such as monkey faces. These adversarial images elicited neuronal responses similar to the target category. Remarkably, the same images fooled monkeys and humans at the behavioral level. These results challenge fundamental assumptions about the similarity between computer and primate vision and show that a model of neuronal activity can selectively direct primate visual behavior.

44. Invertible CNN-Based Super Resolution with Downsampling Awareness
Andrew Geiss, Joseph C. Hardin
Abstract: Single image super resolution involves artificially increasing the resolution of an image. Recently, convolutional neural networks have been demonstrated to be very powerful tools for this problem. These networks are typically trained by artificially degrading high resolution images and training the neural network to reproduce the original. Because these neural networks are learning an inverse function for an image downsampling scheme, their high-resolution outputs should ideally reproduce the corresponding low-resolution input when the same downsampling scheme is applied. Historically, however, this constraint has not been explicitly and strictly imposed during training. Here, a method for "downsampling aware" super resolution networks is proposed. A differentiable operator is applied as the final output layer of the neural network that forces the downsampled output to match the low resolution input data under 2D-average downsampling. It is demonstrated that appending this operator to a selection of state-of-the-art deep-learning-based super resolution schemes improves training time and overall performance on most of the common image super resolution benchmark datasets. In addition to this performance improvement for images, this method has potentially broad and significant impacts in the physical sciences. This scheme can be applied to data produced by medical scans, precipitation radars, gridded numerical simulations, satellite imagers, and many other sources. In such applications, the proposed method's guarantee of strict adherence to physical conservation laws is of critical importance.
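The consistency constraint described above can be illustrated with a very simple operator: shift every k x k block of the high-resolution output by a constant so that its mean equals the matching low-resolution pixel. This is only a sketch of the constraint, not the operator proposed in the paper (an additive shift can, for example, push values outside the valid pixel range); the function names and the tiny list-of-lists images are assumptions made for illustration.

```python
def avg_downsample(img, k):
    """2D-average downsampling of a list-of-lists image by an integer factor k."""
    h, w = len(img) // k, len(img[0]) // k
    return [
        [
            sum(img[i * k + di][j * k + dj] for di in range(k) for dj in range(k))
            / (k * k)
            for j in range(w)
        ]
        for i in range(h)
    ]


def enforce_consistency(sr, lr, k):
    """Shift each k x k block of sr so that its mean equals the matching lr pixel."""
    down = avg_downsample(sr, k)
    return [
        [
            sr[y][x] + (lr[y // k][x // k] - down[y // k][x // k])
            for x in range(len(sr[0]))
        ]
        for y in range(len(sr))
    ]


lr = [[1.0, 2.0], [3.0, 4.0]]                  # low-resolution input
sr = [[0.5, 1.5, 2.5, 1.0],                    # raw network output (made up)
      [1.0, 0.0, 2.0, 3.0],
      [3.5, 2.0, 4.5, 4.0],
      [3.0, 2.5, 5.0, 3.0]]
fixed = enforce_consistency(sr, lr, 2)
print(avg_downsample(fixed, 2))                # matches lr up to rounding
```

Because the correction is a sum of differentiable operations (averaging, subtraction, nearest-neighbour replication), it could in principle be placed as the last layer of a network and trained through.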

45. An ensemble-based approach by fine-tuning the deep transfer learning models to classify pneumonia from chest X-ray images
Sagar Kora Venu
Abstract: Pneumonia is caused by viruses, bacteria, or fungi that infect the lungs and, if not diagnosed, can lead to respiratory failure and be fatal. More than 250,000 individuals in the United States, mainly adults, are diagnosed with pneumonia each year, and 50,000 die from the disease. Chest radiography (X-ray) is widely used by radiologists to detect pneumonia. It is not uncommon for even a well-trained radiologist to overlook pneumonia, which triggers the need for improvement in diagnostic accuracy. In this work, we propose using transfer learning, which can reduce the neural network's training time and minimize the generalization error. We trained and fine-tuned state-of-the-art deep learning models such as InceptionResNet, MobileNetV2, Xception, DenseNet201, and ResNet152V2 to classify pneumonia accurately. Later, we created a weighted average ensemble of these models and achieved a test accuracy of 98.46%, precision of 98.38%, recall of 99.53%, and f1 score of 98.96%. These performance metrics of accuracy, precision, and f1 score are the highest ever reported in the literature and can be considered a benchmark for accurate pneumonia classification.
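A weighted average ensemble combines the class probabilities of several models into a single prediction. The following sketch shows the basic arithmetic; the abstract does not say how the weights are chosen, so the weights, the probabilities, and the function names `weighted_ensemble` and `predict` are hypothetical.

```python
def weighted_ensemble(prob_lists, weights):
    """Combine per-model class probabilities with a normalized weighted average."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]


def predict(prob_lists, weights):
    """Return the index of the class with the highest combined probability."""
    combined = weighted_ensemble(prob_lists, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Three hypothetical models scoring one X-ray as [normal, pneumonia].
model_probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
model_weights = [1.0, 1.0, 2.0]   # e.g. proportional to validation accuracy (assumption)
print(weighted_ensemble(model_probs, model_weights))
print(predict(model_probs, model_weights))
```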

46. A Unified Framework for Compressive Video Recovery from Coded Exposure Techniques
Prasan Shedligeri, Anupama S, Kaushik Mitra
Abstract: Several coded exposure techniques have been proposed for acquiring high frame rate videos at low bandwidth. Most recently, a Coded-2-Bucket camera has been proposed that can acquire two compressed measurements in a single exposure, unlike previously proposed coded exposure techniques, which can acquire only a single measurement. Although two measurements are better than one for effective video recovery, the advantage of two measurements has not yet been clearly established, either quantitatively or qualitatively. Here, we propose a unified learning-based framework to make such a qualitative and quantitative comparison between techniques that capture only a single coded image (Flutter Shutter, Pixel-wise coded exposure) and those that capture two measurements per exposure (C2B). Our learning-based framework consists of a shift-variant convolutional layer followed by a fully convolutional deep neural network. Our proposed unified framework achieves state-of-the-art reconstructions in all three sensing techniques. Further analysis shows that when most scene points are static, the C2B sensor has a significant advantage over acquiring a single pixel-wise coded measurement. However, when most scene points undergo motion, the C2B sensor has only a marginal benefit over the single pixel-wise coded exposure measurement.

47. Dense U-net for super-resolution with shuffle pooling layer
Zhengyang Lu, Ying Chen
Abstract: Single image super-resolution (SISR) in unconstrained environments is challenging because of varying illumination, occlusion and complex environments. Recent research has achieved great progress on super-resolution due to the development of deep learning in the field of computer vision. In this letter, a Dense U-net with a shuffle pooling method is proposed. First, a modified U-net with dense blocks, called dense U-net, is proposed for SISR. Second, a novel pooling strategy called shuffle pooling is designed and applied to the dense U-net for the super-resolution task. Third, a mixed loss function, which combines Mean Square Error (MSE), Structural Similarity Index (SSIM) and Mean Gradient Error (MGE), is proposed to address perception loss and high-frequency information loss. The proposed method achieves superior accuracy over previous state-of-the-art methods on three benchmark datasets: SET14, BSD300, ICDAR2003. Code is available online.

48. Do You See What I See? Coordinating Multiple Aerial Cameras for Robot Cinematography
Arthur Bucker, Rogerio Bonatti, Sebastian Scherer
Abstract: Aerial cinematography is significantly expanding the capabilities of film-makers. Recent progress in autonomous unmanned aerial vehicles (UAVs) has further increased the potential impact of aerial cameras, with systems that can safely track actors in unstructured cluttered environments. Professional productions, however, require the use of multiple cameras simultaneously to record different viewpoints of the same scene, which are edited into the final footage either in real time or in post-production. Such extreme motion coordination is particularly hard for unscripted action scenes, which are a common use case of aerial cameras. In this work we develop a real-time multi-UAV coordination system that is capable of recording dynamic targets while maximizing shot diversity and avoiding collisions and mutual visibility between cameras. We validate our approach in multiple cluttered environments of a photo-realistic simulator, and deploy the system using two UAVs in real-world experiments. We show that our coordination scheme has low computational cost and takes only 1.17 ms on average to plan for a team of 3 UAVs over a 10 s time horizon. Supplementary video: this https URL

49. Self-Supervised Out-of-Distribution Detection in Brain CT Scans
Abinav Ravi Venkatakrishnan, Seong Tae Kim, Rami Eisawy, Franz Pfister, Nassir Navab
Abstract: Medical imaging data suffers from the limited availability of annotation because annotating 3D medical data is a time-consuming and expensive task. Moreover, even if the annotation is available, supervised learning-based approaches suffer from highly imbalanced data. Most of the scans during screening are from normal subjects, but there are also large variations in abnormal cases. To address these issues, unsupervised deep anomaly detection methods have recently been reported that train the model on a large number of normal scans and detect abnormal scans by calculating the reconstruction error. In this paper, we propose a novel self-supervised learning technique for anomaly detection. Our architecture largely consists of two parts: 1) reconstruction and 2) prediction of geometric transformations. By training the network to predict geometric transformations, the model can learn better image features and the distribution of normal scans. At test time, the geometric transformation predictor can assign the anomaly score by calculating the error between the geometric transformation and its prediction. Moreover, we further use self-supervised learning with context restoration for pretraining our model. Comparative experiments on clinical brain CT scans verify the effectiveness of the proposed method.

50. Glioma Classification Using Multimodal Radiology and Histology Data
Azam Hamidinekoo, Tomasz Pieciak, Maryam Afzali, Otar Akanyeti, Yinyin Yuan
Abstract: Gliomas are brain tumours with a high mortality rate. There are various grades and sub-types of this tumour, and the treatment procedure varies accordingly. Clinicians and oncologists diagnose and categorise these tumours based on visual inspection of radiology and histology data. However, this process can be time-consuming and subjective. Computer-assisted methods can help clinicians to make better and faster decisions. In this paper, we propose a pipeline for automatic classification of gliomas into three sub-types: oligodendroglioma, astrocytoma, and glioblastoma, using both radiology and histopathology images. The proposed approach implements distinct classification models for radiographic and histologic modalities and combines them through an ensemble method. The classification algorithm initially carries out tile-level (for histology) and slice-level (for radiology) classification via a deep learning method, then tile/slice-level latent features are combined for a whole-slide and whole-volume sub-type prediction. The classification algorithm was evaluated using the data set provided in the CPM-RadPath 2020 challenge. The proposed pipeline achieved an F1-Score of 0.886, a Cohen's Kappa score of 0.811 and a balanced accuracy of 0.860. The ability of the proposed model for end-to-end learning of diverse features enables it to give a comparable prediction of glioma tumour sub-types.
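Balanced accuracy, one of the three scores reported above, is the mean of the per-class recalls, which makes it less sensitive to class imbalance than plain accuracy. A minimal sketch using the standard definition follows; the function name and the example labels (O, A, G standing in for the three sub-types) are made up for illustration and are not the challenge data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: O = oligodendroglioma, A = astrocytoma, G = glioblastoma.
y_true = ["O", "O", "A", "A", "A", "G"]
y_pred = ["O", "A", "A", "A", "G", "G"]
print(balanced_accuracy(y_true, y_pred))
```

In this made-up example plain accuracy is 4/6, while balanced accuracy averages the recalls 1/2, 2/3 and 1/1, giving each sub-type equal weight regardless of how many cases it has.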

51. Deep Learning Derived Histopathology Image Score for Increasing Phase 3 Clinical Trial Probability of Success
Qi Tang, Vardaan Kishore Kumar
Abstract: Failures in Phase 3 clinical trials contribute to the high cost of drug development in oncology. To drastically reduce this cost, responders to an oncology treatment need to be identified early in the drug development process, with a limited amount of patient data, before the planning of Phase 3 clinical trials. Despite the challenge of small sample size, we pioneered the use of deep-learning derived digital pathology scores to identify responders based on the immunohistochemistry images of the target antigen expressed in tumor biopsy samples from a Phase 1 Non-small Cell Lung Cancer clinical trial. Based on repeated 10-fold cross validations, the deep-learning derived score on average achieved a 4% higher AUC of the ROC curve and a 6% higher AUC of the Precision-Recall curve compared to the tumor proportion score (TPS) based clinical benchmark. In a small independent testing set of patients, we also demonstrated that the deep-learning derived score achieved a numerically at least 25% higher responder rate in the enriched population than the TPS clinical benchmark.


[arXiv papers] Computer Vision and Pattern Recognition 2020-11-12
