
[arXiv Papers] Computer Vision and Pattern Recognition 2020-02-14

Contents

1. Automatically Discovering and Learning New Visual Categories with Ranking Statistics [PDF] Abstract
2. Classifying the classifier: dissecting the weight space of neural networks [PDF] Abstract
3. Summarizing the performances of a background subtraction algorithm measured on several videos [PDF] Abstract
4. GANILLA: Generative Adversarial Networks for Image to Illustration Translation [PDF] Abstract
5. Asynchronous Tracking-by-Detection on Adaptive Time Surfaces for Event-based Object Tracking [PDF] Abstract
6. SpotNet: Self-Attention Multi-Task Network for Object Detection [PDF] Abstract
7. Replacing Mobile Camera ISP with a Single Deep Learning Model [PDF] Abstract
8. Chaotic Phase Synchronization and Desynchronization in an Oscillator Network for Object Selection [PDF] Abstract
9. EndoL2H: Deep Super-Resolution for Capsule Endoscopy [PDF] Abstract
10. Emotion Recognition for In-the-wild Videos [PDF] Abstract
11. Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks [PDF] Abstract
12. Hypergraph Optimization for Multi-structural Geometric Model Fitting [PDF] Abstract
13. Object Detection on Single Monocular Images through Canonical Correlation Analysis [PDF] Abstract
14. Continual Universal Object Detection [PDF] Abstract
15. SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud [PDF] Abstract
16. Improving Efficiency in Neural Network Accelerator Using Operands Hamming Distance optimization [PDF] Abstract
17. Solving Missing-Annotation Object Detection with Background Recalibration Loss [PDF] Abstract
18. Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System [PDF] Abstract
19. Image-to-Image Translation with Text Guidance [PDF] Abstract
20. Cross-Iteration Batch Normalization [PDF] Abstract
21. A Simple Framework for Contrastive Learning of Visual Representations [PDF] Abstract
22. Generative-based Airway and Vessel Morphology Quantification on Chest CT Images [PDF] Abstract
23. Neuromorphologicaly-preserving Volumetric data encoding using VQ-VAE [PDF] Abstract
24. FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains [PDF] Abstract
25. Machines Learn Appearance Bias in Face Recognition [PDF] Abstract
26. Sparse and Structured Visual Attention [PDF] Abstract
27. Superpixel Image Classification with Graph Attention Networks [PDF] Abstract
28. Deep Learning-based End-to-end Diagnosis System for Avascular Necrosis of Femoral Head [PDF] Abstract
29. Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner [PDF] Abstract
30. Real or Not Real, that is the Question [PDF] Abstract
31. MLFcGAN: Multi-level Feature Fusion based Conditional GAN for Underwater Image Color Correction [PDF] Abstract
32. Physical Accuracy of Deep Neural Networks for 2D and 3D Multi-Mineral Segmentation of Rock micro-CT Images [PDF] Abstract
33. A Provably Robust Multiple Rotation Averaging Scheme for SO(2) [PDF] Abstract
34. Geom-GCN: Geometric Graph Convolutional Networks [PDF] Abstract
35. HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models [PDF] Abstract
36. Graph Similarity Using PageRank and Persistent Homology [PDF] Abstract

Abstracts

1. Automatically Discovering and Learning New Visual Categories with Ranking Statistics [PDF] Back to contents
  Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman
Abstract: We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes. This setting is similar to semi-supervised learning, but significantly harder because there are no labelled examples for the new classes. The challenge, then, is to leverage the information contained in the labelled images in order to learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data. In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labeled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data. We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
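Idea (2) above can be illustrated with a small sketch. It assumes one plausible reading of "rank statistics": two unlabelled images are treated as belonging to the same class when the indices of their k largest feature activations coincide. The function names, the value of k, and the toy vectors are illustrative and do not come from the abstract.

```python
def top_k_indices(features, k):
    """Return the set of indices of the k largest feature values."""
    order = sorted(range(len(features)), key=lambda i: features[i], reverse=True)
    return frozenset(order[:k])


def same_class_pseudo_label(feat_a, feat_b, k=3):
    """Assumed rule: 1 if both vectors share the same top-k index set, else 0."""
    return int(top_k_indices(feat_a, k) == top_k_indices(feat_b, k))


# Toy feature vectors (made up for illustration).
a = [0.9, 0.1, 0.8, 0.05, 0.7]
b = [0.6, 0.0, 0.9, 0.10, 0.8]  # different magnitudes, same dominant dimensions
c = [0.1, 0.9, 0.2, 0.80, 0.7]  # different dominant dimensions

label_ab = same_class_pseudo_label(a, b)
label_ac = same_class_pseudo_label(a, c)
```

Comparing index sets rather than raw values makes the pairwise label insensitive to the magnitude of the activations, which is presumably why a rank-based comparison is attractive for transferring knowledge to unseen classes.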

2. Classifying the classifier: dissecting the weight space of neural networks [PDF] Back to contents
  Gabriel Eilertsen, Daniel Jönsson, Timo Ropinski, Jonas Unger, Anders Ynnerman
Abstract: This paper presents an empirical study on the weights of neural networks, where we interpret each model as a point in a high-dimensional space -- the neural weight space. To explore the complex structure of this space, we sample from a diverse selection of training variations (dataset, optimization procedure, architecture, etc.) of neural network classifiers, and train a large number of models to represent the weight space. Then, we use a machine learning approach for analyzing and extracting information from this space. Most centrally, we train a number of novel deep meta-classifiers with the objective of classifying different properties of the training setup by identifying their footprints in the weight space. Thus, the meta-classifiers probe for patterns induced by hyper-parameters, so that we can quantify how much, where, and when these are encoded through the optimization process. This provides a novel and complementary view for explainable AI, and we show how meta-classifiers can reveal a great deal of information about the training setup and optimization, by only considering a small subset of randomly selected consecutive weights. To promote further research on the weight space, we release the neural weight space (NWS) dataset -- a collection of 320K weight snapshots from 16K individually trained deep neural networks.

3. Summarizing the performances of a background subtraction algorithm measured on several videos [PDF] Back to contents
  Sébastien Piérard, Marc Van Droogenbroeck
Abstract: Many background subtraction algorithms exist to detect motion in videos. To help compare them, datasets with ground-truth data such as CDNET or LASIESTA have been proposed. These datasets organize videos in categories that represent typical challenges for background subtraction. The evaluation procedure promoted by their authors consists of measuring performance indicators for each video separately and averaging them hierarchically, first within a category and then between categories, a procedure which we name "summarization". While summarization by averaging performance indicators is a valuable effort to standardize the evaluation procedure, it has no theoretical justification and it breaks the intrinsic relationships between summarized indicators. This leads to interpretation inconsistencies. In this paper, we present a theoretical approach to summarize the performances for multiple videos that preserves the relationships between performance indicators. In addition, we give formulas and an algorithm to calculate summarized performances. Finally, we showcase our observations on CDNET 2014.
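The claim that averaging breaks the relationships between indicators can be checked with a small worked example. The sketch below uses two hypothetical videos with made-up confusion counts; it only demonstrates the inconsistency and does not implement the summarization the paper proposes.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-video indicators from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (tp, fp, fn) counts for two videos.
videos = [(90, 10, 10), (1, 9, 1)]
scores = [precision_recall_f1(*v) for v in videos]

mean_precision = sum(s[0] for s in scores) / len(scores)
mean_recall = sum(s[1] for s in scores) / len(scores)
mean_f1 = sum(s[2] for s in scores) / len(scores)

# F1 recomputed from the averaged precision and recall.
f1_of_means = 2 * mean_precision * mean_recall / (mean_precision + mean_recall)
```

Here `mean_f1` and `f1_of_means` differ, so the averaged F1 is no longer the harmonic mean of the averaged precision and recall, which is the kind of interpretation inconsistency the abstract refers to.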

4. GANILLA: Generative Adversarial Networks for Image to Illustration Translation [PDF] Back to contents
  Samet Hicsonmez, Nermin Samet, Emre Akbas, Pinar Duygulu
Abstract: In this paper, we explore illustrations in children's books as a new domain in unpaired image-to-image translation. We show that although the current state-of-the-art image-to-image translation models successfully transfer either the style or the content, they fail to transfer both at the same time. We propose a new generator network to address this issue and show that the resulting network strikes a better balance between style and content. There are no well-defined or agreed-upon evaluation metrics for unpaired image-to-image translation. So far, the success of image translation models has been based on subjective, qualitative visual comparison on a limited number of images. To address this problem, we propose a new framework for the quantitative evaluation of image-to-illustration models, where both content and style are taken into account using separate classifiers. In this new evaluation framework, our proposed model performs better than the current state-of-the-art models on the illustrations dataset. Our code and pretrained models can be found at this https URL.

5. Asynchronous Tracking-by-Detection on Adaptive Time Surfaces for Event-based Object Tracking [PDF] Back to contents
  Haosheng Chen, Qiangqiang Wu, Yanjie Liang, Xinbo Gao, Hanzi Wang
Abstract: Event cameras, which are asynchronous bio-inspired vision sensors, have shown great potential in a variety of situations, such as fast motion and low illumination scenes. However, most of the event-based object tracking methods are designed for scenarios with untextured objects and uncluttered backgrounds. There are few event-based object tracking methods that support bounding box-based object tracking. The main idea behind this work is to propose an asynchronous Event-based Tracking-by-Detection (ETD) method for generic bounding box-based object tracking. To achieve this goal, we present an Adaptive Time-Surface with Linear Time Decay (ATSLTD) event-to-frame conversion algorithm, which asynchronously and effectively warps the spatio-temporal information of asynchronous retinal events to a sequence of ATSLTD frames with clear object contours. We feed the sequence of ATSLTD frames to the proposed ETD method to perform accurate and efficient object tracking, which leverages the high temporal resolution property of event cameras. We compare the proposed ETD method with seven popular object tracking methods, that are based on conventional cameras or event cameras, and two variants of ETD. The experimental results show the superiority of the proposed ETD method in handling various challenging environments.
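The event-to-frame conversion can be sketched in simplified form. The code below builds a time surface with linear time decay over a fixed window; the adaptive part of ATSLTD is not described in the abstract and is omitted, and the event layout `(x, y, t)`, the function name, and the parameters are assumptions made for illustration.

```python
def linear_decay_time_surface(events, width, height, t_now, window):
    """Render events as a frame whose pixel values decay linearly with event age.

    events: iterable of (x, y, t) tuples; the newest event at a pixel dominates.
    """
    frame = [[0.0] * width for _ in range(height)]
    for x, y, t in events:
        age = t_now - t
        if 0 <= age < window:
            value = 1.0 - age / window  # 1.0 for a fresh event, approaching 0.0 at the window edge
            frame[y][x] = max(frame[y][x], value)
    return frame


# Hypothetical events on a 3 x 2 sensor.
events = [(0, 0, 0.0), (1, 0, 5.0), (2, 1, 10.0)]
frame = linear_decay_time_surface(events, width=3, height=2, t_now=10.0, window=10.0)
```

Recent events appear bright and older ones fade, so moving edges leave a contour-like trace in the frame that a frame-based detector can consume.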

6. SpotNet: Self-Attention Multi-Task Network for Object Detection [PDF] Back to contents
  Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, Maguelonne Héritier
Abstract: Humans are very good at directing their visual attention toward relevant areas when they search for different types of objects. For instance, when we search for cars, we will look at the streets, not at the top of buildings. The motivation of this paper is to train a network to do the same via a multi-task learning approach. To train visual attention, we produce foreground/background segmentation labels in a semi-supervised way, using background subtraction or optical flow. Using these labels, we train an object detection model to produce foreground/background segmentation maps as well as bounding boxes while sharing most model parameters. We use those segmentation maps inside the network as a self-attention mechanism to weight the feature map used to produce the bounding boxes, decreasing the signal of non-relevant areas. We show that by using this method, we obtain a significant mAP improvement on two traffic surveillance datasets, with state-of-the-art results on both UA-DETRAC and UAVDT.
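The weighting step described above can be sketched as an element-wise product between a feature map and a foreground probability map. This is a minimal illustration in plain Python; the shapes, names, and values are assumptions, and the real network applies this inside a learned architecture.

```python
def apply_attention(feature_map, attention):
    """Scale every channel of a C x H x W feature map by an H x W attention map."""
    return [
        [[value * attention[r][c] for c, value in enumerate(row)]
         for r, row in enumerate(channel)]
        for channel in feature_map
    ]


# Toy 2-channel, 2 x 2 feature map and a foreground map with values in [0, 1].
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]]]
attention = [[1.0, 0.0],
             [0.5, 1.0]]
weighted = apply_attention(features, attention)
```

Locations the segmentation branch considers background (attention near 0) contribute little to the features used for box prediction, which is the signal suppression the abstract describes.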

7. Replacing Mobile Camera ISP with a Single Deep Learning Model [PDF] Back to contents
  Andrey Ignatov, Luc Van Gool, Radu Timofte
Abstract: As the popularity of mobile photography is growing constantly, lots of efforts are being invested now into building complex hand-crafted camera ISP solutions. In this work, we demonstrate that even the most sophisticated ISP pipelines can be replaced with a single end-to-end deep learning model trained without any prior knowledge about the sensor and optics used in a particular device. For this, we present PyNET, a novel pyramidal CNN architecture designed for fine-grained image restoration that implicitly learns to perform all ISP steps such as image demosaicing, denoising, white balancing, color and contrast correction, demoireing, etc. The model is trained to convert RAW Bayer data obtained directly from mobile camera sensor into photos captured with a professional high-end DSLR camera, making the solution independent of any particular mobile ISP implementation. To validate the proposed approach on the real data, we collected a large-scale dataset consisting of 10 thousand full-resolution RAW-RGB image pairs captured in the wild with the Huawei P20 cameraphone (12.3 MP Sony Exmor IMX380 sensor) and Canon 5D Mark IV DSLR. The experiments demonstrate that the proposed solution can easily get to the level of the embedded P20's ISP pipeline that, unlike our approach, is combining the data from two (RGB + B/W) camera sensors. The dataset, pre-trained models and codes used in this paper are available on the project website.

8. Chaotic Phase Synchronization and Desynchronization in an Oscillator Network for Object Selection [PDF] Back to contents
  Fabricio A Breve, Marcos G Quiles, Liang Zhao, Elbert E. N. Macau
Abstract: Object selection refers to the mechanism of extracting objects of interest while ignoring other objects and background in a given visual scene. It is a fundamental issue for many computer vision and image analysis techniques and it is still a challenging task to artificial visual systems. Chaotic phase synchronization takes place in cases involving almost identical dynamical systems and it means that the phase difference between the systems is kept bounded over the time, while their amplitudes remain chaotic and may be uncorrelated. Instead of complete synchronization, phase synchronization is believed to be a mechanism for neural integration in brain. In this paper, an object selection model is proposed. Oscillators in the network representing the salient object in a given scene are phase synchronized, while no phase synchronization occurs for background objects. In this way, the salient object can be extracted. In this model, a shift mechanism is also introduced to change attention from one object to another. Computer simulations show that the model produces some results similar to those observed in natural vision systems.

9. EndoL2H: Deep Super-Resolution for Capsule Endoscopy [PDF] Back to contents
  Yasin Almalioglu, Abdulkadir Gokce, Kagan Incetan, Muhammed Ali Simsek, Kivanc Ararat, Richard J. Chen, Nichalos J. Durr, Faisal Mahmood, Mehmet Turan
Abstract: Wireless capsule endoscopy is the preferred modality for diagnosis and assessment of small bowel disease. However, the poor resolution is a limitation for both subjective and automated diagnostics. Enhanced-resolution endoscopy has been shown to improve the adenoma detection rate for conventional endoscopy and is likely to do the same for capsule endoscopy. In this work, we propose and quantitatively validate a novel framework to learn a mapping from low-to-high resolution endoscopic images. We use conditional adversarial networks and spatial attention to improve the resolution by up to a factor of 8x. Our quantitative study demonstrates the superiority of our proposed approach over Super-Resolution Generative Adversarial Network (SRGAN) and bicubic interpolation. For qualitative analysis, visual Turing tests were performed by 16 gastroenterologists to confirm the clinical utility of the proposed approach. Our approach is generally applicable to any endoscopic capsule system and has the potential to improve diagnosis and better harness computational approaches for polyp detection and characterization. Our code and trained models are available at this https URL.

10. Emotion Recognition for In-the-wild Videos [PDF] Back to contents
  Hanyu Liu, Jiabei Zeng, Shiguang Shan, Xilin Chen
Abstract: This paper is a brief introduction to our submission to the seven basic expression classification track of Affective Behavior Analysis in-the-wild Competition held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020. Our method combines Deep Residual Network (ResNet) and Bidirectional Long Short-Term Memory Network (BLSTM), achieving 64.3% accuracy and 43.4% final metric on the validation set.

11. Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks [PDF] Back to contents
  Taro Kiritani, Koji Ono
Abstract: Convolutional neural networks are vulnerable to small $\ell^p$ adversarial attacks, while the human visual system is not. Inspired by neural networks in the eye and the brain, we developed a novel artificial neural network model that recurrently collects data with a log-polar field of view that is controlled by attention. We demonstrate the effectiveness of this design as a defense against SPSA and PGD adversarial attacks. It also has beneficial properties observed in the animal visual system, such as reflex-like pathways for low-latency inference, fixed amount of computation independent of image size, and rotation and scale invariance. The code for experiments is available at this https URL.
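A log-polar field of view can be sketched as a sampling grid whose ring radii grow geometrically around a fixation point. The code below is a generic illustration of that mapping, not the authors' implementation; the function name and all parameters are assumptions.

```python
import math


def log_polar_grid(cx, cy, r_min, r_max, n_rings, n_wedges):
    """Return (x, y) sample points on rings with geometrically spaced radii."""
    points = []
    for i in range(n_rings):
        frac = i / (n_rings - 1) if n_rings > 1 else 0.0
        radius = r_min * (r_max / r_min) ** frac  # equal steps in log(radius)
        for j in range(n_wedges):
            theta = 2 * math.pi * j / n_wedges
            points.append((cx + radius * math.cos(theta),
                           cy + radius * math.sin(theta)))
    return points


# 4 rings x 4 wedges around the origin; ring radii are roughly 1, 2, 4, 8.
grid = log_polar_grid(0.0, 0.0, r_min=1.0, r_max=8.0, n_rings=4, n_wedges=4)
```

The number of samples depends only on `n_rings * n_wedges`, not on the image size, which matches the fixed computation budget mentioned in the abstract. In these coordinates a rotation or scaling of the image about the fixation point becomes a shift along the wedge or ring axis.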

12. Hypergraph Optimization for Multi-structural Geometric Model Fitting [PDF] Back to contents
  Shuyuan Lin, Guobao Xiao, Yan Yan, David Suter, Hanzi Wang
Abstract: Recently, some hypergraph-based methods have been proposed to deal with the problem of model fitting in computer vision, mainly due to the superior capability of hypergraph to represent the complex relationship between data points. However, a hypergraph becomes extremely complicated when the input data include a large number of data points (usually contaminated with noises and outliers), which will significantly increase the computational burden. In order to overcome the above problem, we propose a novel hypergraph optimization based model fitting (HOMF) method to construct a simple but effective hypergraph. Specifically, HOMF includes two main parts: an adaptive inlier estimation algorithm for vertex optimization and an iterative hyperedge optimization algorithm for hyperedge optimization. The proposed method is highly efficient, and it can obtain accurate model fitting results within a few iterations. Moreover, HOMF can then directly apply spectral clustering, to achieve good fitting performance. Extensive experimental results show that HOMF outperforms several state-of-the-art model fitting methods on both synthetic data and real images, especially in sampling efficiency and in handling data with severe outliers.

13. Object Detection on Single Monocular Images through Canonical Correlation Analysis [PDF] Back to contents
  Zifan Yu, Suya You
Abstract: Without using extra 3-D data such as point clouds or depth images to provide 3-D information, we retrieve 3-D object information from single monocular images. High-quality predicted depth images are recovered from single monocular images and fed into the 2-D object proposal network together with the corresponding monocular images. Most existing deep learning frameworks with two-stream input data fuse the separate data by concatenation or addition, which assumes that every part of a feature map contributes equally to the whole task. However, when data are noisy and much of the information is redundant, these methods no longer produce predictions or classifications efficiently. In this report, we propose a two-dimensional CCA (canonical correlation analysis) framework to fuse monocular images and the corresponding predicted depth images for basic computer vision tasks such as image classification and object detection. First, we implemented different structures with one-dimensional CCA and AlexNet to test performance on the image classification task. Then, we applied one of these structures with 2D-CCA to object detection. In these experiments, we found that our proposed framework performs better when taking predicted depth images as inputs with the model trained on ground-truth depth.

14. Continual Universal Object Detection [PDF] Back to contents
  Xialei Liu, Hao Yang, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
Abstract: Object detection has improved significantly in recent years on multiple challenging benchmarks. However, most existing detectors are still domain-specific, where the models are trained and tested on a single domain. When adapting these detectors to new domains, they often suffer from catastrophic forgetting of previous knowledge. In this paper, we propose a continual object detector that can learn sequentially from different domains without forgetting. First, we explore learning the object detector continually in different scenarios across various domains and categories. Learning from the analysis, we propose attentive feature distillation leveraging both bottom-up and top-down attentions to mitigate forgetting. It takes advantage of attention to ignore the noisy background information and feature distillation to provide strong supervision. Finally, for the most challenging scenarios, we propose an adaptive exemplar sampling method to leverage exemplars from previous tasks for less forgetting effectively. The experimental results show the excellent performance of our proposed method in three different scenarios across seven different object detection datasets.

15. SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud [PDF] Back to contents
  Hongwei Yi, Shaoshuai Shi, Mingyu Ding, Jiankai Sun, Kui Xu, Hui Zhou, Zhe Wang, Sheng Li, Guoping Wang
Abstract: 3D vehicle detection based on point clouds is a challenging task in real-world applications such as autonomous driving. Although significant progress has been made, we observe two aspects that can be further improved. First, the semantic context information in LiDAR, which may help identify ambiguous vehicles, is seldom explored in previous works. Second, the distribution of the point cloud on vehicles varies continuously with increasing depth, which may not be well modeled by a single model. In this work, we propose a unified model, SegVoxelNet, to address the above two problems. A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view. Suspicious regions can be highlighted while noisy regions are suppressed by this module. To better deal with vehicles at different depths, a novel depth-aware head is designed to explicitly model the distribution differences, and each part of the depth-aware head is made to focus on its own target detection range. Extensive experiments on the KITTI dataset show that the proposed method outperforms the state-of-the-art alternatives in both accuracy and efficiency with only point clouds as input.

16. Improving Efficiency in Neural Network Accelerator Using Operands Hamming Distance optimization [PDF] Back to contents
  Meng Li, Yilei Li, Pierce Chuang, Liangzhen Lai, Vikas Chandra
Abstract: Neural network accelerator is a key enabler for the on-device AI inference, for which energy efficiency is an important metric. The data-path energy, including the computation energy and the data movement energy among the arithmetic units, claims a significant part of the total accelerator energy. By revisiting the basic physics of the arithmetic logic circuits, we show that the data-path energy is highly correlated with the bit flips when streaming the input operands into the arithmetic units, defined as the hamming distance of the input operand matrices. Based on the insight, we propose a post-training optimization algorithm and a hamming-distance-aware training algorithm to co-design and co-optimize the accelerator and the network synergistically. The experimental results based on post-layout simulation with MobileNetV2 demonstrate on average 2.85X data-path energy reduction and up to 8.51X data-path energy reduction for certain layers.
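The link between bit flips and operand order can be made concrete with a short sketch. It counts the Hamming distance between consecutive operands streamed into an arithmetic unit and compares two orderings of the same values. The 8-bit width, the function names, and the hand-picked reordering are illustrative assumptions; the abstract does not describe the actual optimization algorithm.

```python
def hamming_distance(a, b, bits=8):
    """Number of differing bits between two unsigned integers of the given width."""
    mask = (1 << bits) - 1
    return bin((a ^ b) & mask).count("1")


def stream_bit_flips(operands, bits=8):
    """Total bit flips when operands are streamed one after another."""
    return sum(hamming_distance(p, q, bits) for p, q in zip(operands, operands[1:]))


# The same four 8-bit operands in two different streaming orders.
original_order = [0b00000000, 0b11111111, 0b00000001, 0b11111110]
reordered = [0b00000000, 0b00000001, 0b11111111, 0b11111110]

flips_original = stream_bit_flips(original_order)
flips_reordered = stream_bit_flips(reordered)
```

With these toy values the reordered stream toggles far fewer bits than the original one, which illustrates why an ordering that places similar operands next to each other could reduce data-path energy.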

17. Solving Missing-Annotation Object Detection with Background Recalibration Loss [PDF] Back to contents
  Han Zhang, Fangyi Chen, Zhiqiang Shen, Qiqi Hao, Chenchen Zhu, Marios Savvides
Abstract: This paper focuses on a novel and challenging detection scenario: A majority of true objects/instances is unlabeled in the datasets, so these missing-labeled areas will be regarded as the background during training. Previous art on this problem has proposed to use soft sampling to re-weight the gradients of RoIs based on the overlaps with positive instances, while their method is mainly based on the two-stage detector (i.e. Faster RCNN) which is more robust and friendly for the missing label scenario. In this paper, we introduce a superior solution called Background Recalibration Loss (BRL) that can automatically re-calibrate the loss signals according to the pre-defined IoU threshold and input image. Our design is built on the one-stage detector which is faster and lighter. Inspired by the Focal Loss formulation, we make several significant modifications to fit on the missing-annotation circumstance. We conduct extensive experiments on the curated PASCAL VOC and MS COCO datasets. The results demonstrate that our proposed method outperforms the baseline and other state-of-the-arts by a large margin.

18. Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System [PDF] Back to contents
  Nataniel Ruiz, Mona Jalal, Vitaly Ablavsky, Danielle Allessio, John Magee, Jacob Whitehill, Ivon Arroyo, Beverly Woolf, Stan Sclaroff, Margrit Betke
Abstract: In the context of building an intelligent tutoring system (ITS), which improves student learning outcomes by intervention, we set out to improve prediction of student problem outcome. In essence, we want to predict the outcome of a student answering a problem in an ITS from a video feed by analyzing their face and gestures. For this, we present a novel transfer learning facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We model the temporal structure of video sequences of students solving math problems using a recurrent neural network architecture. Additionally, we extend the largest dataset of student interactions with an intelligent online math tutor by a factor of two. Our final model, coined ATL-BP (Affect Transfer Learning for Behavior Prediction) achieves an increase in mean F-score over state-of-the-art of 45% on this new dataset in the general case and 50% in a more challenging leave-users-out experimental setting when we use a user-personalized training scheme.
摘要:在建设智能教学系统(ITS),它通过干预提高了学生的学习成果的背景下,我们着手提高学生的问题结果的预测。从本质上说,我们希望通过分析他们的脸和手势来预测一个学生从视频源的ITS回答问题的结果。为此,我们提出了一个新的转移学习的面部影响表现和解锁此表示的潜在用户个性化的培训方案。我们的学生解决使用递归神经网络结构的数学题的视频序列的时间结构建模。此外,我们通过两个因素具有智能在线数学家教延长学生互动的最大的数据集。我们的最终模型,创造了ATL-BP(影响对行为预测迁移学习)达到平均F-得分超过国家的最先进的45%,在这个新的数据集在一般情况下增加,在50%以上挑战假用户出实验设置,当我们使用用户个性化的培训方案。

19. Image-to-Image Translation with Text Guidance [PDF]
  Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz
Abstract: The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, which allows text descriptions to determine the visual attributes of synthetic images. We propose four key components: (1) the implementation of part-of-speech tagging to filter out non-semantic words in the given description, (2) the adoption of an affine combination module to effectively fuse different modality text and image features, (3) a novel refined multi-stage architecture to strengthen the differential ability of discriminators and the rectification ability of generators, and (4) a new structure loss to further improve discriminators to better distinguish real and synthetic images. Extensive experiments on the COCO dataset demonstrate that our method has a superior performance on both visual realism and semantic consistency with given descriptions.
摘要:本文的目的是嵌入可控因素,即,自然语言描述成图像到图像的平移与生成对抗性的网络,它允许文本描述,以确定合成图像的视觉属性。我们提出四个主要组成部分:(1)部分的词性标注的实施,以过滤出在给定的描述非语义字,(2)通过仿射组合模块的有效熔丝不同模态的文本和图像的特征, (3)一种新的改进的多级结构,以加强鉴别器的差动能力和发电机的整流能力,和(4)的新结构的损失,进一步提高鉴别器,以更好地分辨实际的和合成的图像。在COCO大量的实验数据集表明,我们的方法有两个逼真视觉效果,并与给定的描述语义一致性优越的性能。

20. Cross-Iteration Batch Normalization [PDF]
  Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin
Abstract: A well-known issue of Batch Normalization is its significantly reduced effectiveness in the case of small mini-batch sizes. When a mini-batch contains few examples, the statistics upon which the normalization is defined cannot be reliably estimated from it during a training iteration. To address this problem, we present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality. A challenge of computing statistics over multiple iterations is that the network activations from different iterations are not comparable to each other due to changes in network weights. We thus compensate for the network weight changes via a proposed technique based on Taylor polynomials, so that the statistics can be accurately estimated and batch normalization can be effectively applied. On object detection and image classification with small mini-batch sizes, CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
摘要:知名批标准化的问题是迷你小批量的情况下,其显著降低效果。当小批量包含几个例子,在其上归一化定义的统计数据不能可靠地从它训练迭代期间估计。为了解决这个问题,我们提出了交叉迭代批标准化(CBN),其最近多次迭代的例子共同利用,以提高估计质量。在多次迭代计算统计数据的一个挑战是,从不同的迭代中的网络激活没有可比性彼此由于在网络权的变化。因此,我们补偿通过基于泰勒多项式一个提出的技术的网络的重量变化,从而使统计数据可以精确地估计和批量标准化可以有效的应用。关于物体检测及图像分类与迷你小批量大小,CBN发现优于原始批标准化并在先前迭代统计而不拟议补偿技术直接计算。

21. A Simple Framework for Contrastive Learning of Visual Representations [PDF]
  Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
Abstract: This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
摘要:本文介绍SimCLR:对于视觉表现的对比学习一个简单的框架。我们简化了最近提出的对比自我监督学习算法,而不需要专门的架构或存储库。为了了解什么能使对比预测任务学习用的表现,我们系统地研究我们的框架的主要组成部分。我们显示数据增扩的:(1)组合物在定义有效的预测任务(2)将所述表示和所述对比损耗基本上之间的可以学习的非线性变换关键作用,改善了学习表示的质量,和(3)对比学习从大批量和更多的培训措施的好处相比,监督学习。通过结合这些研究结果,我们能够显着跑赢上ImageNet自我监督和半监督学习以前的方法。训练由SimCLR了解到自监督表示的线性分类器达到76.5%顶-1精度,这比以前的国家的最先进的7%的相对改善,匹配的性能监督RESNET-50。当只有1%的标签的微调,我们达到85.8%,排名前五的准确性,跑赢AlexNet与100X较少的标签。
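The abstract mentions a contrastive loss but does not spell it out. As a rough illustration only, the sketch below shows a generic temperature-scaled contrastive objective over two augmented views per image, where each embedding must pick out its partner view among all other embeddings in the batch. The function names, the temperature value of 0.5, and the use of cosine similarity are assumptions made for this example, not details confirmed by the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(views_a, views_b, temperature=0.5):
    """Average cross-entropy of identifying each embedding's positive partner.

    views_a[i] and views_b[i] are embeddings of two augmentations of image i
    (hypothetical inputs; temperature=0.5 is an assumed value).
    """
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every other embedding, excluding the anchor itself.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - positive
    return total / (2 * n)
```

With this definition, a batch whose paired views point in the same direction yields a lower loss than one whose pairs are swapped, which is the behaviour a contrastive objective is meant to reward.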

22. Generative-based Airway and Vessel Morphology Quantification on Chest CT Images [PDF]
  Pietro Nardelli, James C. Ross, Raúl San José Estépar
Abstract: Accurately and precisely characterizing the morphology of small pulmonary structures from Computed Tomography (CT) images, such as airways and vessels, is becoming of great importance for diagnosis of pulmonary diseases. The smaller conducting airways are the major site of increased airflow resistance in chronic obstructive pulmonary disease (COPD), while accurately sizing vessels can help identify arterial and venous changes in lung regions that may determine future disorders. However, traditional methods are often limited due to image resolution and artifacts. We propose a Convolutional Neural Regressor (CNR) that provides cross-sectional measurement of airway lumen, airway wall thickness, and vessel radius. CNR is trained with data created by a generative model of synthetic structures which is used in combination with Simulated and Unsupervised Generative Adversarial Network (SimGAN) to create simulated and refined airways and vessels with known ground-truth. For validation, we first use synthetically generated airways and vessels produced by the proposed generative model to compute the relative error and directly evaluate the accuracy of CNR in comparison with traditional methods. Then, in-vivo validation is performed by analyzing the association between the percentage of the predicted forced expiratory volume in one second (FEV1%) and the value of the Pi10 parameter, two well-known measures of lung function and airway disease, for airways. For vessels, we assess the correlation between our estimate of the small-vessel blood volume and the lungs' diffusing capacity for carbon monoxide (DLCO). The results demonstrate that Convolutional Neural Networks (CNNs) provide a promising direction for accurately measuring vessels and airways on chest CT images with physiological correlates.
摘要:准确且精确地从计算机断层摄影术表征小肺结构的形态(CT)图像,如气道和血管中,对肺部疾病的诊断成为非常重要的。较小的传导气道中是慢性阻塞性肺疾病(COPD)增加的气流阻力的主要部位,而准确地定径容器可以帮助识别在肺部区域的动脉和静脉的变化可确定未来失调。然而,传统的方法往往是有限的,由于图像分辨率和文物。我们提出了一个卷积神经回归(CNR),其提供气道内腔,气道壁的厚度,和容器半径的横截面测量。 CNR进行训练由在与模拟和无监督剖成对抗式网络(SimGAN)来创建仿真和精制气道和与已知的地面实况容器组合使用的合成结构的生成模型创建的数据。进行验证,我们首先使用由所提出的生成模型产生合成产生的气道和血管以计算的相对误差,并直接评价与传统的方法相比CNR的精度。然后,在体内验证是通过分析所预测的用力呼气体积的百分比之间的关联在一秒钟(FEV1 \%)和PI10参数的值中,两个肺功能的公知的措施及气道疾病的药物,进行气道。对于容器,我们评估我们的小血管血液体积的估计和所述肺的一氧化碳弥散(弥散)容量之间的相关性。结果表明,卷积神经网络(细胞神经网络)提供用于准确测量与生理相关因素胸部CT图像的血管和气道有希望的方向。

23. Neuromorphologicaly-preserving Volumetric data encoding using VQ-VAE [PDF]
  Petru-Daniel Tudosiu, Thomas Varsavsky, Richard Shaw, Mark Graham, Parashkev Nachev, Sebastien Ourselin, Carole H. Sudre, M. Jorge Cardoso
Abstract: The increasing efficiency and compactness of deep learning architectures, together with hardware improvements, have enabled the complex and high-dimensional modelling of medical volumetric data at higher resolutions. Recently, Vector-Quantised Variational Autoencoders (VQ-VAE) have been proposed as an efficient generative unsupervised learning approach that can encode images to a small percentage of their initial size, while preserving their decoded fidelity. Here, we show a VQ-VAE inspired network can efficiently encode a full-resolution 3D brain volume, compressing the data to 0.825% of the original size while maintaining image fidelity, and significantly outperforming the previous state-of-the-art. We then demonstrate that VQ-VAE decoded images preserve the morphological characteristics of the original data through voxel-based morphology and segmentation experiments. Lastly, we show that such models can be pre-trained and then fine-tuned on different datasets without the introduction of bias.
摘要:提高效率和深度学习体系结构紧凑,与硬件的改进在一起,已经使医疗容积数据的复杂性和高维模型在更高的分辨率。近日,矢量量化变自动编码(VQ-VAE)已被提议作为一种高效生成无监督的学习方法,可以对图像进行编码,以他们的初始大小的一小部分,同时保留其解码的保真度。在这里,我们展示了一个VQ-VAE启发网络可以有效地编码全分辨率3D脑容量,数据压缩至$在0.825 \%的原始大小的$同时保持图像保真度,并显著超越以前的状态的最先进的。然后,我们证明了VQ-VAE解码图像通过基于体素的形态学和分割实验保留原始数据的形态特征。最后,我们表明,这种模式可以预先培训,并在不同的数据集,然后微调不引入偏见。
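For readers unfamiliar with vector quantisation, the compression in a VQ-VAE comes from replacing each encoder output vector with the index of its nearest entry in a learned codebook. The sketch below shows only that lookup step, assuming squared Euclidean distance; the function name and the toy codebook are hypothetical, and the network described in the abstract is not reproduced here.

```python
def quantize(vector, codebook):
    """Return (index, code) for the codebook entry nearest to `vector`.

    Distance is squared Euclidean; `codebook` is a list of equal-length vectors.
    """
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    index = min(range(len(codebook)), key=lambda k: squared_distance(vector, codebook[k]))
    return index, codebook[index]


# Toy codebook with three 2-D codes (illustrative values only).
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
index, code = quantize([0.9, 1.2], toy_codebook)
```

Storing only the integer index per spatial location, rather than the full vector, is what makes the encoded representation small.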

24. FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains [PDF]
  Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri
Abstract: In the realm of autonomous transportation, there have been many initiatives for open-sourcing self-driving cars datasets, but much less for alternative methods of transportation such as trains. In this paper, we aim to bridge the gap by introducing FRSign, a large-scale and accurate dataset for vision-based railway traffic light detection and recognition. Our recordings were made on selected running trains in France and benefited from carefully hand-labeled annotations. An illustrative dataset which corresponds to ten percent of the acquired data to date is published in open source with the paper. It contains more than 100,000 images illustrating six types of French railway traffic lights and their possible color combinations, together with the relevant information regarding their acquisition such as date, time, sensor parameters, and bounding boxes. This dataset is published in open source at this https URL. We compare and analyze various properties of the dataset and provide metrics to express its variability. We also discuss specific challenges and particularities related to autonomous trains in comparison to autonomous cars.
摘要:在自治区交通运输领域,已经出现了开放式采购自动驾驶汽车的数据集诸多举措,但对于运输的替代方法,如火车要少得多。在本文中,我们的目标是通过引入FRSign,一个大型和准确的数据集用于基于视觉的铁路交通灯检测和识别,以缩小差距。我们的记录作了在法国选择运行列车和精心手工标记注释中受益。其对应于所获取的数据的最新的百分之十的示例性数据集发表在开源与纸。它包含了超过10万个图像,说明六类法国铁路交通信号灯和他们可能的颜色组合,连同有关的信息对他们的收购,如日期,时间,传感器参数,和边框。该数据集是在地址\ {URL这HTTPS URL}刊登在开源。我们比较,分析数据集的各种属性和提供指标来表达它的可变性。我们还讨论具体的挑战,相较于自主车与自主列车特殊性。

25. Machines Learn Appearance Bias in Face Recognition [PDF]
  Ryan Steed, Aylin Caliskan
Abstract: We seek to determine whether state-of-the-art, black box face recognition techniques can learn first-impression appearance bias from human annotations. With FaceNet, a popular face recognition architecture, we train a transfer learning model on human subjects' first impressions of personality traits in other faces. We measure the extent to which this appearance bias is embedded and benchmark learning performance for six different perceived traits. In particular, we find that our model is better at judging a person's dominance based on their face than other traits like trustworthiness or likeability, even for emotionally neutral faces. We also find that our model tends to predict emotions for deliberately manipulated faces with higher accuracy than for randomly generated faces, just like a human subject. Our results lend insight into the manner in which appearance biases may be propagated by standard face recognition models.
摘要:我们试图确定是否国家的最先进的,黑盒脸部识别技术可以借鉴人类注释的第一印象,外观偏向。随着FaceNet,一个流行的脸部识别架构,我们培养对人的在其他面性格特征的第一印象是一个转移的学习模式。我们测量到这次出现偏差被嵌入的程度和基准学习表现为六个不同的感知特性。特别是,我们发现,我们的模型是基于判断自己的脸比其他性状一样可信性或喜爱程度一个人的主导地位,甚至情绪中性面孔更好。我们还发现,我们的模型往往会预测故意操纵面孔的情绪比为随机生成的面部更高的精确度,就像一个人的问题。我们的结果借洞察其外观偏见可以通过标准的面部识别模型来传播的方式。

26. Sparse and Structured Visual Attention [PDF]
  Pedro Henrique Martins, Vlad Niculae, Zita Marinho, André Martins
Abstract: Visual attention mechanisms are widely used in multimodal tasks, such as image captioning and visual question answering (VQA). One drawback of softmax-based attention mechanisms is that they assign probability mass to all image regions, regardless of their adjacency structure and of their relevance to the text. In this paper, to better link the image structure with the text, we replace the traditional softmax attention mechanism with two alternative sparsity-promoting transformations: sparsemax, which is able to select the relevant regions only (assigning zero weight to the rest), and a newly proposed Total-Variation Sparse Attention (TVmax), which further encourages the joint selection of adjacent spatial locations. Experiments in image captioning and VQA, using both LSTM and Transformer architectures, show gains in terms of human-rated caption quality, attention relevance, and VQA accuracy, with improved interpretability.
摘要:视觉注意机制被广泛应用于多任务,如图像字幕和视觉问答(VQA)。基于SOFTMAX注意力机制的一个缺点是它们分配概率质量到所有的图像区域,无论其邻接结构及其相关的文字。在本文中,以更好地链接与文本的图像结构,我们更换两种可供选择的稀疏性,促进转变传统的SOFTMAX注意机制:sparsemax,这是能够选择相关区域中仅仅(分配权重为零的其余部分),和新提出的总的变化率稀疏注意(TVmax),其进一步鼓励相邻的空间位置的联合选择。在图像字幕和VQA,同时使用LSTM和变压器的架构实验,显示人类额定字幕质量,重视相关性,准确性VQA,具有完善的可解释性方面的收益。
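To make the contrast with softmax concrete, here is a minimal sketch of the standard sparsemax transformation, which projects a score vector onto the probability simplex and can assign exactly zero weight to low-scoring entries. The function name and the example scores are chosen for illustration; TVmax, the paper's new proposal, is not implemented here.

```python
def sparsemax(scores):
    """Euclidean projection of `scores` onto the probability simplex.

    Unlike softmax, entries far below the top scores receive exactly 0.
    """
    sorted_scores = sorted(scores, reverse=True)
    running_sum = 0.0
    support_size, support_sum = 0, 0.0
    for k, value in enumerate(sorted_scores, start=1):
        running_sum += value
        # The support is the longest prefix for which this condition holds.
        if 1.0 + k * value > running_sum:
            support_size, support_sum = k, running_sum
    threshold = (support_sum - 1.0) / support_size
    return [max(s - threshold, 0.0) for s in scores]


# One dominant score: all mass goes to it, the rest get zero weight.
weights = sparsemax([2.0, 1.0, 0.1])
```

In an attention layer, the scores would be the region-to-text compatibilities, and the zero entries correspond to image regions that are dropped from the attended set.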

27. Superpixel Image Classification with Graph Attention Networks [PDF]
  Pedro H. C. Avelar, Anderson R. Tavares, Thiago L. T. da Silveira, Cláudio R. Jung, Luís C. Lamb
Abstract: This document reports the use of Graph Attention Networks for classifying oversegmented images, as well as a general procedure for generating oversegmented versions of image-based datasets. The code and learnt models for/from the experiments are available on GitHub. The experiments were run from June 2019 until December 2019. We obtained better results than the baseline models that use geometric distance-based attention by using self-attention instead, in a more sparsely connected graph network.
摘要:本文报道了使用图形注意网络用于生成基于图像的数据集oversegmented版本oversegmented图像,以及作为一般程序进行分类。用于/代码,学习模型从实验都可以在GitHub上。该实验是跑到离2019年6月至2019年12月,我们获得比使用,而不是自我的关注,更稀疏连通图网络采用基于几何距离,注意基线模型更好的效果。

28. Deep Learning-based End-to-end Diagnosis System for Avascular Necrosis of Femoral Head [PDF]
  Yang Li, Yan Li, Hua Tian
Abstract: As the first diagnostic imaging modality of avascular necrosis of the femoral head (AVNFH), accurately staging AVNFH from a plain radiograph is critical and challenging for orthopedists. Thus, we propose a deep learning-based AVNFH diagnosis system (AVN-net). The proposed AVN-net reads plain radiographs of the pelvis, conducts diagnosis, and visualizes results automatically. Deep convolutional neural networks are trained to provide an end-to-end diagnosis solution, covering femoral head detection, exam-view/sides identification, AVNFH diagnosis, and key clinical note generation subtasks. AVN-net is able to obtain state-of-the-art testing AUC of 0.95 (95% CI: 0.92-0.98) in AVNFH detection and significantly greater F1 scores (p<0.01) than less-to-moderately experienced orthopedists in all diagnostic tests. Furthermore, two real-world pilot studies were conducted for diagnosis support and education assistance, respectively, to assess the utility of AVN-net. The experimental results are promising. With the AVN-net diagnosis as a reference, the diagnostic accuracy and consistency of all orthopedists considerably improved while requiring only 1/4 of the time. Students self-studying the AVNFH diagnosis using AVN-net can learn better and faster than the control group. To the best of our knowledge, this study is the first research on the prospective use of a deep learning-based diagnosis system for AVNFH by conducting two pilot studies representing real-world application scenarios. We have demonstrated that the proposed AVN-net achieves expert-level AVNFH diagnosis performance, provides efficient support in clinical decision-making, and effectively passes clinical experience to students.
摘要:股骨头缺血性坏死(AVNFH)的所述第一诊断成像模态,准确地从平片分级AVNFH是关键的,并且对骨科挑战。因此,我们提出了一个深刻的学习型AVNFH诊断系统(AVN-网)。所提出的AVN网自动读取骨盆,行为诊断的平片,以及可视化的结果。深卷积神经网络被训练,以提供端至端诊断溶液,覆盖股骨头检测,考试视点/边识别,AVNFH诊断,和关键临床音符生成的子任务。 AVN网能够获得国家的最先进的测试0.95的AUC(95%CI:0.92-0.98)中AVNFH检测和显著更大F1分数(P <0.01)小于到适度在所有经验骨科诊断测试。此外,两个真实世界的试点研究,诊断支持和教育协助下进行,分别评估avn网的效用。实验结果是有希望的。与avn网诊断,因为所有的骨科的基准,诊断的准确性和一致性,同时仅需要的时间的1/4大幅度地改善。学生自主学习使用avn网能够更好地学习和速度比对照组avnfh诊断。据我们所知,这研究是开展代表现实世界的应用场景的两个试点研究,在未来的使用了avnfh深基础的学习诊断系统的第一个研究。我们已经证明,所提出的avn网达到专家级avnfh诊断性能,提供了在临床决策的有效支持,并有效地传递临床经验的学生。

29. Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner [PDF]
  Yunlu Wang, Menghan Hu, Qingli Li, Xiao-Ping Zhang, Guangtao Zhai, Nan Yao
Abstract: Research significance: During the epidemic prevention and control period, our study can be helpful in prognosis, diagnosis and screening for the patients infected with COVID-19 (the novel coronavirus) based on breathing characteristics. According to the latest clinical research, the respiratory pattern of COVID-19 is different from the respiratory patterns of flu and the common cold. One significant symptom that occurs in the COVID-19 is Tachypnea. People infected with COVID-19 have more rapid respiration. Our study can be utilized to distinguish various respiratory patterns and our device can be preliminarily put to practical use. Demo videos of this method working in situations of one subject and two subjects can be downloaded online. Research details: Accurate detection of the unexpected abnormal respiratory pattern of people in a remote and unobtrusive manner has great significance. In this work, we innovatively capitalize on depth camera and deep learning to achieve this goal. The challenges in this task are twofold: the amount of real-world data is not enough for training to get the deep model; and the intra-class variation of different types of respiratory patterns is large and the outer-class variation is small. In this paper, considering the characteristics of actual respiratory signals, a novel and efficient Respiratory Simulation Model (RSM) is first proposed to fill the gap between the large amount of training data and scarce real-world data. Subsequently, we first apply a GRU neural network with bidirectional and attentional mechanisms (BI-AT-GRU) to classify 6 clinically significant respiratory patterns (Eupnea, Tachypnea, Bradypnea, Biots, Cheyne-Stokes and Central-Apnea). The proposed deep model and the modeling ideas have the great potential to be extended to large scale applications such as public places, sleep scenario, and office environment.
摘要:研究意义:在疫情防控期间,我们的研究可以在预后,有助于诊断和筛查感染基于呼吸特性COVID-19(该新型冠状病毒)的患者。根据最新的临床研究,COVID-19的呼吸模式是由流感的呼吸模式和普通感冒不同。发生在COVID-19的一个显著的症状是呼吸急促。感染COVID-19的人有更多的呼吸急促。我们的研究可以用来区分不同的呼吸模式和我们的设备可以预先投入实际使用。这种方法在一个主体和两个科目的情况下工作的演示视频可以在网上下载。研究细节:人在一个偏僻的和不显眼的方式意想不到的异常呼吸模式的准确的检测具有重要的意义。在这项工作中,我们创新性地利用深度相机和深度学习到实现这一目标。此任务中的挑战是双重的:真实世界的数据量是不够的训练得到深层模型;和不同类型的呼吸模式的类内变化较大和外级变化小。在本文中,考虑到实际的呼吸信号,一种新颖且有效的呼吸仿真模型(RSM)的特性被首次提出以填充大量的训练数据和稀缺真实世界的数据之间的间隙。随后,我们首先应用具有双向和注意力机制(BI-AT-GRU)一GRU神经网络分类6种临床显著的呼吸模式(正常呼吸,呼吸急促,Bradypnea,Biots,陈 - 施氏及中环呼吸暂停)。所提出的深层模型和建模的思想有很大的潜力可扩展到大规模应用,如公共场所,睡眠情况,以及办公环境。

30. Real or Not Real, that is the Question [PDF]
  Yuanbo Xiangli, Yubin Deng, Bo Dai, Chen Change Loy, Dahua Lin
Abstract: While generative adversarial networks (GAN) have been widely adopted in various topics, in this paper we generalize the standard GAN to a new perspective by treating realness as a random variable that can be estimated from multiple angles. In this generalized framework, referred to as RealnessGAN, the discriminator outputs a distribution as the measure of realness. While RealnessGAN shares similar theoretical guarantees with the standard GAN, it provides more insights on adversarial learning. Compared to multiple baselines, RealnessGAN provides stronger guidance for the generator, achieving improvements on both synthetic and real-world datasets. Moreover, it enables the basic DCGAN architecture to generate realistic images at 1024*1024 resolution when trained from scratch.
摘要:虽然生成对抗网络(GAN)已经在各种主题被广泛采用,在本文中,我们通过治疗真实性为可以从多个角度来估计一个随机变量概括的标准GAN到一个新的视角。在此广义框架中,被称为RealnessGAN,鉴别器输出一个分发真实性的量度。虽然RealnessGAN分享相似的理论保证与标准甘,它提供了对抗的学习更多的见解。相比于多基线,RealnessGAN提供了更强的指导发电机,实现对合成和真实世界的数据集的改进。此外,它使基本DCGAN架构在1024 * 1024分辨率从头开始训练的时候,产生逼真的图像。

31. MLFcGAN: Multi-level Feature Fusion based Conditional GAN for Underwater Image Color Correction [PDF]
  Xiaodong Liu, Zhi Gao, Ben M. Chen
Abstract: Color correction for underwater images has received increasing interest, due to its critical role in facilitating available mature vision algorithms for underwater scenarios. Inspired by the stunning success of deep convolutional neural network (DCNN) techniques in many vision tasks, especially their strength in extracting features at multiple scales, we propose a deep multi-scale feature fusion net based on the conditional generative adversarial network (GAN) for underwater image color correction. In our network, multi-scale features are extracted first, followed by augmenting local features on each scale with global features. This design was verified to facilitate more effective and faster network learning, resulting in better performance in both color correction and detail preservation. We conducted extensive experiments and compared with the state-of-the-art approaches quantitatively and qualitatively, showing that our method achieves significant improvements.
摘要:颜色校正水下图像已经受到越来越多的利益,由于在促进现有成熟的视觉算法用于水下场景中的关键作用。深卷积神经网络(DCNNs)在许多视觉任务的技术,尤其是在多个尺度提取特征的实力令人惊叹的成功的启发,我们提出了一个深刻的多尺度特征融合基础条件生成对抗网络(GAN)的净水下图像颜色校正。在我们的网络,多尺度特征首先提取,然后在全球各功能扩充规模的局部特征。这种设计进行了验证,以促进更有效和更快的网络学习,导致这两个色彩校正和细节保持更好的性能。我们进行了广泛的实验,相比定量和定性的方法,这表明我们的方法实现显著的改善最先进的国家的的。

32. Physical Accuracy of Deep Neural Networks for 2D and 3D Multi-Mineral Segmentation of Rock micro-CT Images [PDF]
  Ying Da Wang, Mehdi Shabaninejad, Ryan T. Armstrong, Peyman Mostaghimi
Abstract: Segmentation of 3D micro-Computed Tomographic (µCT) images of rock samples is essential for further Digital Rock Physics (DRP) analysis; however, conventional methods such as thresholding, watershed segmentation, and converging active contours are susceptible to user bias. Deep Convolutional Neural Networks (CNNs) have produced accurate pixelwise semantic segmentation results with natural images and µCT rock images; however, physical accuracy is not well documented. The performance of 4 CNN architectures is tested for 2D and 3D cases in 10 configurations. Manually segmented µCT images of Mt. Simon Sandstone are treated as ground truth and used as training and validation data, with a high voxelwise accuracy (over 99%) achieved. Downstream analysis is then used to validate physical accuracy. The topology of each segmented phase is calculated, and the absolute permeability and multiphase flow are modelled with direct simulation in single and mixed wetting cases. These physical measures of connectivity and flow characteristics show high variance and uncertainty, with models that achieve 95%+ in voxelwise accuracy possessing permeabilities and connectivities orders of magnitude off. A new network architecture is also introduced as a hybrid fusion of U-net and ResNet, combining short and long skip connections in a Network-in-Network configuration. The 3D implementation outperforms all other tested models in voxelwise and physical accuracy measures. The network architecture and the volume fraction in the dataset (and associated weighting) are factors that not only influence the accuracy trade-off in the voxelwise case but are especially important in training a physically accurate model for segmentation.
摘要:3D的分割微计算机断层UCT)岩石样品的图像是用于进一步数字岩石物理(DRP)分析必不可少的,然而,常规方法如阈值,分水岭分割,并会聚主动轮廓很容易受到用户偏置。深卷积神经网络(细胞神经网络)已经产生与自然图像和$ \亩$ CT岩石的图像,然而,物理精度不会有据可查的准确按像素语义分割结果。的4 CNN架构性能为2D和3D的情况下在10个配置测试。人工分割山UCT图片西蒙砂岩被视为基础事实和用作训练和验证数据,以实现高的精度voxelwise(超过99%)。然后向下游分析用于验证物理精度。每个分段的相位的拓扑计算,并且绝对渗透率和多相流建模与单一和混合润湿的情况下直接模拟。连接的这些物理措施和流动特性表现出较大差异性和不确定性,与在voxelwise准确性拥有幅度的渗透性和连通性关闭订单达到95 \%+车型。一个新的网络架构也被引入作为U型网和RESNET,结合短期和长期跳过在以网络为在网络配置连接的杂合融合。三维实现优于所有其他测试车型voxelwise和物理精度的措施。的网络体系结构和在数据集(和相关联的权重)的体积分数,是不仅影响精度的折衷在voxelwise情况下因素,但是在训练物理上精确模型分割尤其重要。

33. A Provably Robust Multiple Rotation Averaging Scheme for SO(2) [PDF]
  Tyler Maunu, Gilad Lerman
Abstract: We give adversarial robustness results for synchronization on the rotation group over $\mathbb{R}^2$, $\mathrm{SO}(2)$. In particular, we consider an adversarial corruption setting, where an adversary can choose which measurements to corrupt as well as what to corrupt them to. In this setting, we first show that some common nonconvex formulations, which are categorized as "multiple rotation averaging", may fail. We then discuss a new fast algorithm, called Trimmed Averaging Synchronization, which has exact recovery and linear convergence up to an outlier percentage of $1/4$.
摘要:给予超过$ \ mathbb {R} ^ 2,$ \ mathrm {SO}(2)$上旋转组同步对抗性鲁棒性的结果。特别是,我们考虑一个对抗性腐败的设置,其中一个对手可以选择测量腐败是什么,以及腐败他们。在这种背景下,我们首先表明,一些常见的非凸制剂,其被归类为“多回转平均”,可能会失败。然后,我们讨论了一个新的快速算法,称为修剪平均化同步,其中有确切的恢复和线性收敛高达$ 1/4 $离群值百分比。

34. Geom-GCN: Geometric Graph Convolutional Networks [PDF]
  Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, Bo Yang
Abstract: Message-passing neural networks (MPNNs) have been successfully applied to representation learning on graphs in a variety of real-world applications. However, two fundamental weaknesses of MPNNs' aggregators limit their ability to represent graph-structured data: losing the structural information of nodes in neighborhoods and lacking the ability to capture long-range dependencies in disassortative graphs. Few studies have noticed the weaknesses from different perspectives. From the observations on classical neural network and network geometry, we propose a novel geometric aggregation scheme for graph neural networks to overcome the two weaknesses. The basic idea behind it is that the aggregation on a graph can benefit from a continuous space underlying the graph. The proposed aggregation scheme is permutation-invariant and consists of three modules: node embedding, structural neighborhood, and bi-level aggregation. We also present an implementation of the scheme in graph convolutional networks, termed Geom-GCN, to perform transductive learning on graphs. Experimental results show the proposed Geom-GCN achieved state-of-the-art performance on a wide range of open datasets of graphs. Code is available at this https URL.
摘要:消息传递神经网络(MPNNs)已经在各种实际应用中已成功地应用于表示学习上的图表。然而,MPNNs'聚合的两个根本性的弱点限制了他们的代表图结构数据的能力:失去街区节点的结构信息,缺乏捕捉到远距离的依赖于异配图的能力。很少有研究发现从不同的角度弱点。从经典的神经网络和网络上的几何形状的观察结果,我们提出了图形神经网络克服了两个弱点新颖的几何集成方案。后面的基本思想是在曲线图上的聚合可以从一个连续的空间图形底层受益。建议的聚合方案是排列不变和由三个模块组成,节点嵌入,结构附近,以及两级聚集。我们还提出在图形卷积网络计划的实施,被称为GeomGCN,对图形进行式学习。实验结果表明,所提出的Geom-GCN上大量的图形的开放数据集的实现状态的最先进的性能。代码可在此HTTPS URL。

35. HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models [PDF]
  Qianwen Wang, William Alexander, Jack Pegg, Huamin Qu, Min Chen
Abstract: In this paper, we present a visual analytics tool for enabling hypothesis-based evaluation of machine learning (ML) models. We describe a novel ML-testing framework that combines the traditional statistical hypothesis testing (commonly used in empirical research) with logical reasoning about the conclusions of multiple hypotheses. The framework defines a controlled configuration for testing a number of hypotheses as to whether and how some extra information about a "concept" or "feature" may benefit or hinder an ML model. Because reasoning about multiple hypotheses is not always straightforward, we provide HypoML as a visual analysis tool, with which the multi-thread testing data is transformed to a visual representation for rapid observation of the conclusions and the logical flow between the testing data and hypotheses. We have applied HypoML to a number of hypothesized concepts, demonstrating the intuitive and explainable nature of the visual analysis.
摘要:在本文中,我们提出了实现的机器学习(ML)模型基于假设的评估可视化分析工具。我们描述了一种新ML-测试框架,结合有关的多个假设的结论,逻辑推理传统的统计假设检验(在实证研究常用)。该框架定义了用于测试多个假设是否以及如何对一个“概念”或“功能”一些额外的信息可能会受益或阻碍ML模型控制的配置。因为推理多个假设并不总是简单的,我们提供HypoML作为视觉分析工具,与其中,多线程测试数据被变换为结论的快速观察和测试数据和假设之间的逻辑流程的可视化表示。我们应用HypoML到多个虚拟的概念,展示了可视化分析的直观解释的性质。

36. Graph Similarity Using PageRank and Persistent Homology [PDF]
  Mustafa Hajij, Elizabeth Munch, Paul Rosen
Abstract: The PageRank of a graph is a scalar function defined on the node set of the graph which encodes node centrality information of the graph. In this work, we utilize the PageRank function on the lower-star filtration of the graph as input to persistent homology to study the problem of graph similarity. By representing each graph as a persistence diagram, we can then compare outputs using the bottleneck distance. We show the effectiveness of our method by utilizing it on two shape mesh datasets.
摘要:一个图的PageRank是在节点集的图表,其编码节点的曲线图的中心性信息的定义的标量函数。在这项工作中,我们利用图表上的输入,持续的同源性研究图形的相似问题的低星级过滤的PageRank功能。由表示每个图形作为持久图,我们可以然后比较输出使用所述瓶颈距离。我们利用这两个状的网数据集显示了该方法的有效性。
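As background for the scalar function the abstract starts from, the following is a minimal power-iteration sketch of PageRank on a small graph stored as an adjacency dictionary. The damping factor of 0.85, the fixed iteration count, and the function name are assumptions for illustration; the lower-star filtration, persistent homology, and bottleneck distance steps are not implemented here.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    `adjacency` maps each node to the list of nodes it links to; an undirected
    graph is represented by listing each edge in both directions.
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        updated = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if neighbours:
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    updated[u] += share
            else:
                # A node without outgoing links spreads its rank uniformly.
                for u in nodes:
                    updated[u] += damping * rank[v] / n
        rank = updated
    return rank


# Star graph: the centre "a" should receive the highest score.
star = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
scores = pagerank(star)
```

The resulting per-node scores are the kind of scalar function that a filtration can then be built on, with the values determining the order in which nodes and edges enter.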

Note: the Chinese text is machine-translated.