[arXiv Papers] Computer Vision and Pattern Recognition 2020-05-12

Contents

1. Using Computer Vision to enhance Safety of Workforce in Manufacturing in a Post COVID World [PDF] Abstract
2. Normalized Convolutional Neural Network [PDF] Abstract
3. Deep-Learning-based Automated Palm Tree Counting and Geolocation in Large Farms from Aerial Geotagged Images [PDF] Abstract
4. On the Transferability of Winning Tickets in Non-Natural Image Datasets [PDF] Abstract
5. Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence [PDF] Abstract
6. Reference Pose Generation for Visual Localization via Learned Features and View Synthesis [PDF] Abstract
7. FroDO: From Detections to 3D Objects [PDF] Abstract
8. Fine-Grained Visual Classification with Efficient End-to-end Localization [PDF] Abstract
9. HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment [PDF] Abstract
10. Quantitative Analysis of Image Classification Techniques for Memory-Constrained Devices [PDF] Abstract
11. Prototypical Contrastive Learning of Unsupervised Representations [PDF] Abstract
12. Fake Face Detection via Adaptive Residuals Extraction Network [PDF] Abstract
13. Conditional Image Generation and Manipulation for User-Specified Content [PDF] Abstract
14. An Inductive Transfer Learning Approach using Cycle-consistent Adversarial Domain Adaptation with Application to Brain Tumor Segmentation [PDF] Abstract
15. Celeganser: Automated Analysis of Nematode Morphology and Age [PDF] Abstract
16. Scope Head for Accurate Localization in Object Detection [PDF] Abstract
17. Non-iterative Simultaneous Rigid Registration Method for Serial Sections of Biological Tissue [PDF] Abstract
18. Learning Descriptors Invariance Through Equivalence Relations Within Manifold: A New Approach to Expression Invariant 3D Face Recognition [PDF] Abstract
19. The Visual Social Distancing Problem [PDF] Abstract
20. A numerical method to estimate uncertainty in non-rigid structure from motion [PDF] Abstract
21. Photometric Multi-View Mesh Refinement for High-Resolution Satellite Images [PDF] Abstract
22. A Simple Semi-Supervised Learning Framework for Object Detection [PDF] Abstract
23. A Generalized Kernel Risk Sensitive Loss for Robust Two-Dimensional Singular Value Decomposition [PDF] Abstract
24. Domain Adaptation for Image Dehazing [PDF] Abstract
25. A Simple and Scalable Shape Representation for 3D Reconstruction [PDF] Abstract
26. A Comparison of Few-Shot Learning Methods for Underwater Optical and Sonar Image Classification [PDF] Abstract
27. A Unified Weight Learning and Low-Rank Regression Model for Robust Face Recognition [PDF] Abstract
28. MOMBAT: Heart Rate Monitoring from Face Video using Pulse Modeling and Bayesian Tracking [PDF] Abstract
29. Variational Clustering: Leveraging Variational Autoencoders for Image Clustering [PDF] Abstract
30. Robust Tensor Decomposition for Image Representation Based on Generalized Correntropy [PDF] Abstract
31. Class-Aware Domain Adaptation for Improving Adversarial Robustness [PDF] Abstract
32. Compact Neural Representation Using Attentive Network Pruning [PDF] Abstract
33. Epipolar Transformers [PDF] Abstract
34. A Robust Matching Pursuit Algorithm Using Information Theoretic Learning [PDF] Abstract
35. Generative Model-driven Structure Aligning Discriminative Embeddings for Transductive Zero-shot Learning [PDF] Abstract
36. Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events [PDF] Abstract
37. Visually Impaired Aid using Convolutional Neural Networks, Transfer Learning, and Particle Competition and Cooperation [PDF] Abstract
38. Vehicle Re-Identification Based on Complementary Features [PDF] Abstract
39. Understanding Dynamic Scenes using Graph Convolution Networks [PDF] Abstract
40. Memory-Augmented Relation Network for Few-Shot Learning [PDF] Abstract
41. High Resolution Face Age Editing [PDF] Abstract
42. Photo style transfer with consistency losses [PDF] Abstract
43. A Weighted Difference of Anisotropic and Isotropic Total Variation for Relaxed Mumford-Shah Color and Multiphase Image Segmentation [PDF] Abstract
44. ICE-GAN: Identity-aware and Capsule-Enhanced GAN for Micro-Expression Recognition and Synthesis [PDF] Abstract
45. Attentional Bottleneck: Towards an Interpretable Deep Driving Network [PDF] Abstract
46. VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [PDF] Abstract
47. View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors [PDF] Abstract
48. STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction [PDF] Abstract
49. Fundus2Angio: A Novel Conditional GAN Architecture for Generating Fluorescein Angiography Images from Retinal Fundus Photography [PDF] Abstract
50. iUNets: Fully invertible U-Nets with Learnable Up- and Downsampling [PDF] Abstract
51. Medical Image Segmentation Using a U-Net type of Architecture [PDF] Abstract
52. A Contrast-Adaptive Method for Simultaneous Whole-Brain and Lesion Segmentation in Multiple Sclerosis [PDF] Abstract
53. A New Computer-Aided Diagnosis System with Modified Genetic Feature Selection for BI-RADS Classification of Breast Masses in Mammograms [PDF] Abstract
54. Autonomous Tissue Scanning under Free-Form Motion for Intraoperative Tissue Characterisation [PDF] Abstract
55. Deep Reinforcement Learning for Organ Localization in CT [PDF] Abstract
56. Learning to hash with semantic similarity metrics and empirical KL divergence [PDF] Abstract
57. Gleason Score Prediction using Deep Learning in Tissue Microarray Image [PDF] Abstract
58. The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [PDF] Abstract
59. Hierarchical Regression Network for Spectral Reconstruction from RGB Images [PDF] Abstract
60. Segmentation of Macular Edema Datasets with Small Residual 3D U-Net Architectures [PDF] Abstract
61. Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning [PDF] Abstract
62. Learning Context-Based Non-local Entropy Modeling for Image Compression [PDF] Abstract
63. BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps [PDF] Abstract
64. Duality in Persistent Homology of Images [PDF] Abstract
65. A Hybrid Swarm and Gravitation based feature selection algorithm for Handwritten Indic Script Classification problem [PDF] Abstract
66. An Integrated Enhancement Solution for 24-hour Colorful Imaging [PDF] Abstract
67. A Survey on Deep Learning for Neuroimaging-based Brain Disorder Analysis [PDF] Abstract
68. Non-recurrent Traffic Congestion Detection with a Coupled Scalable Bayesian Robust Tensor Factorization Model [PDF] Abstract
69. Efficient Privacy Preserving Edge Computing Framework for Image Classification [PDF] Abstract
70. Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications [PDF] Abstract
71. Comment on "No-Reference Video Quality Assessment Based on the Temporal Pooling of Deep Features" [PDF] Abstract
72. Enhancing LGMD's Looming Selectivity for UAVs with Spatial-temporal Distributed Presynaptic Connection [PDF] Abstract
73. AutoCLINT: The Winning Method in AutoCV Challenge 2019 [PDF] Abstract
74. An Investigation of Why Overparameterization Exacerbates Spurious Correlations [PDF] Abstract
75. ST-MNIST -- The Spiking Tactile MNIST Neuromorphic Dataset [PDF] Abstract
76. Progressive Adversarial Semantic Segmentation [PDF] Abstract
77. Measuring the Algorithmic Efficiency of Neural Networks [PDF] Abstract
78. Deep Residual Network based food recognition for enhanced Augmented Reality application [PDF] Abstract

Abstracts

1. Using Computer Vision to enhance Safety of Workforce in Manufacturing in a Post COVID World [PDF] Back to contents
  Prateek Khandelwal, Anuj Khandelwal, Snigdha Agarwal
Abstract: The COVID-19 pandemic forced governments across the world to impose lockdowns to prevent virus transmissions. This resulted in the shutdown of all economic activity and accordingly the production at manufacturing plants across most sectors was halted. While there is an urgency to resume production, there is an even greater need to ensure the safety of the workforce at the plant site. Reports indicate that maintaining social distancing and wearing face masks while at work clearly reduces the risk of transmission. We decided to use computer vision on CCTV feeds to monitor worker activity and detect violations which trigger real time voice alerts on the shop floor. This paper describes an efficient and economic approach of using AI to create a safe environment in a manufacturing setup. We demonstrate our approach to build a robust social distancing measurement algorithm using a mix of modern-day deep learning and classic projective geometry techniques. We have also described our face mask detection approach which provides a high accuracy across a range of customized masks.
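
The combination of detection and projective geometry mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a person detector has already produced foot positions in pixels and that a 3x3 homography `H` from the image to the factory floor is known from calibration; the function names, the 2 m threshold, and the toy calibration are hypothetical and not taken from the paper.

```python
import numpy as np

def to_ground_plane(points_px, H):
    """Map image points (N, 2) to ground-plane coordinates with a 3x3 homography H."""
    pts = np.asarray(points_px, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) homogeneous coordinates
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the projective scale

def distancing_violations(feet_px, H, min_dist_m=2.0):
    """Return index pairs (i, j) of people closer than min_dist_m on the ground plane."""
    ground = to_ground_plane(feet_px, H)
    diffs = ground[:, None, :] - ground[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # pairwise distances in metres
    i, j = np.triu_indices(len(ground), k=1)  # each unordered pair once
    close = dists[i, j] < min_dist_m
    return [(int(a), int(b)) for a, b in zip(i[close], j[close])]

# Hypothetical calibration: 100 px corresponds to 1 m, no perspective distortion.
H = np.diag([0.01, 0.01, 1.0])
feet = [(100, 100), (200, 100), (900, 100)]
violations = distancing_violations(feet, H)
```

Measuring distances after the homography rather than in pixels is what makes the threshold independent of where people stand relative to the camera.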

2. Normalized Convolutional Neural Network [PDF] Back to contents
  Dongsuk Kim, Geonhee Lee
Abstract: In this paper, we propose the Normalized Convolutional Neural Network (NCNN). NCNN is more adaptive to the convolutional operator than other normalization methods. The normalization process is similar to that of other normalization methods, but NCNN is more adaptive to sliced inputs and the corresponding convolutional kernel. Because NCNN is more adaptive to sliced inputs, it can be targeted at micro-batch training. The normalization in NC is conducted during the convolution process. In short, the NC process is not a usual normalization and cannot be realized in deep learning frameworks that optimize the standard convolution process. Hence we named this method 'Normalized Convolution'. As a result, the NC process has a universal property, which means NC can be applied to any AI task involving a convolutional neural layer. Since NC does not need another normalization layer, NCNN looks like a convolutional version of the Self-Normalizing Network (SNN). In micro-batch training, NCNN outperforms other batch-independent normalization methods. NCNN achieves this superiority by standardizing the rows of the im2col matrix of the inputs, which theoretically smooths the gradient of the loss. The code needs to manipulate standard convolutional neural networks step by step. The code is available: this https URL NormalizedCNN.
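
A minimal NumPy sketch of the idea of standardizing the rows of the im2col matrix follows. It is an illustration under assumptions, not the authors' implementation: it treats one row as one flattened receptive-field patch, uses stride 1 without padding, and the helper names are hypothetical.

```python
import numpy as np

def im2col(x, k):
    """Unfold a (C, H, W) input into one row per k x k patch (stride 1, no padding)."""
    c, h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    rows = [x[:, i:i + k, j:j + k].ravel()
            for i in range(out_h) for j in range(out_w)]
    return np.stack(rows), (out_h, out_w)

def normalized_conv(x, weight, eps=1e-5):
    """Convolution in which every input patch is standardized before the kernel is applied."""
    out_c, in_c, k, _ = weight.shape
    cols, (out_h, out_w) = im2col(x, k)
    mean = cols.mean(axis=1, keepdims=True)   # per-patch mean
    std = cols.std(axis=1, keepdims=True)     # per-patch standard deviation
    cols = (cols - mean) / (std + eps)
    out = cols @ weight.reshape(out_c, -1).T  # (num_patches, out_c)
    return out.T.reshape(out_c, out_h, out_w)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))
w = rng.normal(size=(4, 3, 3, 3))
y = normalized_conv(x, w)  # shape (4, 6, 6)
```

Because the statistics are computed per patch, they do not depend on the batch size, which is consistent with the abstract's focus on micro-batch training.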

3. Deep-Learning-based Automated Palm Tree Counting and Geolocation in Large Farms from Aerial Geotagged Images [PDF] Back to contents
  Adel Ammar, Anis Koubaa
Abstract: In this paper, we propose a deep learning framework for the automated counting and geolocation of palm trees from aerial images using convolutional neural networks. For this purpose, we collected aerial images in a palm tree farm in the Kharj region, in Riyadh, Saudi Arabia, using DJI drones, and we built a dataset of around 10,000 instances of palm trees. Then, we developed a convolutional neural network model using the state-of-the-art Faster R-CNN algorithm. Furthermore, using the geotagged metadata of the aerial images, we used photogrammetry concepts and distance corrections to automatically detect the geographical location of the detected palm trees. This geolocation technique was tested on two different types of drones (DJI Mavic Pro and Phantom 4 Pro), and was assessed to provide an average geolocation accuracy of 2.8 m. This GPS tagging allows us to uniquely identify palm trees and count their number from a series of drone images, while correctly dealing with the issue of image overlapping. Moreover, it can be generalized to the geolocation of any other objects in UAV images.
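
As a rough illustration of how a detection in a geotagged image can be turned into a GPS position, the sketch below uses the ground sampling distance and a small-offset conversion from metres to degrees. It assumes a camera pointing straight down, flat terrain, and a heading measured clockwise from north; the function name and all camera parameters are hypothetical, and the paper's own distance corrections are not reproduced here.

```python
import math

def pixel_to_latlon(px, py, img_w, img_h, lat, lon, altitude_m,
                    focal_mm, sensor_w_mm, yaw_deg=0.0):
    """Estimate the GPS position of pixel (px, py) in a nadir drone image.

    (lat, lon) is the geotag of the image centre; yaw_deg is the heading of the
    image's "up" direction, clockwise from north.
    """
    # Ground sampling distance: metres on the ground covered by one pixel.
    gsd = (sensor_w_mm * altitude_m) / (focal_mm * img_w)
    # Offset from the image centre in metres; image y grows downward.
    right_m = (px - img_w / 2) * gsd
    up_m = (img_h / 2 - py) * gsd
    # Rotate the image-frame offset into east/north components.
    yaw = math.radians(yaw_deg)
    east = right_m * math.cos(yaw) + up_m * math.sin(yaw)
    north = -right_m * math.sin(yaw) + up_m * math.cos(yaw)
    # Small-offset approximation: about 111,320 m per degree of latitude.
    m_per_deg = 111_320.0
    dlat = north / m_per_deg
    dlon = east / (m_per_deg * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

# Hypothetical camera: 4000 x 3000 px, 10 mm focal length, 13 mm sensor width, 100 m altitude.
tree_lat, tree_lon = pixel_to_latlon(2000, 500, 4000, 3000, 24.0, 47.0, 100.0, 10.0, 13.0)
```

Two detections of the same tree in overlapping images should map to nearly the same coordinates, which is how such tagging can avoid double counting.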

4. On the Transferability of Winning Tickets in Non-Natural Image Datasets [PDF] Back to contents
  Matthia Sabatelli, Mike Kestemont, Pierre Geurts
Abstract: We study the generalization properties of pruned neural networks that are the winners of the lottery ticket hypothesis on datasets of natural images. We analyse their potential under conditions in which training data is scarce and comes from a non-natural domain. Specifically, we investigate whether pruned models that are found on the popular CIFAR-10/100 and Fashion-MNIST datasets, generalize to seven different datasets that come from the fields of digital pathology and digital heritage. Our results show that there are significant benefits in transferring and training sparse architectures over larger parametrized models, since in all of our experiments pruned networks, winners of the lottery ticket hypothesis, significantly outperform their larger unpruned counterparts. These results suggest that winning initializations do contain inductive biases that are generic to some extent, although, as reported by our experiments on the biomedical datasets, their generalization properties can be more limiting than what has been so far observed in the literature.

5. Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence [PDF] Back to contents
  Junsoo Lee, Eungyeup Kim, Yunsung Lee, Dongjun Kim, Jaehyuk Chang, Jaegul Choo
Abstract: This paper tackles the automatic colorization task of a sketch image given an already-colored reference image. Colorizing a sketch image is in high demand in comics, animation, and other content creation applications, but it suffers from information scarcity of a sketch image. To address this, a reference image can render the colorization process in a reliable and user-driven manner. However, it is difficult to prepare for a training data set that has a sufficient amount of semantically meaningful pairs of images as well as the ground truth for a colored image reflecting a given reference (e.g., coloring a sketch of an originally blue car given a reference green car). To tackle this challenge, we propose to utilize the identical image with geometric distortion as a virtual reference, which makes it possible to secure the ground truth for a colored output image. Furthermore, it naturally provides the ground truth for dense semantic correspondence, which we utilize in our internal attention mechanism for color transfer from reference to sketch input. We demonstrate the effectiveness of our approach in various types of sketch image colorization via quantitative as well as qualitative evaluation against existing methods.

6. Reference Pose Generation for Visual Localization via Learned Features and View Synthesis [PDF] Back to contents
  Zichao Zhang, Torsten Sattler, Davide Scaramuzza
Abstract: Visual Localization is one of the key enabling technologies for autonomous driving and augmented reality. High quality datasets with accurate 6 Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and improving existing methods. Traditionally, reference poses have been obtained via Structure-from-Motion (SfM). However, SfM itself relies on local features which are prone to fail when images were taken under different conditions, e.g., day/night changes. At the same time, manually annotating feature correspondences is not scalable and potentially inaccurate. In this work, we propose a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features. Given an initial pose estimate, our approach iteratively refines the pose based on feature matches against a rendering of the model from the current pose estimate. We significantly improve the nighttime reference poses of the popular Aachen Day-Night dataset, showing that state-of-the-art visual localization methods perform better (up to 47%) than predicted by the original reference poses. We extend the dataset with new nighttime test images, provide uncertainty estimates for our new reference poses, and introduce a new evaluation criterion. We will make our reference poses and our framework publicly available upon publication.

7. FroDO: From Detections to 3D Objects [PDF] Back to contents
  Kejie Li, Martin Rünz, Meng Tang, Lingni Ma, Chen Kong, Tanner Schmidt, Ian Reid, Lourdes Agapito, Julian Straub, Steven Lovegrove, Richard Newcombe
Abstract: Object-oriented maps are important for scene understanding since they jointly capture geometry and semantics, allow individual instantiation and meaningful reasoning about objects. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object shapes in a novel learnt space that allows seamless switching between sparse point cloud and dense DeepSDF decoding. Given an input sequence of localized RGB frames, FroDO first aggregates 2D detections to instantiate a category-aware 3D bounding box per object. A shape code is regressed using an encoder network before optimizing shape and pose further under the learnt shape priors using sparse and dense shape representations. The optimization uses multi-view geometric, photometric and silhouette losses. We evaluate on real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view, multi-view, and multi-object reconstruction.

8. Fine-Grained Visual Classification with Efficient End-to-end Localization [PDF] Back to contents
  Harald Hanselmann, Hermann Ney
Abstract: The term fine-grained visual classification (FGVC) refers to classification tasks where the classes are very similar and the classification model needs to be able to find subtle differences to make the correct prediction. State-of-the-art approaches often include a localization step designed to help a classification network by localizing the relevant parts of the input images. However, this usually requires multiple iterations or passes through a full classification network or complex training schedules. In this work we present an efficient localization module that can be fused with a classification network in an end-to-end setup. On the one hand the module is trained by the gradient flowing back from the classification network. On the other hand, two self-supervised loss functions are introduced to increase the localization accuracy. We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft and are able to achieve competitive recognition performance.

9. HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment [PDF] Back to contents
  Lingbo Yang, Chang Liu, Pan Wang, Shanshe Wang, Peiran Ren, Siwei Ma, Wen Gao
Abstract: Existing face restoration research typically relies on either a degradation prior or explicit guidance labels for training, which often results in limited generalization ability over real-world images with heterogeneous degradations and rich background contents. In this paper, we investigate the more challenging and practical "dual-blind" version of the problem by lifting the requirements on both types of prior, termed "Face Renovation" (FR). Specifically, we formulate FR as a semantic-guided generation problem and tackle it with a collaborative suppression and replenishment (CSR) approach. This leads to HiFaceGAN, a multi-stage framework containing several nested CSR units that progressively replenish facial details based on the hierarchical semantic guidance extracted from the front-end content-adaptive suppression modules. Extensive experiments on both synthetic and real face images have verified the superior performance of HiFaceGAN over a wide range of challenging restoration subtasks, demonstrating its versatility, robustness and generalization ability towards real-world face processing applications.

10. Quantitative Analysis of Image Classification Techniques for Memory-Constrained Devices [PDF] Back to contents
  Sebastian Müksch, Theo Olausson, John Wilhelm, Pavlos Andreadis
Abstract: Convolutional Neural Networks, or CNNs, are undoubtedly the state of the art for image classification. However, they typically come with the cost of a large memory footprint. Recently, there has been significant progress in the field of image classification on memory-constrained devices, such as Arduino Unos, with novel contributions like the ProtoNN, Bonsai and FastGRNN models. These methods have been shown to perform excellently on tasks such as speech recognition or optical character recognition using MNIST, but their potential on more complex, multi-channel and multi-class image classification has yet to be determined. This paper presents a comprehensive analysis that shows that even in memory-constrained environments, CNNs implemented memory-optimally using Direct Convolutions outperform ProtoNN, Bonsai and FastGRNN models on 3-channel image classification using CIFAR-10. For our analysis, we propose new methods of adjusting the FastGRNN model to work with multi-channel images and then evaluate each algorithm with a memory size budget of 8KB, 16KB, 32KB, 64KB and 128KB to show quantitatively that CNNs are still state-of-the-art in image classification, even when memory size is constrained.

11. Prototypical Contrastive Learning of Unsupervised Representations [PDF] Back to contents
  Junnan Li, Pan Zhou, Caiming Xiong, Richard Socher, Steven C.H. Hoi
Abstract: This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of the popular instance-wise contrastive learning. PCL implicitly encodes semantic structures of the data into the learned embedding space, and prevents the network from solely relying on low-level cues for solving unsupervised learning tasks. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in an Expectation-Maximization framework. We iteratively perform E-step as finding the distribution of prototypes via clustering and M-step as optimizing the network via contrastive learning. We propose ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning by encouraging representations to be closer to their assigned prototypes. PCL achieves state-of-the-art results on multiple unsupervised representation learning benchmarks, with >10% accuracy improvement in low-resource transfer tasks.
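
The notion of pulling each representation toward its assigned prototype can be sketched with an InfoNCE-style cross-entropy over similarities. This is a simplified illustration, not the paper's ProtoNCE loss: it uses a single fixed temperature, takes the prototypes as given rather than obtained by clustering, and replaces the E-step with a nearest-prototype assignment; `nce_loss` is a hypothetical helper name.

```python
import numpy as np

def nce_loss(features, targets, assignments, temperature=0.1):
    """Cross-entropy over cosine similarities between features and candidate targets.

    features:    (N, D) embeddings
    targets:     (K, D) candidates (other views for InfoNCE, prototypes for a prototype term)
    assignments: (N,) index of the positive target for each feature
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = f @ t.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(f)), assignments].mean()

rng = np.random.default_rng(0)
prototypes = np.array([[5.0, 0.0], [0.0, 5.0]])  # hypothetical cluster centres
labels = np.array([0, 0, 1, 1])
feats = prototypes[labels] + 0.1 * rng.normal(size=(4, 2))

# Stand-in for the E-step: assign each feature to its most similar prototype.
assign = np.argmax(feats @ prototypes.T, axis=1)
# M-step objective: the loss is low when features sit near their assigned prototype.
loss_good = nce_loss(feats, prototypes, assign)
loss_bad = nce_loss(feats, prototypes, 1 - assign)
```

In a full training loop the two steps would alternate: recompute the prototypes and assignments, then update the encoder to reduce the loss.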

12. Fake Face Detection via Adaptive Residuals Extraction Network [PDF] Back to contents
  Zhiqing Guo, Gaobo Yang, Jiyou Chen, Xingming Sun
Abstract: With the proliferation of face image manipulation (FIM) techniques such as Face2Face and Deepfake, more fake face images are spreading over the internet, which brings serious challenges to public confidence. Face image forgery detection has made considerable progress in exposing specific FIM techniques, but there is still a lack of a robust fake face detector that exposes face image forgeries under complex scenarios. Due to its relatively fixed structure, a convolutional neural network (CNN) tends to learn image content representations. However, a CNN should learn subtle tampering artifacts for image forensics tasks. We propose an adaptive residuals extraction network (AREN), which serves as pre-processing to suppress image content and highlight tampering artifacts. AREN exploits an adaptive convolution layer to predict image residuals, which are reused in subsequent layers to maximize manipulation artifacts by updating weights during the back-propagation pass. A fake face detector, namely ARENnet, is constructed by integrating AREN with a CNN. Experimental results show that the proposed AREN achieves desirable pre-processing. When detecting fake face images generated by various FIM techniques, ARENnet achieves an average accuracy of up to 98.52%, which outperforms state-of-the-art methods. When detecting face images with unknown post-processing operations, the detector also achieves an average accuracy of 95.17%.

13. Conditional Image Generation and Manipulation for User-Specified Content [PDF] Back to contents
  David Stap, Maurits Bleeker, Sarah Ibrahimi, Maartje ter Hoeve
Abstract: In recent years, Generative Adversarial Networks (GANs) have improved steadily towards generating increasingly impressive real-world images. It is useful to steer the image generation process for purposes such as content creation. This can be done by conditioning the model on additional information. However, when conditioning on additional information, there still exists a large set of images that agree with a particular conditioning. This makes it unlikely that the generated image is exactly as envisioned by a user, which is problematic for practical content creation scenarios such as generating facial composites or stock photos. To solve this problem, we propose a single pipeline for text-to-image generation and manipulation. In the first part of our pipeline we introduce textStyleGAN, a model that is conditioned on text. In the second part of our pipeline we make use of the pre-trained weights of textStyleGAN to perform semantic facial image manipulation. The approach works by finding semantic directions in latent space. We show that this method can be used to manipulate facial images for a wide range of attributes. Finally, we introduce the CelebTD-HQ dataset, an extension to CelebA-HQ, consisting of faces and corresponding textual descriptions.

14. An Inductive Transfer Learning Approach using Cycle-consistent Adversarial Domain Adaptation with Application to Brain Tumor Segmentation [PDF] Back to contents
  Yuta Tokuoka, Shuji Suzuki, Yohei Sugawara
Abstract: With recent advances in supervised machine learning for medical image analysis applications, the annotated medical image datasets of various domains are being shared extensively. Given that the annotation labelling requires medical expertise, such labels should be applied to as many learning tasks as possible. However, the multi-modal nature of each annotated image renders it difficult to share the annotation label among diverse tasks. In this work, we provide an inductive transfer learning (ITL) approach to adopt the annotation label of the source domain datasets to tasks of the target domain datasets using Cycle-GAN based unsupervised domain adaptation (UDA). To evaluate the applicability of the ITL approach, we adopted the brain tissue annotation label on the source domain dataset of Magnetic Resonance Imaging (MRI) images to the task of brain tumor segmentation on the target domain dataset of MRI. The results confirm that the segmentation accuracy of brain tumor segmentation improved significantly. The proposed ITL approach can make significant contribution to the field of medical image analysis, as we develop a fundamental tool to improve and promote various tasks using medical images.

15. Celeganser: Automated Analysis of Nematode Morphology and Age [PDF] Back to contents
  Linfeng Wang, Shu Kong, Zachary Pincus, Charless Fowlkes
Abstract: The nematode Caenorhabditis elegans (C. elegans) serves as an important model organism in a wide variety of biological studies. In this paper we introduce a pipeline for automated analysis of C. elegans imagery for the purpose of studying life-span, health-span and the underlying genetic determinants of aging. Our system detects and segments the worm, and predicts body coordinates at each pixel location inside the worm. These coordinates provide dense correspondence across individual animals to allow for meaningful comparative analysis. We show that a model pre-trained to perform body-coordinate regression extracts rich features that can be used to predict the age of individual worms with high accuracy. This lays the ground for future research in quantifying the relation between organs' physiologic and biochemical state, and individual life/health-span.

16. Scope Head for Accurate Localization in Object Detection [PDF] Back to contents
  Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang
Abstract: Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance. However, they still face design difficulty in hand-crafted 2D anchor definition and learning complexity in 1D direct location regression. To tackle these issues, in this paper we propose a novel detector, coined ScopeNet, which models the anchors of each location as a mutually dependent relationship. This approach quantises the prediction space and employs a coarse-to-fine strategy for localisation. It achieves superior flexibility, as in the regression-based anchor-free methods, while producing more precise predictions. In addition, an inherent anchor selection score is learned to indicate the localisation quality of the detection result, and we propose to better represent the confidence of a detection box by combining the category-classification score and the anchor-selection score. With our concise and effective design, the proposed ScopeNet achieves state-of-the-art results on COCO

17. Non-iterative Simultaneous Rigid Registration Method for Serial Sections of Biological Tissue [PDF] Back to contents
  Chang Shu, Xi Chen, Qiwei Xie, Chi Xiao, Hua Han
Abstract: In this paper, we propose a novel non-iterative algorithm to simultaneously estimate the optimal rigid transformations for serial section images, which is a key component in volume reconstruction of serial sections of biological tissue. To avoid the error accumulation and propagation caused by current algorithms, we add the extra condition that the positions of the first and last section images remain unchanged. This constrained simultaneous registration problem has not been solved before. Our algorithm is non-iterative and can simultaneously compute rigid transformations for a large number of serial section images in a short time. We prove that our algorithm yields the optimal solution under ideal conditions, and we test it on synthetic and real data to verify its effectiveness.
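
For orientation, the pairwise building block of rigid registration has a well-known closed form in 2D. The sketch below is not the paper's simultaneous algorithm; the helper name `fit_rigid_2d` and the assumption that matched landmark points are available in two sections are illustrative only. It shows the standard least-squares rotation and translation between two corresponding point sets.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst.

    src, dst: equal-length lists of (x, y) tuples in corresponding order.
    Hypothetical helper for illustration; not the paper's simultaneous method.
    """
    n = len(src)
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Accumulate dot and cross terms of the centred points.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # The optimal rotation angle maximises dot*cos(theta) + cross*sin(theta).
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)

# Usage: recover a known 30-degree rotation plus a shift of (2, -1).
_angle = math.radians(30)
_src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
_dst = [(math.cos(_angle) * x - math.sin(_angle) * y + 2.0,
         math.sin(_angle) * x + math.cos(_angle) * y - 1.0) for x, y in _src]
theta, (tx, ty) = fit_rigid_2d(_src, _dst)
```

Chaining such pairwise estimates section by section is exactly where errors can accumulate, which is the issue the abstract says the constrained simultaneous formulation is meant to avoid.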

18. Learning Descriptors Invariance Through Equivalence Relations Within Manifold: A New Approach to Expression Invariant 3D Face Recognition [PDF] Back to contents
  Faisal R. Al-Osaimi
Abstract: This paper presents a unique approach to the dichotomy between useful and adverse variations of key-point descriptors, namely the identity and the expression variations in the descriptor (feature) space. The descriptor variations are learned from training examples. Based on the labels of the training data, the equivalence relations among the descriptors are established. Both types of descriptor variations are represented by a graph embedded in the descriptor manifold. Invariant recognition is then conducted as a graph search problem. A heuristic graph search algorithm suitable for recognition under this setup was devised. The proposed approach was tested on the FRGC v2.0, the Bosphorus and the 3D TEC datasets. It has been shown to enhance recognition performance, under expression variations in particular, by considerable margins.

19. The Visual Social Distancing Problem [PDF] Back to contents
  Marco Cristani, Alessio Del Bue, Vittorio Murino, Francesco Setti, Alessandro Vinciarelli
Abstract: One of the main and most effective measures to contain the recent viral outbreak is the maintenance of so-called Social Distancing (SD). To comply with this constraint, workplaces, public institutions, transport and schools will likely adopt restrictions on the minimum inter-personal distance between people. Given this current scenario, it is crucial to massively measure compliance with such a physical constraint in our lives, in order to figure out the reasons for possible breaches of such distance limitations and to understand whether they imply a possible threat given the scene context. All of this must be done while complying with privacy policies and keeping the measurement acceptable. To this end, we introduce the Visual Social Distancing (VSD) problem, defined as the automatic estimation of the inter-personal distance from an image and the characterization of the related people aggregations. VSD is pivotal for a non-invasive analysis of whether people comply with the SD restriction, and for providing statistics about the level of safety of specific areas whenever this constraint is violated. We then discuss how VSD relates to previous literature in Social Signal Processing and indicate which existing Computer Vision methods can be used to manage this problem. We conclude with future challenges related to the effectiveness of VSD systems, ethical implications and future application scenarios.
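
To make the definition of VSD concrete, the following is a minimal sketch of one way inter-personal distances could be estimated from a single image. It assumes a known 3x3 homography `H` from image pixels to ground-plane metres and one foot point per detected person; `to_ground`, `close_pairs`, the toy homography and the 2.0 m threshold are all hypothetical and not taken from the paper.

```python
import math
from itertools import combinations

def to_ground(H, pt):
    """Map an image point (u, v) to ground-plane coordinates with homography H."""
    u, v = pt
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)

def close_pairs(H, feet_px, min_dist_m=2.0):
    """Return (i, j, distance) for every pair of people closer than min_dist_m."""
    ground = [to_ground(H, p) for p in feet_px]
    pairs = []
    for i, j in combinations(range(len(ground)), 2):
        d = math.dist(ground[i], ground[j])
        if d < min_dist_m:
            pairs.append((i, j, d))
    return pairs

# Usage with a toy homography in which 100 pixels correspond to 1 metre.
H_toy = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
feet = [(100, 100), (200, 100), (600, 400)]
violations = close_pairs(H_toy, feet)
```

Only anonymous foot points enter the computation here, which is one simple way to keep such a measurement separate from identity information.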

20. A numerical method to estimate uncertainty in non-rigid structure from motion [PDF] Back to contents
  Jingwei Song, Mitesh Patel
Abstract: Semi-Definite Programming (SDP) with a low-rank prior has been widely applied in Non-Rigid Structure from Motion (NRSfM). Based on a low-rank constraint, it avoids the inherent ambiguity of basis number selection in conventional base-shape or base-trajectory methods. Despite its efficiency in deformable shape reconstruction, it remains unclear how to assess the uncertainty of the shape recovered by the SDP process. In this paper, we present a statistical inference on the element-wise uncertainty quantification of the estimated deforming 3D shape points in the case of the exact low-rank SDP problem. A closed-form uncertainty quantification method is proposed and tested. Moreover, we extend the exact low-rank uncertainty quantification to the approximate low-rank scenario with a numerical optimal rank selection method, which enables solving practical applications in the SDP-based NRSfM scenario. The proposed method provides an independent module for the SDP method and only requires the statistical information of the input 2D tracked points. Extensive experiments show that the output 3D points follow the same normal distribution as the 2D tracks, that the proposed method quantifies the uncertainty accurately, and that it has desirable effects on routine SDP low-rank based NRSfM solvers.

21. Photometric Multi-View Mesh Refinement for High-Resolution Satellite Images [PDF] Back to contents
  Mathias Rothermel, Ke Gong, Dieter Fritsch, Konrad Schindler, Norbert Haala
Abstract: Modern high-resolution satellite sensors collect optical imagery with ground sampling distances (GSDs) of 30-50cm, which has sparked a renewed interest in photogrammetric 3D surface reconstruction from satellite data. State-of-the-art reconstruction methods typically generate 2.5D elevation data. Here, we present an approach to recover full 3D surface meshes from multi-view satellite imagery. The proposed method takes as input a coarse initial mesh and refines it by iteratively updating all vertex positions to maximize the photo-consistency between images. Photo-consistency is measured in image space, by transferring texture from one image to another via the surface. We derive the equations to propagate changes in texture similarity through the rational function model (RFM), often also referred to as rational polynomial coefficient (RPC) model. Furthermore, we devise a hierarchical scheme to optimize the surface with gradient descent. In experiments with two different datasets, we show that the refinement improves the initial digital elevation models (DEMs) generated with conventional dense image matching. Moreover, we demonstrate that our method is able to reconstruct true 3D geometry, such as facade structures, if off-nadir views are available.

22. A Simple Semi-Supervised Learning Framework for Object Detection [PDF] Back to contents
  Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister
Abstract: Semi-supervised learning (SSL) has promising potential for improving the predictive performance of machine learning models using unlabeled data. There has been remarkable progress, but the scope of demonstration in SSL has been limited to image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentations. We propose new experimental protocols to evaluate the performance of semi-supervised object detection using MS-COCO and demonstrate the efficacy of STAC on both MS-COCO and VOC07. On VOC07, STAC improves the AP$^{0.5}$ from 76.30 to 79.08; on MS-COCO, STAC demonstrates 2x higher data efficiency, achieving 24.38 mAP using only 5% labeled data, compared with a supervised baseline that reaches 23.86% using 10% labeled data. The code is available at this https URL.
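
The pseudo-labeling step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released STAC code: the detection dictionaries, the threshold `tau=0.9`, the weight `lam`, and the additive form of the total loss are all hypothetical.

```python
def select_pseudo_labels(detections, tau=0.9):
    """Keep only detections whose confidence is at least tau.

    detections: list of dicts with 'box' (x1, y1, x2, y2), 'label', 'score'.
    tau is an assumed threshold, not a value taken from the abstract.
    """
    return [d for d in detections if d["score"] >= tau]

def total_loss(supervised_loss, unsupervised_loss, lam=1.0):
    """Combine the labeled-data loss with the pseudo-label loss (lam is assumed)."""
    return supervised_loss + lam * unsupervised_loss

# Usage: predictions on one unlabeled image from a model trained on labeled data.
teacher_out = [
    {"box": (10, 20, 50, 80), "label": "dog", "score": 0.97},
    {"box": (60, 30, 90, 70), "label": "cat", "score": 0.42},
    {"box": (5, 5, 25, 25), "label": "bird", "score": 0.91},
]
pseudo = select_pseudo_labels(teacher_out)
```

The retained boxes would then act as targets for a strongly augmented copy of the same image, with the boxes transformed alongside any geometric augmentation so that the consistency target stays aligned.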

23. A Generalized Kernel Risk Sensitive Loss for Robust Two-Dimensional Singular Value Decomposition [PDF] Back to contents
  Miaohua Zhang, Yongsheng Gao
Abstract: Two-dimensional singular value decomposition (2DSVD) has been widely used for image processing tasks, such as image reconstruction, classification, and clustering. However, the traditional 2DSVD algorithm is based on the mean square error (MSE) loss, which is sensitive to outliers. To overcome this problem, we propose a robust 2DSVD framework based on a generalized kernel risk sensitive loss (GKRSL-2DSVD), which is more robust to noise and outliers. Since the proposed objective function is non-convex, a majorization-minimization algorithm is developed to solve it efficiently with guaranteed convergence. The proposed framework has the inherent properties of processing non-centered data, rotational invariance, and easy extension to higher-order spaces. Experimental results on public databases demonstrate that the proposed method significantly outperforms all the benchmarks on different applications.

24. Domain Adaptation for Image Dehazing [PDF] Back to contents
  Yuanjie Shao, Lerenhan Li, Wenqi Ren, Changxin Gao, Nong Sang
Abstract: Image dehazing using learning-based methods has achieved state-of-the-art performance in recent years. However, most existing methods train a dehazing model on synthetic hazy images, which are less able to generalize well to real hazy images due to domain shift. To address this issue, we propose a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules. Specifically, we first apply a bidirectional translation network to bridge the gap between the synthetic and real domains by translating images from one domain to another. And then, we use images before and after translation to train the proposed two image dehazing networks with a consistency constraint. In this phase, we incorporate the real hazy image into the dehazing training via exploiting the properties of the clear image (e.g., dark channel prior and image gradient smoothing) to further improve the domain adaptivity. By training image translation and dehazing network in an end-to-end manner, we can obtain better effects of both image translation and dehazing. Experimental results on both synthetic and real-world images demonstrate that our model performs favorably against the state-of-the-art dehazing algorithms.

25. A Simple and Scalable Shape Representation for 3D Reconstruction [PDF] Back to contents
  Mateusz Michalkiewicz, Eugene Belilovsky, Mahsa Baktashmotlagh, Anders Eriksson
Abstract: Deep learning applied to the reconstruction of 3D shapes has seen growing interest. A popular approach to 3D reconstruction and generation in recent years has been the CNN encoder-decoder model usually applied in voxel space. However, this often scales very poorly with the resolution limiting the effectiveness of these models. Several sophisticated alternatives for decoding to 3D shapes have been proposed typically relying on complex deep learning architectures for the decoder model. In this work, we show that this additional complexity is not necessary, and that we can actually obtain high quality 3D reconstruction using a linear decoder, obtained from principal component analysis on the signed distance function (SDF) of the surface. This approach allows easily scaling to larger resolutions. We show in multiple experiments that our approach is competitive with state-of-the-art methods. It also allows the decoder to be fine-tuned on the target task using a loss designed specifically for SDF transforms, obtaining further gains.

26. A Comparison of Few-Shot Learning Methods for Underwater Optical and Sonar Image Classification [PDF] Back to contents
  Mateusz Ochal, Jose Vazquez, Yvan Petillot, Sen Wang
Abstract: Deep convolutional neural networks have been shown to perform well in underwater object recognition tasks, on both optical and sonar images. However, many such methods require hundreds, if not thousands, of images per class to generalize well to unseen examples. This is restrictive in situations where obtaining and labeling larger volumes of data is impractical, such as observing a rare object, performing real-time operations, or operating in new underwater environments. Finding an algorithm capable of learning from only a few samples could reduce the time spent obtaining and labeling datasets, and accelerate the training of deep-learning models. To the best of our knowledge, this is the first paper to evaluate and compare several Few-Shot Learning (FSL) methods using underwater optical and side-scan sonar imagery. Our results show that FSL methods offer a significant advantage over the traditional transfer learning methods that employ fine-tuning of pre-trained models. Our findings show that FSL methods are not far from being usable in real-world robotics scenarios and from expanding the capabilities of autonomous underwater systems.

27. A Unified Weight Learning and Low-Rank Regression Model for Robust Face Recognition [PDF] Back to contents
  Miaohua Zhang, Yongsheng Gao, Fanhua Shang
Abstract: Regression-based error modelling has been extensively studied for face recognition in recent years. The most important problem in regression-based error models is fitting the complex representation error caused by various corruptions and environment changes. However, existing works are not robust enough to model complex corrupted errors. In this paper, we address this problem with a unified sparse weight learning and low-rank approximation regression model and apply it to robust face recognition in the presence of varying types and levels of corruption, such as random pixel corruptions and block occlusions, or disguise. The proposed model enables random noise and contiguous occlusions to be addressed simultaneously. For the random noise, we propose a generalized correntropy (GC) function to match the error distribution. For the structured error caused by occlusion or disguise, we propose a GC function based rank approximation to measure the rank of the error matrix. An effective iterative optimization is developed to solve the optimal weight learning and low-rank approximation. Extensive experimental results on three public face databases show that the proposed model fits the error distribution and structure very well, and thus obtains better recognition accuracy than existing methods.

28. MOMBAT: Heart Rate Monitoring from Face Video using Pulse Modeling and Bayesian Tracking [PDF] Back to contents
  Puneet Gupta, Brojeshwar Bhowmick, Arpan Pal
Abstract: A non-invasive yet inexpensive method for heart rate (HR) monitoring is of great importance in many real-world applications including healthcare, psychology understanding, affective computing and biometrics. Face videos are currently utilized for such HR monitoring, but unfortunately this can lead to errors due to the noise introduced by facial expressions, out-of-plane movements, camera parameters (like focus change) and environmental factors. We alleviate these issues by proposing a novel face video based HR monitoring method MOMBAT, that is, MOnitoring using Modeling and BAyesian Tracking. We utilize out-of-plane face movements to define a novel quality estimation mechanism. Subsequently, we introduce a Fourier basis based modeling to reconstruct the cardiovascular pulse signal at the locations containing the poor quality, that is, the locations affected by out-of-plane face movements. Furthermore, we design a Bayesian decision theory based HR tracking mechanism to rectify the spurious HR estimates. Experimental results reveal that our proposed method, MOMBAT outperforms state-of-the-art HR monitoring methods and performs HR monitoring with an average absolute error of 1.329 beats per minute and the Pearson correlation between estimated and actual heart rate is 0.9746. Moreover, it demonstrates that HR monitoring is significantly

29. Variational Clustering: Leveraging Variational Autoencoders for Image Clustering [PDF] Back to contents
  Vignesh Prasad, Dipanjan Das, Brojeshwar Bhowmick
Abstract: Recent advances in deep learning have shown their ability to learn strong feature representations for images. The task of image clustering naturally requires good feature representations to capture the distribution of the data and subsequently differentiate data points from one another. Often these two aspects are dealt with independently and thus traditional feature learning alone does not suffice in partitioning the data meaningfully. Variational Autoencoders (VAEs) naturally lend themselves to learning data distributions in a latent space. Since we wish to efficiently discriminate between different clusters in the data, we propose a method based on VAEs where we use a Gaussian Mixture prior to help cluster the images accurately. We jointly learn the parameters of both the prior and the posterior distributions. Our method represents a true Gaussian Mixture VAE. This way, our method simultaneously learns a prior that captures the latent distribution of the images and a posterior to help discriminate well between data points. We also propose a novel reparametrization of the latent space consisting of a mixture of discrete and continuous variables. One key takeaway is that our method generalizes better across different datasets without using any pre-training or learnt models, unlike existing methods, allowing it to be trained from scratch in an end-to-end manner. We verify our efficacy and generalizability experimentally by achieving state-of-the-art results among unsupervised methods on a variety of datasets. To the best of our knowledge, we are the first to pursue image clustering using VAEs in a purely unsupervised manner on real image datasets.

30. Robust Tensor Decomposition for Image Representation Based on Generalized Correntropy [PDF] Back to contents
  Miaohua Zhang, Yongsheng Gao, Changming Sun, Michael Blumenstein
Abstract: Traditional tensor decomposition methods, e.g., two dimensional principal component analysis and two dimensional singular value decomposition, that minimize mean square errors, are sensitive to outliers. To overcome this problem, in this paper we propose a new robust tensor decomposition method using generalized correntropy criterion (Corr-Tensor). A Lagrange multiplier method is used to effectively optimize the generalized correntropy objective function in an iterative manner. The Corr-Tensor can effectively improve the robustness of tensor decomposition with the existence of outliers without introducing any extra computational cost. Experimental results demonstrated that the proposed method significantly reduces the reconstruction error on face reconstruction and improves the accuracies on handwritten digit recognition and facial image clustering.
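
The contrast between mean square error and a correntropy-type criterion can be illustrated numerically. The sketch below uses the plain Gaussian-kernel correntropy-induced loss as a stand-in; the paper's generalized correntropy criterion may differ in kernel shape, and the kernel width `sigma` and the sample residuals are assumptions made for illustration.

```python
import math

def mse(errors):
    """Mean square error: grows quadratically, so one outlier can dominate."""
    return sum(e * e for e in errors) / len(errors)

def correntropy_loss(errors, sigma=1.0):
    """Mean correntropy-induced loss with a Gaussian kernel of width sigma.

    Each term is 1 - exp(-e^2 / (2 sigma^2)), so no single residual
    can contribute more than 1 to the sum.
    """
    total = 0.0
    for e in errors:
        total += 1.0 - math.exp(-(e * e) / (2.0 * sigma * sigma))
    return total / len(errors)

# Residuals of a good fit, with and without one gross outlier.
clean = [0.1, -0.2, 0.05, 0.0]
with_outlier = clean + [50.0]
```

Because each term saturates, a single gross outlier shifts the correntropy loss by a bounded amount, whereas it inflates the MSE by orders of magnitude.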

31. Class-Aware Domain Adaptation for Improving Adversarial Robustness [PDF] Back to contents
  Xianxu Hou, Jingxin Liu, Bolei Xu, Xiaolong Wang, Bozhi Liu, Guoping Qiu
Abstract: Recent works have demonstrated convolutional neural networks are vulnerable to adversarial examples, i.e., inputs to machine learning models that an attacker has intentionally designed to cause the models to make a mistake. To improve the adversarial robustness of neural networks, adversarial training has been proposed to train networks by injecting adversarial examples into the training data. However, adversarial training could overfit to a specific type of adversarial attack and also lead to standard accuracy drop on clean images. To this end, we propose a novel Class-Aware Domain Adaptation (CADA) method for adversarial defense without directly applying adversarial training. Specifically, we propose to learn domain-invariant features for adversarial examples and clean images via a domain discriminator. Furthermore, we introduce a class-aware component into the discriminator to increase the discriminative power of the network for adversarial examples. We evaluate our newly proposed approach using multiple benchmark datasets. The results demonstrate that our method can significantly improve the state-of-the-art of adversarial robustness for various attacks and maintain high performances on clean images.

32. Compact Neural Representation Using Attentive Network Pruning [PDF] Back to contents
  Mahdi Biparva, John Tsotsos
Abstract: Deep neural networks have evolved to become power demanding and consequently difficult to apply to small-size mobile platforms. Network parameter reduction methods have been introduced to systematically deal with the computational and memory complexity of deep networks. We propose to examine the ability of attentive connection pruning to deal with redundancy reduction in neural networks as a contribution to the reduction of computational demand. In this work, we describe a Top-Down attention mechanism that is added to a Bottom-Up feedforward network to select important connections and subsequently prune redundant ones at all parametric layers. Our method not only introduces a novel hierarchical selection mechanism as the basis of pruning but also remains competitive with previous baseline methods in the experimental evaluation. We conduct experiments using different network architectures on popular benchmark datasets to show high compression ratio is achievable with negligible loss of accuracy.

33. Epipolar Transformers [PDF] Back to contents
  Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu
Abstract: A common approach to localizing 3D human joints in a synchronized and calibrated multi-view setup consists of two steps: (1) apply a 2D detector separately on each view to localize joints in 2D, and (2) perform robust triangulation on 2D detections from each view to acquire the 3D joint locations. However, in step 1, the 2D detector is limited to solving challenging cases which could potentially be better resolved in 3D, such as occlusions and oblique viewing angles, purely in 2D without leveraging any 3D information. Therefore, we propose the differentiable "epipolar transformer", which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation. The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p' in a neighboring view, and then combine the features at p' with the features at p, thus leading to a 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'. Experiments on InterHand and Human3.6M show that our approach yields consistent improvements over the baselines. Specifically, when no external data is used, our Human3.6M model trained with a ResNet-50 backbone and image size 256 x 256 outperforms the state of the art by 4.23 mm and achieves an MPJPE of 26.9 mm.
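
The intuition in the abstract (look along the epipolar line for p', then merge its feature with the one at p) can be sketched as below. This is a simplified illustration, not the authors' implementation: dot-product similarity, softmax weighting, additive fusion, and the names `epipolar_line` and `fuse_along_line` are assumptions, and the fundamental matrix `F` is taken as given from calibration.

```python
import math

def epipolar_line(F, p):
    """Line l = F p (homogeneous coefficients a, b, c) in the neighboring view."""
    x, y = p
    return tuple(F[r][0] * x + F[r][1] * y + F[r][2] for r in range(3))

def fuse_along_line(query_feat, candidate_feats):
    """Approximate the feature at p' and add it to the feature at p.

    query_feat: feature vector at p in the current view.
    candidate_feats: feature vectors sampled along the epipolar line.
    Dot-product similarity, softmax weights and additive fusion are assumptions.
    """
    sims = [sum(q * c for q, c in zip(query_feat, cand)) for cand in candidate_feats]
    m = max(sims)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    matched = [sum(w * cand[k] for w, cand in zip(weights, candidate_feats))
               for k in range(len(query_feat))]
    fused = [q + v for q, v in zip(query_feat, matched)]
    return fused, weights

# Usage: for a rectified stereo pair the epipolar line of (3, 5) is the row y' = 5.
F_rectified = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
line = epipolar_line(F_rectified, (3, 5))
fused, weights = fuse_along_line([1.0, 0.0], [[0.0, 1.0], [4.0, 0.0], [0.5, 0.5]])
```

Restricting the search to samples on the epipolar line keeps the matching one-dimensional, and every operation in the sketch is differentiable.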

34. A Robust Matching Pursuit Algorithm Using Information Theoretic Learning [PDF] Back to contents
  Miaohua Zhang, Yongsheng Gao, Changming Sun, Michael Blumenstein
Abstract: Current orthogonal matching pursuit (OMP) algorithms calculate the correlation between two vectors using the inner product operation and minimize the mean square error, which are both suboptimal when there are non-Gaussian noises or outliers in the observation data. To overcome these problems, a new OMP algorithm is developed based on the information theoretic learning (ITL), which is built on the following new techniques: (1) an ITL-based correlation (ITL-Correlation) is developed as a new similarity measure which can better exploit higher-order statistics of the data, and is robust against many different types of noise and outliers in a sparse representation framework; (2) a non-second order statistic measurement and minimization method is developed to improve the robustness of OMP by overcoming the limitation of Gaussianity inherent in cost function based on second-order moments. The experimental results on both simulated and real-world data consistently demonstrate the superiority of the proposed OMP algorithm in data recovery, image reconstruction, and classification.

35. Generative Model-driven Structure Aligning Discriminative Embeddings for Transductive Zero-shot Learning [PDF] Back to contents
  Omkar Gune, Mainak Pal, Preeti Mukherjee, Biplab Banerjee, Subhasis Chaudhuri
Abstract: Zero-shot Learning (ZSL) is a transfer learning technique which aims at transferring knowledge from seen classes to unseen classes. This knowledge transfer is possible because of an underlying semantic space which is common to seen and unseen classes. Most existing approaches use labelled seen-class data to learn a projection function which maps visual data to semantic data. In this work, we propose a shallow but effective neural network-based model for learning such a projection function, which aligns the visual and semantic data in the latent space while simultaneously making the latent space embeddings discriminative. As the above projection function is learned using the seen-class data, the so-called projection domain shift exists. We propose a transductive approach to reduce the effect of domain shift, where we utilize unlabeled visual data from unseen classes to generate corresponding semantic features for unseen-class visual samples. These semantic features are initially generated using a conditional variational auto-encoder and are then used along with the seen-class data to improve the projection function. We experiment on both inductive and transductive settings of ZSL and generalized ZSL and show superior performance on the standard benchmark datasets AWA1, AWA2, CUB, SUN, FLO, and APY. We also show the efficacy of our model in the regime of extremely scarce labelled data on different datasets in the context of ZSL.

36. Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events [PDF] Back to contents
  Weiyao Lin, Huabin Liu, Shizhan Liu, Yuxi Li, Guo-Jun Qi, Rui Qian, Tao Wang, Nicu Sebe, Ning Xu, Hongkai Xiong, Mubarak Shah
Abstract: Along with the development of the modern smart city, human-centric video analysis is encountering the challenge of diverse and complex events in real scenes. A complex event relates to dense crowds, anomalous individuals, or collective behavior. However, limited by the scale of available surveillance video datasets, few existing human analysis approaches report their performance on such complex events. To this end, we present a new large-scale dataset, named Human-in-Events or HiEve (human-centric video analysis in complex events), for understanding human motions, poses, and actions in a variety of realistic events, especially crowded and complex events. It contains a record number of poses (>1M), the largest number of action labels (>56k) for complex events, and one of the largest numbers of long-lasting trajectories (with average trajectory length >480). In addition, an online evaluation server is built for researchers to evaluate their approaches. Furthermore, we conduct extensive experiments on recent video analysis approaches, demonstrating that HiEve is a challenging dataset for human-centric video analysis. We expect that the dataset will advance the development of cutting-edge techniques in human-centric analysis and the understanding of complex events. The dataset is available at this http URL.

37. Visually Impaired Aid using Convolutional Neural Networks, Transfer Learning, and Particle Competition and Cooperation [PDF] Back to contents
  Fabricio Breve, Carlos Norberto Fischer
Abstract: Navigation and mobility are some of the major problems faced by visually impaired people in their daily lives. Advances in computer vision have led to the proposal of some navigation systems. However, most of them require expensive and/or heavy hardware. In this paper we propose the use of convolutional neural networks (CNN), transfer learning, and semi-supervised learning (SSL) to build a framework that aids the visually impaired. It has low computational costs and, therefore, may be implemented on current smartphones without relying on any additional equipment. The smartphone camera can be used to automatically take pictures of the path ahead. The pictures are then immediately classified, providing almost instantaneous feedback to the user. We also propose a dataset to train the classifiers, including indoor and outdoor situations with different types of light, floor, and obstacles. Many different CNN architectures are evaluated as feature extractors and classifiers by fine-tuning weights pre-trained on a much larger dataset. The graph-based SSL method known as particle competition and cooperation is also used for classification, allowing feedback from the user to be incorporated without retraining the underlying network. Classification accuracies of 92% and 80% are achieved on the proposed dataset in the best supervised and SSL scenarios, respectively.

38. Vehicle Re-Identification Based on Complementary Features [PDF] Back to contents
  Cunyuan Gao, Yi Hu, Yi Zhang, Rui Yao, Yong Zhou, Jiaqi Zhao
Abstract: In this work, we present our solution to the vehicle re-identification (vehicle Re-ID) track in AI City Challenge 2020 (AIC2020). The purpose of vehicle Re-ID is to retrieve the same vehicle as it appears across multiple cameras, which could make a great contribution to the Intelligent Traffic System (ITS) and the smart city. Due to variations in vehicle orientation and lighting and to inter-class similarity, it is difficult to obtain a robust and discriminative feature representation. For the vehicle Re-ID track in AIC2020, our method fuses features extracted from different networks in order to take advantage of each network and obtain complementary features. For each single model, several methods such as multi-loss, filter grafting, and semi-supervised learning are used to increase the representation ability as much as possible. Top performance in City-Scale Multi-Camera Vehicle Re-Identification demonstrated the advantage of our methods, and we achieved 5th place in the vehicle Re-ID track of AIC2020. The code is available at this https URL.

39. Understanding Dynamic Scenes using Graph Convolution Networks [PDF] Back to contents
  Sravan Mylavarapu, Mahtab Sandhu, Priyesh Vijayan, K Madhava Krishna, Balaraman Ravindran, Anoop Namboodiri
Abstract: We present a novel Multi Relational Graph Convolutional Network (MRGCN) to model on-road vehicle behaviours from a sequence of temporally ordered frames captured by a moving monocular camera. The input to MRGCN is a Multi Relational Graph (MRG) in which the nodes represent the active and passive participants/agents in the scene, while the bidirectional edges that connect every pair of nodes are encodings of the spatio-temporal relations. The bidirectional edges of the graph encode the temporal interactions between the agents that constitute the two nodes of the edge. The proposed method of obtaining this encoding is shown to be specifically suited to the problem at hand, as it outperforms more complex end-to-end learning methods that do not use such intermediate representations of evolved spatio-temporal relations between agent pairs. We show a significant performance gain over prior methods, in the form of behaviour classification accuracy, on a variety of datasets from different parts of the globe, and we show seamless transfer across multiple datasets without any resort to fine-tuning. Such behaviour prediction methods find immediate relevance in a variety of navigation tasks such as behaviour planning and state estimation, as well as in applications relating to the detection of traffic violations in videos.

40. Memory-Augmented Relation Network for Few-Shot Learning [PDF] Back to contents
  Jun He, Xueliang Liu, Richang Hong
Abstract: Metric-based few-shot learning methods concentrate on learning transferable feature embedding that generalizes well from seen categories to unseen categories under the supervision of limited number of labelled instances. However, most of them treat each individual instance in the working context separately without considering its relationships with the others. In this work, we investigate a new metric-learning method, Memory-Augmented Relation Network (MRN), to explicitly exploit these relationships. In particular, for an instance, we choose the samples that are visually similar from the working context, and perform weighted information propagation to attentively aggregate helpful information from the chosen ones to enhance its representation. In MRN, we also formulate the distance metric as a learnable relation module which learns to compare for similarity measurement, and augment the working context with memory slots, both contributing to its generality. We empirically demonstrate that MRN yields significant improvement over its ancestor and achieves competitive or even better performance when compared with other few-shot learning approaches on the two major benchmark datasets, i.e. miniImagenet and tieredImagenet.

41. High Resolution Face Age Editing [PDF] Back to contents
  Xu Yao, Gilles Puy, Alasdair Newson, Yann Gousseau, Pierre Hellier
Abstract: Face age editing has become a crucial task in film post-production, and is also becoming popular for general purpose photography. Recently, adversarial training has produced some of the most visually impressive results for image manipulation, including the face aging/de-aging task. In spite of considerable progress, current methods often present visual artifacts and can only deal with low-resolution images. In order to achieve aging/de-aging with the high quality and robustness necessary for wider use, these problems need to be addressed. This is the goal of the present work. We present an encoder-decoder architecture for face age editing. The core idea of our network is to create both a latent space containing the face identity, and a feature modulation layer corresponding to the age of the individual. We then combine these two elements to produce an output image of the person with a desired target age. Our architecture is greatly simplified with respect to other approaches, and allows for continuous age editing on high resolution images in a single unified model.

42. Photo style transfer with consistency losses [PDF] Back to contents
  Xu Yao, Gilles Puy, Patrick Pérez
Abstract: We address the problem of style transfer between two photos and propose a new way to preserve photorealism. Using the single pair of photos available as input, we train a pair of deep convolution networks (convnets), each of which transfers the style of one photo to the other. To enforce photorealism, we introduce a content preserving mechanism by combining a cycle-consistency loss with a self-consistency loss. Experimental results show that this method does not suffer from typical artifacts observed in methods working in the same settings. We then further analyze some properties of these trained convnets. First, we notice that they can be used to stylize other unseen images with same known style. Second, we show that retraining only a small subset of the network parameters can be sufficient to adapt these convnets to new styles.

43. A Weighted Difference of Anisotropic and Isotropic Total Variation for Relaxed Mumford-Shah Color and Multiphase Image Segmentation [PDF] Back to contents
  Kevin Bui, Fredrick Park, Yifei Lou, Jack Xin
Abstract: In a class of piecewise-constant image segmentation models, we incorporate a weighted difference of anisotropic and isotropic total variation (TV) to regularize the partition boundaries in an image. To deal with the weighted anisotropic-isotropic TV, we apply the difference-of-convex algorithm (DCA), where the subproblems can be minimized by the primal-dual hybrid gradient method (PDHG). As a result, we are able to design an alternating minimization algorithm to solve the proposed image segmentation models. The models and algorithms are further extended to segment color images and to perform multiphase segmentation. In the numerical experiments, we compare our proposed models with the Chan-Vese models that use either anisotropic or isotropic TV and the two-stage segmentation methods (denoising and then thresholding) on various images. The results demonstrate the effectiveness and robustness of incorporating weighted anisotropic-isotropic TV in image segmentation.
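
The regularizer named in this abstract can be illustrated with a small discrete computation. The sketch below assumes the common form `||∇u||_1 - alpha * ||∇u||_{2,1}` with forward differences and a zero gradient at the far boundary; the function name `aitv` and the weight name `alpha` are assumptions for illustration, and the DCA/PDHG solver from the paper is not reproduced here.

```python
import numpy as np

def aitv(u, alpha):
    """Weighted anisotropic-minus-isotropic TV of a 2D image u (illustrative sketch)."""
    u = np.asarray(u, dtype=float)
    # Forward differences; repeating the last row/column gives a zero gradient at the boundary.
    dx = np.diff(u, axis=1, append=u[:, -1:])
    dy = np.diff(u, axis=0, append=u[-1:, :])
    anisotropic = np.abs(dx).sum() + np.abs(dy).sum()   # l1 norm of the gradient
    isotropic = np.sqrt(dx ** 2 + dy ** 2).sum()        # l2,1 norm of the gradient
    return anisotropic - alpha * isotropic
```

For an image whose edges are all axis-aligned, the two terms coincide, so the penalty with `alpha = 1` vanishes; the two terms differ only where both partial differences are nonzero at the same pixel.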

44. ICE-GAN: Identity-aware and Capsule-Enhanced GAN for Micro-Expression Recognition and Synthesis [PDF] Back to contents
  Jianhui Yu, Chaoyi Zhang, Yang Song, Weidong Cai
Abstract: Micro-expressions can reflect people's true feelings and motives, which has attracted an increasing number of researchers to the study of automatic facial micro-expression recognition (MER). The detection window of micro-expressions is too short to be perceived by the human eye, and their subtle facial muscle movements also make MER a challenging task for pattern recognition. To this end, we propose a novel Identity-aware and Capsule-Enhanced Generative Adversarial Network (ICE-GAN), which is adversarially completed with the micro-expression synthesis (MES) task, where the generator produces synthetic faces with controllable micro-expressions and distinguishable identity information to improve MER performance. Meanwhile, the capsule-enhanced discriminator is optimized to simultaneously detect authenticity and micro-expression class labels. Our ICE-GAN was evaluated on the 2nd Micro-Expression Grand Challenge (MEGC2019) and outperformed the winner by a significant margin (7%). To the best of our knowledge, this is the first work to generate identity-preserving faces with different micro-expressions based on micro-expression datasets only.

45. Attentional Bottleneck: Towards an Interpretable Deep Driving Network [PDF] Back to contents
  Jinkyu Kim, Mayank Bansal
Abstract: Deep neural networks are a key component of behavior prediction and motion generation for self-driving cars. One of their main drawbacks is a lack of transparency: they should provide easy to interpret rationales for what triggers certain behaviors. We propose an architecture called Attentional Bottleneck with the goal of improving transparency. Our key idea is to combine visual attention, which identifies what aspects of the input the model is using, with an information bottleneck that enables the model to only use aspects of the input which are important. This not only provides sparse and interpretable attention maps (e.g. focusing only on specific vehicles in the scene), but it adds this transparency at no cost to model accuracy. In fact, we find slight improvements in accuracy when applying Attentional Bottleneck to the ChauffeurNet model, whereas we find that the accuracy deteriorates with a traditional visual attention model.

46. VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [PDF] Back to contents
  Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, Cordelia Schmid
Abstract: Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird's-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover randomly masked-out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves performance on par with or better than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.

47. View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors [PDF] Back to contents
  Walid Bekhtaoui, Ruhan Sa, Brian Teixeira, Vivek Singh, Klaus Kirchberg, Yao-jen Chang, Ankur Kapoor
Abstract: Point cloud based methods have produced promising results in areas such as 3D object detection in autonomous driving. However, most of the recent point cloud work focuses on single depth sensor data, whereas less work has been done on indoor monitoring applications, such as operation room monitoring in hospitals or indoor surveillance. In these scenarios multiple cameras are often used to tackle occlusion problems. We propose an end-to-end multi-person 3D pose estimation network, Point R-CNN, using multiple point cloud sources. We conduct extensive experiments to simulate challenging real world cases, such as individual camera failures, various target appearances, and complex cluttered scenes with the CMU panoptic dataset and the MVOR operation room dataset. Unlike most of the previous methods that attempt to use multiple sensor information by building complex fusion models, which often lead to poor generalization, we take advantage of the efficiency of concatenating point clouds to fuse the information at the input level. In the meantime, we show our end-to-end network greatly outperforms cascaded state-of-the-art models.
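
Input-level fusion by concatenation, as mentioned in this abstract, might look like the sketch below. It assumes each sensor comes with a known 4x4 extrinsic matrix that maps its points into a shared world frame; that calibration step, the function name `fuse_point_clouds`, and its parameters are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def fuse_point_clouds(clouds, extrinsics):
    """Map each (N_i, 3) cloud into a common frame with its 4x4 extrinsic, then concatenate."""
    fused = []
    for points, transform in zip(clouds, extrinsics):
        points = np.asarray(points, dtype=float)
        # Homogeneous coordinates so rotation and translation apply in one product.
        homogeneous = np.hstack([points, np.ones((len(points), 1))])
        fused.append((homogeneous @ np.asarray(transform, dtype=float).T)[:, :3])
    return np.vstack(fused)
```

Because the result is just a larger point set, dropping a failed camera only removes rows from the input rather than changing the network's input format.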

48. STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction [PDF] Back to contents
  Zhishuai Zhang, Jiyang Gao, Junhua Mao, Yukai Liu, Dragomir Anguelov, Congcong Li
Abstract: Detecting pedestrians and predicting their future trajectories are critical tasks for numerous applications, such as autonomous driving. Previous methods either treat detection and prediction as separate tasks or simply add a trajectory regression head on top of a detector. In this work, we present a novel end-to-end two-stage network: Spatio-Temporal-Interactive Network (STINet). In addition to 3D geometry modeling of pedestrians, we model the temporal information for each of the pedestrians. To do so, our method predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames and the comprehensive spatio-temporal information can be captured in the second stage. We also model the interaction among objects with an interaction graph to gather information among neighboring objects. Comprehensive experiments on the Lyft Dataset and the recently released large-scale Waymo Open Dataset for both object detection and future trajectory prediction validate the effectiveness of the proposed method. For the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73 and a trajectory prediction average displacement error (ADE) of 33.67 cm for pedestrians, which establishes the state of the art for both tasks.
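
The ADE metric reported here is commonly defined as the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps. A minimal sketch under that standard definition follows; the function name and array layout are assumptions for illustration, and the benchmark's exact evaluation protocol may differ.

```python
import numpy as np

def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between trajectories of shape (..., T, D)."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    # Per-step distance along the coordinate axis, then an average over everything else.
    return float(np.linalg.norm(predicted - ground_truth, axis=-1).mean())
```

The result is in the same unit as the input coordinates, so trajectories given in metres yield an error in metres.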

49. Fundus2Angio: A Novel Conditional GAN Architecture for Generating Fluorescein Angiography Images from Retinal Fundus Photography [PDF] Back to contents
  Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli, Stewart Lee Zuckerbrod
Abstract: Carrying out clinical diagnosis of retinal vascular degeneration using Fluorescein Angiography (FA) is a time-consuming process and can have significant adverse effects on the patient. Angiography requires the injection of a dye that may cause severe adverse effects and can even be fatal. Currently, there are no non-invasive systems capable of generating Fluorescein Angiography images. However, retinal fundus photography is a non-invasive imaging technique that can be completed in a few seconds. In order to eliminate the need for FA, we propose a conditional generative adversarial network (GAN) to translate fundus images to FA images. The proposed GAN consists of a novel residual block capable of generating high quality FA images. These images are important tools in the differential diagnosis of retinal diseases without the need for an invasive procedure with possible side effects. Our experiments show that the proposed architecture outperforms other state-of-the-art generative networks. Furthermore, our proposed model achieves better qualitative results that are indistinguishable from real angiograms.

50. iUNets: Fully invertible U-Nets with Learnable Up- and Downsampling [PDF] Back to contents
  Christian Etmann, Rihuan Ke, Carola-Bibiane Schönlieb
Abstract: U-Nets have been established as a standard neural network architecture for image-to-image learning problems such as segmentation and inverse problems in imaging. For high-dimensional applications, such as those that appear in 3D medical imaging, U-Nets have prohibitive memory requirements. Here, we present a new fully invertible U-Net-based architecture called the iUNet, which allows for the application of highly memory-efficient backpropagation procedures. For this, we introduce learnable and invertible up- and downsampling operations. An open source library in PyTorch for 1D, 2D and 3D data is made available.
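
To see why downsampling can be invertible at all, consider the fixed space-to-depth rearrangement below, which trades spatial resolution for channels without discarding any values. This is only a non-learned stand-in chosen for illustration: the paper's operations are learnable, and the function names `squeeze` and `unsqueeze` are assumptions, not the API of the authors' library.

```python
import numpy as np

def squeeze(x, s=2):
    """Invertible downsampling: (C, H, W) -> (C*s*s, H/s, W/s); H and W must be divisible by s."""
    c, h, w = x.shape
    x = x.reshape(c, h // s, s, w // s, s)
    # Move the two within-block offsets next to the channel axis.
    return x.transpose(0, 2, 4, 1, 3).reshape(c * s * s, h // s, w // s)

def unsqueeze(y, s=2):
    """Exact inverse of squeeze: (C*s*s, H, W) -> (C, H*s, W*s)."""
    cs, h, w = y.shape
    c = cs // (s * s)
    y = y.reshape(c, s, s, h, w)
    return y.transpose(0, 3, 1, 4, 2).reshape(c, h * s, w * s)
```

Since the input of each layer can be recomputed exactly from its output, intermediate activations need not be stored during training, which is the source of the memory savings that invertible architectures aim for.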

51. Medical Image Segmentation Using a U-Net type of Architecture [PDF] Back to contents
  Eshal Zahra, Bostan Ali, Wajahat Siddique
Abstract: Deep convolutional neural networks have been proven to be very effective in image-related analysis and tasks, such as image segmentation, image classification, and image generation. Recently, many sophisticated CNN-based architectures have been proposed for image segmentation. Some of these newly designed networks, such as V-Net, U-Net and their variants, are used specifically for medical image segmentation. It has been shown that U-Net produces very promising results in the domain of medical image segmentation. However, in this paper, we argue that the architecture of U-Net, when combined with a supervised training strategy at the bottleneck layer, can produce results comparable with the original U-Net architecture. More specifically, we introduce a fully supervised, FC-layer-based pixel-wise loss at the bottleneck of the encoder branch of U-Net. The two-layer FC sub-net trains the bottleneck representation to contain more semantic information, which is used by the decoder layers to predict the final segmentation map. The FC-layer-based sub-net is trained with the pixel-wise cross-entropy loss, while the U-Net architecture is trained with the L1 loss.

52. A Contrast-Adaptive Method for Simultaneous Whole-Brain and Lesion Segmentation in Multiple Sclerosis [PDF] Back to contents
  Stefano Cerri, Oula Puonti, Dominik S. Meier, Jens Wuerfel, Mark Mühlau, Hartwig R. Siebner, Koen Van Leemput
Abstract: Here we present a method for the simultaneous segmentation of white matter lesions and normal-appearing neuroanatomical structures from multi-contrast brain MRI scans of multiple sclerosis patients. The method integrates a novel model for white matter lesions into a previously validated generative model for whole-brain segmentation. By using separate models for the shape of anatomical structures and their appearance in MRI, the algorithm can adapt to data acquired with different scanners and imaging protocols without retraining. We validate the method using three disparate datasets, showing state-of-the-art performance in white matter lesion segmentation while simultaneously segmenting dozens of other brain structures. We further demonstrate that the contrast-adaptive method can also be applied robustly to MRI scans of healthy controls, and replicate previously documented atrophy patterns in deep gray matter structures in MS. The algorithm is publicly available as part of the open-source neuroimaging package FreeSurfer.

53. A New Computer-Aided Diagnosis System with Modified Genetic Feature Selection for BI-RADS Classification of Breast Masses in Mammograms [PDF] Back to contents
  Said Boumaraf, Xiabi Liu, Chokri Ferkous, Xiaohong Ma
Abstract: Mammography remains the most prevalent imaging tool for early breast cancer screening. The language used to describe abnormalities in mammographic reports is based on the Breast Imaging Reporting and Data System (BI-RADS). Assigning a correct BI-RADS category to each examined mammogram is a strenuous and challenging task even for experts. This paper proposes a new and effective computer-aided diagnosis (CAD) system to classify mammographic masses into four assessment categories in BI-RADS. The mass regions are first enhanced by means of histogram equalization and then semi-automatically segmented based on the region growing technique. A total of 130 handcrafted BI-RADS features are then extracted from the shape, margin, and density of each mass, together with the mass size and the patient's age, as mentioned in BI-RADS mammography. Then, a modified feature selection method based on the genetic algorithm (GA) is proposed to select the most clinically significant BI-RADS features. Finally, a back-propagation neural network (BPN) is employed for classification, and its accuracy is used as the fitness in the GA. A set of 500 mammogram images from the Digital Database for Screening Mammography (DDSM) is used for evaluation. Our system achieves classification accuracy, positive predictive value, negative predictive value, and Matthews correlation coefficient of 84.5%, 84.4%, 94.8%, and 79.3%, respectively. To the best of our knowledge, this is the best current result for BI-RADS classification of breast masses in mammography, which makes the proposed system promising for supporting radiologists in deciding proper patient management based on the automatically assigned BI-RADS categories.
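
The four reported metrics have standard definitions that are easiest to see in the binary case. The sketch below computes them from confusion-matrix counts; it is a simplified illustration only, since the paper's four-category setting would need a multi-class generalization, and the function name and argument names are assumptions.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, PPV, NPV and MCC from binary confusion counts (illustrative sketch)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, ppv, npv, mcc
```

Unlike accuracy, MCC uses all four counts symmetrically, which makes it more informative when the classes are imbalanced.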

54. Autonomous Tissue Scanning under Free-Form Motion for Intraoperative Tissue Characterisation [PDF] Back to contents
  Jian Zhan, Joao Cartucho, Stamatia Giannarou
Abstract: In Minimally Invasive Surgery (MIS), tissue scanning with imaging probes is required for subsurface visualisation to characterise the state of the tissue. However, scanning of large tissue surfaces in the presence of deformation is a challenging task for the surgeon. Recently, robot-assisted local tissue scanning has been investigated for motion stabilisation of imaging probes to facilitate the capturing of good quality images and reduce the surgeon's cognitive load. Nonetheless, these approaches require the tissue surface to be static or deform with periodic motion. To eliminate these assumptions, we propose a visual servoing framework for autonomous tissue scanning, able to deal with free-form tissue deformation. The 3D structure of the surgical scene is recovered and a feature-based method is proposed to estimate the motion of the tissue in real-time. A desired scanning trajectory is manually defined on a reference frame and continuously updated using projective geometry to follow the tissue motion and control the movement of the robotic arm. The advantage of the proposed method is that it does not require the learning of the tissue motion prior to scanning and can deal with free-form deformation. We deployed this framework on the da Vinci surgical robot using the da Vinci Research Kit (dVRK) for Ultrasound tissue scanning. Since the framework does not rely on information from the Ultrasound data, it can be easily extended to other probe-based imaging modalities.
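Summary: A visual servoing framework for autonomous tissue scanning in minimally invasive surgery that handles free-form tissue deformation. It recovers the 3D structure of the surgical scene, estimates tissue motion in real time with a feature-based method, and continuously updates a manually defined scanning trajectory using projective geometry to control the robotic arm. No prior learning of the tissue motion is required. The framework was deployed on the da Vinci surgical robot through the dVRK for ultrasound scanning and, because it does not rely on the ultrasound data, can be extended to other probe-based imaging modalities.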

55. Deep Reinforcement Learning for Organ Localization in CT [PDF] Back to contents
  Fernando Navarro, Anjany Sekuboyina, Diana Waldmannstetter, Jan C. Peeken, Stephanie E. Combs, Bjoern H. Menze
Abstract: Robust localization of organs in computed tomography scans is a constant pre-processing requirement for organ-specific image retrieval, radiotherapy planning, and interventional image analysis. In contrast to current solutions based on exhaustive search or region proposals, which require large amounts of annotated data, we propose a deep reinforcement learning approach for organ localization in CT. In this work, an artificial agent is actively self-taught to localize organs in CT by learning from its asserts and mistakes. Within the context of reinforcement learning, we propose a novel set of actions tailored for organ localization in CT. Our method can be used as a plug-and-play module for localizing any organ of interest. We evaluate the proposed solution on the public VISCERAL dataset containing CT scans with varying fields of view and multiple organs. We achieved an overall intersection over union of 0.63, an absolute median wall distance of 2.25 mm, and a median distance between centroids of 3.65 mm.
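Summary: A deep reinforcement learning approach to organ localization in CT, in which an artificial agent teaches itself to localize organs using a novel set of actions tailored to the task, avoiding the large annotated datasets needed by exhaustive-search or region-proposal methods. The method works as a plug-and-play module for any organ of interest. On the public VISCERAL dataset it achieves an overall intersection over union of 0.63, an absolute median wall distance of 2.25 mm, and a median distance between centroids of 3.65 mm.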
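The two box-level metrics quoted above, intersection over union and distance between centroids, can be sketched as follows. This is a minimal illustration that assumes axis-aligned 3D boxes stored as `(x0, y0, z0, x1, y1, z1)`; the function names and the box layout are assumptions made here and are not taken from the paper.

```python
from math import dist


def box_volume(box):
    """Volume of an axis-aligned box (x0, y0, z0, x1, y1, z1)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def box_iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[:3], a[3:], b[:3], b[3:]):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0  # the boxes do not overlap along this axis
        inter *= overlap
    return inter / (box_volume(a) + box_volume(b) - inter)


def centroid_distance(a, b):
    """Euclidean distance between the centers of two boxes."""
    center_a = [(a[i] + a[i + 3]) / 2 for i in range(3)]
    center_b = [(b[i] + b[i + 3]) / 2 for i in range(3)]
    return dist(center_a, center_b)


predicted = (0, 0, 0, 2, 2, 2)
reference = (1, 0, 0, 3, 2, 2)
print(box_iou_3d(predicted, reference))
print(centroid_distance(predicted, reference))
```

The wall distance reported in the abstract would compare the six box faces one by one; it is left out here because the abstract does not define it precisely.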

56. Learning to hash with semantic similarity metrics and empirical KL divergence [PDF] Back to contents
  Heikki Arponen, Tom E. Bishop
Abstract: Learning to hash is an efficient paradigm for exact and approximate nearest neighbor search from massive databases. Binary hash codes are typically extracted from an image by rounding output features from a CNN, which is trained on a supervised binary similar/dissimilar task. Drawbacks of this approach are: (i) the resulting codes do not necessarily capture semantic similarity of the input data; (ii) rounding results in information loss, manifesting as decreased retrieval performance; and (iii) using only class-wise similarity as a target can lead to trivial solutions that simply encode classifier outputs rather than learning more intricate relations, which is not detected by most performance metrics. We overcome (i) via a novel loss function encouraging the relative hash code distances of learned features to match those derived from their targets. We address (ii) via a differentiable estimate of the KL divergence between network outputs and a binary target distribution, resulting in minimal information loss when the features are rounded to binary. Finally, we resolve (iii) by focusing on a hierarchical precision metric. Efficiency of the methods is demonstrated with semantic image retrieval on the CIFAR-100, ImageNet and Conceptual Captions datasets, using similarities inferred from the WordNet label hierarchy or sentence embeddings.
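Summary: Learning to hash usually rounds CNN output features into binary codes after training on a similar/dissimilar task, which (i) may miss semantic similarity, (ii) loses information in rounding, and (iii) can collapse to encoding classifier outputs. The paper addresses (i) with a loss that matches relative hash code distances to those derived from the targets, (ii) with a differentiable estimate of the KL divergence between network outputs and a binary target distribution, and (iii) with a hierarchical precision metric. Results are shown for semantic image retrieval on CIFAR-100, ImageNet, and Conceptual Captions, using similarities inferred from the WordNet label hierarchy or sentence embeddings.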
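The baseline pipeline the abstract criticizes (round real-valued features to bits, then search by Hamming distance) can be sketched in a few lines. This is only an illustration of that baseline, not of the paper's losses; the 0.5 threshold, the assumption that features lie in [0, 1], and all function names are choices made for this sketch.

```python
def binarize(features, threshold=0.5):
    """Round real-valued features (assumed to lie in [0, 1]) to a binary code."""
    return [1 if value >= threshold else 0 for value in features]


def hamming(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


def nearest(query, database):
    """Index of the database entry whose code is closest to the query's code."""
    query_code = binarize(query)
    return min(
        range(len(database)),
        key=lambda i: hamming(query_code, binarize(database[i])),
    )


database = [[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.7]]
print(nearest([0.6, 0.4, 0.9, 0.1], database))

# Rounding discards how confident each feature was: 0.51 and 0.99 give the same bit.
print(binarize([0.51, 0.99]))
```

The last line shows drawback (ii) in miniature: features that differ considerably before rounding become indistinguishable afterwards, which is why pushing the outputs towards a binary distribution before rounding reduces the loss.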

57. Gleason Score Prediction using Deep Learning in Tissue Microarray Image [PDF] Back to contents
  Yi-hong Zhang, Jing Zhang, Yang Song, Chaomin Shen, Guang Yang
Abstract: Prostate cancer (PCa) is one of the most common cancers in men around the world. The most accurate method to evaluate lesion levels of PCa is microscopic inspection of stained biopsy tissue, with expert pathologists estimating the Gleason score of tissue microarray (TMA) images. However, it is time-consuming for pathologists to identify the cellular and glandular patterns for Gleason grading in large TMA images. We used the Gleason2019 Challenge dataset to build a convolutional neural network (CNN) model that segments TMA images into regions of different Gleason grades and predicts the Gleason score according to the grading segmentation. We used a pre-trained model of prostate segmentation to increase the accuracy of the Gleason grade segmentation. The model achieved a mean Dice of 75.6% on the test cohort and ranked 4th in the Gleason2019 Challenge with a score of 0.778, a combination of Cohen's kappa and the F1-score.
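Summary: Gleason grading of prostate cancer from tissue microarray (TMA) images is accurate but time-consuming for pathologists. Using the Gleason2019 Challenge dataset, the authors train a CNN that segments TMA images into regions of different Gleason grades and predicts the Gleason score from that segmentation, with a pre-trained prostate segmentation model used to improve accuracy. The model reaches a mean Dice of 75.6% on the test cohort and ranked 4th in the challenge with a combined Cohen's kappa and F1 score of 0.778.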
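The Dice overlap used to score the grade segmentation can be computed per label as below. This is a minimal sketch on flattened label lists; the grade values and the convention of returning 1.0 when a label is absent from both maps are assumptions of this example, not details from the paper.

```python
def dice(predicted, reference, label):
    """Dice coefficient 2|A n B| / (|A| + |B|) for one label over flat label lists."""
    in_pred = [p == label for p in predicted]
    in_ref = [r == label for r in reference]
    intersection = sum(p and r for p, r in zip(in_pred, in_ref))
    total = sum(in_pred) + sum(in_ref)
    if total == 0:
        return 1.0  # label absent from both maps: treated as perfect agreement
    return 2 * intersection / total


def mean_dice(predicted, reference, labels):
    """Average of the per-label Dice coefficients."""
    return sum(dice(predicted, reference, label) for label in labels) / len(labels)


# Hypothetical flattened segmentation maps: 0 = benign, 3 and 4 = Gleason grades.
predicted = [3, 3, 4, 4, 0, 0]
reference = [3, 4, 4, 4, 0, 0]
print(dice(predicted, reference, 4))
print(mean_dice(predicted, reference, [0, 3, 4]))
```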

58. The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [PDF] Back to contents
  Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine
Abstract: This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
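Summary: A new challenge set for multimodal classification that targets hate speech in multimodal memes. Difficult examples ("benign confounders") are added so that unimodal signals are unreliable and only multimodal models can succeed, while evaluation remains a straightforward binary classification. Baselines are reported for unimodal and multimodal models of varying sophistication, and state-of-the-art methods fall well short of humans (64.73% vs. 84.7% accuracy).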

59. Hierarchical Regression Network for Spectral Reconstruction from RGB Images [PDF] Back to contents
  Yuzhi Zhao, Lai-Man Po, Qiong Yan, Wei Liu, Tingyu Lin
Abstract: Capturing visual images with a hyperspectral camera has been successfully applied to many areas due to its narrow-band imaging technology. Hyperspectral reconstruction from RGB images is the reverse process of hyperspectral imaging: it discovers an inverse response function. Current works mainly map RGB images directly to the corresponding spectrum but do not consider context information explicitly. Moreover, the use of an encoder-decoder pair in current algorithms leads to loss of information. To address these problems, we propose a 4-level Hierarchical Regression Network (HRNet) with a PixelShuffle layer as inter-level interaction. Furthermore, we adopt a residual dense block to remove artifacts of real world RGB images and a residual global block to build an attention mechanism for enlarging the perceptive field. We evaluate the proposed HRNet against other architectures and techniques by participating in the NTIRE 2020 Challenge on Spectral Reconstruction from RGB Images. The HRNet is the winning method of track 2 - real world images and ranks 3rd on track 1 - clean images. Please visit the project web page this https URL to try our codes and pre-trained models.
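Summary: A 4-level Hierarchical Regression Network (HRNet) for reconstructing hyperspectral images from RGB. It uses PixelShuffle layers for inter-level interaction, a residual dense block to remove artifacts from real world RGB images, and a residual global block that builds an attention mechanism to enlarge the perceptive field. In the NTIRE 2020 Challenge on Spectral Reconstruction from RGB Images, HRNet won track 2 (real world images) and ranked 3rd on track 1 (clean images). Code and pre-trained models are available from the project web page.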
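PixelShuffle, which the abstract names as the inter-level interaction, rearranges channels into spatial resolution without discarding values. The sketch below shows the rearrangement on nested lists, using the common convention in which output pixel `(c, h*r + i, w*r + j)` takes its value from input channel `c*r*r + i*r + j` at `(h, w)`. How HRNet wires this layer between levels is not described in the abstract, so only the operation itself is shown, and the function name is an assumption.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    out_channels = channels // (r * r)
    out = [
        [[0] * (width * r) for _ in range(height * r)]
        for _ in range(out_channels)
    ]
    for c in range(out_channels):
        for i in range(r):
            for j in range(r):
                source = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = source[h][w]
    return out


# Four 1x1 channels become one 2x2 channel when r = 2.
print(pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2))
```

Because every input value appears exactly once in the output, the operation changes resolution without the information loss that the abstract attributes to encoder-decoder pairs.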

60. Segmentation of Macular Edema Datasets with Small Residual 3D U-Net Architectures [PDF] Back to contents
  Jonathan Frawley, Chris G. Willcocks, Maged Habib, Caspar Geenen, David H. Steel, Boguslaw Obara
Abstract: This paper investigates the application of deep convolutional neural networks with prohibitively small datasets to the problem of macular edema segmentation. In particular, we investigate several different heavily regularized architectures. We find that, contrary to popular belief, neural architectures within this application setting are able to achieve close to human-level performance on unseen test images without requiring large numbers of training examples. Annotating these 3D datasets is difficult, with multiple criteria required. It takes an experienced clinician two days to annotate a single 3D image, whereas our trained model achieves similar performance in less than a second. We found that an approach which uses targeted dataset augmentation, alongside architectural simplification with an emphasis on residual design, has acceptable generalization performance - despite relying on fewer than 15 training examples.
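Summary: An investigation of heavily regularized deep convolutional networks for macular edema segmentation with prohibitively small datasets. Targeted dataset augmentation combined with simplified, residual-based architectures reaches close to human-level performance on unseen test images with fewer than 15 training examples. Annotating one 3D image takes an experienced clinician two days, whereas the trained model achieves similar performance in less than a second.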

61. Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning [PDF] Back to contents
  Longteng Guo, Jing Liu, Xinxin Zhu, Xingjian He, Jie Jiang, Hanqing Lu
Abstract: Most image captioning models are autoregressive, i.e. they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up inference by generating all words in parallel. Typically, these models use the word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to consider sentence-level consistency, resulting in inferior generation quality for these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system where positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. In addition, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on the MSCOCO image captioning benchmark show that our NAIC model achieves performance comparable to state-of-the-art autoregressive models while bringing a 13.9x decoding speedup.
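Summary: Autoregressive captioning models generate words one at a time and are slow at inference, while non-autoregressive models decode all words in parallel but, trained with word-level cross-entropy, lose sentence-level consistency. The paper proposes a Non-Autoregressive Image Captioning (NAIC) model trained with Counterfactuals-critical Multi-Agent Learning (CMAL), which treats each position in the target sequence as an agent cooperating to maximize a sentence-level reward, and also uses massive unlabeled images. On the MSCOCO benchmark, NAIC is comparable to state-of-the-art autoregressive models with a 13.9x decoding speedup.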

62. Learning Context-Based Non-local Entropy Modeling for Image Compression [PDF] Back to contents
  Mu Li, Kai Zhang, Wangmeng Zuo, Radu Timofte, David Zhang
Abstract: The entropy of the codes usually serves as the rate loss in recent learned lossy image compression methods. Precise estimation of the probabilistic distribution of the codes plays a vital role in the performance. However, existing deep learning based entropy modeling methods generally assume the latent codes are statistically independent or depend on some side information or local context, which fails to take the global similarity within the context into account and thus hinders accurate entropy estimation. To address this issue, we propose a non-local operation for context modeling by employing the global similarity within the context. Specifically, we first introduce proxy similarity functions and spatial masks to handle the missing reference problem in context modeling. Then, we combine the local and the global context via a non-local attention block and employ it in masked convolutional networks for entropy modeling. The entropy model is further adopted as the rate loss in a joint rate-distortion optimization to guide the training of the analysis transform and the synthesis transform network in the transform coding framework. Considering that the width of the transforms is essential in training low distortion models, we finally introduce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity. Experiments on the Kodak and Tecnick datasets demonstrate the superiority of the proposed context-based non-local attention block in entropy modeling and of the U-Net block in low distortion compression over existing image compression standards and recent deep image compression models.
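Summary: Learned lossy image compression uses the entropy of the codes as the rate loss, so accurate estimation of the code distribution is crucial. Existing entropy models assume independent codes or condition only on side information or local context, ignoring global similarity. The paper adds a non-local operation to context modeling, using proxy similarity functions and spatial masks to handle the missing reference problem, and combines local and global context through a non-local attention block inside masked convolutional networks. A U-Net block widens the transforms at manageable memory and time cost. Experiments on Kodak and Tecnick show gains over existing image compression standards and recent deep image compression models.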
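Why context matters for the rate can be seen with a toy calculation. The sketch below estimates the bits per symbol of a code sequence twice: once treating symbols as independent, and once conditioning each symbol on the previous one. The one-symbol context is a deliberately simple stand-in for the local and non-local contexts in the paper, and the function names are assumptions of this example.

```python
from collections import Counter
from math import log2


def entropy_bits(codes):
    """Empirical entropy in bits per symbol, treating symbols as independent."""
    counts = Counter(codes)
    n = len(codes)
    return -sum(count / n * log2(count / n) for count in counts.values())


def conditional_entropy_bits(codes):
    """Empirical entropy of each symbol given the symbol immediately before it."""
    pairs = Counter(zip(codes, codes[1:]))
    contexts = Counter(codes[:-1])
    n = len(codes) - 1
    return -sum(
        count / n * log2(count / contexts[previous])
        for (previous, _), count in pairs.items()
    )


codes = [0, 1] * 5  # perfectly predictable from the previous symbol
print(entropy_bits(codes))
print(conditional_entropy_bits(codes))
```

For the alternating sequence the independent estimate is one bit per symbol, while the context-conditioned estimate drops to zero. A richer context model lowers the estimated rate in the same way whenever the codes are actually correlated.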

63. BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps [PDF] Back to contents
  Wang Zhu, Hexiang Hu, Jiacheng Chen, Zhiwei Deng, Vihan Jain, Eugene Ie, Fei Sha
Abstract: Learning to follow instructions is of fundamental importance to autonomous agents for vision-and-language navigation (VLN). In this paper, we study how an agent can navigate long paths when learning from a corpus that consists of shorter ones. We show that existing state-of-the-art agents do not generalize well. To this end, we propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially. A specially designed memory buffer is used by the agent to turn its past experiences into contexts for future steps. The learning process is composed of two phases. In the first phase, the agent uses imitation learning from demonstration to accomplish BabySteps. In the second phase, the agent uses curriculum-based reinforcement learning to maximize rewards on navigation tasks with increasingly longer instructions. We create two new benchmark datasets (of long navigation tasks) and use them in conjunction with existing ones to examine BabyWalk's generalization ability. Empirical results show that BabyWalk achieves state-of-the-art results on several metrics and, in particular, is able to follow long instructions better. The codes and the datasets are released on our project page this https URL.
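Summary: The paper studies how a vision-and-language navigation (VLN) agent trained on short paths can follow long ones, and shows that existing state-of-the-art agents generalize poorly. BabyWalk decomposes long instructions into shorter BabySteps, completes them sequentially, and uses a specially designed memory buffer to turn past experience into context. Training has two phases: imitation learning on BabySteps, then curriculum-based reinforcement learning on increasingly longer instructions. Evaluated on two new long-navigation benchmarks together with existing ones, BabyWalk achieves state-of-the-art results on several metrics and follows long instructions better.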

64. Duality in Persistent Homology of Images [PDF] Back to contents
  Adélie Garin, Teresa Heiss, Kelly Maggs, Bea Bleibe, Vanessa Robins
Abstract: We derive the relationship between the persistent homology barcodes of two dual filtered CW complexes. Applied to greyscale digital images, we obtain an algorithm to convert barcodes between the two different (dual) topological models of pixel connectivity.
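Summary: The authors derive the relationship between the persistent homology barcodes of two dual filtered CW complexes. Applied to greyscale digital images, this yields an algorithm for converting barcodes between the two different (dual) topological models of pixel connectivity.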

65. A Hybrid Swarm and Gravitation based feature selection algorithm for Handwritten Indic Script Classification problem [PDF] Back to contents
  Ritam Guha, Manosij Ghosh, Pawan Kumar Singh, Ram Sarkar, Mita Nasipuri
Abstract: In any multi-script environment, handwritten script classification is of paramount importance before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, researchers have approached this complex pattern classification problem by proposing various feature vectors, mostly of large dimension, thereby increasing the computational complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them to the essential and relevant features. In this paper, we address this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation based FS (HSGFS). The algorithm is run on three feature vectors recently introduced in the literature: Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used for the handwritten script classification. Handwritten datasets prepared at block, text-line, and word level, covering 12 officially recognized Indic scripts, are used to evaluate our method. An average improvement of 2-5% in classification accuracy is achieved using only about 75-80% of the original feature vectors on all three datasets. The proposed methodology also performs better than some popularly used FS models.
Summary: Handwritten script classification must precede OCR in multi-script environments, but existing feature vectors are large and costly. The paper introduces a feature selection algorithm, Hybrid Swarm and Gravitation based FS (HSGFS), and applies it to three feature vectors (DHT, HOG, and MLG) with MLP, KNN, and SVM classifiers. On block, text-line, and word level datasets covering 12 officially recognized Indic scripts, classification accuracy improves by 2-5% on average while using only about 75-80% of the original features, and the method outperforms some popularly used FS models.

66. An Integrated Enhancement Solution for 24-hour Colorful Imaging [PDF] Back to contents
  Feifan Lv, Yinqiang Zheng, Yicheng Li, Feng Lu
Abstract: The current industry practice for 24-hour outdoor imaging is to use a silicon camera supplemented with near-infrared (NIR) illumination. This results in color images with poor contrast in the daytime and an absence of chrominance at nighttime. To resolve this dilemma, all existing solutions try to capture RGB and NIR images separately. However, they need additional hardware support and suffer from various drawbacks, including short service life, high price, and restricted usage scenarios. In this paper, we propose a novel and integrated enhancement solution that produces clear color images, whether in abundant daytime sunlight or in extremely low-light nighttime conditions. Our key idea is to separate the VIS and NIR information from mixed signals, and enhance the VIS signal adaptively with the NIR signal as assistance. To this end, we build an optical system to collect a new VIS-NIR-MIX dataset and present a physically meaningful image processing algorithm based on CNN. Extensive experiments show outstanding results, which demonstrate the effectiveness of our solution.
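Summary: 24-hour outdoor imaging with a silicon camera and near-infrared (NIR) illumination gives poor daytime contrast and no chrominance at night, and existing fixes that capture RGB and NIR separately need extra hardware with drawbacks in service life, price, and usage scenarios. The paper proposes an integrated solution that separates the VIS and NIR information from mixed signals and adaptively enhances the VIS signal with help from the NIR signal. The authors build an optical system to collect a new VIS-NIR-MIX dataset and present a physically meaningful CNN-based image processing algorithm.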

67. A Survey on Deep Learning for Neuroimaging-based Brain Disorder Analysis [PDF] Back to contents
  Li Zhang, Mingliang Wang, Mingxia Liu, Daoqiang Zhang
Abstract: Deep learning has been recently used for the analysis of neuroimages, such as structural magnetic resonance imaging (MRI), functional MRI, and positron emission tomography (PET), and has achieved significant performance improvements over traditional machine learning in computer-aided diagnosis of brain disorders. This paper reviews the applications of deep learning methods for neuroimaging-based brain disorder analysis. We first provide a comprehensive overview of deep learning techniques and popular network architectures, by introducing various types of deep neural networks and recent developments. We then review deep learning methods for computer-aided analysis of four typical brain disorders, including Alzheimer's disease, Parkinson's disease, Autism spectrum disorder, and Schizophrenia, where the first two diseases are neurodegenerative disorders and the last two are neurodevelopmental and psychiatric disorders, respectively. More importantly, we discuss the limitations of existing studies and present possible future directions.
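Summary: A survey of deep learning for neuroimaging-based brain disorder analysis, covering structural MRI, functional MRI, and PET. It first gives an overview of deep learning techniques and popular network architectures, then reviews methods for computer-aided analysis of four typical brain disorders: Alzheimer's disease and Parkinson's disease (neurodegenerative), Autism spectrum disorder (neurodevelopmental), and Schizophrenia (psychiatric). It closes with the limitations of existing studies and possible future directions.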

68. Non-recurrent Traffic Congestion Detection with a Coupled Scalable Bayesian Robust Tensor Factorization Model [PDF] Back to contents
  Qin Li, Huachun Tan, Xizhu Jiang, Yuankai Wu, Linhui Ye
Abstract: Non-recurrent traffic congestion (NRTC) usually brings unexpected delays to commuters. Hence, it is critical to accurately detect and recognize NRTC in real time. The advancement of road traffic detectors and loop detectors provides researchers with large-scale multivariable temporal-spatial traffic data, which allows in-depth research on NRTC to be conducted. However, it remains a challenging task to construct an analytical framework through which the natural spatial-temporal structural properties of multivariable traffic information can be effectively represented and exploited to better understand and detect NRTC. In this paper, we present a novel analytical training-free framework based on coupled scalable Bayesian robust tensor factorization (Coupled SBRTF). The framework can couple multivariable traffic data, including traffic flow, road speed, and occupancy, through sharing a similar or the same sparse structure. Moreover, it naturally captures the high-dimensional spatial-temporal structural properties of traffic data by tensor factorization. With its entries revealing the distribution and magnitude of NRTC, the shared sparse structure of the framework encompasses sufficiently abundant information about NRTC, while the low-rank part of the framework expresses the distribution of the general expected traffic condition as an auxiliary product. Experimental results on real-world traffic data show that the proposed method outperforms coupled Bayesian robust principal component analysis (coupled BRPCA), the rank sparsity tensor decomposition (RSTD), and standard normal deviates (SND) in detecting NRTC. The proposed method performs even better when only weekday traffic data are used, and hence can provide more precise estimation of NRTC for daily commuters.
Summary: A training-free framework for detecting non-recurrent traffic congestion (NRTC) based on coupled scalable Bayesian robust tensor factorization (Coupled SBRTF). It couples traffic flow, road speed, and occupancy through a shared sparse structure whose entries reveal the distribution and magnitude of NRTC, while the low-rank part describes the general expected traffic condition. On real-world traffic data it outperforms coupled BRPCA, RSTD, and SND, and it performs even better when only weekday data are used.

69. Efficient Privacy Preserving Edge Computing Framework for Image Classification [PDF] Back to contents
  Omobayode Fagbohungbe, Sheikh Rufsan Reza, Xishuang Dong, Lijun Qian
Abstract: To extract knowledge from the large volumes of data collected by edge devices, the traditional cloud-based approach that requires data upload may not be feasible due to communication bandwidth limitations as well as the privacy and security concerns of end users. To address these challenges, a novel privacy preserving edge computing framework is proposed in this paper for image classification. Specifically, an autoencoder is trained unsupervised at each edge device individually, and the obtained latent vectors are then transmitted to the edge server for the training of a classifier. This framework reduces the communications overhead and protects the data of the end users. Compared to federated learning, the training of the classifier in the proposed framework is not subject to the constraints of the edge devices, and the autoencoder can be trained independently at each edge device without any server involvement. Furthermore, the privacy of the end users' data is protected by transmitting latent vectors, without the additional cost of encryption. Experimental results provide insights on the image classification performance vs. various design parameters such as the data compression ratio of the autoencoder and the model complexity.
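Summary: Uploading edge-device data to the cloud is often infeasible because of bandwidth limits and privacy concerns. In the proposed framework, each edge device trains an autoencoder unsupervised and transmits only latent vectors to the edge server, which trains the classifier. This reduces communication overhead and protects user data without the cost of encryption. Unlike federated learning, classifier training is not constrained by the edge devices, and the autoencoders need no server involvement. Experiments relate classification performance to design parameters such as the autoencoder's data compression ratio and model complexity.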

70. Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications [PDF] Back to contents
  Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, Hikaru Ikuta
Abstract: Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images and annotations, which will be beneficial for use in machine learning algorithms and their evaluation. In addition to academic use, we obtained further permission for a subset of the dataset for industrial use. In this article, we describe the details of the dataset and present a few examples of multimedia processing applications (detection, retrieval, and generation) that apply existing deep learning methods and are made possible by the dataset.
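Summary: Manga has been left behind by recent deep learning applications for lack of a proper dataset. Manga109 consists of 109 Japanese comic books (94 authors and 21,142 pages), made publicly available for academic use with the authors' permission, and a subset is also cleared for industrial use. Frames, speech texts, character faces, and character bodies are annotated, with more than 500k annotations in total. The article describes the dataset and presents example applications in detection, retrieval, and generation.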

71. Comment on "No-Reference Video Quality Assessment Based on the Temporal Pooling of Deep Features" [PDF] Back to contents
  Franz Götz-Hahn, Vlad Hosu, Dietmar Saupe
Abstract: In Neural Processing Letters 50,3 (2019) a machine learning approach to blind video quality assessment was proposed. It is based on temporal pooling of features of video frames, taken from the last pooling layer of deep convolutional neural networks. The method was validated on two established benchmark datasets and gave results far better than the previous state-of-the-art. In this letter we report the results from our careful reimplementations. The performance results, claimed in the paper, cannot be reached, and are even below the state-of-the-art by a large margin. We show that the originally reported wrong performance results are a consequence of two cases of data leakage. Information from outside the training dataset was used in the fine-tuning stage and in the model evaluation.
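Summary: A blind video quality assessment method published in Neural Processing Letters 50,3 (2019), based on temporal pooling of frame features taken from the last pooling layer of deep convolutional neural networks, reported results far better than the previous state of the art on two benchmark datasets. Careful reimplementations could not reach the claimed performance, which in fact falls below the state of the art by a large margin. The authors show that the originally reported results stem from two cases of data leakage: information from outside the training dataset was used in the fine-tuning stage and in the model evaluation.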
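One generic way such leakage arises in frame-based video pipelines is splitting at the frame level, so that frames of the same video end up in both training and evaluation sets. The sketch below splits by video instead and checks for overlap. It illustrates the general safeguard only; the abstract does not say that this was the mechanism of the two leaks it reports, and the data layout and function names are assumptions.

```python
def split_by_video(frames, test_videos):
    """Split (video_id, frame_index) records so that no video spans both sets."""
    train = [frame for frame in frames if frame[0] not in test_videos]
    test = [frame for frame in frames if frame[0] in test_videos]
    return train, test


def leaked_videos(train, test):
    """Video ids that appear in both sets; an empty set means no overlap."""
    return {video for video, _ in train} & {video for video, _ in test}


frames = [(video, index) for video in ("a", "b", "c") for index in range(4)]

# A frame-level split mixes frames of every video into both sets.
frame_level_train, frame_level_test = frames[::2], frames[1::2]
print(sorted(leaked_videos(frame_level_train, frame_level_test)))

# A video-level split keeps each video on one side only.
train, test = split_by_video(frames, test_videos={"c"})
print(sorted(leaked_videos(train, test)))
```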

72. Enhancing LGMD's Looming Selectivity for UAVs with Spatial-temporal Distributed Presynaptic Connection [PDF] Back to contents
  Jiannan Zhao, Hongxin Wang, Shigang Yue
Abstract: Collision detection is one of the most challenging tasks for Unmanned Aerial Vehicles (UAVs), especially for small or micro UAVs with limited computational power. In nature, flying insects with compact and simple visual systems demonstrate an amazing ability to navigate and avoid collision in complex environments. Locusts are a good example. They avoid collision in a dense swarm by relying on an identified vision neuron called the Lobula Giant Movement Detector (LGMD), which has been modelled and applied on ground robots and vehicles. As a flying insect's visual neuron, LGMD is an ideal model for UAV collision detection. However, the existing models are inadequate for coping with the complex visual challenges unique to UAVs. In this paper, we propose a new LGMD model for flying robots that considers distributed spatial-temporal computing for both excitation and inhibition to enhance the looming selectivity in flying scenes. The proposed model integrates recently discovered presynaptic connection types in the biological LGMD neuron into a spatial-temporal filter with linear distributed interconnection. Systematic experiments with a quadcopter's first person view (FPV) flight videos demonstrate that the proposed distributed presynaptic structure can dramatically enhance LGMD's looming selectivity, especially in complex flying UAV applications.
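Summary: Collision detection is hard for small UAVs with limited computing power. Locusts avoid collisions in dense swarms using the Lobula Giant Movement Detector (LGMD) neuron, which has been modelled for ground robots, but existing models cope poorly with the visual challenges of flight. The paper proposes a new LGMD model with distributed spatial-temporal computing for both excitation and inhibition, integrating recently discovered presynaptic connection types into a spatial-temporal filter with linear distributed interconnection. Experiments on quadcopter first person view (FPV) flight videos show a dramatic improvement in looming selectivity.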

73. AutoCLINT: The Winning Method in AutoCV Challenge 2019 [PDF] Back to contents
  Woonhyuk Baek, Ildoo Kim, Sungwoong Kim, Sungbin Lim
Abstract: The NeurIPS 2019 AutoDL challenge is a series of six automated machine learning competitions. In particular, the AutoCV challenges focused mainly on classification tasks in the visual domain. In this paper, we introduce the winning method in the competition, AutoCLINT. The proposed method implements an autonomous training strategy, including efficient code optimization, and applies automated data augmentation to achieve fast adaptation of pretrained networks. We implement a light version of Fast AutoAugment to search efficiently for data augmentation policies for arbitrarily given image domains. We also empirically analyze the components of the proposed method and provide ablation studies focusing on AutoCV datasets.
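Summary: AutoCLINT is the winning method of the AutoCV challenges, part of the NeurIPS 2019 AutoDL series of six automated machine learning competitions. It implements an autonomous training strategy with efficient code optimization and applies automated data augmentation, using a light version of Fast AutoAugment, to adapt pretrained networks quickly to arbitrary image domains. The paper also analyzes the method's components and provides ablation studies on AutoCV datasets.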

74. An Investigation of Why Overparameterization Exacerbates Spurious Correlations [PDF] Back to contents
  Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang
Abstract: We study why overparameterization -- increasing model size well beyond the point of zero training error -- can hurt test error on minority groups despite improving average test error when there are spurious correlations in the data. Through simulations and experiments on two image datasets, we identify two key properties of the training data that drive this behavior: the proportions of majority versus minority groups, and the signal-to-noise ratio of the spurious correlations. We then analyze a linear setting and show theoretically how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt. Our analysis leads to a counterintuitive approach of subsampling the majority group, which empirically achieves low minority error in the overparameterized regime, even though the standard approach of upweighting the minority fails. Overall, our results suggest a tension between using overparameterized models versus using all the training data for achieving low worst-group error.

75. ST-MNIST -- The Spiking Tactile MNIST Neuromorphic Dataset [PDF]
  Hian Hian See, Brian Lim, Si Li, Haicheng Yao, Wen Cheng, Harold Soh, Benjamin C.K. Tee
Abstract: Tactile sensing is an essential modality for smart robots as it enables them to interact flexibly with physical objects in their environment. Recent advancements in electronic skins have led to the development of data-driven machine learning methods that exploit this important sensory modality. However, current datasets used to train such algorithms are limited to standard synchronous tactile sensors. There is a dearth of neuromorphic event-based tactile datasets, principally due to the scarcity of large-scale event-based tactile sensors. Having such datasets is crucial for the development and evaluation of new algorithms that process spatio-temporal event-based data. For example, evaluating spiking neural networks on conventional frame-based datasets is considered sub-optimal. Here, we debut a novel neuromorphic Spiking Tactile MNIST (ST-MNIST) dataset, which comprises handwritten digits obtained by human participants writing on a neuromorphic tactile sensor array. We also describe an initial effort to evaluate our ST-MNIST dataset using existing artificial and spiking neural network models. The classification accuracies provided herein can serve as performance benchmarks for future work. We anticipate that our ST-MNIST dataset will be of interest and useful to the neuromorphic and robotics research communities.

76. Progressive Adversarial Semantic Segmentation [PDF]
  Abdullah-Al-Zubaer Imran, Demetri Terzopoulos
Abstract: Medical image computing has advanced rapidly with the advent of deep learning techniques such as convolutional neural networks. Deep convolutional neural networks can perform exceedingly well given full supervision. However, the success of such fully-supervised models for various image analysis tasks (e.g., anatomy or lesion segmentation from medical images) is limited by the availability of massive amounts of labeled data. Given small sample sizes, such models are prohibitively data-biased, with large domain shift. To tackle this problem, we propose a novel end-to-end medical image segmentation model, namely Progressive Adversarial Semantic Segmentation (PASS), which can make improved segmentation predictions without requiring any domain-specific data during training time. Our extensive experimentation with 8 public diabetic retinopathy and chest X-ray datasets confirms the effectiveness of PASS for accurate vascular and pulmonary segmentation, both for in-domain and cross-domain evaluations.

77. Measuring the Algorithmic Efficiency of Neural Networks [PDF]
  Danny Hernandez, Tom B. Brown
Abstract: Three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. Algorithmic progress has traditionally been more difficult to quantify than compute and data. In this work, we argue that algorithmic progress has an aspect that is both straightforward to measure and interesting: reductions over time in the compute needed to reach past capabilities. We show that the number of floating-point operations required to train a classifier to AlexNet-level performance on ImageNet has decreased by a factor of 44x between 2012 and 2019. This corresponds to algorithmic efficiency doubling every 16 months over a period of 7 years. By contrast, Moore's Law would only have yielded an 11x cost improvement. We observe that hardware and algorithmic efficiency gains multiply and can be on a similar scale over meaningful horizons, which suggests that a good model of AI progress should integrate measures from both.
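The figures quoted in the abstract can be sanity-checked with a short calculation. This is a rough sketch, not the authors' methodology: it assumes smooth exponential improvement over exactly 7 years, and it assumes a 24-month doubling period for Moore's Law, which the abstract does not state. The helper names are hypothetical.

```python
import math


def doubling_time_months(gain, years):
    """Months per doubling implied by a total `gain` achieved over `years`."""
    return 12.0 * years / math.log2(gain)


def gain_from_doubling(years, doubling_months):
    """Total gain after `years` if the quantity doubles every `doubling_months`."""
    return 2.0 ** (12.0 * years / doubling_months)


# 44x less training compute over 7 years (figures from the abstract).
algo_doubling = doubling_time_months(44.0, 7.0)

# Assumed Moore's Law doubling period of 24 months over the same 7 years.
moore_gain = gain_from_doubling(7.0, 24.0)

print(f"Implied algorithmic doubling time: {algo_doubling:.1f} months")
print(f"Moore's Law gain over 7 years: {moore_gain:.1f}x")
```

Under these assumptions the implied doubling time comes out between 15 and 16 months, in line with the abstract's "every 16 months", and the Moore's Law gain lands near the quoted 11x.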

78. Deep Residual Network based food recognition for enhanced Augmented Reality application [PDF]
  Siddarth S, Sainath G
Abstract: Deep neural network based learning approaches are widely used for image classification and object detection problems, with remarkable outcomes. Real-time state estimation of objects can be used to track and estimate the features that an object in the current frame possesses without causing significant delay or misclassification. A system that can detect the features of such objects in their present state from camera images can be used to enhance Augmented Reality applications, improving user experience and delivering information in a more perceptual way. The focus of this paper is to determine the most suitable model for a low-latency AR assistant that aids users by providing nutritional information about the food they consume, in order to promote healthier life choices. To that end, a dataset has been collected, and we conduct various tests to identify the most suitable DNN in terms of performance and complexity and to establish a system that renders such information to the user in real time.

Note: the Chinese abstracts in the original post were machine-translated.