Abstracts

1. X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
Abstract: This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth. Inspired by feature selection methods in machine learning, a simple stepwise network expansion approach is employed that expands a single axis in each step, such that good accuracy to complexity trade-off is achieved. To expand X3D to a specific target complexity, we perform progressive forward expansion followed by backward contraction. X3D achieves state-of-the-art performance while requiring 4.8x and 5.5x fewer multiply-adds and parameters for similar accuracy as previous work. Our most surprising finding is that networks with high spatiotemporal resolution can perform well, while being extremely light in terms of network width and parameters. We report competitive accuracy at unprecedented efficiency on video classification and detection benchmarks. Code will be available at: this https URL
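The stepwise expansion described in the X3D abstract can be illustrated with a small greedy loop. This is only a sketch under stated assumptions: `cost`, `toy_accuracy`, the expansion factor of 2 and the gain-per-added-cost selection rule are hypothetical stand-ins (the abstract only says that one axis is expanded per step for a good accuracy-to-complexity trade-off), and the backward contraction phase is not shown.

```python
# Axes named in the abstract: space, time, width and depth.
AXES = ("space", "time", "width", "depth")


def cost(cfg):
    # Toy complexity proxy (assumption): quadratic in spatial size and width,
    # linear in temporal length and depth.
    return cfg["space"] ** 2 * cfg["time"] * cfg["width"] ** 2 * cfg["depth"]


def toy_accuracy(cfg):
    # Hypothetical stand-in for "train and validate this candidate".
    # Each axis gives diminishing returns as it grows.
    weights = {"space": 1.0, "time": 1.5, "width": 0.5, "depth": 0.8}
    return sum(w * (1.0 - 1.0 / cfg[axis]) for axis, w in weights.items())


def expand_step(cfg, factor=2):
    """Expand each axis on its own and keep the best single-axis candidate."""
    best = None
    for axis in AXES:
        candidate = dict(cfg)  # copy so the current config is untouched
        candidate[axis] = cfg[axis] * factor
        gain = toy_accuracy(candidate) - toy_accuracy(cfg)
        extra = cost(candidate) - cost(cfg)
        score = gain / extra  # assumed criterion: accuracy gain per added cost
        if best is None or score > best[0]:
            best = (score, axis, candidate)
    return best[1], best[2]


def forward_expand(cfg, target_cost, factor=2):
    """Repeat single-axis expansion until the target complexity is reached."""
    history = []
    while cost(cfg) < target_cost:
        axis, cfg = expand_step(cfg, factor)
        history.append(axis)
    return cfg, history


if __name__ == "__main__":
    final_cfg, steps = forward_expand({axis: 1 for axis in AXES}, target_cost=64)
    print(final_cfg, steps)
```

In a real setting the call to `toy_accuracy` would be replaced by training and validating each candidate network, which is the expensive part of the search.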

2. 3D Photography using Context-aware Layered Depth Inpainting
Meng-Li Shih, Shih-Yang Su, Johannes Kopf, Jia-Bin Huang
Abstract: We propose a method for converting a single RGB-D input image into a 3D photo - a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view. We use a Layered Depth Image with explicit pixel connectivity as the underlying representation, and present a learning-based inpainting model that synthesizes new local color-and-depth content into the occluded region in a spatial context-aware manner. The resulting 3D photos can be efficiently rendered with motion parallax using standard graphics engines. We validate the effectiveness of our method on a wide range of challenging everyday scenes and show fewer artifacts compared with the state of the art.

3. Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection
Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Yong Jae Lee, Alexander G. Schwing, Jan Kautz
Abstract: Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training. However, major challenges remain: (1) differentiation of object instances can be ambiguous; (2) detectors tend to focus on discriminative parts rather than entire objects; (3) without ground truth, object proposals have to be redundant for high recall, causing significant memory consumption. Addressing these challenges is difficult, as it often requires eliminating uncertainties and trivial solutions. To target these issues we develop an instance-aware and context-focused unified framework. It employs an instance-aware self-training algorithm and a learnable Concrete DropBlock while devising a memory-efficient sequential batch back-propagation. Our proposed method achieves state-of-the-art results on COCO ($12.1\% ~AP$, $24.8\% ~AP_{50}$), VOC 2007 ($54.9\% ~AP$), and VOC 2012 ($52.1\% ~AP$), improving on baselines by large margins. In addition, the proposed method is the first to benchmark ResNet-based models and weakly supervised video object detection. Refer to our project page for code, models, and more details: this https URL.

4. Scalable Active Learning for Object Detection
Elmar Haussmann, Michele Fenzi, Kashyap Chitta, Jan Ivanecky, Hanson Xu, Donna Roy, Akshita Mittel, Nicolas Koumchatzky, Clement Farabet, Jose M. Alvarez
Abstract: Deep Neural Networks trained in a fully supervised fashion are the dominant technology in perception-based autonomous driving systems. While collecting large amounts of unlabeled data is already a major undertaking, only a subset of it can be labeled by humans due to the effort needed for high-quality annotation. Therefore, finding the right data to label has become a key challenge. Active learning is a powerful technique to improve data efficiency for supervised learning methods, as it aims at selecting the smallest possible training set to reach a required performance. We have built a scalable production system for active learning in the domain of autonomous driving. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, present our current results at scale, and briefly describe the open problems and future directions.

5. TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images
Jianxin Lin, Yingxue Pang, Yingce Xia, Zhibo Chen, Jiebo Luo
Abstract: An unsupervised image-to-image translation (UI2I) task deals with learning a mapping between two domains without paired images. While existing UI2I methods usually require numerous unpaired images from different domains for training, there are many scenarios where training data is quite limited. In this paper, we argue that even if each domain contains a single image, UI2I can still be achieved. To this end, we propose TuiGAN, a generative model that is trained on only two unpaired images and amounts to one-shot unsupervised learning. With TuiGAN, an image is translated in a coarse-to-fine manner where the generated image is gradually refined from global structures to local details. We conduct extensive experiments to verify that our versatile method can outperform strong baselines on a wide variety of UI2I tasks. Moreover, TuiGAN is capable of achieving comparable performance with the state-of-the-art UI2I models trained with sufficient data.

6. Where Does It End? -- Reasoning About Hidden Surfaces by Object Intersection Constraints
Michael Strecke, Joerg Stueckler
Abstract: Dynamic scene understanding is an essential capability in robotics and VR/AR. In this paper we propose Co-Section, an optimization-based approach to 3D dynamic scene reconstruction, which infers hidden shape information from intersection constraints. An object-level dynamic SLAM frontend detects, segments, tracks and maps dynamic objects in the scene. Our optimization backend completes the shapes using hull and intersection constraints between the objects. In experiments, we demonstrate our approach on real and synthetic dynamic scene datasets. We also assess the shape completion performance of our method quantitatively. To the best of our knowledge, our approach is the first method to incorporate such physical plausibility constraints on object intersections for shape completion of dynamic objects in an energy minimization framework.

7. AdaStereo: A Simple and Efficient Approach for Adaptive Stereo Matching
Xiao Song, Guorun Yang, Xinge Zhu, Hui Zhou, Zhe Wang, Jianping Shi
Abstract: In this paper, we attempt to solve the domain adaptation problem for deep stereo matching networks. Instead of resorting to black-box structures or layers to find implicit connections across domains, we focus on investigating adaptation gaps for stereo matching. By visual inspections and extensive experiments, we conclude that low-level alignment is crucial for adaptive stereo matching, since the main gaps across domains lie in the inconsistent input color and cost volume distributions. Correspondingly, we design a bottom-up domain adaptation method, in which two particular approaches are proposed, i.e. color transfer and cost regularization, that can be easily integrated into existing stereo matching models. The color transfer enables transferring a large amount of synthetic data to the same color spaces as the target domains during training. The cost regularization can further constrain both the lower-layer features and cost volumes to domain-invariant distributions. Although our proposed strategies are simple and have no parameters to learn, they do improve the generalization ability of existing disparity networks by a large margin. We conduct experiments across multiple datasets, including Scene Flow, KITTI, Middlebury, ETH3D and DrivingStereo. Without bells and whistles, our synthetic-data pretrained models achieve state-of-the-art cross-domain performance compared to previous domain-invariant methods, and even outperform state-of-the-art disparity networks fine-tuned with target domain ground truth on multiple stereo matching benchmarks.
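One common, parameter-free way to move source images toward a target color distribution is to match per-channel statistics. The sketch below assumes mean and standard deviation matching on a single flattened channel; the AdaStereo abstract does not state which color space or statistics its color transfer uses, so `match_channel` is a hypothetical illustration rather than the authors' procedure.

```python
from statistics import mean, pstdev


def match_channel(source, target):
    """Shift and scale one color channel so its mean/std match the target's.

    `source` and `target` are flat lists of pixel values for one channel.
    """
    s_mu, s_sd = mean(source), pstdev(source)
    t_mu, t_sd = mean(target), pstdev(target)
    # Guard against a constant source channel (zero spread).
    scale = t_sd / s_sd if s_sd > 0 else 1.0
    return [(v - s_mu) * scale + t_mu for v in source]


if __name__ == "__main__":
    synthetic = [0.0, 2.0, 4.0]   # toy source-domain channel
    real = [10.0, 11.0, 12.0]     # toy target-domain channel
    print(match_channel(synthetic, real))
```

Because the mapping has no learned parameters, it can be applied to each training image on the fly, which is in the spirit of the "no parameters to learn" claim in the abstract.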

8. Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation
Yude Wang, Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen
Abstract: Image-level weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years. Most advanced solutions exploit class activation maps (CAMs). However, CAMs can hardly serve as the object mask due to the gap between full and weak supervision. In this paper, we propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap. Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation, whose pixel-level labels take the same spatial transformation as the input images during data augmentation. However, this constraint is lost on the CAMs trained by image-level supervision. Therefore, we propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning. Moreover, we propose a pixel correlation module (PCM), which exploits context appearance information and refines the prediction of the current pixel by its similar neighbors, leading to further improvement in CAM consistency. Extensive experiments on the PASCAL VOC 2012 dataset demonstrate that our method outperforms state-of-the-art methods using the same level of supervision. The code is released online.
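The equivariance constraint in the SEAM abstract says that transforming the input and then predicting a CAM should give the same result as predicting a CAM and then transforming it. A minimal sketch of such a consistency term follows, assuming a horizontal flip as the transformation and a mean absolute difference as the penalty; both choices, and the toy `pointwise_cam` and `position_biased_cam` predictors, are hypothetical and not taken from the paper.

```python
def hflip(grid):
    """Horizontally flip a 2D grid given as a list of rows."""
    return [row[::-1] for row in grid]


def equivariance_loss(cam_fn, image, transform=hflip):
    """Mean absolute difference between T(cam(x)) and cam(T(x))."""
    cam_then_transform = transform(cam_fn(image))
    transform_then_cam = cam_fn(transform(image))
    count = sum(len(row) for row in cam_then_transform)
    total = sum(
        abs(a - b)
        for row_a, row_b in zip(cam_then_transform, transform_then_cam)
        for a, b in zip(row_a, row_b)
    )
    return total / count


def pointwise_cam(image):
    # Toy predictor that treats every pixel identically: flip-equivariant.
    return [[2.0 * v for v in row] for row in image]


def position_biased_cam(image):
    # Toy predictor with a column-dependent bias: not flip-equivariant.
    return [[v + 0.1 * col for col, v in enumerate(row)] for row in image]


if __name__ == "__main__":
    image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    print(equivariance_loss(pointwise_cam, image))
    print(equivariance_loss(position_biased_cam, image))
```

A predictor that respects the transformation incurs no penalty, while one with a position-dependent bias does, which is the signal a consistency regularizer would feed back during training.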

9. Sequential Neural Rendering with Transformer
Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila
Abstract: This paper addresses the problem of novel view synthesis by means of neural rendering, where we are interested in predicting the novel view at an arbitrary camera pose based on a given set of input images from other viewpoints. Using the known query pose and input poses, we create an ordered set of observations that leads to the target view. Thus, the problem of single novel view synthesis is reformulated as a sequential view prediction task. In this paper, the proposed Transformer-based Generative Query Network (T-GQN) extends the neural-rendering methods by adding two new concepts. First, we use multi-view attention learning between context images to obtain multiple implicit scene representations. Second, we introduce a sequential rendering decoder to predict an image sequence, including the target view, based on the learned representations. We evaluate our model on various challenging synthetic datasets and demonstrate that our model can give consistent predictions and achieve faster training convergence than earlier architectures.

10. Spatial Information Guided Convolution for Real-Time RGBD Semantic Segmentation
Lin-Zhuo Chen, Zheng Lin, Ziqin Wang, Yong-Liang Yang, Ming-Ming Cheng
Abstract: 3D spatial information is known to be beneficial to the semantic segmentation task. Most existing methods take 3D spatial data as an additional input, leading to a two-stream segmentation network that processes RGB and 3D spatial information separately. This solution greatly increases the inference time and severely limits its scope for real-time applications. To solve this problem, we propose Spatial information guided Convolution (S-Conv), which allows efficient RGB feature and 3D spatial information integration. S-Conv is competent to infer the sampling offset of the convolution kernel guided by the 3D spatial information, helping the convolutional layer adjust the receptive field and adapt to geometric transformations. S-Conv also incorporates geometric information into the feature learning process by generating spatially adaptive convolutional weights. The capability of perceiving geometry is largely enhanced without much affecting the amount of parameters and computational cost. We further embed S-Conv into a semantic segmentation network, called Spatial information Guided convolutional Network (SGNet), resulting in real-time inference and state-of-the-art performance on NYUDv2 and SUNRGBD datasets.

11. A Proposed IoT Smart Trap using Computer Vision for Sustainable Pest Control in Coffee Culture
Vitor Alexandre Campos Figueiredo, Samuel Mafra, Joel Rodrigues
Abstract: The Internet of Things (IoT) is emerging as a multi-purpose technology with enormous potential for improving the quality of life in several areas. In particular, IoT has been applied in agriculture to make it more ecologically sustainable. For instance, electronic traps have the potential to perform pest control without any pesticide. In this paper, a smart trap with IoT capabilities that uses computer vision to identify the insect of interest is proposed. The solution includes 1) an embedded system with a camera, GPS sensor and motor actuators; 2) IoT middleware as the database service provider, and 3) a Web application that presents data as a configurable heat map. A demonstration of the proposed solution is presented; the main conclusions concern the insight it gives into pest concentration across the plantation and its viability as an alternative to traditional pesticide-based control.

12. Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification
S. Wang, Y. Guan, L. Shao
Abstract: Recognising remote sensing scene images remains challenging due to large visual-semantic discrepancies. These mainly arise due to the lack of detailed annotations that can be employed to align pixel-level representations with high-level semantic labels. As the tagging process is labour-intensive and subjective, we hereby propose a novel Multi-Granularity Canonical Appearance Pooling (MG-CAP) to automatically capture the latent ontological structure of remote sensing datasets. We design a granular framework that allows progressively cropping the input image to learn multi-grained features. For each specific granularity, we discover the canonical appearance from a set of pre-defined transformations and learn the corresponding CNN features through a maxout-based Siamese style architecture. Then, we replace the standard CNN features with Gaussian covariance matrices and adopt the proper matrix normalisations for improving the discriminative power of features. Besides, we provide a stable solution for training the eigenvalue-decomposition function (EIG) in a GPU and demonstrate the corresponding back-propagation using matrix calculus. Extensive experiments have shown that our framework can achieve promising results in public remote sensing scene datasets.

13. Neural Object Descriptors for Multi-View Shape Reconstruction
Edgar Sucar, Kentaro Wada, Andrew Davison
Abstract: The choice of scene representation is crucial in both the shape inference algorithms it requires and the smart applications it enables. We present efficient and optimisable multi-class learned object descriptors together with a novel probabilistic and differential rendering engine, for principled full object shape inference from one or more RGB-D images. Our framework allows for accurate and robust 3D object reconstruction which enables multiple applications including robot grasping and placing, augmented reality, and the first object-level SLAM system capable of optimising object poses and shapes jointly with camera trajectory.

14. LightConvPoint: convolution for points
Alexandre Boulch, Gilles Puy, Renaud Marlet
Abstract: Recent state-of-the-art methods for point cloud semantic segmentation are based on convolutions defined for point clouds. In this paper, we propose a formulation of the convolution for point clouds directly designed from the discrete convolution in image processing. The resulting formulation underlines the separation between the discrete kernel space and the geometric space where the points lie. The link between the two spaces is made by a change space matrix $\mathbf{A}$ which distributes the input features on the convolution kernel. Several existing methods fall under this formulation. We show that the matrix $\mathbf{A}$ can be easily estimated with neural networks. Finally, we show competitive results on several semantic segmentation benchmarks while being efficient both in computation time and memory.

15. Decoupled Gradient Harmonized Detector for Partial Annotation: Application to Signet Ring Cell Detection
Tiancheng Lin, Yuanfan Guo, Canqian Yang, Jiancheng Yang, Yi Xu
Abstract: Early diagnosis of signet ring cell carcinoma dramatically improves the survival rate of patients. Due to the lack of public datasets and expert-level annotations, automatic detection of signet ring cells (SRCs) has not been thoroughly investigated. In the MICCAI DigestPath2019 challenge, apart from foreground (SRC region)-background (normal tissue area) class imbalance, SRCs are partially annotated due to costly medical image annotation, which introduces extra label noise. To address the issues simultaneously, we propose Decoupled Gradient Harmonizing Mechanism (DGHM) and embed it into classification loss, denoted as DGHM-C loss. Specifically, besides positive (SRCs) and negative (normal tissues) examples, we further decouple noisy examples from clean examples and harmonize the corresponding gradient distributions in classification respectively. Without bells and whistles, we achieved 2nd place in the challenge. Ablation studies and controlled label missing rate experiments demonstrate that DGHM-C loss can bring substantial improvement in partially annotated object detection.

16. CenterMask: single shot instance segmentation with point representation
Yuqing Wang, Zhaoliang Xu, Hao Shen, Baoshan Cheng, Lirong Yang
Abstract: In this paper, we propose a single-shot instance segmentation method, which is simple, fast and accurate. There are two main challenges for one-stage instance segmentation: object instance differentiation and pixel-wise feature alignment. Accordingly, we decompose the instance segmentation into two parallel subtasks: Local Shape prediction that separates instances even in overlapping conditions, and Global Saliency generation that segments the whole image in a pixel-to-pixel manner. The outputs of the two branches are assembled to form the final instance masks. To realize that, the local shape information is adopted from the representation of object center points. Trained entirely from scratch and without any bells and whistles, the proposed CenterMask achieves 34.5 mask AP with a speed of 12.3 fps, using a single model with single-scale training/testing on the challenging COCO dataset. The accuracy is higher than all other one-stage instance segmentation methods except the 5 times slower TensorMask, which shows the effectiveness of CenterMask. Besides, our method can be easily embedded into other one-stage object detectors such as FCOS and performs well, showing the generalization of CenterMask.

17. DeepSEE: Deep Disentangled Semantic Explorative Extreme Super-Resolution
Marcel Christoph Bühler, Andrés Romero, Radu Timofte
Abstract: Super-resolution (SR) is by definition ill-posed. There are infinitely many plausible high-resolution variants for a given low-resolution natural image. This is why example-based SR methods study upscaling factors up to 4x (or up to 8x for face hallucination). Most of the current literature aims at a single deterministic solution of either high reconstruction fidelity or photo-realistic perceptual quality. In this work, we propose a novel framework, DeepSEE, for Deep disentangled Semantic Explorative Extreme super-resolution. To the best of our knowledge, DeepSEE is the first method to leverage semantic maps for explorative super-resolution. In particular, it provides control of the semantic regions, their disentangled appearance and it allows a broad range of image manipulations. We validate DeepSEE for up to 32x magnification and exploration of the space of super-resolution.

18. Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis
Jogendra Nath Kundu, Siddharth Seth, Varun Jampani, Mugalodi Rakesh, R. Venkatesh Babu, Anirban Chakraborty
Abstract: Camera captured human pose is an outcome of several sources of variation. Performance of supervised 3D pose estimation approaches comes at the cost of dispensing with variations, such as shape and appearance, that may be useful for solving other related tasks. As a result, the learned model not only inculcates task-bias but also dataset-bias because of its strong reliance on the annotated samples, which also holds true for weakly-supervised models. Acknowledging this, we propose a self-supervised learning framework to disentangle such variations from unlabeled video frames. We leverage the prior knowledge on human skeleton and poses in the form of a single part-based 2D puppet model, human pose articulation constraints, and a set of unpaired 3D poses. Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, not only facilitates discovery of interpretable pose disentanglement but also allows us to operate on videos with diverse camera movements. Qualitative results on unseen in-the-wild datasets establish our superior generalization across multiple tasks beyond the primary tasks of 3D pose estimation and part segmentation. Furthermore, we demonstrate state-of-the-art weakly-supervised 3D pose estimation performance on both Human3.6M and MPI-INF-3DHP datasets.

19. Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation
Da Li, Timothy Hospedales
Abstract: Domain adaptation (DA) is the topical problem of adapting models from labelled source datasets so that they perform well on target datasets where only unlabelled or partially labelled data is available. Many methods have been proposed to address this problem through different ways to minimise the domain shift between source and target datasets. In this paper we take an orthogonal perspective and propose a framework to further enhance performance by meta-learning the initial conditions of existing DA algorithms. This is challenging compared to the more widely considered setting of few-shot meta-learning, due to the length of the computation graph involved. Therefore we propose an online shortest-path meta-learning framework that is both computationally tractable and practically effective for improving DA performance. We present variants for both multi-source unsupervised domain adaptation (MSDA), and semi-supervised domain adaptation (SSDA). Importantly, our approach is agnostic to the base adaptation algorithm, and can be applied to improve many techniques. Experimentally, we demonstrate improvements on classic (DANN) and recent (MCD and MME) techniques for MSDA and SSDA, and ultimately achieve state of the art results on several DA benchmarks including the largest scale DomainNet.

20. Hierarchical Group Sparse Regularization for Deep Convolutional Neural Networks
Kakeru Mitsuno, Junichi Miyao, Takio Kurita
Abstract: In a deep neural network (DNN), the number of parameters is usually huge in order to achieve high learning performance. For that reason, it costs a lot of memory and substantial computational resources, and also causes overfitting. It is known that some parameters are redundant and can be removed from the network without decreasing performance. Many sparse regularization criteria have been proposed to solve this problem. In a convolutional neural network (CNN), group sparse regularizations are often used to remove unnecessary subsets of the weights, such as filters or channels. When we apply a group sparse regularization for the weights connected to a neuron as a group, each convolution filter is not treated as a target group in the regularization. In this paper, we introduce the concept of hierarchical grouping to solve this problem, and we propose several hierarchical group sparse regularization criteria for CNNs. Our proposed hierarchical group sparse regularization can treat the weights for the input-neuron or the output-neuron as a group and the convolutional filter as a group within that same group to prune the unnecessary subsets of weights. As a result, we can prune the weights more adequately depending on the structure of the network and the number of channels while keeping high performance. In the experiment, we investigate the effectiveness of the proposed sparse regularizations through intensive comparison experiments on public datasets with several network architectures. Code is available on GitHub: "this https URL"
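The difference between grouping weights by neuron and grouping them by convolution filter can be seen with the standard group lasso penalty (a sum of L2 norms over groups). The sketch below is an illustration only: the nested-list weight layout `weights[out][in]` and both function names are assumptions, and the paper's hierarchical criteria, which combine the two levels, are not reproduced here.

```python
from math import sqrt


def l2(values):
    """Euclidean norm of a flat list of numbers."""
    return sqrt(sum(v * v for v in values))


def group_lasso_by_output_channel(weights):
    # One group per output neuron: all kernels feeding it are pooled together.
    return sum(
        l2([v for kernel in out_channel for v in kernel])
        for out_channel in weights
    )


def group_lasso_by_kernel(weights):
    # One group per (output, input) kernel: each filter slice is its own group.
    return sum(l2(kernel) for out_channel in weights for kernel in out_channel)


if __name__ == "__main__":
    # Assumed layout: weights[out][in] is a flattened kernel.
    weights = [
        [[3.0, 0.0], [4.0, 0.0]],  # output channel 0, two input kernels
        [[0.0, 0.0], [0.0, 0.0]],  # output channel 1, already pruned
    ]
    print(group_lasso_by_output_channel(weights))
    print(group_lasso_by_kernel(weights))
```

The two penalties disagree whenever a neuron has more than one non-zero kernel, which is why the choice of grouping changes which subsets of weights are driven to zero.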

21. Universal Source-Free Domain Adaptation
Jogendra Nath Kundu, Naveen Venkat, Rahul M V, R. Venkatesh Babu
Abstract: There is a strong incentive to develop versatile learning techniques that can transfer the knowledge of class-separability from a labeled source domain to an unlabeled target domain in the presence of a domain-shift. Existing domain adaptation (DA) approaches are not equipped for practical DA scenarios as a result of their reliance on the knowledge of source-target label-set relationship (e.g. Closed-set, Open-set or Partial DA). Furthermore, almost all prior unsupervised DA works require coexistence of source and target samples even during deployment, making them unsuitable for real-time adaptation. Devoid of such impractical assumptions, we propose a novel two-stage learning process. 1) In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift. To achieve this, we enhance the model's ability to reject out-of-source distribution samples by leveraging the available source data, in a novel generative classifier framework. 2) In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps, with no access to the previously seen source samples. To this end, in contrast to the usage of complex adversarial training regimes, we define a simple yet effective source-free adaptation objective by utilizing a novel instance-level weighting mechanism, named as Source Similarity Metric (SSM). A thorough evaluation shows the practical usability of the proposed learning framework with superior DA performance even over state-of-the-art source-dependent approaches.
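Summary: Existing domain adaptation (DA) methods rely on knowing the source-target label-set relationship (e.g., closed-set, open-set, or partial DA) and on having source and target samples together even during deployment, which makes them unsuitable for real-time adaptation. The paper proposes a two-stage learning process free of these assumptions. In the Procurement stage, a generative classifier framework uses the available source data to strengthen the model's ability to reject out-of-source-distribution samples, assuming no prior knowledge of the upcoming category-gap and domain-shift. In the Deployment stage, a unified source-free adaptation objective, built on an instance-level weighting mechanism called the Source Similarity Metric (SSM), operates across a wide range of category-gaps without access to source samples. Evaluation shows DA performance superior even to state-of-the-art source-dependent approaches.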

22. Towards Inheritable Models for Open-Set Domain Adaptation [PDF] Back to contents
Jogendra Nath Kundu, Naveen Venkat, Ambareesh Revanur, Rahul M V, R. Venkatesh Babu
Abstract: There has been tremendous progress in Domain Adaptation (DA) for visual recognition tasks. In particular, open-set DA, wherein the target domain contains additional unseen categories, has gained considerable attention. Existing open-set DA approaches demand access to a labeled source dataset along with unlabeled target instances. However, this reliance on co-existing source and target data is highly impractical in scenarios where data-sharing is restricted due to its proprietary nature or privacy concerns. Addressing this, we introduce a practical DA paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in the future. To this end, we formalize knowledge inheritability as a novel concept and propose a simple yet effective solution to realize inheritable models suitable for the above practical paradigm. Further, we present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data. We provide theoretical insights followed by a thorough empirical evaluation demonstrating state-of-the-art open-set domain adaptation performance.
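Summary: Open-set domain adaptation (DA), where the target domain contains additional unseen categories, has drawn considerable attention, but existing approaches need a labeled source dataset alongside unlabeled target instances. That is impractical when data sharing is restricted by proprietary or privacy concerns. The paper introduces a DA paradigm in which a source-trained model enables adaptation without the source dataset, formalizes knowledge inheritability as a new concept, and proposes a simple solution for building inheritable models. It also presents an objective way to quantify inheritability, so that the most suitable source model can be selected for a given target domain, and reports theoretical insights together with state-of-the-art open-set DA results.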

23. Masked GANs for Unsupervised Depth and Pose Prediction with Scale Consistency [PDF] Back to contents
Chaoqiang Zhao, Gary G. Yen, Qiyu Sun, Chongzhen Zhang, Yang Tang
Abstract: Previous works have shown that adversarial learning can be used for unsupervised monocular depth and visual odometry (VO) estimation. However, the performance of pose and depth networks is limited by occlusions and visual field changes. Because of the incomplete correspondence of visual information between frames caused by motion, target images cannot be synthesized completely from source images via view reconstruction and bilinear interpolation. The reconstruction loss based on the difference between synthesized and real target images will be affected by the incomplete reconstruction. Besides, the data distribution of unreconstructed regions will be learned and help the discriminator distinguish between real and fake images, so the generator may fail to compete with the discriminator. Therefore, a MaskNet is designed in this paper to predict these regions and reduce their impact on the reconstruction loss and adversarial loss. The impact of unreconstructed regions on the discriminator is tackled by proposing a boolean mask scheme, as shown in Fig. 1. Furthermore, we consider the scale consistency of our pose network by utilizing a new scale-consistency loss, so that our pose network is capable of providing the full camera trajectory over a long monocular sequence. Extensive experiments on the KITTI dataset show that each component proposed in this paper contributes to the performance, and that both our depth and trajectory predictions achieve competitive performance.
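Summary: Adversarial learning can be used for unsupervised monocular depth and visual odometry (VO) estimation, but pose and depth networks are limited by occlusions and visual field changes. Because motion leaves the correspondence between frames incomplete, target images cannot be fully synthesized from source images, which corrupts the reconstruction loss and lets the discriminator exploit the unreconstructed regions. The paper designs a MaskNet to predict these regions and reduce their impact on the reconstruction and adversarial losses, uses a boolean mask scheme for the discriminator, and adds a scale-consistency loss so that the pose network yields the full camera trajectory over a long monocular sequence. Experiments on KITTI show that each component contributes and that depth and trajectory prediction are competitive.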

24. Reciprocal Learning Networks for Human Trajectory Prediction [PDF] Back to contents
Hao Sun, Zhiqun Zhao, Zhihai He
Abstract: We observe that the human trajectory is not only forward predictable, but also backward predictable. Both forward and backward trajectories follow the same social norms and obey the same physical constraints with the only difference in their time directions. Based on this unique property, we develop a new approach, called reciprocal learning, for human trajectory prediction. Two networks, forward and backward prediction networks, are tightly coupled, satisfying the reciprocal constraint, which allows them to be jointly learned. Based on this constraint, we borrow the concept of adversarial attacks of deep neural networks, which iteratively modifies the input of the network to match the given or forced network output, and develop a new method for network prediction, called reciprocal attack for matched prediction. It further improves the prediction accuracy. Our experimental results on benchmark datasets demonstrate that our new method outperforms the state-of-the-art methods for human trajectory prediction.
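Summary: Human trajectories are predictable both forward and backward in time, since both directions follow the same social norms and physical constraints and differ only in time direction. Based on this property, the paper proposes reciprocal learning, in which tightly coupled forward and backward prediction networks satisfy a reciprocal constraint and are learned jointly. Borrowing the concept of adversarial attacks on deep neural networks, which iteratively modify the network input to match a given or forced output, it also develops a prediction method called reciprocal attack for matched prediction, which further improves accuracy. Results on benchmark datasets show that the method outperforms state-of-the-art methods for human trajectory prediction.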

25. MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion [PDF] Back to contents
Kentaro Wada, Edgar Sucar, Stephen James, Daniel Lenton, Andrew J. Davison
Abstract: Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics and occlusion. Recognized precise object models will play an important role alongside non-parametric reconstructions of unrecognized structures. We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. Our approach makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves, and performs joint optimization to estimate consistent, non-intersecting poses for multiple objects in contact. We verify the accuracy and robustness of our approach experimentally on 2 object datasets: YCB-Video, and our own challenging Cluttered YCB-Video. We demonstrate a real-time robotics application where a robot arm precisely and orderly disassembles complicated piles of objects, using only on-board RGB-D vision.
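Summary: Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics, and occlusion. The presented system estimates accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. It makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information across views as the camera moves, and jointly optimizes for consistent, non-intersecting poses. Accuracy and robustness are verified on YCB-Video and on the authors' own Cluttered YCB-Video dataset, and a real-time demonstration shows a robot arm disassembling complicated piles of objects using only on-board RGB-D vision.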

26. Quasi-Newton Solver for Robust Non-Rigid Registration [PDF] Back to contents
Yuxin Yao, Bailin Deng, Weiwei Xu, Juyong Zhang
Abstract: Imperfect data (noise, outliers and partial overlap) and high degrees of freedom make non-rigid registration a classic and challenging problem in computer vision. Existing methods typically adopt $\ell_p$-type robust estimators to regularize the fitting and smoothness, and the proximal operator is used to solve the resulting non-smooth problem. However, the slow convergence of these algorithms limits their wide application. In this paper, we propose a formulation for robust non-rigid registration based on a globally smooth robust estimator for data fitting and regularization, which can handle outliers and partial overlaps. We apply the majorization-minimization algorithm to the problem, which reduces each iteration to solving a simple least-squares problem with L-BFGS. Extensive experiments demonstrate the effectiveness of our method for non-rigid alignment between two shapes with outliers and partial overlap, with quantitative evaluation showing that it outperforms state-of-the-art methods in terms of registration accuracy and computational speed. The source code is available at this https URL.
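Summary: Imperfect data (noise, outliers, and partial overlap) and high degrees of freedom make non-rigid registration a classic challenge in computer vision. Existing methods typically use $\ell_p$-type robust estimators with proximal operators to solve the resulting non-smooth problem, but they converge slowly. The paper proposes a formulation based on a globally smooth robust estimator for data fitting and regularization, which handles outliers and partial overlaps, and applies a majorization-minimization algorithm that reduces each iteration to a simple least-squares problem solved with L-BFGS. Experiments show better registration accuracy and computational speed than state-of-the-art methods. The source code is available at this https URL.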
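
The majorization-minimization step can be illustrated on a much simpler problem than registration. The sketch below estimates a robust 1D location under two assumptions that the abstract does not confirm: the smooth robust estimator is taken to be the Welsch function 1 - exp(-r^2 / (2 nu^2)), and each surrogate least-squares problem is solved in closed form as a weighted mean rather than with L-BFGS. The function name and parameters are hypothetical.

```python
import math

def welsch_mm_location(samples, nu=1.0, iters=50):
    """Robust 1D location estimate via majorization-minimization (toy sketch).

    Each iteration replaces the Welsch loss by a quadratic surrogate whose
    per-sample weight is exp(-r^2 / (2 nu^2)), then minimizes that surrogate,
    which here is simply a weighted mean.
    """
    mu = sorted(samples)[len(samples) // 2]      # start from a median-like sample
    for _ in range(iters):
        weights = [math.exp(-((x - mu) ** 2) / (2.0 * nu * nu)) for x in samples]
        mu = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return mu

data = [0.9, 1.0, 1.1, 1.0, 50.0]                # one gross outlier
robust = welsch_mm_location(data)
plain_mean = sum(data) / len(data)
```

The outlier receives a weight that is effectively zero, so `robust` stays near the inlier cluster while `plain_mean` is dragged far away from it. The sketch assumes the starting point lies near the inliers; if every weight underflowed to zero, the division would fail.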

27. Identification of splicing edges in tampered image based on Dichromatic Reflection Model [PDF] Back to contents
Zhe Shen, Peng Sun, Yubo Lang, Lei Liu, Silong Peng
Abstract: Imaging is a sophisticated process combining many photovoltaic conversions, which lead to some spectral signatures beyond visual perception in the final images. Any manipulation of an original image will destroy these signatures and inevitably leave some traces in the final forgery. Therefore, we present a novel optic-physical method to discriminate splicing edges from natural edges in a tampered image. First, we transform the forensic image from RGB into the color spaces of S and o1o2. Then, under the assumption of the Dichromatic Reflection Model, edges in the image are discovered by a composite gradient and classified into different types based on their different photometric properties. Finally, splicing edges are retained, as opposed to natural ones, by a simple logical algorithm. Experimental results show the efficacy of the proposed method.
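Summary: Imaging involves many photovoltaic conversions that leave spectral signatures beyond visual perception in the final image, and any manipulation destroys these signatures and leaves traces in the forgery. The paper presents an optic-physical method to distinguish splicing edges from natural edges in a tampered image. The forensic image is first transformed from RGB into the S and o1o2 color spaces; then, under the Dichromatic Reflection Model, edges are detected by a composite gradient and classified by their photometric properties; finally, a simple logical algorithm retains the splicing edges. Experimental results show the method's efficacy.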

28. Learning to Scale Multilingual Representations for Vision-Language Tasks [PDF] Back to contents
Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer
Abstract: Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that represents many languages with few model parameters without sacrificing downstream task performance. SMALR learns a fixed size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for few. We use a novel masked cross-language modeling loss to align features with context from other languages. Additionally, we propose a cross-lingual consistency module that ensures predictions made for a query and its machine translation are comparable. The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% with less than 1/5th the training parameters compared to other word embedding methods.
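Summary: Current multilingual vision-language models either need many additional parameters for each supported language or lose performance as languages are added. The paper proposes the Scalable Multilingual Aligned Language Representation (SMALR), which learns a fixed-size language-agnostic representation for most words in a multilingual vocabulary and keeps language-specific features for only a few. A masked cross-language modeling loss aligns features with context from other languages, and a cross-lingual consistency module keeps the predictions for a query and its machine translation comparable. SMALR is demonstrated on ten diverse languages, over twice the number supported in vision-language tasks to date, and outperforms prior work on multilingual image-sentence retrieval by 3-4% with less than 1/5th the training parameters of other word embedding methods.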

29. Estimating Grape Yield on the Vine from Multiple Images [PDF] Back to contents
Daniel L. Silver, Jabun Nasa
Abstract: Estimating grape yield prior to harvest is important to commercial vineyard production as it informs many vineyard and winery decisions. Currently, the process of yield estimation is time-consuming and varies in its accuracy from 75-90% depending on the experience of the viticulturist. This paper proposes a multiple task learning (MTL) convolutional neural network (CNN) approach that uses images captured by inexpensive smartphones secured in a simple tripod arrangement. The CNN models use MTL transfer from autoencoders to achieve 85% accuracy from image data captured 6 days prior to harvest.
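Summary: Estimating grape yield before harvest informs many vineyard and winery decisions, but the current process is time-consuming and its accuracy varies from 75-90% depending on the viticulturist's experience. The paper proposes a multiple task learning (MTL) convolutional neural network (CNN) approach that uses images captured by inexpensive smartphones secured in a simple tripod arrangement. With MTL transfer from autoencoders, the CNN models reach 85% accuracy from image data captured 6 days before harvest.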

30. Deep Manifold Prior [PDF] Back to contents
Matheus Gadelha, Rui Wang, Subhransu Maji
Abstract: We present a prior for manifold structured data, such as surfaces of 3D shapes, where deep neural networks are adopted to reconstruct a target shape using gradient descent starting from a random initialization. We show that surfaces generated this way are smooth, with limiting behavior characterized by Gaussian processes, and we mathematically derive such properties for fully-connected as well as convolutional networks. We demonstrate our method in a variety of manifold reconstruction applications, such as point cloud denoising and interpolation, achieving considerably better results against competitive baselines while requiring no training data. We also show that when training data is available, our method allows developing alternate parametrizations of surfaces under the framework of AtlasNet, leading to a compact network architecture and better reconstruction results on standard image to shape reconstruction benchmarks.
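Summary: The paper presents a prior for manifold-structured data, such as surfaces of 3D shapes, in which deep neural networks reconstruct a target shape by gradient descent from a random initialization. Surfaces generated this way are shown to be smooth, with limiting behavior characterized by Gaussian processes, and these properties are derived mathematically for fully-connected and convolutional networks. The method is demonstrated on manifold reconstruction tasks such as point cloud denoising and interpolation, where it achieves considerably better results than competitive baselines without any training data. When training data is available, it enables alternate parametrizations of surfaces within the AtlasNet framework, giving a compact network architecture and better results on standard image-to-shape reconstruction benchmarks.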

31. Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking [PDF] Back to contents
Hongjun Wang, Guangrun Wang, Ya Li, Dongyu Zhang, Liang Lin
Abstract: The success of DNNs has driven the extensive applications of person re-identification (ReID) into a new era. However, whether ReID inherits the vulnerability of DNNs remains unexplored. Examining the robustness of ReID systems is important because their insecurity may cause severe losses, e.g., criminals may use adversarial perturbations to cheat CCTV systems. In this work, we examine the insecurity of current best-performing ReID models by proposing a learning-to-mis-rank formulation to perturb the ranking of the system output. As cross-dataset transferability is crucial in the ReID domain, we also perform a black-box attack by developing a novel multi-stage network architecture that pyramids the features of different levels to extract general and transferable features for the adversarial perturbations. Our method can control the number of malicious pixels by using differentiable multi-shot sampling. To guarantee the inconspicuousness of the attack, we also propose a new perception loss to achieve better visual quality. Extensive experiments on four of the largest ReID benchmarks (i.e., Market1501 [45], CUHK03 [18], DukeMTMC [33], and MSMT17 [40]) not only show the effectiveness of our method, but also provide directions for future improvement in the robustness of ReID systems. For example, the accuracy of one of the best-performing ReID systems drops sharply from 91.8% to 1.4% after being attacked by our method. Some attack results are shown in Fig. 1. The code is available at this https URL.
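Summary: Whether person re-identification (ReID) inherits the vulnerability of DNNs has been unexplored, even though insecure ReID systems could cause severe losses, for example if criminals use adversarial perturbations to cheat CCTV systems. The paper examines the insecurity of the best-performing ReID models with a learning-to-mis-rank formulation that perturbs the ranking of the system output. For cross-dataset transferability, it performs a black-box attack with a multi-stage network architecture that pyramids features from different levels. Differentiable multi-shot sampling controls the number of malicious pixels, and a perception loss keeps the attack inconspicuous. Experiments on four of the largest ReID benchmarks (Market1501, CUHK03, DukeMTMC, and MSMT17) show the method's effectiveness: the accuracy of one of the best-performing ReID systems drops from 91.8% to 1.4% after the attack. The code is available at this https URL.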

32. The GeoLifeCLEF 2020 Dataset [PDF] Back to contents
Elijah Cole, Benjamin Deneu, Titouan Lorieul, Maximilien Servajean, Christophe Botella, Dan Morris, Nebojsa Jojic, Pierre Bonnet, Alexis Joly
Abstract: Understanding the geographic distribution of species is a key concern in conservation. By pairing species occurrences with environmental features, researchers can model the relationship between an environment and the species which may be found there. To facilitate research in this area, we present the GeoLifeCLEF 2020 dataset, which consists of 1.9 million species observations paired with high-resolution remote sensing imagery, land cover data, and altitude, in addition to traditional low-resolution climate and soil variables. We also discuss the GeoLifeCLEF 2020 competition, which aims to use this dataset to advance the state-of-the-art in location-based species recommendation.
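Summary: Understanding the geographic distribution of species is a key concern in conservation, and pairing species occurrences with environmental features lets researchers model the relationship between an environment and the species that may be found there. The GeoLifeCLEF 2020 dataset consists of 1.9 million species observations paired with high-resolution remote sensing imagery, land cover data, and altitude, in addition to traditional low-resolution climate and soil variables. The paper also discusses the GeoLifeCLEF 2020 competition, which uses this dataset to advance the state of the art in location-based species recommendation.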

33. Leveraging 2D Data to Learn Textured 3D Mesh Generation [PDF] Back to contents
Paul Henderson, Vagia Tsiminaki, Christoph H. Lampert
Abstract: Numerous methods have been proposed for probabilistic generative modelling of 3D objects. However, none of these is able to produce textured objects, which renders them of limited use for practical tasks. In this work, we present the first generative model of textured 3D meshes. Training such a model would traditionally require a large dataset of textured meshes, but unfortunately, existing datasets of meshes lack detailed textures. We instead propose a new training methodology that allows learning from collections of 2D images without any 3D information. To do so, we train our model to explain a distribution of images by modelling each image as a 3D foreground object placed in front of a 2D background. Thus, it learns to generate meshes that when rendered, produce images similar to those in its training set. A well-known problem when generating meshes with deep networks is the emergence of self-intersections, which are problematic for many use-cases. As a second contribution we therefore introduce a new generation process for 3D meshes that guarantees no self-intersections arise, based on the physical intuition that faces should push one another out of the way as they move. We conduct extensive experiments on our approach, reporting quantitative and qualitative results on both synthetic data and natural images. These show our method successfully learns to generate plausible and diverse textured 3D samples for five challenging object classes.
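Summary: Existing probabilistic generative models of 3D objects cannot produce textured objects, which limits their practical use. The paper presents the first generative model of textured 3D meshes. Because existing mesh datasets lack detailed textures, it proposes a training methodology that learns from collections of 2D images without any 3D information: each image is modeled as a 3D foreground object placed in front of a 2D background, so the model learns to generate meshes whose renderings resemble the training images. A second contribution is a mesh generation process that guarantees no self-intersections arise, based on the physical intuition that faces should push one another out of the way as they move. Experiments on synthetic data and natural images show plausible and diverse textured 3D samples for five challenging object classes.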

34. Learning to Drive Off Road on Smooth Terrain in Unstructured Environments Using an On-Board Camera and Sparse Aerial Images [PDF] Back to contents
Travis Manderson, Stefan Wapnick, David Meger, Gregory Dudek
Abstract: We present a method for learning to drive on smooth terrain while simultaneously avoiding collisions in challenging off-road and unstructured outdoor environments using only visual inputs. Our approach applies a hybrid model-based and model-free reinforcement learning method that is entirely self-supervised in labeling terrain roughness and collisions using on-board sensors. Notably, we provide both first-person and overhead aerial image inputs to our model. We find that the fusion of these complementary inputs improves planning foresight and makes the model robust to visual obstructions. Our results show the ability to generalize to environments with plentiful vegetation, various types of rock, and sandy trails. During evaluation, our policy attained 90% smooth terrain traversal and reduced the proportion of rough terrain driven over by 6.1 times compared to a model using only first-person imagery.
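Summary: The paper presents a method for learning to drive on smooth terrain while avoiding collisions in challenging off-road, unstructured outdoor environments using only visual inputs. It applies a hybrid model-based and model-free reinforcement learning method that is entirely self-supervised, labeling terrain roughness and collisions with on-board sensors, and it feeds the model both first-person and overhead aerial images. Fusing these complementary inputs improves planning foresight and makes the model robust to visual obstructions, and the results generalize to environments with plentiful vegetation, various types of rock, and sandy trails. In evaluation, the policy attained 90% smooth terrain traversal and reduced the proportion of rough terrain driven over by 6.1 times compared with a model using only first-person imagery.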

35. Rethinking the Trigger of Backdoor Attack [PDF] Back to contents
Yiming Li, Tongqing Zhai, Baoyuan Wu, Yong Jiang, Zhifeng Li, Shutao Xia
Abstract: In this work, we study the problem of backdoor attacks, which add a specific trigger (i.e., a local patch) onto some training images so that testing images with the same trigger are incorrectly predicted while natural testing examples are correctly predicted by the trained model. Many existing works adopt the setting in which the triggers across the training and testing images have the same appearance and are located in the same area. However, we observe that if the appearance or location of the trigger is slightly changed, then the attack performance may degrade sharply. According to this observation, we propose to spatially transform (e.g., flipping and scaling) the testing image, such that the appearance and location of the trigger (if it exists) will be changed. This simple strategy is experimentally verified to be effective in defending against many state-of-the-art backdoor attack methods. Furthermore, to enhance the robustness of backdoor attacks, we propose to conduct random spatial transformations on the training images with the trigger before feeding them into the training process. Extensive experiments verify that the proposed backdoor attack is robust to spatial transformations.
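Summary: Backdoor attacks add a specific trigger (i.e., a local patch) to some training images so that test images carrying the trigger are mispredicted while natural test examples are predicted correctly. Many existing works assume the trigger has the same appearance and location in training and testing, but the attack may degrade sharply if either is slightly changed. Building on this observation, the paper proposes spatially transforming the test image (e.g., flipping and scaling) to change the appearance and location of any trigger, a simple strategy shown experimentally to defend against many state-of-the-art backdoor attacks. To make attacks more robust, it also proposes applying random spatial transformations to the triggered training images before training, and experiments verify that the resulting attack is robust to spatial transformations.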
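
The effect of a spatial transformation on a fixed-location trigger can be seen on a toy image. This is a minimal sketch under stated assumptions: images are plain lists of pixel rows, the trigger is a 2x2 patch of ones in the bottom-right corner, and only horizontal flipping is shown. The helper names are hypothetical and no trained model is involved.

```python
def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def stamp_trigger(image, patch, top, left):
    """Return a copy of image with patch pasted at (top, left)."""
    out = [row[:] for row in image]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            out[top + r][left + c] = value
    return out

def crop(image, top, left, height, width):
    """Extract a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

clean = [[0] * 4 for _ in range(4)]
trigger = [[1, 1], [1, 1]]
poisoned = stamp_trigger(clean, trigger, top=2, left=2)  # bottom-right corner
defended = hflip(poisoned)                               # test-time transform

at_trained_location = crop(defended, 2, 2, 2, 2)  # where the backdoor expects it
at_new_location = crop(defended, 2, 0, 2, 2)      # where the patch ended up
```

After the flip, the window the backdoor was trained on contains only background pixels, and the patch sits in the bottom-left corner instead.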

36. Orthogonal Over-Parameterized Training [PDF] Back to contents
Weiyang Liu, Rongmei Lin, Zhen Liu, James M. Rehg, Li Xiong, Le Song
Abstract: The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is even more important than designing the architecture. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy which characterizes the diversity of neurons on a hypersphere. By constantly maintaining the minimum hyperspherical energy during training, OPT can greatly improve the network generalization. Specifically, OPT fixes the randomly initialized weights of the neurons and learns an orthogonal transformation that applies to these neurons. We propose multiple ways to learn such an orthogonal transformation, including unrolling orthogonalization algorithms, applying orthogonal parameterization, and designing an orthogonality-preserving gradient update. Interestingly, OPT reveals that learning a proper coordinate system for neurons is crucial to generalization and may be more important than learning a specific relative position of neurons. We further provide theoretical insights into why OPT yields better generalization. Extensive experiments validate the superiority of OPT.
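Summary: A neural network's inductive bias is largely determined by its architecture and training algorithm, and for good generalization effective training matters even more than architecture design. The paper proposes the orthogonal over-parameterized training (OPT) framework, which provably minimizes the hyperspherical energy that characterizes the diversity of neurons on a hypersphere. OPT fixes the randomly initialized neuron weights and learns an orthogonal transformation applied to them, via unrolled orthogonalization algorithms, orthogonal parameterization, or an orthogonality-preserving gradient update. OPT reveals that learning a proper coordinate system for neurons is crucial to generalization and may matter more than learning their specific relative positions. Theoretical insights and extensive experiments support OPT's better generalization.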
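
Why an orthogonal transformation leaves the energy of fixed neurons untouched can be checked numerically. The sketch below assumes one common form of hyperspherical energy, the sum of inverse pairwise distances between unit-normalized neurons; the abstract does not state the exact definition. A 2D rotation stands in for a learned orthogonal matrix, and all names are hypothetical.

```python
import math

def normalize(v):
    """Project a weight vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def hyperspherical_energy(neurons):
    """Sum of inverse pairwise distances between normalized neurons
    (an assumed form of the energy, used here for illustration)."""
    unit = [normalize(w) for w in neurons]
    energy = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            energy += 1.0 / math.dist(unit[i], unit[j])
    return energy

def rotate(v, theta):
    """Apply a 2D rotation, the simplest orthogonal transformation."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

neurons = [[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]   # fixed "initial" weights
rotated = [rotate(w, 0.7) for w in neurons]        # same orthogonal map for all

energy_before = hyperspherical_energy(neurons)
energy_after = hyperspherical_energy(rotated)
```

Orthogonal maps preserve norms and pairwise distances, so `energy_before` and `energy_after` agree up to floating-point error. Training only the transformation therefore cannot change the energy fixed at initialization.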

37. Fisher Discriminant Triplet and Contrastive Losses for Training Siamese Networks [PDF] Back to contents
Benyamin Ghojogh, Milad Sikaroudi, Sobhan Shafiei, H.R. Tizhoosh, Fakhri Karray, Mark Crowley
Abstract: The Siamese neural network is a very powerful architecture for both feature extraction and metric learning. It usually consists of several networks that share weights. The Siamese concept is topology-agnostic and can use any neural network as its backbone. The two most popular loss functions for training these networks are the triplet and contrastive loss functions. In this paper, we propose two novel loss functions, named Fisher Discriminant Triplet (FDT) and Fisher Discriminant Contrastive (FDC). The former uses anchor-neighbor-distant triplets while the latter utilizes pairs of anchor-neighbor and anchor-distant samples. The FDT and FDC loss functions are designed based on the statistical formulation of the Fisher Discriminant Analysis (FDA), which is a linear subspace learning method. Our experiments on MNIST and two challenging and publicly available histopathology datasets show the effectiveness of the proposed loss functions.
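Summary: The Siamese neural network is a powerful architecture for feature extraction and metric learning. It usually consists of several weight-sharing networks, is topology-agnostic, and can use any neural network as its backbone. Its two most popular training losses are the triplet and contrastive losses. The paper proposes two new losses: Fisher Discriminant Triplet (FDT), which uses anchor-neighbor-distant triplets, and Fisher Discriminant Contrastive (FDC), which uses pairs of anchor-neighbor and anchor-distant samples. Both are designed from the statistical formulation of Fisher Discriminant Analysis (FDA), a linear subspace learning method. Experiments on MNIST and two challenging, publicly available histopathology datasets show their effectiveness.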
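
For reference, the two baseline losses mentioned above can be written in a few lines. This sketch shows commonly used textbook forms of the triplet and contrastive losses on raw embedding vectors. It is not the proposed FDT or FDC loss, and the margin values and function names are assumptions chosen for illustration.

```python
import math

def triplet_loss(anchor, neighbor, distant, margin=1.0):
    """Hinge on distances: pull the neighbor closer than the distant sample."""
    d_pos = math.dist(anchor, neighbor)
    d_neg = math.dist(anchor, distant)
    return max(0.0, d_pos - d_neg + margin)

def contrastive_loss(a, b, same_class, margin=1.0):
    """Pull same-class pairs together; push other pairs beyond the margin."""
    d = math.dist(a, b)
    if same_class:
        return d * d
    return max(0.0, margin - d) ** 2

anchor, neighbor = [0.0, 0.0], [0.0, 1.0]
far_distant, near_distant = [3.0, 0.0], [0.5, 0.0]

easy = triplet_loss(anchor, neighbor, far_distant)     # margin already satisfied
hard = triplet_loss(anchor, neighbor, near_distant)    # distant sample too close
pair_pos = contrastive_loss(anchor, neighbor, same_class=True)
pair_neg = contrastive_loss(anchor, near_distant, same_class=False)
```

The triplet loss vanishes once the distant sample is at least `margin` farther from the anchor than the neighbor is, while the contrastive loss scores each pair independently.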

38. Test-Time Adaptable Neural Networks for Robust Medical Image Segmentation [PDF] Back to contents
Neerav Karani, Krishna Chaitanya, Ender Konukoglu
Abstract: Convolutional Neural Networks (CNNs) work very well for supervised learning problems when the training dataset is representative of the variations expected to be encountered at test time. In medical image segmentation, this premise is violated when there is a mismatch between training and test images in terms of their acquisition details, such as the scanner model or the protocol. Remarkable performance degradation of CNNs in this scenario is well documented in the literature. To address this problem, we design the segmentation CNN as a concatenation of two sub-networks: a relatively shallow image normalization CNN, followed by a deep CNN that segments the normalized image. We train both these sub-networks using a training dataset, consisting of annotated images from a particular scanner and protocol setting. Now, at test time, we adapt the image normalization sub-network for each test image, guided by an implicit prior on the predicted segmentation labels. We employ an independently trained denoising autoencoder (DAE) in order to model such an implicit prior on plausible anatomical segmentation labels. We validate the proposed idea on multi-center Magnetic Resonance imaging datasets of three anatomies: brain, heart and prostate. The proposed test-time adaptation consistently provides performance improvement, demonstrating the promise and generality of the approach. Being agnostic to the architecture of the deep CNN, the second sub-network, the proposed design can be utilized with any segmentation network to increase robustness to variations in imaging scanners and protocols.
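Summary: Convolutional Neural Networks (CNNs) work well for supervised learning when the training dataset represents the variation encountered at test time. In medical image segmentation this premise fails when acquisition details, such as the scanner model or the protocol, differ between training and test images, and the resulting performance degradation is well documented. The paper builds the segmentation CNN as two concatenated sub-networks: a relatively shallow image normalization CNN followed by a deep CNN that segments the normalized image, both trained on annotated images from one scanner and protocol setting. At test time, the normalization sub-network is adapted for each test image, guided by an implicit prior on the predicted segmentation labels that is modeled by an independently trained denoising autoencoder (DAE). Validated on multi-center magnetic resonance imaging datasets of the brain, heart, and prostate, the adaptation consistently improves performance. Because it is agnostic to the architecture of the deep CNN, the design can be used with any segmentation network to increase robustness to variations in scanners and protocols.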

39. CNN2Gate: Toward Designing a General Framework for Implementation of Convolutional Neural Networks on FPGA [PDF] Back to contents
Alireza Ghaffari, Yvon Savaria
Abstract: Convolutional Neural Networks (CNNs) have a major impact on our society because of the numerous services they provide. On the other hand, they require considerable computing power. To satisfy these requirements, it is possible to use graphics processing units (GPUs). However, high power consumption and limited external IOs constrain their usability and suitability in industrial and mission-critical scenarios. Recently, the number of studies that utilize FPGAs to implement CNNs has been increasing rapidly. This is due to the lower power consumption and easy reconfigurability offered by these platforms. Because of the research efforts put into topics such as architecture, synthesis and optimization, new challenges are arising in integrating such hardware solutions with high-level machine learning software libraries. This paper introduces an integrated framework (CNN2Gate) that supports compilation of a CNN model for an FPGA target. CNN2Gate exploits the OpenCL™ synthesis workflow for FPGAs offered by commercial vendors. CNN2Gate is capable of parsing CNN models from several popular high-level machine learning libraries such as Keras, PyTorch, and Caffe2. CNN2Gate extracts the computation flow of layers, in addition to weights and biases, and applies a "given" fixed-point quantization. Furthermore, it writes this information in the proper format for OpenCL synthesis tools that are then used to build and run the project on FPGA. CNN2Gate performs design-space exploration using a reinforcement learning agent and fits the design on different FPGAs with limited logic resources automatically. This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms. CNN2Gate achieves a latency of 205 ms for VGG-16 and 18 ms for AlexNet on the FPGA.
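Summary: CNNs demand considerable computing power. GPUs can supply it, but their high power consumption and limited external IOs constrain their use in industrial and mission-critical scenarios, so research on FPGA implementations, which offer lower power consumption and easy reconfigurability, is growing rapidly. The paper introduces CNN2Gate, an integrated framework that compiles a CNN model for an FPGA target using the OpenCL synthesis workflow offered by commercial vendors. CNN2Gate parses models from popular libraries such as Keras, PyTorch, and Caffe2, extracts the computation flow of the layers together with weights and biases, applies a "given" fixed-point quantization, and writes this information in the proper format for the OpenCL synthesis tools that build and run the project on the FPGA. A reinforcement learning agent performs design-space exploration and automatically fits the design to FPGAs with limited logic resources. On Intel FPGA platforms, CNN2Gate achieves a latency of 205 ms for VGG-16 and 18 ms for AlexNet.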

40. Cortical surface registration using unsupervised learning [PDF] Back to contents
Jieyu Cheng, Adrian Dalca, Bruce Fischl, Lilla Zollei
Abstract: Non-rigid cortical registration is an important and challenging task due to the geometric complexity of the human cortex and the high degree of inter-subject variability. A conventional solution is to use a spherical representation of surface properties and perform registration by aligning cortical folding patterns in that space. This strategy produces accurate spatial alignment but often incurs a high computational cost. Recently, convolutional neural networks (CNNs) have demonstrated the potential to dramatically speed up volumetric registration. However, due to distortions introduced by projecting a sphere to a 2D plane, a direct application of recent learning-based methods to surfaces yields poor results. In this study, we present SphereMorph, a diffeomorphic registration framework for cortical surfaces using deep networks that addresses these issues. SphereMorph uses a UNet-style network associated with a spherical kernel to learn the displacement field and warps the sphere using a modified spatial transformer layer. We propose a resampling weight in computing the data fitting loss to account for distortions introduced by polar projection, and demonstrate the performance of our proposed method on two tasks, including cortical parcellation and group-wise functional area alignment. The experiments show that the proposed SphereMorph is capable of modeling the geometric registration problem in a CNN framework and demonstrates superior registration accuracy and computational efficiency.
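Summary: Non-rigid cortical registration is difficult because of the geometric complexity of the human cortex and the high degree of inter-subject variability. The conventional approach aligns cortical folding patterns in a spherical representation of surface properties, which is accurate but computationally expensive. Directly applying recent learning-based volumetric methods to surfaces gives poor results because projecting a sphere onto a 2D plane introduces distortions. The paper presents SphereMorph, a diffeomorphic registration framework for cortical surfaces that uses a UNet-style network with a spherical kernel to learn the displacement field and a modified spatial transformer layer to warp the sphere. A resampling weight in the data-fitting loss accounts for distortions from polar projection. On cortical parcellation and group-wise functional area alignment, SphereMorph shows superior registration accuracy and computational efficiency.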

41. DeepCOVIDExplainer: Explainable COVID-19 Predictions Based on Chest X-ray Images
Md. Rezaul Karim, Till Döhmen, Dietrich Rebholz-Schuhmann, Stefan Decker, Michael Cochez, Oya Beyan
Abstract: Amid the coronavirus disease (COVID-19) pandemic, humanity is experiencing a rapid increase in infection numbers across the world. A challenge hospitals face in the fight against the virus is the effective screening of incoming patients. One methodology is the assessment of chest radiography (CXR) images, which usually requires expert radiologists' knowledge. In this paper, we propose an explainable deep neural network (DNN)-based method for automatic detection of COVID-19 symptoms from CXR images, which we call 'DeepCOVIDExplainer'. We used 16,995 CXR images across 13,808 patients, covering normal, pneumonia, and COVID-19 cases. CXR images are first comprehensively preprocessed, then augmented and classified with a neural ensemble method, followed by highlighting class-discriminating regions using gradient-guided class activation maps (Grad-CAM++) and layer-wise relevance propagation (LRP). Further, we provide human-interpretable explanations of the predictions. Evaluation results based on hold-out data show that our approach can identify COVID-19 confidently with a positive predictive value (PPV) of 89.61% and recall of 83%, improving over recent comparable approaches. We hope that our findings will be a useful contribution to the fight against COVID-19 and, more generally, towards an increasing acceptance and adoption of AI-assisted applications in clinical practice.

42. ARCH: Animatable Reconstruction of Clothed Humans
Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, Tony Tung
Abstract: In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image. Existing approaches to digitize 3D humans struggle to handle pose variations and recover details. Also, they do not produce models that are animation ready. In contrast, ARCH is a learned pose-aware model that produces detailed 3D rigged full-body human avatars from a single unconstrained RGB image. A Semantic Space and a Semantic Deformation Field are created using a parametric 3D body estimator. They allow the transformation of 2D/3D clothed humans into a canonical space, reducing ambiguities in geometry caused by pose variations and occlusions in training data. Detailed surface geometry and appearance are learned using an implicit function representation with spatial local features. Furthermore, we propose additional per-pixel supervision on the 3D reconstruction using opacity-aware differentiable rendering. Our experiments indicate that ARCH increases the fidelity of the reconstructed humans. We obtain more than 50% lower reconstruction errors for standard metrics compared to state-of-the-art methods on public datasets. We also show numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.

43. Recognizing Spatial Configurations of Objects with Graph Neural Networks
Laetitia Teodorescu, Katja Hofmann, Pierre-Yves Oudeyer
Abstract: Deep learning algorithms can be seen as compositions of functions acting on learned representations encoded as tensor-structured data. However, in most applications those representations are monolithic, with, for instance, a single vector encoding an entire image or sentence. In this paper, we build upon the recent successes of Graph Neural Networks (GNNs) to explore the use of graph-structured representations for learning spatial configurations. Motivated by the ability of humans to distinguish arrangements of shapes, we introduce two novel geometrical reasoning tasks, for which we provide the datasets. We introduce novel GNN layers and architectures to solve the tasks and show that graph-structured representations are necessary for good performance.

44. Learnable Subspace Clustering
Jun Li, Hongfu Liu, Zhiqiang Tao, Handong Zhao, Yun Fu
Abstract: This paper studies the large-scale subspace clustering (LSSC) problem with millions of data points. Many popular subspace clustering methods cannot directly handle the LSSC problem, although they are considered state-of-the-art methods for small-scale data. A basic reason is that these methods often use all data points as one big dictionary to build huge coding models, which results in high time and space complexity. In this paper, we develop a learnable subspace clustering paradigm to efficiently solve the LSSC problem. The key idea is to learn a parametric function that partitions the high-dimensional subspaces into their underlying low-dimensional subspaces, avoiding the expensive classical coding models. Moreover, we propose a unified robust predictive coding machine (RPCM) to learn the parametric function, which can be solved by an alternating minimization algorithm. In addition, we provide a bounded contraction analysis of the parametric function. To the best of our knowledge, this paper is the first work among subspace clustering methods to efficiently cluster millions of data points. Experiments on million-scale datasets verify that our paradigm outperforms the related state-of-the-art methods in both efficiency and effectiveness.

45. Adversarial Latent Autoencoders
Stanislav Pidhorskyi, Donald Adjeroh, Gianfranco Doretto
Abstract: Autoencoder networks are unsupervised approaches that aim to combine generative and representational properties by simultaneously learning an encoder-generator map. Although studied extensively, the issues of whether they have the same generative power as GANs, or learn disentangled representations, have not been fully addressed. We introduce an autoencoder that tackles these issues jointly, which we call Adversarial Latent Autoencoder (ALAE). It is a general architecture that can leverage recent improvements in GAN training procedures. We designed two autoencoders: one based on an MLP encoder, and another based on a StyleGAN generator, which we call StyleALAE. We verify the disentanglement properties of both architectures. We show that StyleALAE can not only generate 1024x1024 face images with quality comparable to StyleGAN, but at the same resolution can also produce face reconstructions and manipulations based on real images. This makes ALAE the first autoencoder able to compare with, and go beyond, the capabilities of a generator-only type of architecture.

46. TensorProjection Layer: A Tensor-Based Dimensionality Reduction Method in CNN
Toshinari Morimoto, Su-Yun Huang
Abstract: In this paper, we propose a dimensionality reduction method applied to tensor-structured data as a hidden layer (we call it TensorProjection Layer) in a convolutional neural network. Our proposed method transforms input tensors into ones with a smaller dimension by projection. The directions of projection are viewed as training parameters associated with our proposed layer and trained via a supervised learning criterion such as minimization of the cross-entropy loss function. We discuss the gradients of the loss function with respect to the parameters associated with our proposed layer. We also implement simple numerical experiments to evaluate the performance of the TensorProjection Layer.
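As a rough illustration of projecting a tensor onto fewer dimensions, here is a minimal sketch for the order-2 (matrix) case, Y = U1^T X U2. This is not the authors' layer: the helper names and the fixed projection matrices `u1` and `u2` are hypothetical, and in the layer described above the projection directions would be trained rather than hand-picked.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose a list of rows."""
    return [list(col) for col in zip(*a)]

def project(x, u1, u2):
    """Y = U1^T X U2: shrinks a p1 x p2 input to q1 x q2."""
    return matmul(matmul(transpose(u1), x), u2)

x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]                        # 3 x 3 input
u1 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]    # 3 x 2, orthonormal columns
u2 = [[1.0], [0.0], [0.0]]                   # 3 x 1, orthonormal column

y = project(x, u1, u2)                       # 2 x 1 output
print(y)
```

For higher-order inputs the same idea applies one mode at a time, with one projection matrix per mode.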

47. Automatic detection of acute ischemic stroke using non-contrast computed tomography and two-stage deep learning model
Mizuho Nishio, Sho Koyasu, Shunjiro Noguchi, Takao Kiguchi, Kanako Nakatsu, Thai Akasaka, Hiroki Yamada, Kyo Itoh
Abstract: Background and Purpose: We aimed to develop and evaluate an automatic acute ischemic stroke (AIS) detection system involving a two-stage deep learning model. Methods: We included 238 cases from two different institutions. AIS-related findings were annotated on each of the 238 sets of head CT images by referring to head magnetic resonance imaging (MRI) images acquired within 24 h following the CT scan. These 238 annotated cases were divided into a training set of 189 cases and a test set of 49 cases. Subsequently, a two-stage deep learning detection model was constructed from the training set using the You Only Look Once v3 (YOLOv3) model and the Visual Geometry Group 16 (VGG16) classification model. Then, the two-stage model performed the AIS detection process on the test set. To assess the detection model's results, a board-certified radiologist also evaluated the test set head CT images with and without the aid of the detection model. The sensitivity of AIS detection and the number of false positives were calculated to evaluate the test set detection results. The sensitivity of the radiologist with and without the software detection results was compared using the McNemar test. A p-value of less than 0.05 was considered statistically significant. Results: For the two-stage model and the radiologist without and with the software results, the sensitivity was 37.3%, 33.3%, and 41.3%, respectively, and the number of false positives per case was 1.265, 0.327, and 0.388, respectively. Using the two-stage detection model's results, the board-certified radiologist's detection sensitivity significantly improved (p-value = 0.0313). Conclusions: Our detection system involving the two-stage deep learning model significantly improved the radiologist's sensitivity in AIS detection.
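The two statistics named in the Methods, sensitivity and the McNemar test, can be sketched as follows. The counts used here are made up for illustration and are not taken from the paper, and the exact (binomial) form of the test is an assumption, since the abstract does not say which variant was used.

```python
from math import comb

def sensitivity(true_positives, false_negatives):
    """Fraction of true findings that were detected."""
    return true_positives / (true_positives + false_negatives)

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: findings detected only without the aid (hypothetical input)
    c: findings detected only with the aid (hypothetical input)
    Under the null hypothesis, each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Illustrative numbers only: 25 of 75 findings detected, and 6 discordant
# pairs that all favor the aided reading.
sens = sensitivity(25, 50)
p_value = mcnemar_exact(0, 6)
print(round(sens, 3), p_value)
```

Only the discordant pairs enter the test, which is why a handful of additional detections can reach significance when none are lost.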

48. Score-Guided Generative Adversarial Networks
Minhyeok Lee, Junhee Seok
Abstract: We propose a Generative Adversarial Network (GAN) that introduces an evaluator module using pre-trained networks. The proposed model, called score-guided GAN (ScoreGAN), is trained with an evaluation metric for GANs, i.e., the Inception score, as a rough guide for the training of the generator. By using another pre-trained network instead of the Inception network, ScoreGAN circumvents overfitting to the Inception network, so that generated samples do not correspond to adversarial examples of the Inception network. Also, to prevent overfitting, the evaluation metrics are employed only in an auxiliary role, while the conventional target of GANs is mainly used. Evaluated with the CIFAR-10 dataset, ScoreGAN demonstrated an Inception score of 10.36$\pm$0.15, which corresponds to state-of-the-art performance. Furthermore, to generalize the effectiveness of ScoreGAN, the model was further evaluated with another dataset, CIFAR-100; as a result, ScoreGAN outperformed the other existing methods, with a Fréchet Inception Distance (FID) of 13.98 when trained on the CIFAR-100 dataset.
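For readers unfamiliar with the metric, the Inception score is the exponential of the mean KL divergence between each sample's class distribution p(y|x) and the marginal p(y). The sketch below computes it from hand-written probability vectors; in practice these would come from a pre-trained classifier, and the function name `inception_score` and the toy inputs are illustrative assumptions.

```python
import math

def inception_score(probs):
    """exp of the mean KL(p(y|x) || p(y)) over samples.

    probs: list of per-sample class-probability lists, each summing to 1.
    """
    n = len(probs)
    n_classes = len(probs[0])
    marginal = [sum(p[j] for p in probs) / n for j in range(n_classes)]
    total_kl = 0.0
    for p in probs:
        # Terms with p_j == 0 contribute nothing to the KL divergence.
        total_kl += sum(pj * math.log(pj / marginal[j])
                        for j, pj in enumerate(p) if pj > 0)
    return math.exp(total_kl / n)

# Confident and diverse predictions: the score approaches the class count.
confident_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Uninformative predictions: the score is 1, the minimum.
uninformative = [[1 / 3, 1 / 3, 1 / 3]] * 3

print(inception_score(confident_diverse), inception_score(uninformative))
```

The score rewards samples that are classified confidently while covering many classes overall, which is why it can serve as a rough training guide.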

49. Feedback Recurrent Autoencoder for Video Compression
Adam Golinski, Reza Pourreza, Yang Yang, Guillaume Sautiere, Taco S Cohen
Abstract: Recent advances in deep generative modeling have enabled efficient modeling of high-dimensional data distributions and opened up a new horizon for solving data compression problems. Specifically, autoencoder-based learned image or video compression solutions are emerging as strong competitors to traditional approaches. In this work, we propose a new network architecture, based on common and well-studied components, for learned video compression operating in low-latency mode. Our method yields state-of-the-art MS-SSIM/rate performance on the high-resolution UVG dataset, among both learned video compression approaches and classical video compression methods (H.265 and H.264), in the rate range of interest for streaming applications. Additionally, we provide an analysis of existing approaches through the lens of their underlying probabilistic graphical models. Finally, we point out issues with temporal consistency and color shift observed in empirical evaluation, and suggest directions forward to alleviate those.

50. TOG: Targeted Adversarial Objectness Gradient Attacks on Real-time Object Detection Systems
Ka-Ho Chow, Ling Liu, Mehmet Emre Gursoy, Stacey Truex, Wenqi Wei, Yanzhao Wu
Abstract: The rapid growth of real-time capture of huge data volumes has pushed deep learning and data analytic computing to edge systems. Real-time object recognition on the edge is one of the representative deep neural network (DNN) powered edge systems for real-world mission-critical applications, such as autonomous driving and augmented reality. While DNN-powered object detection edge systems offer many life-enriching opportunities, they also open doors for misuse and abuse. This paper presents three Targeted adversarial Objectness Gradient attacks, termed TOG, which can cause state-of-the-art deep object detection networks to suffer from object-vanishing, object-fabrication, and object-mislabeling attacks. We also present a universal objectness gradient attack that uses adversarial transferability for black-box attacks; it is effective on any input with negligible attack time cost and low human perceptibility, and is particularly detrimental to object detection edge systems. We report our experimental measurements using two benchmark datasets (PASCAL VOC and MS COCO) on two state-of-the-art detection algorithms (YOLO and SSD). The results demonstrate serious adversarial vulnerabilities and the compelling need for developing robust object detection systems.

51. Physics-enhanced machine learning for virtual fluorescence microscopy
Colin L. Cooke, Fanjie Kong, Amey Chaware, Kevin C. Zhou, Kanghyun Kim, Rong Xu, D. Michael Ando, Samuel J. Yang, Pavan Chandra Konda, Roarke Horstmeyer
Abstract: This paper introduces a supervised deep-learning network that jointly optimizes the physical setup of an optical microscope to infer fluorescence image information. Specifically, we design a bright-field microscope's illumination module to maximize the performance for inference of fluorescent cellular features from bright-field imagery. We take advantage of the wide degree of flexibility available in illuminating a sample to optimize for programmable patterns of light from a customized LED array, which produce better task-specific performance than standard lighting techniques. We achieve illumination pattern optimization by including a physical model of image formation within the initial layers of a deep convolutional network. Our optimized illumination patterns result in up to a 45% performance improvement as compared to standard imaging methods, and we additionally explore how the optimized patterns vary as a function of inference task. This work demonstrates the importance of optimizing the process of image capture via programmable optical elements to improve automated analysis, and offers new physical insights into expected performance gains of recent fluorescence image inference work.

52. GeneCAI: Genetic Evolution for Acquiring Compact AI
Mojan Javaheripi, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar
Abstract: In the contemporary big data realm, Deep Neural Networks (DNNs) are evolving towards more complex architectures to achieve higher inference accuracy. Model compression techniques can be leveraged to efficiently deploy such compute-intensive architectures on resource-limited mobile devices. Such methods comprise various hyper-parameters that require per-layer customization to ensure high accuracy. Choosing such hyper-parameters is cumbersome, as the pertinent search space grows exponentially with the number of model layers. This paper introduces GeneCAI, a novel optimization method that automatically learns how to tune per-layer compression hyper-parameters. We devise a bijective translation scheme that encodes compressed DNNs to the genotype space. The optimality of each genotype is measured using a multi-objective score based on accuracy and number of floating point operations. We develop customized genetic operations to iteratively evolve the non-dominated solutions towards the optimal Pareto front, thus capturing the optimal trade-off between model accuracy and complexity. The GeneCAI optimization method is highly scalable and can achieve a near-linear performance boost on distributed multi-GPU platforms. Our extensive evaluations demonstrate that GeneCAI outperforms existing rule-based and reinforcement learning methods in DNN compression by finding models that lie on a better accuracy-complexity Pareto curve.
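The notion of non-dominated solutions on an accuracy-complexity Pareto front can be made concrete with a short sketch. This is generic Pareto filtering, not GeneCAI's genetic operations; the candidate tuples (accuracy, FLOPs in arbitrary units) are invented for illustration.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives
    (higher accuracy, fewer FLOPs) and strictly better on at least one."""
    acc_a, flops_a = a
    acc_b, flops_b = b
    no_worse = acc_a >= acc_b and flops_a <= flops_b
    strictly_better = acc_a > acc_b or flops_a < flops_b
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical compressed models as (accuracy, FLOPs) pairs.
candidates = [(0.92, 100), (0.90, 60), (0.89, 80), (0.85, 40), (0.80, 50)]
front = pareto_front(candidates)
print(front)
```

Here (0.89, 80) is dropped because (0.90, 60) is both more accurate and cheaper, and (0.80, 50) is dropped for the same reason relative to (0.85, 40).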

53. Variable Rate Video Compression using a Hybrid Recurrent Convolutional Learning Framework
Aishwarya Jadhav
Abstract: In recent years, neural network-based image compression techniques have been able to outperform traditional codecs and have opened the gates for the development of learning-based video codecs. However, to take advantage of the high temporal correlation in videos, more sophisticated architectures need to be employed. This paper presents PredEncoder, a hybrid video compression framework based on the concept of predictive auto-encoding. It models the temporal correlations between consecutive video frames using a prediction network, which is then combined with a progressive encoder network to exploit the spatial redundancies. The paper proposes a variable-rate block encoding scheme that leads to remarkably high quality-to-bit-rate ratios. By joint training and fine-tuning of this hybrid architecture, PredEncoder gains significant improvement over the MPEG-4 codec, achieves bit-rate savings over the H.264 codec in the low to medium bit-rate range for HD videos, and gives comparable results over most bit-rates for non-HD videos. This paper serves to demonstrate how neural architectures can be leveraged to perform on par with the highly optimized traditional methodologies in the video compression domain.

54. A single image deep learning approach to restoration of corrupted remote sensing products
Anna Petrovskaia, Raghavendra B. Jana, Ivan V. Oseledets
Abstract: Remote sensing images are used for a variety of analyses, including agricultural monitoring, disaster relief, and resource planning. The images can be corrupted for a number of reasons, including instrument errors and natural obstacles such as clouds. We present here a novel approach for reconstruction of missing information in such cases using only the corrupted image as the input. The Deep Image Prior methodology eliminates the need for a pre-trained network or an image database. It is shown that the approach easily beats the performance of traditional single-image methods.

55. Inpainting via Generative Adversarial Networks for CMB data analysis
Alireza Vafaei Sadr, Farida Farsian
Abstract: In this work, we propose a new method to inpaint the CMB signal in regions masked out following a point source extraction process. We adopt a modified Generative Adversarial Network (GAN) and compare different combinations of internal (hyper-)parameters and training strategies. We use a suitable $\mathcal{C}_r$ variable to estimate the performance regarding the CMB power spectrum recovery. We consider a test set where one point source is masked out in each sky patch with a 1.83 $\times$ 1.83 square degree extent, which, in our gridding, corresponds to 64 $\times$ 64 pixels. The GAN is optimized for estimating performance on Planck 2018 total intensity simulations. The training makes the GAN effective in reconstructing a masked region of about 1500 pixels with $1\%$ error down to angular scales corresponding to about 5 arcminutes.
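The gridding figures quoted above imply a pixel scale that is easy to check. The short calculation below assumes each patch side spans 1.83 degrees across 64 pixels and, for the last line, that the masked region of about 1500 pixels lies within a single patch; both readings are assumptions drawn from the abstract's wording.

```python
patch_side_deg = 1.83      # patch side length in degrees (from the abstract)
patch_side_pixels = 64     # pixels per side (from the abstract)
masked_pixels = 1500       # approximate mask size (from the abstract)

# Angular size of one pixel, in arcminutes (60 arcminutes per degree).
arcmin_per_pixel = patch_side_deg * 60 / patch_side_pixels

# How many pixels the quoted 5-arcminute scale spans.
pixels_per_5_arcmin = 5 / arcmin_per_pixel

# Share of a 64 x 64 patch covered by the mask, under the assumption above.
masked_fraction = masked_pixels / patch_side_pixels ** 2

print(round(arcmin_per_pixel, 3),
      round(pixels_per_5_arcmin, 2),
      round(masked_fraction, 3))
```

Under these assumptions a pixel is about 1.7 arcminutes wide, so the 5-arcminute scale corresponds to roughly three pixels.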



[arXiv papers] Computer Vision and Pattern Recognition 2020-04-10
