Abstracts

1. LOOC: Localize Overlapping Objects with Count Supervision
Issam H. Laradji, Rafael Pardinas, Pau Rodriguez, David Vazquez
Abstract: Acquiring count annotations generally requires less human effort than point-level and bounding box annotations. Thus, we propose the novel problem setup of localizing objects in dense scenes under this weaker supervision. We propose LOOC, a method to Localize Overlapping Objects with Count supervision. We train LOOC by alternating between two stages. In the first stage, LOOC learns to generate pseudo point-level annotations in a semi-supervised manner. In the second stage, LOOC uses a fully-supervised localization method that trains on these pseudo labels. The localization method is used to progressively improve the quality of the pseudo labels. We conducted experiments on popular counting datasets. For localization, LOOC achieves a strong new baseline in the novel problem setup where only count supervision is available. For counting, LOOC outperforms current state-of-the-art methods that only use count as their supervision. Code is available at: this https URL.

2. Image-based Vehicle Re-identification Model with Adaptive Attention Modules and Metadata Re-ranking
Quang Truong, Hy Dang, Zhankai Ye, Minh Nguyen, Bo Mei
Abstract: Vehicle re-identification is a challenging task due to intra-class variability and inter-class similarity across non-overlapping cameras. To tackle these problems, recently proposed methods require additional annotation to extract more features for false positive image exclusion. In this paper, we propose a model powered by adaptive attention modules that requires fewer label annotations but still outperforms the previous models. We also include a re-ranking method that takes into account the importance of metadata feature embeddings. The proposed method is evaluated on the CVPR AI City Challenge 2020 dataset and achieves a mAP of 37.25% in Track 2.

3. Deep learning for scene recognition from visual data: a survey
Alina Matei, Andreea Glavan, Estefania Talavera
Abstract: The use of deep learning techniques has exploded during the last few years, resulting in a direct contribution to the field of artificial intelligence. This work aims to be a review of the state-of-the-art in scene recognition with deep learning models from visual data. Scene recognition is still an emerging field in computer vision, which has been addressed from a single image and dynamic image perspective. We first give an overview of available datasets for image and video scene recognition. Later, we describe ensemble techniques introduced by research papers in the field. Finally, we give some remarks on our findings and discuss what we consider challenges in the field and future lines of research. This paper aims to be a future guide for model selection for the task of scene recognition.

4. Evaluating Uncertainty Estimation Methods on 3D Semantic Segmentation of Point Clouds
Swaroop Bhandary K, Nico Hochgeschwender, Paul Plöger, Frank Kirchner, Matias Valdenegro-Toro
Abstract: Deep learning models are extensively used in various safety-critical applications. Hence these models, along with being accurate, need to be highly reliable. One way of achieving this is by quantifying uncertainty. Bayesian methods for uncertainty quantification (UQ) have been extensively studied for deep learning models applied to images but have been less explored for 3D modalities such as point clouds, which are often used for robots and autonomous systems. In this work, we evaluate three uncertainty quantification methods, namely Deep Ensembles, MC-Dropout and MC-DropConnect, on the DarkNet21Seg 3D semantic segmentation model and comprehensively analyze the impact of various parameters, such as the number of models in ensembles or forward passes and the drop probability values, on task performance and uncertainty estimate quality. We find that Deep Ensembles outperforms other methods in both performance and uncertainty metrics. Deep Ensembles outperforms other methods by a margin of 2.4% in terms of mIOU and 1.3% in terms of accuracy, while providing reliable uncertainty for decision making.
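As a rough illustration of how Deep Ensembles and MC-Dropout are commonly turned into an uncertainty score, the sketch below averages per-class probabilities over several forward passes (or ensemble members) and takes the entropy of the average. This is a generic recipe, not the paper's evaluation code; the function names and the probability values are hypothetical.

```python
import math

def mean_prediction(sample_probs):
    """Average per-class probabilities over T stochastic passes or ensemble members."""
    n_samples = len(sample_probs)
    n_classes = len(sample_probs[0])
    return [sum(p[c] for p in sample_probs) / n_samples for c in range(n_classes)]

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p + eps) for p in probs)

# Hypothetical softmax outputs for a single 3D point from three forward passes.
agreeing = [[0.90, 0.05, 0.05], [0.88, 0.07, 0.05], [0.92, 0.04, 0.04]]
disagreeing = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.20, 0.20, 0.60]]

low = predictive_entropy(mean_prediction(agreeing))
high = predictive_entropy(mean_prediction(disagreeing))
print(low < high)  # passes that disagree give the higher entropy
```

More passes or ensemble members give a smoother average at a higher inference cost, which is one of the trade-offs the abstract says the study analyzes.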

5. Visual Question Answering as a Multi-Task Problem
Amelia Elizabeth Pollard, Jonathan L. Shapiro
Abstract: Visual Question Answering (VQA) is a highly complex problem set, relying on many sub-problems to produce reasonable answers. In this paper, we present the hypothesis that Visual Question Answering should be viewed as a multi-task problem, and provide evidence to support this hypothesis. We demonstrate this by reformatting two commonly used Visual Question Answering datasets, COCO-QA and DAQUAR, into a multi-task format and training two baseline networks on these reformatted datasets, with one network designed specifically to eliminate other possible causes for performance changes as a result of the reformatting. Though the networks demonstrated in this paper do not achieve strongly competitive results, we find that the multi-task approach to Visual Question Answering results in increases in performance of 5-9% against the single-task formatting, and that the networks reach convergence much faster than in the single-task case. Finally, we discuss possible reasons for the observed difference in performance, and perform additional experiments which rule out causes not associated with the learning of the dataset as a multi-task problem.

6. Learning Expectation of Label Distribution for Facial Age and Attractiveness Estimation
Bin-Bin Gao, Xin-Xin Liu, Hong-Yu Zhou, Jianxin Wu, Xin Geng
Abstract: Facial attribute (e.g., age and attractiveness) estimation performance has been greatly improved by using convolutional neural networks. However, existing methods have an inconsistency between the training objectives and the evaluation metric, so they may be suboptimal. In addition, these methods always adopt image classification or face recognition models with a large number of parameters, which carry expensive computation cost and storage overhead. In this paper, we first analyze the essential relationship between two state-of-the-art methods (Ranking-CNN and DLDL) and show that the Ranking method is in fact learning label distribution implicitly. This result thus firstly unifies two existing popular state-of-the-art methods into the DLDL framework. Second, in order to alleviate the inconsistency and reduce resource consumption, we design a lightweight network architecture and propose a unified framework which can jointly learn facial attribute distribution and regress attribute value. The effectiveness of our approach has been demonstrated on both facial age and attractiveness estimation tasks. Our method achieves new state-of-the-art results using a single model with 36× (6×) fewer parameters and 2.6× (2.1×) faster inference speed on facial age (attractiveness) estimation. Moreover, our method can achieve results comparable to the state of the art even though the number of parameters is further reduced to 0.9M (3.8MB disk storage).
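The link between a label distribution and a regressed attribute value can be sketched as taking the expectation of the predicted distribution over the label set, as the paper's title suggests. The example below is a minimal illustration under that assumption; the logits are synthetic and the function names are hypothetical, not taken from the authors' code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_label(logits, labels):
    """Expectation of the label under the predicted label distribution."""
    probs = softmax(logits)
    return sum(p * y for p, y in zip(probs, labels))

ages = list(range(0, 101))
# Synthetic logits peaked around age 30 (illustrative only).
logits = [-((a - 30) ** 2) / 18.0 for a in ages]
estimate = expected_label(logits, ages)
print(round(estimate, 2))
```

Because the expectation is differentiable in the logits, a regression loss on `estimate` can in principle be trained jointly with a distribution-matching loss, which is the kind of joint objective the abstract describes.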

7. End-to-end Interpretable Learning of Non-blind Image Deblurring
Thomas Eboli, Jian Sun, Jean Ponce
Abstract: Non-blind image deblurring is typically formulated as a linear least-squares problem regularized by natural priors on the corresponding sharp picture's gradients, which can be solved, for example, using a half-quadratic splitting method with Richardson fixed-point iterations for its least-squares updates and a proximal operator for the auxiliary variable updates. We propose to precondition the Richardson solver using approximate inverse filters of the (known) blur and natural image prior kernels. Using convolutions instead of a generic linear preconditioner allows extremely efficient parameter sharing across the image, and leads to significant gains in accuracy and/or speed compared to classical FFT and conjugate-gradient methods. More importantly, the proposed architecture is easily adapted to learning both the preconditioner and the proximal operator using CNN embeddings. This yields a simple and efficient algorithm for non-blind image deblurring which is fully interpretable, can be learned end to end, and whose accuracy matches or exceeds the state of the art, quite significantly, in the non-uniform case.
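To make the role of the preconditioner concrete, the sketch below runs a Richardson fixed-point iteration, x <- x + C(rhs - Mx), on a tiny symmetric positive-definite system, once with a plain scalar step and once with a diagonal (Jacobi) preconditioner. The Jacobi choice is only a stand-in assumption for illustration; the paper instead uses approximate inverse filters applied as convolutions, and the matrix here is a toy example rather than a blur operator.

```python
def matvec(M, x):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def richardson(M, rhs, precond, n_iters):
    """Iterate x <- x + C (rhs - M x), where precond applies C to a residual."""
    x = [0.0] * len(rhs)
    for _ in range(n_iters):
        residual = [r - mx for r, mx in zip(rhs, matvec(M, x))]
        step = precond(residual)
        x = [xi + si for xi, si in zip(x, step)]
    return x

def max_abs_error(x, reference):
    return max(abs(a - b) for a, b in zip(x, reference))

# Toy system standing in for the normal equations of a least-squares problem.
M = [[4.0, 1.0], [1.0, 3.0]]
rhs = [1.0, 2.0]
exact = [1.0 / 11.0, 7.0 / 11.0]

def scalar_step(residual, omega=0.2):
    """Unpreconditioned Richardson: C = omega * I."""
    return [omega * r for r in residual]

def jacobi(residual):
    """Diagonal preconditioner: C = diag(M)^-1 (illustrative assumption)."""
    return [r / M[i][i] for i, r in enumerate(residual)]

plain = richardson(M, rhs, scalar_step, 10)
preconditioned = richardson(M, rhs, jacobi, 10)
print(max_abs_error(preconditioned, exact) < max_abs_error(plain, exact))
```

A better approximation of the inverse of M shrinks the spectral radius of the iteration matrix I - CM, so fewer iterations are needed for the same accuracy.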

8. Explainable Deep One-Class Classification
Philipp Liznerski, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, Klaus-Robert Müller
Abstract: Deep one-class classification variants for anomaly detection learn a mapping that concentrates nominal samples in feature space causing anomalies to be mapped away. Because this transformation is highly non-linear, finding interpretations poses a significant challenge. In this paper we present an explainable deep one-class classification method, Fully Convolutional Data Description (FCDD), where the mapped samples are themselves also an explanation heatmap. FCDD yields competitive detection performance and provides reasonable explanations on common anomaly detection benchmarks with CIFAR-10 and ImageNet. On MVTec-AD, a recent manufacturing dataset offering ground-truth anomaly maps, FCDD meets the state of the art in an unsupervised setting, and outperforms its competitors in a semi-supervised setting. Finally, using FCDD's explanations we demonstrate the vulnerability of deep one-class classification models to spurious image features such as image watermarks.

9. Collaborative Learning for Faster StyleGAN Embedding
Shanyan Guan, Ying Tai, Bingbing Ni, Feida Zhu, Feiyue Huang, Xiaokang Yang
Abstract: The latent code of the recent popular model StyleGAN has learned disentangled representations thanks to the multi-layer style-based generator. Embedding a given image back into the latent space of StyleGAN enables a wide range of interesting semantic image editing applications. Although previous works are able to yield impressive inversion results based on an optimization framework, this framework suffers from low efficiency. In this work, we propose a novel collaborative learning framework that consists of an efficient embedding network and an optimization-based iterator. On one hand, with the progress of training, the embedding network gives a reasonable latent code initialization for the iterator. On the other hand, the updated latent code from the iterator in turn supervises the embedding network. In the end, high-quality latent code can be obtained efficiently with a single forward pass through our embedding network. Extensive experiments demonstrate the effectiveness and efficiency of our work.

10. Multi-Label Image Recognition with Multi-Class Attentional Regions
Bin-Bin Gao, Hong-Yu Zhou
Abstract: Multi-label image recognition is a practical and challenging task compared to single-label image classification. However, previous works may be suboptimal because of a great number of object proposals or complex attentional region generation modules. In this paper, we propose a simple but efficient two-stream framework to recognize multi-category objects from global image to local regions, similar to how human beings perceive objects. To bridge the gap between global and local streams, we propose a multi-class attentional region module which aims to make the number of attentional regions as small as possible and keep the diversity of these regions as high as possible. Our method can efficiently and effectively recognize multi-class objects with an affordable computation cost and a parameter-free region localization module. Over three benchmarks on multi-label image classification, we create new state-of-the-art results with a single model only using image semantics without label dependency. In addition, the effectiveness of the proposed method is extensively demonstrated under different factors such as global pooling strategy, input size and network architecture.

11. Video Prediction via Example Guidance
Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell
Abstract: In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics. In this work, we propose a simple yet effective framework that can efficiently predict plausible future states. The key insight is that the potential distribution of a sequence can be approximated with analogous sequences retrieved from the training pool, namely, expert examples. By further incorporating a novel optimization scheme into the training procedure, plausible predictions can be sampled efficiently from the distribution constructed from the retrieved examples. Meanwhile, our method can be seamlessly integrated with existing stochastic predictive models; significant enhancement is observed with comprehensive experiments in both quantitative and qualitative aspects. We also demonstrate the generalization ability to predict the motion of an unseen class, i.e., without access to corresponding data during the training phase.

12. Deep Fence Estimation using Stereo Guidance and Adversarial Learning
Paritosh Mittal, Shankar M Venkatesan, Viswanath Veera, Aloknath De
Abstract: People capture memorable images of events and exhibits that are often occluded by a wire mesh, loosely termed a fence. Recent works on fence removal have limited performance due to the difficulty of initial fence segmentation. This work aims to accurately segment fences using a novel fence guidance mask (FM) generated from a stereo image pair. This binary guidance mask contains deterministic cues about the structure of the fence and is given as additional input to the deep fence estimation model. We also introduce a directional connectivity loss (DCL), which is used alongside an adversarial loss to precisely detect thin wires. Experimental results obtained on real-world scenarios demonstrate the superiority of the proposed method over state-of-the-art techniques.

13. Synergistic saliency and depth prediction for RGB-D saliency detection
Yue Wang, Yuke Li, James H Elder, Huchuan Lu, Runmin Wu
Abstract: Depth information available from an RGB-D camera can be useful in segmenting salient objects when figure/ground cues from RGB channels are weak. This has motivated the development of several RGB-D saliency datasets and algorithms that use all four channels of the RGB-D data for both training and inference. Unfortunately, existing RGB-D saliency datasets are small, leading to overfitting and poor generalization. Here we demonstrate a system for RGB-D saliency detection that makes effective joint use of large RGB saliency datasets with hand-labelled saliency ground truth together with smaller RGB-D saliency datasets without saliency ground truth. This novel prediction-guided cross-refinement network is trained to jointly estimate both saliency and depth, allowing mutual refinement between feature representations tuned for the two respective tasks. An adversarial stage resolves domain shift between RGB and RGB-D saliency datasets, allowing representations for saliency and depth estimation to be aligned on either. Critically, our system does not require saliency ground truth for the RGB-D datasets, making it easier to expand these datasets for training, and does not require the D channel for inference, allowing the method to be used for the much broader range of applications where only RGB data are available. Evaluation on seven RGB-D datasets demonstrates that, without using hand-labelled saliency ground truth for RGB-D datasets and using only the RGB channels of these datasets at inference, our system achieves performance that is comparable to state-of-the-art methods that use hand-labelled saliency maps for RGB-D data at training and use the depth channels of these datasets at inference.

14. Improving auto-encoder novelty detection using channel attention and entropy minimization
Dongyan Guo, Miao Tian, Ying Cui, Xiang Pan, Shengyong Chen
Abstract: Novelty detection is an important research area which mainly solves the classification problem of separating inliers, which usually consist of normal samples, from outliers, which are composed of abnormal samples. We focus on the role of the auto-encoder in novelty detection and further improve the performance of auto-encoder-based methods through two main contributions. First, we introduce an attention mechanism into novelty detection. Under the action of the attention mechanism, the auto-encoder can pay more attention to the representation of inlier samples through adversarial training. Second, we constrain the expression of the latent space by information entropy. Experimental results on three public datasets show that the proposed method shows potential for novelty detection.

15. Few-Shot Microscopy Image Cell Segmentation
Youssef Dawoud, Julia Hornauer, Gustavo Carneiro, Vasileios Belagiannis
Abstract: Automatic cell segmentation in microscopy images works well with the support of deep neural networks trained with full supervision. Collecting and annotating images, though, is not a sustainable solution for every new microscopy database and cell type. Instead, we assume that we can access a plethora of annotated image data sets from different domains (sources) and a limited number of annotated image data sets from the domain of interest (target), where each domain denotes not only different image appearance but also a different type of cell segmentation problem. We pose this problem as meta-learning where the goal is to learn a generic and adaptable few-shot learning model from the available source domain data sets and cell segmentation tasks. The model can be afterwards fine-tuned on the few annotated images of the target domain that contains different image appearance and different cell type. In our meta-learning training, we propose the combination of three objective functions to segment the cells, move the segmentation results away from the classification boundary using cross-domain tasks, and learn an invariant representation between tasks of the source domains. Our experiments on five public databases show promising results from 1- to 10-shot meta-learning using standard segmentation neural network architectures.

16. Complex Network Construction for Interactive Image Segmentation using Particle Competition and Cooperation: A New Approach
Jefferson Antonio Ribeiro Passerini, Fabricio Aparecido Breve
Abstract: In the interactive image segmentation task, the Particle Competition and Cooperation (PCC) model is fed with a complex network, which is built from the input image. In the network construction phase, a weight vector is needed to define the importance of each element in the feature set, which consists of color and location information of the corresponding pixels, thus demanding a specialist's intervention. The present paper proposes the elimination of the weight vector through modifications in the network construction phase. The proposed model and the reference model, without the use of a weight vector, were compared using 151 images extracted from the Grabcut dataset, the PASCAL VOC dataset and the Alpha matting dataset. Each model was applied 30 times to each image to obtain an error average. These simulations resulted in an error rate of only 0.49% when classifying pixels with the proposed model while the reference model had an error rate of 3.14%. The proposed method also presented less error variation in the diversity of the evaluated images, when compared to the reference model.

17. Balanced Symmetric Cross Entropy for Large Scale Imbalanced and Noisy Data
Feifei Huang, Jie Li, Xuelin Zhu
Abstract: Deep convolutional neural networks have attracted much attention in large-scale visual classification tasks, and achieve significant performance improvements compared to traditional visual analysis methods. In this paper, we explore many kinds of deep convolutional neural network architectures for a large-scale product recognition task, whose labeled data is heavily class-imbalanced and noisy, making it more challenging. Extensive experiments show that PNASNet achieves the best performance among a variety of convolutional architectures. Together with ensemble technology and a negative learning loss for noisy labeled data, we further improve the model performance on online test data. Finally, our proposed method achieves 0.1515 mean top-1 error on online test data.

18. Weakly Supervised Temporal Action Localization with Segment-Level Labels
Xinpeng Ding, Nannan Wang, Xinbo Gao, Jie Li, Xiaoyu Wang, Tongliang Liu
Abstract: Temporal action localization presents a trade-off between test performance and annotation-time cost. Fully supervised methods achieve good performance with time-consuming boundary annotations. Weakly supervised methods with cheaper video-level category label annotations result in worse performance. In this paper, we introduce a new segment-level supervision setting: segments are labeled when annotators observe actions happening in them. We incorporate this segment-level supervision along with a novel localization module in the training. Specifically, we devise a partial segment loss regarded as a loss sampling to learn integral action parts from labeled segments. Since the labeled segments are only parts of actions, the model tends to overfit as training progresses. To tackle this problem, we first obtain a similarity matrix from discriminative features guided by a sphere loss. Then, a propagation loss is devised based on the matrix to act as a regularization term, allowing implicit unlabeled segments propagation during training. Experiments validate that our method can outperform the video-level supervision methods with almost the same annotation time.

19. Ground Truth Free Denoising by Optimal Transport
Sören Dittmer, Carola-Bibiane Schönlieb, Peter Maass
Abstract: We present a learned unsupervised denoising method for arbitrary types of data, which we explore on images and one-dimensional signals. The training is solely based on samples of noisy data and examples of noise, which, critically, do not need to come in pairs. We only need the assumption that the noise is independent and additive (although we describe how this can be extended). The method rests on a Wasserstein Generative Adversarial Network setting, which utilizes two critics and one generator.

20. Domain Adaptive Object Detection via Asymmetric Tri-way Faster-RCNN
Zhenwei He, Lei Zhang
Abstract: Conventional object detection models inevitably encounter a performance drop when domain disparity exists. Unsupervised domain adaptive object detection has recently been proposed to reduce the disparity between domains, where the source domain is label-rich while the target domain is label-agnostic. The existing models follow a parameter-shared siamese structure for adversarial domain alignment, which, however, easily leads to the collapse and out-of-control risk of the source domain and brings negative impact to feature adaption. The main reason is that the labeling unfairness (asymmetry) between source and target makes the parameter sharing mechanism unable to adapt. Therefore, in order to avoid the source domain collapse risk caused by parameter sharing, we propose an asymmetric tri-way Faster-RCNN (ATF) for domain adaptive object detection. Our ATF model has two distinct merits: 1) An ancillary net supervised by source labels is deployed to learn ancillary target features and simultaneously preserve the discrimination of the source domain, which enhances the structural discrimination (object classification vs. bounding box regression) of domain alignment. 2) The asymmetric structure consisting of a chief net and an independent ancillary net essentially overcomes the source collapse risk caused by parameter sharing. The adaption safety of the proposed ATF detector is guaranteed. Extensive experiments on a number of datasets, including Cityscapes, Foggy-cityscapes, KITTI, Sim10k, Pascal VOC, Clipart and Watercolor, demonstrate the SOTA performance of our method.

21. Surrogate-assisted Particle Swarm Optimisation for Evolving Variable-length Transferable Blocks for Image Classification
Bin Wang, Bing Xue, Mengjie Zhang
Abstract: Deep convolutional neural networks have demonstrated promising performance on image classification tasks, but the manual design process becomes more and more complex due to the fast depth growth and the increasingly complex topologies of convolutional neural networks. As a result, neural architecture search has emerged to automatically design convolutional neural networks that outperform handcrafted counterparts. However, the computational cost is immense, e.g. 22,400 GPU-days and 2,000 GPU-days for two outstanding neural architecture search works named NAS and NASNet, respectively, which motivates this work. A new effective and efficient surrogate-assisted particle swarm optimisation algorithm is proposed to automatically evolve convolutional neural networks. This is achieved by proposing a novel surrogate model, a new method of creating a surrogate dataset and a new encoding strategy to encode variable-length blocks of convolutional neural networks, all of which are integrated into a particle swarm optimisation algorithm to form the proposed method. The proposed method shows its effectiveness by achieving competitive error rates of 3.49% on the CIFAR-10 dataset, 18.49% on the CIFAR-100 dataset, and 1.82% on the SVHN dataset. The convolutional neural network blocks are efficiently learned by the proposed method from CIFAR-10 within 3 GPU-days due to the acceleration achieved by the surrogate model and the surrogate dataset to avoid the training of 80.1% of convolutional neural network blocks represented by the particles. Without any further search, the evolved blocks from CIFAR-10 can be successfully transferred to CIFAR-100 and SVHN, which exhibits the transferability of the block learned by the proposed method.
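For readers unfamiliar with the optimiser underneath this work, the sketch below shows a plain particle swarm optimisation loop minimising a toy objective. It deliberately omits the paper's contributions (the surrogate model, the surrogate dataset and the variable-length block encoding); the hyperparameters `w`, `c1`, `c2` and the sphere objective are illustrative assumptions only.

```python
import random

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

def pso(objective, dim, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Vanilla PSO: each particle is pulled toward its own best and the swarm's best."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best objective value after each iteration
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

best, best_val, history = pso(sphere, dim=2)
print(best_val <= history[0])
```

In the setting the abstract describes, each objective evaluation would mean training a candidate network, which is why a surrogate that skips most of those evaluations is the main source of the reported speed-up.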

22. Segment as Points for Efficient Online Multi-Object Tracking and Segmentation
Zhenbo Xu, Wei Zhang, Xiao Tan, Wei Yang, Huan Huang, Shilei Wen, Errui Ding, Liusheng Huang
Abstract: Current multi-object tracking and segmentation (MOTS) methods follow the tracking-by-detection paradigm and adopt convolutions for feature extraction. However, because of the inherent receptive field, convolution-based feature extraction inevitably mixes up the foreground features and the background features, resulting in ambiguities in the subsequent instance association. In this paper, we propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to an unordered 2D point cloud representation. Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images. Furthermore, multiple informative data modalities are converted into point-wise representations to enrich point-wise features. The resulting online MOTS framework, named PointTrack, surpasses all state-of-the-art methods, including 3D tracking methods, by large margins (5.4% higher MOTSA and 18 times faster than MOTSFusion) at near real-time speed (22 FPS). Evaluations across three datasets demonstrate both the effectiveness and efficiency of our method. Moreover, based on the observation that current MOTS datasets lack crowded scenes, we build a more challenging MOTS dataset named APOLLO MOTS with higher instance density. Both APOLLO MOTS and our codes are publicly available at this https URL.

23. PointTrack++ for Effective Online Multi-Object Tracking and Segmentation
Zhenbo Xu, Wei Zhang, Xiao Tan, Wei Yang, Xiangbo Su, Yuchen Yuan, Hongwu Zhang, Shilei Wen, Errui Ding, Liusheng Huang
Abstract: Multiple-object tracking and segmentation (MOTS) is a novel computer vision task that aims to jointly perform multiple object tracking (MOT) and instance segmentation. In this work, we present PointTrack++, an effective online framework for MOTS, which substantially extends our recently proposed PointTrack framework. To begin with, PointTrack adopts an efficient one-stage framework for instance segmentation, and learns instance embeddings by converting compact image representations to an unordered 2D point cloud. Compared with PointTrack, our proposed PointTrack++ offers three major improvements. Firstly, in the instance segmentation stage, we adopt a semantic segmentation decoder trained with focal loss to improve the instance selection quality. Secondly, to further boost the segmentation performance, we propose a data augmentation strategy that copies and pastes instances into training images. Finally, we introduce a better training strategy in the instance association stage to improve the distinguishability of learned instance embeddings. The resulting framework achieves state-of-the-art performance on the 5th BMTT MOTChallenge.

24. Multiple Instance-Based Video Anomaly Detection using Deep Temporal Encoding-Decoding
Ammar Mansoor Kamoona, Amirali Khodadadian Gosta, Alireza Bab-Hadiashar, Reza Hoseinnezhad
Abstract: In this paper, we propose a weakly supervised deep temporal encoding-decoding solution for anomaly detection in surveillance videos using multiple instance learning. The proposed approach uses both abnormal and normal video clips during the training phase, which is developed in the multiple instance framework where we treat a video as a bag and video clips as instances in the bag. Our main contribution lies in the proposed novel approach to consider temporal relations between video instances. We deal with video instances (clips) as sequential visual data rather than independent instances. We employ a deep temporal and encoder network that is designed to capture spatial-temporal evolution of video instances over time. We also propose a new loss function that is smoother than similar loss functions recently presented in the computer vision literature and therefore enjoys faster convergence and improved tolerance to local minima during the training phase. The proposed temporal encoding-decoding approach with modified loss is benchmarked against the state of the art in simulation studies. The results show that the proposed method performs similarly to or better than state-of-the-art solutions for anomaly detection in video surveillance applications.

25. Multiple Expert Brainstorming for Domain Adaptive Person Re-identification
Yunpeng Zhai, Qixiang Ye, Shijian Lu, Mengxi Jia, Rongrong Ji, Yonghong Tian
Abstract: Often the best performing deep neural models are ensembles of multiple base-level networks; nevertheless, ensemble learning with respect to domain adaptive person re-ID remains unexplored. In this paper, we propose a multiple expert brainstorming network (MEB-Net) for domain adaptive person re-ID, opening up a promising direction for the model ensemble problem under unsupervised conditions. MEB-Net adopts a mutual learning strategy, where multiple networks with different architectures are pre-trained within a source domain as expert models equipped with specific features and knowledge, while the adaptation is then accomplished through brainstorming (mutual learning) among expert models. MEB-Net accommodates the heterogeneity of experts learned with different architectures and enhances the discrimination capability of the adapted re-ID model by introducing a regularization scheme based on the authority of experts. Extensive experiments on large-scale datasets (Market-1501 and DukeMTMC-reID) demonstrate the superior performance of MEB-Net over the state of the art.

26. Domain Adaptation without Source Data
Youngeun Kim, Sungeun Hong, Donghyeon Cho, Hyoungseob Park, Priyadarshini Panda
Abstract: Domain adaptation assumes that samples from source and target domains are freely accessible during a training phase. However, such an assumption is rarely plausible in real cases and possibly causes data-privacy issues, especially when the label of the source domain can be a sensitive attribute as an identifier. To avoid accessing source data which may contain sensitive information, we introduce source data-free domain adaptation (SFDA). Our key idea is to leverage a pre-trained model from the source domain and progressively update the target model in a self-learning manner. We observe that target samples with lower self-entropy measured by the pre-trained source model are more likely to be classified correctly. From this, we select the reliable samples with the self-entropy criterion and define these as class prototypes. We then assign pseudo labels for every target sample based on the similarity score with class prototypes. Further, to reduce the uncertainty from the pseudo labeling process, we propose set-to-set distance-based filtering which does not require any tunable hyperparameters. Finally, we train the target model with the filtered pseudo labels with regularization from the pre-trained source model. Surprisingly, without direct usage of labeled source samples, our SFDA outperforms conventional domain adaptation methods on benchmark datasets. Our code is publicly available at this https URL.
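
The prototype-based pseudo-labeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `per_class` parameter, and the use of cosine similarity are assumptions made for illustration, features are assumed to be non-zero vectors, and the set-to-set distance-based filtering step is omitted.

```python
import numpy as np

def self_entropy(probs):
    """Normalized Shannon entropy of each row of class probabilities, in [0, 1]."""
    eps = 1e-12
    num_classes = probs.shape[1]
    return -(probs * np.log(probs + eps)).sum(axis=1) / np.log(num_classes)

def pseudo_labels_from_prototypes(features, probs, per_class=1):
    """Pick the lowest-entropy samples of each predicted class as prototypes,
    then label every sample with the class of its most similar prototype."""
    entropy = self_entropy(probs)
    predicted = probs.argmax(axis=1)
    # Unit-normalize so that a dot product equals cosine similarity.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    proto_feats, proto_classes = [], []
    for c in np.unique(predicted):
        idx = np.where(predicted == c)[0]
        chosen = idx[np.argsort(entropy[idx])[:per_class]]
        proto_feats.append(unit[chosen])
        proto_classes.extend([c] * len(chosen))
    proto_feats = np.concatenate(proto_feats)
    similarity = unit @ proto_feats.T  # shape: (num_samples, num_prototypes)
    return np.asarray(proto_classes)[similarity.argmax(axis=1)]
```

In this sketch a sample whose source-model prediction is uncertain can receive a different label than its arg-max prediction if its feature lies closer to another class's prototype.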

27. Three-dimensional Human Tracking of a Mobile Robot by Fusion of Tracking Results of Two Cameras
Shinya Matsubara, Akihiko Honda, Yonghoon Ji, Kazunori Umeda
Abstract: This paper proposes a process that uses two cameras to obtain three-dimensional (3D) information of a target object for human tracking. Results of human detection and tracking from two cameras are integrated to obtain the 3D information. OpenPose is used for human detection. In general stereo camera processing, a range image of the entire scene is acquired as precisely as possible, and then the range image is processed. However, this approach suffers from problems such as incorrect matching and the computational cost of the calibration process. A new stereo vision framework is proposed to cope with these problems. The effectiveness of the proposed framework and method is verified through target-tracking experiments.

28. A Similarity Inference Metric for RGB-Infrared Cross-Modality Person Re-identification
Mengxi Jia, Yunpeng Zhai, Shijian Lu, Siwei Ma, Jian Zhang
Abstract: RGB-Infrared (IR) cross-modality person re-identification (re-ID), which aims to search for an IR image in an RGB gallery or vice versa, is a challenging task due to the large discrepancy between IR and RGB modalities. Existing methods typically address this challenge by aligning feature distributions or image styles across modalities, whereas the very useful similarities among gallery samples of the same modality (i.e. intra-modality sample similarities) are largely neglected. This paper presents a novel similarity inference metric (SIM) that exploits the intra-modality sample similarities to circumvent the cross-modality discrepancy, targeting optimal cross-modality image matching. SIM works by successive similarity graph reasoning and mutual nearest-neighbor reasoning that mine cross-modality sample similarities by leveraging intra-modality sample similarities from two different perspectives. Extensive experiments over two cross-modality re-ID datasets (SYSU-MM01 and RegDB) show that SIM achieves significant accuracy improvement with little extra training compared with the state of the art.

29. A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
Qing Li, Siyuan Huang, Yining Hong, Song-Chun Zhu
Abstract: Humans can progressively learn visual concepts from easy to hard questions. To mimic this efficient learning ability, we propose a competence-aware curriculum for visual concept learning in a question-answering manner. Specifically, we design a neural-symbolic concept learner for learning the visual concepts and a multi-dimensional Item Response Theory (mIRT) model for guiding the learning process with an adaptive curriculum. The mIRT effectively estimates the concept difficulty and the model competence at each learning step from accumulated model responses. The estimated concept difficulty and model competence are further utilized to select the most profitable training samples. Experimental results on CLEVR show that with a competence-aware curriculum, the proposed method achieves state-of-the-art performances with superior data efficiency and convergence speed. Specifically, the proposed model only uses 40% of training data and converges three times faster compared with other state-of-the-art methods.

30. Few-Shot Semantic Segmentation Augmented with Image-Level Weak Annotations
Shuo Lei, Xuchao Zhang, Jianfeng He, Fanglan Chen, Chang-Tien Lu
Abstract: Despite the great progress made by deep neural networks in the semantic segmentation task, traditional neural network-based methods typically suffer from a shortage of large amounts of pixel-level annotations. Recent progress in few-shot semantic segmentation tackles the issue by utilizing only a few pixel-level annotated examples. However, these few-shot approaches cannot easily be applied to utilize image-level weak annotations, which can easily be obtained and considerably improve performance in the semantic segmentation task. In this paper, we advance the few-shot segmentation paradigm towards a scenario where image-level annotations are available to help the training process of a few pixel-level annotations. Specifically, we propose a new framework to learn the class prototype representation in the metric space by integrating image-level annotations. Furthermore, a soft masked average pooling strategy is designed to handle distractions in image-level annotations. Extensive empirical results on PASCAL-5i show that our method can achieve 5.1% and 8.2% increases of mIoU score for one-shot settings with pixel-level and scribble annotations, respectively.
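
As a rough illustration of the pooling idea mentioned in this abstract, the sketch below computes a class prototype as a mask-weighted mean of feature vectors. The abstract does not give the exact formulation, so the function name, the array shapes, and the `eps` stabilizer are assumptions; this is generic weighted average pooling, not necessarily the paper's variant.

```python
import numpy as np

def soft_masked_average_pooling(features, soft_mask, eps=1e-8):
    """features: (C, H, W) feature map; soft_mask: (H, W) weights in [0, 1].
    Returns a (C,) prototype: the mask-weighted mean of the per-pixel features."""
    weighted_sum = (features * soft_mask[None, :, :]).sum(axis=(1, 2))
    # eps guards against division by zero when the mask is all zeros.
    return weighted_sum / (soft_mask.sum() + eps)
```

With a binary mask this reduces to ordinary masked average pooling; fractional weights let uncertain pixels contribute less to the prototype.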

31. Learning to Prune in Training via Dynamic Channel Propagation
Shibo Shen, Rongpeng Li, Zhifeng Zhao, Honggang Zhang, Yugeng Zhou
Abstract: In this paper, we propose a novel network training mechanism called "dynamic channel propagation" to prune the neural networks during the training period. In particular, we pick up a specific group of channels in each convolutional layer to participate in the forward propagation in training time according to the significance level of channel, which is defined as channel utility. The utility values with respect to all selected channels are updated simultaneously with the error back-propagation process and will adaptively change. Furthermore, when the training ends, channels with high utility values are retained whereas those with low utility values are discarded. Hence, our proposed scheme trains and prunes neural networks simultaneously. We empirically evaluate our novel training scheme on various representative benchmark datasets and advanced convolutional neural network (CNN) architectures, including VGGNet and ResNet. The experiment results verify the superior performance and robust effectiveness of our approach.
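
The select-then-update loop described in this abstract can be sketched as follows. The abstract does not specify how channel utility is computed or updated, so `update_utility` and its `scores` and `lr` parameters are hypothetical placeholders; only the top-k selection and the masking of unselected channels follow the text directly.

```python
import numpy as np

def select_channels(utility, keep):
    """Indices of the `keep` channels with the highest utility, in ascending order."""
    return np.sort(np.argsort(utility)[-keep:])

def masked_forward(activations, utility, keep):
    """Zero out every channel except the selected ones. activations: (C, H, W)."""
    mask = np.zeros(activations.shape[0], dtype=activations.dtype)
    mask[select_channels(utility, keep)] = 1
    return activations * mask[:, None, None]

def update_utility(utility, selected, scores, lr=0.1):
    """Hypothetical rule: move the utility of the selected channels towards `scores`."""
    updated = utility.copy()
    updated[selected] += lr * (scores - updated[selected])
    return updated
```

After training, calling `select_channels` once more on the final utilities would give the channels to retain, which mirrors the abstract's statement that high-utility channels are kept and the rest discarded.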

32. RSAC: Regularized Subspace Approximation Classifier for Lightweight Continuous Learning
Chih-Hsing Ho, Shang-Ho Tsai
Abstract: Continuous learning seeks to perform the learning on the data that arrives from time to time. While prior works have demonstrated several possible solutions, these approaches require excessive training time as well as memory usage. This is impractical for applications where time and storage are constrained, such as edge computing. In this work, a novel training algorithm, regularized subspace approximation classifier (RSAC), is proposed to achieve lightweight continuous learning. RSAC contains a feature reduction module and classifier module with regularization. Extensive experiments show that RSAC is more efficient than prior continuous learning works and outperforms these works on various experimental settings.

33. Interactive Knowledge Distillation
Shipeng Fu, Zhen Li, Jun Xu, Ming-Ming Cheng, Gwanggil Jeon, Xiaomin Yang
Abstract: Knowledge distillation is a standard teacher-student learning framework to train a light-weight student network under the guidance of a well-trained large teacher network. As an effective teaching strategy, interactive teaching has been widely employed at school to motivate students, in which teachers not only provide knowledge but also give constructive feedback to students upon their responses, to improve their learning performance. In this work, we propose an InterActive Knowledge Distillation (IAKD) scheme to leverage the interactive teaching strategy for efficient knowledge distillation. In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation: randomly replacing the blocks in the student network with the corresponding blocks in the teacher network. In this way, we directly involve the teacher's powerful feature transformation ability to largely boost the student's performance. Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods on diverse image classification datasets.

34. ODE-CNN: Omnidirectional Depth Extension Networks
Xinjing Cheng, Peng Wang, Yanqi Zhou, Chenye Guan, Ruigang Yang
Abstract: Omnidirectional 360° cameras are proliferating rapidly in autonomous robots since they significantly enhance perception by widening the field of view (FoV). However, corresponding 360° depth sensors, which are also critical for the perception system, are still difficult or expensive to obtain. In this paper, we propose a low-cost 3D sensing system that combines an omnidirectional camera with a calibrated projective depth camera, where the depth from the limited FoV can be automatically extended to the rest of the recorded omnidirectional image. To accurately recover the missing depths, we design an omnidirectional depth extension convolutional neural network (ODE-CNN), in which a spherical feature transform layer (SFTL) is embedded at the end of the feature encoding layers, and a deformable convolutional spatial propagation network (D-CSPN) is appended at the end of the feature decoding layers. The former resamples the neighborhood of each pixel in the omnidirectional coordination to the projective coordination, which reduces the difficulty of feature learning, and the latter automatically finds a proper context to align the structures in the estimated depths via a CNN with respect to the reference image, which significantly improves the visual quality. Finally, we demonstrate the effectiveness of the proposed ODE-CNN on the popular 360D dataset and show that ODE-CNN significantly outperforms other state-of-the-art (SoTA) methods (a 33% relative reduction in depth error).

35. Task-agnostic Temporally Consistent Facial Video Editing
Meng Cao, Haozhi Huang, Hao Wang, Xuan Wang, Li Shen, Sheng Wang, Linchao Bao, Zhifeng Li, Jiebo Luo
Abstract: Recent research has witnessed the advances in facial image editing tasks. For video editing, however, previous methods either simply apply transformations frame by frame or utilize multiple frames in a concatenated or iterative fashion, which leads to noticeable visual flickers. In addition, these methods are confined to dealing with one specific task at a time without any extensibility. In this paper, we propose a task-agnostic temporally consistent facial video editing framework. Based on a 3D reconstruction model, our framework is designed to handle several editing tasks in a more unified and disentangled manner. The core design includes a dynamic training sample selection mechanism and a novel 3D temporal loss constraint that fully exploits both image and video datasets and enforces temporal consistency. Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.

36. Anatomy-Aware Siamese Network: Exploiting Semantic Asymmetry for Accurate Pelvic Fracture Detection in X-ray Images
Haomin Chen, Yirui Wang, Kang Zheng, Weijian Li, Chi-Tung Cheng, Adam P. Harrison, Jing Xiao, Gregory D. Hager, Le Lu, Chien-Hung Liao, Shun Miao
Abstract: Visual cues of enforcing bilaterally symmetric anatomies as normal findings are widely used in clinical practice to disambiguate subtle abnormalities from medical images. So far, inadequate research attention has been paid to effectively emulating this practice in CAD methods. In this work, we exploit semantic anatomical symmetry or asymmetry analysis in a complex CAD scenario, i.e., anterior pelvic fracture detection in trauma PXRs, where semantically pathological (referred to as fracture) and non-pathological (e.g., pose) asymmetries both occur. Visually subtle yet pathologically critical fracture sites can be missed even by experienced clinicians when only limited diagnosis time is permitted in emergency care. We propose a novel fracture detection framework that builds upon a Siamese network enhanced with a spatial transformer layer to holistically analyze symmetric image features. Image features are spatially formatted to encode bilaterally symmetric anatomies. A new contrastive feature learning component in our Siamese network is designed to optimize the deep image features to be more salient with respect to the underlying semantic asymmetries (caused by pelvic fracture occurrences). Our proposed method has been extensively evaluated on 2,359 PXRs from unique patients (the largest study to date), and reports an area under the ROC curve score of 0.9771. This is the highest among state-of-the-art fracture detection methods, with improved clinical indications.

37. Joint Frequency- and Image-Space Learning for Fourier Imaging
Nalini M. Singh, Juan Eugenio Iglesias, Elfar Adalsteinsson, Adrian V. Dalca, Polina Golland
Abstract: We propose a neural network layer structure that combines frequency and image feature representations for robust Fourier image reconstruction. Our work is motivated by the challenges in magnetic resonance imaging (MRI) where the acquired signal is a corrupted Fourier transform of the desired image. The proposed layer structure enables both correction of artifacts native to the frequency-space and manipulation of image-space representations to reconstruct coherent image structures. This is in contrast to the current deep learning approaches for image reconstruction that manipulate data solely in the frequency-space or solely in the image-space. We demonstrate the advantages of the proposed joint learning on three diverse tasks including image reconstruction from undersampled acquisitions, motion correction, and image denoising in brain MRI. Unlike purely image based and purely frequency based architectures, the proposed joint model produces consistently high quality output images. The resulting joint frequency- and image-space feature representations promise to significantly improve modeling and reconstruction of images acquired in the frequency-space. Our code is available at this https URL.

38. Learning Orientation Distributions for Object Pose Estimation
Brian Okorn, Mengyun Xu, Martial Hebert, David Held
Abstract: For robots to operate robustly in the real world, they should be aware of their uncertainty. However, most methods for object pose estimation return a single point estimate of the object's pose. In this work, we propose two learned methods for estimating a distribution over an object's orientation. Our methods take into account both the inaccuracies in the pose estimation and the object symmetries. Our first method, which regresses from deep learned features to an isotropic Bingham distribution, gives the best performance for orientation distribution estimation for non-symmetric objects. Our second method learns to compare deep features and generates a non-parametric histogram distribution. This method gives the best performance on objects with unknown symmetries, accurately modeling both symmetric and non-symmetric objects, without any requirement of symmetry annotation. We show that both of these methods can be used to augment an existing pose estimator. Our evaluation compares our methods to a large number of baseline approaches for uncertainty estimation across a variety of different types of objects.

39. D-NetPAD: An Explainable and Interpretable Iris Presentation Attack Detector
Renu Sharma, Arun Ross
Abstract: An iris recognition system is vulnerable to presentation attacks, or PAs, where an adversary presents artifacts such as printed eyes, plastic eyes, or cosmetic contact lenses to circumvent the system. In this work, we propose an effective and robust iris PA detector called D-NetPAD based on the DenseNet convolutional neural network architecture. It demonstrates generalizability across PA artifacts, sensors and datasets. Experiments conducted on a proprietary dataset and a publicly available dataset (LivDet-2017) substantiate the effectiveness of the proposed method for iris PA detection. The proposed method results in a true detection rate of 98.58% at a false detection rate of 0.2% on the proprietary dataset and outperforms state-of-the-art methods on the LivDet-2017 dataset. We visualize intermediate feature distributions and fixation heatmaps using t-SNE plots and Grad-CAM, respectively, in order to explain the performance of D-NetPAD. Further, we conduct a frequency analysis to explain the nature of features being extracted by the network. The source code and trained model are available at this https URL.

40. Low-Power Object Counting with Hierarchical Neural Networks
Abhinav Goel, Caleb Tung, Sara Aghajanzadeh, Isha Ghodgaonkar, Shreya Ghosh, George K. Thiruvathukal, Yung-Hsiang Lu
Abstract: Deep Neural Networks (DNNs) can achieve state-of-the-art accuracy in many computer vision tasks, such as object counting. Object counting takes two inputs: an image and an object query and reports the number of occurrences of the queried object. To achieve high accuracy on such tasks, DNNs require billions of operations, making them difficult to deploy on resource-constrained, low-power devices. Prior work shows that a significant number of DNN operations are redundant and can be eliminated without affecting the accuracy. To reduce these redundancies, we propose a hierarchical DNN architecture for object counting. This architecture uses a Region Proposal Network (RPN) to propose regions-of-interest (RoIs) that may contain the queried objects. A hierarchical classifier then efficiently finds the RoIs that actually contain the queried objects. The hierarchy contains groups of visually similar object categories. Small DNNs are used at each node of the hierarchy to classify between these groups. The RoIs are incrementally processed by the hierarchical classifier. If the object in an RoI is in the same group as the queried object, then the next DNN in the hierarchy processes the RoI further; otherwise, the RoI is discarded. By using a few small DNNs to process each image, this method reduces the memory requirement, inference time, energy consumption, and number of operations with negligible accuracy loss when compared with the existing object counters.
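
The early-discard logic described in this abstract can be sketched in a few lines. This is an illustrative reading of the text, not the authors' code: the hierarchy is assumed to be a list of levels that each map a category to its group, and `classifiers` stands in for the small per-level DNNs as plain callables.

```python
from typing import Callable, Dict, List, Sequence

def count_queried(
    rois: Sequence[object],
    query: str,
    hierarchy: List[Dict[str, str]],
    classifiers: List[Callable[[object], str]],
) -> int:
    """Count RoIs that stay in the queried object's group at every level."""
    count = 0
    for roi in rois:
        for level, classify in zip(hierarchy, classifiers):
            if classify(roi) != level[query]:
                break  # wrong group: discard the RoI without running deeper levels
        else:
            count += 1  # the RoI survived every level of the hierarchy
    return count
```

Because most RoIs are rejected at the first, coarsest level, the deeper classifiers run on only a fraction of the proposals, which is the source of the savings the abstract reports.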
摘要：深层神经网络（DNNs）可以实现国家的最先进的精度在许多计算机视觉任务，如对象计数。对象计数采用两个输入：图像和对象查询，并报告所查询的对象的出现的次数。为了实现这样的任务精度高，DNNs需要数十亿的操作，使他们难以在资源受限，低功耗设备部署。以前的工作表明，DNN操作的显著数量是多余的，可以在不影响精度被淘汰。为了减少这些冗余，我们提出了一个层次DNN架构对象计数。这种结构采用的区域建议网络（RPN）提出的区域 - 的利益（投资回报）可能包含查询的对象。分层分类，然后高效地发现，实际上包含查询对象的投资回报。的层次结构包含视觉上相似的对象的类别的组。小DNNs在层次结构的每个节点用于将这些组之间进行分类。感兴趣区递增由分级分类处理。如果在一个ROI对象处于同一组作为查询对象，那么接下来DNN层次结构中的进一步处理的投资回报;否则，将ROI丢弃。通过使用几个小DNNs处理每个图像，当与现有的对象计数器相比，该方法减少了存储器需求，推理时间，能量消耗，和数量可以忽略不计的精度损失操作。

41. Trace-Norm Adversarial Examples
Ehsan Kazemi, Thomas Kerdreux, Liqiang Wang
Abstract: White box adversarial perturbations are sought via iterative optimization algorithms, most often by minimizing an adversarial loss on an $l_p$ neighborhood of the original image, the so-called distortion set. Constraining the adversarial search with different norms results in disparately structured adversarial examples. Here we explore several distortion sets with structure-enhancing algorithms. These new structures for adversarial examples, yet pervasive in optimization, are for instance a challenge for adversarial theoretical certification, which again provides only $l_p$ certificates. Because adversarial robustness is still an empirical field, defense mechanisms should also reasonably be evaluated against differently structured attacks. Besides, these structured adversarial perturbations may allow for larger distortion sizes than their $l_p$ counterparts while remaining imperceptible or perceptible as natural slight distortions of the image. Finally, they allow some control over the generation of the adversarial perturbation, such as (localized) blurriness.
摘要：白盒对抗扰动通过迭代优化算法最常最大限度地减少对原始图像，即所谓的失真集的$ L_P $附近的敌对损失追捧。约束与全异地结构对抗性的例子不同的规范结果的对抗搜索。这里，我们探讨几个失真套，结构增强算法。为对抗的例子，在优化尚未普及这些新的结构，例如是用于对抗理论认证，再次仅提供$ L_P $证书是一个挑战。由于对抗性稳健性仍然是一个经验领域，防御机制也应该合理地对结构不同的攻击进行评估。此外，这些结构化的对抗性扰动可以允许更大的扭曲大小比它们的$ $ L_P计数器部分，而其余不可察觉或感知为图像的自然轻微扭曲。最后，他们允许在对抗扰动，如（本地化）bluriness产生一定的控制。

42. Swoosh! Rattle! Thump! -- Actions that Sound
Dhiraj Gandhi, Abhinav Gupta, Lerrel Pinto
Abstract: Truly intelligent agents need to capture the interplay of all their senses to build a rich physical understanding of their world. In robotics, we have seen tremendous progress in using visual and tactile perception; however, we have often ignored a key sense: sound. This is primarily due to the lack of data that captures the interplay of action and sound. In this work, we perform the first large-scale study of the interactions between sound and robotic action. To do this, we create the largest available sound-action-vision dataset with 15,000 interactions on 60 objects using our robotic platform Tilt-Bot. By tilting objects and allowing them to crash into the walls of a robotic tray, we collect rich four-channel audio information. Using this data, we explore the synergies between sound and action and present three key insights. First, sound is indicative of fine-grained object class information, e.g., sound can differentiate a metal screwdriver from a metal wrench. Second, sound also contains information about the causal effects of an action, i.e., given the sound produced, we can predict what action was applied to the object. Finally, object representations derived from audio embeddings are indicative of implicit physical properties. We demonstrate that on previously unseen objects, audio embeddings generated through interactions can predict forward models 24% better than passive visual embeddings. Project videos and data are at this https URL.
摘要：真正的智能代理需要捕获所有的感官的相互影响，以建立自己的世界中丰富的物理理解。在机器人技术，我们已经看到了使用视觉和触觉感知的巨大进步;但是，我们往往忽略了一个重要的意义：声音。这主要是由于缺乏数据的捕获动作和声音的相互作用。在这项工作中，我们执行的声音和机器人的行动之间的相互作用的首次大规模研究。要做到这一点，我们创造最大可用的声音，动作，视觉数据集上使用我们的机器人平台倾斜博特60级的对象15000个互动。通过倾斜对象，并允许他们撞到机器人托盘的墙，我们收集丰富的四通道音频信息。利用这些数据，我们探索的声音和行动，并呈现三个关键见解之间的协同作用。首先，声音指示细粒度对象类信息，例如，声音可以从一个金属扳手区分的金属螺丝起子。其次，声音还包含给定的生产，我们可以预测什么样的行动被应用到对象的声音行动的因果效应，即信息。最后，从音频的嵌入衍生对象表示是表示隐含的物理性能。我们表明，在以前看不到的物体，通过相互作用产生的音频嵌入物可预测前进车型24％优于被动视觉的嵌入。项目视频和数据都在此HTTPS URL

43. AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot
Tong Qin, Tongqing Chen, Yilun Chen, Qing Su
Abstract: Autonomous valet parking is a specific application for autonomous vehicles. In this task, vehicles need to navigate in narrow, crowded and GPS-denied parking lots. Accurate localization ability is of great importance. Traditional visual-based methods suffer from tracking loss due to texture-less regions, repeated structures, and appearance changes. In this paper, we exploit robust semantic features to build the map and localize vehicles in parking lots. Semantic features include guide signs, parking lines, speed bumps, etc., which typically appear in parking lots. Compared with traditional features, these semantic features are long-term stable and robust to perspective and illumination changes. We adopt four surround-view cameras to increase the perception range. Assisted by an IMU (Inertial Measurement Unit) and wheel encoders, the proposed system generates a global visual semantic map. This map is further used to localize vehicles at the centimeter level. We analyze the accuracy and recall of our system and compare it against other methods in real experiments. Furthermore, we demonstrate the practicability of the proposed system with an autonomous parking application.
摘要：自主代客泊车是自主车辆的特定应用程序。在此任务中，车辆需要在狭窄的导航，拥挤和GPS否认停车场。精确的定位能力是非常重要的。传统的基于视觉的方法从跟踪丢失，由于无纹理区域，重复结构和外观上的变化受到影响。在本文中，我们利用强大的语义特征建立地图和定位车辆停车场。语义特征包含指路标志，停车线，减速带等，这些通常出现在停车场。与传统的特性相比，这些语义特征是长期稳定和稳健的视角和照明变化。我们采用四个环绕视摄像头，以增加感知范围。通过IMU（惯性测量单元）和轮编码器辅助，所提出的系统产生全局可视语义图。该地图还用于在厘米级来定位车辆。我们分析系统的准确率和查并将其与实际实验等方法。此外，我们证明了该系统由自治区驻车应用的实用性。

44. Continuously Indexed Domain Adaptation
Hao Wang, Hao He, Dina Katabi
Abstract: Existing domain adaptation focuses on transferring knowledge between domains with categorical indices (e.g., between datasets A and B). However, many tasks involve continuously indexed domains. For example, in medical applications, one often needs to transfer disease analysis and prediction across patients of different ages, where age acts as a continuous domain index. Such tasks are challenging for prior domain adaptation methods since they ignore the underlying relation among domains. In this paper, we propose the first method for continuously indexed domain adaptation. Our approach combines traditional adversarial adaptation with a novel discriminator that models the encoding-conditioned domain index distribution. Our theoretical analysis demonstrates the value of leveraging the domain index to generate invariant features across a continuous range of domains. Our empirical results show that our approach outperforms the state-of-the-art domain adaptation methods on both synthetic and real-world medical datasets.
摘要：现有域适配集中于与分类索引（例如，数据集A和B之间）结构域之间转移的知识。然而，许多任务涉及连续索引域。例如，在医疗应用中，一个经常需要在不同的年龄，其中年龄充当连续域索引的患者转移疾病的分析和预测。因为它们忽略域之间的关系底层这种任务是具有挑战性事先域适配方法。在本文中，我们提出了持续索引领域适应性第一种方法。我们的方法结合了传统的对抗性适应了新的鉴别该型号编码空调域指数分布。我们的理论分析表明利用所访问的索引以产生跨越连续范围域的不变特征的值。我们的实证结果表明，该方法比对合成和真实世界的医疗数据集的国家的最先进的领域适应方法。

45. HDR-GAN: HDR Image Reconstruction from Multi-Exposed LDR Images with Large Motions
Yuzhen Niu, Jianbin Wu, Wenxi Liu, Wenzhong Guo, Rynson W.H. Lau
Abstract: Synthesizing high dynamic range (HDR) images from multiple low-dynamic range (LDR) exposures in dynamic scenes is challenging. There are two major problems caused by the large motions of foreground objects. One is the severe misalignment among the LDR images. The other is the missing content due to the over-/under-saturated regions caused by the moving objects, which may not be easily compensated for by the multiple LDR exposures. Thus, the HDR generation model must be able to properly fuse the LDR images and restore the missing details without introducing artifacts. To address these two problems, we propose in this paper a novel GAN-based model, HDR-GAN, for synthesizing HDR images from multi-exposed LDR images. To the best of our knowledge, this work is the first GAN-based approach for fusing multi-exposed LDR images for HDR reconstruction. By incorporating adversarial learning, our method is able to produce faithful information in the regions with missing content. In addition, we propose a novel generator network, with a reference-based residual merging block for aligning large object motions in the feature domain, and a deep HDR supervision scheme for eliminating artifacts of the reconstructed HDR images. Experimental results demonstrate that our model achieves state-of-the-art reconstruction performance over the prior HDR methods on diverse scenes.
摘要：从多个低动态范围合成的高动态范围（HDR）图像（LDR）在动态场景曝光是具有挑战性的。有引起前景对象的大运动两大问题。一个是LDR图像之间的错位严重。另一种是缺少内容由于欠饱和所引起的移动的物体，这可能不容易由多个LDR曝光补偿区域中的过/。因此，它需要HDR一代车型能够正确保险丝LDR图像，并没有引入文物还原丢失的细节。为了解决这两个问题，我们建议在本文提出了一种新的基于GaN的模式，HDR-GAN，从多暴露LDR图像合成HDR图像。据我们所知，这项工作是融合多暴露LDR图片为HDR重建第一GaN的方法。通过采用对抗性的学习，我们的方法是能够产生忠实信息在各地区与遗漏的内容。此外，我们还提出了一种新发电机网络，具有基于参考残余合并块用于在所述特征域对准大对象运动，并用于消除重构的HDR图像的伪像的深HDR监督方案。实验结果表明，我们的模型实现了对不同的场景之前HDR方法的国家的最先进的重建性能。

46. LOL: Lidar-Only Odometry and Localization in 3D Point Cloud Maps
David Rozenberszki, Andras Majdik
Abstract: In this paper we deal with the problem of odometry and localization for Lidar-equipped vehicles driving in urban environments, where a premade target map exists to localize against. In our problem formulation, to correct the accumulated drift of the Lidar-only odometry, we apply a place recognition method to detect geometrically similar locations between the online 3D point cloud and the a priori offline map. In the proposed system, we integrate a state-of-the-art Lidar-only odometry algorithm with a recently proposed 3D point segment matching method by complementing their advantages. Also, we propose additional enhancements in order to reduce the number of false matches between the online point cloud and the target map, and to refine the position estimation error whenever a good match is detected. We demonstrate the utility of the proposed LOL system on several KITTI datasets of different lengths and environments, where the relocalization accuracy and the precision of the vehicle's trajectory were significantly improved in every case, while still being able to maintain real-time performance.
摘要：在本文中，我们处理的测距和定位的激光雷达搭载车辆行驶在城市环境中，预制目标映射存在本地化反对的问题。在我们的配方问题，纠正的累积漂移激光雷达测距法只适用于我们的地方识别方法来检测在线三维点云和先验离线地图之间的几何相似的位置。在所提出的系统中，我们整合了国家的最先进的激光雷达测距法只用算法通过补充自己的优势最近提出的三维点段匹配方法。此外，我们建议，为了减少网上的点云和目标之间的映射错误匹配的数量，并且无论何时检测的良好匹配细化位置估计误差的进一步增强。我们证明在不同的长度和环境，在重新定位的准确性和车辆的轨迹的精确度在任何情况下均显著改善的几个吉滴数据集所提出的LOL系统的效用，同时仍然能够保持实时性能。

47. Deep image prior for 3D magnetic particle imaging: A quantitative comparison of regularization techniques on Open MPI dataset
Sören Dittmer, Tobias Kluth, Mads Thorstein Roar Henriksen, Peter Maass
Abstract: Magnetic particle imaging (MPI) is an imaging modality exploiting the nonlinear magnetization behavior of (super-)paramagnetic nanoparticles to obtain a space- and often also time-dependent concentration of a tracer consisting of these nanoparticles. MPI has a continuously increasing number of potential medical applications. One prerequisite for successful performance in these applications is a proper solution to the image reconstruction problem. More classical methods from inverse problems theory, as well as novel approaches from the field of machine learning, have the potential to deliver high-quality reconstructions in MPI. We investigate a novel reconstruction approach based on a deep image prior, which builds on representing the solution by a deep neural network. Novel approaches, as well as variational and iterative regularization techniques, are compared quantitatively in terms of peak signal-to-noise ratios and structural similarity indices on the publicly available Open MPI dataset.
摘要：磁性粒子成像（MPI）是一种成像模态利用（超）顺磁纳米颗粒的非线性磁化行为以获得由这些纳米颗粒的示踪剂的空间和通常也依赖于时间的浓度。 MPI有不断增加的潜在的医学应用数量。在这些应用中的成功表现的一个先决条件是一个适当的解决方案，图像重建问题。从逆问题理论更经典方法，以及机器学习领域新的方法，必须在MPI提供高品质重建的潜力。我们研究基于深图像之前的新的重建方法，该方法建立在表示通过深神经网络的解决方案。新的方法，以及变分和迭代正则化技术，在对可公开获得的打开MPI数据集的峰值信号噪声比和结构相似性指标方面定量比较。

48. Self-supervised Neural Architecture Search
Sapir Kaplan, Raja Giryes
Abstract: Neural Architecture Search (NAS) has been used recently to achieve improved performance in various tasks and most prominently in image classification. Yet, current search strategies rely on large labeled datasets, which limits their usage in the case where only a smaller fraction of the data is annotated. Self-supervised learning has shown great promise in training neural networks using unlabeled data. In this work, we propose a self-supervised neural architecture search (SSNAS) that allows finding novel network models without the need for labeled data. We show that such a search leads to comparable results to supervised training with a "fully labeled" NAS and that it can improve the performance of self-supervised learning. Moreover, we demonstrate the advantage of the proposed approach when the number of labels in the search is relatively small.
摘要：神经结构搜索（NAS）最近已被使用，以实现各种任务的图像分类改进的性能和最突出。然而，当前的搜索策略依赖于大型数据集的标记，这限制了它们的使用在只有数据的小部分被标注的情况。自我监督学习已经显示出使用无标签数据训练神经网络的巨大潜力。在这项工作中，我们提出了一个自我监督的神经结构搜索（SSNAS），允许寻找新的网络模型，而不需要标注的数据。我们发现，这样的搜索导致类似的结果来指导训练与“全标” NAS，它可以提高自我监督学习的性能。此外，我们证明了该方法的优势，当标签在搜索的数量相对较少。

49. Self-Supervised GAN Compression
Chong Yu, Jeff Pool
Abstract: Deep learning's success has led to larger and larger models to handle more and more complex tasks; trained models can contain millions of parameters. These large models are compute- and memory-intensive, which makes it a challenge to deploy them with minimized latency, throughput, and storage requirements. Some model compression methods have been successfully applied to image classification and detection or language models, but there has been very little work compressing generative adversarial networks (GANs) performing complex tasks. In this paper, we show that a standard model compression technique, weight pruning, cannot be applied to GANs using existing methods. We then develop a self-supervised compression technique which uses the trained discriminator to supervise the training of a compressed generator. We show that this framework retains compelling performance up to high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different pruning granularities.
摘要：深学习的成功导致了越来越大的模型来处理越来越复杂的任务;训练的模型可包含数以百万计的参数。这些大型模型计算和内存密集型的，这使它成为一个挑战，他们以最小的延迟，吞吐量和存储要求部署。一些模型压缩方法已经成功地应用于图像分类和检测或语言模型，但一直很少的工作压缩生成对抗网络（甘斯）执行复杂的任务。在本文中，我们表明，一个标准模型的压缩技术，重量修剪，不能被应用到使用现有的方法甘斯。然后，我们开发出利用训练的鉴别监督压缩发生器的训练自我监督的压缩技术。我们表明，这种框架有一个令人信服的表现高度的稀疏性，可以很容易地应用到新的任务和模式，并允许不同的修剪粒度进行有意义的比较。

50. Persistent Neurons
Yimeng Min
Abstract: Most algorithms used in neural network (NN)-based learning tasks are strongly affected by the choice of initialization. Good initialization can avoid sub-optimal solutions and alleviate saturation during training. However, designing improved initialization strategies is a difficult task and our understanding of good initialization is still very primitive. Here, we propose persistent neurons, a strategy that optimizes the learning trajectory using information from previous converged solutions. More precisely, we let the parameters explore new landscapes by penalizing the model for converging to the previous solutions under the same initialization. Specifically, we show that persistent neurons, under certain data distributions, are able to converge to more optimal solutions while initializations under popular frameworks find bad local minima. We further demonstrate that persistent neurons help improve the model's performance under both good and poor initializations. Moreover, we evaluate full and partial persistent models and show that they can be used to boost the performance on a range of NN structures, such as AlexNet and residual neural networks. Saturation of activation functions during persistent training is also studied.
摘要：神经网络（NN）中使用的大多数算法为基础的倾向任务强烈影响初始化的选择。良好的初始化可避免次优解决方案，并在训练中减轻饱和。然而，设计改进初始化策略是一项艰巨的任务，我们的好初始化的了解还非常原始。在这里，我们提出了持续性的神经元，优化利用以前融合解决方案信息的学习轨迹的策略。更确切地说，我们让参数通过相同的初始化下汇聚成了以前的解决方案惩罚模型探索新的景观。具体来说，我们表明，持久性的神经元，某些数据分布下，能够汇聚到更优化的解决方案，同时流行的框架下初始化发现的坏局部极小。我们进一步证明持续的神经元有助于提高包括好的和差的初始化根据模型的性能。此外，我们评估全部和部分持久化模型，并显示它可用于提高对一系列NN的结构，如AlexNet和残留神经网络的性能。持续训练期间激活功能饱和度进行了研究。

51. Learn Faster and Forget Slower via Fast and Stable Task Adaptation
Farshid Varno, Lucas May Petry, Lisa Di Jorio, Stan Matwin
Abstract: Training Deep Neural Networks (DNNs) is still highly time-consuming and compute-intensive. It has been shown that adapting a pretrained model may significantly accelerate this process. With a focus on classification, we show that current fine-tuning techniques make the pretrained models catastrophically forget the transferred knowledge even before anything about the new task is learned. Such rapid knowledge loss undermines the merits of transfer learning and may result in a much slower convergence rate compared to when the maximum amount of knowledge is exploited. We investigate the source of this problem from different perspectives and, to alleviate it, introduce Fast And Stable Task-adaptation (FAST), an easy-to-apply fine-tuning algorithm. The paper provides a novel geometric perspective on how the loss landscapes of source and target tasks are linked in different transfer learning strategies. We empirically show that, compared to prevailing fine-tuning practices, FAST learns the target task faster and forgets the source task slower. The code is available at this https URL.
摘要：培训深层神经网络（DNNs）仍然是非常耗时和计算密集型。它已被证明适应预训练模式可能显著加快这一进程。重点是分类，我们发现目前的微调技术使预训练的机型甚至有关新任务学到了什么之前灾难性忘记转移知识。如此快速的知识流失破坏了迁移学习的优点，并可能导致更慢的收敛速度相比，当知识的最大量被利用到。我们研究从不同的角度这个问题的来源，以缓解它，引进快速和稳定的任务适应（FAST），一个简单的应用微调算法。本文提供了关于如何的源和目标任务的损失景观中不同的传输学习策略被链接的新颖几何透视。我们经验表明，与现行的微调的做法，FAST学习目标任务更快，忘了源任务慢。该代码可在此HTTPS URL。

52. Posterior Model Adaptation With Updated Priors
Jim Davis
Abstract: Classification approaches based on the direct estimation and analysis of posterior probabilities will degrade if the original class priors begin to change. We prove that a unique (up to scale) solution exists for recovering the data likelihoods for a test example from its original class posteriors and dataset priors. Given the recovered likelihoods and a set of new priors, the posteriors can be re-computed using Bayes' Rule to reflect the influence of the new priors. The method is simple to compute and allows a dynamic update of the original posteriors.
摘要：分类方法基于直接估计和后验概率的分析，如果原始类先验开始改变会降低。我们证明了一个独特的（最高比例）解决方案，可以恢复数据的可能性从原来的类后验和先验数据集的测试实例。由于回收可能性和一套新的先验，后验可以使用贝叶斯法则以反映新的先验的影响，重新计算的。该方法是简单的计算，并且允许原始后验的动态更新。
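The update described in entry 52 can be sketched in a few lines. This is a minimal illustration of the Bayes' Rule re-computation the abstract describes, not the author's code: the function name `update_posteriors` and the example numbers are made up for illustration, and the sketch assumes every original prior is strictly positive.

```python
def update_posteriors(posteriors, old_priors, new_priors):
    """Re-compute class posteriors for one test example under new priors.

    Hypothetical helper for illustration; assumes all old priors are > 0.
    """
    # Recover likelihoods up to a common scale: p(x|c) is proportional to p(c|x) / p(c).
    scaled_likelihoods = [p / q for p, q in zip(posteriors, old_priors)]
    # Apply Bayes' Rule with the new priors, then normalize so the result sums to 1.
    unnormalized = [l * r for l, r in zip(scaled_likelihoods, new_priors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# A classifier trained with balanced priors is confident in class 0,
# but the deployment priors strongly favor class 1.
updated = update_posteriors([0.9, 0.1], [0.5, 0.5], [0.1, 0.9])
print(updated)
```

Because the unknown scale of the likelihoods cancels in the normalization, only the ratio of posteriors to old priors is needed.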

53. Deep Interactive Learning: An Efficient Labeling Approach for Deep Learning-Based Osteosarcoma Treatment Response Assessment
David Joon Ho, Narasimhan P. Agaram, Peter J. Schueffler, Chad M. Vanderbilt, Marc-Henri Jean, Meera R. Hameed, Thomas J. Fuchs
Abstract: Osteosarcoma is the most common malignant primary bone tumor. Standard treatment includes pre-operative chemotherapy followed by surgical resection. The response to treatment, as measured by the ratio of necrotic tumor area to overall tumor area, is a known prognostic factor for overall survival. This assessment is currently done manually by pathologists looking at glass slides under the microscope, which may not be reproducible due to its subjective nature. Convolutional neural networks (CNNs) can be used for automated segmentation of viable and necrotic tumor on osteosarcoma whole slide images. One bottleneck for supervised learning is that large amounts of accurate annotations are required for training, which is a time-consuming and expensive process. In this paper, we describe Deep Interactive Learning (DIaL) as an efficient labeling approach for training CNNs. After an initial labeling step is done, annotators only need to correct mislabeled regions from previous segmentation predictions to improve the CNN model until satisfactory predictions are achieved. Our experiments show that our CNN model, trained with only 7 hours of annotation using DIaL, can successfully estimate ratios of necrosis within the expected inter-observer variation rate for a non-standardized manual surgical pathology task.
摘要：骨肉瘤是最常见的恶性原发性骨肿瘤。标准的治疗包括术前化疗后手术切除。如通过坏死肿瘤区整体肿瘤面积的比来测量对治疗的反应是整体存活率已知预后因素。此评估是目前人工病理学家通过观察其可能无法再生，在显微镜下的载玻片上，由于其主观性完成。卷积神经网络（细胞神经网络）可用于对骨肉瘤整个幻灯片图像可行的和坏死的肿瘤的自动分割。为监督学习的一个瓶颈是所需要的培训是一个耗时且昂贵的过程，大量准确的注解。在本文中，我们描述了深度互动学习（DIAL）是培养细胞神经网络的有效标记方法。最初的标记步骤完成后，注释者只需要从以前的分割预测正确贴错标签的地区，以改善CNN模型，直到满意的预言得以实现。我们的实验表明，我们的模型CNN仅7使用拨号可以成功地估计预期观察者间的变化率内坏死比率非标准化手册手术病理任务时间标注的训练。
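The response measure named in entry 53, the ratio of necrotic tumor area to overall tumor area, is straightforward to compute once a segmentation mask is available. The sketch below is only an illustration under stated assumptions: the label ids (`0` background, `1` viable tumor, `2` necrotic tumor), the function name `necrosis_ratio`, and the toy mask are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical label ids for a per-pixel segmentation mask.
BACKGROUND, VIABLE, NECROTIC = 0, 1, 2


def necrosis_ratio(mask):
    """Return necrotic tumor area divided by overall (viable + necrotic) tumor area."""
    counts = Counter(label for row in mask for label in row)
    tumor_pixels = counts[VIABLE] + counts[NECROTIC]
    if tumor_pixels == 0:
        raise ValueError("mask contains no tumor pixels")
    return counts[NECROTIC] / tumor_pixels


# Toy 3x4 mask: 3 viable pixels and 5 necrotic pixels.
toy_mask = [
    [0, 1, 1, 2],
    [0, 2, 2, 2],
    [0, 0, 1, 2],
]
ratio = necrosis_ratio(toy_mask)
print(ratio)  # 0.625
```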

54. DATE: Defense Against TEmperature Side-Channel Attacks in DVFS Enabled MPSoCs
Somdip Dey, Amit Kumar Singh, Xiaohang Wang, Klaus Dieter McDonald-Maier
Abstract: Given the constant rise in utilizing embedded devices in daily life, side channels remain a challenge to information flow control and security in such systems. One such important security flaw could be exploited through temperature side-channel attacks, where heat dissipation and propagation from the processing elements are observed over time in order to deduce security flaws. In our proposed methodology, DATE: Defense Against TEmperature side-channel attacks, we propose a novel approach of reducing the spatial and temporal thermal gradient, which makes the system more secure against temperature side-channel attacks and at the same time increases the reliability of the device in terms of lifespan. In this paper, we also introduce a new metric, Thermal-Security-in-Multi-Processors (TSMP), which is capable of quantifying the security against temperature side-channel attacks on computing systems. DATE is evaluated to be at most 139.24% more secure than the state-of-the-art for certain applications, while reducing thermal cycle by at most 67.42%.
摘要：鉴于不断上升的利用日常生活中的嵌入式设备，侧通道保持在这样的系统中信息流的控制和安全的挑战。一种这样的重要的安全漏洞可以通过温度侧信道攻击，其中的散热性和传播从所述处理元件被观察到随着时间的推移，以推导安全漏洞被利用。在我们提出的方法，DATE：防御低温侧信道攻击，我们提出了减少空间和时间的热梯度，这使得系统更加安全，对温度侧信道攻击的新方法，并在同一时间增加的可靠性该装置在寿命方面。在本文中，我们也推出了新度量，热 - 安全功能于多处理器（TSMP），其能够定量相对于温度边信道攻击的安全性上的计算系统，和日期是评价为139.24％更安全的在最对于某些应用比状态的最先进的，而在最由67.42％减少热循环。

55. Decoder-free Robustness Disentanglement without (Additional) Supervision
Yifei Wang, Dan Peng, Furui Liu, Zhenguo Li, Zhitang Chen, Jiansheng Yang
Abstract: Adversarial Training (AT) is proposed to alleviate the adversarial vulnerability of machine learning models by extracting only robust features from the input, which, however, inevitably leads to severe accuracy reduction as it discards the non-robust yet useful features. This motivates us to preserve both robust and non-robust features and separate them with disentangled representation learning. Our proposed Adversarial Asymmetric Training (AAT) algorithm can reliably disentangle robust and non-robust representations without additional supervision on robustness. Empirical results show that our method not only successfully preserves accuracy by combining two representations, but also achieves much better disentanglement than previous work.
摘要：对抗性训练（AT），提出通过提取来自输入，其中，但是，不可避免地导致严重的精度的下降，因为它丢弃非功能强大且实用的功能只有强大的功能，以减轻机器学习模型的对抗漏洞。这促使我们既保持稳健和非强大的功能，并将它们与解缠结表示学习分开。我们提出的对抗性训练不对称（AAT）算法能够可靠地解开强健和非强健表示，而没有对稳健性的额外监管。实证结果表明，我们的方法不仅成功地通过结合两种表示保持精度，还能达到比以往工作更好的解开。

56. Clustering of Electromagnetic Showers and Particle Interactions with Graph Neural Networks in Liquid Argon Time Projection Chambers Data
Francois Drielsma, Qing Lin, Pierre Côte de Soux, Laura Dominé, Ran Itay, Dae Heun Koh, Bradley J. Nelson, Kazuhiro Terao, Ka Vang Tsang, Tracy L. Usher
Abstract: Liquid Argon Time Projection Chambers (LArTPCs) are a class of detectors that produce high resolution images of charged particles within their sensitive volume. In these images, the clustering of distinct particles into superstructures is of central importance to the current and future neutrino physics program. Electromagnetic (EM) activity typically exhibits spatially detached fragments of varying morphology and orientation that are challenging to efficiently assemble using traditional algorithms. Similarly, particles that are spatially removed from each other in the detector may originate from a common interaction. Graph Neural Networks (GNNs) were developed in recent years to find correlations between objects embedded in an arbitrary space. GNNs are first studied with the goal of predicting the adjacency matrix of EM shower fragments and identifying the origin of showers, i.e. primary fragments. On the PILArNet public LArTPC simulation dataset, the algorithm developed in this paper achieves a shower clustering accuracy characterized by a mean adjusted Rand index (ARI) of 97.8 % and a primary identification accuracy of 99.8 %. It yields a relative shower energy resolution of $(4.1+1.4/\sqrt{E (\text{GeV})})\,\%$ and a shower direction resolution of $(2.1/\sqrt{E(\text{GeV})})^{\circ}$. The optimized GNN is then applied to the related task of clustering particle instances into interactions and yields a mean ARI of 99.2 % for an interaction density of $\sim\mathcal{O}(1)\,m^{-3}$.
摘要：液氩时间投影庭（LArTPCs）是一类其敏感体积内产生的带电粒子的高清晰度的图像检测器。在这些图像中，不同的颗粒的聚集成上层建筑是至关重要的当前和未来的中微子物理程序。电磁（EM）的活性通常表现出的是正在挑战使用传统的算法，以高效地组装不同的形态和取向的空间上分离的片段。类似地，在空间上彼此在检测器去除的颗粒可以从公共的交互发起。图神经网络（GNNS）在近几年发展找到嵌入在任意空间物体之间的相关性。 GNNS首先研究了预测EM淋浴片段的邻接矩阵的目标，并确定淋浴的起源，即，主段。在PILArNet公共LArTPC模拟数据集，在本文中开发的算法实现了淋浴聚类准确特征在于平均调整兰德指数97.8％（ARI）和99.8％的初级识别精度。它产生的$（4.1 + 1.4 / \ SQRT {E（\文本{GeV的}）}）\，\％$和$淋浴方向分辨率（2.1 / \ SQRT {E（\文本{的相对淋浴能量分辨率电子伏特}）}）^ {\保监会} $。然后，将最优化GNN被施加到聚类粒子实例插入相互作用的相关任务，并产生的99.2％的平均ARI为$ \ SIM \ mathcal {Ó}（1）的交互密度\，米^ { - 3} $。
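To make the two resolution formulas quoted in entry 56 concrete, the sketch below simply evaluates them at a few shower energies. The function names and the chosen energies are illustrative assumptions; only the coefficients (4.1, 1.4, 2.1) come from the abstract.

```python
import math


def relative_energy_resolution_percent(energy_gev):
    """Evaluate (4.1 + 1.4 / sqrt(E)) percent, with E in GeV, as quoted in the abstract."""
    return 4.1 + 1.4 / math.sqrt(energy_gev)


def direction_resolution_degrees(energy_gev):
    """Evaluate (2.1 / sqrt(E)) degrees, with E in GeV, as quoted in the abstract."""
    return 2.1 / math.sqrt(energy_gev)


# Illustrative energies only; both resolutions improve as the shower energy grows.
for energy in (0.25, 1.0, 4.0):
    print(
        energy,
        relative_energy_resolution_percent(energy),
        direction_resolution_degrees(energy),
    )
```

At 1 GeV the formulas give roughly 5.5 % and 2.1 degrees; at 4 GeV, roughly 4.8 % and 1.05 degrees.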

57. Learning-based Defect Recognition for Quasi-Periodic Microscope Images
Nik Dennler, Antonio Foncubierta-Rodriguez, Titus Neupert, Marilyne Sousa
Abstract: The detailed control of crystalline material defects is a crucial process, as they affect properties of the material that may be detrimental or beneficial for the final performance of a device. Defect analysis on the sub-nanometer scale is enabled by high-resolution transmission electron microscopy (HRTEM), where the identification of defects is currently carried out based on human expertise. However, the process is tedious, highly time-consuming and, in some cases, can yield ambiguous results. Here we propose a semi-supervised machine learning method that assists in the detection of lattice defects from atomic resolution microscope images. It involves a convolutional neural network that classifies image patches as defective or non-defective, a graph-based heuristic that chooses one non-defective patch as a model, and finally an automatically generated convolutional filter bank, which highlights symmetry breaking such as stacking faults, twin defects and grain boundaries. Additionally, a variance filter is suggested to segment amorphous regions and beam defects. The algorithm is tested on III-V/Si crystalline materials and successfully evaluated against different metrics, showing promising results even for extremely small data sets. By combining the data-driven classification generality, robustness and speed of deep learning with the effectiveness of image filters in segmenting faulty symmetry arrangements, we provide a valuable open-source tool to the microscopist community that can streamline future HRTEM analyses of crystalline materials.
摘要：结晶材料缺陷的详细的控制是至关重要的过程，因为它们影响到可能有损或用于装置的最终性能是有益的材料的性质。在亚纳米尺度缺陷分析是通过高分辨率允许发送电子显微镜（HRTEM），其中的缺陷的识别基于人类专门知识目前进行。然而，该过程是乏味，非常费时，并且在一些情况下，可以产生不明确的结果。在这里，我们提出了一种半监督的机器学习方法，在检测到从原子分辨率显微镜图像晶格缺陷的协助。它涉及到一个卷积神经网络进行分类图像块为缺陷或无缺陷，基于图的试探法中选择一个无缺陷的补丁作为模型，最后自动生成的卷积滤波器组，其亮点对称破坏诸如堆垛层错，双缺陷和晶界。此外，方差滤波器被建议段非晶区和光束的缺陷。该算法是在III-V族/ Si的晶体材料测试，并成功评价针对不同的指标，示出了即使是非常小的数据集有前途的结果。通过深度学习的数据驱动分类的通用性，鲁棒性和速度与图像过滤器的故障分割对称安排的有效性结合起来，我们提供了一个有价值的开源工具，显微镜技术的社区，可以简化晶体材料的未来高分辨透射电子显微镜分析。

Note: the Chinese text is machine-translated. The cover image is a word cloud of the paper titles.

