Contents
6. Joint and Progressive Subspace Analysis (JPSA) with Spatial-Spectral Manifold Alignment for Semi-Supervised Hyperspectral Dimensionality Reduction [PDF] Abstract
7. TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval [PDF] Abstract
14. Automatic Target Recognition (ATR) from SAR Imaginary by Using Machine Learning Techniques [PDF] Abstract
16. Applying a random projection algorithm to optimize machine learning model for breast lesion classification [PDF] Abstract
21. CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions [PDF] Abstract
23. Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation [PDF] Abstract
26. A Deep Neural Network Tool for Automatic Segmentation of Human Body Parts in Natural Scenes [PDF] Abstract
28. Efficient Computation of Higher Order 2D Image Moments using the Discrete Radon Transform [PDF] Abstract
30. Decontextualized learning for interpretable hierarchical representations of visual patterns [PDF] Abstract
32. DeepActsNet: Spatial and Motion features from Face, Hands, and Body Combined with Convolutional and Graph Networks for Improved Action Recognition [PDF] Abstract
33. Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval [PDF] Abstract
43. Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness [PDF] Abstract
46. SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning [PDF] Abstract
53. Unsupervised Domain Adaptation for Person Re-Identification through Source-Guided Pseudo-Labeling [PDF] Abstract
55. MARS: Mixed Virtual and Real Wearable Sensors for Human Activity Recognition with Multi-Domain Deep Learning Model [PDF] Abstract
57. Transform Domain Pyramidal Dilated Convolution Networks For Restoration of Under Display Camera Images [PDF] Abstract
58. Real-time Lane detection and Motion Planning in Raspberry Pi and Arduino for an Autonomous Vehicle Prototype [PDF] Abstract
59. Deriving Visual Semantics from Spatial Context: An Adaptation of LSA and Word2Vec to generate Object and Scene Embeddings from Images [PDF] Abstract
63. Factorized Deep Generative Models for Trajectory Generation with Spatiotemporal-Validity Constraints [PDF] Abstract
65. Features based Mammogram Image Classification using Weighted Feature Support Vector Machine [PDF] Abstract
66. Adversarial Consistent Learning on Partial Domain Adaptation of PlantCLEF 2020 Challenge [PDF] Abstract
69. City-Scale Visual Place Recognition with Deep Local Features Based on Multi-Scale Ordered VLAD Pooling [PDF] Abstract
70. EfficientDeRain: Learning Pixel-wise Dilation Filtering for High-Efficiency Single-Image Deraining [PDF] Abstract
71. AAA: Adaptive Aggregation of Arbitrary Online Trackers with Theoretical Performance Guarantee [PDF] Abstract
72. Combining Shape Features with Multiple Color Spaces in Open-Ended 3D Object Recognition [PDF] Abstract
78. Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification [PDF] Abstract
86. Psoriasis Severity Assessment with a Similarity-Clustering Machine Learning Approach Reduces Intra- and Inter-observation variation [PDF] Abstract
92. Impact of lung segmentation on the diagnosis and explanation of COVID-19 in chest X-ray images [PDF] Abstract
94. Improving Automated COVID-19 Grading with Convolutional Neural Networks in Computed Tomography Scans: An Ablation Study [PDF] Abstract
95. When Healthcare Meets Off-the-Shelf WiFi: A Non-Wearable and Low-Costs Approach for In-Home Monitoring [PDF] Abstract
103. Reducing false-positive biopsies with deep neural networks that utilize local and global information in screening mammograms [PDF] Abstract
108. Kernel Ridge Regression Using Importance Sampling with Application to Seismic Response Prediction [PDF] Abstract
Abstracts
1. Improving Person Re-identification with Iterative Impression Aggregation [PDF] Back to contents
Dengpan Fu, Bo Xin, Jingdong Wang, Dongdong Chen, Jianmin Bao, Gang Hua, Houqiang Li
Abstract: Our impression about one person often updates after we see more aspects of him/her and this process keeps iterating given more meetings. We formulate such an intuition into the problem of person re-identification (re-ID), where the representation of a query (probe) image is iteratively updated with new information from the candidates in the gallery. Specifically, we propose a simple attentional aggregation formulation to instantiate this idea and showcase that such a pipeline achieves competitive performance on standard benchmarks including CUHK03, Market-1501 and DukeMTMC. Not only does such a simple method improve the performance of the baseline models, it also achieves comparable performance with latest advanced re-ranking methods. Another advantage of this proposal is its flexibility to incorporate different representations and similarity metrics. By utilizing stronger representations and metrics, we further demonstrate state-of-the-art person re-ID performance, which also validates the general applicability of the proposed method.
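A minimal sketch of the attentional aggregation idea, with the probe feature iteratively refreshed by a softmax-weighted sum of gallery features; the update rule, step count, and temperature below are illustrative assumptions, not the authors' exact formulation:

    import numpy as np

    def iterative_impression_update(query, gallery, steps=3, temperature=0.1):
        # query: (d,) probe feature; gallery: (n, d) candidate features.
        q = query / np.linalg.norm(query)
        g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
        for _ in range(steps):
            w = np.exp(g @ q / temperature)  # attention over gallery candidates
            w /= w.sum()
            q = q + w @ g                    # fold new "impressions" into the probe
            q /= np.linalg.norm(q)           # keep the representation on the unit sphere
        return q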
2. Exploring Intensity Invariance in Deep Neural Networks for Brain Image Registration [PDF] Back to contents
Hassan Mahmood, Asim Iqbal, Syed Mohammed Shamsul Islam
Abstract: Image registration is a widely-used technique in analysing large scale datasets that are captured through various imaging modalities and techniques in biomedical imaging such as MRI, X-Rays, etc. These datasets are typically collected from various sites and under different imaging protocols using a variety of scanners. Such heterogeneity in the data collection process causes inhomogeneity or variation in intensity (brightness) and noise distribution. These variations play a detrimental role in the performance of image registration, segmentation and detection algorithms. Classical image registration methods are computationally expensive but are able to handle these artifacts relatively better. However, deep learning-based techniques are shown to be computationally efficient for automated brain registration but are sensitive to the intensity variations. In this study, we investigate the effect of variation in intensity distribution among input image pairs for deep learning-based image registration methods. We find a performance degradation of these models when brain image pairs with different intensity distribution are presented even with similar structures. To overcome this limitation, we incorporate a structural similarity-based loss function in a deep neural network and test its performance on the validation split separated before training as well as on a completely unseen new dataset. We report that the deep learning models trained with structure similarity-based loss seems to perform better for both datasets. This investigation highlights a possible performance limiting factor in deep learning-based registration models and suggests a potential solution to incorporate the intensity distribution variation in the input image pairs. Our code and models are available at this https URL.
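The structural similarity-based loss is the key addition here; a minimal global-SSIM sketch in PyTorch follows, assuming intensity-normalized inputs (the paper may use a windowed SSIM, and the stability constants below are the conventional defaults):

    import torch

    def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
        # x, y: intensity-normalized images in [0, 1]; returns 1 - SSIM so that
        # minimizing the loss maximizes structural similarity between the
        # warped moving image and the fixed image.
        mu_x, mu_y = x.mean(), y.mean()
        var_x, var_y = x.var(), y.var()
        cov = ((x - mu_x) * (y - mu_y)).mean()
        ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
        return 1 - ssim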
3. Regularizing Attention Networks for Anomaly Detection in Visual Question Answering [PDF] Back to contents
Doyup Lee, Yeongjae Cheon, Wook-Shin Han
Abstract: For stability and reliability of real-world applications, the robustness of DNNs in unimodal tasks has been evaluated. However, few studies consider abnormal situations that a visual question answering (VQA) model might encounter at test time after deployment in the real-world. In this study, we evaluate the robustness of state-of-the-art VQA models to five different anomalies, including worst-case scenarios, the most frequent scenarios, and the current limitation of VQA models. Different from the results in unimodal tasks, the maximum confidence of answers in VQA models cannot detect anomalous inputs, and post-training of the outputs, such as outlier exposure, is ineffective for VQA models. Thus, we propose an attention-based method, which uses confidence of reasoning between input images and questions and shows much more promising results than the previous methods in unimodal tasks. In addition, we show that a maximum entropy regularization of attention networks can significantly improve the attention-based anomaly detection of the VQA models. Thanks to the simplicity, attention-based anomaly detection and the regularization are model-agnostic methods, which can be used for various cross-modal attentions in the state-of-the-art VQA models. The results imply that cross-modal attention in VQA is important to improve not only VQA accuracy, but also the robustness to various anomalies.
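A sketch of a maximum-entropy regularizer on attention weights, the mechanism the abstract credits for improving attention-based anomaly detection; the sign convention and how the term is weighted into the task loss are assumptions:

    import torch

    def max_entropy_attention_reg(attn, eps=1e-8):
        # attn: (batch, n) attention distributions, rows summing to 1.
        # Negative mean entropy: adding lambda * this term to the task loss
        # pushes attention toward higher entropy (flatter distributions).
        entropy = -(attn * (attn + eps).log()).sum(dim=-1)
        return -entropy.mean()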
4. Visual-Semantic Embedding Model Informed by Structured Knowledge [PDF] Back to contents
Mirantha Jayathilaka, Tingting Mu, Uli Sattler
Abstract: We propose a novel approach to improve a visual-semantic embedding model by incorporating concept representations captured from an external structured knowledge base. We investigate its performance on image classification under both standard and zero-shot settings. We propose two novel evaluation frameworks to analyse classification errors with respect to the class hierarchy indicated by the knowledge base. The approach is tested using the ILSVRC 2012 image dataset and a WordNet knowledge base. With respect to both standard and zero-shot image classification, our approach shows superior performance compared with the original approach, which uses word embeddings.
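At inference, a visual-semantic embedding model of this kind scores an image feature against class embeddings; the sketch below assumes the concept representations have already been derived from the knowledge base and projected into the shared space:

    import numpy as np

    def nearest_concept(image_feat, concept_embeds):
        # image_feat: (d,) projected visual feature; concept_embeds: (k, d)
        # class representations built from the structured knowledge base.
        sims = concept_embeds @ image_feat
        sims /= np.linalg.norm(concept_embeds, axis=1) * np.linalg.norm(image_feat)
        return int(np.argmax(sims))  # index of the best-matching class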
5. Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild [PDF] Back to contents
Akash Sengupta, Ignas Budvytis, Roberto Cipolla
Abstract: This paper addresses the problem of monocular 3D human shape and pose estimation from an RGB image. Despite great progress in this field in terms of pose prediction accuracy, state-of-the-art methods often predict inaccurate body shapes. We suggest that this is primarily due to the scarcity of in-the-wild training data with diverse and accurate body shape labels. Thus, we propose STRAPS (Synthetic Training for Real Accurate Pose and Shape), a system that utilises proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network, which is trained with synthetic training data (generated on-the-fly during training using the SMPL statistical body model) to overcome data scarcity. We bridge the gap between synthetic training inputs and noisy real inputs, which are predicted by keypoint detection and segmentation CNNs at test-time, by using data augmentation and corruption during training. In order to evaluate our approach, we curate and provide a challenging evaluation dataset for monocular human shape estimation, Sports Shape and Pose 3D (SSP-3D). It consists of RGB images of tightly-clothed sports-persons with a variety of body shapes and corresponding pseudo-ground-truth SMPL shape and pose parameters, obtained via multi-frame optimisation. We show that STRAPS outperforms other state-of-the-art methods on SSP-3D in terms of shape prediction accuracy, while remaining competitive with the state-of-the-art on pose-centric datasets and metrics.
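The proxy-representation idea (feeding silhouettes and 2D joints instead of RGB) can be sketched as stacking a binary mask with per-joint Gaussian heatmaps; the heatmap width and channel layout are assumptions:

    import numpy as np

    def proxy_representation(silhouette, joints_2d, sigma=4.0):
        # silhouette: (H, W) binary mask; joints_2d: (J, 2) pixel (x, y) coords.
        # Returns a (1 + J, H, W) stack used as network input instead of RGB.
        h, w = silhouette.shape
        ys, xs = np.mgrid[0:h, 0:w]
        maps = [silhouette.astype(np.float32)]
        for x, y in joints_2d:
            maps.append(np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)))
        return np.stack(maps)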
6. Joint and Progressive Subspace Analysis (JPSA) with Spatial-Spectral Manifold Alignment for Semi-Supervised Hyperspectral Dimensionality Reduction [PDF] Back to contents
Danfeng Hong, Naoto Yokoya, Jocelyn Chanussot, Jian Xu, Xiao Xiang Zhu
Abstract: Conventional nonlinear subspace learning techniques (e.g., manifold learning) usually introduce some drawbacks in explainability (explicit mapping) and cost-effectiveness (linearization), generalization capability (out-of-sample), and representability (spatial-spectral discrimination). To overcome these shortcomings, a novel linearized subspace analysis technique with spatial-spectral manifold alignment is developed for a semi-supervised hyperspectral dimensionality reduction (HDR), called joint and progressive subspace analysis (JPSA). The JPSA learns a high-level, semantically meaningful, joint spatial-spectral feature representation from hyperspectral data by 1) jointly learning latent subspaces and a linear classifier to find an effective projection direction favorable for classification; 2) progressively searching several intermediate states of subspaces to approach an optimal mapping from the original space to a potential more discriminative subspace; 3) spatially and spectrally aligning manifold structure in each learned latent subspace in order to preserve the same or similar topological property between the compressed data and the original data. A simple but effective classifier, i.e., nearest neighbor (NN), is explored as a potential application for validating the algorithm performance of different HDR approaches. Extensive experiments are conducted to demonstrate the superiority and effectiveness of the proposed JPSA on two widely-used hyperspectral datasets: Indian Pines (92.98%) and the University of Houston (86.09%) in comparison with previous state-of-the-art HDR methods. The demo of this basic work (i.e., ECCV2018) is openly available at this https URL.
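The abstract validates the learned subspace with a nearest-neighbor classifier; a sketch of that evaluation protocol follows, with the JPSA-learned projection matrix P treated as a given input rather than re-derived:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def evaluate_subspace(P, X_train, y_train, X_test, y_test):
        # P: (d_sub, d) learned linear projection; X_*: (n, d) spectral features.
        clf = KNeighborsClassifier(n_neighbors=1)
        clf.fit(X_train @ P.T, y_train)
        return clf.score(X_test @ P.T, y_test)  # overall accuracy in the subspace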
7. TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval [PDF] Back to contents
George Awad, Asad A. Butt, Keith Curtis, Yooyoung Lee, Jonathan Fiscus, Afzal Godil, Andrew Delgado, Jesse Zhang, Eliot Godard, Lukas Diduch, Alan F. Smeaton, Yvette Graham, Wessel Kraaij, Georges Quenot
Abstract: The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last nineteen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2019 represented a continuation of four tasks from TRECVID 2018. In total, 27 teams from various research organizations worldwide completed one or more of the following four tasks: 1. Ad-hoc Video Search (AVS) 2. Instance Search (INS) 3. Activities in Extended Video (ActEV) 4. Video to Text Description (VTT) This paper is an introduction to the evaluation framework, tasks, data, and measures used in the workshop.
8. DR2S: Deep Regression with Region Selection for Camera Quality Evaluation [PDF] Back to contents
Marcelin Tworski, Stéphane Lathuilière, Salim Belkarfa, Attilio Fiandrotti, Marco Cagnazzo
Abstract: In this work, we tackle the problem of estimating a camera capability to preserve fine texture details at a given lighting condition. Importantly, our texture preservation measurement should coincide with human perception. Consequently, we formulate our problem as a regression one and we introduce a deep convolutional network to estimate texture quality score. At training time, we use ground-truth quality scores provided by expert human annotators in order to obtain a subjective quality measure. In addition, we propose a region selection method to identify the image regions that are better suited at measuring perceptual quality. Finally, our experimental evaluation shows that our learning-based approach outperforms existing methods and that our region selection algorithm consistently improves the quality estimation.
9. Depth-Adapted CNN for RGB-D cameras [PDF] Back to contents
Zongwei Wu, Guillaume Allibert, Christophe Stolz, Cedric Demonceaux
Abstract: Conventional 2D Convolutional Neural Networks (CNN) extract features from an input image by applying linear filters. These filters compute the spatial coherence by weighting the photometric information on a fixed neighborhood without taking into account the geometric information. We tackle the problem of improving the classical RGB CNN methods by using the depth information provided by the RGB-D cameras. State-of-the-art approaches use depth as an additional channel or image (HHA) or pass from 2D CNN to 3D CNN. This paper proposes a novel and generic procedure to articulate both photometric and geometric information in CNN architecture. The depth data is represented as a 2D offset to adapt spatial sampling locations. The new model presented is invariant to scale and rotation around the X and the Y axis of the camera coordinate system. Moreover, when depth data is constant, our model is equivalent to a regular CNN. Experiments on benchmarks validate the effectiveness of our model.
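The core mechanism, depth expressed as a 2D offset that shifts the sampling grid, resembles deformable convolution; a PyTorch sketch follows, with the paper-specific depth-to-offset derivation left as a given input:

    import torch
    import torch.nn.functional as F

    def depth_adapted_sample(feat, offsets):
        # feat: (B, C, H, W) feature map; offsets: (B, H, W, 2) grid shifts in
        # normalized [-1, 1] coordinates, assumed to be computed from depth.
        B, _, H, W = feat.shape
        gy, gx = torch.meshgrid(torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing="ij")
        base = torch.stack((gx, gy), dim=-1)                 # (H, W, 2) as (x, y)
        grid = base.unsqueeze(0).expand(B, -1, -1, -1) + offsets
        return F.grid_sample(feat, grid, align_corners=True)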
10. Line Flow based SLAM [PDF] Back to contents
Qiuyuan Wang, Zike Yan, Junqiu Wang, Fei Xue, Wei Ma, Hongbin Zha
Abstract: We propose a method of visual SLAM by predicting and updating line flows that represent sequential 2D projections of 3D line segments. While indirect SLAM methods using points and line segments have achieved excellent results, they still face problems in challenging scenarios such as occlusions, image blur, and repetitive textures. To deal with these problems, we leverage line flows which encode the coherence of 2D and 3D line segments in spatial and temporal domains as the sequence of all the 2D line segments corresponding to a specific 3D line segment. Thanks to the line flow representation, the corresponding 2D line segment in a new frame can be predicted based on 2D and 3D line segment motions. We create, update, merge, and discard line flows on-the-fly. We model our Line Flow-based SLAM (LF-SLAM) using a Bayesian network. We perform short-term optimization in front-end, and long-term optimization in back-end. The constraints introduced in line flows improve the performance of our LF-SLAM. Extensive experimental results demonstrate that our method achieves better performance than state-of-the-art direct and indirect SLAM approaches. Specifically, it obtains good localization and mapping results in challenging scenes with occlusions, image blur, and repetitive textures.
11. Towards Fast, Accurate and Stable 3D Dense Face Alignment [PDF] Back to contents
Jianzhu Guo, Xiangyu Zhu, Yang Yang, Fan Yang, Zhen Lei, Stan Z. Li
Abstract: Existing methods of 3D dense face alignment mainly concentrate on accuracy, thus limiting the scope of their practical applications. In this paper, we propose a novel regression framework which makes a balance among speed, accuracy and stability. Firstly, on the basis of a lightweight backbone, we propose a meta-joint optimization strategy to dynamically regress a small set of 3DMM parameters, which greatly enhances speed and accuracy simultaneously. To further improve the stability on videos, we present a virtual synthesis method to transform one still image to a short-video which incorporates in-plane and out-of-plane face moving. On the premise of high accuracy and stability, our model runs at over 50fps on a single CPU core and outperforms other state-of-the-art heavy models simultaneously. Experiments on several challenging datasets validate the efficiency of our method. Pre-trained models and code are available at this https URL.
12. PP-OCR: A Practical Ultra Lightweight OCR System [PDF] Back to contents
Yuning Du, Chenxia Li, Ruoyu Guo, Xiaoting Yin, Weiwei Liu, Jun Zhou, Yifan Bai, Zilin Yu, Yehua Yang, Qingqing Dang, Haoshuang Wang
Abstract: Optical Character Recognition (OCR) systems have been widely used in a variety of application scenarios, such as office automation (OA) systems, factory automation, online education, map production, etc. However, OCR is still a challenging task due to the variety of text appearances and the demand for computational efficiency. In this paper, we propose a practical ultra-lightweight OCR system, i.e., PP-OCR. The overall model size of PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols, respectively. We introduce a bag of strategies to either enhance the model ability or reduce the model size. The corresponding ablation experiments with real data are also provided. Meanwhile, several pre-trained models for Chinese and English recognition are released, including a text detector (97K images are used), a direction classifier (600K images are used) as well as a text recognizer (17.9M images are used). Besides, the proposed PP-OCR is also verified in several other language recognition tasks, including French, Korean, Japanese and German. All of the above-mentioned models are open-sourced and the code is available in the GitHub repository, i.e., this https URL.
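Since the system is open-sourced, a hedged usage sketch via the accompanying PaddleOCR Python package follows; the call signatures and return layout vary across package versions, so treat them as assumptions:

    # pip install paddleocr paddlepaddle
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang="en")  # detector + direction classifier + recognizer
    result = ocr.ocr("sample.jpg", cls=True)
    for box, (text, confidence) in result:          # layout of `result` is version-dependent
        print(text, confidence)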
13. CNNPruner: Pruning Convolutional Neural Networks with Visual Analytics [PDF] Back to contents
Guan Li, Junpeng Wang, Han-Wei Shen, Kaixin Chen, Guihua Shan, Zhonghua Lu
Abstract: Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many computer vision tasks. The increasing size of CNN models, however, prevents them from being widely deployed to devices with limited computational resources, e.g., mobile/embedded devices. The emerging topic of model pruning strives to address this problem by removing less important neurons and fine-tuning the pruned networks to minimize the accuracy loss. Nevertheless, existing automated pruning solutions often rely on a numerical threshold of the pruning criteria, lacking the flexibility to optimally balance the trade-off between model size and accuracy. Moreover, the complicated interplay between the stages of neuron pruning and model fine-tuning makes this process opaque, and therefore becomes difficult to optimize. In this paper, we address these challenges through a visual analytics approach, named CNNPruner. It considers the importance of convolutional filters through both instability and sensitivity, and allows users to interactively create pruning plans according to a desired goal on model size or accuracy. Also, CNNPruner integrates state-of-the-art filter visualization techniques to help users understand the roles that different filters played and refine their pruning plans. Through comprehensive case studies on CNNs with real-world sizes, we validate the effectiveness of CNNPruner.
14. Automatic Target Recognition (ATR) from SAR Imaginary by Using Machine Learning Techniques [PDF] Back to contents
Umut Özkaya
Abstract: Automatic Target Recognition (ATR) in Synthetic Aperture Radar (SAR) images is a very challenging problem owing to the high level of noise they contain. In this study, a machine learning-based method is proposed to detect different moving and stationary targets using SAR images. First Order Statistical (FOS) features were obtained from the Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) of gray-level SAR images. Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM) and Gray Level Size Zone Matrix (GLSZM) algorithms are also used. These features are provided as input for the training and testing stages of a Support Vector Machine (SVM) model with Gaussian kernels. 4-fold cross-validation was used for performance evaluation. The obtained results showed that the GLCM + SVM algorithm is the best model, with 95.26% accuracy. The proposed method shows that moving and stationary targets in the MSTAR database can be recognized with high performance.
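A sketch of the best-performing GLCM + SVM pipeline using scikit-image and scikit-learn; the GLCM distances and angles, the property set, and the SVM hyperparameters are illustrative choices, not the paper's exact settings:

    import numpy as np
    from skimage.feature import graycomatrix, graycoprops
    from sklearn.svm import SVC

    def glcm_features(chip):
        # chip: 2D uint8 gray-level SAR image chip.
        glcm = graycomatrix(chip, distances=[1], angles=[0, np.pi / 2],
                            levels=256, symmetric=True, normed=True)
        props = ("contrast", "homogeneity", "energy", "correlation")
        return np.hstack([graycoprops(glcm, p).ravel() for p in props])

    clf = SVC(kernel="rbf")  # Gaussian-kernel SVM, as in the paper
    # X = np.stack([glcm_features(c) for c in chips]); clf.fit(X, labels)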
15. Is each layer non-trivial in CNN? [PDF] Back to contents
Wei Wang, Zhuoxu Cui, Dong Liang
Abstract: Many convolutional neural network (CNN) models have achieved great success in many fields. The networks get deeper and deeper. However, is each layer non-trivial in a network? To answer this question, we propose to replace the convolution kernels with zeros. We compare these results with the baseline and show that we can reach similar or even the same performance. Although convolution kernels are the cores of networks, we demonstrate that some of them are trivial and that these layers are regular.
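The probe itself is easy to reproduce: zero one layer's convolution kernels and re-evaluate. A PyTorch sketch with an assumed ResNet-18 backbone and an arbitrary layer choice:

    import torch
    from torchvision.models import resnet18

    model = resnet18(weights="IMAGENET1K_V1").eval()  # weights API per recent torchvision
    with torch.no_grad():
        model.layer3[0].conv1.weight.zero_()  # replace this layer's kernels with zeros
    # Re-run the standard evaluation on `model` and compare accuracy with the
    # unmodified baseline to judge whether the layer was non-trivial.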
16. Applying a random projection algorithm to optimize machine learning model for breast lesion classification [PDF] Back to contents
Morteza Heidari, Sivaramakrishnan Lakshmivarahan, Seyedehnafiseh Mirniaharikandehei, Gopichandh Danala, Sai Kiran R. Maryada, Hong Liu, Bin Zheng
Abstract: Machine learning is widely used in developing computer-aided diagnosis (CAD) schemes of medical images. However, CAD usually computes a large number of image features from the targeted regions, which creates the challenge of how to identify a small, optimal feature vector to build robust machine learning models. In this study, we investigate the feasibility of applying a random projection algorithm to build an optimal feature vector from the initially CAD-generated large feature pool and improve the performance of the machine learning model. We assemble a retrospective dataset involving 1,487 cases of mammograms in which 644 cases have confirmed malignant mass lesions and 843 have benign lesions. A CAD scheme is first applied to segment mass regions and initially compute 181 features. Then, support vector machine (SVM) models embedded with several feature dimensionality reduction methods are built to predict the likelihood of lesions being malignant. All SVM models are trained and tested using a leave-one-case-out cross-validation method. SVM generates a likelihood score of each segmented mass region depicted on a one-view mammogram. By fusing the two scores of the same mass depicted on two-view mammograms, a case-based likelihood score is also evaluated. Compared with the principal component analysis, nonnegative matrix factorization, and Chi-squared methods, SVM embedded with the random projection algorithm yielded a significantly higher case-based lesion classification performance, with an area under the ROC curve of 0.84±0.01 (p<0.02). The study demonstrates that the random projection algorithm is a promising method to generate optimal feature vectors that help improve the performance of machine learning models for medical images.
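A sketch of the winning combination, random projection of the large CAD feature pool followed by a Gaussian-kernel SVM, using scikit-learn; the target dimensionality and SVM settings are assumptions:

    from sklearn.pipeline import make_pipeline
    from sklearn.random_projection import GaussianRandomProjection
    from sklearn.svm import SVC

    pipe = make_pipeline(
        GaussianRandomProjection(n_components=30, random_state=0),
        SVC(kernel="rbf", probability=True),
    )
    # pipe.fit(X_train, y_train)
    # likelihood = pipe.predict_proba(X_test)[:, 1]  # per-region malignancy score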
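The pipeline above maps naturally onto standard tooling. The sketch below pairs a Gaussian random projection with an SVM under leave-one-out cross-validation on synthetic stand-ins for the 181 CAD features; the projection dimension (20) and RBF kernel are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch: random projection + SVM with leave-one-(case-)out CV on
# synthetic data; this is not the authors' CAD code.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 181))      # 181 CAD features per lesion (synthetic)
y = rng.integers(0, 2, size=200)     # 1 = malignant, 0 = benign (synthetic)

model = make_pipeline(
    GaussianRandomProjection(n_components=20, random_state=0),  # assumed size
    SVC(kernel="rbf", probability=True),
)
scores = cross_val_predict(model, X, y, cv=LeaveOneOut(),
                           method="predict_proba")[:, 1]
print("AUC:", roc_auc_score(y, scores))   # ~0.5 on random synthetic labels
```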
17. Prune Responsibly [PDF] 返回目录
Michela Paganini
Abstract: Irrespective of the specific definition of fairness in a machine learning application, pruning the underlying model affects it. We investigate and document the emergence and exacerbation of undesirable per-class performance imbalances, across tasks and architectures, for almost one million categories considered across over 100K image classification models that undergo a pruning process. We demonstrate the need for transparent reporting, inclusive of bias, fairness, and inclusion metrics, in real-life engineering decision-making around neural network pruning. In response to the calls for quantitative evaluation of AI models to be population-aware, we present neural network pruning as a tangible application domain where the ways in which accuracy-efficiency trade-offs disproportionately affect underrepresented or outlier groups have historically been overlooked. We provide a simple, Pareto-based framework to insert fairness considerations into value-based operating point selection processes, and to re-evaluate pruning technique choices.
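A minimal sketch of the Pareto-based operating point selection idea: each pruning operating point is scored on sparsity, overall accuracy, and worst-class accuracy (one possible fairness proxy), and dominated points are filtered out. The candidate numbers are invented for illustration; the paper's actual metrics and framework may differ.

```python
# Hedged sketch of Pareto-front filtering over pruning operating points.
from dataclasses import dataclass

@dataclass
class OperatingPoint:
    sparsity: float      # fraction of weights pruned (higher = smaller model)
    accuracy: float      # overall top-1 accuracy
    worst_class: float   # accuracy of the worst-performing class (fairness proxy)

def dominates(a, b):
    """a dominates b if it is at least as good on every objective and strictly better on one."""
    ge = (a.sparsity >= b.sparsity and a.accuracy >= b.accuracy
          and a.worst_class >= b.worst_class)
    gt = (a.sparsity > b.sparsity or a.accuracy > b.accuracy
          or a.worst_class > b.worst_class)
    return ge and gt

points = [OperatingPoint(0.0, 0.76, 0.41), OperatingPoint(0.5, 0.75, 0.40),
          OperatingPoint(0.5, 0.74, 0.38),     # dominated by the point above
          OperatingPoint(0.9, 0.74, 0.22), OperatingPoint(0.95, 0.70, 0.05)]
front = [p for p in points if not any(dominates(q, p) for q in points)]
print(front)    # the 0.5/0.74/0.38 point is filtered out
```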
18. Monocular Depth Estimation Using Multi Scale Neural Network And Feature Fusion [PDF] 返回目录
Abhinav Sagar
Abstract: Depth estimation from monocular images is a challenging problem in computer vision. In this paper, we tackle this problem using a novel network architecture based on multi-scale feature fusion. Our network uses two different blocks: the first uses different filter sizes for convolution and merges all the individual feature maps; the second uses dilated convolutions in place of fully connected layers, thus reducing computations and increasing the receptive field. We present a new loss function for training the network that combines a depth regression term, an SSIM loss term, and a multinomial logistic loss term. We train and test our network on the Make 3D, NYU Depth V2, and KITTI datasets using standard evaluation metrics for depth estimation, comprising RMSE loss and SILog loss. Our network outperforms previous state-of-the-art methods with fewer parameters.
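To make the two blocks concrete, here is a minimal PyTorch sketch of a multi-scale convolution block that merges feature maps from several filter sizes, followed by a dilated-convolution block standing in for fully connected layers; the channel counts, kernel sizes, and dilation rates are guesses for illustration, not the paper's configuration.

```python
# Hedged sketch of the two blocks described in the abstract (toy sizes).
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel convolutions with different filter sizes, merged by concatenation."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, k, padding=k // 2) for k in (1, 3, 5, 7)
        )
    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

class DilatedBlock(nn.Module):
    """Dilated convolutions in place of fully connected layers."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(c_in, c_out, 3, padding=4, dilation=4),
        )
    def forward(self, x):
        return self.convs(x)

x = torch.randn(1, 3, 64, 64)
feat = MultiScaleBlock(3, 8)(x)      # -> (1, 32, 64, 64)
depth = DilatedBlock(32, 1)(feat)    # -> (1, 1, 64, 64) coarse depth map
print(depth.shape)
```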
19. PESAO: Psychophysical Experimental Setup for Active Observers [PDF] 返回目录
Markus D. Solbach, John K. Tsotsos
Abstract: Most past and present research in computer vision involves passively observed data. Humans, however, are active observers outside the lab; they explore, search, select what and how to look. Nonetheless, how exactly active observation occurs in humans so that it can inform the design of active computer vision systems is an open problem. PESAO is designed for investigating active, visual observation in a 3D world. The goal was to build an experimental setup for various active perception tasks with human subjects (active observers) in mind that is capable of tracking the head and gaze. While many studies explore human performances, usually, they use line drawings portrayed in 2D, and no active observer is involved. PESAO allows us to bring many studies to the three-dimensional world, even involving active observers. In our instantiation, it spans an area of 400cm x 300cm and can track active observers at a frequency of 120Hz. Furthermore, PESAO provides tracking and recording of 6D head motion, gaze, eye movement-type, first-person video, head-mounted IMU sensor, birds-eye video, and experimenter notes. All are synchronized at microsecond resolution.
20. Supervised Learning with Projected Entangled Pair States [PDF] 返回目录
Song Cheng, Lei Wang, Pan Zhang
Abstract: Tensor networks, a class of models originating in quantum physics, have in recent years been gradually generalized into efficient machine learning models. However, in order to achieve exact contraction, only tree-like tensor networks such as matrix product states and tree tensor networks have been considered, even for modeling two-dimensional data such as images. In this work, we construct supervised learning models for images using projected entangled pair states (PEPS), a two-dimensional tensor network with a structural prior similar to that of natural images. Our approach first performs a feature map, which transforms the image data to a product state on a grid, then contracts the product state with a PEPS with trainable parameters to predict image labels. The tensor elements of the PEPS are trained by minimizing the differences between training labels and predicted labels. The proposed model is evaluated on image classification using the MNIST and Fashion-MNIST datasets. We show that our model is significantly superior to existing models using tree-like tensor networks. Moreover, using the same input features, our method performs as well as the multilayer perceptron classifier, but with far fewer parameters, and is more stable. Our results shed light on potential applications of two-dimensional tensor network models in machine learning.
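The first step of the approach, the local feature map that lifts each pixel into a product state, is easy to show directly; the cos/sin map below is the common choice in tensor-network classifiers and may differ from the authors' exact map.

```python
# Hedged sketch of the pixel-wise feature map producing a grid product state.
import numpy as np

def feature_map(img):
    """img: (H, W) array in [0, 1] -> (H, W, 2) local product-state vectors."""
    return np.stack([np.cos(np.pi * img / 2), np.sin(np.pi * img / 2)], axis=-1)

img = np.random.rand(28, 28)                    # toy MNIST-sized image
phi = feature_map(img)
print(phi.shape)                                # (28, 28, 2)
print(np.allclose((phi ** 2).sum(axis=-1), 1))  # each local vector is normalized
```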
21. CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions [PDF] 返回目录
Vincenzo Lomonaco, Lorenzo Pellegrini, Pau Rodriguez, Massimo Caccia, Qi She, Yu Chen, Quentin Jodelet, Ruiping Wang, Zheda Mai, David Vazquez, German I. Parisi, Nikhil Churamani, Marc Pickett, Issam Laradji, Davide Maltoni
Abstract: In the last few years, we have witnessed a renewed and fast-growing interest in continual learning with deep neural networks, with the shared objective of making current AI systems more adaptive, efficient and autonomous. However, despite the significant and undoubted progress of the field in addressing the issue of catastrophic forgetting, benchmarking different continual learning approaches is a difficult task by itself. In fact, given the proliferation of different settings, training and evaluation protocols, metrics and nomenclature, it is often tricky to properly characterize a continual learning algorithm, relate it to other solutions and gauge its real-world applicability. The first Continual Learning in Computer Vision challenge held at CVPR in 2020 has been one of the first opportunities to evaluate different continual learning algorithms on common hardware with a large set of shared evaluation metrics and 3 different settings based on the realistic CORe50 video benchmark. In this paper, we report the main results of the competition, which counted more than 79 registered teams, 11 finalists, and $2,300 in prizes. We also summarize the winning approaches, current challenges and future research directions.
22. Deep Neural Network Approach for Annual Luminance Simulations [PDF] 返回目录
Yue Liu, Alex Colburn, Mehlika Inanici
Abstract: Annual luminance maps provide meaningful evaluations for occupants' visual comfort, preferences, and perception. However, acquiring long-term luminance maps requires labor-intensive and time-consuming simulations or impracticable long-term field measurements. This paper presents a novel data-driven machine learning approach that makes annual luminance-based evaluations more efficient and accessible. The methodology is based on predicting the annual luminance maps from a limited number of point-in-time high dynamic range images by utilizing a deep neural network (DNN). Panoramic views are utilized, as they can be post-processed to study multiple view directions. The proposed DNN model can faithfully predict high-quality annual panoramic luminance maps from any one of three options within 30 minutes of training time: a) point-in-time luminance imagery spanning 5% of the year, when evenly distributed during daylight hours; b) one month of hourly imagery generated or collected continuously during daylight hours around the equinoxes (8% of the year); or c) 9 days of hourly data collected around the spring equinox and the summer and winter solstices (2.5% of the year); all suffice to predict the luminance maps for the rest of the year. The DNN-predicted high-quality panoramas are validated against Radiance (RPICT) renderings using a series of quantitative and qualitative metrics. The most efficient predictions are achieved with the 9 days of hourly data collected around the spring equinox and the summer and winter solstices. The results clearly show that practitioners and researchers can efficiently incorporate long-term luminance-based metrics over multiple view directions into the design and research process using the proposed DNN workflow.
23. Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation [PDF] 返回目录
Po Li, Lei Li, Yan Fu, Jun Rong, Yu Zhang
Abstract: In this paper, we introduce the Cross-modal Alignment with mixture experts Neural Network (CameNN) recommendation model for the intra-city retail industry, which aims to provide fresh-food and grocery retailing with delivery within 5 hours, a service whose demand surged during the worldwide outbreak of the Coronavirus disease (COVID-19) pandemic. We propose CameNN as a multi-task model with three tasks: an Image to Text Alignment (ITA) task, a Text to Image Alignment (TIA) task, and a CVR prediction task. We use pre-trained BERT to generate the text embedding and pre-trained InceptionV4 to generate the image patch embedding (each image is split into small patches with the same number of pixels, and each patch is treated as an image token). Softmax gating networks follow to learn the weight of each transformer expert's output and choose only a subset of experts conditioned on the input. A transformer encoder is then applied as the shared-bottom layer to learn the shared interactions of all input features. Next, a mixture of transformer experts (MoE) layer is implemented to model different aspects of the tasks. On top of the MoE layer, we deploy a transformer layer for each task as a task tower to learn task-specific information. On a real-world intra-city dataset, experiments demonstrate that CameNN outperforms baselines and achieves significant improvements in image and text representation. In practice, we applied CameNN to CVR prediction in our intra-city recommender system, which is one of the leading intra-city platforms operating in China.
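The gating mechanism in the abstract can be sketched compactly; the toy below routes each pooled input through a softmax gate to its top-k transformer experts. Model dimension, expert count, top-k routing, and gating on mean-pooled tokens are all assumptions for illustration.

```python
# Hedged sketch of a softmax-gated mixture of transformer experts (toy sizes).
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                   # x: (batch, seq, d_model)
        w = torch.softmax(self.gate(x.mean(dim=1)), dim=-1)  # (batch, n_experts)
        topw, topi = w.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):          # route each sample to its top-k experts
            for wgt, i in zip(topw[b], topi[b]):
                out[b] += wgt * self.experts[i](x[b:b + 1])[0]
        return out

print(MoELayer()(torch.randn(2, 10, 64)).shape)   # torch.Size([2, 10, 64])
```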
24. Multi-species Seagrass Detection and Classification from Underwater Images [PDF] 返回目录
Scarlett Raine, Ross Marchant, Peyman Moghadam, Frederic Maire, Brett Kettle, Brano Kusy
Abstract: Underwater surveys conducted using divers or robots equipped with customized camera payloads can generate a large number of images. Manual review of these images to extract ecological data is prohibitive in terms of time and cost, thus providing strong incentive to automate this process using machine learning solutions. In this paper, we introduce a multi-species detector and classifier for seagrasses based on a deep convolutional neural network (achieved an overall accuracy of 92.4%). We also introduce a simple method to semi-automatically label image patches and therefore minimize manual labelling requirement. We describe and release publicly the dataset collected in this study as well as the code and pre-trained models to replicate our experiments at: this https URL
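The patch-level unit the semi-automatic labeling works on is straightforward to produce; the sketch below tiles an image into fixed-size patches, with the patch size an assumption.

```python
# Hedged sketch: tile an image into non-overlapping patches for patch labeling.
import numpy as np

def to_patches(img, p=32):
    h, w = img.shape[:2]
    return [img[i:i + p, j:j + p]
            for i in range(0, h - p + 1, p)
            for j in range(0, w - p + 1, p)]

img = np.zeros((128, 160, 3), dtype=np.uint8)   # toy underwater frame
print(len(to_patches(img)))                     # 4 x 5 = 20 patches
```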
25. Beyond Identity: What Information Is Stored in Biometric Face Templates? [PDF] 返回目录
Philipp Terhörst, Daniel Fährmann, Naser Damer, Florian Kirchbuchner, Arjan Kuijper
Abstract: Deeply-learned face representations enable the success of current face recognition systems. Despite the ability of these representations to encode the identity of an individual, recent works have shown that more information is stored within, such as demographics, image characteristics, and social traits. This threatens the user's privacy, since for many applications these templates are expected to be solely used for recognition purposes. Knowing the encoded information in face templates helps to develop bias-mitigating and privacy-preserving face recognition technologies. This work aims to support the development of these two branches by analysing face templates with regard to 113 attributes. Experiments were conducted on two publicly available face embeddings. For evaluating the predictability of the attributes, we trained a massive attribute classifier that is additionally able to accurately state its prediction confidence. This allows us to make more sophisticated statements about the attribute predictability. The results demonstrate that up to 74 attributes can be accurately predicted from face templates. Especially non-permanent attributes, such as age, hairstyles, hair colors, beards, and various accessories, were found to be easily predictable. Since face recognition systems aim to be robust against these variations, future research might build on this work to develop more understandable privacy-preserving solutions and build robust and fair face templates.
26. A Deep Neural Network Tool for Automatic Segmentation of Human Body Parts in Natural Scenes [PDF] 返回目录
Patrick McClure, Gabrielle Reimann, Michal Ramot, Francisco Pereira
Abstract: This short article describes a deep neural network trained to perform automatic segmentation of human body parts in natural scenes. More specifically, we trained a Bayesian SegNet with concrete dropout on the Pascal-Parts dataset to predict whether each pixel in a given frame was part of a person's hair, head, ear, eyebrows, legs, arms, mouth, neck, nose, or torso.
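A Bayesian SegNet obtains pixelwise uncertainty by keeping dropout active at test time and averaging stochastic forward passes. Concrete dropout additionally learns the dropout rate; the sketch below uses plain Monte Carlo dropout with a fixed rate and a toy network, which captures the same inference mechanism.

```python
# Hedged sketch of Monte Carlo dropout inference (toy net, not the paper's).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Dropout2d(0.5),          # fixed rate; concrete dropout learns it
                    nn.Conv2d(16, 10, 1))       # e.g. 10 body-part classes
net.train()                                     # keep dropout active at test time
x = torch.randn(1, 3, 64, 64)
with torch.no_grad():
    probs = torch.stack([net(x).softmax(dim=1) for _ in range(20)])
mean, var = probs.mean(dim=0), probs.var(dim=0) # prediction and its uncertainty
print(mean.shape, var.mean().item())
```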
27. Clustering COVID-19 Lung Scans [PDF] 返回目录
Jacob Householder, Andrew Householder, John Paul Gomez-Reed, Fredrick Park, Shuai Zhang
Abstract: With the recent outbreak of COVID-19, creating a means to stop its spread and eventually developing a vaccine are the most important and challenging tasks that the scientific community is facing right now. The first step towards these goals is to correctly identify a patient that is infected with the virus. Our group applied an unsupervised machine learning technique to identify COVID-19 cases. This is an important topic, as COVID-19 is a novel disease currently being studied in detail, and our methodology has the potential to reveal important differences between it and other viral pneumonia. This could then, in turn, enable doctors to more confidently help each patient. Our experiments utilize Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and the recently developed Robust Continuous Clustering algorithm (RCC). We display the performance of RCC in identifying COVID-19 patients and its ability to compete with other unsupervised algorithms, namely K-Means++ (KM++). Using a COVID-19 Radiography dataset, we found that RCC outperformed KM++; we used the Adjusted Mutual Information Score (AMI) in order to measure the effectiveness of both algorithms. The AMI for the two- and three-class cases of KM++ were 0.0250 and 0.054, respectively. In comparison, RCC scored 0.5044 in the two-class case and 0.267 in the three-class case, clearly showing RCC as the superior algorithm. This not only opens new possible applications of RCC, but it could potentially aid in the creation of a new tool for COVID-19 identification.
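The quantitative pieces of the pipeline are all standard; the sketch below runs PCA, k-means++ (the paper's baseline, since RCC has no scikit-learn implementation), and the Adjusted Mutual Information score on synthetic blobs rather than chest radiographs.

```python
# Hedged sketch: dimensionality reduction + clustering + AMI on synthetic data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_mutual_info_score

X, y = make_blobs(n_samples=300, centers=2, n_features=50, random_state=0)
Z = PCA(n_components=10).fit_transform(X)
labels = KMeans(n_clusters=2, init="k-means++", n_init=10,
                random_state=0).fit_predict(Z)
print("AMI:", adjusted_mutual_info_score(y, labels))
```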
28. Efficient Computation of Higher Order 2D Image Moments using the Discrete Radon Transform [PDF] 返回目录
William Diggin, Michael Diggin
Abstract: Geometric moments and moment invariants of image artifacts have many uses in computer vision applications, e.g. shape classification or object position and orientation. Higher order moments are of interest to provide additional feature descriptors, to measure kurtosis or to resolve n-fold symmetry. This paper provides the method and practical application to extend an efficient algorithm, based on the Discrete Radon Transform, to generate moments greater than the 3rd order. The mathematical fundamentals are presented, followed by relevant implementation details. Results of scaling the algorithm based on image area and its computational comparison with a standard method demonstrate the efficacy of the approach.
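The key identity behind computing moments from Radon projections is that the n-th moment of the projection at angle θ is a binomial combination of image moments m_{pq} with p + q = n; for the diagonal (x + y) projection and n = 2 this gives m20 + 2·m11 + m02, so the mixed moment m11 falls out of three one-dimensional projections. The numpy demonstration below checks this identity; it illustrates the principle, not the paper's full algorithm.

```python
# Hedged sketch of the projection-moment identity behind DRT-based moments.
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((32, 32))
y, x = np.mgrid[0:32, 0:32]

col = f.sum(axis=0)                                      # projection onto x
row = f.sum(axis=1)                                      # projection onto y
diag = np.bincount((x + y).ravel(), weights=f.ravel())   # projection onto x + y

m20 = (col * np.arange(32) ** 2).sum()
m02 = (row * np.arange(32) ** 2).sum()
M2d = (diag * np.arange(diag.size) ** 2).sum()           # = m20 + 2*m11 + m02
m11 = (M2d - m20 - m02) / 2

print(np.isclose(m11, (f * x * y).sum()))                # True: matches direct moment
```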
29. LiPo-LCD: Combining Lines and Points for Appearance-based Loop Closure Detection [PDF] 返回目录
Joan P. Company-Corcoles, Emilio Garcia-Fidalgo, Alberto Ortiz
Abstract: Visual SLAM approaches typically depend on loop closure detection to correct the inconsistencies that may arise during the map and camera trajectory calculations, typically making use of point features for detecting and closing the existing loops. In low-textured scenarios, however, it is difficult to find enough point features and, hence, the performance of these solutions drops drastically. An alternative for human-made scenarios, due to their structural regularity, is the use of geometrical cues such as straight segments, frequently present within these environments. Under this context, in this paper we introduce LiPo-LCD, a novel appearance-based loop closure detection method that integrates lines and points. Adopting the idea of incremental Bag-of-Binary-Words schemes, we build separate BoW models for each feature, and use them to retrieve previously seen images using a late fusion strategy. Additionally, a simple but effective mechanism, based on the concept of island, groups similar images close in time to reduce the image candidate search effort. A final step validates geometrically the loop candidates by incorporating the detected lines by means of a process comprising a line feature matching stage, followed by a robust spatial verification stage, now combining both lines and points. As it is reported in the paper, LiPo-LCD compares well with several state-of-the-art solutions for a number of datasets involving different environmental conditions.
30. Decontextualized learning for interpretable hierarchical representations of visual patterns [PDF] 返回目录
R. Ian Etheredge, Manfred Schartl, Alex Jordan
Abstract: Apart from discriminative models for classification and object detection tasks, the application of deep convolutional neural networks to basic research utilizing natural imaging data has been somewhat limited; particularly in cases where a set of interpretable features for downstream analysis is needed, a key requirement for many scientific investigations. We present an algorithm and training paradigm designed specifically to address this: decontextualized hierarchical representation learning (DHRL). By combining a generative model chaining procedure with a ladder network architecture and latent space regularization for inference, DHRL addresses the limitations of small datasets and encourages a disentangled set of hierarchically organized features. In addition to providing a tractable path for analyzing complex hierarchical patterns using variational inference, this approach is generative and can be directly combined with empirical and theoretical approaches. To highlight the extensibility and usefulness of DHRL, we demonstrate this method in application to a question from evolutionary biology.
31. Haar Wavelet based Block Autoregressive Flows for Trajectories [PDF] 返回目录
Apratim Bhattacharyya, Christoph-Nikolas Straehle, Mario Fritz, Bernt Schiele
Abstract: Prediction of trajectories such as that of pedestrians is crucial to the performance of autonomous agents. While previous works have leveraged conditional generative models like GANs and VAEs for learning the likely future trajectories, accurately modeling the dependency structure of these multimodal distributions, particularly over long time horizons remains challenging. Normalizing flow based generative models can model complex distributions admitting exact inference. These include variants with split coupling invertible transformations that are easier to parallelize compared to their autoregressive counterparts. To this end, we introduce a novel Haar wavelet based block autoregressive model leveraging split couplings, conditioned on coarse trajectories obtained from Haar wavelet based transformations at different levels of granularity. This yields an exact inference method that models trajectories at different spatio-temporal resolutions in a hierarchical manner. We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets - Stanford Drone and Intersection Drone.
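One level of the Haar transform makes the coarse/detail hierarchy concrete: pairwise averages give the coarse trajectory the flow is conditioned on, and pairwise differences give the details modeled at the finer level. The sketch below shows the decomposition and its exact inverse on a toy 1-D coordinate; the full model's couplings and conditioning are of course not reproduced here.

```python
# Hedged sketch: one Haar level on a toy trajectory coordinate.
import numpy as np

traj = np.cumsum(np.random.randn(16))            # toy 1-D trajectory
coarse = (traj[0::2] + traj[1::2]) / np.sqrt(2)  # conditioning signal
detail = (traj[0::2] - traj[1::2]) / np.sqrt(2)  # finer-level residual

rec = np.empty_like(traj)                        # perfect reconstruction
rec[0::2] = (coarse + detail) / np.sqrt(2)
rec[1::2] = (coarse - detail) / np.sqrt(2)
print(np.allclose(rec, traj))                    # True
```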
32. DeepActsNet: Spatial and Motion features from Face, Hands, and Body Combined with Convolutional and Graph Networks for Improved Action Recognition [PDF] 返回目录
Umar Asif, Deval Mehta, Stefan von Cavallar, Jianbin Tang, Stefan Harrer
Abstract: Existing action recognition methods mainly focus on joint and bone information in human body skeleton data due to its robustness to complex backgrounds and dynamic characteristics of the environments. In this paper, we combine body skeleton data with spatial and motion information from face and two hands, and present Deep Action Stamps (DeepActs), a novel data representation to encode actions from video sequences. We also present DeepActsNet, a deep learning based model with modality-specific Convolutional and Graph sub-networks for highly accurate action recognition based on Deep Action Stamps. Experiments on three challenging action recognition datasets (NTU60, NTU120, and SYSU) show that DeepActs produce considerable improvements in the recognition performance of standard convolutional and graph networks. Experiments also show that the fusion of modality-specific convolutional and structural features learnt by our DeepActsNet yields consistent improvements in action recognition accuracy over the state-of-the-art on the target datasets.
33. Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval [PDF] 返回目录
Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Abstract: Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems. In this paper, we focus on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval. First, we obtain the text instances from images by employing a text reading system. Then, we combine textual features with salient image regions to exploit the complementary information carried by the two sources. Specifically, we employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between salient objects and text found in an image. By obtaining an enhanced set of visual and textual features, the proposed model greatly outperforms the previous state-of-the-art in two different tasks, fine-grained classification and image retrieval in the Con-Text and Drink Bottle datasets.
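A single graph-convolution step over the fused salient-object/text graph is enough to show the reasoning layer's shape; the toy below uses symmetric-normalized propagation on a three-node graph with random features, which is the textbook GCN update rather than the paper's exact formulation.

```python
# Hedged sketch: one GCN propagation step on a toy object/text graph.
import numpy as np

def gcn_layer(A, X, W):
    """ReLU(D^-1/2 (A + I) D^-1/2 X W), the standard GCN update."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))
    return np.maximum(A_norm @ X @ W, 0)

A = np.array([[0, 1, 1],            # one salient object linked to
              [1, 0, 0],            # two scene-text instances
              [1, 0, 0]], float)
X = np.random.randn(3, 8)           # fused visual/textual node features
print(gcn_layer(A, X, np.random.randn(8, 8)).shape)   # (3, 8)
```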
34. Generating Adversarial yet Inconspicuous Patches with a Single Image [PDF] 返回目录
Jinqi Luo, Tao Bai, Jun Zhao, Bo Li
Abstract: Deep neural networks have been shown vulnerable to adversarial patches, where exotic patterns can result in models' wrong predictions. Nevertheless, existing approaches to adversarial patch generation hardly consider the contextual consistency between patches and the image background, causing such patches to be easily detected and adversarial attacks to fail. On the other hand, these methods require a large amount of data for training, which is computationally expensive. To overcome these challenges, we propose an approach to generate adversarial yet inconspicuous patches with one single image. In our approach, adversarial patches are produced in a coarse-to-fine way with multiple scales of generators and discriminators. Contextual information is encoded during the Min-Max training to make patches consistent with surroundings. The selection of patch location is based on the perceptual sensitivity of victim models. Through extensive experiments, our approach shows strong attacking ability in both the white-box and black-box setting. Experiments on saliency detection and user evaluation indicate that our adversarial patches can evade human observations, demonstrating the inconspicuousness of our approach. Lastly, we show that our approach preserves the attack ability in the physical world.
35. Conditional Automated Channel Pruning for Deep Neural Networks [PDF]
Yixin Liu, Yong Guo, Zichang Liu, Haohua Liu, Jingjie Zhang, Zejun Chen, Jing Liu, Jian Chen
Abstract: Model compression aims to reduce the redundancy of deep networks to obtain compact models. Recently, channel pruning has become one of the predominant compression methods for deploying deep models on resource-constrained devices. Most channel pruning methods use a fixed compression rate for all the layers of the model, which, however, may not be optimal. To address this issue, given a target compression rate for the whole model, one can search for the optimal compression rate for each layer. Nevertheless, these methods perform channel pruning for a specific target compression rate. When we consider multiple compression rates, they have to repeat the channel pruning process multiple times, which is very inefficient and unnecessary. To address this issue, we propose a Conditional Automated Channel Pruning (CACP) method to obtain compressed models with different compression rates through a single channel pruning process. To this end, we develop a conditional model that takes an arbitrary compression rate as input and outputs the corresponding compressed model. In the experiments, the resultant models with different compression rates consistently outperform the models compressed by existing methods that run a separate channel pruning process for each target compression rate.
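CACP's learned component is the conditional model that maps an arbitrary target rate to a compressed model; the sketch below, assuming PyTorch, shows only a common downstream proxy for the pruning step itself: given a per-layer keep ratio, keep the conv channels with the largest L1 norms (the L1 criterion is a widespread heuristic, not necessarily the paper's):

```python
import torch
import torch.nn as nn

def channel_keep_mask(conv: nn.Conv2d, keep_ratio: float) -> torch.Tensor:
    # One L1 score per output channel of the conv weight (C_out, C_in, k, k).
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    k = max(1, int(round(keep_ratio * scores.numel())))
    keep = torch.zeros_like(scores, dtype=torch.bool)
    keep[scores.topk(k).indices] = True
    return keep

conv = nn.Conv2d(64, 128, 3)
mask = channel_keep_mask(conv, keep_ratio=0.5)
print(mask.sum().item())  # 64 channels kept at a 0.5 keep ratio
```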
36. MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion [PDF]
Yicheng Wang, Shuang Xu, Jiangshe Zhang, Chunxia Zhang, Zixiang Zhao, Junmin Liu
Abstract: Multi-Focus Image Fusion (MFIF) is one of the promising techniques for obtaining all-in-focus images to meet people's visual needs, and it is a precondition of other computer vision tasks. One of the research trends in MFIF is to solve the defocus spread effect (DSE) around the focus/defocus boundary (FDB). In this paper, we present a novel generative adversarial network termed MFIF-GAN to translate multi-focus images into focus maps and further obtain the all-in-focus images. A Squeeze-and-Excitation Residual Network (SE-ResNet) module is employed in the network as an attention mechanism. During training, we propose reconstruction and gradient regularization loss functions to guarantee the accuracy of the generated focus maps. In addition, by combining prior knowledge of the training condition, this network is trained on a synthetic dataset with DSE generated by an {\alpha}-matte model. A series of experimental results demonstrate that MFIF-GAN is superior to several representative state-of-the-art (SOTA) algorithms in visual perception and quantitative analysis as well as efficiency.
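Once a focus map has been predicted, the fusion itself is a pixel-wise blend of the two source images. A minimal sketch of that final step, assuming PyTorch (the GAN that predicts the map is omitted, and the binary map below is a random stand-in):

```python
import torch

def fuse(img_a: torch.Tensor, img_b: torch.Tensor, focus_map: torch.Tensor) -> torch.Tensor:
    # focus_map in [0, 1]: 1 where img_a is in focus, 0 where img_b is.
    return focus_map * img_a + (1.0 - focus_map) * img_b

a, b = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
m = (torch.rand(1, 1, 256, 256) > 0.5).float()  # stand-in binary focus map
print(fuse(a, b, m).shape)  # torch.Size([1, 3, 256, 256])
```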
37. The High-Quality Wide Multi-Channel Attack (HQ-WMCA) database [PDF]
Zohreh Mostaani, Anjith George, Guillaume Heusch, David Geissbuhler, Sebastien Marcel
Abstract: The High-Quality Wide Multi-Channel Attack (HQ-WMCA) database extends the previous Wide Multi-Channel Attack database (WMCA) with more channels, including color, depth, thermal, infrared (spectra), and short-wave infrared (spectra), as well as a wider variety of attacks.
38. Batch Coherence-Driven Network for Part-aware Person Re-Identification [PDF]
Kan Wang, Pengfei Wang, Changxing Ding, Dacheng Tao
Abstract: Existing part-aware person re-identification methods typically employ two separate steps: namely, body part detection and part-level feature extraction. However, part detection introduces an additional computational cost and is inherently challenging for low-quality images. Accordingly, in this work, we propose a simple framework named Batch Coherence-Driven Network (BCD-Net) that bypasses body part detection during both the training and testing phases while still learning semantically aligned part features. Our key observation is that the statistics in a batch of images are stable, and therefore that batch-level constraints are robust. First, we introduce a batch coherence-guided channel attention (BCCA) module that highlights the relevant channels for each respective part from the output of a deep backbone model. We investigate channel-part correspondence using a batch of training images, then impose a novel batch-level supervision signal that helps BCCA to identify part-relevant channels. Second, the mean position of a body part is robust and consequently coherent between batches throughout the training process. Accordingly, we introduce a pair of regularization terms based on the semantic consistency between batches. The first term regularizes the high responses of BCD-Net for each part on one batch in order to constrain them within a predefined area, while the second encourages the aggregate of BCD-Net's responses for all parts to cover the entire human body. The above constraints guide BCD-Net to learn diverse, complementary, and semantically aligned part-level features. Extensive experimental results demonstrate that BCD-Net consistently achieves state-of-the-art performance on four large-scale ReID benchmarks.
39. Feed-Forward On-Edge Fine-tuning Using Static Synthetic Gradient Modules [PDF]
Robby Neven, Marian Verhelst, Tinne Tuytelaars, Toon Goedemé
Abstract: Training deep learning models on embedded devices is typically avoided since this requires more memory, computation and power than inference. In this work, we focus on lowering the amount of memory needed for storing all activations, which are required during the backward pass to compute the gradients. Instead, during the forward pass, static Synthetic Gradient Modules (SGMs) predict gradients for each layer. This allows training the model in a feed-forward manner without having to store all activations. We tested our method on a robot grasping scenario where a robot needs to learn to grasp new objects given only a single demonstration. By first training the SGMs in a meta-learning manner on a set of common objects, during fine-tuning the SGMs provided the model with accurate gradients to successfully learn to grasp new objects. We have shown that our method achieves results comparable to standard backpropagation.
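A minimal sketch of the core idea, assuming PyTorch: a frozen, pretrained SGM predicts dLoss/dActivation from the activation itself, so a layer can be updated during the forward pass without storing later activations. The layer sizes and the SGM's architecture are illustrative assumptions:

```python
import torch
import torch.nn as nn

layer = nn.Linear(128, 64)
sgm = nn.Linear(64, 64)   # meta-learned and kept static (frozen) in the paper
for p in sgm.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(layer.parameters(), lr=0.01)
x = torch.randn(32, 128)

h = layer(x)                         # forward through the layer
synthetic_grad = sgm(h.detach())     # predicted dLoss/dh, no true backward pass
opt.zero_grad()
h.backward(gradient=synthetic_grad)  # inject the predicted gradient into autograd
opt.step()                           # layer updated without later activations
```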
40. Discriminative Segmentation Tracking Using Dual Memory Banks [PDF]
Fei Xie, Wankou Yang, Bo Liu, Kaihua Zhang, Wanli Xue, Wangmeng Zuo
Abstract: Existing template-based trackers usually localize the target in each frame with a bounding box, thereby being limited in learning pixel-wise representations and handling complex and non-rigid transformations of the target. Further, existing segmentation tracking methods are still insufficient in modeling and exploiting dense correspondence of target pixels across frames. To overcome these limitations, this work presents a novel discriminative segmentation tracking architecture equipped with dual memory banks, i.e., an appearance memory bank and a spatial memory bank. In particular, the appearance memory bank utilizes spatial and temporal non-local similarity to propagate segmentation masks to the current frame, and we further treat the discriminative correlation filter as a spatial memory bank that stores the mapping between feature map and spatial map. Without bells and whistles, our simple yet effective tracking architecture sets a new state-of-the-art on the VOT2016, VOT2018, VOT2019, GOT-10K and TrackingNet benchmarks, achieving EAO scores of 0.535 and 0.506 on VOT2016 and VOT2018, respectively. Moreover, our approach outperforms the leading segmentation tracker D3S on two video object segmentation benchmarks, DAVIS16 and DAVIS17. The source code will be released at this https URL.
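A minimal sketch of the appearance-memory idea, assuming PyTorch: past segmentation masks are propagated to the current frame through non-local feature similarity. The feature dimensions and softmax temperature are illustrative assumptions, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def propagate_mask(mem_feats, mem_masks, cur_feats, tau=0.07):
    # mem_feats: (N_mem, C), one feature per memory pixel; mem_masks: (N_mem,)
    # cur_feats: (N_cur, C). Returns a soft foreground score per current pixel.
    sim = F.normalize(cur_feats, dim=1) @ F.normalize(mem_feats, dim=1).t()
    attn = F.softmax(sim / tau, dim=1)  # affinity of each current pixel to memory
    return attn @ mem_masks             # (N_cur,) propagated soft mask

mem_f = torch.randn(1024, 64)
mem_m = (torch.rand(1024) > 0.5).float()
cur_f = torch.randn(1024, 64)
print(propagate_mask(mem_f, mem_m, cur_f).shape)  # torch.Size([1024])
```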
41. Feature Flow: In-network Feature Flow Estimation for Video Object Detection [PDF]
Ruibing Jin, Guosheng Lin, Changyun Wen, Jianliang Wang, Fayao Liu
Abstract: Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art approaches are proposed to solve problems directly at the feature level. Since the displacement of a feature vector is not consistent with the pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset. With this method, they expect the fine-tuned network to produce tensors encoding feature-level motion information. In this paper, we rethink this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module is able to directly produce feature flow, which indicates the feature displacement. Our IFF module is a shallow module that shares its features with the detection branches. This compact design enables our IFF-Net to accurately detect objects while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of our IFF-Net. Our IFF-Net outperforms existing methods and sets a state-of-the-art performance on ImageNet VID.
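A minimal sketch of what a feature flow is used for, assuming PyTorch: warping a past frame's feature map to the current frame with bilinear sampling. The flow here is random noise; in IFF-Net it would come from the in-network estimation module:

```python
import torch
import torch.nn.functional as F

def warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # feat: (B, C, H, W); flow: (B, 2, H, W) with per-pixel (dx, dy) in pixels.
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    # Normalize sampling locations to [-1, 1] as grid_sample expects.
    grid[:, 0] = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid[:, 1] = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, grid.permute(0, 2, 3, 1), align_corners=True)

feat, flow = torch.randn(1, 256, 32, 32), torch.randn(1, 2, 32, 32)
print(warp(feat, flow).shape)  # torch.Size([1, 256, 32, 32])
```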
42. 3D-FUTURE: 3D Furniture shape with TextURE [PDF]
Huan Fu, Rongfei Jia, Lin Gao, Mingming Gong, Binqiang Zhao, Steve Maybank, Dacheng Tao
Abstract: The 3D CAD shapes in current 3D benchmarks are mostly collected from online model repositories. Thus, they typically have insufficient geometric details and less informative textures, making them less attractive for comprehensive and subtle research in areas such as high-quality 3D mesh and texture recovery. This paper presents 3D Furniture shape with TextURE (3D-FUTURE): a richly-annotated and large-scale repository of 3D furniture shapes in the household scenario. At the time of this technical report, 3D-FUTURE contains 20,240 clean and realistic synthetic images of 5,000 different rooms. There are 9,992 unique detailed 3D instances of furniture with high-resolution textures. Experienced designers developed the room scenes, and the 3D CAD shapes in the scenes are used for industrial production. Given the well-organized 3D-FUTURE, we provide baseline experiments on several widely studied tasks, such as joint 2D instance segmentation and 3D object pose estimation, image-based 3D shape retrieval, 3D object reconstruction from a single image, and texture recovery for 3D shapes, to facilitate related future research on our database.
43. Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness [PDF]
Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham, Dinh Phung
Abstract: Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy for collaboration among the committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, which help us to quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms the state-of-the-art ensemble baselines, while at the same time detecting a wide range of adversarial examples with nearly perfect accuracy.
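A minimal sketch of the quantity such methods regularize, assuming PyTorch: cross-member transferability, measured by crafting an adversarial example on member i and checking whether it also fools member j. The models and epsilon are illustrative stand-ins, and FGSM is just one convenient attack for the measurement:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    # One-step FGSM: perturb x along the sign of the loss gradient.
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

member_i = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
member_j = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))

x_adv = fgsm(member_i, x, y)                               # crafted on member i
transfer_rate = (member_j(x_adv).argmax(1) != y).float().mean()
print(f"transferability i -> j: {transfer_rate:.2f}")      # fraction that fools j
```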
44. A Novel Transferability Attention Neural Network Model for EEG Emotion Recognition [PDF]
Yang Li, Boxun Fu, Fu Li, Guangming Shi, Wenming Zheng
Abstract: Existing methods for electroencephalograph (EEG) emotion recognition always train models on all EEG samples indiscriminately. However, some of the source (training) samples may exert a negative influence because they are significantly dissimilar from the target (test) samples. So it is necessary to pay more attention to the EEG samples with strong transferability rather than forcefully training a classification model on all the samples. Furthermore, for an EEG sample, from the perspective of neuroscience, not all brain regions contain emotional information that can be transferred to the test data effectively. Data from some brain regions can even have a strong negative effect on learning the emotion classification model. Considering these two issues, in this paper we propose a transferable attention neural network (TANN) for EEG emotion recognition, which learns emotional discriminative information by adaptively highlighting the transferable EEG brain-region data and samples through local and global attention mechanisms. This can be implemented by measuring the outputs of multiple brain-region-level discriminators and one single sample-level discriminator. We conduct extensive experiments on three public EEG emotion datasets. The results validate that the proposed model achieves state-of-the-art performance.
45. Semi-supervised Semantic Segmentation of Organs at Risk on 3D Pelvic CT Images [PDF]
Zhuangzhuang Zhang, Tianyu Zhao, Hiram Gay, Baozhou Sun, Weixiong Zhang
Abstract: Automated segmentation of organs-at-risk in pelvic computed tomography (CT) images can assist radiotherapy treatment planning by saving the time and effort of manual contouring and reducing intra-observer and inter-observer variation. However, training high-performance deep-learning segmentation models usually requires large amounts of labeled data, which are labor-intensive to collect. Lack of annotated data presents a significant challenge for many medical imaging-related deep learning solutions. This paper proposes a novel end-to-end convolutional neural network-based semi-supervised adversarial method that can segment multiple organs-at-risk, including the prostate, bladder, rectum, left femur, and right femur. New design schemes are introduced to enhance the baseline residual U-net architecture and improve performance. Importantly, new unlabeled CT images are synthesized by a generative adversarial network (GAN) that is trained on given images to overcome the inherent problem of insufficient annotated data in practice. A semi-supervised adversarial strategy is then introduced to utilize both labeled and unlabeled 3D CT images. The new method is evaluated on a dataset of 100 training cases and 20 testing cases. Experimental results, including four metrics (dice similarity coefficient, average Hausdorff distance, average surface Hausdorff distance, and relative volume difference), show that the new method outperforms several state-of-the-art segmentation approaches.
46. SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning [PDF]
Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, William Yang Wang
Abstract: Iterative Language-Based Image Editing (ILBIE) tasks follow iterative instructions to edit images step by step. Data scarcity is a significant issue for ILBIE, as it is challenging to collect large-scale examples of images before and after instruction-based changes. However, humans still accomplish these editing tasks even when presented with an unfamiliar image-instruction pair. Such ability results from counterfactual thinking, the ability to think about alternatives to events that have happened already. In this paper, we introduce a Self-Supervised Counterfactual Reasoning (SSCR) framework that incorporates counterfactual thinking to overcome data scarcity. SSCR allows the model to consider out-of-distribution instructions paired with previous images. With the help of cross-task consistency (CTC), we train these counterfactual instructions in a self-supervised scenario. Extensive results show that SSCR improves the correctness of ILBIE in terms of both object identity and position, establishing a new state of the art (SOTA) on two ILBIE datasets (i-CLEVR and CoDraw). Even with only 50% of the training data, SSCR achieves a result comparable to using complete data.
47. ES Attack: Model Stealing against Deep Neural Networks without Data Hurdles [PDF]
Xiaoyong Yuan, Lei Ding, Lan Zhang, Xiaolin Li, Dapeng Wu
Abstract: Deep neural networks (DNNs) have become the essential components for various commercialized machine learning services, such as Machine Learning as a Service (MLaaS). Recent studies show that machine learning services face severe privacy threats - well-trained DNNs owned by MLaaS providers can be stolen through public APIs, namely by model stealing attacks. However, most existing works undervalue the impact of such attacks, where a successful attack has to acquire confidential training data or auxiliary data regarding the victim DNN. In this paper, we propose ES Attack, a novel model stealing attack without any data hurdles. By using heuristically generated synthetic data, ES Attack iteratively trains a substitute model and eventually achieves a functionally equivalent copy of the victim DNN. The experimental results reveal the severity of ES Attack: i) ES Attack successfully steals the victim model without data hurdles, and even outperforms most existing model stealing attacks that use auxiliary data in terms of model accuracy; ii) most countermeasures are ineffective in defending against ES Attack; iii) ES Attack facilitates further attacks relying on the stolen model.
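The core loop of a data-free model stealing attack can be sketched as follows, assuming PyTorch: query the victim on synthetic inputs and train the substitute to match its soft outputs. The synthetic data here is plain noise and both models are stand-ins; ES Attack generates its queries heuristically rather than randomly:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))      # black box
substitute = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # the copy
opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)

for _ in range(200):
    x_syn = torch.rand(64, 3, 32, 32)                    # synthetic queries
    with torch.no_grad():
        soft_labels = F.softmax(victim(x_syn), dim=1)    # victim's API output
    # Train the substitute to match the victim's output distribution.
    loss = F.kl_div(F.log_softmax(substitute(x_syn), dim=1),
                    soft_labels, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```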
48. PIE: Portrait Image Embedding for Semantic Control [PDF]
Ayush Tewari, Mohamed Elgharib, Mallikarjun B R., Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, Christian Theobalt
Abstract: Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, though only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.
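The core of embedding an image into a generator's latent space can be sketched as plain latent optimization, assuming PyTorch. PIE's actual formulation is hierarchical and adds StyleRig-based semantic and identity-preservation energies; the generator below is a tiny stand-in, not StyleGAN:

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Sigmoid())  # frozen stand-in
for p in generator.parameters():
    p.requires_grad_(False)

target = torch.rand(1, 3 * 64 * 64)          # the portrait to embed
w = torch.zeros(1, 512, requires_grad=True)  # latent code to optimize
opt = torch.optim.Adam([w], lr=0.01)

for _ in range(500):
    # Reconstruction energy only; the paper adds semantic and identity terms.
    loss = torch.mean((generator(w) - target) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
```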
49. Remote sensing image fusion based on Bayesian GAN [PDF]
Junfu Chen, Yue Pan, Yang Chen
Abstract: Remote sensing image fusion technology (pan-sharpening) is an important means to improve the information capacity of remote sensing images. Inspired by the efficient parameter-space posterior sampling of Bayesian neural networks, in this paper we propose a Bayesian Generative Adversarial Network based on Preconditioned Stochastic Gradient Langevin Dynamics (PGSLD-BGAN) to improve pan-sharpening tasks. Unlike many traditional generative models that consider only one optimal solution (which might be locally optimal), the proposed PGSLD-BGAN performs Bayesian inference on the network parameters and explores the generator posterior distribution, which assists in selecting appropriate generator parameters. First, we build a two-stream generator network with PAN and MS images as input, which consists of three parts: feature extraction, feature fusion and image reconstruction. Then, we leverage a Markov discriminator to enhance the ability of the generator to reconstruct the fusion image, so that the resulting image can retain more details. Finally, introducing a Preconditioned Stochastic Gradient Langevin Dynamics policy, we perform Bayesian inference on the generator network. Experiments on the QuickBird and WorldView datasets show that the model proposed in this paper can effectively fuse PAN and MS images, and is competitive with and even superior to the state of the art in terms of subjective and objective metrics.
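A minimal preconditioned SGLD update (an RMSProp-style preconditioner plus injected Gaussian noise), the kind of sampler used here to draw network parameters from an approximate posterior. Hyperparameters are illustrative, assuming PyTorch:

```python
import torch

def psgld_step(param, grad, v, lr=1e-4, alpha=0.99, eps=1e-8):
    # v: running average of squared gradients (the preconditioner state).
    v.mul_(alpha).addcmul_(grad, grad, value=1 - alpha)
    precond = 1.0 / (v.sqrt() + eps)
    # Langevin noise with variance 2 * lr * precond turns SGD into a sampler.
    noise = torch.randn_like(param) * torch.sqrt(2 * lr * precond)
    param.add_(-lr * precond * grad + noise)
    return param, v

p = torch.randn(10)
g = torch.randn(10)   # stochastic gradient of the negative log posterior
v = torch.zeros(10)
p, v = psgld_step(p, g, v)
```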
50. Implicit Feature Networks for Texture Completion from Partial 3D Data [PDF]
Julian Chibane, Gerard Pons-Moll
Abstract: Prior work on inferring 3D texture uses either texture atlases, which require uv-mappings and hence have discontinuities, or colored voxels, which are memory-inefficient and limited in resolution. Recent work predicts RGB color at every XYZ coordinate, forming a texture field, but focuses on completing texture given a single 2D image. Instead, we focus on 3D texture and geometry completion from partial and incomplete 3D scans. IF-Nets have recently achieved state-of-the-art results on 3D geometry completion using a multi-scale deep feature encoding, but the outputs lack texture. In this work, we generalize IF-Nets to texture completion from partial textured scans of humans and arbitrary objects. Our key insight is that 3D texture completion benefits from incorporating local and global deep features extracted from both the 3D partial texture and the completed geometry. Specifically, given the partial 3D texture and the 3D geometry completed with IF-Nets, our model successfully in-paints the missing texture parts consistently with the completed geometry. Our model won the SHARP ECCV'20 challenge, achieving the highest performance on all challenges.
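A minimal sketch of a texture field in this spirit, assuming PyTorch: an MLP maps a query point plus a point-local deep feature (extracted from the partial input in the real model) to an RGB value. The feature here is random, and all dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TextureField(nn.Module):
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, xyz, local_feats):
        # xyz: (N, 3) query points; local_feats: (N, feat_dim) per-point features.
        return self.mlp(torch.cat([xyz, local_feats], dim=-1))

pts, feats = torch.rand(4096, 3), torch.randn(4096, 128)
print(TextureField()(pts, feats).shape)  # torch.Size([4096, 3])
```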
51. Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [PDF]
Tianshui Chen, Liang Lin, Riquan Chen, Xiaolu Hui, Hefeng Wu
Abstract: Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works utilize RNN/LSTM to implicitly capture sequential region/label dependencies, which cannot fully explore mutual interactions among the semantic regions/labels and do not explicitly integrate label co-occurrences. In addition, these works require large numbers of training samples for each category, and they are unable to generalize to novel categories with limited samples. To address these issues, we propose a knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks. The framework exploits prior knowledge to guide adaptive information propagation among different categories to facilitate multi-label analysis and reduce the dependency of training samples. Specifically, it first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide learning semantic-specific features to initialize the graph, and it exploits a graph propagation network to explore graph node interactions, enabling learning contextualized image feature representations. Moreover, we initialize each graph node with the classifier weights for the corresponding label and apply another propagation network to transfer node messages through the graph. In this way, it can facilitate exploiting the information of correlated labels to help train better classifiers. We conduct extensive experiments on the traditional multi-label image recognition (MLR) and multi-label few-shot learning (ML-FSL) tasks and show that our KGGR framework outperforms the current state-of-the-art methods by sizable margins on the public benchmarks.
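A minimal sketch of knowledge-guided propagation in this spirit, assuming PyTorch: label embeddings are updated over a row-normalized statistical co-occurrence graph with one GCN layer, and the result serves as per-label classifier weights. The random co-occurrence matrix and all sizes are illustrative stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_labels, emb_dim, feat_dim = 20, 300, 2048
label_emb = torch.randn(num_labels, emb_dim)   # label semantics (e.g. word vectors)
cooc = torch.rand(num_labels, num_labels)      # statistical co-occurrence counts
adj = cooc / cooc.sum(dim=1, keepdim=True)     # row-normalized adjacency

gcn = nn.Linear(emb_dim, feat_dim)             # one propagation layer
classifiers = F.relu(gcn(adj @ label_emb))     # (num_labels, feat_dim) weights

image_feat = torch.randn(1, feat_dim)          # global image representation
logits = image_feat @ classifiers.t()          # multi-label scores
print(logits.shape)  # torch.Size([1, 20])
```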
52. Renovating Parsing R-CNN for Accurate Multiple Human Parsing [PDF] 返回目录
Lu Yang, Qing Song, Zhihui Wang, Mengjie Hu, Chun Liu, Xueshi Xin, Wenhe Jia, Songcen Xu
Abstract: Multiple human parsing aims to segment various human parts and associate each part with the corresponding instance simultaneously. This is a very challenging task due to the diverse human appearance, the semantic ambiguity of different body parts, and complex backgrounds. Through analysis of the multiple human parsing task, we observe that human-centric global perception and accurate instance-level parsing scoring are crucial for obtaining high-quality results. However, most state-of-the-art methods have not paid enough attention to these issues. To reverse this phenomenon, we present Renovating Parsing R-CNN (RP R-CNN), which introduces a global semantic enhanced feature pyramid network and a parsing re-scoring network into the existing high-performance pipeline. The proposed RP R-CNN adopts a global semantic representation to enhance the multi-scale features used for generating human parsing maps, and regresses a confidence score to represent their quality. Extensive experiments show that RP R-CNN performs favorably against state-of-the-art methods on the CIHP and MHP-v2 datasets. Code and models are available at this https URL.
53. Unsupervised Domain Adaptation for Person Re-Identification through Source-Guided Pseudo-Labeling [PDF] 返回目录
Fabian Dubourvieux, Romaric Audigier, Angelique Loesch, Samia Ainouz, Stephane Canu
Abstract: Person Re-Identification (re-ID) aims at retrieving images of the same person taken by different cameras. A challenge for re-ID is preserving performance when a model is used on data of interest (target data) that belong to a different domain than the training data (source data). Unsupervised Domain Adaptation (UDA) is an interesting research direction for this challenge as it avoids a costly annotation of the target data. Pseudo-labeling methods achieve the best results in UDA-based re-ID. Surprisingly, labeled source data are discarded after this initialization step. However, we believe that pseudo-labeling could further leverage the labeled source data in order to improve the post-initialization training steps. In order to improve robustness against erroneous pseudo-labels, we advocate the exploitation of both labeled source data and pseudo-labeled target data during all training iterations. To support our guideline, we introduce a framework that relies on a two-branch architecture optimizing classification and triplet-loss-based metric learning in the source and target domains, respectively, in order to allow adaptability to the target domain while ensuring robustness to noisy pseudo-labels. Indeed, shared low- and mid-level parameters benefit from the source classification and triplet loss signals, while high-level parameters of the target branch learn domain-specific features. Our method is simple enough to be easily combined with existing pseudo-labeling UDA approaches. We show experimentally that it is efficient and improves performance when the base method has no mechanism to deal with pseudo-label noise, as well as for hard adaptation tasks. Our approach reaches state-of-the-art performance when evaluated on the commonly used datasets Market-1501 and DukeMTMC-reID, and outperforms the state of the art when targeting the bigger and more challenging MSMT dataset.
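A minimal sketch of the two-branch objective described above: source classification loss plus a triplet loss on pseudo-labeled target triplets, used jointly at every iteration. The assumption that the model returns (logits, embedding), the margin, and the loss weighting are illustrative, not the paper's exact settings:

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.3)

def source_guided_step(model, src_imgs, src_labels,
                       tgt_anchor, tgt_pos, tgt_neg, lam=1.0):
    """One step mixing labeled source data with pseudo-labeled target
    triplets, as the paper advocates using both during all training
    iterations. Triplet mining and the weighting lam are assumptions."""
    src_logits, _ = model(src_imgs)   # classification branch (source)
    _, f_a = model(tgt_anchor)        # metric branch (target embeddings)
    _, f_p = model(tgt_pos)
    _, f_n = model(tgt_neg)
    return ce(src_logits, src_labels) + lam * triplet(f_a, f_p, f_n)
```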
54. ContourCNN: convolutional neural network for contour data classification [PDF] 返回目录
Ahmad Droby, Jihad El-Sana
Abstract: This paper proposes a novel Convolutional Neural Network model for contour data analysis (ContourCNN) and shape classification. A contour is a circular sequence of points representing a closed shape. To handle the cyclical property of the contour representation, we employ circular convolution layers. Contours are often represented sparsely. To address information sparsity, we introduce priority pooling layers that select features based on their magnitudes. Priority pooling layers pool features with low magnitudes while leaving the rest unchanged. We evaluated the proposed model using letter and digit shapes extracted from the EMNIST dataset and obtained high classification accuracy.
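Both named ingredients are easy to sketch in PyTorch: circular padding makes a 1D convolution wrap around the closed contour, and magnitude-based top-k selection captures the priority-pooling idea of dropping low-magnitude positions. The kernel size and keep ratio are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CircularConv(nn.Module):
    """1D convolution with circular padding, matching the cyclic nature
    of a closed contour (a sketch of the idea, not the paper's layer)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, k, padding=k // 2,
                              padding_mode='circular')

    def forward(self, x):  # x: (B, C, num_points)
        return torch.relu(self.conv(x))

def priority_pool(x, keep_ratio=0.5):
    """Keep the positions with the largest feature magnitudes and drop
    the low-magnitude rest, shortening the sequence."""
    B, C, N = x.shape
    k = max(1, int(N * keep_ratio))
    scores = x.norm(dim=1)                                   # (B, N)
    idx = scores.topk(k, dim=1).indices.sort(dim=1).values   # keep contour order
    return x.gather(2, idx.unsqueeze(1).expand(B, C, k))
```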
55. MARS: Mixed Virtual and Real Wearable Sensors for Human Activity Recognition with Multi-Domain Deep Learning Model [PDF] 返回目录
Ling Pei, Songpengcheng Xia, Lei Chu, Fanyi Xiao, Zixuan Zhang, Qi Wu, Wenxian Yu
Abstract: Human activity recognition (HAR) using wearable Inertial Measurement Unit (IMU) sensors is a promising technology for many research areas. Recently, deep learning-based methods have paved a new way of understanding and analyzing the complex data in HAR systems. However, the performance of these methods is mostly based on the quality and quantity of the collected data. In this paper, we innovatively propose to build a large database based on virtual IMUs and then address technical issues by introducing a multiple-domain deep learning framework consisting of three technical parts. In the first part, we propose to learn single-frame human activity from the noisy IMU data with hybrid convolutional neural networks (CNNs) in a semi-supervised form. In the second part, the extracted data features are fused according to the principle of uncertainty-aware consistency, which reduces the uncertainty by weighting the importance of the features. Transfer learning is performed in the last part, based on the newly released Archive of Motion Capture as Surface Shapes (AMASS) dataset, which contains abundant synthetic human poses; this enhances the variety and diversity of the training dataset and benefits the training and feature-transfer process of the proposed method. The efficiency and effectiveness of the proposed method have been demonstrated on the real Deep Inertial Poser (DIP) dataset. The experimental results show that the proposed methods can surprisingly converge within a few iterations and outperform all competing methods.
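One plausible reading of "fusing features by weighting their importance under uncertainty-aware consistency" is inverse-uncertainty weighting, sketched below; the paper's exact weighting scheme may differ, so treat this as an assumption:

```python
import torch

def uncertainty_weighted_fusion(feats, uncertainties, eps=1e-6):
    """Fuse feature tensors by weighting each with the inverse of its
    estimated uncertainty (an illustrative reading, not the paper's
    exact rule).
    feats: list of (B, D) tensors; uncertainties: list of (B, 1) tensors."""
    w = torch.stack([1.0 / (u + eps) for u in uncertainties])  # (M, B, 1)
    w = w / w.sum(dim=0, keepdim=True)                         # normalize
    f = torch.stack(feats)                                     # (M, B, D)
    return (w * f).sum(dim=0)
```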
56. DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition [PDF] 返回目录
Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
Abstract: Heterogeneous Face Recognition (HFR) refers to matching cross-domain faces and plays a crucial role in public security. Nevertheless, HFR is confronted with challenges from large domain discrepancy and insufficient heterogeneous data. In this paper, we formulate HFR as a dual generation problem, and tackle it via a novel Dual Variational Generation (DVG-Face) framework. Specifically, a dual variational generator is elaborately designed to learn the joint distribution of paired heterogeneous images. However, the small-scale paired heterogeneous training data may limit the identity diversity of sampling. With this in mind, we propose to integrate abundant identity information from large-scale VIS images into the joint distribution. Furthermore, a pairwise identity-preserving loss is imposed on the generated paired heterogeneous images to ensure their identity consistency. As a consequence, massive new diverse paired heterogeneous images with the same identity can be generated from noise. The identity consistency and diversity properties allow us to employ these generated images to train the HFR network via a contrastive learning mechanism, yielding both domain-invariant and discriminative embedding features. Concretely, the generated paired heterogeneous images are regarded as positive pairs, and the images obtained from different samplings are considered as negative pairs. Our method achieves superior performance over state-of-the-art methods on seven databases belonging to five HFR tasks, including NIR-VIS, Sketch-Photo, Profile-Frontal Photo, Thermal-VIS, and ID-Camera. The related code will be released at this https URL.
57. Transform Domain Pyramidal Dilated Convolution Networks For Restoration of Under Display Camera Images [PDF] 返回目录
Hrishikesh P.S., Densen Puthussery, Melvin Kuriakose, Jiji C.V
Abstract: An under-display camera (UDC) is a novel technology that can make the digital imaging experience in handheld devices seamless by providing a large screen-to-body ratio. UDC images are severely degraded owing to their positioning under a display screen. This work addresses the restoration of images degraded as a result of UDC imaging. Two different networks are proposed for the restoration of images taken with two types of UDC technologies. The first method uses a pyramidal dilated convolution within a wavelet-decomposed convolutional neural network for pentile-organic LED (P-OLED) based display systems. The second method employs pyramidal dilated convolution within a discrete cosine transform based dual-domain network to restore images taken using a transparent-organic LED (T-OLED) based UDC system. The first method produced restored images of very good quality and was the winning entry in the European Conference on Computer Vision (ECCV) 2020 challenge on image restoration for under-display cameras, Track 2 (P-OLED), evaluated based on PSNR and SSIM. The second method scored fourth position in Track 1 (T-OLED) of the challenge, evaluated based on the same metrics.
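The pyramidal dilated convolution that both networks share is a standard multi-rate block: parallel 3x3 convolutions with increasing dilation whose outputs are concatenated and fused, so one block sees several receptive fields at once. A generic PyTorch sketch (channel counts and dilation rates are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class PyramidalDilatedConv(nn.Module):
    """Parallel 3x3 convolutions with growing dilation rates,
    concatenated and fused back to the input width."""
    def __init__(self, in_ch, branch_ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d)
            for d in dilations
        ])
        self.fuse = nn.Conv2d(branch_ch * len(dilations), in_ch, 1)

    def forward(self, x):
        y = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.fuse(y) + x  # residual fusion
```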
58. Real-time Lane detection and Motion Planning in Raspberry Pi and Arduino for an Autonomous Vehicle Prototype [PDF] 返回目录
Alfa Rossi, Nadim Ahmed, Sultanus Salehin, Tashfique Hasnine Choudhury, Golam Sarowar
Abstract: This paper discusses a vehicle prototype that recognizes street lanes and plans its motion accordingly without any human input. A Pi Camera 1.3 captures real-time video, which is then processed by a Raspberry-Pi 3.0 Model B. The image processing algorithms are written in Python 3.7.4 with OpenCV 4.2. An Arduino Uno runs the PID algorithm that controls the motor controller, which in turn controls the wheels. The algorithms used to detect the lanes are the Canny edge detection algorithm and the Hough transformation. Elementary algebra is used to draw the detected lanes. After detection, the lanes are tracked using the Kalman filter prediction method. Then the midpoint of the two lanes is found, which gives the initial steering direction. This initial steering direction is further smoothed using the Past Accumulation Average Method and the Kalman Filter Prediction Method. The prototype was tested in a controlled environment in real-time. Results from comprehensive testing suggest that this prototype can detect road lanes and plan its motion successfully.
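The detection stage described here is the classic OpenCV Canny-plus-probabilistic-Hough pipeline and can be reproduced directly; the function name and thresholds below are typical values, not the authors' tuned parameters:

```python
import cv2
import numpy as np

def detect_lanes(frame):
    """Canny edge detection followed by probabilistic Hough line
    extraction, with the detected segments drawn on the frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blur, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=40, minLineLength=30, maxLineGap=20)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    return frame, lines
```

The steering direction would then be derived from the midpoint between the left and right lane segments, as the abstract describes.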
59. Deriving Visual Semantics from Spatial Context: An Adaptation of LSA and Word2Vec to generate Object and Scene Embeddings from Images [PDF] 返回目录
Matthias S. Treder, Juan Mayor-Torres, Christoph Teufel
Abstract: Embeddings are an important tool for the representation of word meaning. Their effectiveness rests on the distributional hypothesis: words that occur in the same context carry similar semantic information. Here, we adapt this approach to index visual semantics in images of scenes. To this end, we formulate a distributional hypothesis for objects and scenes: Scenes that contain the same objects (object context) are semantically related. Similarly, objects that appear in the same spatial context (within a scene or subregions of a scene) are semantically related. We develop two approaches for learning object and scene embeddings from annotated images. In the first approach, we adapt LSA and Word2vec's Skipgram and CBOW models to generate two sets of embeddings from object co-occurrences in whole images, one for objects and one for scenes. The representational space spanned by these embeddings suggests that the distributional hypothesis holds for images. In an initial application of this approach, we show that our image-based embeddings improve scene classification models such as ResNet18 and VGG-11 (3.72% improvement on Top5 accuracy, 4.56% improvement on Top1 accuracy). In the second approach, rather than analyzing whole images of scenes, we focus on co-occurrences of objects within subregions of an image. We illustrate that this method yields a sensible hierarchical decomposition of a scene into collections of semantically related objects. Overall, these results suggest that object and scene embeddings from object co-occurrences and spatial context yield semantically meaningful representations as well as computational improvements for downstream applications such as scene classification.
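The Skipgram adaptation amounts to treating each image's object annotations as one "sentence", so co-occurring objects become each other's context. A toy sketch with gensim 4.x (the object lists and hyperparameters are made up for illustration):

```python
from gensim.models import Word2Vec

# Each annotated image becomes a "sentence" of object labels, so the
# skip-gram context window covers the co-occurring objects.
image_object_lists = [
    ["person", "dog", "ball", "grass"],
    ["car", "road", "traffic_light", "person"],
    ["sofa", "tv", "remote", "person"],
]

model = Word2Vec(sentences=image_object_lists, vector_size=64,
                 window=10, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("person", topn=3))
```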
60. $\pi_t$ - Enhancing the Precision of Eye Tracking using Iris Feature Motion Vectors [PDF] 返回目录
Aayush K. Chaudhary, Jeff B. Pelz
Abstract: A new high-precision eye-tracking method has been demonstrated recently by tracking the motion of iris features rather than by exploiting pupil edges. While the method provides high precision, it suffers from temporal drift, an inability to track across blinks, and loss of texture matches in the presence of motion blur. In this work, we present a new methodology $\pi_t$ to address these issues by optimally combining the information from both iris textures and pupil edges. With this method, we show an improvement in precision (S2S-RMS & STD) of at least 48% and 10%, respectively, while fixating a series of small targets and following a smoothly moving target. Further, we demonstrate the capability to identify microsaccades between targets separated by 0.2 degrees.
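"Optimally combining" two noisy estimates is classically done by inverse-variance weighting; the sketch below illustrates that idea for the two gaze signals as a plausible reading, not the paper's exact fusion rule (the numbers are made up):

```python
import numpy as np

def fuse_gaze(iris_est, pupil_est, iris_var, pupil_var):
    """Inverse-variance weighting of two independent noisy estimates,
    the textbook 'optimal combination' (offered as an assumption
    about the fusion, not the paper's scheme)."""
    w_iris, w_pupil = 1.0 / iris_var, 1.0 / pupil_var
    return (w_iris * iris_est + w_pupil * pupil_est) / (w_iris + w_pupil)

print(fuse_gaze(np.array([1.02, 0.47]), np.array([1.10, 0.40]),
                0.01, 0.04))  # iris is trusted 4x more here
```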
61. Predicting Geographic Information with Neural Cellular Automata [PDF] 返回目录
Mingxiang Chen, Qichang Chen, Lei Gao, Yilin Chen, Zhecheng Wang
Abstract: This paper presents a novel framework using neural cellular automata (NCA) to regenerate and predict geographic information. The model extends the idea of using NCA to generate/regenerate a specific image by training the model with various geographic data; thus, taking a traffic condition map as an example, the model is able to predict traffic conditions given certain induction information. Our research verifies the analogy between NCAs and genes in biology, while the innovation of the model significantly widens the boundary of possible applications based on NCAs. Our experimental results show that the model has great potential in usability and versatility that is not available in previous studies. The code for the model implementation is available at https://redacted.
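For reference, the canonical NCA update step (after Mordvintsev et al.'s "Growing Neural Cellular Automata", the line of work this builds on) looks roughly like the sketch below; the channel count, fire rate, and Sobel-based perception filters are the standard choices, not necessarily this paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCA(nn.Module):
    """One neural-cellular-automaton step: fixed identity/Sobel
    perception, learned 1x1 update, stochastic residual application."""
    def __init__(self, channels=16, hidden=128):
        super().__init__()
        ident = torch.zeros(3, 3)
        ident[1, 1] = 1.0
        sx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8
        kernels = torch.stack([ident, sx, sx.t()])   # identity, Sobel-x, Sobel-y
        self.register_buffer(
            "filters", kernels.repeat(channels, 1, 1).unsqueeze(1))
        self.update = nn.Sequential(
            nn.Conv2d(channels * 3, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 1))
        self.channels = channels

    def forward(self, x, fire_rate=0.5):
        # Depthwise perception: every channel sees itself and its gradients.
        perceived = F.conv2d(x, self.filters, padding=1, groups=self.channels)
        dx = self.update(perceived)
        mask = (torch.rand_like(x[:, :1]) < fire_rate).float()
        return x + dx * mask  # stochastic residual cell update
```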
62. Dual-path CNN with Max Gated block for Text-Based Person Re-identification [PDF] 返回目录
Tinghuai Ma, Mingming Yang, Huan Rong, Yurong Qian, Yuan Tian, Najla Al-Nabhan
Abstract: Text-based person re-identification (Re-id) is an important task in video surveillance, which consists of retrieving the corresponding person's image from a large gallery given a textual description. It is difficult to directly match visual contents with textual descriptions due to the modality heterogeneity. On the one hand, the textual embeddings are not discriminative enough, which originates from the high abstraction of the textual descriptions. On the other hand, global average pooling (GAP) is commonly utilized to implicitly extract more general or smoothed features, but it ignores salient local features, which are more important for the cross-modal matching problem. With that in mind, a novel Dual-path CNN with Max Gated block (DCMG) is proposed to extract discriminative word embeddings and focus the visual-textual association on the remarkable features of both modalities. The proposed framework is based on two deep residual CNNs jointly optimized with cross-modal projection matching (CMPM) loss and cross-modal projection classification (CMPC) loss to embed the two modalities into a joint feature space. First, the pre-trained language model, BERT, is combined with the convolutional neural network (CNN) to learn better word embeddings in the text-to-image matching domain. Second, a global max pooling (GMP) layer is applied to make the visual-textual features focus more on the salient parts. To further alleviate the noise of the max-pooled features, the gated block (GB) is proposed to produce an attention map that focuses on meaningful features of both modalities. Finally, extensive experiments are conducted on the benchmark dataset, CUHK-PEDES, on which our approach achieves a rank-1 score of 55.81% and outperforms the state-of-the-art method by 1.3%.
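The GMP-plus-gating idea can be sketched compactly: take the per-channel global maximum, then let a small sigmoid-gated MLP down-weight noisy channels. The gate's bottleneck ratio is an illustrative assumption:

```python
import torch
import torch.nn as nn

class MaxGatedBlock(nn.Module):
    """Global max pooling keeps the most salient response per channel;
    a sigmoid gate then re-weights channels to suppress noisy
    max-pooled features (a sketch of the described idea)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),  # bottleneck ratio assumed
            nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, fmap):               # fmap: (B, C, H, W)
        pooled = fmap.amax(dim=(2, 3))     # global max pooling -> (B, C)
        return pooled * self.gate(pooled)  # gated descriptor
```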
63. Factorized Deep Generative Models for Trajectory Generation with Spatiotemporal-Validity Constraints [PDF] 返回目录
Liming Zhang, Liang Zhao, Dieter Pfoser
Abstract: Trajectory data generation is an important domain that characterizes the generative process of mobility data. Traditional methods heavily rely on predefined heuristics and distributions and are weak in learning unknown mechanisms. Inspired by the success of deep generative neural networks for images and texts, a fast-developing research topic is deep generative models for trajectory data, which can learn expressive explanatory models for sophisticated latent patterns. This is a nascent yet promising domain for many applications. We first propose novel deep generative models factorizing time-variant and time-invariant latent variables that characterize global and local semantics, respectively. We then develop new inference strategies based on variational inference and constrained optimization to encapsulate the spatiotemporal validity. New deep neural network architectures have been developed to implement the inference and generation models with newly-generalized latent variable priors. The proposed methods achieved significant improvements in quantitative and qualitative evaluations in extensive experiments.
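The factorization itself — one latent per trajectory and one per time step — can be sketched as a small variational encoder. The GRU backbone and dimensions are illustrative assumptions; the paper's inference strategy and validity constraints are not reproduced here:

```python
import torch
import torch.nn as nn

class FactorizedTrajectoryEncoder(nn.Module):
    """Sketch: a time-invariant latent for a whole trajectory plus a
    time-variant latent per step (sizes and the GRU are illustrative)."""
    def __init__(self, in_dim=2, hid=64, z_g=8, z_l=4):
        super().__init__()
        self.enc = nn.GRU(in_dim, hid, batch_first=True)
        self.to_global = nn.Linear(hid, 2 * z_g)  # mu, logvar per trajectory
        self.to_local = nn.Linear(hid, 2 * z_l)   # mu, logvar per step

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, dim=-1)       # reparameterization trick
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, traj):                      # traj: (B, T, 2)
        h, _ = self.enc(traj)                     # (B, T, hid)
        z_global = self.sample(self.to_global(h[:, -1]))  # (B, z_g)
        z_local = self.sample(self.to_local(h))           # (B, T, z_l)
        return z_global, z_local
```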
64. High-Resolution Augmentation for Automatic Template-Based Matching of Human Models [PDF] 返回目录
Riccardo Marin, Simone Melzi, Emanuele Rodolà, Umberto Castellani
Abstract: We propose a new approach for 3D shape matching of deformable human shapes. Our approach is based on the joint adoption of three different tools: an intrinsic spectral matching pipeline, a morphable model, and an extrinsic detail refinement. By operating in conjunction, these tools allow us to greatly improve the quality of the matching while at the same time resolving the key issues exhibited by each tool individually. In this paper we present an innovative High-Resolution Augmentation (HRA) strategy that enables highly accurate correspondence even in the presence of a significant mesh-resolution mismatch between the input shapes. This augmentation provides an effective workaround for the resolution limitations imposed by the adopted morphable model. The HRA, in its global and localized versions, represents a novel refinement strategy for surface subdivision methods. We demonstrate the accuracy of the proposed pipeline on multiple challenging benchmarks, and showcase its effectiveness in surface registration and texture transfer.
65. Features based Mammogram Image Classification using Weighted Feature Support Vector Machine [PDF] 返回目录
S. Kavitha, K.K. Thyagharajan
Abstract: In the existing research on mammogram image classification, either clinical data or image features of a specific type are considered along with supervised classifiers such as the Neural Network (NN) and Support Vector Machine (SVM). This paper considers automated classification of breast tissue type as benign or malignant using a Weighted Feature Support Vector Machine (WFSVM), constructing the precomputed kernel function by assigning more weight to relevant features using the principle of maximizing deviations. Initially, the MIAS dataset of mammogram images is divided into training and test sets; then preprocessing techniques such as noise removal and background removal are applied to the input images, and the Region of Interest (ROI) is identified. The statistical and texture features are extracted from the ROI, and the clinical features are obtained directly from the dataset. The extracted features of the training dataset are used to construct the weighted features and the precomputed linear kernel for training the WFSVM, from which the training model file is created. Using this model file, the kernel matrix of the test samples is classified as benign or malignant. This analysis shows that the texture features resulted in better accuracy than the other features with WFSVM and SVM. However, the number of support vectors created in WFSVM is smaller than in the SVM classifier.
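A precomputed weighted linear kernel maps directly onto scikit-learn's SVC(kernel='precomputed'). The sketch below uses random stand-in data and caller-supplied weights; the deviation-maximizing weight computation itself is paper-specific and not reproduced:

```python
import numpy as np
from sklearn.svm import SVC

def weighted_linear_kernel(A, B, w):
    """K(x, z) = sum_d w_d * x_d * z_d: a linear kernel with
    per-feature weights supplied by the caller."""
    return (A * w) @ B.T

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 10)), rng.integers(0, 2, 80)
X_test = rng.normal(size=(20, 10))
w = rng.uniform(0.1, 1.0, size=10)  # stand-in feature weights

clf = SVC(kernel="precomputed")
clf.fit(weighted_linear_kernel(X_train, X_train, w), y_train)  # (n_train, n_train) Gram
pred = clf.predict(weighted_linear_kernel(X_test, X_train, w))  # (n_test, n_train)
```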
66. Adversarial Consistent Learning on Partial Domain Adaptation of PlantCLEF 2020 Challenge [PDF] 返回目录
Youshan Zhang, Brian D. Davison
Abstract: Domain adaptation is one of the most crucial techniques for mitigating the domain shift problem, which arises when transferring knowledge from an abundantly labeled source domain to a target domain with few or no labels. Partial domain adaptation addresses the scenario in which the target categories are only a subset of the source categories. In this paper, to enable the efficient representation of cross-domain plant images, we first extract deep features from pre-trained models and then develop adversarial consistent learning ($ACL$) in a unified deep architecture for partial domain adaptation. It consists of a source domain classification loss, an adversarial learning loss, and a feature consistency loss. The adversarial learning loss can maintain domain-invariant features between the source and target domains. Moreover, the feature consistency loss can preserve the fine-grained feature transition between the two domains. We also find the shared categories of the two domains via down-weighting the irrelevant categories in the source domain. Experimental results demonstrate that training features from the NASNetLarge model with the proposed $ACL$ architecture yields promising results on the PlantCLEF 2020 Challenge.
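Down-weighting irrelevant source categories is commonly implemented by averaging the target samples' softmax predictions over the source classes: categories the target data never activates receive near-zero weight in the source classification loss. A sketch consistent with the abstract's description (the averaging recipe is a common partial-DA pattern, not necessarily this paper's exact formula):

```python
import torch

def source_class_weights(target_probs):
    """Average softmaxed target predictions over all source classes;
    classes never activated by target data get weights near zero.
    target_probs: (num_target_samples, num_source_classes)."""
    w = target_probs.mean(dim=0)
    return w / w.max()  # normalize to [0, 1]

# Usage sketch: per-class weights plugged into the source CE loss, e.g.
# loss = torch.nn.CrossEntropyLoss(weight=source_class_weights(p_t))
```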
67. Subverting Privacy-Preserving GANs: Hiding Secrets in Sanitized Images [PDF] 返回目录
Kang Liu, Benjamin Tan, Siddharth Garg
Abstract: Unprecedented data collection and sharing have exacerbated privacy concerns and led to increasing interest in privacy-preserving tools that remove sensitive attributes from images while maintaining useful information for other tasks. Currently, state-of-the-art approaches use privacy-preserving generative adversarial networks (PP-GANs) for this purpose, for instance, to enable reliable facial expression recognition without leaking users' identity. However, PP-GANs do not offer formal proofs of privacy and instead rely on experimentally measuring information leakage using the classification accuracy of deep learning (DL)-based discriminators on the sensitive attributes. In this work, we question the rigor of such checks by subverting existing privacy-preserving GANs for facial expression recognition. We show that it is possible to hide the sensitive identification data in the sanitized output images of such PP-GANs for later extraction, which can even allow for reconstruction of the entire input images, while satisfying privacy checks. We demonstrate our approach via a PP-GAN-based architecture and provide qualitative and quantitative evaluations using two public datasets. Our experimental results raise fundamental questions about the need for more rigorous privacy checks of PP-GANs, and we provide insights into their social impact.
68. Making Images Undiscoverable from Co-Saliency Detection [PDF] Back to contents
Ruijun Gao, Qing Guo, Felix Juefei-Xu, Hongkai Yu, Xuhong Ren, Wei Feng, Song Wang
Abstract: In recent years, co-saliency object detection (CoSOD) has achieved significant progress and played a key role in retrieval-related tasks, e.g., image retrieval and video foreground detection. Nevertheless, it also inevitably poses a totally new safety and security problem, i.e., how to prevent high-profile and personal-sensitive contents from being extracted by the powerful CoSOD methods. In this paper, we address this problem from the perspective of adversarial attack and identify a novel task, i.e., adversarial co-saliency attack: given an image selected from an image group containing some common and salient objects, how to generate an adversarial version that can mislead CoSOD methods into predicting incorrect co-salient regions. Note that, compared with general adversarial attacks for classification, this new task introduces two extra challenges for existing whitebox adversarial noise attacks: (1) low success rate due to the diverse appearance of images in the image group; (2) low transferability across CoSOD methods due to the considerable difference between CoSOD pipelines. To address these challenges, we propose the very first blackbox joint adversarial exposure & noise attack (Jadena), where we jointly and locally tune the exposure and additive perturbations of the image according to a newly designed high-feature-level contrast-sensitive loss function. Our method, without any information about the state-of-the-art CoSOD methods, leads to significant performance degradation on various co-saliency detection datasets and makes the co-salient objects undetectable, which can be highly practical nowadays when large-scale personal photos are shared on the internet and should be properly and securely preserved.
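The attack jointly tunes a multiplicative exposure map and an additive noise map. The sketch below shows only this parameterization, optimized against a stand-in surrogate loss with gradients; the actual Jadena attack is blackbox and uses the newly designed contrast-sensitive loss, neither of which is reproduced here.

```python
import torch

def joint_exposure_noise_attack(x, surrogate_loss, steps=50, lr=0.01):
    """Illustrative parameterization of a joint exposure-and-noise attack:
    x_adv = clip(exposure * x + noise). `surrogate_loss` is a stand-in;
    this white-box gradient loop only shows the shape of the search space."""
    exposure = torch.ones_like(x, requires_grad=True)  # multiplicative map
    noise = torch.zeros_like(x, requires_grad=True)    # additive map
    opt = torch.optim.Adam([exposure, noise], lr=lr)
    for _ in range(steps):
        x_adv = (exposure * x + noise).clamp(0.0, 1.0)
        loss = -surrogate_loss(x_adv)  # ascend: degrade co-saliency output
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (exposure * x + noise).clamp(0.0, 1.0).detach()
```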
69. City-Scale Visual Place Recognition with Deep Local Features Based on Multi-Scale Ordered VLAD Pooling [PDF] Back to contents
Duc Canh Le, Chan Hyun Youn
Abstract: Visual place recognition is the task of recognizing a place depicted in an image based on its pure visual appearance, without metadata. In visual place recognition, the challenges lie not only in the changes in lighting conditions, camera viewpoint, and scale, but also in the characteristics of scene-level images and the distinct features of the area. To resolve these challenges, one must consider both the local discriminativeness and the global semantic context of images. On the other hand, the diversity of the datasets is also particularly important to develop more general models and advance the progress of the field. In this paper, we present a fully automated system for place recognition at a city scale based on content-based image retrieval. Our main contributions to the community lie in three aspects. Firstly, we take a comprehensive analysis of visual place recognition and sketch out the unique challenges of the task compared to general image retrieval tasks. Next, we propose a simple pooling approach on top of convolutional neural network activations to embed the spatial information into the image representation vector. Finally, we introduce new datasets for place recognition, which are particularly essential for application-based research. Furthermore, through extensive experiments, various issues in both image retrieval and place recognition are analyzed and discussed to give some insights for improving the performance of retrieval models in reality.
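For readers unfamiliar with VLAD, the aggregation at the core of such pooling can be sketched as follows: local descriptors are assigned to precomputed cluster centers, residuals are summed per center, and the result is normalized. The paper's multi-scale ordered variant adds spatial ordering on top of this basic step, which the sketch omits.

```python
import numpy as np

def vlad(features, centers):
    """Plain VLAD aggregation of local descriptors (N x D) against
    K cluster centers (K x D): sum residuals per assigned center,
    then intra-normalize and L2-normalize the flattened result."""
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(1)                      # hard assignment
    v = np.zeros_like(centers)
    for k in range(centers.shape[0]):
        members = features[assign == k]
        if len(members):
            v[k] = (members - centers[k]).sum(0)  # residual sum
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12  # intra-norm
    v = v.flatten()
    return v / (np.linalg.norm(v) + 1e-12)        # global L2-norm
```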
70. EfficientDeRain: Learning Pixel-wise Dilation Filtering for High-Efficiency Single-Image Deraining [PDF] Back to contents
Qing Guo, Jingyang Sun, Felix Juefei-Xu, Lei Ma, Xiaofei Xie, Wei Feng, Yang Liu
Abstract: Single-image deraining is rather challenging due to the unknown rain model. Existing methods often make specific assumptions about the rain model, which can hardly cover the many diverse circumstances in the real world, forcing them to employ complex optimization or progressive refinement. This, however, significantly affects these methods' efficiency and effectiveness for many efficiency-critical applications. To fill this gap, in this paper, we regard single-image deraining as a general image-enhancing problem and propose a model-free deraining method, i.e., EfficientDeRain, which is able to process a rainy image within 10 ms (i.e., around 6 ms on average), over 80 times faster than the state-of-the-art method (i.e., RCDNet), while achieving similar de-rain effects. We first propose novel pixel-wise dilation filtering. In particular, a rainy image is filtered with pixel-wise kernels estimated by a kernel prediction network, by which suitable multi-scale kernels for each pixel can be efficiently predicted. Then, to eliminate the gap between synthetic and real data, we further propose an effective data augmentation method (i.e., RainMix) that helps to train the network to handle real rainy images. We perform a comprehensive evaluation on both synthetic and real-world rainy datasets to demonstrate the effectiveness and efficiency of our method. We release the model and code at this https URL.
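The key operation, applying a separately predicted kernel at every pixel over a dilated neighborhood, can be sketched naively as below; the kernel prediction network, the multi-scale kernel selection, and any efficient batched implementation are omitted, and the interface is our assumption.

```python
import numpy as np

def pixelwise_dilated_filter(img, kernels, dilation=1):
    """Apply a separate predicted k x k kernel at every pixel, sampling
    the neighborhood with the given dilation. `kernels` has shape
    (H, W, k, k) and would come from a kernel prediction network.
    Single-channel float image assumed; edges handled by zero padding."""
    H, W = img.shape
    k = kernels.shape[2]
    r = (k // 2) * dilation
    padded = np.pad(img, r)
    out = np.zeros_like(img)
    for i in range(H):
        for j in range(W):
            patch = padded[i : i + 2*r + 1 : dilation,
                           j : j + 2*r + 1 : dilation]
            out[i, j] = (patch * kernels[i, j]).sum()
    return out
```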
71. AAA: Adaptive Aggregation of Arbitrary Online Trackers with Theoretical Performance Guarantee [PDF] Back to contents
Heon Song, Daiki Suehiro, Seiichi Uchida
Abstract: For visual object tracking, it is difficult to realize an almighty online tracker due to the huge variations of target appearance depending on an image sequence. This paper proposes an online tracking method that adaptively aggregates arbitrary multiple online trackers. The performance of the proposed method is theoretically guaranteed to be comparable to that of the best tracker for any image sequence, although the best expert is unknown during tracking. The experimental study on the large variations of benchmark datasets and aggregated trackers demonstrates that the proposed method can achieve state-of-the-art performance. The code is available at this https URL.
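Aggregating experts with a multiplicative-weights update is the classical construction behind regret guarantees of this kind. The sketch below is the textbook exponentially weighted forecaster adapted to bounding boxes, not the exact AAA update, which additionally has to cope with delayed and partial feedback in tracking.

```python
import numpy as np

def aggregate_trackers(predictions, weights, losses=None, eta=0.5):
    """One step of exponentially weighted aggregation over M trackers.
    `predictions` is (M, 4) bounding boxes, `weights` the current expert
    weights; `losses` (M,) are per-tracker losses from feedback, if any."""
    if losses is not None:
        weights = weights * np.exp(-eta * losses)   # multiplicative update
        weights /= weights.sum()
    box = (weights[:, None] * predictions).sum(0)   # weighted combination
    return box, weights
```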
72. Combining Shape Features with Multiple Color Spaces in Open-Ended 3D Object Recognition [PDF] Back to contents
Nils Keunecke, S. Hamidreza Kasaei
Abstract: As a consequence of an ever-increasing number of camera-based service robots, there is a growing demand for highly accurate real-time 3D object recognition. Considering the expansion of robot applications in more complex and dynamic environments, it is evident that it is impossible to pre-program all possible object categories. Robots will have to be able to learn new object categories in the field. The network architecture proposed in this work expands on the OrthographicNet, an approach recently proposed by Kasaei et al., using a deep transfer learning strategy which not only meets the aforementioned requirements but additionally generates a scale- and rotation-invariant reference frame for the classification of objects. In its current iteration, the OrthographicNet uses only shape information. With the addition of multiple color spaces, the upgraded network architecture proposed here can achieve an even higher descriptiveness while simultaneously increasing the robustness of predictions for similarly shaped objects. Multiple color space combinations and network architectures are evaluated to find the most descriptive system. However, this performance increase is not achieved at the cost of longer processing times, because any system deployed in robotic applications will need the ability to provide real-time information about its environment. Experimental results show that the proposed network architecture ranks competitively among other state-of-the-art algorithms.
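One simple way to combine color cues across spaces, concatenating normalized per-channel histograms from several OpenCV conversions, is sketched below; which color-space combination actually works best for open-ended 3D object recognition is precisely what the paper evaluates, so the fixed set here is only an example.

```python
import cv2
import numpy as np

def multi_colorspace_histograms(bgr_img, bins=16):
    """Concatenate normalized per-channel histograms computed in several
    color spaces; the resulting vector could be appended to a shape
    descriptor. The choice of spaces here is illustrative only."""
    spaces = [bgr_img,
              cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV),
              cv2.cvtColor(bgr_img, cv2.COLOR_BGR2LAB),
              cv2.cvtColor(bgr_img, cv2.COLOR_BGR2YCrCb)]
    feats = []
    for img in spaces:
        for c in range(3):
            h, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
            feats.append(h / (h.sum() + 1e-12))    # normalized histogram
    return np.concatenate(feats)
```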
73. Adversarial Exposure Attack on Diabetic Retinopathy Imagery [PDF] Back to contents
Yupeng Cheng, Felix Juefei-Xu, Qing Guo, Huazhu Fu, Xiaofei Xie, Shang-Wei Lin, Weisi Lin, Yang Liu
Abstract: Diabetic retinopathy (DR) is a leading cause of vision loss in the world, and numerous cutting-edge works have built powerful deep neural networks (DNNs) to automatically classify DR cases via retinal fundus images (RFIs). However, RFIs are usually affected by widely existing camera exposure, while the robustness of DNNs to such exposure is rarely explored. In this paper, we study this problem from the viewpoint of adversarial attack and identify a totally new task, i.e., adversarial exposure attack, which generates adversarial images by tuning image exposure to mislead DNNs with significantly high transferability. To this end, we first implement a straightforward method, i.e., a multiplicative-perturbation-based exposure attack, and reveal the big challenges of this new task. Then, to make the adversarial images natural, we propose adversarial bracketed exposure fusion, which regards the exposure attack as an element-wise bracketed exposure fusion problem in the Laplacian-pyramid space. Moreover, to realize high transferability, we further propose convolutional bracketed exposure fusion, where the element-wise multiplicative operation is extended to convolution. We validate our method on a real public DR dataset with advanced DNNs, e.g., ResNet50, MobileNet, and EfficientNet, showing our method can achieve high image quality and a high success rate of the transfer attack. Our method reveals the potential threats to DNN-based DR automated diagnosis and can definitely benefit the development of exposure-robust automated DR diagnosis methods in the future.
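The "straightforward method" mentioned above, a multiplicative perturbation of exposure, can be written as a constrained maximization: given a fundus image $I$, a classifier $f$ with loss $\mathcal{L}$, and true label $y$, find an element-wise exposure map $E$ that fools the classifier while staying close to neutral exposure. The specific $\ell_\infty$ constraint below is our assumption for illustration:

$$\tilde{I} = \operatorname{clip}\big(E \odot I\big), \qquad \max_{E}\ \mathcal{L}\big(f(\tilde{I}),\, y\big) \quad \text{s.t.} \quad \lVert E - \mathbf{1} \rVert_\infty \le \epsilon.$$

The bracketed exposure fusion variants then replace the raw map $E$ with fused multi-exposure components in the Laplacian-pyramid space to keep $\tilde{I}$ natural-looking.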
74. FakeRetouch: Evading DeepFakes Detection via the Guidance of Deliberate Noise [PDF] Back to contents
Yihao Huang, Felix Juefei-Xu, Qing Guo, Xiaofei Xie, Lei Ma, Weikai Miao, Yang Liu, Geguang Pu
Abstract: The novelty and creativity of DeepFake generation techniques have attracted worldwide media attention. Many researchers focus on detecting fake images produced by these GAN-based image generation methods, with fruitful results, indicating that the GAN-based image generation methods are not yet perfect. Many studies show that the upsampling procedure used in the decoder of GAN-based image generation methods inevitably introduces artifact patterns into fake images. In order to further improve the fidelity of DeepFake images, in this work, we propose a simple yet powerful framework to reduce the artifact patterns of fake images without hurting image quality. The method is based on an important observation that adding noise to a fake image can successfully reduce the artifact patterns in both the spatial and frequency domains. Thus we use a combination of additive noise and deep image filtering to reconstruct the fake images, and we name our method FakeRetouch. The deep image filtering provides a specialized filter for each pixel in the noisy image, taking full advantage of deep learning. The deeply filtered images retain very high fidelity to their DeepFake counterparts. Moreover, we use the semantic information of the image to generate an adversarial guidance map to add noise intelligently. Our method aims at improving the fidelity of DeepFake images and exposing the problems of existing DeepFake detection methods, and we hope that the found vulnerabilities can help improve future generations of DeepFake detection methods.
75. Neural Architecture Search Using Stable Rank of Convolutional Layers [PDF] Back to contents
Kengo Machida, Kuniaki Uto, Koichi Shinoda, Taiji Suzuki
Abstract: In Neural Architecture Search (NAS), Differentiable ARchiTecture Search (DARTS) has recently attracted much attention due to its high efficiency. It defines an over-parameterized network with mixed edges, each of which represents all operator candidates, and jointly optimizes the weights of the network and its architecture in an alternating way. However, this process prefers a model whose weights converge faster than the others, and such a model with the fastest convergence often leads to overfitting. Accordingly, the resulting model cannot always be well generalized. To overcome this problem, we propose Minimum Stable Rank DARTS (MSR-DARTS), which aims to find a model with the best generalization error by replacing the architecture optimization with a selection process using the minimum stable rank criterion. Specifically, a convolution operator is represented by a matrix, and our method chooses the one whose stable rank is the smallest. We evaluate MSR-DARTS on the CIFAR-10 and ImageNet datasets. It achieves an error rate of 2.92% with only 1.7M parameters within 0.5 GPU-days on CIFAR-10, and a top-1 error rate of 24.0% on ImageNet. Our MSR-DARTS directly optimizes an ImageNet model with only 2.6 GPU-days, whereas it is often impractical for existing NAS methods to directly optimize a large model such as an ImageNet model, so a proxy dataset such as CIFAR-10 is often utilized instead.
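The selection criterion has a simple closed form: the stable rank of a matrix $A$ is $\lVert A \rVert_F^2 / \lVert A \rVert_2^2$, the squared Frobenius norm over the squared spectral norm. A sketch for a convolutional weight is below; the matricization of the 4-D weight tensor is one common choice and may differ from the paper's.

```python
import numpy as np

def stable_rank(conv_weight):
    """Stable rank of a convolutional layer's weight, viewed as a matrix:
    srank(A) = ||A||_F^2 / ||A||_2^2. Reshaping (out, in, kh, kw) ->
    (out, in*kh*kw) is one common matricization."""
    A = conv_weight.reshape(conv_weight.shape[0], -1)
    fro2 = (A ** 2).sum()                  # squared Frobenius norm
    spec = np.linalg.norm(A, ord=2)        # largest singular value
    return fro2 / (spec ** 2)
```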
76. It's Raining Cats or Dogs? Adversarial Rain Attack on DNN Perception [PDF] Back to contents
Liming Zhai, Felix Juefei-Xu, Qing Guo, Xiaofei Xie, Lei Ma, Wei Feng, Shengchao Qin, Yang Liu
Abstract: Rain is a common phenomenon in nature and an essential factor for many deep neural network (DNN) based perception systems. Rain can often pose inevitable threats that must be carefully addressed, especially in the context of safety- and security-sensitive scenarios (e.g., autonomous driving). Therefore, a comprehensive investigation of the potential risks of rain to a DNN is of great importance. Unfortunately, in practice, it is often rather difficult to collect or synthesize rainy images that can represent all raining situations that possibly occur in the real world. To this end, in this paper, we start from a new perspective and propose to combine two totally different studies, i.e., rainy image synthesis and adversarial attack. We present an adversarial rain attack, with which we could simulate various rainy situations with the guidance of deployed DNNs and reveal the potential threat factors that can be brought by rain, helping to develop more rain-robust DNNs. In particular, we propose a factor-aware rain generation that simulates rain streaks according to the camera exposure process and models the learnable rain factors for adversarial attack. With this generator, we further propose the adversarial rain attack against image classification and object detection, where the rain factors are guided by the various DNNs. As a result, it enables a comprehensive study of the impacts of the rain factors on DNNs. Our large-scale evaluation on three datasets, i.e., NeurIPS'17 DEV, MS COCO and KITTI, demonstrates that our synthesized rainy images can not only present visually realistic appearances, but also exhibit strong adversarial capability, which builds the foundation for further rain-robust perception studies.
77. Weak-shot Fine-grained Classification via Similarity Transfer [PDF] Back to contents
Junjie Chen, Li Niu, Liu Liu, Liqing Zhang
Abstract: Recognizing fine-grained categories remains a challenging task, due to the subtle distinctions among different subordinate categories, which results in the need for abundant annotated samples. To alleviate the data-hungry problem, we consider the problem of learning novel categories from web data with the support of a clean set of base categories, which is referred to as weak-shot learning. Under this setting, we propose to transfer pairwise semantic similarity from base categories to novel categories, because this similarity is highly transferable and beneficial for learning from web data. Specifically, we first train a similarity net on clean data, and then employ two simple yet effective strategies to leverage the transferred similarity to denoise web training data. In addition, we apply an adversarial loss on the similarity net to enhance the transferability of similarity. Comprehensive experiments on three fine-grained datasets demonstrate that a clean base set can dramatically facilitate webly supervised learning and that similarity transfer is effective under this setting.
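The abstract does not spell out the two denoising strategies, but one plausible use of a transferred similarity net is to score each web image by its mean predicted similarity to the other images of its (noisy) class and drop low scorers. The sketch below is that hypothetical strategy only, with the similarity net's interface assumed.

```python
import torch

def similarity_weights(features, sim_net, tau=0.5):
    """Down-weight likely-noisy web images within one class using a
    similarity net trained on clean base categories. `features` is
    (n, d); `sim_net` is assumed to map a concatenated pair (1, 2d)
    to a similarity logit."""
    n = features.size(0)
    sims = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            if i != j:
                pair = torch.cat([features[i], features[j]]).unsqueeze(0)
                sims[i, j] = torch.sigmoid(sim_net(pair)).item()
    scores = sims.sum(1) / (n - 1)          # mean similarity to peers
    return (scores > tau).float()           # keep / drop (hard weights)
```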
78. Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification [PDF] Back to contents
Sheng Wan, Chen Gong, Shirui Pan, Jie Yang, Jian Yang
Abstract: Nowadays, deep learning methods, especially the Graph Convolutional Network (GCN), have shown impressive performance in hyperspectral image (HSI) classification. However, the current GCN-based methods treat graph construction and image classification as two separate tasks, which often results in suboptimal performance. Another defect of these methods is that they mainly focus on modeling the local pairwise importance between graph nodes while lacking the capability to capture the global contextual information of HSI. In this paper, we propose a Multi-level GCN with an Automatic Graph Learning method (MGCN-AGL) for HSI classification, which can automatically learn the graph information at both local and global levels. By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions, which helps encode the spatial context to form the graph information at the local level. Moreover, we utilize multiple pathways for local-level graph convolution, in order to leverage the merits of the diverse spatial context of HSI and to enhance the expressive power of the generated representations. To reconstruct the global contextual relations, our MGCN-AGL encodes the long-range dependencies among image regions based on the expressive representations that have been produced at the local level. Then inference can be performed along the reconstructed graph edges connecting faraway regions. Finally, the multi-level information is adaptively fused to generate the network output. By this means, graph learning and image classification can be integrated into a unified framework and benefit each other. Extensive experiments have been conducted on three real-world hyperspectral datasets, on which our method is shown to outperform the state-of-the-art methods.
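The building block underneath such a network is the standard normalized graph convolution, $H' = \sigma(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)$, where $\hat{D}$ is the degree matrix of $A + I$. A minimal NumPy sketch is below; the paper's contribution lies in learning the adjacency $A$ automatically and stacking such convolutions at local and global levels, none of which this sketch covers.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: A is the (n x n) adjacency,
    H the (n x d) node features, W a (d x d') weight matrix."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))       # symmetric normalization
    H_new = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_new, 0)                  # ReLU
```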
79. A Review of Visual Odometry Methods and Its Applications for Autonomous Driving [PDF] Back to contents
Kai Li Lim, Thomas Bräunl
Abstract: Research into autonomous driving applications has seen an increase in computer vision-based approaches in recent years. In attempts to develop exclusively vision-based systems, visual odometry is often considered a key element for achieving motion estimation and self-localisation, in place of wheel odometry or inertial measurements. This paper presents a recent review of methods that are pertinent to visual odometry, with an emphasis on autonomous driving. The review covers visual odometry in its monocular, stereoscopic and visual-inertial forms, individually presenting them with analyses related to their applications. Discussions are drawn to outline the problems faced in the current state of research and to summarise the works reviewed. The paper concludes with future work suggestions to aid prospective developments in visual odometry.
80. ENAS4D: Efficient Multi-stage CNN Architecture Search for Dynamic Inference [PDF] Back to contents
Zhihang Yuan, Xin Liu, Bingzhe Wu, Guangyu Sun
Abstract: Dynamic inference is a feasible way to reduce the computational cost of a convolutional neural network (CNN), as it can dynamically adjust the computation for each input sample. One of the ways to achieve dynamic inference is to use a multi-stage neural network, which contains a sub-network with a prediction layer at each stage. The inference of an input sample can exit at an early stage if that stage's prediction is confident enough. However, designing a multi-stage CNN architecture is a non-trivial task. In this paper, we introduce a general framework, ENAS4D, which can efficiently search for optimal multi-stage CNN architectures for dynamic inference in a well-designed search space. Firstly, we propose a method to construct the search space with multi-stage convolution. The search space includes different numbers of layers, different kernel sizes and different numbers of channels for each stage, as well as the resolution of input samples. Then, we train a once-for-all network that supports sampling diverse multi-stage CNN architectures. A specialized multi-stage network can be obtained from the once-for-all network without additional training. Finally, we devise a method to efficiently search for the optimal multi-stage network that trades accuracy off against computational cost, taking advantage of the once-for-all network. The experiments on the ImageNet classification task demonstrate that the multi-stage CNNs searched by ENAS4D consistently outperform the state-of-the-art methods for dynamic inference. In particular, the network achieves 74.4% ImageNet top-1 accuracy under 185M average MACs.
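Dynamic inference with a multi-stage network typically runs stages in order and exits once a stage's prediction is confident enough. A minimal sketch of that control flow, with the stage interface and the softmax-confidence criterion as assumptions:

```python
import torch
import torch.nn.functional as F

def multi_stage_inference(x, stages, threshold=0.9):
    """Run a cascade of sub-networks on a single sample (batch size 1)
    and exit as soon as a stage's softmax confidence clears the
    threshold; otherwise fall through to the last stage."""
    feat = x
    for stage in stages:
        feat, logits = stage(feat)          # each stage refines features
        probs = F.softmax(logits, dim=-1)   # and emits a prediction
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:
            return pred, conf               # confident: exit early
    return pred, conf
```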
81. Recognizing Micro-expression in Video Clip with Adaptive Key-frame Mining [PDF] Back to contents
Min Peng, Chongyang Wang, Yuan Gao, Tao Bi, Tong Chen, Yu Shi, Xiang-Dong Zhou
Abstract: As a spontaneous expression of emotion on the face, the micro-expression is receiving increasing attention. Whilst better recognition accuracy is achieved by various deep learning (DL) techniques, one characteristic of micro-expressions has not been fully leveraged. That is, such facial movement is transient and sparsely localized through time. Therefore, the representation learned from a long video clip is usually redundant. On the other hand, methods utilizing the single apex frame require manual annotations and sacrifice temporal dynamic information. To simultaneously spot and recognize such fleeting facial movement, we propose a novel end-to-end deep learning architecture, referred to as the Adaptive Key-frame Mining Network (AKMNet). Operating on the raw video clip of a micro-expression, AKMNet is able to learn a discriminative spatio-temporal representation by combining the spatial features of self-exploited local key frames and their global temporal dynamics. Empirical and theoretical evaluations show the advantages of the proposed approach, with improved performance compared with other state-of-the-art methods.
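A plausible reading of adaptive key-frame mining is attention-based frame scoring followed by selective pooling; a sketch under that assumption is below, and it should not be taken as the actual AKMNet selection or fusion mechanics.

```python
import torch
import torch.nn.functional as F

def attention_keyframes(frame_feats, scorer, top_k=4):
    """Score every frame of a clip with a small network, keep the top-k
    as 'key frames', and pool them by renormalized attention weights.
    `frame_feats` is (T, d); `scorer` maps (T, d) -> (T, 1)."""
    scores = scorer(frame_feats).squeeze(-1)        # (T,) frame scores
    weights = F.softmax(scores, dim=0)
    top = torch.topk(weights, k=min(top_k, weights.numel())).indices
    selected = frame_feats[top]
    w = F.softmax(scores[top], dim=0).unsqueeze(-1) # renormalize over top-k
    return (w * selected).sum(0)                    # weighted pooling
```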
82. BargainNet: Background-Guided Domain Translation for Image Harmonization [PDF] Back to contents
Wenyan Cong, Li Niu, Jianfu Zhang, Jing Liang, Liqing Zhang
Abstract: Image composition is a fundamental operation in the image editing field. However, an unharmonious foreground and background downgrade the quality of a composite image. Image harmonization, which adjusts the foreground to improve its consistency with the background, is an essential yet challenging task. Previous deep-learning-based methods mainly focus on directly learning the mapping from a composite image to a real image, while ignoring the crucial guidance role that the background plays. In this work, with the assumption that the foreground needs to be translated to the same domain as the background, we formulate the image harmonization task as background-guided domain translation. Therefore, we propose an image harmonization network with a novel domain code extractor and well-tailored triplet losses, which can capture the background domain information to guide foreground harmonization. Extensive experiments on the existing image harmonization benchmark demonstrate the effectiveness of our proposed method.
83. Introspective Learning by Distilling Knowledge from Online Self-explanation [PDF]
Jindong Gu, Zhiliang Wu, Volker Tresp
Abstract: In recent years, many explanation methods have been proposed to explain the individual classifications of deep neural networks. However, how to leverage the created explanations to improve the learning process has been less explored. As privileged information, the explanations of a model can be used to guide the learning process of the model itself. In the community, another intensively investigated form of privileged information used to guide the training of a model is the knowledge from a powerful teacher model. The goal of this work is to leverage self-explanation to improve the learning process by borrowing ideas from knowledge distillation. We start by investigating the effective components of the knowledge transferred from the teacher network to the student network. Our investigation reveals that both the responses in non-ground-truth classes and the class-similarity information in the teacher's outputs contribute to the success of knowledge distillation. Motivated by this conclusion, we propose an implementation of introspective learning by distilling knowledge from online self-explanations. The models trained with the introspective learning procedure outperform those trained with the standard learning procedure, as well as those trained with different regularization methods. Compared to models that learn from peer networks or teacher networks, our models also show competitive performance and require neither peers nor teachers.
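The distillation ingredient the abstract refers to is the classic soft-target loss, sketched below: matching softened teacher probabilities transfers the responses in non-ground-truth classes and the class-similarity structure. In the paper, the teacher signal comes from the model's own online self-explanations rather than a separate teacher network.

```python
# Standard soft-target distillation loss; the paper's teacher signal differs.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

s, t = torch.randn(8, 10), torch.randn(8, 10)
print(distillation_loss(s, t).item())
```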
84. An Efficient Language-Independent Multi-Font OCR for Arabic Script [PDF]
Hussein Osman, Karim Zaghw, Mostafa Hazem, Seifeldin Elsehely
Abstract: Optical Character Recognition (OCR) is the process of extracting digitized text from images of scanned documents. While OCR systems have already matured in many languages, they still have shortcomings in cursive languages with overlapping letters, such as Arabic. This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as input and generates a corresponding digital document. Our Arabic OCR system consists of the following modules: Pre-processing, Word-level Feature Extraction, Character Segmentation, Character Recognition, and Post-processing. This paper also proposes an improved font-independent character segmentation algorithm that outperforms state-of-the-art segmentation algorithms. Lastly, the paper proposes a neural network model for the character recognition task. The system was evaluated on several open Arabic corpora, achieving an average character segmentation accuracy of 98.06%, character recognition accuracy of 99.89%, and overall system accuracy of 97.94%, outstanding results compared to state-of-the-art Arabic OCR systems.
85. Holistic Grid Fusion Based Stop Line Estimation [PDF]
Runsheng Xu, Faezeh Tafazzoli, Li Zhang, Timo Rehfeld, Gunther Krehl, Arunava Seal
Abstract: Intersection scenarios provide the most complex traffic situations for Autonomous Driving and Driving Assistance Systems. Knowing where to stop in advance at an intersection is an essential parameter in controlling the longitudinal velocity of the vehicle. Most of the existing methods in the literature solely use cameras to detect stop lines, which is typically not sufficient in terms of detection range. To address this issue, we propose a method that takes advantage of fused multi-sensory data, including stereo camera and lidar, as input and utilizes a carefully designed convolutional neural network architecture to detect stop lines. Our experiments show that the proposed approach can improve detection range compared to camera data alone, works under heavy occlusion without explicitly observing the ground markings, is able to predict stop lines for all lanes, and allows detection at a distance of up to 50 meters.
86. Psoriasis Severity Assessment with a Similarity-Clustering Machine Learning Approach Reduces Intra- and Inter-observation variation [PDF]
Arman Garakani, Martin Malmstedt-Miller, Ionela Manole, Adrian Y. Rossler, John R. Zibert
Abstract: Psoriasis is a complex disease with many variations in genotype and phenotype. General advancements in medicine have further complicated both assessment and treatment for physicians and dermatologists alike. Even with all of our technological progress, we still primarily use the Psoriasis Area and Severity Index (PASI), an assessment tool developed in the 1970s, for severity assessments. In this study we evaluate a method involving digital images, a comparison web application, and similarity clustering, developed to improve the assessment tool in terms of intra- and inter-observer variation. Images of patients were collected from a mobile device. Images of the same lesion area were captured approximately 1 week apart. Five dermatologists evaluated the severity of psoriasis by modified PASI (mPASI), absolute scoring, and relative pairwise PASI scoring using similarity clustering, conducted with a web program displaying two images at a time. mPASI scoring of single photos by the same or a different dermatologist showed mPASI rating consistency of 50% to 80%. Repeated mPASI comparison using similarity clustering showed consistent mPASI ratings > 95%. The Pearson correlation between absolute scoring and pairwise scoring progression was 0.72.
87. Grasp-type Recognition Leveraging Object Affordance [PDF]
Naoki Wake, Kazuhiro Sasabuchi, Katsushi Ikeuchi
Abstract: A key challenge in robot teaching is grasp-type recognition with a single RGB image and a target object name. Here, we propose a simple yet effective pipeline to enhance learning-based recognition by leveraging a prior distribution of grasp types for each object. In the pipeline, a convolutional neural network (CNN) recognizes the grasp type from an RGB image. The recognition result is further corrected using the prior distribution (i.e., affordance), which is associated with the target object name. Experimental results showed that the proposed method outperforms both a CNN-only and an affordance-only method. The results highlight the effectiveness of linguistically-driven object affordance for enhancing grasp-type recognition in robot teaching.
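A minimal sketch of the (assumed) correction step: the CNN's image-based grasp posterior is reweighted by a hypothetical object-conditioned affordance prior and renormalized. The grasp taxonomy and prior values below are illustrative only.

```python
# A toy illustration of prior-corrected grasp-type recognition; all numbers are hypothetical.
import numpy as np

GRASPS = ["power", "precision", "lateral"]
affordance = {"mug": np.array([0.6, 0.3, 0.1]),     # assumed priors p(grasp | object)
              "card": np.array([0.05, 0.25, 0.7])}

def corrected_grasp(cnn_probs, object_name):
    scores = cnn_probs * affordance[object_name]    # p(grasp | image) * p(grasp | object)
    return scores / scores.sum()                    # renormalize

cnn_probs = np.array([0.4, 0.4, 0.2])               # from the RGB image alone
print(GRASPS[np.argmax(corrected_grasp(cnn_probs, "card"))])  # -> "lateral"
```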
88. Overfit Neural Networks as a Compact Shape Representation [PDF]
Thomas Davies, Derek Nowrouzezahrai, Alec Jacobson
Abstract: Neural networks have proven to be effective approximators of signed distance fields (SDFs) for solid 3D objects. While prior work has focused on the generalization power of such approximations, we instead explore their suitability as a compact - if purposefully overfit - SDF representation of individual shapes. Specifically, we ask whether neural networks can serve as first-class implicit shape representations in computer graphics. We call such overfit networks Neural Implicits. Similar to SDFs stored on a regular grid, Neural Implicits have fixed storage profiles and memory layout, but afford far greater accuracy. At equal storage cost, Neural Implicits consistently match or exceed the accuracy of irregularly-sampled triangle meshes. We achieve this with a combination of a novel loss function, sampling strategy and supervision protocol designed to facilitate robust shape overfitting. We demonstrate the flexibility of our representation on a variety of standard rendering and modelling tasks.
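The core idea, purposeful overfitting of a small MLP to one shape's SDF, can be sketched in a few lines; here an analytic sphere stands in for a mesh-derived SDF, and the paper's tailored loss, sampling strategy, and supervision protocol are omitted.

```python
# A minimal sketch: overfit a small MLP to the SDF of a single shape (a sphere).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(512, 3) * 2 - 1                 # sample points in [-1, 1]^3
    sdf = x.norm(dim=1, keepdim=True) - 0.5        # ground-truth SDF of a radius-0.5 sphere
    loss = (net(x) - sdf).abs().mean()             # overfit on purpose: no held-out data
    opt.zero_grad(); loss.backward(); opt.step()

print(net(torch.zeros(1, 3)).item())               # should approach -0.5 (deep inside)
```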
89. Learning Audio-Visual Representations with Active Contrastive Coding [PDF]
Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
Abstract: Contrastive coding has achieved promising results in self-supervised representation learning. However, there are practical challenges given that obtaining a tight lower bound on mutual information (MI) requires a sample size exponential in MI and thus a large set of negative samples. We can incorporate more samples by building a large queue-based dictionary, but there are theoretical limits to performance improvements even with a large number of negative samples. We hypothesize that 'random negative sampling' leads to a highly redundant dictionary, which could result in representations that are suboptimal for downstream tasks. In this paper, we propose an active contrastive coding approach that builds an 'actively sampled' dictionary with diverse and informative items, which improves the quality of negative samples and achieves substantially improved results on tasks where there is high mutual information in the data, e.g., video classification. Our model achieves state-of-the-art performance on multiple challenging audio and visual downstream benchmarks including UCF101, HMDB51 and ESC50.
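For reference, the queue-based contrastive setup the paper starts from looks roughly like the InfoNCE sketch below; the paper's contribution would replace the random queue here with an actively sampled dictionary of diverse, informative negatives.

```python
# A minimal InfoNCE sketch with a queue-based negative dictionary (random here).
import torch
import torch.nn.functional as F

def info_nce(query, key_pos, queue, tau=0.07):
    q = F.normalize(query, dim=1)                       # (B, D)
    k = F.normalize(key_pos, dim=1)                     # (B, D) positives
    neg = F.normalize(queue, dim=1)                     # (K, D) negative dictionary
    l_pos = (q * k).sum(dim=1, keepdim=True)            # (B, 1)
    l_neg = q @ neg.t()                                 # (B, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(len(q), dtype=torch.long)      # the positive sits at index 0
    return F.cross_entropy(logits, labels)

q, k, queue = torch.randn(8, 128), torch.randn(8, 128), torch.randn(4096, 128)
print(info_nce(q, k, queue).item())
```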
90. Defending against substitute model black box adversarial attacks with the 01 loss [PDF]
Yunzhe Xue, Meiyan Xie, Usman Roshan
Abstract: Substitute model black box attacks can create adversarial examples for a target model just by accessing its output labels. This poses a major challenge to machine learning models in practice, particularly in security sensitive applications. The 01 loss model is known to be more robust to outliers and noise than convex models that are typically used in practice. Motivated by these properties we present 01 loss linear and 01 loss dual layer neural network models as a defense against transfer based substitute model black box attacks. We compare the accuracy of adversarial examples from substitute model black box attacks targeting our 01 loss models and their convex counterparts for binary classification on popular image benchmarks. Our 01 loss dual layer neural network has an adversarial accuracy of 66.2%, 58%, 60.5%, and 57% on MNIST, CIFAR10, STL10, and ImageNet respectively whereas the sigmoid activated logistic loss counterpart has accuracies of 63.5%, 19.3%, 14.9%, and 27.6%. Except for MNIST the convex counterparts have substantially lower adversarial accuracies. We show practical applications of our models to deter traffic sign and facial recognition adversarial attacks. On GTSRB street sign and CelebA facial detection our 01 loss network has 34.6% and 37.1% adversarial accuracy respectively whereas the convex logistic counterpart has accuracy 24% and 1.9%. Finally we show that our 01 loss network can attain robustness on par with simple convolutional neural networks and much higher than its convex counterpart even when attacked with a convolutional network substitute model. Our work shows that 01 loss models offer a powerful defense against substitute model black box attacks.
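The robustness argument is easy to see numerically: the 01 loss counts errors and is bounded per sample, whereas a convex surrogate such as the hinge loss grows without bound on a single extreme point, as in this small sketch.

```python
# Contrasting the 01 loss with a convex surrogate on +/-1 labels.
import numpy as np

def loss_01(w, X, y):                 # fraction of misclassified points, bounded in [0, 1]
    return np.mean(np.sign(X @ w) != y)

def loss_hinge(w, X, y):              # convex surrogate, unbounded per point
    return np.mean(np.maximum(0.0, 1.0 - y * (X @ w)))

w = np.array([1.0, 0.0])
X = np.array([[1.0, 0.2], [2.0, -0.1], [-30.0, 0.5]])   # last row: extreme outlier
y = np.array([1, 1, 1])
print(loss_01(w, X, y), loss_hinge(w, X, y))            # ~0.33 vs ~10.33
```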
91. Multi-Task Learning with Deep Neural Networks: A Survey [PDF]
Michael Crawshaw
Abstract: Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are simultaneously learned by a shared model. Such approaches offer advantages like improved data efficiency, reduced overfitting through shared representations, and fast learning by leveraging auxiliary information. However, the simultaneous learning of multiple tasks presents new design and optimization challenges, and choosing which tasks should be learned jointly is in itself a non-trivial problem. In this survey, we give an overview of multi-task learning methods for deep neural networks, with the aim of summarizing both the well-established and most recent directions within the field. Our discussion is structured according to a partition of the existing deep MTL techniques into three groups: architectures, optimization methods, and task relationship learning. We also provide a summary of common multi-task benchmarks.
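As a concrete reference point for the architectures the survey groups, the most common pattern is hard parameter sharing: one shared trunk feeding task-specific heads, sketched below with assumed dimensions and fixed, assumed loss weights.

```python
# A minimal hard-parameter-sharing MTL sketch; dimensions are assumptions.
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    def __init__(self, in_dim=32, hidden=64, task_dims=(10, 3)):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_dims)

    def forward(self, x):
        h = self.trunk(x)                        # shared representation
        return [head(h) for head in self.heads]  # one output per task

model = HardSharingMTL()
outs = model(torch.randn(4, 32))
print([o.shape for o in outs])                   # [(4, 10), (4, 3)]
```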
92. Impact of lung segmentation on the diagnosis and explanation of COVID-19 in chest X-ray images [PDF]
Lucas O. Teixeira, Rodolfo M. Pereira, Diego Bertolini, Luiz S. Oliveira, Loris Nanni, Yandre M. G. Costa
Abstract: The COVID-19 pandemic is undoubtedly one of the biggest public health crises our society has ever faced. This paper's main objectives are to demonstrate the impact of lung segmentation in COVID-19 automatic identification using CXR images and evaluate which contents of the image decisively contribute to the identification. We have performed lung segmentation using a U-Net CNN architecture, and the classification using three well-known CNN architectures: VGG, ResNet, and Inception. To estimate the impact of lung segmentation, we applied some Explainable Artificial Intelligence (XAI), such as LIME and Grad-CAM. To evaluate our approach, we built a database named RYDLS-20-v2, following our previous publication and the COVIDx database guidelines. We evaluated the impact of creating a COVID-19 CXR image database from different sources, called database bias, and the COVID-19 generalization from one database to another, representing our less biased scenario. The experimental results of the segmentation achieved a Jaccard distance of 0.034 and a Dice coefficient of 0.982. In the best and more realistic scenario, we achieved an F1-Score of 0.74 and an area under the ROC curve of 0.9 for COVID-19 identification using segmented CXR images. Further testing and XAI techniques suggest that segmented CXR images represent a much more realistic and less biased performance. More importantly, the experiments conducted show that even after segmentation, there is a strong bias introduced by underlying factors from the data sources, and more efforts regarding the creation of a more significant and comprehensive database still need to be done.
93. Weight Training Analysis of Sportsmen with Kinect Bioinformatics for Form Improvement [PDF]
Muhammad Umair Khan, Khawar Saeed, Sidra Qadeer
Abstract: Sports franchises invest a lot in training their athletes, and the use of the latest technology for this purpose is very common. We propose a system that captures the motion of athletes during weight training and analyzes the data to find shortcomings and imperfections. Our system uses Kinect depth images to compute different parameters of an athlete's selected joints. These parameters are passed through algorithms that process them and formulate results. Some parameters, like range of motion, speed, and balance, can be analyzed in real time. For comparisons between motions, however, data is first recorded and stored, then processed for accurate results. Our results show that this system can be easily deployed and implemented to provide very valuable insight into the dynamics of a workout and help an athlete improve his form.
94. Improving Automated COVID-19 Grading with Convolutional Neural Networks in Computed Tomography Scans: An Ablation Study [PDF]
Coen de Vente, Luuk H. Boulogne, Kiran Vaidhya Venkadesh, Cheryl Sital, Nikolas Lessmann, Colin Jacobs, Clara I. Sánchez, Bram van Ginneken
Abstract: Amidst the ongoing pandemic, several studies have shown that COVID-19 classification and grading using computed tomography (CT) images can be automated with convolutional neural networks (CNNs). Many of these studies focused on reporting initial results of algorithms that were assembled from commonly used components. The choice of these components was often pragmatic rather than systematic. For instance, several studies used 2D CNNs even though these might not be optimal for handling 3D CT volumes. This paper identifies a variety of components that increase the performance of CNN-based algorithms for COVID-19 grading from CT images. We investigated the effectiveness of using a 3D CNN instead of a 2D CNN, of using transfer learning to initialize the network, of providing automatically computed lesion maps as additional network input, and of predicting a continuous instead of a categorical output. A 3D CNN with these components achieved an area under the ROC curve (AUC) of 0.934 on our test set of 105 CT scans and an AUC of 0.923 on a publicly available set of 742 CT scans, a substantial improvement in comparison with a previously published 2D CNN. An ablation study demonstrated that, in addition to using a 3D CNN instead of a 2D CNN, transfer learning contributed the most and continuous output the least to improving the model performance.
95. When Healthcare Meets Off-the-Shelf WiFi: A Non-Wearable and Low-Costs Approach for In-Home Monitoring [PDF]
Lingchao Guo, Zhaoming Lu, Shuang Zhou, Xiangming Wen, Zhihong He
Abstract: As the elderly population grows, social and health care begin to face validation challenges, and in-home monitoring is becoming a focus for professionals in the field. Governments urgently need to improve the quality of healthcare services at lower costs while ensuring the comfort and independence of the elderly. This work presents an in-home monitoring approach based on off-the-shelf WiFi, which is low-cost, non-wearable, and makes all-round daily healthcare information available to caregivers. The proposed approach can capture fine-grained human pose figures even through a wall and simultaneously track detailed respiration status with off-the-shelf WiFi devices. Based on these, behavioral data, physiological data, and derived information (e.g., abnormal events and underlying diseases) of the elderly can be seen by caregivers directly. We design a series of signal processing methods and a neural network to capture human pose figures and extract respiration status curves from WiFi Channel State Information (CSI). Extensive experiments are conducted, and according to the results, off-the-shelf WiFi devices are capable of capturing fine-grained human pose figures, similar to cameras, even through a wall, and of tracking accurate respiration status, thus demonstrating the effectiveness and feasibility of our approach for in-home monitoring.
96. Contrastive Clustering [PDF]
Yunfan Li, Peng Hu, Zitao Liu, Dezhong Peng, Joey Tianyi Zhou, Xi Peng
Abstract: In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning. To be specific, for a given dataset, the positive and negative instance pairs are constructed through data augmentations and then projected into a feature space. Therein, the instance- and cluster-level contrastive learning are respectively conducted in the row and column space by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix could be regarded as soft labels of instances, and accordingly the columns could be further regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive loss, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
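The row/column view can be sketched directly: for two augmented views of a batch, the cluster-probability matrices are transposed so that columns (clusters) are treated exactly like instances in a contrastive loss. The sketch below shows only the cluster-level half, without the paper's instance-level term or projection heads.

```python
# A minimal cluster-level contrastive sketch over the columns of the assignment matrix.
import torch
import torch.nn.functional as F

def column_contrastive(P_a, P_b, tau=0.5):
    # P_a, P_b: (N, K) softmax cluster assignments for two augmented views
    A = F.normalize(P_a.t(), dim=1)             # (K, N): each row is one cluster
    B = F.normalize(P_b.t(), dim=1)
    logits = A @ torch.cat([A, B]).t() / tau    # similarities against all 2K clusters
    logits.fill_diagonal_(float("-inf"))        # drop self-similarity terms
    targets = torch.arange(len(A)) + len(A)     # positive: the same cluster in the other view
    return F.cross_entropy(logits, targets)

P_a = torch.randn(16, 4).softmax(dim=1)
P_b = torch.randn(16, 4).softmax(dim=1)
print(column_contrastive(P_a, P_b).item())
```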
97. Modeling Score Distributions and Continuous Covariates: A Bayesian Approach [PDF]
Mel McCurrie, Hamish Nicholson, Walter J. Scheirer, Samuel Anthony
Abstract: Computer Vision practitioners must thoroughly understand their model's performance, but conditional evaluation is complex and error-prone. In biometric verification, model performance over continuous covariates (real-number attributes of images that affect performance) is particularly challenging to study. We develop a generative model of the match and non-match score distributions over continuous covariates and perform inference with modern Bayesian methods. We use mixture models to capture arbitrary distributions and local basis functions to capture non-linear, multivariate trends. Three experiments demonstrate the accuracy and effectiveness of our approach. First, we study the relationship between age and face verification performance and find previous methods may overstate performance and confidence. Second, we study preprocessing for CNNs and find a highly non-linear, multivariate surface of model performance. Our method is accurate and data efficient when evaluated against previous synthetic methods. Third, we demonstrate the novel application of our method to pedestrian tracking and calculate variable thresholds and expected performance while controlling for multiple covariates.
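One ingredient, fitting a mixture model to verification scores, can be sketched with plain EM on synthetic 1D scores; the paper's model additionally conditions these distributions on continuous covariates via local basis functions and performs full Bayesian inference, which this toy omits.

```python
# EM for a 1D two-component Gaussian mixture on synthetic match/non-match scores.
import numpy as np

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(-1.0, 0.5, 500),   # synthetic non-match scores
                         rng.normal(1.5, 0.4, 500)])   # synthetic match scores

mu, sigma, pi = np.array([-2.0, 2.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):                                    # EM iterations
    # unnormalized densities; the shared 1/sqrt(2*pi) constant cancels in resp
    dens = pi * np.exp(-0.5 * ((scores[:, None] - mu) / sigma) ** 2) / sigma
    resp = dens / dens.sum(axis=1, keepdims=True)      # E-step: responsibilities
    nk = resp.sum(axis=0)                              # M-step: update parameters
    mu = (resp * scores[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (scores[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(scores)

print(np.round(mu, 2), np.round(sigma, 2))             # roughly [-1.0, 1.5], [0.5, 0.4]
```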
98. Reconstruct high-resolution multi-focal plane images from a single 2D wide field image [PDF]
Jiabo Ma, Sibo Liu, Shenghua Cheng, Xiuli Liu, Li Cheng, Shaoqun Zeng
Abstract: High-resolution 3D medical images are important for analysis and diagnosis, but the axial scanning required to acquire them is very time-consuming. In this paper, we propose a fast end-to-end multi-focal plane imaging network (MFPINet) to reconstruct high-resolution multi-focal plane images from a single 2D low-resolution wide-field image without relying on scanning. To acquire realistic MFP images fast, the proposed MFPINet adopts a generative adversarial network framework and the strategies of post-sampling and refocusing all focal planes at one time. We conduct a series of experiments on cytology microscopy images and demonstrate that MFPINet performs well on both axial refocusing and horizontal super resolution. Furthermore, MFPINet is approximately 24 times faster than current refocusing methods for reconstructing the same image volumes. The proposed method has the potential to greatly increase the speed of high-resolution 3D imaging and expand the application of low-resolution wide-field images.
99. Learning Soft Labels via Meta Learning [PDF]
Nidhi Vyas, Shreyas Saxena, Thomas Voice
Abstract: One-hot labels do not represent soft decision boundaries among concepts, and hence, models trained on them are prone to overfitting. Using soft labels as targets provides regularization, but different soft labels might be optimal at different stages of optimization. Also, training with fixed labels in the presence of noisy annotations leads to worse generalization. To address these limitations, we propose a framework where we treat the labels as learnable parameters and optimize them along with model parameters. The learned labels continuously adapt themselves to the model's state, thereby providing dynamic regularization. When applied to the task of supervised image classification, our method leads to consistent gains across different datasets and architectures. For instance, dynamically learned labels improve ResNet18 by 2.1% on CIFAR100. When applied to a dataset containing noisy labels, the learned labels correct the annotation mistakes and improve over the state of the art by a significant margin. Finally, we show that learned labels capture the semantic relationship between classes, and thereby improve teacher models for the downstream task of distillation.
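Treating labels as learnable parameters can be sketched as follows: each example owns a row of label logits whose softmax is its soft target. For brevity this toy updates labels and model jointly on one loss, which is prone to collapse; the paper instead learns the labels with a meta-learning (bilevel) scheme that this sketch does not reproduce.

```python
# A toy illustration of labels as learnable parameters (naive joint update, not meta-learning).
import torch
import torch.nn as nn
import torch.nn.functional as F

N, D, C = 100, 16, 5
X = torch.randn(N, D)
hard = torch.randint(0, C, (N,))

model = nn.Linear(D, C)
label_logits = nn.Parameter(F.one_hot(hard, C).float() * 3.0)  # initialized near one-hot
opt = torch.optim.Adam([*model.parameters(), label_logits], lr=1e-2)

for _ in range(200):
    soft_targets = label_logits.softmax(dim=1)                 # current soft labels
    # cross-entropy against soft targets, differentiable w.r.t. both model and labels
    loss = -(soft_targets * model(X).log_softmax(dim=1)).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(soft_targets[0].detach())                                # no longer strictly one-hot
```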
100. Scale-Localized Abstract Reasoning [PDF]
Yaniv Benny, Niv Pekar, Lior Wolf
Abstract: We consider the abstract relational reasoning task, which is commonly used as an intelligence test. Since some patterns have spatial rationales, while others are only semantic, we propose a multi-scale architecture that processes each query in multiple resolutions. We show that different rules are indeed solved by different resolutions, and that a combined multi-scale approach outperforms the existing state of the art in this task on all benchmarks by 5-54%. The success of our method is shown to arise from multiple novelties. First, it searches for relational patterns in multiple resolutions, which allows it to readily detect visual relations, such as location, at higher resolution, while allowing the lower-resolution module to focus on semantic relations, such as shape type. Second, we optimize the reasoning network of each resolution proportionally to its performance, thereby motivating each resolution to specialize on the rules for which it performs better than the others and to ignore cases that are already solved by the other resolutions. Third, we propose a new way to pool information along the rows and columns of the query's illustration grid. Our work also analyses the existing benchmarks, demonstrating that the RAVEN dataset selects its negative examples in a way that is easily exploited. We therefore propose a modified version of the RAVEN dataset, named RAVEN-FAIR. Our code and pretrained models are available at this https URL. The RAVEN-FAIR dataset is available at this https URL.
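The second novelty (optimizing each resolution's reasoning network in proportion to its performance) can be read as a per-head loss weighting. The sketch below shows one such weighting; using detached per-head batch accuracy as the performance measure, and the softmax temperature, are assumptions rather than the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def weighted_multiscale_loss(head_logits, target, temperature=1.0):
    """head_logits: list of (batch, num_choices) tensors, one per resolution head.

    Each head's loss is scaled by a weight that grows with that head's current
    accuracy (detached, so the weights receive no gradient). Heads that already
    solve a rule are thereby pushed to specialize in it.
    """
    losses, accs = [], []
    for logits in head_logits:
        losses.append(F.cross_entropy(logits, target))
        accs.append((logits.argmax(dim=1) == target).float().mean().detach())
    weights = F.softmax(torch.stack(accs) / temperature, dim=0)
    return (weights * torch.stack(losses)).sum()

# Toy usage: three resolution heads over an 8-choice answer panel.
heads = [torch.randn(16, 8, requires_grad=True) for _ in range(3)]
loss = weighted_multiscale_loss(heads, torch.randint(0, 8, (16,)))
loss.backward()
```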
101. Learning a Lie Algebra from Unlabeled Data Pairs [PDF]
Chris Ick, Vincent Lostanlen
Abstract: Deep convolutional networks (convnets) show a remarkable ability to learn disentangled representations. In recent years, the generalization of deep learning to Lie groups beyond rigid motion in $\mathbb{R}^n$ has made it possible to build convnets over datasets with non-trivial symmetries, such as patterns over the surface of a sphere. However, one limitation of this approach is the need to explicitly define the Lie group underlying the desired invariance property before training the convnet. Whereas rotations on the sphere have a well-known symmetry group ($\mathrm{SO}(3)$), the same cannot be said of many real-world factors of variability. For example, the disentanglement of pitch, intensity dynamics, and playing technique remains a challenging task in music information retrieval. This article proposes a machine learning method to discover a nonlinear transformation of the space $\mathbb{R}^n$ which maps a collection of $n$-dimensional vectors $(\boldsymbol{x}_i)_i$ onto a collection of target vectors $(\boldsymbol{y}_i)_i$. The key idea is to approximate every target $\boldsymbol{y}_i$ by a matrix--vector product of the form $\boldsymbol{\widetilde{y}}_i = \boldsymbol{\phi}(t_i) \boldsymbol{x}_i$, where the matrix $\boldsymbol{\phi}(t_i)$ belongs to a one-parameter subgroup of $\mathrm{GL}_n (\mathbb{R})$. Crucially, the value of the parameter $t_i \in \mathbb{R}$ may change between data pairs $(\boldsymbol{x}_i, \boldsymbol{y}_i)$ and does not need to be known in advance.
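The abstract's formula $\boldsymbol{\widetilde{y}}_i = \boldsymbol{\phi}(t_i)\boldsymbol{x}_i$, with $\boldsymbol{\phi}(t_i)$ in a one-parameter subgroup of $\mathrm{GL}_n(\mathbb{R})$, suggests the parameterization $\boldsymbol{\phi}(t) = \exp(tA)$ for a learnable generator $A$ in the Lie algebra. The sketch below fits $A$ and the per-pair scalars $t_i$ by gradient descent; the synthetic data, optimizer, and learning rate are assumptions, not the paper's training setup.

```python
import torch

torch.manual_seed(0)
n, num_pairs = 4, 256

# Synthetic ground truth: a skew-symmetric generator (rotations) with unknown per-pair "times".
A_true = torch.randn(n, n)
A_true = A_true - A_true.T
t_true = torch.rand(num_pairs) * 2.0
x = torch.randn(num_pairs, n)
y = torch.einsum('bij,bj->bi', torch.matrix_exp(t_true[:, None, None] * A_true), x)

# Learnable Lie-algebra element A and per-pair parameters t_i (not known in advance).
# Note: (A, t) are identifiable only up to rescaling A by c and t by 1/c.
A = torch.nn.Parameter(0.01 * torch.randn(n, n))
t = torch.nn.Parameter(torch.zeros(num_pairs))
opt = torch.optim.Adam([A, t], lr=0.05)

for step in range(500):
    phi = torch.matrix_exp(t[:, None, None] * A)   # phi(t_i) = exp(t_i * A)
    y_hat = torch.einsum('bij,bj->bi', phi, x)
    loss = (y_hat - y).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```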
102. Efficient Certification of Spatial Robustness [PDF]
Anian Ruoss, Maximilian Baader, Mislav Balunović, Martin Vechev
Abstract: Recent work has exposed the vulnerability of computer vision models to spatial transformations. Due to the widespread usage of such models in safety-critical applications, it is crucial to quantify their robustness against spatial transformations. However, existing work only provides empirical quantification of spatial robustness via adversarial attacks, which lack provable guarantees. In this work, we propose novel convex relaxations, which enable us, for the first time, to provide a certificate of robustness against spatial transformations. Our convex relaxations are model-agnostic and can be leveraged by a wide range of neural network verifiers. Experiments on several network architectures and different datasets demonstrate the effectiveness and scalability of our method.
103. Reducing false-positive biopsies with deep neural networks that utilize local and global information in screening mammograms [PDF]
Nan Wu, Zhe Huang, Yiqiu Shen, Jungkyu Park, Jason Phang, Taro Makino, S. Gene Kim, Kyunghyun Cho, Laura Heacock, Linda Moy, Krzysztof J. Geras
Abstract: Breast cancer is the most common cancer in women, and hundreds of thousands of unnecessary biopsies are done around the world at a tremendous cost. It is crucial to reduce the rate of biopsies that turn out to be benign tissue. In this study, we build deep neural networks (DNNs) to classify biopsied lesions as either malignant or benign, with the goal of using these networks as second readers serving radiologists to further reduce the number of false-positive findings. We enhance the performance of DNNs trained on small image patches by integrating global context, provided in the form of saliency maps learned from the entire image, into their reasoning, similar to how radiologists consider global context when evaluating areas of interest. Our experiments are conducted on a dataset of 229,426 screening mammography exams from 141,473 patients. We achieve an AUC of 0.8 on a test set consisting of 464 benign and 136 malignant lesions.
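The abstract describes feeding a patch classifier global context via image-level saliency maps but does not specify the fusion mechanism. One plausible reading, sketched below, appends the saliency-map crop aligned with the patch as an extra input channel; the channel-stacking fusion, patch size, and network are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class PatchWithSaliencySketch(nn.Module):
    """Hypothetical local+global classifier: patch + aligned saliency crop -> benign/malignant."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # 2 input channels: the local patch and its aligned global-saliency crop.
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(64, 2),
        )

    def forward(self, patch, saliency_crop):
        # Global context enters as an extra channel aligned with the local patch.
        return self.net(torch.cat([patch, saliency_crop], dim=1))

logits = PatchWithSaliencySketch()(torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64))
```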
104. Humans learn too: Better Human-AI Interaction using Optimized Human Inputs [PDF]
Johannes Schneider
Abstract: Humans rely more and more on systems with AI components. The AI community typically treats human inputs as a given and optimizes only the AI models. This thinking is one-sided: it neglects the fact that humans can learn, too. In this work, human inputs are optimized for better interaction with an AI model while keeping the model fixed. The optimized inputs are accompanied by instructions on how to create them. They allow humans to save time and cut down on errors, while keeping the required changes to the original inputs limited. We propose continuous and discrete optimization methods that modify samples in an iterative fashion. Our quantitative and qualitative evaluation, including a human study on different hand-generated inputs, shows that the generated proposals lead to lower error rates, require less effort to create, and differ only modestly from the original samples.
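A minimal continuous-optimization sketch of the core idea (optimize the input, hold the model fixed), under assumptions the abstract does not pin down: gradient descent on the input, a cross-entropy task loss, and a squared penalty that keeps the proposal close to the original input.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 4)                 # stand-in for any trained model
for p in model.parameters():
    p.requires_grad_(False)                    # the model stays fixed

original = torch.randn(1, 16)                  # the human's current input
target = torch.tensor([2])                     # the outcome the human wants
x = original.clone().requires_grad_(True)      # the input is what we optimize
opt = torch.optim.Adam([x], lr=0.1)

for step in range(100):
    task_loss = F.cross_entropy(model(x), target)
    proximity = 0.1 * (x - original).pow(2).mean()  # keep required changes limited
    loss = task_loss + proximity
    opt.zero_grad(); loss.backward(); opt.step()
# x now holds an optimized proposal that a human could be instructed to imitate.
```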
105. Bias Field Poses a Threat to DNN-based X-Ray Recognition [PDF]
Binyu Tian, Qing Guo, Felix Juefei-Xu, Wen Le Chan, Yupeng Cheng, Xiaohong Li, Xiaofei Xie, Shengchao Qin
Abstract: The chest X-ray plays a key role in the screening and diagnosis of many lung diseases, including COVID-19. Recently, many works have constructed deep neural networks (DNNs) for chest X-ray images to realize automated and efficient diagnosis of lung diseases. However, bias fields caused by improper medical image acquisition are widespread in chest X-ray images, while the robustness of DNNs to bias fields is rarely explored, which poses a definite threat to X-ray-based automated diagnosis systems. In this paper, we study this problem in light of recent adversarial attacks and propose a brand new attack, the adversarial bias field attack, in which the bias field, instead of additive noise, serves as the adversarial perturbation for fooling the DNNs. This novel attack poses a key problem: how to locally tune the bias field to achieve a high attack success rate while maintaining its spatial smoothness to guarantee high realism. These two goals contradict each other and thus make the attack significantly challenging. To overcome this challenge, we propose the adversarial-smooth bias field attack, which can locally tune the bias field under joint smoothness and adversarial constraints. As a result, the adversarial X-ray images can not only fool the DNNs effectively but also retain a very high level of realism. We validate our method on real chest X-ray datasets with powerful DNNs, e.g., ResNet50, DenseNet121, and MobileNet, and show properties different from state-of-the-art attacks in both image realism and attack transferability. Our method reveals a potential threat to DNN-based X-ray automated diagnosis and can benefit the development of bias-field-robust automated diagnosis systems.
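The abstract constrains the perturbation to be a spatially smooth bias field but does not give its parameterization. A common choice for bias fields, assumed here, is the exponential of a low-order 2D polynomial, which is positive, smooth by construction, and multiplies the image; the polynomial order, step count, and loss below are placeholders, and the paper's joint smoothness constraint may differ.

```python
import torch
import torch.nn.functional as F

def polynomial_bias_field(coeffs, h, w):
    """Smooth multiplicative field exp(sum_k c_k * b_k(x, y)) on a [-1, 1]^2 grid."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    basis = torch.stack([torch.ones_like(xs), xs, ys, xs * ys, xs ** 2, ys ** 2])
    return torch.exp((coeffs[:, None, None] * basis).sum(dim=0))  # positive, smooth

def bias_field_attack(model, image, label, steps=50, lr=0.05):
    coeffs = torch.zeros(6, requires_grad=True)       # identity field at start
    opt = torch.optim.Adam([coeffs], lr=lr)
    for _ in range(steps):
        field = polynomial_bias_field(coeffs, *image.shape[-2:])
        x_adv = (image * field).clamp(0.0, 1.0)       # multiplicative bias field
        loss = -F.cross_entropy(model(x_adv), label)  # push toward misclassification
        opt.zero_grad(); loss.backward(); opt.step()
    field = polynomial_bias_field(coeffs.detach(), *image.shape[-2:])
    return (image * field).clamp(0.0, 1.0)

# Toy usage with a stand-in classifier on a 64x64 single-channel "X-ray".
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 2))
x_adv = bias_field_attack(model, torch.rand(1, 1, 64, 64), torch.tensor([0]))
```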
106. Few-shot learning using pre-training and shots, enriched by pre-trained samples [PDF]
Detlef Schmicker
Abstract: We use the EMNIST dataset of handwritten digits to test a simple approach to few-shot learning. A fully connected neural network is pre-trained on a subset of the 10 digits and used for few-shot learning on untrained digits. Two basic ideas are introduced: during few-shot learning, the learning of the first layer is disabled, and for every shot a previously unknown digit is used together with four previously trained digits for gradient descent, until a predefined threshold condition is fulfilled. This way we reach about 90% accuracy after 10 shots.
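The two ideas translate directly into a training loop: freeze the first layer's parameters, then for each shot run gradient steps on a five-sample batch (one novel-digit sample plus four base-digit samples) until the loss drops below a threshold. The network shape, threshold value, and optimizer below are assumptions; the pre-training stage is elided.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(),
                            torch.nn.Linear(256, 10))
# ... assume the model was pre-trained on a subset of the digit classes here ...

# Idea 1: disable learning in the first layer during few-shot adaptation.
for p in model[0].parameters():
    p.requires_grad_(False)
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.01)

def few_shot_step(novel_x, novel_y, base_x, base_y, threshold=0.1, max_iters=200):
    """Idea 2: one shot = 1 novel sample + 4 base samples, trained to a loss threshold."""
    x = torch.cat([novel_x, base_x])            # shapes: (1, 784) and (4, 784)
    y = torch.cat([novel_y, base_y])
    for _ in range(max_iters):
        loss = F.cross_entropy(model(x), y)
        if loss.item() < threshold:             # predefined threshold condition
            break
        opt.zero_grad(); loss.backward(); opt.step()

few_shot_step(torch.randn(1, 784), torch.tensor([7]),
              torch.randn(4, 784), torch.tensor([0, 1, 2, 3]))
```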
107. Lossless White Balance For Improved Lossless CFA Image and Video Compression [PDF]
Yeejin Lee, Keigo Hirakawa
Abstract: A color filter array is a spatial multiplexing of pixel-sized filters placed over the pixel detectors in camera sensors. State-of-the-art lossless coding techniques for the raw sensor data captured by such sensors leverage spatial or cross-color correlation using lifting schemes. In this paper, we propose a lifting-based lossless white balance algorithm. When applied to the raw sensor data, the spatial bandwidth of the implied chrominance signals decreases. We propose to use this white balance as a pre-processing step for lossless compression of CFA-subsampled images and video, improving the overall coding efficiency of the raw sensor data.
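The losslessness rests on the lifting principle: each step adds a rounded function of one channel to another, so the inverse subtracts the identical rounded value and reconstruction is bit-exact. The toy predict/update pair below demonstrates that principle on integer data; the coefficients are arbitrary placeholders, not the paper's white-balance lifting steps.

```python
import numpy as np

def lift_forward(g, r, a=0.5, b=0.25):
    """Toy lifting pair on integer channels g, r (predict step, then update step)."""
    d = r - np.round(a * g).astype(np.int64)   # predict: residual of r given g
    s = g + np.round(b * d).astype(np.int64)   # update: adjust g using the residual
    return s, d

def lift_inverse(s, d, a=0.5, b=0.25):
    g = s - np.round(b * d).astype(np.int64)   # undo steps in reverse order;
    r = d + np.round(a * g).astype(np.int64)   # identical rounded values give bit-exact inversion
    return g, r

g = np.random.randint(0, 1024, size=(4, 4))    # 10-bit raw samples
r = np.random.randint(0, 1024, size=(4, 4))
recovered = lift_inverse(*lift_forward(g, r))
assert all(np.array_equal(u, v) for u, v in zip(recovered, (g, r)))
```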
108. Kernel Ridge Regression Using Importance Sampling with Application to Seismic Response Prediction [PDF]
Farhad Pourkamali-Anaraki, Mohammad Amin Hariri-Ardebili, Lydia Morawiec
Abstract: Scalable kernel methods, including kernel ridge regression, often rely on low-rank matrix approximations using the Nystrom method, which involves selecting landmark points from large data sets. Existing approaches to selecting landmarks are typically computationally demanding, as they require manipulating and performing computations with large matrices in the input or feature space. In this paper, our contribution is twofold. The first contribution is a novel landmark selection method that promotes diversity using an efficient two-step approach. Our landmark selection technique follows a coarse-to-fine strategy: the first step computes importance scores with a single pass over the whole data, and the second step performs K-means clustering on the constructed coreset and uses the obtained centroids as landmarks. Hence, the introduced method provides a tunable trade-off between accuracy and efficiency. Our second contribution is to investigate the performance of several landmark selection techniques in a novel application of kernel methods: predicting structural responses under earthquake load and material uncertainties. Our experiments demonstrate the merits of the proposed landmark selection scheme against baselines.
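The two-step selection is concrete enough to sketch: sample a coreset with one pass of importance scores, then cluster the coreset and keep the centroids as Nystrom landmarks. The abstract does not define the score, so squared row norms serve as a stand-in importance proxy below; the coreset size and the number of landmarks are the tunable knobs mentioned in the text.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_landmarks(X, num_landmarks, coreset_size, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1 (coarse): importance scores in a single pass over the data.
    scores = (X ** 2).sum(axis=1) + 1e-12      # assumed proxy; the paper's score may differ
    probs = scores / scores.sum()
    idx = rng.choice(len(X), size=coreset_size, replace=False, p=probs)
    # Step 2 (fine): K-means on the coreset; the centroids become the landmarks.
    km = KMeans(n_clusters=num_landmarks, n_init=10, random_state=seed).fit(X[idx])
    return km.cluster_centers_

X = np.random.default_rng(0).normal(size=(10000, 8))
landmarks = select_landmarks(X, num_landmarks=50, coreset_size=1000)
print(landmarks.shape)  # (50, 8); feed these to a Nystrom kernel approximation
```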
109. Pose Correction Algorithm for Relative Frames between Keyframes in SLAM [PDF]
Youngseok Jang, Hojoon Shin, H. Jin Kim
Abstract: With the dominance of keyframe-based SLAM in the field of robotics, the relative frame poses between keyframes have typically been sacrificed for a faster algorithm to achieve online operation. However, those approaches can become insufficient for applications that require refined poses of all frames, not just the keyframes, which are relatively sparse compared to all input frames. This paper proposes a novel algorithm to correct the relative frames between keyframes after the keyframes have been updated by a back-end optimization process. The correction model is derived using conservation of the measurement constraint between landmarks and the robot pose. The proposed algorithm is designed to be easily integrable into existing keyframe-based SLAM systems while exhibiting robust and accurate performance superior to existing interpolation methods. The algorithm also requires few computational resources and hence places a minimal burden on the whole SLAM pipeline. We evaluate the proposed pose correction algorithm against existing interpolation methods in various vector spaces, and our method demonstrates excellent accuracy on both the KITTI and EuRoC datasets.
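For context, the common baseline that such correction methods improve upon can be written in a few lines: after the back end moves a keyframe, re-anchor each intermediate frame by carrying its old relative transform over to the updated keyframe pose. The sketch below implements that naive re-anchoring with 4x4 homogeneous matrices; the paper's measurement-constraint correction model is more refined than this.

```python
import numpy as np

def reanchor_frames(T_w_kf_old, T_w_kf_new, frames_T_w_old):
    """Naive baseline: keep each frame's old pose relative to its keyframe.

    All poses are 4x4 homogeneous transforms of a frame in the world (T_w_*).
    The relative transform inv(T_w_kf_old) @ T_w_frame_old is preserved across
    the keyframe update.
    """
    T_kf_old_inv = np.linalg.inv(T_w_kf_old)
    return [T_w_kf_new @ (T_kf_old_inv @ T_w_f) for T_w_f in frames_T_w_old]

# Toy usage: the back end shifts a keyframe by 0.1 m along x; frames follow rigidly.
T_old = np.eye(4)
T_new = np.eye(4); T_new[0, 3] = 0.1
frame = np.eye(4); frame[1, 3] = 0.5
print(reanchor_frames(T_old, T_new, [frame])[0][:3, 3])  # [0.1, 0.5, 0.0]
```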