Abstracts

1. A Survey on Deep Hashing Methods
Xiao Luo, Chong Chen, Huasong Zhong, Hao Zhang, Minghua Deng, Jianqiang Huang, Xiansheng Hua
Abstract: Nearest neighbor search finds the data points in a database whose distances to the query are the smallest, and it is a fundamental problem in various domains, such as computer vision, recommendation systems and machine learning. Hashing is one of the most widely used methods because of its computational and storage efficiency. With the development of deep learning, deep hashing methods show more advantages than traditional methods. In this paper, we present a comprehensive survey of deep hashing algorithms. Based on the loss function, we categorize deep supervised hashing methods according to the manner of preserving similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization. In addition, we introduce other topics such as deep unsupervised hashing and multi-modal deep hashing methods. We also present some commonly used public datasets and the scheme used to measure the performance of deep hashing algorithms. Finally, we discuss some potential research directions in the conclusion.
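
Editor's sketch (not from the paper): the storage and speed advantage of hashing comes from comparing short binary codes with the Hamming distance. The minimal example below assumes each item has already been mapped to a binary code by some hash function, deep or otherwise; the function names `hamming_distance` and `hamming_search` and the 8-bit codes are hypothetical and only for illustration.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes (stored as ints) differ."""
    return bin(a ^ b).count("1")


def hamming_search(query: int, database: List[int], k: int = 1) -> List[int]:
    """Return the indices of the k database codes closest to the query code."""
    order = sorted(range(len(database)),
                   key=lambda i: hamming_distance(query, database[i]))
    return order[:k]


# Hypothetical 8-bit codes; a real system would obtain them from a learned hash function.
codes = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
nearest = hamming_search(0b10110110, codes, k=2)
```

Here `nearest` holds the indices of the two codes with the fewest differing bits. A linear scan like this is only a baseline; the survey's subject is how the codes themselves are learned.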

2. Captioning Images with Novel Objects via Online Vocabulary Expansion
Mikihiro Tanaka, Tatsuya Harada
Abstract: In this study, we introduce a low cost method for generating descriptions from images containing novel objects. Generally, constructing a model, which can explain images with novel objects, is costly because of the following: (1) collecting a large amount of data for each category, and (2) retraining the entire system. If humans see a small number of novel objects, they are able to estimate their properties by associating their appearance with known objects. Accordingly, we propose a method that can explain images with novel objects without retraining using the word embeddings of the objects estimated from only a small number of image features of the objects. The method can be integrated with general image-captioning models. The experimental results show the effectiveness of our approach.

3. Heterogeneity Loss to Handle Intersubject and Intrasubject Variability in Cancer
Shubham Goswami, Suril Mehta, Dhruva Sahrawat, Anubha Gupta, Ritu Gupta
Abstract: Developing nations lack an adequate number of hospitals with modern equipment and skilled doctors. Hence, a significant proportion of these nations' population, particularly in rural areas, is not able to access specialized and timely healthcare facilities. In recent years, deep learning (DL) models, a class of artificial intelligence (AI) methods, have shown impressive results in the medical domain. These AI methods can provide immense support to developing nations as affordable healthcare solutions. This work focuses on one such application, blood cancer diagnosis. However, DL models face challenges in cancer research because large datasets for adequate training are unavailable and because it is difficult to capture heterogeneity in data at different levels, ranging from acquisition characteristics and session to subject level (within subjects and across subjects). These challenges render DL models prone to overfitting, and hence the models generalize poorly to prospective subjects' data. In this work, we address these problems in the application of B-cell Acute Lymphoblastic Leukemia (B-ALL) diagnosis using deep learning. We propose a heterogeneity loss that captures subject-level heterogeneity, thereby forcing the neural network to learn subject-independent features. We also propose an unorthodox ensemble strategy that improves classification over models trained on 7 folds, giving a weighted $F_1$ score of 95.26% on unseen (test) subjects' data, which is, so far, the best result on the C-NMC 2019 dataset for B-ALL classification.
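
Editor's sketch (not from the paper): the weighted $F_1$ score quoted above is commonly computed as the per-class $F_1$ averaged with weights proportional to each class's number of true samples. The example below assumes that standard definition; the function name `weighted_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels (1 = positive class, 0 = negative class), purely illustrative.
score = weighted_f1([1, 1, 1, 0], [1, 1, 0, 0])
```

With these toy labels the positive class has $F_1 = 0.8$ and the negative class $F_1 = 2/3$, so the support-weighted average is $0.75 \cdot 0.8 + 0.25 \cdot 2/3 = 23/30$.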

4. Probability Weighted Compact Feature for Domain Adaptive Retrieval
Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou
Abstract: Domain adaptive image retrieval includes single-domain retrieval and cross-domain retrieval. Most existing image retrieval methods focus only on single-domain retrieval, which assumes that the distributions of retrieval databases and queries are similar. However, in practical applications, the discrepancies between retrieval databases, often captured under ideal illumination/pose/background/camera conditions, and queries, usually obtained under uncontrolled conditions, are very large. In this paper, considering practical applications, we focus on the challenging problem of cross-domain retrieval. To address the problem, we propose an effective method named Probability Weighted Compact Feature Learning (PWCF), which provides inter-domain correlation guidance to improve cross-domain retrieval accuracy and learns a series of compact binary codes to improve the retrieval speed. First, we derive our loss function through Maximum A Posteriori Estimation (MAP): a Bayesian Perspective (BP) induced focal-triplet loss, a BP induced quantization loss and a BP induced classification loss. Second, we propose a common manifold structure between domains to explore the potential correlation across domains. Because the original feature representation is biased by the inter-domain discrepancy, the manifold structure is difficult to construct. Therefore, we propose a new feature named Histogram Feature of Neighbors (HFON) from the sample statistics perspective. Extensive experiments on various benchmark databases validate that our method outperforms many state-of-the-art image retrieval methods for domain adaptive image retrieval. The source code is available at this https URL.

5. A Hybrid Approach for Tracking Individual Players in Broadcast Match Videos
Roberto L. Castro, Diego Andrade, Basilio Fraguela
Abstract: Tracking people in a video sequence is a challenging task that has been approached from many perspectives. The task becomes even more complicated when the person to track is a player in a broadcast sports event, because of difficulties such as frequent camera movements or switches, total and partial occlusions between players, and blurry frames caused by the video encoding algorithm. This paper introduces a player tracking solution that is both fast and accurate, which makes it possible to track a player precisely in real time. The approach combines several models that are executed concurrently on relatively modest hardware, and its accuracy has been validated against hand-labeled broadcast video sequences. Regarding accuracy, the tests show that the area under the curve (AUC) of our approach is around 0.6, which is similar to generic state-of-the-art solutions. As for performance, our proposal can process high-definition videos (1920x1080 px) at 80 fps.

6. Spherical formulation of moving object geometric constraints for monocular fisheye cameras
Letizia Mariotti, Ciaran Hughes
Abstract: In this paper, we introduce a moving object detection algorithm for fisheye cameras used in autonomous driving. We reformulate the three constraints commonly used in rectilinear images (epipolar, positive depth and positive height constraints) in spherical coordinates, which are invariant to the specific camera configuration once the calibration is known. One of the main challenging use cases in autonomous driving is detecting parallel moving objects, which suffer from motion-parallax ambiguity. To alleviate this, we formulate an additional fourth constraint, called the anti-parallel constraint, which makes it possible to detect objects whose motion mirrors that of the ego-vehicle. We analyze the proposed algorithm in different scenarios and demonstrate that it works effectively when operating directly on fisheye images.

7. Traffic Signs Detection and Recognition System using Deep Learning
Pavly Salah Zaki, Marco Magdy William, Bolis Karam Soliman, Kerolos Gamal Alexsan, Keroles Khalil, Magdy El-Moursy
Abstract: With the rapid development of technology, automobiles have become an essential asset in our day-to-day lives. One of the more important research topics is Traffic Signs Recognition (TSR) systems. This paper describes an approach for efficiently detecting and recognizing traffic signs in real time, taking into account the various weather, illumination and visibility challenges by means of transfer learning. We tackle the traffic sign detection problem using state-of-the-art multi-object detection systems such as Faster Region-based Convolutional Neural Networks (F-RCNN) and Single Shot Multi Box Detector (SSD), combined with various feature extractors such as MobileNet v1 and Inception v2, and also Tiny-YOLOv2. However, the focus of this paper is on F-RCNN Inception v2 and Tiny-YOLOv2, as they achieved the best results. The aforementioned models were fine-tuned on the German Traffic Signs Detection Benchmark (GTSDB) dataset. These models were tested on the host PC as well as on a Raspberry Pi 3 Model B+ and in the TASS PreScan simulation. We discuss the results of all the models in the conclusion section.

8. Automated detection of pitting and stress corrosion cracks in used nuclear fuel dry storage canisters using residual neural networks
Theodore Papamarkou, Hayley Guy, Bryce Kroencke, Jordan Miller, Preston Robinette, Daniel Schultz, Jacob Hinkle, Laura Pullum, Catherine Schuman, Jeremy Renshaw, Stylianos Chatzidakis
Abstract: Nondestructive evaluation methods play an important role in ensuring component integrity and safety in many industries. Operator fatigue can play a critical role in the reliability of such methods. This is important when inspecting high-value assets or assets with a high consequence of failure, such as aerospace and nuclear components. Recent advances in convolutional neural networks can support and automate these inspection efforts. This paper proposes using residual neural networks (ResNets) for real-time detection of pitting and stress corrosion cracking, with a focus on dry storage canisters housing used nuclear fuel. The proposed approach crops nuclear canister images into smaller tiles, trains a ResNet on these tiles, and classifies images as corroded or intact using the per-image count of tiles predicted as corroded by the ResNet. The results demonstrate that such a deep learning approach makes it possible to detect the locus of corrosion cracks via smaller tiles, and at the same time to infer with high accuracy whether an image comes from a corroded canister. The proposed approach therefore holds promise to automate and speed up nuclear fuel canister inspections, to minimize inspection costs, and to partially replace human-conducted onsite inspections, thus reducing radiation doses to personnel.
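
Editor's sketch (not from the paper): the tile-then-count decision rule described above can be illustrated as follows. The tile classifier here is a trivial stand-in for the trained ResNet, and the names `crop_tiles`, `classify_image`, the parameter `min_corroded_tiles` and its default value are assumptions made for illustration; the abstract does not state the threshold the authors use.

```python
def crop_tiles(image, tile):
    """Split a 2D image (list of rows) into non-overlapping tile x tile squares.

    Ragged borders that do not fill a whole tile are dropped.
    """
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            tiles.append([row[c:c + tile] for row in image[r:r + tile]])
    return tiles


def classify_image(image, tile, tile_classifier, min_corroded_tiles=1):
    """Label an image from the number of its tiles predicted as corroded."""
    count = sum(1 for t in crop_tiles(image, tile) if tile_classifier(t))
    label = "corroded" if count >= min_corroded_tiles else "intact"
    return label, count


def bright_tile(t):
    """Stand-in tile classifier: flags tiles whose mean intensity exceeds 5."""
    return sum(map(sum, t)) / (len(t) * len(t[0])) > 5


# Toy 4x4 image with one bright 2x2 patch in the top-right corner.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
label, count = classify_image(image, tile=2, tile_classifier=bright_tile)
```

The flagged tiles also give a coarse location for the defect, which is the sense in which the abstract speaks of detecting the locus of corrosion via smaller tiles.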

9. Diverse and Admissible Trajectory Forecasting through Multimodal Context Understanding
Seong Hyeon Park, Gyubok Lee, Manoj Bhat, Jimin Seo, Minseok Kang, Jonathan Francis, Ashwin R. Jadhav, Paul Pu Liang, Louis-Philippe Morency
Abstract: Multi-agent trajectory forecasting in autonomous driving requires an agent to accurately anticipate the behaviors of the surrounding vehicles and pedestrians, for safe and reliable decision-making. Due to partial observability over the goals, contexts, and interactions of agents in these dynamical scenes, directly obtaining the posterior distribution over future agent trajectories remains a challenging problem. In realistic embodied environments, each agent's future trajectories should be diverse since multiple plausible sequences of actions can be used to reach its intended goals, and they should be admissible since they must obey physical constraints and stay in drivable areas. In this paper, we propose a model that fully synthesizes multiple input signals from the multimodal world (the environment's scene context and interactions between multiple surrounding agents) to best model all diverse and admissible trajectories. We offer new metrics to evaluate the diversity of trajectory predictions, while ensuring admissibility of each trajectory. Based on our new metrics as well as those used in prior work, we compare our model with strong baselines and ablations across two datasets and show a 35% performance improvement over the state-of-the-art.

10. Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Yuanhang Zhang, Shuang Yang, Jingyun Xiao, Shiguang Shan, Xilin Chen
Abstract: Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR). Currently, most existing methods equate VSR with automatic lip reading, which attempts to recognise speech by analysing lip motion. However, human experience and psychological studies suggest that we do not always fix our gaze at each other's lips during a face-to-face conversation, but rather scan the whole face repetitively. This inspires us to revisit a fundamental yet somehow overlooked problem: can VSR models benefit from reading extraoral facial regions, i.e. beyond the lips? In this paper, we perform a comprehensive study to evaluate the effects of different facial regions with state-of-the-art VSR models, including the mouth, the whole face, the upper face, and even the cheeks. Experiments are conducted on both word-level and sentence-level benchmarks with different characteristics. We find that despite the complex variations of the data, incorporating information from extraoral facial regions, even the upper face, consistently benefits VSR performance. Furthermore, we introduce a simple yet effective method based on Cutout to learn more discriminative features for face-based VSR, hoping to maximise the utility of information encoded in different facial regions. Our experiments show obvious improvements over existing state-of-the-art methods that use only the lip region as inputs, a result we believe would probably provide the VSR community with some new and exciting insights.
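
Editor's sketch (not from the paper): Cutout is a general augmentation that masks a random square patch of the input during training, so that a model cannot rely on any single region. The version below is a minimal illustration on a 2D list of pixel values; the function name `cutout`, the zero fill value and the patch size are assumptions, and the abstract does not specify how the authors adapt Cutout to face-based VSR.

```python
import random


def cutout(image, size, rng=None):
    """Return a copy of a 2D image with one random square patch set to zero.

    The patch is centred on a random pixel and clipped at the image border,
    so patches near the edge are smaller than size x size.
    """
    if rng is None:
        rng = random.Random()
    rows, cols = len(image), len(image[0])
    cy, cx = rng.randrange(rows), rng.randrange(cols)
    top, bottom = max(0, cy - size // 2), min(rows, cy + size // 2)
    left, right = max(0, cx - size // 2), min(cols, cx + size // 2)
    out = [row[:] for row in image]  # leave the input untouched
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = 0
    return out


# Toy 8x8 "face crop" of ones; a seeded generator keeps the example repeatable.
face = [[1] * 8 for _ in range(8)]
masked = cutout(face, size=4, rng=random.Random(0))
```

Applied to whole-face inputs, such masking would sometimes hide the mouth and sometimes other regions, which matches the abstract's stated aim of using information from different facial regions.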

11. A Neuro-AI Interface for Evaluating Generative Adversarial Networks
Zhengwei Wang, Qi She, Alan F. Smeaton, Tomas E. Ward, Graham Healy
Abstract: Generative adversarial networks (GANs) are increasingly attracting attention in computer vision, natural language processing, speech synthesis and similar domains. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity between real and generated images using automated statistical methods. They often require large sample sizes for evaluation and do not directly reflect human perception of image quality. In this work, we introduce an evaluation metric called Neuroscore for evaluating the performance of GANs, which more directly reflects psychoperceptual image quality through the use of brain signals. Our results show that Neuroscore has superior performance to the current evaluation metrics in that: (1) it is more consistent with human judgment; (2) the evaluation process needs much smaller numbers of samples; and (3) it is able to rank the quality of images on a per-GAN basis. A convolutional neural network (CNN) based neuro-AI interface is proposed to predict Neuroscore from GAN-generated images directly, without the need for neural responses. Importantly, we show that including neural responses during the training phase of the network can significantly improve the prediction capability of the proposed model. Code and data are available at this link: this https URL.

12. Generalizable semi-supervised learning method to estimate mass from sparsely annotated images
Muhammad K.A. Hamdan, Diane T. Rover, Matthew J. Darr, John Just
Abstract: Mass flow estimation is of great importance to several industries, and it can be quite challenging to obtain accurate estimates due to limitation in expense or general infeasibility. In the context of agricultural applications, yield monitoring is a key component to precision agriculture and mass flow is the critical factor to measure. Measuring mass flow allows for field productivity analysis, cost minimization, and adjustments to machine efficiency. Methods such as volume or force-impact have been used to measure mass flow; however, these methods are limited in application and accuracy. In this work, we use deep learning to develop and test a vision system that can accurately estimate the mass of sugarcane while running in real-time on a sugarcane harvester during operation. The deep learning algorithm that is used to estimate mass flow is trained using very sparsely annotated images (semi-supervised) using only final load weights (aggregated weights over a certain period of time). The deep neural network (DNN) succeeds in capturing the mass of sugarcane accurately and surpasses older volumetric-based methods, despite highly varying lighting and material colors in the images. The deep neural network is initially trained to predict mass on laboratory data (bamboo) and then transfer learning is utilized to apply the same methods to estimate mass of sugarcane. Using a vision system with a relatively lightweight deep neural network we are able to estimate mass of bamboo with an average error of 4.5% and 5.9% for a select season of sugarcane.

13. Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein
Abstract: One of the key factors of enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods choose to ignore the presence of high levels of noise and thus yield sub-optimal results. In this work, we show that the problem of noise estimation for multimodal data can be reduced to a multimodal density estimation task. Using multimodal density estimation, we propose a noise estimation building block for multimodal representation learning that is based strictly on the inherent correlation between different modalities. We demonstrate how our noise estimation can be broadly integrated and achieves comparable results to state-of-the-art performance on five different benchmark datasets for two challenging multimodal tasks: Video Question Answering and Text-To-Video Retrieval.

14. When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs)
Victor Villena-Martinez, Sergiu Oprea, Marcelo Saval-Calvo, Jorge Azorin-Lopez, Andres Fuster-Guillo, Robert B. Fisher
Abstract: Registration is the process of computing the transformation that aligns sets of data. Commonly, a registration process can be divided into four main steps: target selection, feature extraction, feature matching, and transform computation for the alignment. The accuracy of the result depends on multiple factors, the most significant being the quantity of input data, the presence of noise, outliers and occlusions, the quality of the extracted features, real-time requirements and the type of transformation, especially transformations defined by multiple parameters, such as non-rigid deformations. Recent advancements in machine learning could be a turning point for these issues, particularly with the development of deep learning (DL) techniques, which are helping to improve solutions to multiple computer vision problems through an abstract understanding of the input data. In this paper, a review of deep learning-based registration methods is presented. We classify the different papers by proposing a framework extracted from the traditional registration pipeline to analyse the strengths of the new learning-based proposals. Deep Registration Networks (DRNs) try to solve the alignment task either by replacing part of the traditional pipeline with a network or by fully solving the registration problem. The main conclusions are: 1) learning-based registration techniques cannot always be clearly classified in the traditional pipeline; 2) these approaches allow more complex inputs, such as conceptual models, as well as the traditional 3D datasets; 3) in spite of the generality of learning, the current proposals are still ad hoc solutions; and 4) this is a young topic that still requires a large effort to reach general solutions able to cope with the problems that affect traditional approaches.

15. D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features
Xuyang Bai, Zixin Luo, Lei Zhou, Hongbo Fu, Long Quan, Chiew-Lan Tai
Abstract: A successful point cloud registration often relies on the robust establishment of sparse matches through discriminative 3D local features. Despite the fast evolution of learning-based 3D feature descriptors, little attention has been paid to the learning of 3D feature detectors, and even less to the joint learning of the two tasks. In this paper, we leverage a 3D fully convolutional network for 3D point clouds, and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. In particular, we propose a keypoint selection strategy that overcomes the inherent density variations of 3D point clouds, and further propose a self-supervised detector loss guided by the on-the-fly feature matching results during training. Finally, our method achieves state-of-the-art results in both indoor and outdoor scenarios, evaluated on the 3DMatch and KITTI datasets, and shows strong generalization ability on the ETH dataset. Towards practical use, we show that by adopting a reliable feature detector, sampling a smaller number of features is sufficient to achieve accurate and fast point cloud alignment. [code release](this https URL)

16. Demographic Bias in Presentation Attack Detection of Iris Recognition Systems
Meiling Fang, Naser Damer, Florian Kirchbuchner, Arjan Kuijper
Abstract: With the widespread use of biometric systems, the demographic bias problem is attracting more attention. Although many studies have addressed bias issues in biometric verification, no works analyse the bias in presentation attack detection (PAD) decisions. Hence, we investigate and analyse the demographic bias in iris PAD algorithms in this paper. To enable a clear discussion, we adapt the notions of differential performance and differential outcome to the PAD problem. We study the bias in iris PAD using three baselines (hand-crafted, transfer-learning, and training from scratch) on the NDCLD-2013 database. The experimental results indicate that female users will be significantly less protected by the PAD than male users.

17. Bundle Adjustment on a Graph Processor
Joseph Ortiz, Mark Pupilli, Stefan Leutenegger, Andrew J. Davison
Abstract: Graph processors such as Graphcore's Intelligence Processing Unit (IPU) are part of the major new wave of novel computer architecture for AI, and have a general design with massively parallel computation, distributed on-chip memory and very high inter-core communication bandwidth which allows breakthrough performance for message passing algorithms on arbitrary graphs. We show for the first time that the classical computer vision problem of bundle adjustment (BA) can be solved extremely fast on a graph processor using Gaussian Belief Propagation. Our simple but fully parallel implementation uses the 1216 cores on a single IPU chip to, for instance, solve a real BA problem with 125 keyframes and 1919 points in under 40ms, compared to 1450ms for the Ceres CPU library. Further code optimisation will surely increase this difference on static problems, but we argue that the real promise of graph processing is for flexible in-place optimisation of general, dynamically changing factor graphs representing Spatial AI problems. We give indications of this with experiments showing the ability of GBP to efficiently solve incremental SLAM problems, and deal with robust cost functions and different types of factors.

18. Pixel-Level Self-Paced Learning for Super-Resolution
W. Lin, J. Gao, Q. Wang, X. Li
Abstract: Recently, many deep networks have been proposed to improve the quality of predicted super-resolution (SR) images, owing to the widespread use of SR in several image-based fields. However, as these networks are built deeper and deeper, they also take much longer to train, which may lead the learners to local optima. To tackle this problem, this paper designs a training strategy named Pixel-level Self-Paced Learning (PSPL) to accelerate the convergence of single image super-resolution (SISR) models. PSPL, imitating self-paced learning, gives each pixel in the predicted SR image and its corresponding pixel in the ground truth an attention weight, to guide the model to a better region in parameter space. Extensive experiments show that PSPL can speed up the training of SISR models and prompt several existing models to obtain better results. Furthermore, the source code is available at this https URL.

19. Show, Edit and Tell: A Framework for Editing Image Captions
Fawaz Sammani, Luke Melas-Kyriazi
Abstract: Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when editing captions, a model is not required to learn information that is already present in the caption (i.e. sentence structure), enabling it to focus on fixing details (e.g. replacing repetitive words). This paper proposes a novel approach to image captioning based on iterative adaptive refinement of an existing caption. Specifically, our caption-editing model consists of two sub-modules: (1) EditNet, a language module with an adaptive copy mechanism (Copy-LSTM) and a Selective Copy Memory Attention mechanism (SCMA), and (2) DCNet, an LSTM-based denoising auto-encoder. These components enable our model to directly copy from and modify existing captions. Experiments demonstrate that our new approach achieves state-of-the-art performance on the MS COCO dataset both with and without sequence-level training.
Summary: Instead of generating captions from scratch, the proposed model iteratively edits an existing caption using two sub-modules: EditNet (with Copy-LSTM and SCMA) and DCNet, an LSTM-based denoising auto-encoder. It reports state-of-the-art performance on MS COCO with and without sequence-level training.

20. CNN-based Repetitive self-revised learning for photos' aesthetics imbalanced classification [PDF]
Ying Dai
Abstract: Aesthetic assessment is subjective, and the distribution of the aesthetic levels is imbalanced. In order to realize the auto-assessment of photo aesthetics, we focus on using repetitive self-revised learning (RSRL) to train a CNN-based aesthetics classification network on an imbalanced data set. In RSRL, the network is trained repetitively by dropping the low-likelihood photo samples at the middle levels of aesthetics from the training data set, based on the previously trained network. Further, the two retained networks are used to extract highlight regions of the photos related to the aesthetic assessment. Experimental results show that CNN-based repetitive self-revised learning is effective for improving the performance of imbalanced classification.
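Summary: Repetitive self-revised learning (RSRL) retrains a CNN-based aesthetics classifier on an imbalanced data set while repeatedly dropping low-likelihood photo samples at the middle aesthetic levels. The two retained networks are also used to extract highlight regions related to the aesthetic assessment.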

21. GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition [PDF]
Yuedong Chen, Guoxian Song, Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Jianming Zheng
Abstract: Automatic facial action unit (AU) recognition has attracted great attention but still remains a challenging task, as subtle changes of local facial muscles are difficult to thoroughly capture. Most existing AU recognition approaches leverage geometry information in a straightforward 2D or 3D manner, which either ignore 3D manifold information or suffer from high computational costs. In this paper, we propose a novel geodesic guided convolution (GeoConv) for AU recognition by embedding 3D manifold information into 2D convolutions. Specifically, the kernel of GeoConv is weighted by our introduced geodesic weights, which are negatively correlated to geodesic distances on a coarsely reconstructed 3D face model. Moreover, based on GeoConv, we further develop an end-to-end trainable framework named GeoCNN for AU recognition. Extensive experiments on BP4D and DISFA benchmarks show that our approach significantly outperforms the state-of-the-art AU recognition methods.
Summary: GeoConv embeds 3D manifold information into 2D convolutions by weighting the kernel with geodesic weights that are negatively correlated with geodesic distances on a coarsely reconstructed 3D face model. The resulting end-to-end framework, GeoCNN, outperforms state-of-the-art AU recognition methods on BP4D and DISFA.

22. DeLTra: Deep Light Transport for Projector-Camera Systems [PDF]
Bingyao Huang, Haibin Ling
Abstract: In projector-camera systems, light transport models the propagation from projector emitted radiance to camera-captured irradiance. In this paper, we propose the first end-to-end trainable solution named Deep Light Transport (DeLTra) that estimates radiometrically uncalibrated projector-camera light transport. DeLTra is designed to have two modules: DepthToAtrribute and ShadingNet. DepthToAtrribute explicitly learns rays, depth and normal, and then estimates rough Phong illuminations. Afterwards, the CNN-based ShadingNet renders photorealistic camera-captured image using estimated shading attributes and rough Phong illuminations. A particular challenge addressed by DeLTra is occlusion, for which we exploit epipolar constraint and propose a novel differentiable direct light mask. Thus, it can be learned end-to-end along with the other DeLTra modules. Once trained, DeLTra can be applied simultaneously to three projector-camera tasks: image-based relighting, projector compensation and depth/normal reconstruction. In our experiments, DeLTra shows clear advantages over previous arts with promising quality and meanwhile being practically convenient.
Summary: DeLTra is an end-to-end trainable model of radiometrically uncalibrated projector-camera light transport, built from two modules, DepthToAtrribute and ShadingNet, with a differentiable direct light mask to handle occlusion. Once trained, it supports image-based relighting, projector compensation and depth/normal reconstruction.

23. Clean-Label Backdoor Attacks on Video Recognition Models [PDF]
Shihao Zhao, Xingjun Ma, Xiang Zheng, James Bailey, Jingjing Chen, Yu-Gang Jiang
Abstract: Deep neural networks (DNNs) are vulnerable to backdoor attacks, which can hide backdoor triggers in DNNs by poisoning training data. A backdoored model behaves normally on clean test images, yet consistently predicts a particular target class for any test example that contains the trigger pattern. As such, backdoor attacks are hard to detect, and have raised severe security concerns in real-world applications. Thus far, backdoor research has mostly been conducted in the image domain with image classification models. In this paper, we show that existing image backdoor attacks are far less effective on videos, and outline 4 strict conditions where existing attacks are likely to fail: 1) scenarios with more input dimensions (e.g. videos), 2) scenarios with high resolution, 3) scenarios with a large number of classes and few examples per class (a "sparse dataset"), and 4) attacks with access to correct labels (e.g. clean-label attacks). We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models, a situation where backdoor attacks are likely to be challenged by the above 4 strict conditions. We show on benchmark video datasets that our proposed backdoor attack can manipulate state-of-the-art video models with high success rates by poisoning only a small proportion of training data (without changing the labels). We also show that our proposed backdoor attack is resistant to state-of-the-art backdoor defense/detection methods, and can even be applied to improve image backdoor attacks. Our proposed video backdoor attack not only serves as a strong baseline for improving the robustness of video models, but also provides a new perspective for understanding more powerful backdoor attacks.
Summary: Existing image backdoor attacks are far less effective on videos, and the paper outlines four strict conditions under which they are likely to fail. It proposes a universal adversarial trigger as the backdoor trigger, which achieves high success rates by poisoning a small proportion of training data without changing labels and resists state-of-the-art defenses.

24. DA4AD: End-to-end Deep Attention Aware Features Aided Visual Localization for Autonomous Driving [PDF]
Yao Zhou, Guowei Wan, Shenhua Hou, Li Yu, Gang Wang, Xiaofei Rui, Shiyu Song
Abstract: We present a visual localization framework aided by novel deep attention aware features for autonomous driving that achieves centimeter level localization accuracy. Conventional approaches to the visual localization problem rely on handcrafted features or human-made objects on the road. They are known to be either prone to unstable matching caused by severe appearance or lighting changes, or too scarce to deliver constant and robust localization results in challenging scenarios. In this work, we seek to exploit the deep attention mechanism to search for salient, distinctive and stable features that are good for long-term matching in the scene through a novel end-to-end deep neural network. Furthermore, our learned feature descriptors are demonstrated to be competent to establish robust matches and therefore successfully estimate the optimal camera poses with high precision. We comprehensively validate the effectiveness of our method using a freshly collected dataset with high-quality ground truth trajectories and hardware synchronization between sensors. Results demonstrate that our method achieves a competitive localization accuracy when compared to the LiDAR-based localization solutions under various challenging circumstances, leading to a potential low-cost localization solution for autonomous driving.
Summary: DA4AD is a visual localization framework for autonomous driving that uses an end-to-end network with deep attention to find salient, distinctive and stable features for long-term matching. On a newly collected dataset it achieves centimeter-level accuracy that is competitive with LiDAR-based localization.

25. Centrality Graph Convolutional Networks for Skeleton-based Action Recognition [PDF]
Dong Yang, Monica Mengqi Li, Hong Fu, Jicong Fan, Howard Leung
Abstract: The topological structure of skeleton data plays a significant role in human action recognition. Combining the topological structure with graph convolutional networks has achieved remarkable performance. Existing methods model the topological structure of skeleton data by considering only the connections between joints and bones, and use the physical information directly. However, identifying the key joints, bones and body parts in each human action remains an open problem. In this paper, we propose centrality graph convolutional networks to uncover the overlooked topological information and make the best use of it to distinguish key joints, bones, and body parts. A novel centrality graph convolutional network first highlights the effects of the key joints and bones, which brings a definite improvement. In addition, the topological information of the skeleton sequence is explored and combined to further enhance performance in a four-channel framework. Moreover, the reconstructed graph is implemented by adaptive methods during training, which yields further improvements. Our model is validated on two large-scale datasets, NTU-RGB+D and Kinetics, and outperforms the state-of-the-art methods.
Summary: Centrality graph convolutional networks uncover overlooked topological information in skeleton data to highlight key joints, bones and body parts. Combined in a four-channel framework with an adaptively reconstructed graph, the model outperforms state-of-the-art methods on NTU-RGB+D and Kinetics.

26. Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation [PDF]
István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe
Abstract: Heatmap representations have formed the basis of 2D human pose estimation systems for many years, but their generalizations for 3D pose have only recently been considered. This includes 2.5D volumetric heatmaps, whose X and Y axes correspond to image space and the Z axis to metric depth around the subject. To obtain metric-scale predictions, these methods must include a separate, explicit post-processing step to resolve scale ambiguity. Further, they cannot encode body joint positions outside of the image boundaries, leading to incomplete pose estimates in case of image truncation. We address these limitations by proposing metric-scale truncation-robust (MeTRo) volumetric heatmaps, whose dimensions are defined in metric 3D space near the subject, instead of being aligned with image space. We train a fully-convolutional network to estimate such heatmaps from monocular RGB in an end-to-end manner. This reinterpretation of the heatmap dimensions allows us to estimate complete metric-scale poses without test-time knowledge of the focal length or person distance and without relying on anthropometric heuristics in post-processing. Furthermore, as the image space is decoupled from the heatmap space, the network can learn to reason about joints beyond the image boundary. Using ResNet-50 without any additional learned layers, we obtain state-of-the-art results on the Human3.6M and MPI-INF-3DHP benchmarks. As our method is simple and fast, it can become a useful component for real-time top-down multi-person pose estimation systems. We make our code publicly available to facilitate further research (see this https URL).
Summary: MeTRo volumetric heatmaps define their dimensions in metric 3D space near the subject rather than in image space, so a fully-convolutional network can estimate complete metric-scale poses, including joints beyond the image boundary, without knowing focal length or person distance. With ResNet-50, the method reaches state-of-the-art results on Human3.6M and MPI-INF-3DHP.

27. Segmentation of Satellite Imagery using U-Net Models for Land Cover Classification [PDF]
Priit Ulmas, Innar Liiv
Abstract: The focus of this paper is using a convolutional machine learning model with a modified U-Net structure for creating land cover classification mapping based on satellite imagery. The aim of the research is to train and test convolutional models for automatic land cover mapping and to assess their usability in increasing land cover mapping accuracy and change detection. To solve these tasks, the authors prepared a dataset and trained machine learning models for land cover classification and semantic segmentation from satellite images. The results were analysed on three different land classification levels. The BigEarthNet satellite image archive was selected for the research as one of two main datasets. This novel and recent dataset was published in 2019 and includes Sentinel-2 satellite photos from 10 European countries made in 2017 and 2018. As a second dataset the authors composed an original set containing a Sentinel-2 image and a CORINE land cover map of Estonia. The developed classification model shows a high overall F1 score of 0.749 on multiclass land cover classification with 43 possible image labels. The model also highlights noisy data in the BigEarthNet dataset, where images seem to have incorrect labels. The segmentation models offer a solution for generating automatic land cover mappings based on Sentinel-2 satellite images and show a high IoU score for land cover classes such as forests, inland waters and arable land. The models show a capability of increasing the accuracy of existing land classification maps and in land cover change detection.
Summary: Convolutional models with a modified U-Net structure are trained for land cover classification and semantic segmentation of Sentinel-2 imagery, using BigEarthNet and an original Estonian dataset with a CORINE land cover map. The classifier reaches an overall F1 score of 0.749 across 43 possible labels, and the segmentation models show high IoU for classes such as forests, inland waters and arable land.
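The two metrics quoted in this entry, F1 and IoU, can be sketched as follows. This is a minimal illustration on made-up labels, not the authors' evaluation code; the function names and the tiny `pred`/`truth` sequences are assumptions introduced here.

```python
def iou(pred, truth, label):
    """Intersection over union for one class, given two flat label sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
    return inter / union if union else float("nan")


def f1(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-pixel land cover labels (illustrative only).
pred = ["forest", "forest", "water", "arable", "water", "forest"]
truth = ["forest", "water", "water", "arable", "water", "forest"]

forest_iou = iou(pred, truth, "forest")  # 2 shared pixels out of 3 in the union
forest_f1 = f1(tp=2, fp=1, fn=0)
```

IoU penalizes both missed and spurious pixels through the union, which is why it is the usual choice for segmentation, while F1 is reported for the multi-label classification task.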

28. Optimizing JPEG Quantization for Classification Networks [PDF]
Zhijing Li, Christopher De Sa, Adrian Sampson
Abstract: Deep learning for computer vision depends on lossy image compression: it reduces the storage required for training and test data and lowers transfer costs in deployment. Mainstream datasets and imaging pipelines all rely on standard JPEG compression. In JPEG, the degree of quantization of frequency coefficients controls the lossiness: an 8 by 8 quantization table (Q-table) decides both the quality of the encoded image and the compression ratio. While a long history of work has sought better Q-tables, existing work either seeks to minimize image distortion or to optimize for models of the human visual system. This work asks whether JPEG Q-tables exist that are "better" for specific vision networks and can offer better quality-size trade-offs than ones designed for human perception or minimal distortion. We reconstruct an ImageNet test set with higher resolution to explore the effect of JPEG compression under novel Q-tables. We attempt several approaches to tune a Q-table for a vision task. We find that a simple sorted random sampling method can exceed the performance of the standard JPEG Q-table. We also use hyper-parameter tuning techniques including bounded random search, Bayesian optimization, and composite heuristic optimization methods. The new Q-tables we obtained can improve the compression rate by 10% to 200% when the accuracy is fixed, or improve accuracy up to 2% at the same compression rate.
Summary: The paper asks whether JPEG quantization tables tuned for specific vision networks can give better quality-size trade-offs than tables designed for human perception or minimal distortion. Tables found by sorted random sampling and hyper-parameter tuning improve the compression rate by 10% to 200% at fixed accuracy, or accuracy by up to 2% at the same compression rate.
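How an 8 by 8 Q-table controls lossiness can be sketched as below: each frequency coefficient is divided by its table entry and rounded, so larger entries discard more information. The toy table in `make_qtable` and the coefficient block are assumptions for illustration; they are neither the standard JPEG table nor the tables found in the paper.

```python
def make_qtable(scale):
    """Toy 8x8 Q-table whose step size grows with frequency index u + v (illustrative choice)."""
    return [[1 + scale * (u + v) for v in range(8)] for u in range(8)]


def quantize(coeffs, qtable):
    """Divide each coefficient by its table entry and round to an integer level."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qtable)]


def dequantize(levels, qtable):
    """Multiply the integer levels back by the table entries (lossy reconstruction)."""
    return [[lvl * q for lvl, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qtable)]


def nonzero(levels):
    """Number of non-zero levels; fewer non-zeros generally compress better."""
    return sum(1 for row in levels for lvl in row if lvl != 0)


# Made-up block of frequency coefficients whose magnitude decays with frequency.
coeffs = [[200.0 / (1 + u + v) for v in range(8)] for u in range(8)]

fine = quantize(coeffs, make_qtable(1))     # small steps: most coefficients survive
coarse = quantize(coeffs, make_qtable(8))   # large steps: high frequencies become zero
```

Searching over Q-tables, as the paper does, amounts to choosing these 64 step sizes so that the information a classifier needs survives while as many levels as possible round to zero.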

29. Anysize GAN: A solution to the image-warping problem [PDF]
Connah Kendrick, David Gillespie, Moi Hoon Yap
Abstract: We propose a new type of Generative Adversarial Network (GAN) to resolve a common issue in deep learning. We develop a novel architecture that can be applied to existing latent-vector-based GAN structures and allows them to generate images of any size on the fly. Existing GANs for image generation require uniform images of matching dimensions. However, publicly available datasets such as ImageNet contain images of thousands of different sizes. Resizing images causes deformation and changes the image data, whereas our network does not require this preprocessing step. We make significant changes to the standard data loading techniques to enable images of any size to be loaded for training. We also modify the network in two ways, by adding multiple inputs and a novel dynamic resizing layer. Finally, we adjust the discriminator to work on multiple resolutions. These changes allow multi-resolution datasets to be trained on without any resizing, if memory allows. We validate our results on the ISIC 2019 skin lesion dataset. We demonstrate that our method can successfully generate realistic images at different sizes without issue, preserving and understanding spatial relationships while maintaining feature relationships. We will release the source code upon paper acceptance.
Summary: Anysize GAN is an architecture for latent-vector-based GANs that trains on and generates images of any size without resizing, using modified data loading, multiple inputs, a dynamic resizing layer and a multi-resolution discriminator. It is validated on the ISIC 2019 skin lesion dataset.

30. Non-linear Neurons with Human-like Apical Dendrite Activations [PDF]
Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Nicolae-Catalin Ristea, Nicu Sebe
Abstract: In order to classify linearly non-separable data, neurons are typically organized into multi-layer neural networks that are equipped with at least one hidden layer. Inspired by some recent discoveries in neuroscience, we propose a new neuron model along with a novel activation function enabling learning of non-linear decision boundaries using a single neuron. We show that a standard neuron followed by the novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy. Furthermore, we conduct experiments on three benchmark data sets from computer vision and natural language processing, i.e. Fashion-MNIST, UTKFace and MOROCO, showing that the ADA and the leaky ADA functions provide superior results to Rectified Linear Units (ReLU) and leaky ReLU, for various neural network architectures, e.g. multi-layer perceptrons (MLPs) with one or two hidden layers and convolutional neural networks (CNNs) such as LeNet, VGG, ResNet and Character-level CNN. We also obtain further improvements when we replace the standard model of the neuron with our pyramidal neuron with apical dendrite activations (PyNADA).
Summary: A new neuron model with an apical dendrite activation (ADA) function lets a single neuron learn non-linear decision boundaries, including XOR with 100% accuracy. On Fashion-MNIST, UTKFace and MOROCO, ADA and leaky ADA outperform ReLU and leaky ReLU across MLP and CNN architectures.
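Why XOR needs either a hidden layer or a non-monotonic activation can be checked with a small sketch. The `band` function below is a hypothetical non-monotonic activation chosen for illustration; it is not the ADA function, whose definition is not given in the abstract.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(z):
    """Standard monotonic threshold activation."""
    return 1 if z > 0 else 0


def band(z):
    """Hypothetical non-monotonic activation (NOT the paper's ADA): fires only inside a band."""
    return 1 if 0.5 < z < 1.5 else 0


def neuron(x, w, b, act):
    """Single neuron: weighted sum of two inputs plus bias, followed by an activation."""
    return act(w[0] * x[0] + w[1] * x[1] + b)


def fits_xor(w, b, act):
    """True if the neuron reproduces XOR on all four inputs."""
    return all(neuron(x, w, b, act) == y for x, y in XOR.items())


# Brute-force search over a coarse grid of weights and biases in [-4, 4].
grid = [i / 2 for i in range(-8, 9)]
step_solutions = [(w1, w2, b) for w1, w2, b in product(grid, repeat=3)
                  if fits_xor((w1, w2), b, step)]

# With the band activation, weights (1, 1) and zero bias already solve XOR.
band_ok = fits_xor((1.0, 1.0), 0.0, band)
```

The grid search finds no threshold neuron that fits XOR, consistent with XOR not being linearly separable, whereas an activation that falls again for large inputs separates the two classes with one neuron.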

31. Knowledge graph based methods for record linkage [PDF]
B. Gautam, O. Ramos Terrades, J. M. Pujades, M. Valls
Abstract: Nowadays, the use of individual-level data is common in Historical Demography, as a consequence of a predominant life-course approach to understanding demographic behaviour, family transition, mobility, etc. Advances in record linkage are key in these disciplines, since they allow the volume and complexity of the data to be analyzed to increase. However, current methods are constrained to linking data that come from the same kind of sources. Knowledge graphs are flexible semantic representations that can encode data variability and semantic relations in a structured manner. In this paper we propose using knowledge graphs to tackle the record linkage task. The proposed method, named WERL, takes advantage of the main knowledge graph properties and learns embedding vectors to encode census information. These embeddings are properly weighted to maximize record linkage performance. We have evaluated this method on benchmark data sets and compared it to related methods, with stimulating and satisfactory results.
Summary: WERL applies knowledge graphs to record linkage, learning embedding vectors that encode census information and weighting them to maximize linkage performance. The method is evaluated on benchmark data sets against related methods.

32. Meta-SVDD: Probabilistic Meta-Learning for One-Class Classification in Cancer Histology Images [PDF]
Jevgenij Gamper, Brandon Chan, Yee Wah Tsang, David Snead, Nasir Rajpoot
Abstract: To train a robust deep learning model, one usually needs a balanced set of categories in the training data. The data acquired in a medical domain, however, frequently contains an abundance of healthy patients, versus a small variety of positive, abnormal cases. Moreover, the annotation of a positive sample requires time consuming input from medical domain experts. This scenario would suggest a promise for one-class classification type approaches. In this work we propose a general one-class classification model for histology, that is meta-trained on multiple histology datasets simultaneously, and can be applied to new tasks without expensive re-training. This model could be easily used by pathology domain experts, and potentially be used for screening purposes.
Summary: Meta-SVDD is a one-class classification model for histology that is meta-trained on multiple histology datasets simultaneously and can be applied to new tasks without expensive re-training. It targets medical data with many healthy cases and few annotated abnormal ones.

33. StereoNeuroBayesSLAM: A Neurobiologically Inspired Stereo Visual SLAM System Based on Direct Sparse Method [PDF]
Taiping Zeng, Xiaoli Li, Bailu Si
Abstract: We propose a neurobiologically inspired visual simultaneous localization and mapping (SLAM) system based on the direct sparse method to build cognitive maps of large-scale environments in real time from a moving stereo camera. The core SLAM system mainly comprises a Bayesian attractor network, which utilizes neural responses of head direction (HD) cells in the hippocampus and grid cells in the medial entorhinal cortex (MEC) to represent the head direction and the position of the robot in the environment, respectively. The direct sparse method is employed to accurately and robustly estimate velocity information from a stereo camera. Input rotational and translational velocities are integrated by the HD cell and grid cell networks, respectively. We demonstrated our neurobiologically inspired stereo visual SLAM system on the KITTI odometry benchmark datasets. Our proposed SLAM system robustly builds a coherent semi-metric topological map in real time from a stereo camera. Qualitative evaluation of the cognitive maps shows that our proposed neurobiologically inspired stereo visual SLAM system outperforms our previous brain-inspired algorithms and the neurobiologically inspired monocular visual SLAM system in terms of both tracking accuracy and robustness, bringing it closer to the traditional state-of-the-art system.
Summary: This stereo visual SLAM system combines a Bayesian attractor network of head direction and grid cells with a direct sparse method for velocity estimation to build cognitive maps in real time. On the KITTI odometry benchmark it outperforms the authors' earlier brain-inspired and monocular systems in tracking accuracy and robustness.

34. Toward Adaptive Guidance: Modeling the Variety of User Behaviors in Continuous-Skill-Improving Experiences of Machine Operation Tasks [PDF]
Long-fei Chen, Yuichi Nakamura, Kazuaki Kondo
Abstract: An adaptive guidance system that supports equipment operators requires a comprehensive model of task and user behavior that considers different skill and knowledge levels as well as diverse situations. In this study, we investigated the relationships between user behaviors and skill levels under operational conditions. We captured sixty samples of two sewing tasks performed by five operators using a head-mounted RGB-d camera and a static gaze tracker. We examined the operators' gaze and head movements, and hand interactions to essential regions (hotspots on machine surface) to determine behavioral differences among continuous skill improving experiences. We modeled the variety of user behaviors to an extensive task model with a two-step automatic approach, baseline model selection and experience integration. The experimental results indicate that some features, such as task execution time and user head movements, are good indexes for skill level and provide valuable information that can be applied to obtain an effective task model. Operators with varying knowledge and operating habits demonstrate different operational features, which can contribute to the design of user-specific guidance.
Summary: Sixty samples of two sewing tasks by five operators were recorded with a head-mounted RGB-D camera and a static gaze tracker to model how behavior varies with skill. Features such as task execution time and head movements prove to be good indexes of skill level.

35. Neural networks approach for mammography diagnosis using wavelets features [PDF]
Essam A. Rashed, Mohamed G. Awad
Abstract: A supervised diagnosis system for digital mammograms is developed. Diagnosis is performed by transforming the image data into a feature vector using multilevel wavelet decomposition. This vector is used as the feature tailored toward separating different mammogram classes. The suggested model consists of artificial neural networks designed for classifying mammograms according to tumor type and risk level. Results are improved over our previous study by extracting feature vectors using multilevel decompositions instead of one level of decomposition. Radiologist-labeled images were used to evaluate the diagnosis system. Results are very promising and suggest possible directions for future work.
Summary: A supervised diagnosis system classifies digital mammograms by tumor type and risk level using neural networks on feature vectors obtained from multilevel wavelet decomposition. It is evaluated on radiologist-labeled images.
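A multilevel wavelet decomposition can be turned into a feature vector along the following lines. This is a minimal 1D sketch under stated assumptions: the Haar wavelet and the per-level energy features are choices made here for illustration, since the abstract names neither the wavelet family nor the exact features, and the sample `row` is made up.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: pairwise averages and differences."""
    s = math.sqrt(2)
    approx = [(a + b) / s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail


def multilevel_features(signal, levels):
    """Energy of the detail coefficients at each level, plus the final approximation energy.

    The signal length must be divisible by 2 ** levels.
    """
    feats = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in approx))
    return feats


# Made-up row of pixel intensities (illustrative only).
row = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
features = multilevel_features(row, levels=3)
```

Using several levels rather than one gives the classifier energy measurements at several spatial scales, which matches the abstract's stated reason for moving to multilevel decomposition.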

36. Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder [PDF]
Zhisheng Xiao, Qing Yan, Yali Amit
Abstract: Deep probabilistic generative models enable modeling the likelihoods of very high dimensional data. An important application of generative modeling should be the ability to detect out-of-distribution (OOD) samples by setting a threshold on the likelihood. However, a recent study shows that probabilistic generative models can, in some cases, assign higher likelihoods on certain types of OOD samples, making the OOD detection rules based on likelihood threshold problematic. To address this issue, several OOD detection methods have been proposed for deep generative models. In this paper, we make the observation that some of these methods fail when applied to generative models based on Variational Auto-encoders (VAE). As an alternative, we propose Likelihood Regret, an efficient OOD score for VAEs. We benchmark our proposed method over existing approaches, and empirical results suggest that our method obtains the best overall OOD detection performances compared with other OOD method applied on VAE.
Summary: Because deep generative models can assign higher likelihoods to some out-of-distribution (OOD) samples, likelihood thresholds are unreliable for OOD detection, and some proposed remedies fail on VAEs. Likelihood Regret is an efficient OOD score for VAEs that obtains the best overall detection performance among the compared methods.
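The baseline rule that the abstract calls problematic, thresholding the model likelihood, can be sketched as follows. This illustrates only that baseline, not Likelihood Regret; the quantile-based threshold and all log-likelihood values are assumptions made up for the example.

```python
def fit_threshold(train_log_likelihoods, quantile=0.05):
    """Pick a threshold that roughly `quantile` of in-distribution training samples fall below."""
    ordered = sorted(train_log_likelihoods)
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]


def is_ood(log_likelihood, threshold):
    """Flag a sample as out-of-distribution when its log-likelihood is below the threshold."""
    return log_likelihood < threshold


# Hypothetical log-likelihoods assigned by a generative model to its training data.
train_ll = [-95.0, -101.5, -98.2, -110.0, -97.3, -99.9, -102.4, -96.1, -100.7, -104.8]
threshold = fit_threshold(train_ll, quantile=0.1)

flag_low = is_ood(-130.0, threshold)   # a clearly unlikely sample is flagged
flag_high = is_ood(-90.0, threshold)   # a sample with HIGHER likelihood is never flagged
```

The last line shows the failure mode described in the abstract: an OOD input that happens to receive a higher likelihood than the training data passes the threshold test, which motivates scores that do not rely on the raw likelihood alone.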

37. Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations [PDF]
Aditya Golatkar, Alessandro Achille, Stefano Soatto
Abstract: We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions and can be extended to ensure forgetting in the activations of the network. We introduce a new bound on how much information can be extracted per query about the forgotten cohort from a black-box network for which only the input-output behavior is observed. The proposed forgetting procedure has a deterministic part derived from the differential equations of a linearized version of the model, and a stochastic part that ensures information destruction by adding noise tailored to the geometry of the loss landscape. We exploit the connections between the activation and weight dynamics of a DNN inspired by Neural Tangent Kernels to compute the information in the activations.
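Summary: The paper describes a procedure for removing a trained network's dependency on a cohort of training data, combining a deterministic part derived from a linearized model with noise tailored to the geometry of the loss landscape. It also bounds how much information about the forgotten cohort can be extracted per query from a black-box network.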

38. From Perspective X-ray Imaging to Parallax-Robust Orthographic Stitching [PDF]
Javad Fotouhi, Xingtong Liu, Mehran Armand, Nassir Navab, Mathias Unberath
Abstract: Stitching images acquired under perspective projective geometry is a relevant topic in computer vision with multiple applications ranging from smartphone panoramas to the construction of digital maps. Image stitching is an equally prominent challenge in medical imaging, where the limited field-of-view captured by single images prohibits holistic analysis of patient anatomy. The barrier that prevents straight-forward mosaicing of 2D images is depth mismatch due to parallax. In this work, we leverage the Fourier slice theorem to aggregate information from multiple transmission images in parallax-free domains using fundamental principles of X-ray image formation. The semantics of the stitched image are restored using a novel deep learning strategy that exploits similarity measures designed around frequency, as well as dense and sparse spatial image content. Our pipeline, not only stitches images, but also provides orthographic reconstruction that enables metric measurements of clinically relevant quantities directly on the 2D image plane.
Summary: Parallax causes depth mismatch that prevents direct mosaicing of perspective X-ray images. The method uses the Fourier slice theorem to aggregate information from multiple transmission images in parallax-free domains, then restores the stitched image with a deep learning strategy, yielding an orthographic reconstruction that supports metric measurements.
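The Fourier slice theorem that the method relies on can be verified numerically in its simplest discrete, axis-aligned form: the 1D transform of a parallel projection equals the central slice of the 2D transform. The sketch below checks only this identity on a made-up 4 by 4 image; it does not reproduce the paper's pipeline, and the helper names are assumptions.

```python
import cmath


def dft(seq):
    """Plain 1D discrete Fourier transform."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def dft2_row0(image):
    """Zero-vertical-frequency row of the 2D DFT: the central slice along the column axis."""
    n_rows, n_cols = len(image), len(image[0])
    return [sum(image[r][c] * cmath.exp(-2j * cmath.pi * v * c / n_cols)
                for r in range(n_rows) for c in range(n_cols))
            for v in range(n_cols)]


# Made-up 4x4 attenuation image (illustrative only).
image = [[0, 1, 2, 0],
         [1, 3, 4, 1],
         [0, 2, 5, 1],
         [0, 0, 1, 0]]

# Parallel projection: line integrals taken straight down each column.
projection = [sum(image[r][c] for r in range(4)) for c in range(4)]

slice_from_projection = dft(projection)
central_slice = dft2_row0(image)
max_gap = max(abs(a - b) for a, b in zip(slice_from_projection, central_slice))
```

The identity holds for parallel rays, which is consistent with the abstract's emphasis on working in parallax-free domains rather than directly on perspective images.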

39. A deep learning-facilitated radiomics solution for the prediction of lung lesion shrinkage in non-small cell lung cancer trials
Antong Chen, Jennifer Saouaf, Bo Zhou, Randolph Crawford, Jianda Yuan, Junshui Ma, Richard Baumgartner, Shubing Wang, Gregory Goldmacher
Abstract: Herein we propose a deep learning-based approach for the prediction of lung lesion response based on radiomic features extracted from clinical CT scans of patients in non-small cell lung cancer trials. The approach starts with the classification of lung lesions from the set of primary and metastatic lesions at various anatomic locations. Focusing on the lung lesions, we perform automatic segmentation to extract their 3D volumes. Radiomic features are then extracted from the lesion on the pre-treatment scan and the first follow-up scan to predict which lesions will shrink at least 30% in diameter during treatment (either Pembrolizumab or combinations of chemotherapy and Pembrolizumab), which is defined as a partial response by the Response Evaluation Criteria In Solid Tumors (RECIST) guidelines. A 5-fold cross-validation on the training set led to an AUC of 0.84 +/- 0.03, and the prediction on the testing dataset reached an AUC of 0.73 +/- 0.02 for the outcome of 30% diameter shrinkage.
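The two quantities this abstract relies on, the 30% diameter-shrinkage label and the AUC used to score predictions, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the diameters and model scores are made-up toy values, not trial data.

```python
def shrinks_at_least_30_percent(baseline_mm, followup_mm):
    """Label a lesion as responding if its diameter drops by at least 30%."""
    return (baseline_mm - followup_mm) / baseline_mm >= 0.30

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Toy (baseline, follow-up) diameters in mm; purely illustrative values.
diameters = [(20, 12), (15, 14), (30, 20), (10, 9), (25, 18)]
labels = [shrinks_at_least_30_percent(b, f) for b, f in diameters]

# Hypothetical model scores for the same five lesions.
scores = [0.9, 0.2, 0.6, 0.7, 0.1]

auc = roc_auc(labels, scores)
print(labels, round(auc, 3))
```

In a 5-fold cross-validation, a score like this would be computed once per held-out fold and then summarized as a mean and spread, which is how figures of the form 0.84 +/- 0.03 are usually reported.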

40. IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning
Xi Yang, Ding Xia, Taichi Kin, Takeo Igarashi
Abstract: Medicine is an important application area for deep learning models. Research in this field is a combination of medical expertise and data science knowledge. In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that makes it possible to apply point-based and mesh-based classification and segmentation models. Our dataset can be used to diagnose intracranial aneurysms and to extract the neck for a clipping operation in medicine, as well as in other areas of deep learning, such as normal estimation and surface reconstruction. We provide a large-scale benchmark of classification and part segmentation by testing state-of-the-art networks. We also discuss the performance of each method and demonstrate the challenges of our dataset. The published dataset can be accessed here: this https URL.

41. Generating Embroidery Patterns Using Image-to-Image Translation
Mohammad Akif Beg, Jia Yuan Yu
Abstract: In many scenarios in computer vision, machine learning, and computer graphics, there is a requirement to learn the mapping from an image of one domain to an image of another domain, called image-to-image translation. Examples include style transfer, object transfiguration, visually altering the appearance of weather conditions in an image, changing a day image into a night image or vice versa, and photo enhancement, to name a few. In this paper, we propose two machine learning techniques to solve the embroidery image-to-image translation. Our goal is to generate a preview image which looks similar to an embroidered image, from a user-uploaded image. Our techniques are modifications of two existing techniques, neural style transfer and the cycle-consistent generative adversarial network. Neural style transfer renders the semantic content of an image from one domain in the style of a different image in another domain, whereas a cycle-consistent generative adversarial network learns the mapping from an input image to an output image without any paired training data, and also learns a loss function to train this mapping. Furthermore, the techniques we propose are independent of any embroidery attributes, such as elevation of the image, light source, start and end points of a stitch, type of stitch used, fabric type, etc. Given the user image, our techniques can generate a preview image which looks similar to an embroidered image. We train and test our proposed techniques on an embroidery dataset which consists of simple 2D images. To do so, we prepare an unpaired embroidery dataset with more than 8000 user-uploaded images along with embroidered images. Empirical results show that these techniques successfully generate an approximate preview of an embroidered version of a user image, which can help users in decision making.

42. Practical Full Resolution Learned Lossless Image Compression
Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool
Abstract: We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs PNG, WebP and JPEG 2000. At the core of our method is a fully parallelizable hierarchical probabilistic model for adaptive entropy coding which is optimized end-to-end for the compression task. In contrast to recent autoregressive discrete probabilistic models such as PixelCNN, our method i) models the image distribution jointly with learned auxiliary representations instead of exclusively modeling the image distribution in RGB space, and ii) requires only three forward passes to predict all pixel probabilities instead of one for each pixel. As a result, L3C obtains speedups of over two orders of magnitude when sampling compared to the fastest PixelCNN variant (Multiscale-PixelCNN). Furthermore, we find that learning the auxiliary representation is crucial and significantly outperforms predefined auxiliary representations such as an RGB pyramid.
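The link between a probabilistic model and lossless compression can be made concrete: an ideal entropy coder spends about -log2 p(x) bits on a symbol to which the model assigns probability p(x), so better predictions mean fewer bits. The sketch below illustrates only this general principle; it is not L3C, and the three-symbol model and the symbol sequence are toy assumptions.

```python
import math

def ideal_code_length_bits(symbols, model):
    """Total ideal code length: sum of -log2 p(symbol) under the model."""
    return sum(-math.log2(model[s]) for s in symbols)

# Toy probability model over three symbols (assumed values, powers of two).
model = {0: 0.5, 1: 0.25, 2: 0.25}

# Toy symbol sequence to encode.
symbols = [0, 0, 1, 2, 0, 0]

bits_with_model = ideal_code_length_bits(symbols, model)

# Baseline: a fixed-length code needs ceil(log2(alphabet size)) bits per symbol.
bits_fixed_length = len(symbols) * math.ceil(math.log2(len(model)))

print(bits_with_model, bits_fixed_length)
```

A practical coder such as an arithmetic coder approaches this ideal length to within a small overhead; the sketch counts bits only and does not produce an actual bitstream.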

Note: the Chinese text is machine-translated.


[arXiv Papers] Computer Vision and Pattern Recognition 2020-03-09
