Abstracts

1. Explainable Object-induced Action Decision for Autonomous Vehicles
Yiran Xu, Xiaoyin Yang, Lihang Gong, Hsuan-Chu Lin, Tz-Ying Wu, Yunsheng Li, Nuno Vasconcelos
Abstract: A new paradigm is proposed for autonomous driving. The new paradigm lies between the end-to-end and pipelined approaches, and is inspired by how humans solve the problem. While it relies on scene understanding, this understanding only considers objects that could pose a hazard. These objects are denoted as action-inducing, since changes in their state should trigger vehicle actions. They also define a set of explanations for these actions, which should be produced jointly with the actions. An extension of the BDD100K dataset, annotated for a set of 4 actions and 21 explanations, is proposed. A new multi-task formulation of the problem, which optimizes the accuracy of both action commands and explanations, is then introduced. Finally, a CNN architecture is proposed to solve this problem by combining reasoning about action-inducing objects and global scene context. Experimental results show that the requirement of explanations improves the recognition of action-inducing objects, which in turn leads to better action predictions.

2. Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
Yansong Tang, Jiwen Lu, Jie Zhou
Abstract: Thanks to the substantial and explosively increasing number of instructional videos on the Internet, novices are able to acquire knowledge for completing various tasks. Over the past decade, growing efforts have been devoted to the problem of instructional video analysis. However, most existing datasets in this area are limited in diversity and scale, which keeps them far from many real-world applications where more diverse activities occur. To address this, we present a large-scale dataset named "COIN" for COmprehensive INstructional video analysis. Organized in a hierarchical structure, the COIN dataset contains 11,827 videos of 180 tasks in 12 domains (e.g., vehicles, gadgets, etc.) related to our daily life. With a newly developed toolbox, all the videos are annotated efficiently with a series of step labels and the corresponding temporal boundaries. In order to provide a benchmark for instructional video analysis, we evaluate plenty of approaches on the COIN dataset under five different settings. Furthermore, we exploit two important characteristics (i.e., task-consistency and ordering-dependency) for localizing important steps in instructional videos. Accordingly, we propose two simple yet effective methods, which can be easily plugged into conventional proposal-based action detection models. We believe the introduction of the COIN dataset will promote future in-depth research on instructional video analysis in the community. Our dataset, annotation toolbox and source code are available at this http URL.

3. Domain Adaptation by Class Centroid Matching and Local Manifold Self-Learning
Lei Tian, Yongqiang Tang, Liangchen Hu, Zhida Ren, Wensheng Zhang
Abstract: Domain adaptation has been a fundamental technology for transferring knowledge from a source domain to a target domain. The key issue of domain adaptation is how to reduce the distribution discrepancy between two domains in a proper way, such that the two domains can be treated interchangeably during learning. Unlike existing methods that predict labels for target samples independently, in this paper we propose a novel domain adaptation approach that assigns pseudo-labels to target data under the guidance of class centroids in both domains, so that the data distribution structure of both the source and target domains can be emphasized. Besides, to explore the structure information of target data more thoroughly, we further introduce a local connectivity self-learning strategy into our proposal to adaptively capture the inherent local manifold structure of target samples. The aforementioned class centroid matching and local manifold self-learning are integrated into one joint optimization problem, and an iterative optimization algorithm with a theoretical convergence guarantee is designed to solve it. In addition to unsupervised domain adaptation, we further extend our method to the semi-supervised scenario, including both homogeneous and heterogeneous settings, in a direct but elegant way. Extensive experiments on five benchmark datasets validate the significant superiority of our proposal in both unsupervised and semi-supervised settings.
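The general idea of labelling target samples through class centroids, rather than one sample at a time, can be sketched with a k-means-style loop whose clusters start at the source class centroids and then drift toward the target data. This is a simplified illustration under that assumption only. It omits the paper's local manifold self-learning and its joint optimization, and the function name `centroid_pseudo_labels` is hypothetical.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def centroid_pseudo_labels(source, source_labels, target, n_iter=10):
    """Assign pseudo-labels to target points by matching them to class centroids.

    Hypothetical sketch: centroids are initialized at the source class means and
    then re-estimated from the target points assigned to them.
    """
    classes = sorted(set(source_labels))
    centroids = {
        c: mean([x for x, y in zip(source, source_labels) if y == c]) for c in classes
    }
    labels = []
    for _ in range(n_iter):
        # Assignment step: each target point takes the label of its nearest centroid.
        labels = [min(classes, key=lambda c: math.dist(x, centroids[c])) for x in target]
        # Update step: move each centroid to the mean of its assigned target points.
        for c in classes:
            members = [x for x, y in zip(target, labels) if y == c]
            if members:  # keep the previous centroid if a class receives no points
                centroids[c] = mean(members)
    return labels
```

For example, with source classes around x = 0 and x = 4 and a target domain shifted by +1.5 along x, the shifted points still receive the labels of the corresponding source classes.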

4. SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness
Philipp Terhörst, Jan Niklas Kolf, Naser Damer, Florian Kirchbuchner, Arjan Kuijper
Abstract: Face image quality is an important factor in enabling high-performance face recognition systems. Face quality assessment aims at estimating the suitability of a face image for recognition. Previous work proposed supervised solutions that require artificially or human-labelled quality values. However, both labelling mechanisms are error-prone, as they do not rely on a clear definition of quality and may not know the best characteristics for the utilized face recognition system. To avoid the use of inaccurate quality labels, we propose a novel concept to measure face quality based on an arbitrary face recognition model. By determining the embedding variations generated from random subnetworks of a face model, the robustness of a sample representation, and thus its quality, is estimated. The experiments are conducted in a cross-database evaluation setting on three publicly available databases. We compare our proposed solution on two face embeddings against six state-of-the-art approaches from academia and industry. The results show that our unsupervised solution outperforms all other approaches in the majority of the investigated scenarios. In contrast to previous works, the proposed solution shows stable performance over all scenarios. Utilizing the deployed face recognition model for our face quality assessment methodology avoids the training phase completely and further outperforms all baseline approaches by a large margin. Our solution can be easily integrated into current face recognition systems and can be adapted to other tasks beyond face recognition.
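The idea of scoring quality by how stable a sample's embedding stays across random subnetworks can be sketched as follows. This is a minimal illustration, not the authors' implementation. Three parts of it are assumptions chosen for clarity:

- a toy linear "face model" with dropout on its outputs stands in for a real recognition network;
- the dropout rate is hand-picked;
- the mean pairwise distance is mapped to a score with `2 * sigmoid(-d)`.

```python
import math
import random


def dropout_embedding(x, weights, p, rng):
    """One stochastic forward pass of a toy linear 'face model' (hypothetical).

    Each output unit is dropped with probability p (0 <= p < 1), mimicking a random
    subnetwork; surviving units are rescaled by 1 / (1 - p) as in inverted dropout.
    """
    out = []
    for row in weights:
        value = sum(w * xi for w, xi in zip(row, x))
        keep = rng.random() >= p
        out.append(value / (1.0 - p) if keep else 0.0)
    return out


def normalize(v):
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else v


def stochastic_quality(x, weights, m=50, p=0.5, seed=0):
    """Quality in (0, 1]: higher when the m stochastic embeddings agree."""
    rng = random.Random(seed)
    embs = [normalize(dropout_embedding(x, weights, p, rng)) for _ in range(m)]
    dists = [math.dist(embs[i], embs[j]) for i in range(m) for j in range(i + 1, m)]
    mean_dist = sum(dists) / len(dists)
    # 2 * sigmoid(-d) maps d = 0 to 1.0 and decreases as the embeddings spread out.
    return 2.0 / (1.0 + math.exp(mean_dist))
```

With `p=0.0` every pass is identical, so the score is exactly 1.0. Any dropout that makes the embeddings disagree pushes the score below 1.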

5. Out-of-Distribution Detection for Skin Lesion Images with Deep Isolation Forest
Xuan Li, Yuchen Lu, Christian Desrosiers, Xue Liu
Abstract: In this paper, we study the problem of out-of-distribution detection in skin disease images. Publicly available medical datasets normally have a limited number of lesion classes (e.g., HAM10000 has 8 lesion classes). However, there exist a few thousand clinically identified diseases. Hence, it is important to be able to differentiate lesions that are not present in the training data. Toward this goal, we propose DeepIF, a non-parametric Isolation Forest-based approach combined with deep convolutional networks. We conduct comprehensive experiments to compare DeepIF with three baseline models. Results demonstrate state-of-the-art performance of our proposed approach on the task of detecting abnormal skin lesions.
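The Isolation Forest idea at the core of DeepIF can be illustrated on generic feature vectors. The sketch below is a plain, from-scratch Isolation Forest in the style of the original algorithm. It assumes the deep CNN features have already been extracted (here they are just 2-D points) and does not reproduce how DeepIF couples the forest with the network. The sub-sample size and tree count are illustrative defaults.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random dimension at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x is isolated, plus the expected depth left in its leaf."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)

    def fit(self, data):
        """Build n_trees trees, each on a random sub-sample of the data list."""
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [
            build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """Anomaly score in (0, 1); values near 1 indicate likely outliers."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c_factor(self.psi))
```

Points far from the bulk of the data are isolated after only a few random splits. This short average path length is what pushes their score toward 1.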

6. Selecting Relevant Features from a Universal Representation for Few-shot Classification
Nikita Dvornik, Cordelia Schmid, Julien Mairal
Abstract: Popular approaches for few-shot classification consist of first learning a generic data representation based on a large annotated dataset, before adapting the representation to new classes given only a few labeled samples. In this work, we propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches. First, we obtain a universal representation by training a set of semantically different feature extractors. Then, given a few-shot learning task, we use our universal feature bank to automatically select the most relevant representations. We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training, which leads to state-of-the-art results on MetaDataset and improved accuracy on mini-ImageNet.
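Selecting relevant features from a bank of extractors and then classifying with a simple non-parametric rule can be illustrated as follows. This is a deliberately simplified, hypothetical variant, and the paper's actual selection procedure may differ:

- each extractor's output is treated as one feature block;
- blocks are added greedily according to the leave-one-out nearest-centroid accuracy they achieve on the support set;
- the selected blocks are concatenated.

```python
import math


def centroid_classify(query, support, labels):
    """Nearest-centroid (non-parametric) classification of one query vector."""
    groups = {}
    for x, y in zip(support, labels):
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}
    return min(centroids, key=lambda y: math.dist(query, centroids[y]))


def loo_accuracy(support, labels):
    """Leave-one-out accuracy of the nearest-centroid rule on the support set."""
    correct = 0
    for i in range(len(support)):
        rest = support[:i] + support[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if centroid_classify(support[i], rest, rest_labels) == labels[i]:
            correct += 1
    return correct / len(support)


def concat(blocks, chosen, i):
    """Concatenate sample i's vectors from the chosen feature blocks."""
    return [v for b in chosen for v in blocks[b][i]]


def select_blocks(blocks, labels, max_blocks=2):
    """Greedily add the feature block that most improves LOO accuracy.

    blocks[b][i] is sample i's feature vector from (hypothetical) extractor b.
    Stops early once adding another block no longer helps.
    """
    chosen, best = [], 0.0
    n = len(labels)
    while len(chosen) < max_blocks:
        candidates = []
        for b in range(len(blocks)):
            if b not in chosen:
                feats = [concat(blocks, chosen + [b], i) for i in range(n)]
                candidates.append((loo_accuracy(feats, labels), b))
        if not candidates:
            break
        acc, b = max(candidates)
        if chosen and acc <= best:
            break
        chosen.append(b)
        best = acc
    return chosen
```

A query is then classified with `centroid_classify` on the concatenated selected blocks.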

7. Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints
Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Otmar Hilliges, Jan Kautz
Abstract: Estimating 3D hand pose from 2D images is a difficult inverse problem due to the inherent scale and depth ambiguities. Current state-of-the-art methods train fully supervised deep neural networks with 3D ground-truth data. However, acquiring 3D annotations is expensive, typically requiring calibrated multi-view setups or labor-intensive manual annotations. While annotations of 2D keypoints are much easier to obtain, how to efficiently leverage such weakly supervised data to improve 3D hand pose prediction remains an important open question. The key difficulty stems from the fact that direct application of additional 2D supervision mostly benefits the 2D proxy objective but does little to alleviate the depth and scale ambiguities. Embracing this challenge, we propose a set of novel losses. We show by extensive experiments that our proposed constraints significantly reduce the depth ambiguity and allow the network to more effectively leverage additional 2D-annotated images. For example, on the challenging FreiHAND dataset, using additional 2D annotation without our proposed biomechanical constraints reduces the depth error by only 15%, whereas the error is reduced significantly, by 50%, when the proposed biomechanical constraints are used.
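One way to picture a biomechanical constraint on predicted 3D joints is a hinge penalty. It is zero while each bone length stays within an allowed range and grows linearly outside it. This sketch assumes a hypothetical joint/bone layout and hand-picked ranges. The abstract says the paper proposes a set of such losses but does not give their exact form.

```python
import math


def bone_length_penalty(joints, bones, limits):
    """Hinge penalty on bone lengths outside [min_len, max_len] (hypothetical sketch).

    joints: list of (x, y, z) joint positions.
    bones:  list of (parent_index, child_index) pairs.
    limits: list of (min_len, max_len), one per bone.
    """
    total = 0.0
    for (p, c), (lo, hi) in zip(bones, limits):
        length = math.dist(joints[p], joints[c])
        # Zero inside the allowed range, linear in the violation outside it.
        total += max(0.0, lo - length) + max(0.0, length - hi)
    return total
```

Because the penalty depends only on 3D geometry, it can be applied to predictions for images that carry only 2D labels. This is the kind of extra signal the abstract describes for resolving depth ambiguity.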

8. Blockchain meets Biometrics: Concepts, Application to Template Protection, and Trends
Oscar Delgado-Mohatar, Julian Fierrez, Ruben Tolosana, Ruben Vera-Rodriguez
Abstract: Blockchain technologies provide excellent architectures and practical tools for securing and managing the sensitive and private data stored in biometric templates, but at a cost. We discuss opportunities and challenges in the integration of blockchain and biometrics, with emphasis on biometric template storage and protection, a key problem in biometrics that is still largely unsolved. The key tradeoffs involved in that integration, namely latency, processing time, economic cost, and biometric performance, are experimentally studied through the implementation of a smart contract on the Ethereum blockchain platform, which is publicly available on GitHub for research purposes.

9. DeepFake Detection: Current Challenges and Next Steps
Siwei Lyu
Abstract: High-quality fake videos and audio generated by AI algorithms (deep fakes) have started to challenge the status of video and audio recordings as definitive evidence of events. In this paper, we highlight a few of these challenges and discuss research opportunities in this direction.

10. Superaccurate Camera Calibration via Inverse Rendering
Morten Hannemose, Jakob Wilm, Jeppe Revall Frisvad
Abstract: The most prevalent routine for camera calibration is based on the detection of well-defined feature points on a purpose-made calibration artifact. These could be checkerboard saddle points, circles, rings or triangles, often printed on a planar structure. The feature points are first detected and then used in a nonlinear optimization to estimate the internal camera parameters. We propose a new method for camera calibration using the principle of inverse rendering. Instead of relying solely on detected feature points, we use an estimate of the internal parameters and the pose of the calibration object to implicitly render a non-photorealistic equivalent of the optical features. This enables us to compute pixel-wise differences in the image domain without interpolation artifacts. We can then improve our estimate of the internal parameters by minimizing pixel-wise least-squares differences. In this way, our model optimizes a meaningful metric in the image space, assuming the normally distributed noise characteristic of camera sensors. Using synthetic and real camera images, we demonstrate that our method improves the accuracy of estimated camera parameters compared with current state-of-the-art calibration routines. Our method also estimates these parameters more robustly in the presence of noise and in situations where the number of calibration images is limited.
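The core idea of comparing a rendered image of the calibration target against the observed pixels, and refining parameters by minimizing pixel-wise least squares, can be shown on a 1-D toy problem. Everything here is an assumption for illustration:

- a single blurred edge rendered with a logistic profile stands in for the calibration pattern;
- its sub-pixel position is the only unknown;
- golden-section search replaces the paper's nonlinear optimizer.

```python
import math


def render_edge(position, width, n_pixels=32):
    """Render a blurred step edge; each pixel is a logistic of its distance to the edge."""
    return [1.0 / (1.0 + math.exp(-(i + 0.5 - position) / width)) for i in range(n_pixels)]


def sse(position, observed, width):
    """Pixel-wise sum of squared differences between rendering and observation."""
    rendered = render_edge(position, width, len(observed))
    return sum((r - o) ** 2 for r, o in zip(rendered, observed))


def fit_edge(observed, width, lo, hi, tol=1e-6):
    """Golden-section search for the edge position minimizing the SSE on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if sse(c, observed, width) < sse(d, observed, width):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2
```

Because the comparison happens on rendered pixel intensities rather than on detected point locations, the fitted position is not limited to integer pixels. For a noise-free edge at 13.37, the search recovers it to well below a thousandth of a pixel.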

11. 3dDepthNet: Point Cloud Guided Depth Completion Network for Sparse Depth and Single Color Image
Rui Xiang, Feng Zheng, Huapeng Su, Zhe Zhang
Abstract: In this paper, we propose an end-to-end deep learning network named 3dDepthNet, which produces an accurate dense depth image from a single pair of sparse LiDAR depth and color images for robotics and autonomous driving tasks. Based on the dimensional nature of depth images, our network offers a novel 3D-to-2D coarse-to-fine dual densification design that is both accurate and lightweight. Depth densification is first performed in 3D space via point cloud completion, followed by a specially designed encoder-decoder structure that uses the projected dense depth from 3D completion and the original RGB-D images to perform 2D image completion. Experiments on the KITTI dataset show that our network achieves state-of-the-art accuracy while being more efficient. Ablation and generalization tests show that each module in our network has a positive influence on the final results, and furthermore, that our network is resilient to even sparser depth.

12. DMV: Visual Object Tracking via Part-level Dense Memory and Voting-based Retrieval
Gunhee Nam, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim
Abstract: We propose a novel memory-based tracker via part-level dense memory and voting-based retrieval, called DMV. Since deep learning techniques were introduced to the tracking field, Siamese trackers have attracted many researchers due to their balance between speed and accuracy. However, most of them are based on single-template matching, which limits performance as it restricts the accessible information to the initial target features. In this paper, we relieve this limitation by maintaining an external memory that saves the tracking record. Part-level retrieval from the memory also frees the information from the template and allows our tracker to better handle challenges such as appearance changes and occlusions. By updating the memory during tracking, the representational power for the target object can be enhanced without online learning. We also propose a novel voting mechanism for memory reading to filter out unreliable information in the memory. We comprehensively evaluate our tracker on OTB-100, TrackingNet, GOT-10k, LaSOT, and UAV123, and the results show that our method performs comparably to state-of-the-art methods.

13. Privileged Pooling: Supervised attention-based pooling for compensating dataset bias
Andres C. Rodriguez, Stefano D'Aronco, Jan Dirk Wegner, Konrad Schindler
Abstract: In this paper, we propose a novel supervised image classification method that overcomes dataset bias and scarcity of training data using privileged information in the form of keypoint annotations. Our main motivation is the recognition of animal species for ecological applications such as biodiversity modelling. This is challenging because of long-tailed species distributions caused by rare species, and because of strong dataset biases in repetitive scenes such as those captured by camera traps. To counteract these challenges, we propose a weakly supervised visual attention mechanism that has access to keypoints highlighting the most important object parts. This privileged information, implemented via a novel privileged pooling operation, is only accessible during training and helps the model focus on the most discriminative regions. We show that the proposed approach uses small training datasets more efficiently, generalizes better, and outperforms competing methods in challenging training conditions.

14. Detection in Crowded Scenes: One Proposal, Multiple Predictions
Xuangeng Chu, Anlin Zheng, Xiangyu Zhang, Jian Sun
Abstract: We propose a simple yet effective proposal-based object detector, aiming at detecting highly overlapped instances in crowded scenes. The key idea of our approach is to let each proposal predict a set of correlated instances rather than a single one, as in previous proposal-based frameworks. Equipped with new techniques such as EMD Loss and Set NMS, our detector can effectively handle the difficulty of detecting highly overlapped objects. On an FPN-Res50 baseline, our detector obtains 4.9% AP gains on the challenging CrowdHuman dataset and 1.0% $\text{MR}^{-2}$ improvements on the CityPersons dataset, without bells and whistles. Moreover, on less crowded datasets like COCO, our approach still achieves moderate improvement, suggesting that the proposed method is robust to crowdedness. Code and pre-trained models will be released at this https URL.
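Set NMS, as named in the abstract, can be read as a change to standard greedy non-maximum suppression: the multiple predictions made by one proposal never suppress each other. A minimal sketch under that reading follows. The box format `(x1, y1, x2, y2)`, the `(box, score, proposal_id)` tuple layout, and the 0.5 IoU threshold are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def set_nms(detections, iou_threshold=0.5):
    """Greedy NMS that never suppresses a box predicted by the same proposal.

    detections: list of (box, score, proposal_id) tuples.
    Returns the kept detections, highest score first.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, pid = det
        suppressed = any(
            k_pid != pid and iou(box, k_box) > iou_threshold
            for k_box, _, k_pid in kept
        )
        if not suppressed:
            kept.append(det)
    return kept
```

Two heavily overlapping boxes survive when they come from the same proposal, for example two people standing close together. The same boxes from different proposals are treated as duplicates, as in ordinary NMS.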

15. Exploring Categorical Regularization for Domain Adaptive Object Detection
Chang-Dong Xu, Xing-Ran Zhao, Xin Jin, Xiu-Shen Wei
Abstract: In this paper, we tackle the domain adaptive object detection problem, where the main challenge lies in significant domain gaps between source and target domains. Previous work seeks to plainly align image-level and instance-level shifts to eventually minimize the domain discrepancy. However, it still overlooks matching crucial image regions and important instances across domains, which strongly affects domain shift mitigation. In this work, we propose a simple but effective categorical regularization framework for alleviating this issue. It can be applied as a plug-and-play component to a series of Domain Adaptive Faster R-CNN methods, which are prominent for dealing with domain adaptive detection. Specifically, by integrating an image-level multi-label classifier on top of the detection backbone, we can obtain sparse but crucial image regions corresponding to categorical information, thanks to the weak localization ability of the classification manner. Meanwhile, at the instance level, we leverage the categorical consistency between image-level predictions (by the classifier) and instance-level predictions (by the detection head) as a regularization factor to automatically hunt for the hard-to-align instances of target domains. Extensive experiments on various domain shift scenarios show that our method obtains a significant performance gain over the original Domain Adaptive Faster R-CNN detectors. Furthermore, qualitative visualization and analyses demonstrate the ability of our method to attend to the key regions/instances for domain adaptation. Our code is open-source and available at this https URL.

16. Three-branch and Mutil-scale learning for Fine-grained Image Recognition (TBMSL-Net)
Fan Zhang, Guisheng Zhai, Meng Li, Yizhao Liu
Abstract: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been one of the most authoritative academic competitions in the field of Computer Vision (CV) in recent years, but directly transferring the champion models of this annual competition to fine-grained visual categorization (FGVC) tasks does not achieve good results. The small inter-class variations and the large intra-class variations caused by the fine-grained nature of the task make it a challenging problem. The proposed method can effectively localize the object and useful part regions without the need for bounding-box and part annotations, using an attention object location module (AOLM) and an attention part proposal module (APPM). The obtained object images contain both the whole structure and more details, the part images have many different scales and more fine-grained features, and the raw images contain the complete object. The three kinds of training images are supervised by our three-branch network structure. The model has good classification ability, good generalization, and robustness for object images of different scales. Our approach is trained end-to-end, and comprehensive experiments demonstrate that it achieves state-of-the-art results on the CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets.

17. Event-based Asynchronous Sparse Convolutional Networks
Nico Messikommer, Daniel Gehrig, Antonio Loquercio, Davide Scaramuzza
Abstract: Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events". Recently, pattern recognition algorithms, such as learning-based methods, have made significant progress with event cameras. They do so by converting events into synchronous, dense, image-like representations and applying traditional machine learning methods developed for standard cameras. However, these approaches discard the spatial and temporal sparsity inherent in event data, at the cost of higher computational complexity and latency. In this work, we present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output, thus directly leveraging the intrinsic asynchronous and sparse nature of the event data. We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks without sacrificing accuracy. In addition, our framework has several desirable characteristics: (i) it explicitly exploits the spatio-temporal sparsity of events, (ii) it is agnostic to the event representation, network architecture, and task, and (iii) it does not require any train-time change, since it is compatible with the standard training process of neural networks. We thoroughly validate the proposed framework on two computer vision tasks: object detection and object recognition. In these tasks, we reduce the computational complexity by up to 20 times with respect to high-latency neural networks. At the same time, we outperform state-of-the-art asynchronous approaches by up to 24% in prediction accuracy.
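The claim that an asynchronous model can produce output identical to its synchronous counterpart rests on locality. A single event changes one input pixel, so only the outputs whose receptive field contains that pixel need updating. The sketch below shows this for one linear 3x3 convolution with zero padding. It is an illustrative assumption, not the paper's framework, which must also handle nonlinearities, multiple layers and sparsity bookkeeping.

```python
def conv2d(image, kernel):
    """Dense 'same' 2-D correlation with zero padding and an odd-sized square kernel."""
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr + k][dc + k] * image[rr][cc]
            out[r][c] = acc
    return out


def apply_event(image, out, kernel, r, c, delta):
    """Update input pixel (r, c) by delta and patch only the affected outputs in place.

    Output (r - dr, c - dc) reads input (r, c) through kernel[dr + k][dc + k],
    so by linearity it changes by exactly that weight times delta.
    Returns the number of output sites touched.
    """
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    image[r][c] += delta
    touched = 0
    for dr in range(-k, k + 1):
        for dc in range(-k, k + 1):
            orr, occ = r - dr, c - dc
            if 0 <= orr < h and 0 <= occ < w:
                out[orr][occ] += kernel[dr + k][dc + k] * delta
                touched += 1
    return touched
```

Each event costs at most 9 multiply-adds here, instead of a full pass over the image. After any sequence of events, the patched output matches a dense recomputation.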

18. CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection
Zhiwei Dong, Guoxuan Li, Yue Liao, Fei Wang, Pengju Ren, Chen Qian
Abstract: Keypoint-based detectors have achieved fairly good performance. However, incorrect keypoint matching is still widespread and greatly affects detector performance. In this paper, we propose CentripetalNet, which uses centripetal shift to pair corner keypoints from the same instance. CentripetalNet predicts the position and the centripetal shift of the corner points and matches corners whose shifted results are aligned. By combining position information, our approach matches corner points more accurately than conventional embedding approaches do. Corner pooling extracts information inside the bounding boxes onto the border. To make the corners more aware of this information, we design a cross-star deformable convolution network to conduct feature adaptation. Furthermore, we explore instance segmentation on anchor-free detectors by equipping CentripetalNet with a mask prediction module. On MS-COCO test-dev, CentripetalNet not only outperforms all existing anchor-free detectors with an AP of 48.0%, but also achieves performance comparable to state-of-the-art instance segmentation approaches with a MaskAP of 40.2%. Code will be available at this https URL.
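Matching corners by whether their shifted positions agree can be illustrated with a small pairing rule. Each corner predicts where the box center lies. A top-left/bottom-right pair is accepted when both predicted centers fall inside a shrunken central region of the candidate box. The central-region ratio `mu` and the data layout are assumptions for this sketch, and the paper's exact region definition and scoring may differ.

```python
def central_region(tl, br, mu=0.5):
    """Central sub-box of the box spanned by corners tl and br, scaled by mu."""
    cx, cy = (tl[0] + br[0]) / 2, (tl[1] + br[1]) / 2
    half_w, half_h = mu * (br[0] - tl[0]) / 2, mu * (br[1] - tl[1]) / 2
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h


def inside(point, region):
    """True if the (x, y) point lies within the (x1, y1, x2, y2) region."""
    x, y = point
    x1, y1, x2, y2 = region
    return x1 <= x <= x2 and y1 <= y <= y2


def match_corners(top_lefts, bottom_rights, mu=0.5):
    """Pair corners whose centripetal shifts land in the candidate box's central region.

    top_lefts / bottom_rights: lists of ((x, y), (shift_x, shift_y)), where the
    shift is the predicted offset from the corner toward the object center.
    Returns (i, j) index pairs of accepted matches.
    """
    pairs = []
    for i, (tl, tl_shift) in enumerate(top_lefts):
        for j, (br, br_shift) in enumerate(bottom_rights):
            if br[0] <= tl[0] or br[1] <= tl[1]:
                continue  # corners do not form a valid box
            region = central_region(tl, br, mu)
            tl_center = (tl[0] + tl_shift[0], tl[1] + tl_shift[1])
            br_center = (br[0] + br_shift[0], br[1] + br_shift[1])
            if inside(tl_center, region) and inside(br_center, region):
                pairs.append((i, j))
    return pairs
```

With two side-by-side objects, the cross pairing of the left object's top-left corner with the right object's bottom-right corner is rejected. It yields a valid box, but neither predicted center falls in that box's central region.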

19. Dual-discriminator GAN: A GAN way of profile face recognition
Xinyu Zhang, Yang Zhao, Hao Zhang
Abstract: Face angle causes many problems in facial recognition. At present, the feature extraction network often produces eigenvectors that differ greatly between frontal-face and profile-face images of the same person. For this reason, state-of-the-art facial recognition networks use multiple samples of the same target to ensure that eigenvector differences caused by angles are ignored during training. However, another solution is available: generating frontal face images from profile face images before recognition. In this paper, we propose a method for generating frontal faces from profile faces via image-to-image translation based on a Generative Adversarial Network (GAN).

20. FocalMix: Semi-Supervised Learning for 3D Medical Image Detection
Dong Wang, Yuan Zhang, Kexin Zhang, Liwei Wang
Abstract: Applying artificial intelligence techniques in medical imaging is one of the most promising areas in medicine. However, most of the recent success in this area relies heavily on large amounts of carefully annotated data, whereas annotating medical images is a costly process. In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection. We conducted extensive experiments on two widely used datasets for lung nodule detection, LUNA16 and NLST. Results show that our proposed SSL methods can achieve a substantial improvement of up to 17.3% over state-of-the-art supervised learning approaches with 400 unlabeled CT scans.

21. Masked Face Recognition Dataset and Application
Zhongyuan Wang, Guangcheng Wang, Baojin Huang, Zhangyang Xiong, Qi Hong, Hao Wu, Peng Yi, Kui Jiang, Nanxi Wang, Yingjiao Pei, Heling Chen, Yu Miao, Zhibing Huang, Jinbi Liang
Abstract: In order to effectively prevent the spread of the COVID-19 virus, almost everyone wears a mask during the coronavirus epidemic. This renders conventional facial recognition technology ineffective in many cases, such as community access control, face access control, facial attendance, and facial security checks at train stations. Therefore, it is very urgent to improve the recognition performance of existing face recognition technology on masked faces. Most current advanced face recognition approaches are based on deep learning, which depends on a large number of face samples. However, at present there are no publicly available masked face recognition datasets. To this end, this work proposes three types of masked face datasets: the Masked Face Detection Dataset (MFDD), the Real-world Masked Face Recognition Dataset (RMFRD) and the Simulated Masked Face Recognition Dataset (SMFRD). Among them, to the best of our knowledge, RMFRD is currently the world's largest real-world masked face dataset. These datasets are freely available to industry and academia, and various applications on masked faces can be developed based on them. The multi-granularity masked face recognition model we developed achieves 95% accuracy, exceeding the results reported by the industry. Our datasets are available at: this https URL.

22. Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN
Jingwen Ye, Yixin Ji, Xinchao Wang, Xin Gao, Mingli Song
Abstract: Recent advances in deep learning have provided procedures for training a single network to amalgamate multiple streams of knowledge from pre-trained Convolutional Neural Network (CNN) models, thus reducing the annotation cost. However, almost all existing methods demand massive training data, which may be unavailable due to privacy or transmission issues. In this paper, we propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single- or multi-task teachers. The main idea is to construct group-stack generative adversarial networks (GANs) with two dual generators. First, one generator is trained to collect the knowledge by reconstructing images that approximate the original dataset used for pre-training the teachers. Then, a dual generator is trained by taking the output of the former generator as input. Finally, we treat the dual part generator as the target network and regroup it. As demonstrated on several multi-label classification benchmarks, the proposed method achieves surprisingly competitive results without any training data, even compared with some fully supervised methods.

23. Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network
Minjee Kim, Joonmyeong Choi, Namkug Kim
Abstract: Hand hygiene is one of the most significant factors in preventing hospital-acquired infections (HAI), which are often transmitted by medical staff in contact with patients in the operating room (OR). Hand hygiene monitoring could be important for investigating and reducing outbreaks of infection within the OR. However, an effective monitoring tool for hand hygiene compliance is difficult to develop because of the visual complexity of the OR scene. Recent progress in video understanding with convolutional neural networks (CNNs) has broadened the application of human action recognition and detection. Leveraging this progress, we propose a fully automated tool that monitors the alcohol-based hand-rubbing action of anesthesiologists in OR video, using spatio-temporal features extracted by a 3D CNN. First, the regions of interest (ROIs) covering the anesthesiologists' upper bodies were detected and cropped. A temporal smoothing filter was applied to the ROIs. Then, the ROIs were fed to a 3D CNN and classified into two classes: rubbing hands or other actions. We observed that transfer learning from Kinetics-400 was beneficial, whereas the optical flow stream was not helpful on our dataset. The final test accuracy, precision, recall, and F1 score are 0.76, 0.85, 0.65, and 0.74, respectively.
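
As a sanity check, the reported F1 score follows from the reported precision and recall: 2 × 0.85 × 0.65 / (0.85 + 0.65) ≈ 0.74. The sketch below computes these metrics. The abstract does not specify the "temporal smoothing filter", so the sketch assumes a centered moving average applied to one ROI coordinate over frames. The window size is arbitrary.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the sequence ends.
    Assumed here as the smoothing applied to one ROI coordinate per frame."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, `round(f1_from(0.85, 0.65), 2)` gives 0.74, which matches the reported test score.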

24. Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network
Jakaria Rabbi, Nilanjan Ray, Matthias Schubert, Subir Chowdhury, Dennis Chao
Abstract: Detection performance for small objects in remote sensing images is unsatisfactory compared with large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) shows remarkable image enhancement performance, but its reconstructed images lack high-frequency edge information. As a result, object detection performance degrades for small objects in recovered noisy and low-resolution remote sensing images. Inspired by the success of edge-enhanced GAN (EEGAN) and ESRGAN, we apply a new edge-enhanced super-resolution GAN (EESRGAN) to improve the quality of remote sensing images. We also use different detector networks in an end-to-end manner, where the detector loss is backpropagated into EESRGAN to improve detection performance. We propose an architecture with three components: ESRGAN, an Edge Enhancement Network (EEN), and a detection network. We use residual-in-residual dense blocks (RRDB) for both the GAN and the EEN. For the detection network, we use the Faster region-based convolutional network (FRCNN), a two-stage detector, and the single-shot multi-box detector (SSD), a one-stage detector. Extensive experiments on the Cars Overhead With Context dataset and an oil and gas storage tank dataset (created by us) show that our method outperforms standalone state-of-the-art object detectors.
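
Edge enhancement rests on a simple idea: extract a high-frequency edge map and add it back to the image. The paper's EEN is a learned network. As a hedged stand-in, the sketch below shows the classical version of the idea, using a fixed 3x3 Laplacian kernel on a small grayscale image stored as nested lists. The function names are hypothetical.

```python
# Discrete 4-neighbour Laplacian: responds to intensity changes (edges), zero on flat regions
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def laplacian_edges(img):
    """Convolve a 2D grayscale image (list of rows) with the Laplacian kernel.
    Border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for dy in range(3):
                for dx in range(3):
                    acc += LAPLACIAN[dy][dx] * img[y + dy - 1][x + dx - 1]
            out[y][x] = acc
    return out

def edge_enhance(img, strength=1.0):
    """Classic sharpening: subtracting the Laplacian response boosts high frequencies."""
    edges = laplacian_edges(img)
    return [[p - strength * e for p, e in zip(row, erow)]
            for row, erow in zip(img, edges)]
```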

25. A Graduated Filter Method for Large Scale Robust Estimation
Huu Le, Christopher Zach
Abstract: Because large-scale robust parameter estimation is highly non-convex, avoiding poor local minima is challenging in real-world applications where the input data are contaminated by a large or unknown fraction of outliers. In this paper, we introduce a novel solver for robust estimation with a strong ability to escape poor local minima. Our algorithm builds on the class of traditional graduated optimization techniques, which are considered state-of-the-art local methods for problems with many poor minima. The novelty of our work lies in an adaptive kernel (or residual) scaling scheme, which allows us to achieve faster convergence. Like other existing methods that aim to return good local minima for robust estimation tasks, our method relaxes the original robust problem. Unlike them, it adapts a filter framework from nonlinear constrained optimization to choose the level of relaxation automatically. Experimental results on real large-scale datasets, such as bundle adjustment instances, demonstrate that our proposed method achieves competitive results.
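
To show what graduated optimization means in practice, the sketch below applies classic graduated non-convexity to a toy problem: robustly estimating a 1D location from samples that contain outliers. It starts with a large kernel scale, where the Geman-McClure cost behaves almost like least squares, and shrinks the scale geometrically. At each level it re-solves with iteratively reweighted least squares. This is not the paper's method. It uses a fixed decay schedule instead of adaptive kernel scaling and a filter-based choice of relaxation level, and all parameter values are assumptions.

```python
def gnc_location(samples, s_start=100.0, s_min=0.5, decay=0.5, inner_iters=10):
    """Graduated non-convexity for robust 1D location estimation.

    Uses the Geman-McClure kernel with scale s. A large s makes the cost nearly
    quadratic (easy to optimize); s is then shrunk step by step toward the
    robust target, warm-starting each level from the previous solution."""
    x = sum(samples) / len(samples)  # least-squares initialization
    s = s_start
    while s >= s_min:
        for _ in range(inner_iters):
            # IRLS weights for Geman-McClure: w(r) = (s^2 / (s^2 + r^2))^2
            weights = [(s * s / (s * s + (y - x) ** 2)) ** 2 for y in samples]
            x = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
        s *= decay
    return x
```

On `[1.0, 1.1, 0.9, 1.05, 0.95, 50.0, 60.0]`, the two outliers pull the plain mean to about 16.4. The graduated estimate ends up close to 1.0.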

26. How to Train Your Event Camera Neural Network
Timo Stoffregen, Cedric Scheerlinck, Davide Scaramuzza, Tom Drummond, Nick Barnes, Lindsay Kleeman, Robert Mahony
Abstract: Event cameras are paradigm-shifting novel sensors that report asynchronous, per-pixel brightness changes, called 'events', with unparalleled low latency. This makes them ideal for high-speed, high-dynamic-range scenes where conventional cameras would fail. Recent work has demonstrated impressive results using Convolutional Neural Networks (CNNs) for video reconstruction and optic flow from events. We present strategies for improving the training data of event-based CNNs. Existing state-of-the-art (SOTA) video reconstruction networks retrained with our method gain 25-40% in performance, and optic flow networks gain up to 80%. One challenge in evaluating event-based video reconstruction is the lack of high-quality ground-truth images in existing datasets. To address this, we present a new High Quality Frames (HQF) dataset containing events and ground-truth frames from a DAVIS240C that are well exposed and minimally motion-blurred. We evaluate our method on HQF and several existing major event camera datasets.
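
The abstract does not say how events are fed to the CNNs. A common input representation in the event-based literature is a voxel grid, in which each event's polarity is split between the two nearest temporal bins. The sketch below builds one from a time-sorted list of (x, y, t, polarity) tuples. Treat it as an assumption about the input format, not as the paper's training-data strategy.

```python
import math

def events_to_voxel_grid(events, num_bins, width, height):
    """Accumulate time-sorted events (x, y, t, polarity) into a grid indexed
    [bin][y][x]. Each event's polarity (+1/-1) is split linearly between the
    two nearest temporal bins, so timing inside the window is preserved."""
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t0, t1 = events[0][2], events[-1][2]
    span = (t1 - t0) or 1.0  # avoid division by zero if all timestamps are equal
    for x, y, t, p in events:
        tn = (num_bins - 1) * (t - t0) / span  # normalized time in [0, num_bins - 1]
        lower = math.floor(tn)
        frac = tn - lower
        grid[lower][y][x] += p * (1.0 - frac)
        if lower + 1 < num_bins:
            grid[lower + 1][y][x] += p * frac
    return grid
```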

27. Cross-Shape Graph Convolutional Networks
Dmitry Petrov, Evangelos Kalogerakis
Abstract: We present a method that processes 3D point clouds by performing graph convolution operations across shapes. In this manner, point descriptors are learned by allowing interaction and propagation of feature representations within a shape collection. To enable this form of non-local, cross-shape graph convolution, our method learns a pairwise point attention mechanism indicating the degree of interaction between points on different shapes. Our method also learns to create a graph over the shapes of an input collection, whose edges connect shapes deemed useful for performing cross-shape convolution. The edges are also equipped with learned weights indicating the compatibility of each shape pair for cross-shape convolution. Our experiments demonstrate that this interaction and propagation of point representations across shapes makes them more discriminative. In particular, our results show significantly improved performance for 3D point cloud semantic segmentation compared with conventional approaches, especially when the number of training examples is limited.
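
One way to picture the pairwise point attention is a query point on one shape attending over all the points of another shape. The minimal sketch below uses raw features directly as queries, keys, and values; a real model would use learned projections. It then scales the resulting message by a hypothetical shape-pair compatibility weight, which stands in for the learned edge weights.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_shape_attention(query_feat, other_feats):
    """Attend from one point (query_feat) over all points of another shape.
    Returns the attention weights and the weighted sum of the other shape's features."""
    d = len(query_feat)
    scores = [sum(q * k for q, k in zip(query_feat, f)) / math.sqrt(d) for f in other_feats]
    weights = softmax(scores)
    message = [sum(w * f[i] for w, f in zip(weights, other_feats)) for i in range(d)]
    return weights, message

def blend(query_feat, message, compatibility):
    """Add the cross-shape message, scaled by a (hypothetical) shape-pair
    compatibility weight in [0, 1]; 0 means the other shape is ignored."""
    return [q + compatibility * m for q, m in zip(query_feat, message)]
```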

28. Affinity Graph Supervision for Visual Recognition
Chu Wang, Babak Samari, Vladimir G. Kim, Siddhartha Chaudhuri, Kaleem Siddiqi
Abstract: Affinity graphs are widely used in deep architectures, including graph convolutional neural networks and attention networks. Thus far, the literature has focused on abstracting features from such graphs, while the learning of the affinities themselves has been overlooked. Here we propose a principled method to directly supervise the learning of weights in affinity graphs, in order to exploit meaningful connections between entities in the data source. Applied to a visual attention network, our affinity supervision improves relationship recovery between objects, even without manually annotated relationship labels. We further show that affinity learning between objects boosts scene categorization performance. Affinity supervision can also be applied to graphs built from mini-batches for neural network training. In an image classification task, we demonstrate consistent improvements over the baseline across diverse network architectures and datasets.
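
A simple way to supervise an affinity graph is to build a target graph from labels, in which entities that share a label should be connected, and to penalize affinity that falls outside it. The sketch below computes such a target and a hypothetical "affinity mass" loss on a row-normalized affinity matrix. The exact loss used in the paper may differ.

```python
def target_affinity(labels):
    """Binary target graph: 1 where two distinct entities share a label."""
    n = len(labels)
    return [[1.0 if i != j and labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]

def affinity_mass_loss(affinity, target, gamma=2.0):
    """Hypothetical affinity supervision loss.

    affinity: non-negative weights with each row summing to 1.
    mass: average share of each row's affinity that lands on target pairs,
    computed over rows that have at least one target pair.
    The loss (1 - mass)^gamma is 0 when all affinity sits on target pairs."""
    rows = [i for i, t in enumerate(target) if any(t)]
    if not rows:
        return 0.0, 1.0
    mass = sum(sum(a * t for a, t in zip(affinity[i], target[i])) for i in rows) / len(rows)
    return (1.0 - mass) ** gamma, mass
```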

29. Human Activity Recognition from Wearable Sensor Data Using Self-Attention
Saif Mahmud, M Tanjid Hasan Tonmoy, Kishor Kumar Bhaumik, A K M Mahbubur Rahman, M Ashraful Amin, Mohammad Shoyaib, Muhammad Asif Hossain Khan, Amin Ahsan Ali
Abstract: Human Activity Recognition (HAR) from body-worn sensor data poses an inherent challenge in capturing the spatial and temporal dependencies of time-series signals. Existing recurrent, convolutional, and hybrid models for activity recognition struggle to capture spatio-temporal context from the feature space of sensor reading sequences. To address this complex problem, we propose a self-attention-based neural network model that forgoes recurrent architectures and uses different types of attention mechanisms to generate higher-dimensional feature representations for classification. We performed extensive experiments on four popular publicly available HAR datasets: PAMAP2, Opportunity, Skoda, and USC-HAD. Our model achieves significant performance improvements over recent state-of-the-art models in both benchmark test-subject and leave-one-subject-out evaluations. We also observe that the sensor attention maps produced by our model capture the importance of the modality and placement of the sensors in predicting the different activity classes.
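
The core operation in a self-attention HAR model is scaled dot-product attention across the time steps of a sensor window. The sketch below assumes Q = K = V, with no learned projections and no positional encoding. It then average-pools over time to produce one feature vector per window. It is illustrative and does not reproduce the paper's architecture.

```python
import math

def self_attention(seq):
    """Scaled dot-product self-attention over a sequence of sensor feature vectors.
    Each output step is a softmax-weighted average of all input steps."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out

def global_pool(seq):
    """Average over time to get one vector per window for a downstream classifier."""
    return [sum(col) / len(seq) for col in zip(*seq)]
```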

30. Multilayer Dense Connections for Hierarchical Concept Classification
Toufiq Parag, Hongcheng Wang
Abstract: Classification is a pivotal function for many computer vision tasks, such as object classification, detection, and scene segmentation. Multinomial logistic regression with a single final layer of dense connections has become the ubiquitous technique for CNN-based classification. While these classifiers learn a mapping between the input and a set of output category classes, they do not typically learn comprehensive knowledge about the category. In particular, when a CNN-based image classifier correctly identifies the image of a chimpanzee, it does not know that the chimpanzee belongs to the primate, mammal, and chordate groups, or that it is a living thing. We propose multilayer dense connectivity that enables a CNN to simultaneously predict the category and its conceptual superclasses in hierarchical order. We experimentally demonstrate that our proposed dense connections, in conjunction with popular convolutional feature layers, can learn to predict the conceptual classes with a minimal increase in network size while maintaining the categorical classification accuracy.
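
Hierarchical prediction needs one target for each level of the concept hierarchy. The sketch below derives those targets from a hypothetical child-to-parent map, using the chimpanzee example from the abstract. It also checks whether a predicted chain of labels is consistent with the hierarchy.

```python
# Hypothetical concept hierarchy: each class maps to its immediate superclass.
PARENT = {
    "chimpanzee": "primate",
    "primate": "mammal",
    "mammal": "chordate",
    "chordate": "living thing",
}

def concept_chain(label, parent=PARENT):
    """Return the label followed by all of its superclasses, most specific first.
    Each level can serve as the target of one output layer."""
    chain = [label]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def is_consistent(predictions, parent=PARENT):
    """Check that each predicted level is the parent of the level before it."""
    return all(parent.get(child) == sup for child, sup in zip(predictions, predictions[1:]))
```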

31. Semi-Supervised Semantic Segmentation with Cross-Consistency Training
Yassine Ouali, Céline Hudelot, Myriam Tami
Abstract: In this paper, we present a novel cross-consistency-based semi-supervised approach for semantic segmentation. Consistency training has proven to be a powerful semi-supervised learning framework for leveraging unlabeled data under the cluster assumption, in which the decision boundary should lie in low-density regions. In this work, we first observe that, for semantic segmentation, the low-density regions are more apparent within the hidden representations than within the inputs. We therefore propose cross-consistency training, where invariance of the predictions is enforced over different perturbations applied to the outputs of the encoder. Concretely, a shared encoder and a main decoder are trained in a supervised manner using the available labeled examples. To leverage the unlabeled examples, we enforce consistency between the predictions of the main decoder and those of auxiliary decoders, which take as inputs different perturbed versions of the encoder's output. This in turn improves the encoder's representations. The proposed method is simple and can easily be extended to use additional training signals, such as image-level labels or pixel-level labels across different domains. We perform an ablation study to tease apart the effectiveness of each component, and we conduct extensive experiments demonstrating that our method achieves state-of-the-art results on several datasets.
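
The unsupervised part of cross-consistency training can be sketched as follows. The decoders are plain Python functions standing in for networks. The perturbation is an assumed multiplicative uniform noise on the encoder features. In a real framework, the main decoder's prediction would be detached so that gradients flow only through the auxiliary branches.

```python
import random

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def feature_noise(features, scale, rng):
    """One possible perturbation: multiplicative uniform noise on encoder features."""
    return [f * (1.0 + rng.uniform(-scale, scale)) for f in features]

def cross_consistency_loss(features, main_decoder, aux_decoders, scale=0.3, seed=0):
    """Unsupervised loss for one unlabeled example.

    The main decoder's prediction on the clean encoder output is the target;
    each auxiliary decoder sees a differently perturbed copy of that output
    and is pushed to agree with the target."""
    rng = random.Random(seed)
    target = main_decoder(features)
    losses = [mse(dec(feature_noise(features, scale, rng)), target) for dec in aux_decoders]
    return sum(losses) / len(losses)
```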

32. MOT20: A benchmark for multi object tracking in crowded scenes
Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixé
Abstract: Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be overemphasized, benchmarks often provide the most objective measure of performance and are therefore important guides for research. The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal of establishing a standardized evaluation of multiple object tracking methods. The challenge focuses on multiple people tracking, since pedestrians are well studied in the tracking community, and precise tracking and detection have high practical relevance. Since the first release, MOT15, MOT16, and MOT17 have contributed tremendously to the community by introducing a clean dataset and a precise framework to benchmark multi-object trackers. In this paper, we present our MOT20 benchmark, consisting of 8 new sequences depicting very crowded, challenging scenes. The benchmark was first presented at the 4th BMTT MOT Challenge Workshop at the Computer Vision and Pattern Recognition Conference (CVPR) 2019. It offers the opportunity to evaluate state-of-the-art methods for multiple object tracking in extremely crowded scenarios.

33. Local Implicit Grid Representations for 3D Scenes
Chiyu Max Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, Thomas Funkhouser
Abstract: Shape priors learned from data are commonly used to reconstruct 3D objects from partial or noisy data. Yet no such shape priors are available for indoor scenes, since typical 3D autoencoders cannot handle their scale, complexity, or diversity. In this paper, we introduce Local Implicit Grid Representations, a new 3D shape representation designed for scalability and generality. The motivating idea is that most 3D surfaces share geometric details at some scale, i.e., at a scale smaller than an entire object and larger than a small patch. We train an autoencoder to learn an embedding of local crops of 3D shapes at that size. Then, we use the decoder as a component in a shape optimization that solves for a set of latent codes on a regular grid of overlapping crops, such that an interpolation of the decoded local shapes matches a partial or noisy observation. We demonstrate the value of the proposed approach for 3D surface reconstruction from sparse point observations, where it yields significantly better results than alternative approaches.
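
Interpolating decoded local shapes on a regular grid is commonly done with trilinear weights. The sketch below computes the weights of the 8 grid nodes around a query point and blends per-node predictions with them. `decode` is a hypothetical stand-in for running the learned decoder on a node's latent code. The paper's exact handling of overlapping crops may differ.

```python
import math

def trilinear_weights(p, cell_size=1.0):
    """Return {(i, j, k): weight} for the 8 grid nodes surrounding point p.
    The weights are non-negative and sum to 1."""
    base = [math.floor(c / cell_size) for c in p]
    frac = [c / cell_size - b for c, b in zip(p, base)]
    weights = {}
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((frac[0] if di else 1 - frac[0]) *
                     (frac[1] if dj else 1 - frac[1]) *
                     (frac[2] if dk else 1 - frac[2]))
                weights[(base[0] + di, base[1] + dj, base[2] + dk)] = w
    return weights

def blend_values(p, decode, cell_size=1.0):
    """Blend per-node predictions at p; decode(node, p) is a hypothetical function
    returning the implicit value (e.g. signed distance) predicted by that node's code."""
    return sum(w * decode(node, p) for node, w in trilinear_weights(p, cell_size).items())
```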

34. Temporal Extension Module for Skeleton-Based Action Recognition
Yuya Obinata, Takuma Yamamoto
Abstract: We present a module that extends the temporal graph of a graph convolutional network (GCN) for action recognition from skeleton sequences. Existing methods attempt to build a more appropriate spatial graph within each frame but neglect optimization of the temporal graph across frames. In this work, we focus on adding extra edges to multiple neighboring vertices in adjacent frames and extracting additional features based on the extended temporal graph. Our module is a simple yet effective method for extracting correlated features of multiple joints in human movement. Moreover, our module yields further performance improvements when combined with other GCN methods that optimize only the spatial graph. We conduct extensive experiments on two large datasets, NTU RGB+D and Kinetics-Skeleton. They show that our module is effective for several existing models and that our final model achieves competitive or state-of-the-art performance.
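
The sketch below builds an extended inter-frame edge set for a toy skeleton. A standard temporal graph links joint v at frame t only to joint v at frame t + 1. For illustration, the sketch assumes that "neighboring vertices" means each joint's spatial neighbours, so joint v at frame t is also linked to v's neighbours at frame t + 1. The function name and input format are hypothetical.

```python
def extended_temporal_edges(spatial_edges, num_joints, num_frames):
    """Return a set of inter-frame edges ((t, v), (t + 1, u)).

    u ranges over v itself (the standard temporal edge) plus v's spatial
    neighbours (the extra edges). Because the neighbour relation is symmetric,
    the reverse direction u -> v is produced when iterating over u."""
    neighbours = {v: {v} for v in range(num_joints)}
    for a, b in spatial_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    edges = set()
    for t in range(num_frames - 1):
        for v in range(num_joints):
            for u in neighbours[v]:
                edges.add(((t, v), (t + 1, u)))
    return edges
```

For a 3-joint chain (0-1-2) over two frames, this yields 7 inter-frame edges instead of the 3 self-links of a plain temporal graph.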

35. RGB-Topography and X-rays Image Registration for Idiopathic Scoliosis Children Patient Follow-up
Insaf Setitra, Noureddine Aouaa, Abdelkrim Meziane, Afef Benrabia, Houria Kaced, Hanene Belabassi, Sara Ait Ziane, Nadia Henda Zenati, Oualid Djekkoune
Abstract: During follow-up, children diagnosed with scoliosis are exposed to ionizing radiation at each X-ray examination. This exposure can have negative effects on the patient's health and cause diseases in adulthood. To reduce X-ray scanning, recent systems diagnose scoliosis patients using RGB images alone. The output of such systems is a set of augmented images and scoliosis-related angles. These angles, however, confuse physicians because there are so many of them. Moreover, without X-ray scans, the physician cannot compare RGB and X-ray images and decide whether to reduce X-ray exposure. In this work, we exploit both RGB images of scoliosis captured during clinical diagnosis and X-ray hard copies provided by patients, in order to register the two kinds of images and give a rich comparison of diagnoses. The work proceeds in three steps. First, we establish the monomodal (RGB topography of the back) and multimodal (RGB and X-ray) image database. Next, we register images based on patient landmarks. Finally, we blend the registered images for visual analysis and follow-up by the physician. The proposed registration is based on a rigid transformation that preserves the topology of the patient's back. The parameters of the rigid transformation are estimated by a proposed angle minimization over the seventh cervical vertebra (C7) and posterior superior iliac spine landmarks of the source and target diagnoses. Experiments on the constructed database show that our proposed method yields better monomodal and multimodal registration than registration based on solving a system of equations.
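
To give intuition for landmark-based rigid registration, the sketch below estimates a 2D rotation and translation from corresponding landmarks using the standard closed-form least-squares (Procrustes) solution. This is a swapped-in technique, not the paper's angle-minimization procedure. The landmark coordinates in the test are made up.

```python
import math

def rigid_2d(src, dst):
    """Least-squares 2D rigid transform (rotation angle, translation) that maps
    src landmarks onto dst landmarks (closed-form Procrustes, no scaling)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    num = den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy  # centered source point
        bx, by = dx - cdx, dy - cdy  # centered target point
        num += ax * by - ay * bx     # cross terms -> sine component
        den += ax * bx + ay * by     # dot terms -> cosine component
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)

def apply_rigid(theta, t, p):
    """Rotate point p by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])
```

Because the transform is rigid, distances between landmarks are preserved. This mirrors the abstract's requirement that registration preserve the topology of the patient's back.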

36. Visual Navigation Among Humans with Optimal Control as a Supervisor
Varun Tolani, Somil Bansal, Aleksandra Faust, Claire Tomlin
Abstract: Real-world navigation requires robots to operate in unfamiliar, dynamic environments, sharing spaces with humans. Navigating around humans is especially difficult because it requires predicting their future motion, which can be quite challenging. We propose a novel framework for navigation around humans that combines learning-based perception with model-based optimal control. Specifically, we train a Convolutional Neural Network (CNN)-based perception module that maps the robot's visual inputs to a waypoint, or next desired state. This waypoint is then input into planning and control modules, which convey the robot safely and efficiently to the goal. To train the CNN, we contribute a photo-realistic benchmarking dataset for autonomous robot navigation in the presence of humans. The CNN is trained using supervised learning on images rendered from our photo-realistic dataset. The proposed framework learns to anticipate and react to people's motion based only on a monocular RGB image, without explicitly predicting future human motion. Our method generalizes well to unseen buildings and humans in both simulation and real-world environments. Furthermore, our experiments demonstrate that combining model-based control and learning leads to better and more data-efficient navigational behaviors than a purely learning-based approach. Videos describing our approach and experiments are available on the project website.

37. Detection of Information Hiding at Anti-Copying 2D Barcodes
Ning Xie, Ji Hu, Junjie Chen, Qiqi Zhang, Changsheng Chen
Abstract: This paper concerns the problem of detecting the use of information hiding in anti-copying 2D barcodes. Prior hidden information detection schemes are either heuristic-based or Machine Learning (ML)-based. The key limitation of prior heuristic-based schemes is that they do not answer the fundamental question of why information hidden in a 2D barcode can be detected. The key limitation of prior ML-based schemes is their lack of robustness. A printed 2D barcode depends heavily on its environment, so an information hiding detection scheme trained in one environment often does not work well in another. In this paper, we propose two hidden information detection schemes for existing anti-copying 2D barcodes. The first scheme, referred to as the Pixel Distance Based Detection (PDBD) scheme, directly uses the pixel distance to detect the use of an information hiding scheme in a 2D barcode. The second scheme is referred to as the Pixel Variance Based Detection (PVBD) scheme. It first calculates the variance of the raw signal and the covariance between the recovered signal and the raw signal. Then, based on the variance results, it detects the use of an information hiding scheme in a 2D barcode. Moreover, we design advanced IC attacks to evaluate the security of two existing anti-copying 2D barcodes. We implemented our schemes and conducted an extensive performance comparison between our schemes and prior schemes using different capturing devices, such as a scanner and a camera phone. Our experimental results show that the PVBD scheme can correctly detect the existence of hidden information in both the 2LQR code and the LCAC 2D barcode. Moreover, the success probability of our IC attacks reaches 0.6538 for the 2LQR code and 1 for the LCAC 2D barcode.
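
The PVBD description names two statistics: the variance of the raw signal, and the covariance between the recovered and raw signals. The sketch below computes both over 1D pixel sequences. The thresholded decision rule is a hypothetical placeholder, because the abstract does not say how the variance results are turned into a decision.

```python
def variance(signal):
    """Population variance of a 1D pixel sequence."""
    m = sum(signal) / len(signal)
    return sum((x - m) ** 2 for x in signal) / len(signal)

def covariance(a, b):
    """Population covariance between two equally long pixel sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def pvbd_features(raw, recovered):
    """Variance of the raw (captured) signal and its covariance with the recovered signal."""
    return variance(raw), covariance(recovered, raw)

def looks_hidden(raw, recovered, var_threshold):
    """Hypothetical decision rule: flag a barcode when the raw-signal variance
    exceeds a threshold calibrated on barcodes without hidden data."""
    var, _ = pvbd_features(raw, recovered)
    return var > var_threshold
```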

38. U-Det: A Modified U-Net architecture with bidirectional feature network for lung nodule segmentation
Nikhil Varma Keetha, Samson Anosh Babu P, Chandra Sekhara Rao Annavarapu
Abstract: Early diagnosis and analysis of lung cancer involve precise and efficient lung nodule segmentation in computed tomography (CT) images. However, the irregular shapes, visual features, and surroundings of nodules in CT images make robust segmentation of lung nodules a challenging problem. This article proposes U-Det, a resource-efficient model architecture that provides an end-to-end deep learning approach to this task. It incorporates a Bi-FPN (bidirectional feature network) between the encoder and decoder. Furthermore, it uses the Mish activation function and class weights of masks to enhance segmentation efficiency. The proposed model is extensively trained and evaluated on the publicly available LUNA-16 dataset, consisting of 1186 lung nodules. The U-Det architecture outperforms the existing U-Net model, with a Dice similarity coefficient (DSC) of 82.82%, and achieves results comparable to those of human experts.
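
Two ingredients mentioned in the abstract can be stated precisely. One is the Mish activation, x · tanh(softplus(x)). The other is the Dice similarity coefficient used to report the 82.82% result. Below is a minimal, framework-free sketch of both, assuming flattened binary masks for Dice.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation: x * tanh(softplus(x)); smooth and non-monotonic near 0."""
    return x * math.tanh(softplus(x))

def dice_coefficient(pred, truth, eps=1e-7):
    """Dice similarity 2|P ∩ T| / (|P| + |T|) for flattened binary masks.
    eps keeps the ratio defined when both masks are empty."""
    inter = sum(p * t for p, t in zip(pred, truth))
    return (2.0 * inter + eps) / (sum(pred) + sum(truth) + eps)
```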

39. Multiview Chirality
Sameer Agarwal, Andrew Pryhuber, Rainer Sinn, Rekha R. Thomas
Abstract: Given an arrangement of cameras $\mathcal{A} = \{A_1,\dots, A_m\}$, the chiral domain of $\mathcal{A}$ is the subset of $\mathbb{P}^3$ that lies in front of it. It is a generalization of the classical definition of chirality. We give an algebraic description of this set and use it to generalize Hartley's theory of chiral reconstruction to $m \ge 2$ views and to derive a chiral version of Triggs' Joint Image.
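
A point can be tested for membership in the chiral domain by checking that it lies in front of every camera. The sketch below uses the standard depth-sign test for finite projective cameras. With P = [M | p4] and X = (X, Y, Z, T), the point is in front when det(M) · w · T > 0, where w is the third coordinate of PX. This sign does not change when X or P is rescaled, so the test is well defined on P^3. As a simplification that may not match the paper's treatment of boundary cases, the sketch treats points at infinity (T = 0) as not in front.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def in_front(P, X):
    """True if homogeneous point X = (X, Y, Z, T) lies in front of camera P (3x4).
    Scaling X by lam multiplies w * T by lam^2, so the sign is projectively invariant."""
    M = [row[:3] for row in P]
    w = sum(P[2][i] * X[i] for i in range(4))
    return det3(M) * w * X[3] > 0

def in_chiral_domain(cameras, X):
    """X is in the chiral domain of the arrangement if it is in front of every camera."""
    return all(in_front(P, X) for P in cameras)
```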

40. Accuracy of MRI Classification Algorithms in a Tertiary Memory Center Clinical Routine Cohort
Alexandre Morin, Jorge Samper-González, Anne Bertrand, Sebastian Stroer, Didier Dormont, Aline Mendes, Pierrick Coupé, Jamila Ahdidan, Marcel Lévy, Dalila Samri, Harald Hampel, Bruno Dubois, Marc Teichmann, Stéphane Epelbaum, Olivier Colliot
Abstract: BACKGROUND: Automated volumetry software (AVS) has recently become widely available to neuroradiologists. MRI volumetry with AVS may support the diagnosis of dementias by identifying regional atrophy. Moreover, automatic classifiers using machine learning techniques have recently emerged as promising approaches to assist diagnosis. However, the performance of both AVS and automatic classifiers has been evaluated mostly in the artificial setting of research datasets. OBJECTIVE: Our aim was to evaluate the performance of two AVS and an automatic classifier in the clinical routine conditions of a memory clinic. METHODS: We studied 239 patients with cognitive complaints from a single memory center cohort. Using clinical routine T1-weighted MRI, we evaluated the classification performance of: 1) univariate volumetry using two AVS (volBrain and Neuroreader™); 2) a Support Vector Machine (SVM) automatic classifier, using either the AVS volumes (SVM-AVS) or the whole gray matter (SVM-WGM); and 3) reading by two neuroradiologists. The performance measure was the balanced diagnostic accuracy. The reference standard was a consensus diagnosis by three neurologists using clinical, biological (cerebrospinal fluid), and imaging data and following international criteria. RESULTS: Univariate AVS volumetry provided only moderate accuracies (46% to 71% with hippocampal volume). Accuracy improved when using the SVM-AVS classifier (52% to 85%), becoming close to that of SVM-WGM (52% to 90%). The accuracy of visual classification by neuroradiologists fell between that of SVM-AVS and SVM-WGM. CONCLUSION: In the routine practice of a memory clinic, the use of volumetric measures provided by AVS yields only moderate accuracy. Automatic classifiers can improve accuracy and could be a useful tool to assist diagnosis.
摘要：背景：自动容量分析软件（AVS）最近已成为广泛提供给神经放射学家。 MRI用容量法AVS可以通过识别区域萎缩支持痴呆的诊断。此外，使用机器学习技术自动分类器最近出现的有前途的方法，以协助诊断。然而，无论是AVS和自动分类器的性能大多已经在研究datasets.OBJECTIVE人工设定评价：我们的目的是评价2个AVS的性能和内存clinic.METHODS的临床常规条件自动分类：我们研究了239例从一个单一的存储中心队列认知烦恼。使用临床常规T1加权MRI，我们评估的分类性能：1）使用两个AVS（volBrain和Neuroreader $ ^ {TM} $）单变量容量分析; 2）支持向量机（SVM）分类器自动，即使用AVS卷（SVM-AVS），或整个灰质（SVM-WGM）; 3）由两个神经放射学家读取。性能指标是均衡诊断的准确性。参考标准是一致的诊断通过使用临床，生物（脑脊液）和成像数据和下面的国际criteria.RESULTS 3个神经学家：单因素AVS容量法（与海马体积46％至71％）仅提供适度的精度。使用SVM-AVS分类器（52％至85％）时，成为接近SVM-WGM（52至90％）的精度提高。通过视觉神经放射学家分类SVM-AVS和SVM-WGM.CONCLUSION介于：在记忆诊所的日常实践中，使用AVS通过提供体积措施仅产生温和的准确性。自动分类可以提高准确性和可协助诊断的有用工具。
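The abstract reports results as balanced diagnostic accuracy. This is useful when diagnostic groups differ in size, because plain accuracy then mostly reflects the majority class. The sketch below assumes the common definition: the mean of per-class recall. The labels and counts are hypothetical and are not taken from the study. The paper may also compute the metric per binary diagnostic task.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every diagnostic class counts equally
    regardless of how many patients it contains."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    totals = defaultdict(int)  # number of patients per true class
    hits = defaultdict(int)    # correctly classified patients per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Hypothetical, imbalanced example: 8 patients in class "AD", 2 in class "FTD".
y_true = ["AD"] * 8 + ["FTD"] * 2
y_pred = ["AD"] * 8 + ["AD", "FTD"]
print(balanced_accuracy(y_true, y_pred))  # 0.75 (plain accuracy would be 0.9)
```

Plain accuracy on this toy data is 9/10. The balanced version is (8/8 + 1/2) / 2, which exposes the weak recall on the minority class.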

41. DIDFuse: Deep Image Decomposition for Infrared and Visible Image Fusion
Zixiang Zhao, Shuang Xu, Chunxia Zhang, Junmin Liu, Pengfei Li, Jiangshe Zhang
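Abstract: Infrared and visible image fusion, a hot topic in image processing, aims to obtain fused images that retain the advantages of the source images. This paper proposes a novel autoencoder (AE)-based fusion network. The core idea is that the encoder decomposes an image into background and detail feature maps carrying low- and high-frequency information, respectively, and that the decoder recovers the original image. To this end, the loss function makes the background feature maps of the source images similar and their detail feature maps dissimilar. In the test phase, the background and detail feature maps are respectively merged via a fusion module, and the fused image is recovered by the decoder. Qualitative and quantitative results show that our method generates fused images containing highlighted targets and abundant detail texture information, with strong robustness, while surpassing state-of-the-art (SOTA) approaches.
Summary: Infrared and visible image fusion, a hot topic in image processing, aims to obtain fused images that retain the advantages of the source images. This paper proposes a novel autoencoder (AE)-based fusion network. The core idea is that the encoder decomposes an image into background and detail feature maps carrying low- and high-frequency information, respectively, and the decoder recovers the original image. To this end, the loss function makes the background feature maps of the source images similar and their detail feature maps dissimilar. In the test phase, the background and detail feature maps are each merged by a fusion module, and the decoder recovers the fused image. Qualitative and quantitative results show that our method produces fused images with highlighted targets and rich detail texture, with strong robustness, while surpassing state-of-the-art (SOTA) approaches.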
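The abstract says the loss makes background maps similar and detail maps dissimilar, but it does not give the formula. The following is a minimal sketch of one way to express that idea. It assumes a hypothetical tanh-bounded form and a weight `alpha`; neither is taken from the paper. The bounding keeps the "push apart" term from driving the loss to minus infinity. A real training loss would also need the image reconstruction term mentioned in the abstract, which is omitted here.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two flattened feature maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def decomposition_loss(bg_ir, bg_vis, det_ir, det_vis, alpha=0.5):
    """Hypothetical decomposition term: low when the background maps of the
    infrared and visible images agree and their detail maps differ.
    tanh bounds both terms, so the result lies in [-alpha, 1]."""
    similar = math.tanh(sq_dist(bg_ir, bg_vis))
    dissimilar = math.tanh(sq_dist(det_ir, det_vis))
    return similar - alpha * dissimilar

# Toy 2-element "feature maps" with hypothetical values.
good = decomposition_loss([0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0])
bad = decomposition_loss([1.0, 0.0], [0.0, 1.0], [0.2, 0.2], [0.2, 0.2])
print(good, bad)  # good is negative, bad is positive
```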

42. Diagnosis of Diabetic Retinopathy in Ethiopia: Before the Deep Learning based Automation
Misgina Tsighe Hagos
Abstract: Introducing automated diabetic retinopathy (DR) diagnosis into Ethiopia is still a challenging task, despite recent reports that trained deep learning (DL)-based DR classifiers surpass manual graders. This is mainly because of the high cost of the conventional retinal imaging devices used with DL-based classifiers. This paper discusses current approaches that provide mobile-based binary classification of DR, and the way towards cheaper, offline multi-class classification of DR.
Summary: Introducing automated diabetic retinopathy (DR) diagnosis in Ethiopia remains a challenging task, despite recent reports that trained deep learning (DL)-based DR classifiers surpass manual graders. This is mainly due to the high cost of the conventional retinal imaging devices used with DL-based classifiers. This paper discusses current approaches that provide mobile-based binary classification of DR, and the path towards cheaper, offline multi-class classification of DR.

43. Across-scale Process Similarity based Interpolation for Image Super-Resolution
Sobhan Kanti Dhara, Debashis Sen
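Abstract: A pivotal step in image super-resolution techniques is interpolation, which aims to generate high-resolution images without introducing artifacts such as blurring and ringing. In this paper, we propose a technique that performs interpolation by infusing high-frequency signal components computed by exploiting "process similarity". By "process similarity", we refer to the resemblance between the decomposition of an image at one resolution and its decomposition at another resolution. In our approach, the decompositions that generate image details and approximations are obtained through the discrete wavelet transform (DWT) and the stationary wavelet transform (SWT). The complementary nature of DWT and SWT is leveraged to obtain the structural relation between the input image and its low-resolution approximation. This structural relation is represented by optimal model parameters obtained through particle swarm optimization (PSO). Owing to process similarity, these parameters are used to generate the high-resolution output image from the input image. The proposed approach is compared with six existing techniques, qualitatively and in terms of the PSNR, SSIM and FSIM measures, as well as computation time (CPU time). Our approach is the fastest in terms of CPU time and produces comparable results.
Summary: A key step in image super-resolution is interpolation, which aims to generate high-resolution images without introducing artifacts such as blurring and ringing. In this paper, we propose a technique that performs interpolation by infusing high-frequency signal components computed by exploiting "process similarity", that is, the resemblance between the decomposition of an image at one resolution and its decomposition at another resolution. In our approach, the decompositions that generate image details and approximations are obtained with the discrete wavelet transform (DWT) and the stationary wavelet transform (SWT). The complementary nature of DWT and SWT is exploited to obtain the structural relation between the input image and its low-resolution approximation. This structural relation is represented by optimal model parameters obtained through particle swarm optimization (PSO). Owing to process similarity, these parameters are used to generate the high-resolution output image from the input image. The proposed method is compared with six existing techniques, qualitatively and in terms of PSNR, SSIM and FSIM, as well as computation time (CPU time). Our approach is the fastest in CPU time and produces comparable results.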
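The approach rests on the structural difference between the DWT and the SWT. The DWT downsamples by two at each level. The SWT (undecimated) keeps the full length, so its even-indexed coefficients coincide with the DWT coefficients. The pure-Python sketch below shows this in 1D with the Haar wavelet and periodic extension. It assumes only standard Haar filters. The paper works on 2D images and fits its model with PSO, neither of which is reproduced here.

```python
import math

S = 1 / math.sqrt(2)  # Haar filter normalization

def haar_dwt(x):
    """One-level decimated Haar DWT: each output has half the input length."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt (perfect reconstruction)."""
    out = []
    for a, d in zip(approx, detail):
        out += [S * (a + d), S * (a - d)]
    return out

def haar_swt(x):
    """One-level stationary (undecimated) Haar transform with periodic
    extension: no downsampling, so outputs keep the input length."""
    n = len(x)
    approx = [S * (x[i] + x[(i + 1) % n]) for i in range(n)]
    detail = [S * (x[i] - x[(i + 1) % n]) for i in range(n)]
    return approx, detail

x = [4.0, 2.0, 5.0, 7.0, 1.0, 3.0]
a, d = haar_dwt(x)
sa, sd = haar_swt(x)
print(len(a), len(sa))  # 3 6
```

The redundancy of the SWT (shift invariance, full resolution) complements the compact, invertible DWT. That pairing is the kind of complementarity the abstract refers to.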

44. Few-Shot Learning with Geometric Constraints
Hong-Gyu Jung, Seong-Whan Lee
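Abstract: In this article, we consider the problem of few-shot learning for classification. We assume a network trained on base categories with a large number of training examples, and we aim to add to it novel categories that have only a few (e.g., one or five) training examples. This is a challenging scenario because: 1) high performance is required for both the base and the novel categories; and 2) training the network on the new categories with a few training examples can contaminate the feature space that was well trained for the base categories. To address these challenges, we propose two geometric constraints for fine-tuning the network with a few training examples. The first constraint makes the features of the novel categories cluster near their category weights, and the second keeps the weights of the novel categories far from the weights of the base categories. By applying the proposed constraints, we extract discriminative features for the novel categories while preserving the feature space learned for the base categories. Using public few-shot learning datasets that are subsets of ImageNet, we demonstrate that the proposed method outperforms prevalent methods by a large margin.
Summary: In this article, we consider the problem of few-shot learning for classification. We assume a network trained on base categories with a large number of training examples, and we aim to add novel categories that have only a few (e.g., one or five) training examples. This scenario is challenging because: 1) high performance is required for both the base and the novel categories; and 2) training the network on the new categories with a few examples can contaminate the feature space that was well trained for the base categories. To address these challenges, we propose two geometric constraints for fine-tuning the network with a few training examples. The first constraint makes the features of the novel categories cluster near their category weights, and the second keeps the weights of the novel categories far from the weights of the base categories. By applying the proposed constraints, we extract discriminative features for the novel categories while preserving the feature space learned for the base categories. Using public few-shot learning datasets that are subsets of ImageNet, we show that the proposed method outperforms prevalent methods by a large margin.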
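The two constraints can be read as two penalty terms added during fine-tuning. The sketch below is a hypothetical cosine-similarity formulation, not the paper's exact losses:

- **Constraint 1** pulls each novel-class feature toward its class weight vector.
- **Constraint 2** penalizes novel weights whose cosine similarity to any base weight exceeds an assumed `margin`.

The class names and vectors are illustrative only.

```python
import math

def cos(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def feature_to_weight_loss(features, labels, novel_weights):
    """Constraint 1 (hypothetical form): average of 1 - cos(feature, own class weight),
    which is zero when every feature points along its class weight."""
    return sum(1 - cos(f, novel_weights[y]) for f, y in zip(features, labels)) / len(features)

def weight_separation_loss(novel_weights, base_weights, margin=0.0):
    """Constraint 2 (hypothetical form): hinge penalty on novel weights that are
    more similar than `margin` to any base-category weight."""
    total = 0.0
    for w_novel in novel_weights.values():
        for w_base in base_weights:
            total += max(0.0, cos(w_novel, w_base) - margin)
    return total

# Illustrative 2D example with one hypothetical novel class.
novel = {"okapi": [1.0, 0.0]}
feats = [[2.0, 0.0], [3.0, 0.0]]
print(feature_to_weight_loss(feats, ["okapi", "okapi"], novel))  # 0.0
print(weight_separation_loss(novel, [[0.0, 1.0]]))               # 0.0
```

In a full setup these terms would be weighted and added to the usual classification loss on the few novel examples. The weighting is a design choice the abstract does not specify.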

45. Unsupervised Latent Space Translation Network
Magda Friedjungová, Daniel Vašata, Tomáš Chobola, Marcel Jiřina
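Abstract: A task often discussed in computer vision is mapping an image from one domain to a corresponding image in another domain, known as image-to-image translation. Several approaches currently address this task. In this paper, we present an enhancement of the UNIT framework that helps remove its main drawbacks. More specifically, we introduce an additional adversarial discriminator on the latent representation, used instead of the VAE, which forces the latent-space distributions of both domains to be similar. On MNIST and USPS domain adaptation tasks, this approach greatly outperforms competing approaches.
Summary: A frequently discussed task in computer vision is mapping an image from one domain to a corresponding image in another domain, known as image-to-image translation. Several approaches currently address this task. In this paper, we present an enhancement of the UNIT framework that helps remove its main drawbacks. More specifically, we introduce an additional adversarial discriminator on the latent representation, used instead of the VAE, which forces the latent-space distributions of the two domains to be similar. On MNIST and USPS domain adaptation tasks, this approach greatly outperforms competing approaches.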

46. Efficient algorithm for calculating transposed PSF matrices for 3D light field deconvolution
Martin Eberhart
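Abstract: Volume reconstruction by 3D light field deconvolution is a technique that has been successfully demonstrated for microscopic images recorded by a plenoptic camera. This method requires computing a transposed version of the 5D matrix that holds the point spread function (PSF) of the optical system. For high-resolution cameras with hexagonal microlens arrays, this is a very time-consuming step. This paper illustrates the significance and construction of this special matrix and presents an efficient algorithm for its computation, based on the distinct relation between the corresponding indices within the original and the transposed matrix. The required computation time is significantly shorter than that of previously published algorithms.
Summary: Volume reconstruction by 3D light field deconvolution is a technique that has been successfully demonstrated for microscopic images recorded by a plenoptic camera. The method requires computing a transposed version of the 5D matrix that holds the point spread function (PSF) of the optical system. For high-resolution cameras with hexagonal microlens arrays, this is a very time-consuming step. This paper explains the significance and construction of this special matrix and presents an efficient algorithm for computing it, based on the distinct relation between corresponding indices in the original and the transposed matrix. The required computation time is significantly shorter than that of previously published algorithms.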
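The "transposed PSF matrix" is the adjoint of the forward light field projection. Deconvolution schemes such as Richardson-Lucy use it to back-project sensor images into the volume. The sketch below assumes a hypothetical dense 5D layout `H[(sx, sy, ox, oy, z)]`: sensor pixel, lateral object position, depth. It does not model hexagonal lenslet periodicity and is not the paper's efficient algorithm. It is the brute-force baseline, plus the standard adjoint test ⟨Hx, y⟩ = ⟨x, Hᵀy⟩, which can be used to validate any faster transpose.

```python
import random

def forward(H, x):
    """Project object volume x (keys (z, ox, oy)) onto the sensor (keys (sx, sy))."""
    y = {}
    for (sx, sy, ox, oy, z), h in H.items():
        y[(sx, sy)] = y.get((sx, sy), 0.0) + h * x.get((z, ox, oy), 0.0)
    return y

def transpose(H):
    """Brute-force transpose: swap the roles of sensor and object indices."""
    return {(ox, oy, sx, sy, z): h for (sx, sy, ox, oy, z), h in H.items()}

def backward(Ht, y):
    """Back-project sensor image y into the object volume using the transposed PSF."""
    x = {}
    for (ox, oy, sx, sy, z), h in Ht.items():
        x[(z, ox, oy)] = x.get((z, ox, oy), 0.0) + h * y.get((sx, sy), 0.0)
    return x

def dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())

rng = random.Random(0)
N, Z = 3, 2  # tiny hypothetical sizes: 3x3 lateral grid, 2 depth planes
H = {(sx, sy, ox, oy, z): rng.random()
     for sx in range(N) for sy in range(N)
     for ox in range(N) for oy in range(N) for z in range(Z)}
x = {(z, ox, oy): rng.random() for z in range(Z) for ox in range(N) for oy in range(N)}
y = {(sx, sy): rng.random() for sx in range(N) for sy in range(N)}

lhs = dot(forward(H, x), y)
rhs = dot(x, backward(transpose(H), y))
print(abs(lhs - rhs))  # agrees up to floating-point rounding
```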

47. Learning the Loss Functions in a Discriminative Space for Video Restoration
Younghyun Jo, Jaeyeon Kang, Seoung Wug Oh, Seonghyeon Nam, Peter Vajda, Seon Joo Kim
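Abstract: With more advanced deep network architectures and learning schemes such as GANs, the performance of video restoration algorithms has improved greatly in recent years. Meanwhile, the loss functions used to optimize deep neural networks have remained relatively unchanged. To address this, we propose a new framework for building effective loss functions by learning a discriminative space specific to a video restoration task. Our framework is similar to GANs in that we iteratively train two networks: a generator and a loss network. The generator learns to restore videos in a supervised fashion by matching ground-truth features in the discriminative space learned by the loss network. In addition, we introduce a new relation loss to maintain temporal consistency in the output videos. Experiments on video super-resolution and deblurring show that our method generates visually more pleasing videos with better quantitative perceptual metric values than other state-of-the-art methods.
Summary: With more advanced deep network architectures and learning schemes such as GANs, the performance of video restoration algorithms has improved greatly in recent years. Meanwhile, the loss functions used to optimize deep neural networks have remained relatively unchanged. We therefore propose a new framework for building effective loss functions by learning a discriminative space specific to a video restoration task. Our framework is similar to GANs in that we iteratively train two networks: a generator and a loss network. The generator learns to restore videos in a supervised fashion by matching ground-truth features in the discriminative space learned by the loss network. In addition, we introduce a new relation loss to maintain temporal consistency in the output videos. Experiments on video super-resolution and deblurring show that our method produces visually more pleasing videos with better quantitative perceptual metric values than other state-of-the-art methods.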

48. Online Continual Learning on Sequences
German I. Parisi, Vincenzo Lomonaco
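Abstract: Online continual learning (OCL) refers to the ability of a system to learn over time from a continuous stream of data without having to revisit previously encountered training samples. Learning continually in a single data pass is crucial for agents and robots that operate in changing environments and must acquire, fine-tune and transfer increasingly complex representations from non-i.i.d. input distributions. Machine learning models that address OCL must alleviate catastrophic forgetting, in which hidden representations are disrupted or completely overwritten when learning from streams of novel input. In this chapter, we summarize and discuss recent deep learning models that address OCL on sequential input through the use (and combination) of synaptic regularization, structural plasticity and experience replay. Different implementations of replay have been proposed that alleviate catastrophic forgetting in connectionist architectures via the re-occurrence of input sequences (or their latent representations), and that functionally resemble the mechanisms of hippocampal replay in the mammalian brain. Empirical evidence shows that architectures endowed with experience replay typically outperform architectures without replay in (online) incremental learning tasks.
Summary: Online continual learning (OCL) refers to a system's ability to learn over time from a continuous stream of data without revisiting previously encountered training samples. Learning continually in a single pass over the data is crucial for agents and robots that operate in changing environments and must acquire, fine-tune and transfer increasingly complex representations from non-i.i.d. input distributions. Machine learning models that address OCL must alleviate catastrophic forgetting, in which hidden representations are disrupted or completely overwritten when learning from streams of novel input. In this chapter, we summarize and discuss recent deep learning models that address OCL on sequential input through the use (and combination) of synaptic regularization, structural plasticity and experience replay. Different implementations of replay have been proposed that alleviate catastrophic forgetting in connectionist architectures through the re-occurrence of input sequences (or their latent representations), and that functionally resemble the mechanisms of hippocampal replay in the mammalian brain. Empirical evidence shows that architectures endowed with experience replay typically outperform architectures without it in (online) incremental learning tasks.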
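Experience replay needs a bounded memory that stays representative of a stream seen only once. Reservoir sampling is one common policy for this: after `n` items, each item is in the buffer with probability `capacity / n`. It is shown here as an illustrative choice; it is not necessarily the policy used by any model surveyed in the chapter. The class name and interface are hypothetical.

```python
import random

class ReservoirReplayBuffer:
    """Fixed-size replay memory holding a uniform random sample of every item
    seen so far in a single pass over the stream (reservoir sampling)."""

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform index in [0, seen)
            if j < self.capacity:
                self.items[j] = item           # replace a stored item

    def sample(self, k):
        """Draw up to k distinct stored items to replay alongside new data."""
        return self.rng.sample(self.items, min(k, len(self.items)))

buf = ReservoirReplayBuffer(capacity=5, seed=0)
for step in range(100):        # stand-in for a non-i.i.d. input stream
    replay = buf.sample(2)     # a learner would train on [step] + replay here
    buf.add(step)
print(sorted(buf.items))
```

In an online learner, each incoming sample would be mixed with a small replayed batch before the single gradient step. This is the mechanism that counteracts overwriting of older representations.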

49. Hierarchical Severity Staging of Anterior Cruciate Ligament Injuries using Deep Learning
Nikan K. Namiri, Io Flament, Bruno Astuto, Rutwik Shah, Radhika Tibrewala, Francesco Caliva, Thomas M. Link, Valentina Pedoia, Sharmila Majumdar
Abstract: Purpose: To evaluate the diagnostic utility of two convolutional neural networks (CNNs) for staging the severity of anterior cruciate ligament (ACL) injuries. Materials and Methods: This retrospective analysis was conducted on 1243 knee MR images (1008 intact, 18 partially torn, 77 fully torn and 140 reconstructed ACLs) from 224 subjects collected between 2011 and 2014 (age = 46.50 ± 13.55 years, body mass index = 24.58 ± 3.60 kg/m², 46% women; mean ± standard deviation). Images were acquired with a 3.0T MR scanner using 3D fast spin echo CUBE sequences. The radiologists used a modified scoring metric analogous to ACLOAS and WORMS as the grading standard. To classify ACL injuries with deep learning, two types of CNNs were used, one with three-dimensional (3D) and the other with two-dimensional (2D) convolutional kernels. Performance metrics included sensitivity, specificity, weighted Cohen's kappa and overall accuracy, followed by two-sample t-tests to compare CNN performance. Results: The overall accuracy (84%) and weighted Cohen's kappa (0.92) reported for ACL injury classification were higher with the 2D CNN than with the 3D CNN. The 2D CNN and 3D CNN performed similarly in assessing intact ACLs (2D CNN: 93% sensitivity and 90% specificity; 3D CNN: 89% sensitivity and 88% specificity). Classification of full tears by both networks was also comparable (2D CNN: 83% sensitivity and 94% specificity; 3D CNN: 77% sensitivity and 100% specificity). The 2D CNN classified all reconstructed ACLs correctly. Conclusion: Applying CNNs to ACL lesion classification yields high sensitivity and specificity, suggesting potential use in helping non-experts grade ACL injuries.
Summary: Purpose: To evaluate the diagnostic utility of two convolutional neural networks (CNNs) for staging the severity of anterior cruciate ligament (ACL) injuries. Materials and Methods: This retrospective analysis was conducted on 1243 knee MR images (1008 intact, 18 partially torn, 77 fully torn and 140 reconstructed ACLs) from 224 subjects collected between 2011 and 2014 (age 46.50 ± 13.55 years, body mass index 24.58 ± 3.60 kg/m², 46% women; mean ± standard deviation). Images were acquired with a 3.0T MR scanner using 3D fast spin echo CUBE sequences. The radiologists used a modified scoring metric analogous to ACLOAS and WORMS as the grading standard. To classify ACL injuries with deep learning, two types of CNN were used, one with three-dimensional (3D) and the other with two-dimensional (2D) convolutional kernels. Performance metrics included sensitivity, specificity, weighted Cohen's kappa and overall accuracy, followed by two-sample t-tests to compare CNN performance. Results: The overall accuracy (84%) and weighted Cohen's kappa (0.92) reported for ACL injury classification were higher with the 2D CNN than with the 3D CNN. The 2D and 3D CNNs performed similarly in assessing intact ACLs (2D CNN: 93% sensitivity and 90% specificity; 3D CNN: 89% sensitivity and 88% specificity). Classification of full tears by the two networks was also comparable (2D CNN: 83% sensitivity and 94% specificity; 3D CNN: 77% sensitivity and 100% specificity). The 2D CNN classified all reconstructed ACLs correctly. Conclusion: Applying CNNs to ACL lesion classification yields high sensitivity and specificity, suggesting potential use in helping non-experts grade ACL injuries.
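Weighted Cohen's kappa measures agreement between two raters on ordered grades, such as CNN predictions versus the radiologists' grading standard. Near misses are penalized less than distant disagreements. The abstract does not state which weighting was used, so the sketch below supports both the usual linear and quadratic schemes. Encoding the grades as integers `0..n_classes-1` is an assumption for illustration.

```python
def weighted_kappa(rater_a, rater_b, n_classes, weights="linear"):
    """Cohen's weighted kappa for ordinal grades encoded as 0..n_classes-1."""
    if n_classes < 2 or len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need >= 2 classes and two equal-length, non-empty rating lists")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(rater_a)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]                                      # rater A marginals
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]  # rater B marginals
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            dist = abs(i - j) / (n_classes - 1)
            w = dist if weights == "linear" else dist * dist  # disagreement weight
            expected = hist_a[i] * hist_b[j] / n              # count expected by chance
            num += w * observed[i][j]
            den += w * expected
    if den == 0:
        raise ValueError("kappa undefined: both raters used a single identical grade")
    return 1.0 - num / den

# Hypothetical grades: 0 = intact, 1 = partial tear, 2 = full tear.
cnn = [0, 0, 1, 2, 2, 1]
reader = [0, 0, 1, 2, 1, 1]
print(weighted_kappa(cnn, reader, n_classes=3))
```

A value of 1 means perfect agreement and 0 means chance-level agreement. Negative values mean systematic disagreement.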

50. Kidney segmentation using 3D U-Net localized with Expectation Maximization
Omid Bazgir, Kai Barck, Richard A.D. Carano, Robby M. Weimer, Luke Xie
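Abstract: Kidney volume is greatly affected in several renal diseases. Precise and automatic segmentation of the kidney can help determine kidney size and evaluate renal function. Fully convolutional neural networks have been used to segment organs in large biomedical 3D images. While these networks demonstrate state-of-the-art segmentation performance, they do not immediately translate to small foreground objects, small sample sizes and anisotropic resolution in MRI datasets. In this paper, we propose a new framework to address some of the challenges of segmenting 3D MRI. These methods were implemented on preclinical MRI for segmenting kidneys in an animal model of lupus nephritis. Our implementation strategy is twofold: 1) to use additional MRI diffusion images to detect the general kidney area, and 2) to reduce the 3D U-Net kernels to handle small sample sizes. Using this approach, a Dice similarity coefficient of 0.88 was achieved on a limited dataset of n = 196. With careful optimization, this segmentation strategy can be applied to various renal injuries or other organ systems.
Summary: Kidney volume is greatly affected in several renal diseases. Precise, automatic segmentation of the kidney can help determine kidney size and evaluate renal function. Fully convolutional neural networks have been used to segment organs in large biomedical 3D images. Although these networks show state-of-the-art segmentation performance, they do not immediately carry over to small foreground objects, small sample sizes and anisotropic resolution in MRI datasets. In this paper, we propose a new framework that addresses some of the challenges of segmenting 3D MRI. These methods were implemented on preclinical MRI for segmenting the kidneys in an animal model of lupus nephritis. Our implementation strategy is twofold: 1) using additional MRI diffusion images to detect the general kidney area, and 2) reducing the 3D U-Net kernels to handle small sample sizes. With this approach, a Dice similarity coefficient of 0.88 was achieved on a limited dataset of n = 196. With careful optimization, this segmentation strategy can be applied to various renal injuries or other organ systems.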
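The Dice similarity coefficient reported above (0.88) compares a predicted mask A with a reference mask B as 2|A ∩ B| / (|A| + |B|). A value of 1 means perfect overlap and 0 means none. The minimal sketch below works on flattened binary masks. Returning 1.0 when both masks are empty is a convention assumed here; it is not universal.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 sequences
    (e.g. a 3D kidney mask flattened voxel by voxel)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * intersection / total  # both empty: assumed perfect

pred = [0, 1, 1, 1, 0, 0]   # hypothetical segmentation
truth = [0, 1, 1, 0, 1, 0]  # hypothetical reference annotation
print(dice_coefficient(pred, truth))
```

Here 2 voxels overlap out of 3 + 3 labeled voxels, giving 4/6. Unlike voxel accuracy, Dice ignores the (usually huge) background, which matters for small foreground objects such as mouse kidneys.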

51. Weakly Supervised Context Encoder using DICOM metadata in Ultrasound Imaging
Szu-Yeu Hu, Shuhang Wang, Wei-Hung Weng, JingChao Wang, XiaoHong Wang, Arinc Ozturk, Qian Li, Viksit Kumar, Anthony E. Samir
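Abstract: Modern deep learning algorithms geared towards clinical adaptation rely on a significant amount of high-fidelity labeled data. In low-resource settings, acquiring high-fidelity data is challenging and becomes the bottleneck for developing artificial intelligence applications. Ultrasound images, stored in the Digital Imaging and Communications in Medicine (DICOM) format, contain additional metadata corresponding to ultrasound image parameters and medical exams. In this work, we leverage DICOM metadata from ultrasound images to help learn representations of the ultrasound image. We demonstrate that the proposed method outperforms approaches that do not use metadata across different downstream tasks.
Summary: Modern deep learning algorithms geared towards clinical adaptation rely on a significant amount of high-fidelity labeled data. Low-resource settings make acquiring high-fidelity data challenging, which becomes the bottleneck for developing artificial intelligence applications. Ultrasound images, stored in the Digital Imaging and Communications in Medicine (DICOM) format, carry additional metadata corresponding to ultrasound image parameters and medical exams. In this work, we leverage DICOM metadata from ultrasound images to help learn representations of the ultrasound image. We demonstrate that the proposed method outperforms approaches that do not use metadata across different downstream tasks.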

52. VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation
Ryan Hoque, Daniel Seita, Ashwin Balakrishna, Aditya Ganapathi, Ajay Kumar Tanwani, Nawid Jamali, Katsu Yamane, Soshi Iba, Ken Goldberg
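Abstract: Robotic fabric manipulation has applications in cloth and cable management, senior care, surgery and more. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We address this problem by extending the recently proposed Visual Foresight framework to learn fabric dynamics that can be efficiently reused to accomplish a variety of fabric manipulation tasks with a single goal-conditioned policy. We introduce VisuoSpatial Foresight (VSF), which extends prior work by learning visual dynamics on domain-randomized RGB images and depth maps simultaneously and entirely in simulation. We experimentally evaluate VSF on multi-step fabric smoothing and folding tasks, both in simulation and on the da Vinci Research Kit (dVRK) surgical robot, without any demonstrations at training or test time. Furthermore, we find that leveraging depth significantly improves performance on cloth manipulation tasks, and results suggest that leveraging RGBD data for video prediction and planning yields an 80% improvement in fabric folding success rate over pure RGB data. Supplementary material is available at this https URL.
Summary: Robotic fabric manipulation has applications in cloth and cable management, senior care, surgery and more. Existing fabric manipulation techniques, however, are designed for specific tasks, which makes it difficult to generalize across different but related tasks. We address this problem by extending the recently proposed Visual Foresight framework to learn fabric dynamics that can be efficiently reused to accomplish a variety of fabric manipulation tasks with a single goal-conditioned policy. We introduce VisuoSpatial Foresight (VSF), which extends prior work by learning visual dynamics on domain-randomized RGB images and depth maps simultaneously and entirely in simulation. We experimentally evaluate VSF on multi-step fabric smoothing and folding tasks, both in simulation and on the da Vinci Research Kit (dVRK) surgical robot, without any demonstrations at training or test time. Furthermore, we find that leveraging depth significantly improves performance on cloth manipulation tasks, and the results suggest that using RGBD data for video prediction and planning yields an 80% improvement in fabric folding success rate over pure RGB data. Supplementary material is available at this https URL.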

53. Microvasculature Segmentation and Inter-capillary Area Quantification of the Deep Vascular Complex using Transfer Learning
Julian Lo, Morgan Heisler, Vinicius Vanzan, Sonja Karst, Ivana Zadro Matovinovic, Sven Loncaric, Eduardo V. Navajas, Mirza Faisal Beg, Marinko V. Sarunic
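Abstract: Purpose: Optical Coherence Tomography Angiography (OCT-A) permits visualization of the changes to the retinal circulation due to diabetic retinopathy (DR), a microvascular complication of diabetes. We demonstrate accurate segmentation of the vascular morphology of the superficial capillary plexus and deep vascular complex (SCP and DVC) using a convolutional neural network (CNN) for quantitative analysis. Methods: Retinal OCT-A images with a 6×6 mm field of view (FOV) were acquired using a Zeiss PlexElite. Multiple-volume acquisition and averaging enhanced the vessel network contrast used for training the CNN. We used transfer learning from a CNN trained on 76 images from smaller FOVs of the SCP acquired using different OCT systems. Quantitative analysis of perfusion was performed on the automated vessel segmentations in representative patients with DR. Results: The automated segmentations of the OCT-A images maintained the hierarchical branching and lobular morphologies of the SCP and DVC, respectively. The network segmented the SCP with an accuracy of 0.8599 and a Dice index of 0.8618. For the DVC, the accuracy was 0.7986 and the Dice index was 0.8139. The inter-rater comparisons for the SCP had an accuracy and Dice index of 0.8300 and 0.6700, respectively, and 0.6874 and 0.7416 for the DVC. Conclusions: Transfer learning reduces the number of manually annotated images required, while producing high-quality automatic segmentations of the SCP and DVC. Using high-quality training data preserves the characteristic appearance of the capillary networks in each layer. Translational Relevance: Accurate retinal microvasculature segmentation with the CNN results in improved perfusion analysis in diabetic retinopathy.
Summary: Purpose: Optical coherence tomography angiography (OCT-A) permits visualization of the changes in the retinal circulation caused by diabetic retinopathy (DR), a microvascular complication of diabetes. We demonstrate accurate segmentation of the vascular morphology of the superficial capillary plexus and deep vascular complex (SCP and DVC) using a convolutional neural network (CNN) for quantitative analysis. Methods: Retinal OCT-A images with a 6×6 mm field of view (FOV) were acquired with a Zeiss PlexElite. Multiple-volume acquisition and averaging enhanced the vessel network contrast used for training the CNN. We used transfer learning from a CNN trained on 76 images from smaller FOVs of the SCP acquired with different OCT systems. Quantitative analysis of perfusion was performed on the automated vessel segmentations in representative patients with DR. Results: The automated segmentations of the OCT-A images preserved the hierarchical branching morphology of the SCP and the lobular morphology of the DVC. The network segmented the SCP with an accuracy of 0.8599 and a Dice index of 0.8618. For the DVC, the accuracy was 0.7986 and the Dice index was 0.8139. The inter-rater comparisons for the SCP had an accuracy and Dice index of 0.8300 and 0.6700, respectively, and 0.6874 and 0.7416 for the DVC. Conclusions: Transfer learning reduces the number of manually annotated images required while producing high-quality automatic segmentations of the SCP and DVC. Using high-quality training data preserves the characteristic appearance of the capillary networks in each layer. Translational Relevance: Accurate retinal microvasculature segmentation with the CNN results in improved perfusion analysis in diabetic retinopathy.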

54. Metric learning: cross-entropy vs. pairwise losses
Malik Boudiaf, Jérôme Rony, Imtiaz Masud Ziko, Eric Granger, Marco Pedersoli, Pablo Piantanida, Ismail Ben Ayed
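Abstract: Recently, substantial research efforts in Deep Metric Learning (DML) have focused on designing complex pairwise-distance losses and convoluted sample-mining and implementation strategies to ease optimization. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning, as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight, the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. These findings indicate that the cross-entropy represents a proxy for maximizing the mutual information, as pairwise losses do, without the need for complex sample-mining and optimization schemes. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our experiments on four standard DML benchmarks (CUB200, Cars-196, Stanford Online Products and In-Shop) strongly support our findings. We consistently obtained state-of-the-art results, outperforming many recent and complex DML methods.
Summary: Recently, substantial research efforts in deep metric learning (DML) have focused on designing complex pairwise-distance losses and convoluted sample-mining and implementation strategies to ease optimization. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, cross-entropy may seem unrelated and irrelevant to metric learning, as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two perspectives: one based on an explicit optimization insight, the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly show that cross-entropy is an upper bound on a new pairwise loss whose structure is similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing cross-entropy is actually equivalent to maximizing mutual information, to which we connect several well-known pairwise losses. These findings indicate that cross-entropy is a proxy for maximizing mutual information, as pairwise losses are, without the need for complex sample-mining and optimization schemes. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our experiments on four standard DML benchmarks (CUB200, Cars-196, Stanford Online Products and In-Shop) strongly support our findings. We consistently obtained state-of-the-art results, outperforming many recent and complex DML methods.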
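To make the contrast concrete, the sketch below computes both loss families on toy inputs:

- **Standard softmax cross-entropy**, computed with a numerically stable log-sum-exp, needs only each sample's own logits and label.
- **A classic margin-based contrastive loss** (one well-known pairwise loss, chosen here for illustration) needs all pairs in the batch. Its cost grows quadratically with batch size, which is where sample mining usually enters.

The paper's new pairwise bound and its mutual-information derivation are not reproduced here. The embeddings and `margin` value are hypothetical.

```python
import math

def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], using the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def contrastive_loss(embeddings, labels, margin=1.0):
    """Pairwise contrastive loss averaged over all pairs: same-class pairs are
    pulled together (squared distance), different-class pairs are pushed until
    they are at least `margin` apart."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to form a pair")
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d2 = sq_dist(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d2
            else:
                total += max(0.0, margin - math.sqrt(d2)) ** 2
            pairs += 1
    return total / pairs

print(softmax_cross_entropy([2.0, 0.5, -1.0], label=0))
print(contrastive_loss([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]], [0, 0, 1]))
```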

Note: The Chinese text is machine-translated!

WITH LOVE OF WORLD

【arXiv papers】 Computer Vision and Pattern Recognition 2020-03-23

Contents

Abstracts