Contents
2. A novel action recognition system for smart monitoring of elderly people using Action Pattern Image and Series CNN with transfer learning [PDF] Abstract
9. Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [PDF] Abstract
11. Stem-leaf segmentation and phenotypic trait extraction of maize shoots from three-dimensional point cloud [PDF] Abstract
12. Progressive Bilateral-Context Driven Model for Post-Processing Person Re-Identification [PDF] Abstract
14. A Light-Weight Object Detection Framework with FPA Module for Optical Remote Sensing Imagery [PDF] Abstract
19. DV-ConvNet: Fully Convolutional Deep Learning on Point Clouds with Dynamic Voxelization and 3D Group Convolution [PDF] Abstract
22. Frontier Detection and Reachability Analysis for Efficient 2D Graph-SLAM Based Active Exploration [PDF] Abstract
25. Unsupervised Wasserstein Distance Guided Domain Adaptation for 3D Multi-Domain Liver Segmentation [PDF] Abstract
26. MFL_COVID19: Quantifying Country-based Factors affecting Case Fatality Rate in Early Phase of COVID-19 Epidemic via Regularised Multi-task Feature Learning [PDF] Abstract
30. Efficient Pedestrian Detection in Top-View Fisheye Images Using Compositions of Perspective View Patches [PDF] Abstract
31. Approaches, Challenges, and Applications for Deep Visual Odometry: Toward to Complicated and Emerging Areas [PDF] Abstract
32. A Genetic Feature Selection Based Two-stream Neural Network for Anger Veracity Recognition [PDF] Abstract
34. Learning from Multiple Datasets with Heterogeneous and Partial Labels for Universal Lesion Detection in CT [PDF] Abstract
38. Generalization on the Enhancement of Layerwise Relevance Interpretability of Deep Neural Network [PDF] Abstract
40. Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks [PDF] Abstract
41. User-Guided Domain Adaptation for Rapid Annotation from User Interactions: A Study on Pathological Liver Segmentation [PDF] Abstract
49. Don't miss the Mismatch: Investigating the Objective Function Mismatch for Unsupervised Representation Learning [PDF] Abstract
50. End-to-End Deep Learning Model for Cardiac Cycle Synchronization from Multi-View Angiographic Sequences [PDF] Abstract
52. A New Screening Method for COVID-19 based on Ocular Feature Recognition by Machine Learning Tools [PDF] Abstract
55. The 2ST-UNet for Pneumothorax Segmentation in Chest X-Rays using ResNet34 as a Backbone for U-Net [PDF] Abstract
Abstracts
1. Improved Modeling of 3D Shapes with Multi-view Depth Maps [PDF] Back to Contents
Kamal Gupta, Susmija Jabbireddy, Ketul Shah, Abhinav Shrivastava, Matthias Zwicker
Abstract: We present a simple yet effective general-purpose framework for modeling 3D shapes by leveraging recent advances in 2D image generation using CNNs. Using just a single depth image of the object, we can output a dense multi-view depth map representation of 3D objects. Our simple encoder-decoder framework, comprised of a novel identity encoder and class-conditional viewpoint generator, generates 3D consistent depth maps. Our experimental results demonstrate the two-fold advantage of our approach. First, we can directly borrow architectures that work well in the 2D image domain to 3D. Second, we can effectively generate high-resolution 3D shapes with low computational memory. Our quantitative evaluations show that our method is superior to existing depth map methods for reconstructing and synthesizing 3D objects and is competitive with other representations, such as point clouds, voxel grids, and implicit functions.
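A minimal sketch (in PyTorch, not the authors' code) of the pipeline the abstract describes: an identity encoder compresses a single input depth image into a latent code, and a class-conditional viewpoint generator decodes that code, conditioned on a discrete viewpoint index, into the depth map seen from that viewpoint. All layer sizes, the 64x64 resolution, and the number of viewpoints are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IdentityEncoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(), nn.Linear(128 * 8 * 8, latent_dim))

    def forward(self, depth):          # depth: (B, 1, 64, 64)
        return self.net(depth)         # identity code: (B, latent_dim)

class ViewpointGenerator(nn.Module):
    def __init__(self, latent_dim=256, num_views=20):
        super().__init__()
        self.view_embed = nn.Embedding(num_views, 64)  # class-conditional input
        self.fc = nn.Linear(latent_dim + 64, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, code, view_idx): # view_idx: (B,) integer viewpoint class
        z = torch.cat([code, self.view_embed(view_idx)], dim=1)
        x = self.fc(z).view(-1, 128, 8, 8)
        return self.net(x)             # predicted depth map: (B, 1, 64, 64)

# Usage: one depth image in, a dense multi-view depth representation out.
enc, gen = IdentityEncoder(), ViewpointGenerator()
code = enc(torch.rand(1, 1, 64, 64))
views = [gen(code, torch.tensor([v])) for v in range(20)]
```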
2. A novel action recognition system for smart monitoring of elderly people using Action Pattern Image and Series CNN with transfer learning [PDF] Back to Contents
L. Aneesh Euprazia, K.K.Thyagharajan
Abstract: Falls among elderly people who live alone at home pose serious health risks. If the person is not attended to immediately, a fall may even be life-threatening. In this paper, a novel computer vision-based system for smart monitoring of elderly people using a Series Convolutional Neural Network (SCNN) with transfer learning is proposed. When a CNN is trained directly on video frames, it learns from all pixels, including the background pixels. Generally, the background in a video contributes nothing to identifying the action and can actually mislead the action classification. We therefore propose a novel action recognition system, and our contributions are 1) to generate more general action patterns that are not affected by illumination and background variations of the video sequences, eliminating the need for image augmentation in CNN training, 2) to design the SCNN architecture and enhance the feature extraction process to learn from large amounts of data, 3) to present the patterns learnt by the neurons in each layer and analyze how these neurons capture the action as the input pattern passes through them, and 4) to extend the capability of the trained SCNN to recognize fall actions using transfer learning.
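The abstract does not spell out how an Action Pattern Image is built, so the sketch below is only one plausible reading: accumulate thresholded frame differences over a clip so that static background pixels vanish and only motion survives, which matches the stated goal of patterns unaffected by illumination and background. The threshold and normalization are assumptions, not the authors' exact construction.

```python
import numpy as np

def action_pattern_image(frames, thresh=25):
    """frames: list of HxW uint8 grayscale frames from one action clip."""
    acc = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        acc += (diff > thresh)               # keep only moving pixels
    acc /= max(len(frames) - 1, 1)           # normalize to [0, 1]
    return (acc * 255).astype(np.uint8)      # pattern fed to the Series CNN

clip = [np.random.randint(0, 256, (120, 160), np.uint8) for _ in range(16)]
pattern = action_pattern_image(clip)
```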
3. User-assisted Video Reflection Removal [PDF] Back to Contents
Amgad Ahmed, Suhong Kim, Mohamed Elgharib, Mohamed Hefeeda
Abstract: Reflections in videos are obstructions that often occur when videos are taken behind reflective surfaces like glass. These reflections reduce the quality of such videos, lead to information loss and degrade the accuracy of many computer vision algorithms. A video containing reflections is a combination of background and reflection layers. Thus, reflection removal is equivalent to decomposing the video into two layers. This, however, is a challenging and ill-posed problem as there is an infinite number of valid decompositions. To address this problem, we propose a user-assisted method for video reflection removal. We rely on both spatial and temporal information and utilize sparse user hints to help improve separation. The key idea of the proposed method is to use motion cues to separate the background layer from the reflection layer with minimal user assistance. We show that user-assistance significantly improves the layer separation results. We implement and evaluate the proposed method through quantitative and qualitative results on real and synthetic videos. Our experiments show that the proposed method successfully removes reflection from video sequences, does not introduce visual distortions, and significantly outperforms the state-of-the-art reflection removal methods in the literature.
4. Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents [PDF] Back to Contents
Samyak Datta, Oleksandr Maksymets, Judy Hoffman, Stefan Lee, Dhruv Batra, Devi Parikh
Abstract: Recent work has presented embodied agents that can navigate to point-goal targets in novel indoor environments with near-perfect accuracy. However, these agents are equipped with idealized sensors for localization and take deterministic actions. This setting is practically sterile by comparison to the dirty reality of noisy sensors and actuations in the real world -- wheels can slip, motion sensors have error, actuations can rebound. In this work, we take a step towards this noisy reality, developing point-goal navigation agents that rely on visual estimates of egomotion under noisy action dynamics. We find these agents outperform naive adaptions of current point-goal agents to this setting as well as those incorporating classic localization baselines. Further, our model conceptually divides learning agent dynamics or odometry (where am I?) from task-specific navigation policy (where do I want to go?). This enables a seamless adaption to changing dynamics (a different robot or floor type) by simply re-calibrating the visual odometry model -- circumventing the expense of re-training of the navigation policy. Our agent was the runner-up in the PointNav track of CVPR 2020 Habitat Challenge.
5. A Review on Near Duplicate Detection of Images using Computer Vision Techniques [PDF] Back to Contents
K. K. Thyagharajan, G. Kalaiarasi
Abstract: Nowadays, digital content is widespread and easily redistributable, either lawfully or unlawfully. For example, after images are posted on the internet, other web users can modify them and then repost their versions, thereby generating near-duplicate images. The presence of near-duplicates critically affects the performance of search engines. Computer vision is concerned with the automatic extraction, analysis and understanding of useful information from digital images. The main application of computer vision is image understanding, which comprises several tasks such as feature extraction, object detection, object recognition, image cleaning, and image transformation. There is no proper survey in the literature on near-duplicate detection of images. In this paper, we review the state-of-the-art computer vision-based approaches and feature extraction methods for the detection of near-duplicate images. We also discuss the main challenges in this field and how other researchers have addressed those challenges. This review provides research directions for fellow researchers who are interested in working in this field.
6. Are Deep Neural Architectures Losing Information? Invertibility Is Indispensable [PDF] Back to Contents
Yang Liu, Zhenyue Qin, Saeed Anwar, Sabrina Caldwell, Tom Gedeon
Abstract: Ever since the advent of AlexNet, designing novel deep neural architectures for different tasks has consistently been a productive research direction. Despite the exceptional performance of various architectures in practice, we study a theoretical question: what is the condition for deep neural architectures to preserve all the information of the input data? Identifying the information-lossless condition for deep neural architectures is important, because tasks such as image restoration require keeping as much of the detailed information of the input data as possible. Using the definition of mutual information, we show that a deep neural architecture can preserve maximum details about the given data if and only if the architecture is invertible. We verify the advantages of our Invertible Restoring Autoencoder (IRAE) network by comparing it with competitive models on three perturbed image restoration tasks: image denoising, JPEG image decompression and image inpainting. Experimental results show that IRAE consistently outperforms non-invertible models, even while containing far fewer parameters. Thus, it may be worthwhile to try replacing standard components of deep neural architectures, such as residual blocks and ReLU, with their invertible counterparts. We believe our work provides a unique perspective and direction for future deep learning research.
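To make the invertibility argument concrete, here is a minimal additive coupling layer (RealNVP-style) in PyTorch; it illustrates the principle that a bijective layer loses no information about its input, and it is not the IRAE architecture itself.

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.t = nn.Sequential(nn.Linear(dim // 2, dim // 2), nn.Tanh(),
                               nn.Linear(dim // 2, dim // 2))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.t(x1)], dim=1)  # bijective by design

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.t(y1)], dim=1)  # exact reconstruction

layer = AdditiveCoupling(8)
x = torch.randn(4, 8)
assert torch.allclose(layer.inverse(layer(x)), x, atol=1e-6)  # nothing lost
```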
7. Improving colonoscopy lesion classification using semi-supervised deep learning [PDF] Back to Contents
Mayank Golhar, Taylor L. Bobrow, MirMilad Pourmousavi Khoshknab, Simran Jit, Saowanee Ngamruengphong, Nicholas J. Durr
Abstract: While data-driven approaches excel at many image analysis tasks, the performance of these approaches is often limited by a shortage of annotated data available for training. Recent work in semi-supervised learning has shown that meaningful representations of images can be obtained from training with large quantities of unlabeled data, and that these representations can improve the performance of supervised tasks. Here, we demonstrate that an unsupervised jigsaw learning task, in combination with supervised training, results in up to a 9.8% improvement in correctly classifying lesions in colonoscopy images when compared to a fully-supervised baseline. We additionally benchmark improvements in domain adaptation and out-of-distribution detection, and demonstrate that semi-supervised learning outperforms supervised learning in both cases. In colonoscopy applications, these metrics are important given the skill required for endoscopic assessment of lesions, the wide variety of endoscopy systems in use, and the homogeneity that is typical of labeled datasets.
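A hedged sketch of the jigsaw pretext task mentioned above: the image is cut into a 3x3 grid of tiles, the tiles are shuffled by one of a fixed set of permutations, and the network is trained to predict which permutation was applied. The tile grid and the size of the permutation set are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

RNG = np.random.default_rng(0)
PERMUTATIONS = [RNG.permutation(9) for _ in range(30)]  # fixed permutation set

def make_jigsaw_example(image):
    """image: HxWxC array with H, W divisible by 3. Returns (tiles, label)."""
    h, w = image.shape[0] // 3, image.shape[1] // 3
    tiles = [image[i * h:(i + 1) * h, j * w:(j + 1) * w]
             for i in range(3) for j in range(3)]
    label = RNG.integers(len(PERMUTATIONS))            # classification target
    shuffled = [tiles[k] for k in PERMUTATIONS[label]]
    return np.stack(shuffled), label                   # (9, h, w, C), int

tiles, label = make_jigsaw_example(np.zeros((96, 96, 3), np.uint8))
```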
8. Deepfake detection: humans vs. machines [PDF] Back to Contents
Pavel Korshunov, Sébastien Marcel
Abstract: Deepfake videos, in which a person's face is automatically swapped with the face of someone else, are becoming easier to generate with more realistic results. In response to the threat such manipulations pose to our trust in video evidence, several large datasets of deepfake videos and many methods to detect them have recently been proposed. However, it is still unclear how realistic deepfake videos are for an average person and whether the algorithms are significantly better than humans at detecting them. In this paper, we present a subjective study conducted in a crowdsourcing-like scenario, which systematically evaluates how hard it is for humans to see whether a video is deepfake or not. For the evaluation, we used 120 different videos (60 deepfakes and 60 originals) manually pre-selected from the Facebook deepfake database, which was provided in the Kaggle Deepfake Detection Challenge 2020. For each video, a simple question, "Is the face of the person in the video real or fake?", was answered on average by 19 naïve subjects. The results of the subjective evaluation were compared with the performance of two different state-of-the-art deepfake detection methods, based on Xception and EfficientNets (B4 variant) neural networks, which were pre-trained on two other large public databases: the Google subset from FaceForensics++ and the recent Celeb-DF dataset. The evaluation demonstrates that while human perception is very different from the perception of a machine, both are successfully, albeit in different ways, fooled by deepfakes. Specifically, algorithms struggle to detect those deepfake videos that human subjects found very easy to spot.
9. Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [PDF] Back to Contents
Qingyong Hu, Bo Yang, Sheikh Khalid, Wen Xiao, Niki Trigoni, Andrew Markham
Abstract: An essential prerequisite for unleashing the potential of supervised deep learning algorithms in the area of 3D scene understanding is the availability of large-scale and richly annotated datasets. However, publicly available datasets are either in relative small spatial scales or have limited semantic annotations due to the expensive cost of data acquisition and data annotation, which severely limits the development of fine-grained semantic understanding in the context of 3D point clouds. In this paper, we present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points, which is five times the number of labeled points than the existing largest point cloud dataset. Our dataset consists of large areas from two UK cities, covering about 6 $km^2$ of the city landscape. In the dataset, each 3D point is labeled as one of 13 semantic classes. We extensively evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results. In particular, we identify several key challenges towards urban-scale point cloud understanding. The dataset is available at this https URL.
10. Interpretable Deep Multimodal Image Super-Resolution [PDF] Back to Contents
Iman Marivani, Evaggelia Tsiligianni, Bruno Cornelis, Nikos Deligiannis
Abstract: Multimodal image super-resolution (SR) is the reconstruction of a high resolution image given a low-resolution observation with the aid of another image modality. While existing deep multimodal models do not incorporate domain knowledge about image SR, we present a multimodal deep network design that integrates coupled sparse priors and allows the effective fusion of information from another modality into the reconstruction process. Our method is inspired by a novel iterative algorithm for coupled convolutional sparse coding, resulting in an interpretable network by design. We apply our model to the super-resolution of near-infrared image guided by RGB images. Experimental results show that our model outperforms state-of-the-art methods.
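The "interpretable network by design" comes from unfolding an iterative sparse-coding algorithm into network layers. The sketch below unfolds plain ISTA for a single modality (learned ISTA / LISTA style); the paper's coupled, convolutional, guided variant extends this pattern. All dimensions and the layer count are assumptions.

```python
import torch
import torch.nn as nn

class UnfoldedISTA(nn.Module):
    def __init__(self, input_dim=64, code_dim=128, n_layers=5):
        super().__init__()
        self.We = nn.Linear(input_dim, code_dim, bias=False)   # encoder step
        self.S = nn.Linear(code_dim, code_dim, bias=False)     # recurrent step
        self.theta = nn.Parameter(torch.full((code_dim,), 0.1))
        self.n_layers = n_layers

    def soft_threshold(self, x):              # sparsity-inducing nonlinearity
        return torch.sign(x) * torch.relu(x.abs() - self.theta)

    def forward(self, y):
        z = torch.zeros(y.shape[0], self.We.out_features, device=y.device)
        for _ in range(self.n_layers):        # each iteration = one "layer"
            z = self.soft_threshold(self.We(y) + self.S(z))
        return z                              # sparse code for reconstruction

codes = UnfoldedISTA()(torch.randn(8, 64))
```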
11. Stem-leaf segmentation and phenotypic trait extraction of maize shoots from three-dimensional point cloud [PDF] Back to Contents
Chao Zhu, Teng Miao, Tongyu Xu, Tao Yang, Na Li
Abstract: Nowadays, there are many approaches to acquiring three-dimensional (3D) point clouds of maize plants. However, automatic stem-leaf segmentation of maize shoots from 3D point clouds remains challenging, especially for newly emerging leaves that are very close together and wrapped around each other during the seedling stage. To address this issue, we propose an automatic segmentation method consisting of three main steps: skeleton extraction, coarse segmentation based on the skeleton, and fine segmentation based on stem-leaf classification. The segmentation method was tested on 30 maize seedlings and compared with manually obtained ground truth. The mean precision, mean recall, mean micro F1 score and mean overall accuracy of our segmentation algorithm were 0.964, 0.966, 0.963 and 0.969, respectively. Using the segmentation results, two applications were also developed in this paper: phenotypic trait extraction and skeleton optimization. Six phenotypic parameters can be accurately and automatically measured, including plant height, crown diameter, stem height and diameter, and leaf width and length. Furthermore, the R2 values for the six phenotypic traits were all above 0.94. The results indicate that the proposed algorithm can automatically and precisely segment not only fully expanded leaves, but also new leaves that are wrapped together and close to one another. The proposed approach may play an important role in further maize research and applications, such as genotype-to-phenotype studies, geometric reconstruction and dynamic growth animation. We release the source code and test data at this https URL
12. Progressive Bilateral-Context Driven Model for Post-Processing Person Re-Identification [PDF] Back to Contents
Min Cao, Chen Chen, Hao Dou, Xiyuan Hu, Silong Peng, Arjan Kuijper
Abstract: Most existing person re-identification methods compute pairwise similarity by extracting robust visual features and learning the discriminative metric. Owing to visual ambiguities, these content-based methods that determine the pairwise relationship only based on the similarity between them, inevitably produce a suboptimal ranking list. Instead, the pairwise similarity can be estimated more accurately along the geodesic path of the underlying data manifold by exploring the rich contextual information of the sample. In this paper, we propose a lightweight post-processing person re-identification method in which the pairwise measure is determined by the relationship between the sample and the counterpart's context in an unsupervised way. We translate the point-to-point comparison into the bilateral point-to-set comparison. The sample's context is composed of its neighbor samples with two different definition ways: the first order context and the second order context, which are used to compute the pairwise similarity in sequence, resulting in a progressive post-processing model. The experiments on four large-scale person re-identification benchmark datasets indicate that (1) the proposed method can consistently achieve higher accuracies by serving as a post-processing procedure after the content-based person re-identification methods, showing its state-of-the-art results, (2) the proposed lightweight method only needs about 6 milliseconds for optimizing the ranking results of one sample, showing its high-efficiency. Code is available at: this https URL.
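One simple way to see the point-to-set idea: compare the neighborhoods of two samples instead of the samples themselves. In the hedged sketch below, the first-order context of a sample is its k-nearest-neighbor set and the refined pairwise score is the Jaccard overlap of the two sets; the paper's progressive first- and second-order scheme is more elaborate than this.

```python
import numpy as np

def knn_context(features, k=10):
    """features: (N, D) L2-normalized embeddings -> list of k-NN index sets."""
    sim = features @ features.T
    order = np.argsort(-sim, axis=1)
    return [set(order[i, 1:k + 1]) for i in range(len(features))]  # skip self

def contextual_score(i, j, contexts):
    inter = len(contexts[i] & contexts[j])
    union = len(contexts[i] | contexts[j])
    return inter / union if union else 0.0   # Jaccard similarity of contexts

feats = np.random.randn(100, 128)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
ctx = knn_context(feats)
score = contextual_score(0, 1, ctx)          # refined query-gallery score
```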
13. Uncertainty Inspired RGB-D Saliency Detection [PDF] Back to Contents
Jing Zhang, Deng-Ping Fan, Yuchao Dai, Saeed Anwar, Fatemeh Saleh, Sadegh Aliakbarian, Nick Barnes
Abstract: We propose the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection models treat this task as a point estimation problem by predicting a single saliency map following a deterministic learning pipeline. We argue that, however, the deterministic solution is relatively ill-posed. Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection which utilizes a latent variable to model the labeling variations. Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution. The generator model is an encoder-decoder saliency network. To infer the latent variable, we introduce two different solutions: i) a Conditional Variational Auto-encoder with an extra encoder to approximate the posterior distribution of the latent variable; and ii) an Alternating Back-Propagation technique, which directly samples the latent variable from the true posterior distribution. Qualitative and quantitative results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps. The source code is publicly available via our project page: this https URL.
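A minimal sketch of solution i) from the abstract: an extra encoder approximates the posterior over the latent variable from image (and annotation) features, a reparameterized sample of z feeds the saliency generator, and a KL term pulls the approximate posterior toward the N(0, I) prior. Feature and latent dimensions are illustrative assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

class PosteriorEncoder(nn.Module):
    def __init__(self, feat_dim=128, z_dim=8):
        super().__init__()
        self.mu = nn.Linear(feat_dim, z_dim)
        self.logvar = nn.Linear(feat_dim, z_dim)

    def forward(self, feat):                       # feat: fused image+GT code
        mu, logvar = self.mu(feat), self.logvar(feat)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z, mu, logvar

def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)), added to the saliency reconstruction loss
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)

z, mu, logvar = PosteriorEncoder()(torch.randn(4, 128))
kl = kl_divergence(mu, logvar).mean()
# At test time, z is sampled from the prior to produce stochastic predictions.
```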
14. A Light-Weight Object Detection Framework with FPA Module for Optical Remote Sensing Imagery [PDF] Back to Contents
Xi Gu, Lingbin Kong, Zhicheng Wang, Jie Li, Zhaohui Yu, Gang Wei
Abstract: With the development of remote sensing technology, acquiring remote sensing images has become increasingly easy, providing ample data for the task of detecting remote sensing objects. However, detecting objects quickly and accurately in many complex optical remote sensing images remains a challenging open problem. In this paper, we propose an efficient anchor-free object detector, CenterFPANet. To pursue speed, we use a lightweight backbone and introduce the asymmetric convolution block. To improve accuracy, we design the FPA module, which links the feature maps of different levels and introduces an attention mechanism to dynamically adjust the weight of each level of feature maps; this addresses the detection difficulty caused by the large size range of remote sensing objects. This strategy can improve the accuracy of remote sensing image object detection without reducing the detection speed. On the DOTA dataset, CenterFPANet achieves 64.00% mAP at 22.2 FPS, which is close to the accuracy of currently used anchor-based methods while being much faster. Compared with Faster RCNN, its mAP is 6.76% lower, but it is 60.87% faster. All in all, CenterFPANet achieves a balance between speed and accuracy in large-scale optical remote sensing object detection.
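The abstract describes the FPA module as linking feature maps of different levels and attending over them. The sketch below shows one simple realization of per-level attention weights (global pooling followed by a softmax over levels); this gating is our assumption, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LevelAttentionFusion(nn.Module):
    def __init__(self, channels=256, num_levels=3):
        super().__init__()
        self.gate = nn.Linear(channels * num_levels, num_levels)

    def forward(self, levels):
        # levels: list of (B, C, Hi, Wi) maps; resize all to the finest level
        size = levels[0].shape[-2:]
        maps = [F.interpolate(x, size=size, mode='nearest') for x in levels]
        pooled = torch.cat([x.mean(dim=(2, 3)) for x in maps], dim=1)
        w = torch.softmax(self.gate(pooled), dim=1)      # per-level weights
        return sum(w[:, i].view(-1, 1, 1, 1) * maps[i] for i in range(len(maps)))

p3, p4, p5 = (torch.randn(2, 256, s, s) for s in (64, 32, 16))
fused = LevelAttentionFusion()([p3, p4, p5])             # (2, 256, 64, 64)
```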
15. Visual Sentiment Analysis from Disaster Images in Social Media [PDF] Back to Contents
Syed Zohaib Hassan, Kashif Ahmad, Steven Hicks, Paal Halvorsen, Ala Al-Fuqaha, Nicola Conci, Michael Riegler
Abstract: The increasing popularity of social networks and users' tendency towards sharing their feelings, expressions, and opinions in text, visual, and audio content, have opened new opportunities and challenges in sentiment analysis. While sentiment analysis of text streams has been widely explored in literature, sentiment analysis from images and videos is relatively new. This article focuses on visual sentiment analysis in a societal important domain, namely disaster analysis in social media. To this aim, we propose a deep visual sentiment analyzer for disaster related images, covering different aspects of visual sentiment analysis starting from data collection, annotation, model selection, implementation, and evaluations. For data annotation, and analyzing peoples' sentiments towards natural disasters and associated images in social media, a crowd-sourcing study has been conducted with a large number of participants worldwide. The crowd-sourcing study resulted in a large-scale benchmark dataset with four different sets of annotations, each aiming a separate task. The presented analysis and the associated dataset will provide a baseline/benchmark for future research in the domain. We believe the proposed system can contribute toward more livable communities by helping different stakeholders, such as news broadcasters, humanitarian organizations, as well as the general public.
16. Real-Time Segmentation of Non-Rigid Surgical Tools based on Deep Learning and Tracking [PDF] Back to Contents
Luis C. García-Peraza-Herrera, Wenqi Li, Caspar Gruijthuijsen, Alain Devreker, George Attilakos, Jan Deprest, Emmanuel Vander Poorten, Danail Stoyanov, Tom Vercauteren, Sébastien Ourselin
Abstract: Real-time tool segmentation is an essential component in computer-assisted surgical systems. We propose a novel real-time automatic method based on Fully Convolutional Networks (FCN) and optical flow tracking. Our method exploits the ability of deep neural networks to produce accurate segmentations of highly deformable parts along with the high speed of optical flow. Furthermore, the pre-trained FCN can be fine-tuned on a small amount of medical images without the need to hand-craft features. We validated our method using existing and new benchmark datasets, covering both ex vivo and in vivo real clinical cases where different surgical instruments are employed. Two versions of the method are presented, non-real-time and real-time. The former, using only deep learning, achieves a balanced accuracy of 89.6% on a real clinical dataset, outperforming the (non-real-time) state of the art by 3.8% points. The latter, a combination of deep learning with optical flow tracking, yields an average balanced accuracy of 78.2% across all the validated datasets.
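A hedged sketch of how a per-keyframe FCN segmentation can be combined with optical flow tracking on the in-between frames: the last predicted mask is warped to the current frame with dense backward flow. OpenCV's Farneback flow stands in for whichever flow method the authors used; this illustrates the combination, not their implementation.

```python
import cv2
import numpy as np

def propagate_mask(prev_gray, curr_gray, prev_mask):
    # Flow from the current frame back to the previous one (backward flow),
    # so each current pixel knows where it came from in the previous frame.
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_mask, map_x, map_y, cv2.INTER_NEAREST)

prev = np.random.randint(0, 256, (120, 160), np.uint8)
curr = np.random.randint(0, 256, (120, 160), np.uint8)
mask = propagate_mask(prev, curr, (prev > 128).astype(np.uint8))
```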
摘要:实时工具分割是计算机辅助外科手术系统的一个重要组成部分。我们提出了一种基于全卷积网络(FCN)一种新颖的实时自动方法和光流跟踪。我们的方法利用深神经网络,以产生高度可变形的部件的准确分割与光流的高速沿的能力。此外,预训练FCN可以在无需手工工艺特征少量医用图像的微调。我们使用现有的和新的基准数据集,涵盖体外和在不同的手术器械采用体内真正的临床病例验证了我们的方法。该方法的两个版本呈现,非实时和实时性。前者,只用深度学习,达到89.6%,对一个真正的临床数据集的平衡精度,3.8个百分点跑赢艺术的(非实时)的状态。后者,深度学习与光流跟踪的组合,得到的78.2%在所有验证数据集的平均平衡精度。
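A minimal sketch of the fine-tuning step described above: a pre-trained FCN adapted to binary (background/tool) segmentation on a small annotated set. The backbone choice (torchvision's `fcn_resnet50`) and the hyper-parameters are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(pretrained=True)  # or weights="DEFAULT" on newer torchvision
# Replace the 21-class Pascal VOC head with a 2-class (background / tool) head.
model.classifier[4] = nn.Conv2d(512, 2, kernel_size=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, masks):
    """One optimisation step; images: (B, 3, H, W), masks: (B, H, W) long in {0, 1}."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)["out"]      # (B, 2, H, W)
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```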
17. Light Field View Synthesis via Aperture Flow and Propagation Confidence Map [PDF] 返回目录
Nan Meng, Kai Li, Jianzhuang Liu, Edmund Y. Lam
Abstract: This paper presents a learning-based approach to synthesize the view from an arbitrary camera position given a sparse set of images. A key challenge for this novel view synthesis arises from the reconstruction process, when the views from different input images may not be consistent due to obstruction in the light path. We overcome this by jointly modeling the epipolar property and occlusion in designing a convolutional neural network. We start by defining and computing an aperture flow map, which approximates the parallax and measures the pixel-wise shift between two views. While this relates to free-space rendering and can fail near the object boundaries, we further develop a propagation confidence map to address pixel occlusion in these challenging regions. The proposed method is evaluated on diverse real-world and synthetic light field scenes, and it shows outstanding performance over several state-of-the-art techniques.
摘要:本文提出了一种基于学习的方法来合成,从给定的稀疏组图像的任意摄像机位置的视图。对于这种新的视图合成的关键挑战源自于重建过程中,当从不同的输入图像的意见可能不一致由于在光路阻塞。我们通过联合建模设计卷积神经网络的核财产和闭塞克服这一点。我们通过定义和计算的孔流图,其近似于视差,并且测量两个视图之间的逐像素偏移开始。虽然这涉及到自由空间渲染和可能失败的对象边界附近,我们进一步发展的传播置信图到地址像素闭塞这些挑战性的区域。该方法是在不同的现实世界和合成光场的场景评价,它显示了国家的最先进的几种技术出色的表现。
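The aperture flow map described above is a per-pixel shift between two views. The sketch below warps one view toward another given such a flow field, using `torch.nn.functional.grid_sample`; in the paper the flow itself is predicted by the network, whereas here it is simply an input.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(image, flow):
    """image: (B, C, H, W); flow: (B, 2, H, W) pixel offsets (dx, dy)."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W)
    target = grid + flow                                       # shifted coordinates
    # Normalise to [-1, 1] as required by grid_sample.
    target_x = 2.0 * target[:, 0] / (w - 1) - 1.0
    target_y = 2.0 * target[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((target_x, target_y), dim=-1)    # (B, H, W, 2)
    return F.grid_sample(image, sample_grid, align_corners=True)
```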
18. Stochastic-YOLO: Efficient Probabilistic Object Detection under Dataset Shifts [PDF] 返回目录
Tiago Azevedo, René de Jong, Partha Maji
Abstract: In image classification tasks, the evaluation of models' robustness to increased dataset shifts with a probabilistic framework is very well studied. However, Object Detection (OD) tasks pose other challenges for uncertainty estimation and evaluation. For example, one needs to evaluate both the quality of the label uncertainty (i.e., what?) and spatial uncertainty (i.e., where?) for a given bounding box, but that evaluation cannot be performed with more traditional average precision metrics (e.g., mAP). In this paper, we adapt the well-established YOLOv3 architecture to generate uncertainty estimations by introducing stochasticity in the form of Monte Carlo Dropout (MC-Drop), and evaluate it across different levels of dataset shift. We call this novel architecture Stochastic-YOLO, and provide an efficient implementation to effectively reduce the burden of the MC-Drop sampling mechanism at inference time. Finally, we provide some sensitivity analyses, while arguing that Stochastic-YOLO is a sound approach that improves different components of uncertainty estimations, in particular spatial uncertainties.
摘要:在图像分类任务,模型的鲁棒性增加的数据集的变化与概率框架的评价是很好的研究。然而,目标检测(OD)任务带来不确定性估计和评价其他挑战。例如,一个需要评估标签不确定性无论是质量(即什么?)和空间的不确定性(即在哪里?)为给定的边界框,但评价不能与更传统的平均精度指标(如执行,地图)。在本文中,我们适应行之有效YOLOv3架构通过在蒙特卡洛降(MC-丢弃)的形式引入随机性,产生的不确定性的估计,并在不同的水平的数据集偏移评估它。我们把这种新的架构随机-YOLO,并提供有效的实施,有效地降低了MC-下降的负担,在推理时间采样机制。最后,我们提供了一些灵敏度分析,同时他们认为随机-YOLO是提高不确定性估计的不同的部件,在特定的空间不确定性声音的方法。
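Monte Carlo Dropout, the mechanism Stochastic-YOLO builds on, amounts to keeping dropout active at test time and aggregating several stochastic forward passes. A generic sketch for any torch model whose forward pass returns a tensor; the detector-specific fusion of bounding boxes across samples is omitted.

```python
import torch

def enable_mc_dropout(model):
    """Switch only the dropout layers to train mode, leaving e.g. BatchNorm in eval."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

@torch.no_grad()
def mc_predict(model, x, n_samples=10):
    """Predictive mean and variance over n stochastic passes."""
    enable_mc_dropout(model)
    preds = torch.stack([model(x) for _ in range(n_samples)])  # (N, B, ...)
    return preds.mean(dim=0), preds.var(dim=0)
```

The variance over samples is what provides the spatial uncertainty (for box coordinates) and label uncertainty (for class scores) discussed in the abstract.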
19. DV-ConvNet: Fully Convolutional Deep Learning on Point Clouds with Dynamic Voxelization and 3D Group Convolution [PDF] 返回目录
Zhaoyu Su, Pin Siang Tan, Junkang Chow, Jimmy Wu, Yehur Cheong, Yu-Hsing Wang
Abstract: 3D point cloud interpretation is a challenging task due to the randomness and sparsity of the component points. Many of the recently proposed methods, like PointNet and PointCNN, have been focusing on learning shape descriptions from point coordinates as point-wise input features, which usually involves complicated network architectures. In this work, we draw attention back to standard 3D convolutions for efficient 3D point cloud interpretation. Instead of converting the entire point cloud into voxel representations like the other volumetric methods, we voxelize the sub-portions of the point cloud only at necessary locations within each convolution layer on-the-fly, using our dynamic voxelization operation with self-adaptive voxelization resolution. In addition, we incorporate 3D group convolution into our dense convolution kernel implementation to further exploit the rotation-invariant features of point clouds. Benefiting from its simple fully-convolutional architecture, our network is able to run and converge at a considerably fast speed, while yielding on-par or even better performance compared with the state-of-the-art methods on several benchmark datasets.
摘要:三维点云的解释是一项具有挑战性的任务,由于该部件点的随机性和稀疏。许多最近提出的方法,如PointNet和PointCNN的一直专注于学习从点坐标形状描述为逐点输入功能,这通常涉及复杂的网络架构。在这项工作中,我们提请大家注意建立一个有效的三维点云解释回标准3D卷积。而不是整个点云转换为体素表示像其他容量法,我们体素化的点群的子部分仅在上即时每个卷积层内需要的位置,使用带自适应体素化我们的动态体素化操作解析度。此外,我们引入3D组卷积到我们的密集卷积核实施,进一步挖掘点云的旋转不变特征。从简单的完全卷积架构中获益,我们的网络能够同时在标准杆或更好的性能收益率与几个基准数据集的国家的最先进的方法相比运行,并以相当快的速度收敛。
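The basic voxelization step can be made concrete with a few lines of numpy: map each point to a voxel index at a given resolution and keep only the occupied voxels, which is the essence of voxelizing "only at necessary locations". The on-the-fly, per-layer self-adaptive resolution of DV-ConvNet is not reproduced in this sketch.

```python
import numpy as np

def voxelize(points, voxel_size):
    """points: (N, 3) float array. Returns the occupied voxel coordinates and,
    for each point, the index of the voxel it falls into."""
    voxel_coords = np.floor(points / voxel_size).astype(np.int64)   # (N, 3)
    occupied, inverse = np.unique(voxel_coords, axis=0, return_inverse=True)
    return occupied, inverse   # inverse[i] is the voxel id of point i

pts = np.random.rand(1000, 3)
voxels, point_to_voxel = voxelize(pts, voxel_size=0.1)
print(voxels.shape, point_to_voxel.shape)   # e.g. (~630, 3) (1000,)
```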
20. Quantifying Explainability of Saliency Methods in Deep Neural Networks [PDF] 返回目录
Erico Tjoa, Cuntai Guan
Abstract: One way to achieve eXplainable artificial intelligence (XAI) is through the use of post-hoc analysis methods. In particular, methods that generate heatmaps have been used to explain black-box models, such as deep neural networks. In some cases, heatmaps are appealing due to the intuitive and visual ways of understanding them. However, quantitative analysis that demonstrates the actual potential of heatmaps has been lacking, and comparison between different methods is not standardized either. In this paper, we introduce a synthetic dataset that can be generated ad hoc along with ground-truth heatmaps for better quantitative assessment. Each sample is an image of a cell with easily distinguishable features, facilitating a more transparent assessment of different XAI methods. Comparisons and recommendations are made, and shortcomings are clarified, along with suggestions for future research directions to handle the finer details of select post-hoc analysis methods.
摘要:一层,实现可解释的人工智能(XAI)的方法是通过采用事后分析方法。特别是,产生热图的方法已被用来解释黑盒模型,如深层神经网络。在某些情况下,热图的吸引力,由于直观形象的方式来理解他们。但是,一直缺乏演示热图的实际潜力定量分析,不同方法之间的比较不规范,以及。在本文中,我们将介绍可与地面实况热图获得更好的定量评估一起产生即席在内的综合数据。每个样本的数据与容易区分的特征的细胞,促进不同XAI方法的更透明的评估的图像。比较和推荐由,缺点与未来的研究方向来处理的选择事后分析方法更精细的细节建议澄清一起。
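A toy version of the idea: generate a synthetic "cell" image together with its ground-truth relevance heatmap, so a saliency method's output can be scored against known-relevant pixels. The shapes and feature choices here are illustrative assumptions, not the paper's exact data generator.

```python
import numpy as np

def make_cell_sample(size=64, radius=10, rng=None):
    """Return (image, ground-truth heatmap): a noisy disc whose pixels are
    exactly the ones a faithful saliency method should highlight."""
    rng = rng or np.random.default_rng()
    cy, cx = rng.integers(radius, size - radius, size=2)
    ys, xs = np.mgrid[0:size, 0:size]
    dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    cell = (dist < radius).astype(np.float32)            # the discriminative blob
    image = cell + 0.05 * rng.standard_normal((size, size)).astype(np.float32)
    return image, cell

img, heatmap_gt = make_cell_sample()
```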
21. Benchmarking off-the-shelf statistical shape modeling tools in clinical applications [PDF] 返回目录
Anupama Goparaju, Alexandre Bone, Nan Hu, Heath B. Henninger, Andrew E. Anderson, Stanley Durrleman, Matthijs Jacxsens, Alan Morris, Ibolya Csecs, Nassir Marrouche, Shireen Y. Elhabian
Abstract: Statistical shape modeling (SSM) is widely used in biology and medicine as a new generation of morphometric approaches for the quantitative analysis of anatomical shapes. Technological advancements of in vivo imaging have led to the development of open-source computational tools that automate the modeling of anatomical shapes and their population-level variability. However, little work has been done on the evaluation and validation of such tools in clinical applications that rely on morphometric quantifications (e.g., implant design and lesion screening). Here, we systematically assess the outcome of widely used, state-of-the-art SSM tools, namely ShapeWorks, Deformetrica, and SPHARM-PDM. We use both quantitative and qualitative metrics to evaluate shape models from different tools. We propose validation frameworks for anatomical landmark/measurement inference and lesion screening. We also present a lesion screening method to objectively characterize subtle abnormal shape changes with respect to learned population-level statistics of controls. Results demonstrate that SSM tools display different levels of consistencies, where ShapeWorks and Deformetrica models are more consistent compared to models from SPHARM-PDM due to the groupwise approach of estimating surface correspondences. Furthermore, ShapeWorks and Deformetrica shape models are found to capture clinically relevant population-level variability compared to SPHARM-PDM models.
摘要:统计形状建模(SSM)被广泛应用于生物学和医学作为新一代的解剖学形状的定量分析方法形态。的技术进步体内成像导致可自动化解剖形状和它们的人口水平变化的造型开源计算工具的发展。然而,很少工作已经在依靠形态的量化(例如,植入物的设计和病变筛查中的)临床应用评价和这样的工具验证完成。在这里,我们系统地评估的广泛应用,国家的最先进的SSM工具,即ShapeWorks,Deformetrica和SPHARM-PDM的结果。我们采用定量和定性指标,从不同的工具来评估形状模型。我们提出解剖标志/测量推断和病变筛查中的验证框架。我们还提出了一种病变筛查方法客观特征分析细微的异常形状的变化相对于控件了解到人口水平的统计数据。结果表明,SSM工具由于估计表面对应的成组的方式显示不同水平的一致性,其中,相比于从SPHARM-PDM模型ShapeWorks和Deformetrica模型是更一致的。此外,ShapeWorks和Deformetrica形状模型被发现捕获临床相关群体水平的变化相比,SPHARM-PDM模型。
22. Frontier Detection and Reachability Analysis for Efficient 2D Graph-SLAM Based Active Exploration [PDF] 返回目录
Zezhou Sun, Banghe Wu, Cheng-Zhong Xu, Sanjay E. Sarma, Jian Yang, Hui Kong
Abstract: We propose an integrated approach to active exploration that exploits the Cartographer method as the base SLAM module for submap creation and performs efficient frontier detection in the geometrically co-aligned submaps induced by graph optimization. We also carry out analysis on the reachability of frontiers and their clusters to ensure that a detected frontier can actually be reached by the robot. Our method is tested on a mobile robot in a real indoor scene to demonstrate the effectiveness and efficiency of our approach.
摘要:通过利用制图方法作为子图创建基本SLAM模块并且在由图形优化引起的几何联合对齐子图进行有效前沿检测提出了一种综合方法积极探索。我们还开展边境及其集群的可达性分析,以确保检测的前沿可以通过机器人到达。我们的方法是在真正的室内场景移动机器人证明了该方法的有效性和效率测试。
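Frontier detection itself is easy to state on an occupancy grid: a frontier cell is a free cell adjacent to at least one unknown cell. The numpy sketch below shows that concept only; the paper's submap-level, graph-optimization-aware machinery and reachability analysis are not shown.

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, -1

def detect_frontiers(grid):
    """grid: (H, W) int array of FREE / OCCUPIED / UNKNOWN cells.
    Returns a boolean mask of frontier cells."""
    unknown = (grid == UNKNOWN)
    # Does any 4-neighbour contain an unknown cell?
    near_unknown = np.zeros_like(unknown)
    near_unknown[1:, :] |= unknown[:-1, :]
    near_unknown[:-1, :] |= unknown[1:, :]
    near_unknown[:, 1:] |= unknown[:, :-1]
    near_unknown[:, :-1] |= unknown[:, 1:]
    return (grid == FREE) & near_unknown

grid = np.full((50, 50), UNKNOWN)
grid[20:30, 20:30] = FREE
print(np.argwhere(detect_frontiers(grid)))  # the ring of free cells bordering unknown
```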
23. Channel-wise Alignment for Adaptive Object Detection [PDF] 返回目录
Hang Yang, Shan Jiang, Xinge Zhu, Mingyang Huang, Zhiqiang Shen, Chunxiao Liu, Jianping Shi
Abstract: Generic object detection has been immensely promoted by the development of deep convolutional neural networks in the past decade. However, under domain shift, changes in weather, illumination, etc. often cause a domain gap, and thus performance drops substantially when detecting objects from one domain to another. Existing methods for this task usually draw attention to high-level alignment based on the whole image or the object of interest, which naturally cannot fully utilize the fine-grained channel information. In this paper, we realize adaptation from a thoroughly different perspective, i.e., channel-wise alignment. Motivated by the finding that each channel focuses on a specific pattern (e.g., on special semantic regions, such as car), we aim to align the distributions of the source and target domain at the channel level, which is finer for integration between discrepant domains. Our method mainly consists of self channel-wise and cross channel-wise alignment. These two parts explore the inner-relation and cross-relation of attention regions implicitly from the view of channels. Furthermore, we propose an RPN domain classifier module to obtain a domain-invariant RPN network. Extensive experiments show that the proposed method performs notably better than existing methods, with about 5% improvement under various domain-shift settings. Experiments on a different task (e.g., instance segmentation) also demonstrate its good scalability.
摘要:通用对象检测已经通过深卷积神经网络在过去十年的发展已经极大地提升。然而,在域转移情况下,在天气,照明等的变化,经常导致域间隙,因此从一个域到另一个检测对象时,性能大幅度下降。这项任务的现有方法通常基于整个图像或感兴趣的对象,自然也不能充分利用细粒度的频道信息的高层次排列提请注意。在本文中,我们实现从不同彻底的角度,即,信道逐对准适应。通过每个信道集中在一个特定的图案的发现动机(例如,在特殊的语义的地区,如汽车),我们的目标对准源和目标域的分布上的信道电平,这是有差异的结构域之间的融合细。我们的方法主要由自身信道向和交叉通道明智对准的。这两部分从隐式信道的视图探索注意区域的内关系和交叉关系。更进一步,我们也提出了一个RPN域分类模块获取域名不变RPN网络。大量的实验表明,该方法执行显着高于更好地在各种领域移设置约5%的改善现有的方法。在不同的任务(例如,例如分割)的实验也证明了其良好的可扩展性。
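To make the core idea concrete: align source and target feature distributions per channel rather than per image. The sketch below matches per-channel means and variances with an L2 penalty; this is a simple stand-in for the paper's self/cross channel-wise alignment modules and RPN domain classifier, which are more involved.

```python
import torch

def channelwise_alignment_loss(feat_src, feat_tgt):
    """feat_*: (B, C, H, W) feature maps from the same backbone layer,
    one batch per domain. Penalises per-channel statistic mismatch."""
    mu_s = feat_src.mean(dim=(0, 2, 3))    # (C,) channel means
    mu_t = feat_tgt.mean(dim=(0, 2, 3))
    var_s = feat_src.var(dim=(0, 2, 3))    # (C,) channel variances
    var_t = feat_tgt.var(dim=(0, 2, 3))
    return ((mu_s - mu_t) ** 2).mean() + ((var_s - var_t) ** 2).mean()
```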
24. 3D Room Layout Estimation Beyond the Manhattan World Assumption [PDF] 返回目录
Dongho Choi
Abstract: Predicting the 3D room layout from a single image is a challenging task with many applications. In this paper, we propose a new training and post-processing method for 3D room layout estimation, built on a recent state-of-the-art 3D room layout estimation model. Experimental results show our method outperforms state-of-the-art approaches by a large margin in predicting visible room layout. Our method obtained 3rd place in the 2020 Holistic Scene Structures for 3D Vision Workshop.
摘要:从单幅图像的3D预测房间布局与许多应用的具有挑战性的任务。在本文中,我们提出了3D房间布局估计一个新的培训和后处理方法,建立在最近的国家的最先进的3D房间布局估算模型。实验结果表明,我们的方法优于国家的最先进的以大比分预测可见房间的布局方法。我们的方法获得了第3名,2020年整体现场搭建三维愿景研讨会。
25. Unsupervised Wasserstein Distance Guided Domain Adaptation for 3D Multi-Domain Liver Segmentation [PDF] 返回目录
Chenyu You, Junlin Yang, Julius Chapiro, James S. Duncan
Abstract: Deep neural networks have shown exceptional learning capability and generalizability in the source domain when massive labeled data is provided. However, the well-trained models often fail in the target domain due to the domain shift. Unsupervised domain adaptation aims to improve network performance when applying robust models trained on medical images from source domains to a new target domain. In this work, we present an approach based on the Wasserstein distance guided disentangled representation to achieve 3D multi-domain liver segmentation. Concretely, we embed images onto a shared content space, capturing shared feature-level information across domains, and domain-specific appearance spaces. Existing mutual-information-based representation learning approaches often fail to capture complete representations in multi-domain medical imaging tasks. To mitigate these issues, we utilize the Wasserstein distance to learn more complete representations, and introduce a content discriminator to further facilitate the representation disentanglement. Experiments demonstrate that our method outperforms the state-of-the-art on the multi-modality liver segmentation task.
摘要:深层神经网络已经表明,当提供大量的标记数据源域非凡的学习能力和普遍性。然而,训练有素的模特经常在目标域失败的原因域转变。无监督领域适应性旨在运用训练有素的从源域医学图像到一个新的目标域稳健的模型时,提高网络性能。在这项工作中,我们提出基于所述瓦瑟斯坦距离引导解缠结表示实现3D多域肝脏分割的方法。具体而言,我们嵌入图像投影到一个共享的内容空间捕获跨越域和域特异性外观空间共享特征级信息。现有的基于信息相互学习的表示方法往往不捕获在多域医学成像任务的完整表述。为了缓解这些问题,我们利用Wasserstein的距离,以了解更多完整的表示,并引入了内容鉴别,进一步方便了代表性的解开。实验表明,我们的方法优于在多模态肝脏分割任务的国家的最先进的。
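Wasserstein-guided alignment typically reduces to a critic that scores features from the two domains, with the encoder trained to shrink the score gap. A bare-bones adversarial sketch under assumed architectures and feature sizes; the usual Lipschitz constraint (gradient penalty or weight clipping) and the paper's content discriminator are omitted.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))

def wasserstein_gap(feat_src, feat_tgt):
    """feat_*: (B, 256) pooled encoder features from each domain."""
    return critic(feat_src).mean() - critic(feat_tgt).mean()

# Adversarial schedule (per iteration):
#   critic_loss  = -wasserstein_gap(fs, ft)            # critic estimates the distance
#   encoder_loss =  seg_loss + lam * wasserstein_gap(fs, ft)  # encoder aligns domains
```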
26. MFL_COVID19: Quantifying Country-based Factors affecting Case Fatality Rate in Early Phase of COVID-19 Epidemic via Regularised Multi-task Feature Learning [PDF] 返回目录
Po Yang, Jun Qi, Xulong Wang, Yun Yang
Abstract: The recent outbreak of COVID-19 has led to rapid global spread. Many countries have implemented timely, intensive suppression to minimize infections, but this has resulted in a high case fatality rate (CFR) due to critical demand on health resources. Other country-based factors, such as sociocultural issues and an ageing population, have also influenced the practical effectiveness of interventions on mortality in the early phase. Better understanding the relationship between these factors and COVID-19 CFR across different countries is of primary importance in preparing for a potential second wave of COVID-19 infections. In this paper, we propose a novel regularized multi-task learning based factor analysis approach for quantifying country-based factors affecting CFR in the early phase of the COVID-19 epidemic. We formulate the prediction of CFR progression as an ML regression problem with observed CFR and other country-based factors. In this formulation, all CFR-related factors were categorized into 6 sectors with 27 indicators. We propose a hybrid feature selection method combining filter, wrapper, and tree-based models to calibrate initial factors for a preliminary feature interaction. We then adopt two typical single-task models (Ridge and Lasso regression) and one state-of-the-art MTFL method (fused sparse group lasso) in our formulation. The fused sparse group lasso (FSGL) method allows the simultaneous selection of a common set of country-based factors for multiple time points of the COVID-19 epidemic and also enables incorporating temporal smoothness of each factor over the whole early-phase period. Finally, we propose a novel temporal voting feature selection scheme to balance the weight instability of multiple factors in our MTFL model.
摘要:COVID-19的最近爆发导致世界各地迅速蔓延全球。许多国家已经实施了密集的及时制止,以尽量减少感染,但造成高病死率(CFR),由于卫生资源的关键需求。其他国家为基础的因素如社会文化问题,人口老龄化等,也影响了采取干预措施,以提高道德早期阶段的实际效果。为了更好地了解在不同的国家,这些因素与COVID-19 CFR的关系是最重要的,为COVID-19感染的潜在的第二次浪潮做好准备。在本文中,我们提出了一个新的规则多任务学习用于定量COVID-19疫情初期影响CFR国家为基础的因素基于因子分析的方法。我们制定CFR发展的预测作为ML回归问题与观察到的基于国家,CFR等因素的影响。在该制剂中,所有的CFR相关的因素被分为6个扇区与27个指标。我们提出了一个混合特征选择相结合的方法过滤,包装材料和基于树的模型校准初始因素进行初步功能的交互。然后我们在我们的制剂中采用了两种典型的单任务模式(岭和套索回归)和一个状态的最先进的方法MTFL(稠合稀疏组套索)。将融合的稀疏组套索(FSGL)方法允许一组通用的用于COVID-19流行病的多个时间点的基于国家的因素的同时选择并且还使得能够结合每个因素的时间平滑度在整个早期阶段期间。最后,我们提出了一个新颖的时空投票功能选择方案,以平衡的多种因素的权重的不稳定性在我们MTFL模型。
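The fused sparse group lasso penalty can be written out for a weight matrix W of shape (T, F): T time points of the epidemic, F country-based factors. The three terms encourage element-wise sparsity, shared factor selection across time points (groups = columns of W), and temporal smoothness. This is the generic FSGL formulation, with placeholder lambda values; it is not claimed to match the paper's exact regularization weights.

```python
import torch

def fsgl_penalty(W, lam1=0.1, lam2=0.1, lam3=0.1):
    """W: (T, F) task weights, one row per time point."""
    l1 = W.abs().sum()                           # element-wise sparsity
    group = torch.linalg.norm(W, dim=0).sum()    # L2,1 norm: one L2 norm per factor
    fused = (W[1:] - W[:-1]).abs().sum()         # smoothness across adjacent time points
    return lam1 * l1 + lam2 * group + lam3 * fused
```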
27. TRANSPR: Transparency Ray-Accumulating Neural 3D Scene Point Renderer [PDF] 返回目录
Maria Kolos, Artem Sevastopolsky, Victor Lempitsky
Abstract: We propose and evaluate a neural point-based graphics method that can model semi-transparent scene parts. Similarly to its predecessor pipeline, ours uses point clouds to model proxy geometry, and augments each point with a neural descriptor. Additionally, a learnable transparency value is introduced in our approach for each point. Our neural rendering procedure consists of two steps. Firstly, the point cloud is rasterized using ray grouping into a multi-channel image. This is followed by the neural rendering step that "translates" the rasterized image into an RGB output using a learnable convolutional network. New scenes can be modeled using gradient-based optimization of neural descriptors and of the rendering network. We show that novel views of semi-transparent point cloud scenes can be generated after training with our approach. Our experiments demonstrate the benefit of introducing semi-transparency into the neural point-based modeling for a range of scenes with semi-transparent parts.
摘要:本文提出并评估神经基于点的图形方法可以模拟半透明的场景部分。同样,它的前身管道,我们采用分布式云代理几何模型,并且增加每个点与神经描述。此外,一个可以学习的透明度值是在我们的每一点办法出台。我们的神经渲染过程由两个步骤。首先,点云使用光线分组到多通道图像光栅化。这之后是由神经渲染步骤即“翻译”光栅化图像划分成使用可学习卷积网络的RGB输出。新的场景可以用神经描述和渲染网络的基于梯度的优化建模。我们发现可以用我们的方法训练后产生半透明点云场景的那本小说的看法。我们的实验证明引入半透明度到神经基于点的建模为一系列具有半透明部分的场景的益处。
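The transparency ray accumulation at the heart of the renderer is front-to-back alpha compositing of the points gathered along one ray. A sketch with assumed inputs: per-point neural descriptors and learned transparencies, already sorted by depth along the ray.

```python
import torch

def accumulate_ray(descriptors, alphas):
    """descriptors: (K, D) point descriptors sorted near-to-far along the ray;
    alphas: (K,) learned transparencies in [0, 1].
    Returns the composited D-channel value for the ray."""
    out = torch.zeros(descriptors.shape[1])
    transmittance = 1.0
    for desc, a in zip(descriptors, alphas):
        out = out + transmittance * a * desc
        transmittance = transmittance * (1.0 - a)
    return out   # any remaining transmittance could blend in a background value
```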
28. Deep Longitudinal Modeling of Infant Cortical Surfaces [PDF] 返回目录
Peirong Liu, Zhengwang Wu, Gang Li, Pew-Thian Yap, Dinggang Shen
Abstract: Charting cortical growth trajectories is of paramount importance for understanding brain development. However, such analysis necessitates the collection of longitudinal data, which can be challenging due to subject dropouts and failed scans. In this paper, we will introduce a method for longitudinal prediction of cortical surfaces using a spatial graph convolutional neural network (GCNN), which extends conventional CNNs from Euclidean to curved manifolds. The proposed method is designed to model the cortical growth trajectories and jointly predict inner and outer cortical surfaces at multiple time points. Adopting a binary flag in loss calculation to deal with missing data, we fully utilize all available cortical surfaces for training our deep learning model, without requiring a complete collection of longitudinal data. Predicting the surfaces directly allows cortical attributes such as cortical thickness, curvature, and convexity to be computed for subsequent analysis. We will demonstrate with experimental results that our method is capable of capturing the nonlinearity of spatiotemporal cortical growth patterns and can predict cortical surfaces with improved accuracy.
摘要:在排行榜皮质增长轨迹是对理解大脑发育至关重要。然而,这样的分析必要的纵向数据,其可以是具有挑战性的,由于主体脱落和失败的扫描的集合。在本文中,我们将介绍使用空间图形的卷积神经网络(GCNN),其延伸从欧几里德到弯曲歧管常规细胞神经网络皮质表面的纵向预测的方法。所提出的方法被设计成模拟皮质增长轨迹和共同预测在多个时间点内和外皮层的表面。采用在损失计算一个二进制符号来处理丢失的数据,我们充分利用所有可用的皮质表面的训练,我们深切的学习模式,而不需要纵向数据的完整集合。预测所述表面允许直接皮质属性,诸如皮质厚度,曲率,和凸要计算用于随后的分析。我们将与实验结果表明,我们的方法是能够捕获的时空皮质增长模式的非线性的和能够更精确地预测皮质表面。
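The "binary flag in loss calculation" amounts to masking out time points whose scans are missing, so every subject contributes whatever surfaces it has. A sketch with assumed tensor shapes (B subjects, T time points, V vertices, 3 coordinates); the GCNN itself is not shown.

```python
import torch

def masked_surface_loss(pred, target, available):
    """pred, target: (B, T, V, 3) predicted / ground-truth vertex positions;
    available: (B, T) binary flags, 1 where a scan exists."""
    per_tp = ((pred - target) ** 2).mean(dim=(2, 3))    # (B, T) per-time-point MSE
    masked = per_tp * available                          # zero out missing scans
    return masked.sum() / available.sum().clamp(min=1)   # average over observed ones
```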
29. DeePSD: Automatic Deep Skinning And Pose Space Deformation For 3D Garment Animation [PDF] 返回目录
Hugo Bertiche, Meysam Madadi, Sergio Escalera
Abstract: We present a novel approach to the garment animation problem through deep learning. Previous approaches propose learning a single model for one or a few garment types, or alternatively, extend a human body model to represent multiple garment types. These works are not able to generalize to the arbitrarily complex outfits we commonly find in real life. Our proposed methodology is able to work with any topology, complexity, and number of cloth layers. Because of this, it is also able to generalize to completely unseen outfits with complex details. We design our model such that it can be efficiently deployed on portable devices and achieve real-time performance. Finally, we present an approach for unsupervised learning.
摘要:本文提出了一种新的方法来通过深入学习服装动画问题。先前方法提议学习一个或几个服装类型的单一模式,或可替代地,延伸人体模型来表示多个服装类型。这些作品不能推广到任意复杂的服装,我们在现实生活中经常发现。我们提出的方法能够在任何拓扑,复杂性和布多层的工作。正因为如此,它也能够推广到复杂的细节完全看不见的服装。我们在设计模型,使得它可以有效地部署在移动设备上,实现实时性能。最后,我们提出了监督学习的方法。
30. Efficient Pedestrian Detection in Top-View Fisheye Images Using Compositions of Perspective View Patches [PDF] 返回目录
Sheng-Ho Chiang, Tsaipei Wang
Abstract: Pedestrian detection in images is a topic that has been studied extensively, but existing detectors designed for perspective images do not perform as successfully on images taken with top-view fisheye cameras, mainly due to the orientation variation of people in such images. In our proposed approach, several perspective views are generated from a fisheye image and then concatenated to form a composite image. As pedestrians in this composite image are more likely to be upright, existing detectors designed and trained for perspective images can be applied directly without additional training. We also describe a new method of mapping detection bounding boxes from the perspective views to the fisheye frame. The detection performance on several public datasets compare favorably with state-of-the-art results.
摘要:在图像行人检测是已被广泛研究,但设计的透视图像现有检测系统不与俯视鱼眼相机拍摄的,这主要是由于人在这样的图像中的方向变化图像作为成功执行的一个话题。在我们提出的方法中,几个透视图是从鱼眼图像生成,然后连接以形成合成图像。作为该合成图像中的行人更可能是直立,设计并训练透视图像现有的检测器可以直接无需额外的训练应用。我们还描述了映射检测边界从透视图,以鱼眼镜头框架箱的新方法。在几个公共数据集上的检测性能与国家的最先进的效果相媲美。
31. Approaches, Challenges, and Applications for Deep Visual Odometry: Toward to Complicated and Emerging Areas [PDF] 返回目录
Ke Wang, Sai Ma, Junlan Chen, Fan Ren
Abstract: Visual odometry (VO) is a prevalent way to deal with the relative localization problem; it is becoming increasingly mature and accurate, but tends to be fragile under challenging environments. Compared with classical geometry-based methods, deep learning-based methods can automatically learn effective and robust representations, such as depth, optical flow, features, and ego-motion, from data without explicit computation. Nevertheless, a thorough review of the recent advances of deep learning-based VO (Deep VO) is still lacking. Therefore, this paper aims to gain a deep insight into how deep learning can profit and optimize VO systems. We first screen out a number of qualifications, including accuracy, efficiency, scalability, dynamicity, practicability, and extensibility, and employ them as criteria. Then, using the offered criteria as uniform measurements, we evaluate and discuss in detail how deep learning improves the performance of VO from the aspects of depth estimation, feature extraction and matching, and pose estimation. We also summarize the complicated and emerging areas of Deep VO, such as mobile robots, medical robots, augmented reality, and virtual reality. Through literature decomposition, analysis, and comparison, we finally put forward a number of open issues and raise some future research directions in this field.
摘要:视觉测距(VO)是处理相对定位问题,这是日趋成熟和准确的一种普遍方式,但它往往具有挑战性的环境下是脆弱的。与传统的基于几何的方法相比,基于深学习方法,能够自动学习有效和有力的陈述,如深度,光流,特征,自我运动,等等,从数据中没有明确的计算。尽管如此,仍然缺乏近期为基础的学习深VO(深VO)的进展进行了全面审查。因此,本文旨在获得关于深度学习如何盈利和优化VO体系的深刻洞察。我们首先筛选出一批资质,包括精度,效率,可扩展性,动态性,实用性和可扩展性,并聘请他们为准则。然后,使用所提供的标准作为统一的测量,我们详细的评估,并讨论深度学习如何提高VO从深度估计,特征提取和匹配,姿态估计等方面的性能。我们也总结了复杂和深VO的新兴领域,如移动机器人,医疗机器人,增强现实和虚拟现实等通过文献分解,分析和比较,我们最终提出了一些开放性的问题,并提出了一些未来的研究方向在这一领域。
32. A Genetic Feature Selection Based Two-stream Neural Network for Anger Veracity Recognition [PDF] 返回目录
Chaoxing Huang, Xuanying Zhu, Tom Gedeon
Abstract: People can manipulate emotion expressions when interacting with others. For example, acted anger can be expressed when the stimulus is not genuinely angry, with the aim of manipulating the observer. In this paper, we aim to examine whether the veracity of anger can be recognized from observers' pupillary data using computational approaches. We use Genetic-based Feature Selection (GFS) methods to select time-series pupillary features of observers who view acted and genuine anger in video stimuli. We then use the selected features to train a simple fully connected neural network and a two-stream neural network. Our results show that the two-stream architecture is able to achieve a promising recognition result with an accuracy of 93.58% when the pupillary responses from both eyes are available. It also shows that the genetic algorithm based feature selection method can effectively improve the classification accuracy by 3.07%. We hope our work can support research areas such as human-machine interaction and psychology studies that require emotion recognition.
摘要:人们可以与他人交往时操纵的情感表达。例如,采取行动的愤怒可以表达的刺激,不以目标操纵观察者真正生气。在本文中,我们的目的是检查是否愤怒的真实性能与观察员与计算方法瞳孔的数据进行识别。我们采用基于遗传特征选择(GFS)的方法来选择谁担任观摩和视频刺激的真正怒气观察员的时间序列瞳孔功能。然后,我们使用选定的功能训练全连接的一个简单的神经工作和两流的神经网络。我们的研究结果表明,这两种流架构能够实现承诺的识别结果与93.58%的准确度时,从两只眼睛的瞳孔反应都可用。它还表明,基于遗传算法的特征选择方法能有效地由3.07%提高分类精度。我们希望我们的工作能够帮助日常的研究,如需要情感识别人机交互和心理学研究。
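A compact genetic feature-selection loop in the spirit of GFS: individuals are binary masks over the pupillary features, and fitness would be, e.g., validation accuracy of a classifier trained on the selected features. The `fitness` callback, population size, and rates are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(fitness, n_features, pop=20, gens=30, p_mut=0.05):
    """fitness(mask) -> float, higher is better; returns the best binary mask."""
    popu = rng.integers(0, 2, size=(pop, n_features))
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in popu])
        order = np.argsort(scores)[::-1]
        parents = popu[order[: pop // 2]]                 # truncation selection
        cut = rng.integers(1, n_features, size=pop // 2)  # one-point crossover
        children = np.array([
            np.concatenate([parents[i % len(parents)][:c],
                            parents[(i + 1) % len(parents)][c:]])
            for i, c in enumerate(cut)])
        flip = rng.random(children.shape) < p_mut         # bit-flip mutation
        popu = np.vstack([parents, np.where(flip, 1 - children, children)])
    return popu[np.argmax([fitness(ind) for ind in popu])]
```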
33. Deep Sparse Light Field Refocusing [PDF] 返回目录
Shachar Ben Dayan, David Mendlovic, Raja Giryes
Abstract: Light field photography enables recording 4D images, containing angular information alongside spatial information of the scene. One of the important applications of light field imaging is post-capture refocusing. Current methods require a dense field of angular views for this purpose; those can be acquired with a micro-lens system or with a compressive system. Both techniques have major drawbacks to consider, including bulky structures and an angular-spatial resolution trade-off. We present a novel implementation of digital refocusing based on sparse angular information using neural networks. This allows recording high spatial resolution in favor of the angular resolution, thus enabling the design of compact and simple devices with improved hardware, as well as better performance of compressive systems. We use a novel convolutional neural network whose relatively small structure enables fast reconstruction with low memory consumption. Moreover, it can handle various refocusing ranges and noise levels without re-training. Results show major improvement compared to existing methods.
摘要:光场摄影能记录4D图像,包含沿着场景的空间信息的角度信息。一个光场成像的重要应用是捕捉后的重聚。当前方法要求用于此目的的角视图的致密场;那些可以用的微透镜系统或压缩系统来获取。这两种技术都主要缺点考虑,包括庞大的结构和角度,空间分辨率权衡。我们提出基于使用神经网络的稀疏角度信息新颖实现数字重聚焦的。这允许记录高空间分辨率有利于角分辨率,从而,使得能够与改进的硬件以及压缩系统的更好的性能,设计紧凑且简单的装置。我们使用一种新型的卷积神经网络,其相对较小的结构,可以用较低的内存消耗快速重建。此外,它允许处理无需重新培训各种重聚范围和噪声水平。结果相比,现有的方法表明重大改进。
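Classical shift-and-add refocusing is the operation the network learns to approximate from sparse views: each angular view is translated in proportion to its (u, v) offset and the results are averaged. A numpy sketch for a regularly sampled light field, using integer shifts for simplicity.

```python
import numpy as np

def refocus(lightfield, slope):
    """lightfield: (U, V, H, W) grayscale views; slope: refocus parameter
    (pixel shift per unit of angular offset). Returns the refocused image."""
    U, V, H, W = lightfield.shape
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    acc = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dy = int(round(slope * (u - cu)))
            dx = int(round(slope * (v - cv)))
            acc += np.roll(lightfield[u, v], shift=(dy, dx), axis=(0, 1))
    return acc / (U * V)
```

Sweeping `slope` moves the focal plane through the scene; the paper's contribution is obtaining comparable refocusing from far fewer views than this dense baseline requires.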
34. Learning from Multiple Datasets with Heterogeneous and Partial Labels for Universal Lesion Detection in CT [PDF] 返回目录
Ke Yan, Jinzheng Cai, Youjing Zheng, Adam P. Harrison, Dakai Jin, You-Bao Tang, Yu-Xing Tang, Lingyun Huang, Jing Xiao, Le Lu
Abstract: Large-scale datasets with high-quality labels are desired for training accurate deep learning models. However, due to annotation costs, medical imaging datasets are often either partially-labeled or small. For example, DeepLesion is a large-scale CT image dataset with lesions of various types, but it also has many unlabeled lesions (missing annotations). When training a lesion detector on a partially-labeled dataset, the missing annotations will generate incorrect negative signals and degrade performance. Besides DeepLesion, there are several small single-type datasets, such as LUNA for lung nodules and LiTS for liver tumors. Such datasets have heterogeneous label scopes, i.e., different lesion types are labeled in different datasets with other types ignored. In this work, we aim to tackle the problem of heterogeneous and partial labels, and develop a universal lesion detection algorithm to detect a comprehensive variety of lesions. First, we build a simple yet effective lesion detection framework named Lesion ENSemble (LENS). LENS can efficiently learn from multiple heterogeneous lesion datasets in a multi-task fashion and leverage their synergy by feature sharing and proposal fusion. Next, we propose strategies to mine missing annotations from partially-labeled datasets by exploiting clinical prior knowledge and cross-dataset knowledge transfer. Finally, we train our framework on four public lesion datasets and evaluate it on 800 manually-labeled sub-volumes in DeepLesion. On this challenging task, our method brings a relative improvement of 49% compared to the current state-of-the-art approach.
摘要:以高品质的标签,大型数据集所需的训练精确的深度学习模式。然而,由于成本注释,医疗成像数据集常常是或者部分地标记的或小。例如,DeepLesion是一个大型的CT图像数据集与各种类型的病灶,但它也有许多未标记的病变(缺少注释)。当训练上的部分标记的数据集的病变检测器,缺少的注释会生成不正确的负信号,并降低性能。此外DeepLesion,有几个小单类型的数据集,如LUNA肺结节和双床肝肿瘤。这种数据集具有异质标签范围,即,不同病变类型标记与其它类型的不同忽略的数据集。在这项工作中,我们的目标是解决异构和部分标签的问题,并制定通用病变检测算法来检测各种综合病变。首先,我们建立一个名为病变合奏(LENS)一个简单而有效的肿瘤检测的框架。镜头可以有效地从多个异构数据集病灶在多任务的方式学习和特征共享和建议融合利用其协同作用。接下来,我们提出战略,矿山通过利用临床先验知识和跨数据集的知识转移失踪部分标记的数据集的注释。最后,我们训练我们的框架在四个公共病变数据集,并评估它在DeepLesion 800手动标记的子卷。在这个具有挑战性的任务,我们的方法带来比当前国家的最先进方法的49%的相对改善。
35. Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability [PDF] 返回目录
Anelise Newman, Camilo Fosco, Vincent Casser, Allen Lee, Barry McNamara, Aude Oliva
Abstract: A key capability of an intelligent system is deciding when events from past experience must be remembered and when they can be forgotten. Towards this goal, we develop a predictive model of human visual event memory and how those memories decay over time. We introduce Memento10k, a new, dynamic video memorability dataset containing human annotations at different viewing delays. Based on our findings we propose a new mathematical formulation of memorability decay, resulting in a model that is able to produce the first quantitative estimation of how a video decays in memory over time. In contrast with previous work, our model can predict the probability that a video will be remembered at an arbitrary delay. Importantly, our approach combines visual and semantic information (in the form of textual captions) to fully represent the meaning of events. Our experiments on two video memorability benchmarks, including Memento10k, show that our model significantly improves upon the best prior approach (by 12% on average).
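The arbitrary-delay prediction can be illustrated with a simple log-linear decay (a hedged sketch: the parameter names, the reference lag, and the clipping are assumptions, not the paper's exact formulation):

```python
import numpy as np

def memorability_at_delay(m_base, alpha, t, t_ref=80.0):
    """Hypothetical log-linear decay: the probability that a video seen t
    units ago is still remembered, anchored so that memorability equals
    m_base at the reference lag t_ref."""
    return float(np.clip(m_base + alpha * np.log10(t / t_ref), 0.0, 1.0))

# A video with base memorability 0.85 and decay rate -0.1 per decade of lag:
print(memorability_at_delay(0.85, -0.1, t=800.0))  # 0.75 at 10x the reference lag
```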
36. An Efficient Technique for Image Captioning using Deep Neural Network [PDF] 返回目录
Borneel Bikash Phukan, Amiya Ranjan Panda
Abstract: With the huge expansion of the internet and the trillions of gigabytes of data generated every single day, developing various tools has become mandatory in order to maintain system adaptability to rapid changes. One of these tools is known as Image Captioning. Every entity on the internet must be properly identified and managed; in the case of image data, this requires automatic captioning for identification. Similarly, content generation for missing labels, image classification, and artificial languages all require the process of Image Captioning. This paper discusses an efficient and unique way to perform automatic image captioning on individual images and discusses strategies to improve its performance and functionality.
37. Visual Object Tracking by Segmentation with Graph Convolutional Network [PDF] 返回目录
Bo Jianga, Panpan Zhang, Lili Huang
Abstract: Segmentation-based tracking has been actively studied in computer vision and multimedia. Superpixel-based object segmentation and tracking methods are usually developed for this task. However, they perform feature representation and learning of superpixels independently, which may lead to sub-optimal results. In this paper, we propose to utilize a graph convolutional network (GCN) model for superpixel-based object tracking. The proposed model provides a general end-to-end framework which integrates i) label linear prediction and ii) structure-aware feature information of each superpixel to obtain object segmentation and further improve tracking performance. The proposed GCN method has two main benefits. First, it provides an effective end-to-end way to exploit both spatial and temporal consistency constraints for target object segmentation. Second, it utilizes a mixed graph convolution module to learn a context-aware and discriminative feature for superpixel representation and labeling. An effective algorithm has been developed to optimize the proposed model. Extensive experiments on five datasets demonstrate that our method obtains better performance than existing alternative methods.
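The basic building block can be sketched as a standard graph-convolution layer over superpixel features (Kipf-and-Welling style; the paper's mixed graph convolution module is more elaborate):

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One standard GCN layer over superpixel features: normalize the
    adjacency with self-loops, then propagate and transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, H, A):
        # H: (N, in_dim) superpixel features, A: (N, N) binary adjacency.
        A_hat = A + torch.eye(A.size(0), device=A.device)  # add self-loops
        d_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))
        return torch.relu(d_inv_sqrt @ A_hat @ d_inv_sqrt @ self.lin(H))
```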
38. Generalization on the Enhancement of Layerwise Relevance Interpretability of Deep Neural Network [PDF] 返回目录
Erico Tjoa, Guan Cuntai
Abstract: The practical application of deep neural networks is still limited by their lack of transparency. One of the efforts to provide explanations for decisions made by artificial intelligence (AI) is the use of saliency or heat maps highlighting relevant regions that contribute significantly to its prediction. A layer-wise amplitude filtering method was previously introduced to improve the quality of heatmaps, performing error correction by noise-spike suppression. In this study, we generalize the layerwise error correction by considering any identifiable error and assuming there exists groundtruth interpretable information. The forms of errors propagated through layerwise relevance methods are studied, and we propose a filtering technique for interpretability signal rectification tailored to the trend of signal amplitude of the particular neural network used. Finally, we put forth arguments for the use of groundtruth interpretable information.
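The noise-spike suppression step can be sketched as percentile-based amplitude clipping of the relevance map (the threshold rule here is an assumption):

```python
import numpy as np

def amplitude_filter(relevance, q=99.0):
    """Suppress noise spikes in a relevance/heat map by clipping values
    whose magnitude exceeds the q-th percentile of |relevance|."""
    cap = np.percentile(np.abs(relevance), q)
    return np.clip(relevance, -cap, cap)
```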
39. Reverse-engineering Bar Charts Using Neural Networks [PDF] 返回目录
Fangfang Zhou, Yong Zhao, Wenjiang Chen, Yijing Tan, Yaqi Xu, Yi Chen, Chao Liu, Ying Zhao
Abstract: Reverse-engineering bar charts extracts textual and numeric information from the visual representations of bar charts to support application scenarios that require the underlying information. In this paper, we propose a neural network-based method for reverse-engineering bar charts. We adopt a neural network-based object detection model to simultaneously localize and classify textual information. This approach improves the efficiency of textual information extraction. We design an encoder-decoder framework that integrates convolutional and recurrent neural networks to extract numeric information. We further introduce an attention mechanism into the framework to achieve high accuracy and robustness. Synthetic and real-world datasets are used to evaluate the effectiveness of the method. To the best of our knowledge, this work takes the lead in constructing a complete neural network-based method of reverse-engineering bar charts.
40. Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks [PDF] 返回目录
Wei-An Lin, Chun Pong Lau, Alexander Levine, Rama Chellappa, Soheil Feizi
Abstract: Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. However, it often degrades the model performance on normal images and the defense does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in characterizing the underlying manifold of images, we investigate whether or not the aforementioned problems can be remedied by exploiting the underlying manifold information. To this end, we construct an "On-Manifold ImageNet" (OM-ImageNet) dataset by projecting the ImageNet samples onto the manifold learned by StyleGAN. For this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, the defense can be broken by Lp adversarial attacks. We further propose Dual Manifold Adversarial Training (DMAT), where adversarial perturbations in both latent and image spaces are used in robustifying the model. Our DMAT improves performance on normal images, and achieves comparable robustness to the standard adversarial training against Lp attacks. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks which manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on out-of-manifold natural images. These results demonstrate the potential benefits of using manifold information in enhancing robustness of deep learning models against various types of novel adversarial attacks.
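On-manifold adversarial examples can be sketched as PGD run in the generator's latent space (illustrative only: `generator`, `classifier`, and the step sizes are placeholders):

```python
import torch

def latent_pgd(generator, classifier, z, y, eps=0.02, alpha=0.005, steps=10):
    """PGD in latent space: perturb z so the generated image fools the
    classifier, while staying inside an L-inf ball of radius eps."""
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(steps):
        logits = classifier(generator(z + delta))
        loss = torch.nn.functional.cross_entropy(logits, y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient ascent on the loss
            delta.clamp_(-eps, eps)             # project back into the ball
        delta.grad.zero_()
    return (z + delta).detach()
```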
41. User-Guided Domain Adaptation for Rapid Annotation from User Interactions: A Study on Pathological Liver Segmentation [PDF] 返回目录
Ashwin Raju, Zhanghexuan Ji, Chi Tung Cheng, Jinzheng Cai, Junzhou Huang, Jing Xiao, Le Lu, ChienHung Liao, Adam P. Harrison
Abstract: Mask-based annotation of medical images, especially for 3D data, is a bottleneck in developing reliable machine learning models. Using minimal-labor user interactions (UIs) to guide the annotation is promising, but challenges remain on best harmonizing the mask prediction with the UIs. To address this, we propose the user-guided domain adaptation (UGDA) framework, which uses prediction-based adversarial domain adaptation (PADA) to model the combined distribution of UIs and mask predictions. The UIs are then used as anchors to guide and align the mask prediction. Importantly, UGDA can both learn from unlabelled data and also model the high-level semantic meaning behind different UIs. We test UGDA on annotating pathological livers using a clinically comprehensive dataset of 927 patient studies. Using only extreme-point UIs, we achieve a mean (worst-case) performance of 96.1%(94.9%), compared to 93.0% (87.0%) for deep extreme points (DEXTR). Furthermore, we also show UGDA can retain this state-of-the-art performance even when only seeing a fraction of available UIs, demonstrating an ability for robust and reliable UI-guided segmentation with extremely minimal labor demands.
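The anchoring idea presupposes some encoding of the extreme-point UIs as network input; a common choice is a Gaussian heatmap channel (a sketch, and the paper's exact encoding may differ):

```python
import numpy as np

def extreme_point_channel(shape, points, sigma=5.0):
    """Render user-clicked extreme points as a Gaussian heatmap channel
    that can be concatenated to the image slice."""
    H, W = shape
    yy, xx = np.mgrid[0:H, 0:W]
    chan = np.zeros(shape, dtype=np.float32)
    for y, x in points:
        g = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
        chan = np.maximum(chan, g)
    return chan
```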
42. GazeMAE: General Representations of Eye Movements using a Micro-Macro Autoencoder [PDF] 返回目录
Louise Gillian C. Bautista, Prospero C. Naval, Jr
Abstract: Eye movements are intricate and dynamic events that contain a wealth of information about the subject and the stimuli. We propose an abstract representation of eye movements that preserves the important nuances in gaze behavior while being stimuli-agnostic. We consider eye movements as raw position and velocity signals and train separate deep temporal convolutional autoencoders. The autoencoders learn micro-scale and macro-scale representations that correspond to the fast and slow features of eye movements. We evaluate the joint representations with a linear classifier fitted on various classification tasks. Our work accurately discriminates between gender and age groups, and outperforms previous works on biometrics and stimuli classification. Further experiments highlight the validity and generalizability of this method, bringing eye tracking research closer to real-world applications.
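A minimal temporal convolutional autoencoder over the velocity signals might look as follows (layer sizes are illustrative, not the paper's micro/macro architecture):

```python
import torch.nn as nn

class GazeAESketch(nn.Module):
    """1D convolutional autoencoder over gaze signals; the two input
    channels stand for x/y velocity."""
    def __init__(self, channels=2, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(16, latent, 5, stride=2, padding=2), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent, 16, 5, stride=2, padding=2,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, channels, 5, stride=2, padding=2,
                               output_padding=1))

    def forward(self, x):        # x: (batch, channels, time), time divisible by 4
        z = self.encoder(x)      # downsampled latent code used as the embedding
        return self.decoder(z), z
```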
43. Player Identification in Hockey Broadcast Videos [PDF] 返回目录
Alvin Chan, Martin D. Levine, Mehrsan Javan
Abstract: We present a deep recurrent convolutional neural network (CNN) approach to solve the problem of hockey player identification in NHL broadcast videos. Player identification is a difficult computer vision problem mainly because of the players' similar appearance, occlusion, and blurry facial and physical features. However, we can observe players' jersey numbers over time by processing variable length image sequences of players (aka 'tracklets'). We propose an end-to-end trainable ResNet+LSTM network, with a residual network (ResNet) base and a long short-term memory (LSTM) layer, to discover spatio-temporal features of jersey numbers over time and learn long-term dependencies. For this work, we created a new hockey player tracklet dataset that contains sequences of hockey player bounding boxes. Additionally, we employ a secondary 1-dimensional convolutional neural network classifier as a late score-level fusion method to classify the output of the ResNet+LSTM network. This achieves an overall player identification accuracy score over 87% on the test split of our new dataset.
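The ResNet+LSTM pipeline can be sketched as per-frame CNN features fed through an LSTM (torchvision's ResNet-18 stands in for the backbone; the hidden size and the jersey-number class count are assumptions):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class JerseySketch(nn.Module):
    """Per-frame ResNet features -> LSTM over the tracklet -> class logits.
    Hidden size and the jersey-number class count are illustrative."""
    def __init__(self, num_classes=100, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop FC, 512-d out
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, frames):                             # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1)  # (B*T, 512)
        out, _ = self.lstm(feats.view(B, T, -1))
        return self.fc(out[:, -1])                         # logits from last step
```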
44. Explanation of Unintended Radiated Emission Classification via LIME [PDF] 返回目录
Tom Grimes, Eric Church, William Pitts, Lynn Wood
Abstract: Unintended radiated emissions arise during the use of electronic devices. Identifying and mitigating the effects of these emissions is a key element of modern power engineering and associated control systems. Signal processing of the electrical system can identify the sources of these emissions. A dataset known as Flaming Moes includes captured unintended radiated emissions from consumer electronics. This dataset was analyzed to construct next-generation methods for device identification. To this end, a neural network based on applying the ResNet-18 image classification architecture to the short time Fourier transforms of short segments of voltage signatures was constructed. Using this classifier, the 18 device classes and background class were identified with close to 100 percent accuracy. By applying LIME to this classifier and aggregating the results over many classifications for the same device, it was possible to determine the frequency bands used by the classifier to make decisions. Using ensembles of classifiers trained on very similar datasets from the same parent data distribution, it was possible to recover robust sets of features of device output useful for identification. The additional understanding provided by the application of LIME enhances the trainability, trustability, and transferability of URE analysis networks.
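The aggregation step can be sketched as averaging absolute explanation weights across many classifications of the same device (the array shape and the averaging rule are assumptions):

```python
import numpy as np

def aggregate_band_importance(lime_weights):
    """Average |LIME weight| per frequency band over many classifications;
    lime_weights: (n_classifications, n_bands) array of explanation weights."""
    return np.abs(np.asarray(lime_weights)).mean(axis=0)

# 200 hypothetical classifications explained over 128 STFT frequency bands:
weights = np.random.randn(200, 128) * np.linspace(0.1, 1.0, 128)
print(np.argsort(aggregate_band_importance(weights))[-5:])  # five most-used bands
```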
45. Video Moment Retrieval via Natural Language Queries [PDF] 返回目录
Xinli Yu, Mohsen Malmir, Cynthia He, Yue Liu, Rex Wu
Abstract: In this paper, we propose a novel method for video moment retrieval (VMR) that achieves state-of-the-art (SOTA) performance on R@1 metrics and surpasses the SOTA on the high-IoU metric (R@1, IoU=0.7). First, we propose to use a multi-head self-attention mechanism, and further a cross-attention scheme, to capture video/query interaction and long-range query dependencies from video context. The attention-based methods can develop frame-to-query and query-to-frame interactions at arbitrary positions, and the multi-head setting ensures sufficient understanding of complicated dependencies. Our model has a simple architecture, which enables faster training and inference while maintaining performance. Second, we propose a multi-task training objective consisting of a moment segmentation task, start/end distribution prediction, and start/end location regression. We have verified that start/end predictions are noisy due to annotator disagreement, and that joint training with the moment segmentation task can provide richer information, since frames inside the target clip are also utilized as positive training examples. Third, we propose to use an early fusion approach, which achieves better performance at the cost of inference time. However, inference time is not a problem for our model, since it has a simple architecture that enables efficient training and inference.
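The frame-to-query interaction can be sketched with a standard multi-head cross-attention layer (dimensions are illustrative, and the paper's full fusion scheme is more elaborate):

```python
import torch
import torch.nn as nn

d_model, heads = 256, 8
cross_attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

video = torch.randn(2, 120, d_model)  # (batch, frames, dim) frame features
words = torch.randn(2, 12, d_model)   # (batch, tokens, dim) query features

# Every frame attends to every query word at arbitrary positions.
fused, attn = cross_attn(query=video, key=words, value=words)
print(fused.shape)  # torch.Size([2, 120, 256])
```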
46. A Deep Learning Approach to Tongue Detection for Pediatric Population [PDF] 返回目录
Javad Rahimipour Anaraki, Silvia Orlandi, Tom Chau
Abstract: Children with severe disabilities and complex communication needs face limitations in the usage of access technology (AT) devices. Conventional ATs (e.g., mechanical switches) can be insufficient for nonverbal children and those with limited voluntary motion control. Automatic techniques for the detection of tongue gestures represent a promising pathway. Previous studies have shown the robustness of tongue detection algorithms on adult participants, but further research is needed to use these methods with children. In this study, a network architecture for tongue-out gesture recognition was implemented and evaluated on videos recorded in a naturalistic setting while children were playing a video game. A cascade object detector algorithm was used to detect the participants' faces, and an automated classification scheme for tongue gesture detection was developed using a convolutional neural network (CNN). In the evaluation experiments conducted, the network was trained using adults' and children's images. The network classification accuracy was evaluated using leave-one-subject-out cross-validation. Preliminary classification results obtained from the analysis of videos of five typically developing children showed an accuracy of up to 99% in predicting tongue-out gestures. Moreover, we demonstrated that using only children's data for training the classifier yielded better performance than using adults' data, supporting the need for pediatric tongue gesture datasets.
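The detection front-end can be sketched with OpenCV's stock Haar cascade as a stand-in for the cascade object detector (the cascade file, frame path, and parameters are assumptions):

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.png")  # hypothetical video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for x, y, w, h in face_cascade.detectMultiScale(gray, 1.1, 5):
    face_crop = frame[y:y + h, x:x + w]  # region passed to the tongue CNN
```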
47. Class Interference Regularization [PDF] 返回目录
Bharti Munjal, Sikandar Amin, Fabio Galasso
Abstract: Contrastive losses yield state-of-the-art performance for person re-identification, face verification and few-shot learning. They have recently outperformed the cross-entropy loss on classification at the ImageNet scale and outperformed all prior self-supervision results by a large margin (SimCLR). Simple and effective regularization techniques such as label smoothing and self-distillation do not apply anymore, because they act on multinomial label distributions, adopted in cross-entropy losses, and not on tuple comparative terms, which characterize the contrastive losses. Here we propose a novel, simple and effective regularization technique, the Class Interference Regularization (CIR), which applies to cross-entropy losses but is especially effective on contrastive losses. CIR perturbs the output features by randomly moving them towards the average embeddings of the negative classes. To the best of our knowledge, CIR is the first regularization technique to act on the output features. In experimental evaluation, the combination of CIR and a plain Siamese-net with triplet loss yields the best few-shot learning performance on the challenging tieredImageNet. CIR also improves the state-of-the-art technique in person re-identification on the Market-1501 dataset, based on triplet loss, and the state-of-the-art technique in person search on the CUHK-SYSU dataset, based on a cross-entropy loss. Finally, on the task of classification CIR performs on par with the popular label smoothing, as demonstrated for CIFAR-10 and -100.
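The perturbation itself fits in a few lines: mix each feature toward the mean embedding of a sampled negative class (a sketch; the mixing coefficient and the sampling rule are assumptions):

```python
import torch

def class_interference(features, labels, class_means, strength=0.1):
    """Nudge each output feature toward the mean embedding of a randomly
    sampled negative class. class_means: (num_classes, dim) running averages;
    strength is an assumed mixing coefficient."""
    num_classes = class_means.size(0)
    neg = torch.randint(num_classes, labels.shape)  # random class per sample
    neg = torch.where(neg == labels, (neg + 1) % num_classes, neg)  # skip true class
    return (1 - strength) * features + strength * class_means[neg]
```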
48. ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [PDF] 返回目录
Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, Qiang Qiu
Abstract: Convolutional Neural Networks (CNNs) are known to be significantly over-parametrized, and difficult to interpret, train and adapt. In this paper, we introduce a structural regularization across convolutional kernels in a CNN. In our approach, each convolution kernel is first decomposed as 2D dictionary atoms linearly combined by coefficients. The widely observed correlation and redundancy in a CNN hint a common low-rank structure among the decomposed coefficients, which is here further supported by our empirical observations. We then explicitly regularize CNN kernels by enforcing decomposed coefficients to be shared across sub-structures, while leaving each sub-structure only its own dictionary atoms, a few hundreds of parameters typically, which leads to dramatic model reductions. We explore models with sharing across different sub-structures to cover a wide range of trade-offs between parameter reduction and expressiveness. Our proposed regularized network structures open the door to better interpreting, training and adapting deep models. We validate the flexibility and compatibility of our method by image classification experiments on multiple datasets and underlying network structures, and show that CNNs now maintain performance with dramatic reduction in parameters and computations, e.g., only 5% of the parameters are used in a ResNet-18 to achieve comparable performance. Further experiments on few-shot classification show that faster and more robust task adaptation is obtained in comparison with models with standard convolutions.
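The atom-coefficient decomposition can be sketched as composing the kernel bank from a small shared dictionary before a standard convolution (all sizes are illustrative):

```python
import torch
import torch.nn.functional as F

num_atoms, k = 6, 3
atoms = torch.randn(num_atoms, k, k, requires_grad=True)    # shared 2D dictionary
coeffs = torch.randn(16, 8, num_atoms, requires_grad=True)  # (out_ch, in_ch, atoms)

# Compose the full (16, 8, 3, 3) kernel bank as linear combinations of atoms;
# only the coefficients (and the small dictionary) would be learned.
kernels = torch.einsum('oia,akl->oikl', coeffs, atoms)
x = torch.randn(1, 8, 32, 32)
y = F.conv2d(x, kernels, padding=1)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```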
49. Don't miss the Mismatch: Investigating the Objective Function Mismatch for Unsupervised Representation Learning [PDF] 返回目录
Bonifaz Stuhr, Jürgen Brauer
Abstract: Finding general evaluation metrics for unsupervised representation learning techniques is a challenging open research question, which recently has become more and more necessary due to the increasing interest in unsupervised methods. Even though these methods promise beneficial representation characteristics, most approaches currently suffer from the objective function mismatch. This mismatch states that the performance on a desired target task can decrease when the unsupervised pretext task is trained for too long, especially when both tasks are ill-posed. In this work, we build upon the widely used linear evaluation protocol and define new general evaluation metrics to quantitatively capture the objective function mismatch and the more generic metrics mismatch. We discuss the usability and stability of our protocols on a variety of pretext and target tasks and study mismatches in a wide range of experiments. Thereby we disclose dependencies of the objective function mismatch across several pretext and target tasks with respect to the pretext model's representation size, target model complexity, pretext and target augmentations, as well as pretext and target task types.
50. End-to-End Deep Learning Model for Cardiac Cycle Synchronization from Multi-View Angiographic Sequences [PDF] 返回目录
Raphaël Royer-Rivard, Fantin Girard, Nagib Dahdah, Farida Cheriet
Abstract: Dynamic reconstructions (3D+T) of coronary arteries could give important perfusion details to clinicians. Temporal matching of the different views, which may not be acquired simultaneously, is a prerequisite for an accurate stereo-matching of the coronary segments. In this paper, we show how a neural network can be trained from angiographic sequences to synchronize different views during the cardiac cycle using raw x-ray angiography videos exclusively. First, we train a neural network model with angiographic sequences to extract features describing the progression of the cardiac cycle. Then, we compute the distance between the feature vectors of every frame from the first view with those from the second view to generate distance maps that display stripe patterns. Using pathfinding, we extract the best temporally coherent associations between each frame of both videos. Finally, we compare the synchronized frames of an evaluation set with the ECG signals to show an alignment with 96.04% accuracy.
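The distance-map and pathfinding steps can be sketched as pairwise feature distances plus a DTW-style dynamic program (a stand-in for the paper's pathfinding):

```python
import numpy as np

def distance_map(feats_a, feats_b):
    """Pairwise Euclidean distances between per-frame features of two views;
    synchronized frame pairs appear as low-distance stripes."""
    diff = feats_a[:, None, :] - feats_b[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def accumulate_path(D):
    """Cheapest monotonic path through the distance map (DTW-style);
    backtracking through C recovers the frame associations."""
    n, m = D.shape
    C = np.full((n, m), np.inf)
    C[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(C[i - 1, j] if i else np.inf,
                       C[i, j - 1] if j else np.inf,
                       C[i - 1, j - 1] if i and j else np.inf)
            C[i, j] = D[i, j] + prev
    return C
```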
51. Implicit Multidimensional Projection of Local Subspaces [PDF] 返回目录
Rongzheng Bian, Yumeng Xue, Liang Zhou, Jian Zhang, Baoquan Chen, Daniel Weiskopf, Yunhai Wang
Abstract: We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.
摘要:我们提出了一个可视化的方法来理解多维投影的本地子空间,用隐函数分化的影响。在这里,我们了解了当地的子空间的数据点的多维当地居委会。现有的方法集中在多维数据点的投影,和邻里信息被忽略。我们的方法是能够分析的形状和局部子空间法的方向性信息,通过局部结构的观感,以获得更深入的理解数据的全局结构。本地子空间是通过由基向量多维跨越椭圆拟合。根据配制成隐函数多维突起的分析分化提出了一种准确,高效的矢量变换方法。结果可视化的字形和采用全套在我们高效的基于Web的可视化工具支持专门设计的相互作用的分析。我们的方法的有效性是利用各种多边和高维基准数据集证明。我们的隐式分化矢量变换通过数值比较评价;整个方法是通过探索实例和用例进行评价。
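The vector transformation at the heart of the method follows from the implicit function theorem. Assuming the projection is written implicitly as F(x, y) = 0 for a data point x in R^n and its projected position y in R^2 (a standard formulation; the paper's exact parameterization may differ), differentiating both sides gives the Jacobian used to transport local basis vectors:

```latex
\frac{\partial F}{\partial x}
  + \frac{\partial F}{\partial y}\,\frac{\partial y}{\partial x} = 0
\quad\Longrightarrow\quad
J \,=\, \frac{\partial y}{\partial x}
  \,=\, -\left(\frac{\partial F}{\partial y}\right)^{-1}
        \frac{\partial F}{\partial x}
```

A basis vector v spanning the local multidimensional ellipse is then mapped to Jv in projection space, which is what the glyphs visualize.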
52. A New Screening Method for COVID-19 based on Ocular Feature Recognition by Machine Learning Tools [PDF] 返回目录
Yanwei Fu, Feng Li, Wenxuan Wang, Haicheng Tang, Xuelin Qian, Mengwei Gu, Xiangyang Xue
Abstract: The Coronavirus disease 2019 (COVID-19) has affected several million people. Since the outbreak of the epidemic, many researchers have devoted themselves to COVID-19 screening systems. The standard practices for rapid risk screening of COVID-19 are CT imaging or RT-PCR (real-time polymerase chain reaction). However, these methods demand professional effort for the acquisition of CT images and saliva samples, a certain amount of waiting time, and, most importantly, a prohibitive examination fee in some countries. Recently, some literature has shown that COVID-19 patients usually present ocular manifestations consistent with conjunctivitis, including conjunctival hyperemia, chemosis, epiphora, or increased secretions. After more than four months of study, we found that confirmed cases of COVID-19 present consistent ocular pathological signs, and we propose a new screening method in which analyzing eye-region images, captured by common CCD and CMOS cameras, can reliably provide rapid risk screening of COVID-19 with very high accuracy. We believe a system implementing such an algorithm could assist triage management or clinical diagnosis. To further evaluate our algorithm, with the approval of the Ethics Committee of the Shanghai Public Health Clinical Center of Fudan University, we conducted a study analyzing the eye-region images of 303 patients (104 COVID-19, 131 pulmonary, and 68 ocular patients), as well as 136 healthy people. Remarkably, the COVID-19 patients in the testing set consistently present similar ocular pathological signs, and very high testing results have been achieved in terms of sensitivity and specificity. We hope this study can inspire and encourage more research on this topic.
摘要:冠状病毒病2019(COVID-19)已影响到数百万人。随着疫情的爆发,许多研究者都投身到COVID-19筛选系统。用于快速风险筛查COVID-19的标准做法是CT成像或RT-PCR(实时聚合酶链式反应)。然而,这些方法需要采集CT图像和唾液样本,一定的等待时间的专业努力,最重要的是在一些国家禁止的考试费。近来,一些文献表明,COVID,19例患者通常伴有眼部表现与结膜炎一致,包括结膜充血,结膜,泪溢,或分泌物增多。四个多月的研究之后,我们发现,COVID-19目前一致的眼部病变符号的确诊病例;我们建议分析眼区域图像,通过共同的CCD和CMOS相机拍摄的新的筛选方法,可以可靠地进行快速的风险非常高的精度COVID-19的筛选。我们认为,实现这样的算法应协助分诊管理或临床诊断的系统。为了进一步评估我们的算法,并通过复旦大学上海公共卫生临床中心的伦理委员会批准,我们进行分析的303例患者的眼部区域图像的研究(104 COVID-19,131肺和68名患者)以及136健康人。值得注意的是,我们的COVID-19例测试组始终存在类似的眼部病变符号的结果;和非常高的测试结果的敏感性和特异性方面已经实现。我们希望这项研究能有所启发和这个主题鼓励更多的研究很有帮助。
53. Localization and classification of intracranialhemorrhages in CT data [PDF] 返回目录
Jakub Nemcek, Roman Jakubicek, Jiri Chmelik
Abstract: Intracranial hemorrhages (ICHs) are life-threatening brain injuries with a relatively high incidence. In this paper, an automatic algorithm for the detection and classification of ICHs, including localization, is presented. A set of binary convolutional neural network-based classifiers with a designed cascade-parallel architecture is used. This automatic system may lead to a distinct decrease in the duration of the diagnostic process in acute cases. An average Jaccard coefficient of 53.7% is achieved on data from the publicly available head CT dataset CQ500.
摘要:颅内出血(发生ICH)是危及生命的脑伤处有发病率相对较高。在本文中,自动算法发生ICH的检测和分类,包括定位,是存在的。该组具有设计的级联并行架构二进制卷积基于神经网络的分类器被使用。该自动系统可导致急性病例在诊断过程中的持续时间的明显减少。的53.7%的平均的Jaccard系数从可公开获得的头部CT数据集CQ500上的数据来实现的。
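For reference, the reported localization score is an intersection-over-union measure. A minimal sketch of the Jaccard coefficient over binary masks follows (the handling of two empty masks is an assumption, not a detail from the paper):

```python
import numpy as np

def jaccard(pred_mask, gt_mask):
    """Jaccard coefficient (IoU) between two binary masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union
```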
54. Towards learned optimal q-space sampling in diffusion MRI [PDF] 返回目录
Tomer Weiss, Sanketh Vedula, Ortal Senouf, Oleg Michailovich, Alex Bronstein
Abstract: Fiber tractography is an important tool of computational neuroscience that enables reconstructing the spatial connectivity and organization of white matter of the brain. Fiber tractography takes advantage of diffusion Magnetic Resonance Imaging (dMRI), which allows measuring the apparent diffusivity of cerebral water along different spatial directions. Unfortunately, collecting such data comes at the price of reduced spatial resolution and substantially elevated acquisition times, which limits the clinical applicability of dMRI. This problem has been thus far addressed using two principal strategies. Most of the efforts have been directed towards improving the quality of signal estimation for any, yet fixed, sampling scheme (defined through the choice of diffusion-encoding gradients). On the other hand, optimization over the sampling scheme has also proven to be effective. Inspired by the previous results, the present work consolidates the above strategies into a unified estimation framework, in which the optimization is carried out with respect to both estimation model and sampling design {\it concurrently}. The proposed solution offers substantial improvements in the quality of signal estimation as well as the accuracy of ensuing analysis by means of fiber tractography. While proving the optimality of the learned estimation models would probably need more extensive evaluation, we nevertheless claim that the learned sampling schemes can be of immediate use, offering a way to improve dMRI analysis without the necessity of deploying the neural network used for their estimation. We present a comprehensive comparative analysis based on the Human Connectome Project data. Code and learned sampling designs are available at this https URL.
摘要:纤维束成像是计算神经科学的一个重要工具,使重建的脑白质的空间连通性和组织。纤维束成像利用扩散的磁共振成像(DMRI),其允许测量的脑水沿不同的空间方向的表观扩散率。不幸的是,收集这些数据来自于减小的空间分辨率,并且基本上升高的采集时间,这限制了DMRI的临床适用性的价格。这个问题至今一直使用的两个主要策略解决。大部分的努力已扩展对改善信号估计的任何质量的,但固定的采样方案(通过编码扩散梯度的选择定义)。在另一方面,优化过采样方案也被证明是有效的。由前面的结果的鼓舞,本工作合并上述策略成一个统一的估计框架,其中,所述优化是相对于两个估计模型和采样设计{\它同时}进行。所提出的解决方案提供了在信号估计的质量以及由纤维束成像的装置随后的分析的精度显着改善。虽然证明了教训估算模型将可能需要更广泛的评价最优,但我们仍然声称,了解到采样方案可以立即使用的,提供一种方法来改善,无需部署神经网络用于其估计的必要性DMRI分析。我们提出了基于人类连接组项目数据进行综合比较分析。代码并了解到采样在这个HTTPS URL设计aviliable。
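One plausible way to realize the joint optimization over sampling design and estimation model is a differentiable selection mask trained end-to-end with the estimator, in the spirit of learned-sampling work in MRI. The PyTorch sketch below uses a sigmoid relaxation with straight-through gradients; it is a construction under stated assumptions, not necessarily the paper's exact scheme:

```python
import torch
import torch.nn as nn

class LearnedQSampler(nn.Module):
    """Jointly learns which of `n_candidates` diffusion-encoding
    directions to acquire (up to `budget` of them, in expectation)
    and how to estimate the dense signal from the subsampled one."""

    def __init__(self, n_candidates, budget, hidden=256):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_candidates))
        self.budget = budget
        self.estimator = nn.Sequential(
            nn.Linear(n_candidates, hidden), nn.ReLU(),
            nn.Linear(hidden, n_candidates))

    def forward(self, full_signal):  # (batch, n_candidates)
        probs = torch.sigmoid(self.logits)
        # Rescale so the expected number of kept measurements == budget.
        probs = probs * (self.budget / probs.sum().clamp(min=1e-8))
        probs = probs.clamp(max=1.0)
        mask = torch.bernoulli(probs)
        mask = mask + probs - probs.detach()  # straight-through gradient
        return self.estimator(full_signal * mask)
```

Training then minimizes a reconstruction loss between the estimator output and the fully sampled signal, so gradients shape both the sampling probabilities and the estimator.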
55. The 2ST-UNet for Pneumothorax Segmentation in Chest X-Rays using ResNet34 as a Backbone for U-Net [PDF] 返回目录
Ayat Abedalla, Malak Abdullah, Mahmoud Al-Ayyoub, Elhadj Benkhelifa
Abstract: Pneumothorax, also called a collapsed lung, refers to the presence of air in the pleural space between the lung and chest wall. It can be small (requiring no treatment), or large, causing death if it is not identified and treated in time. It is easily seen and identified by experts using a chest X-ray. Although this method is mostly error-free, it is time-consuming and requires expert radiologists. Recently, Computer Vision has been providing great assistance in detecting and segmenting pneumothorax. In this paper, we propose a 2-Stage Training system (2ST-UNet) to segment images with pneumothorax. This system is built on U-Net with a Residual Network (ResNet-34) backbone that is pre-trained on the ImageNet dataset. We start by training the network at a lower resolution before loading the trained model weights to retrain the network at a higher resolution. Moreover, we utilize different techniques including Stochastic Weight Averaging (SWA), data augmentation, and Test-Time Augmentation (TTA). We use the chest X-ray dataset provided by the 2019 SIIM-ACR Pneumothorax Segmentation Challenge, which contains 12,047 training images and 3,205 testing images. Our experiments show that 2-Stage Training leads to better and faster network convergence. Our method achieves a 0.8356 mean Dice Similarity Coefficient (DSC), placing it among the top 9% of models with a rank of 124 out of 1,475.
摘要:气胸,也称为肺萎陷,是指空气中的肺和胸壁之间的胸膜间隙的存在。它可以是小(无需治疗),或者大而导致死亡,如果不识别和及时治疗它。这是很容易看到和识别用胸部X光专家。虽然这种方法主要是无差错,这是费时,需要放射学专家。最近,计算机视觉已经提供检测和分割气胸很大的帮助。在本文中,我们提出了一个2级培训系统(2ST-UNET)图像分割与气胸。该系统是基于掌中残留网络内置(RESNET-34)骨干网的ImageNet数据集被预先训练。我们先以较低的分辨率训练网络之前,我们加载训练模型的权重与更高的分辨率重新培训网络。此外,我们使用不同的技术,包括随机重量平均(SWA),数据扩张,和测试时间增强(TTA)。我们使用的是由2019 SIIM-ACR气胸分割挑战,其中包含12,047个训练图像和3205个测试图像提供的胸部X射线数据集。我们的实验表明,2级培训能带来更好,更快的网络融合。我们的方法达到0.8356平均骰子相似系数(DSC)将它的顶部之间的模型9%与124总分1475的秩。
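Two of the ingredients above are easy to make concrete: the evaluation metric and Test-Time Augmentation. The sketch below operates on NumPy arrays; `model` is a hypothetical callable returning a probability map, and horizontal flipping is only one simple choice of augmentation:

```python
import numpy as np

def dice(pred_mask, gt_mask, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    return (2.0 * np.logical_and(pred, gt).sum() + eps) / \
           (pred.sum() + gt.sum() + eps)

def predict_with_tta(model, image):
    """Average the prediction on the image and on its horizontal
    flip, flipping the flipped prediction back before averaging."""
    p = model(image)
    p_flip = model(image[:, ::-1])[:, ::-1]
    return 0.5 * (p + p_flip)
```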
56. Perfusion Imaging: A Data Assimilation Approach [PDF] 返回目录
Peirong Liu, Yueh Z. Lee, Stephen R. Aylward, Marc Niethammer
Abstract: Perfusion imaging (PI) is clinically used to assess strokes and brain tumors. Commonly used PI approaches based on magnetic resonance imaging (MRI) or computed tomography (CT) measure the effect of a contrast agent moving through blood vessels and into tissue. Contrast-agent-free approaches, for example, based on intravoxel incoherent motion, also exist, but are so far not routinely used clinically. These methods rely on estimating the arterial input function (AIF) to approximately model tissue perfusion, neglecting spatial dependencies, and reliably estimating the AIF is also non-trivial, leading to difficulties in standardizing perfusion measures. In this work we therefore propose a data-assimilation approach (PIANO) which estimates the velocity and diffusion fields of an advection-diffusion model that best explains the contrast dynamics. PIANO accounts for spatial dependencies and neither requires estimating the AIF nor relies on a particular contrast agent bolus shape. Specifically, we propose a convenient parameterization of the estimation problem, a numerical estimation approach, and extensively evaluate PIANO. We demonstrate that PIANO can successfully resolve velocity and diffusion field ambiguities and results in sensitive measures for the assessment of stroke, comparing favorably to conventional measures of perfusion.
摘要:灌注成像(PI)在临床上用于评估中风和脑瘤。常用的办法PI基于磁共振成像(MRI)或计算机断层扫描(CT)测量造影剂通过血管并进入组织移动的效果。造影剂游离的方法,例如,基于体素内非相干运动,也存在,但到目前为止还没有临床上常规使用。这些方法依赖于在动脉输入功能(AIF)来估计到大约模型的组织灌注,忽略空间相关性,并可靠地估计AIF也是不平凡的,导致具有标准化灌注措施困难。在这项工作中,我们提出,因此,其估计的对流扩散模型的速度和传播领域最能说明对比度动态数据同化方法(钢琴)。钢琴占空间依赖性和既不要求估计AIF也不依赖于特定造影剂团的形状。具体来说,我们建议的估计问题,数值估计方法的方便参数,并广泛评估PIANO。我们证明PIANO可以成功解析速度和扩散场含糊不清,并导致中风的评估敏感的措施,相媲美灌注的常规措施。
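The advection-diffusion model referred to above is conventionally written as follows, with C the contrast concentration, v the velocity field, and D the diffusion field that PIANO estimates (boundary conditions and any regularization terms are omitted here):

```latex
\frac{\partial C(\mathbf{x},t)}{\partial t}
  \;=\; -\,\mathbf{v}(\mathbf{x})\cdot\nabla C(\mathbf{x},t)
        \;+\; \nabla\cdot\bigl(D(\mathbf{x})\,\nabla C(\mathbf{x},t)\bigr)
```

Fitting v and D so that this equation best reproduces the observed contrast dynamics is what removes the need for an arterial input function.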
57. Edge-variational Graph Convolutional Networks for Uncertainty-aware Disease Prediction [PDF] 返回目录
Yongxiang Huang, Albert C. S. Chung
Abstract: There is a rising need for computational models that can complementarily leverage data of different modalities while investigating associations between subjects for population-based disease analysis. Despite the success of convolutional neural networks in representation learning for imaging data, it is still a very challenging task. In this paper, we propose a generalizable framework that can automatically integrate imaging data with non-imaging data in populations for uncertainty-aware disease prediction. At its core is a learnable adaptive population graph with variational edges, which we mathematically prove that it is optimizable in conjunction with graph convolutional neural networks. To estimate the predictive uncertainty related to the graph topology, we propose the novel concept of Monte-Carlo edge dropout. Experimental results on four databases show that our method can consistently and significantly improve the diagnostic accuracy for Autism spectrum disorder, Alzheimer's disease, and ocular diseases, indicating its generalizability in leveraging multimodal data for computer-aided diagnosis.
摘要:有需要的计算模型,可以互补不同的模式的利用数据,而调查对象为基于人群的疾病分析之间的关联上升。尽管卷积神经网络在学习表示成功的成像数据,它仍然是一个非常具有挑战性的任务。在本文中,我们提出了一个一般化的框架,可以在人群不确定性感知疾病预测自动集成了非成像数据的成像数据。其核心是一个可以学习的自适应人口图表与变边缘,这我们数学上证明它是与图卷积神经网络结合优化的。为了估计相关的图形拓扑预测的不确定性,我们建议蒙特卡洛边缘辍学的新概念。在四个数据库的实验结果表明,该方法可以持续和显著改善自闭症谱系障碍,阿尔茨海默氏病和眼部疾病诊断的准确性,表明利用多模态数据的计算机辅助诊断其普遍性。
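Monte-Carlo edge dropout admits a very direct sketch: randomly drop graph edges at inference time and aggregate the predictions over repeated samples. Below, `gcn` is a hypothetical graph convolutional model taking node features and an edge list; the dropout rate and sample count are placeholders:

```python
import torch

def mc_edge_dropout_predict(gcn, x, edge_index, p=0.1, n_samples=20):
    """Returns (mean prediction, per-class variance) over `n_samples`
    random edge-dropout realizations of the population graph.

    x: (n_nodes, n_features) node features,
    edge_index: (2, n_edges) edge list.
    """
    preds = []
    for _ in range(n_samples):
        keep = torch.rand(edge_index.shape[1]) > p  # random edge mask
        preds.append(torch.softmax(gcn(x, edge_index[:, keep]), dim=-1))
    preds = torch.stack(preds)          # (n_samples, n_nodes, n_classes)
    return preds.mean(0), preds.var(0)  # prediction and its uncertainty
```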
58. Deep Learning for Automatic Spleen Length Measurement in Sickle Cell Disease Patients [PDF] 返回目录
Zhen Yuan, Esther Puyol-Anton, Haran Jogeesvaran, Catriona Reid, Baba Inusa, Andrew P. King
Abstract: Sickle Cell Disease (SCD) is one of the most common genetic diseases in the world. Splenomegaly (abnormal enlargement of the spleen) is frequent among children with SCD. If left untreated, splenomegaly can be life-threatening. The current workflow to measure spleen size includes palpation, possibly followed by manual length measurement in 2D ultrasound imaging. However, this manual measurement is dependent on operator expertise and is subject to intra- and inter-observer variability. We investigate the use of deep learning to perform automatic estimation of spleen length from ultrasound images. We investigate two types of approach, one segmentation-based and one based on direct length estimation, and compare the results against measurements made by human experts. Our best model (segmentation-based) achieved a percentage length error of 7.42%, which is approaching the level of inter-observer variability (5.47%-6.34%). To the best of our knowledge, this is the first attempt to measure spleen size in a fully automated way from ultrasound images.
摘要:镰状细胞病(SCD)是世界上最常见的遗传性疾病之一。脾肿大(脾脏异常增大)与SCD儿童频繁。如果不及时治疗,可脾肿大危及生命。当前工作流来测量脾脏大小包括触诊,可能随后在2D超声成像手动长度测量值。然而,这种手动测量是依赖于操作者的专业知识,并受帧内和帧间观察员变异。我们调查使用深层学习从超声图像进行脾脏长的自动估计。我们调查两种方法,一个分割的基础,一个基于直接长度估计,以及对人类专家的测量结果进行比较。我们的最佳模式(分段为主)达到7.42%的百分比长度误差,这是接近国际观察员变异的水平(5.47%-6.34%)。据我们所知,这是从超声图像完全自动化的方法来衡量脾脏大小的首次尝试。
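One natural reading of the reported 7.42% figure is a mean absolute length error relative to the expert measurement; the averaging convention below is an assumption, not a detail from the paper:

```latex
\text{percentage length error}
  \;=\; \frac{1}{N}\sum_{i=1}^{N}
        \frac{\lvert \hat{l}_i - l_i \rvert}{l_i}\times 100\%
```

where l_i is the expert-measured spleen length of the i-th test image and \hat{l}_i the automatic estimate.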
59. Semi-supervised Pathology Segmentation with Disentangled Representations [PDF] 返回目录
Haochuan Jiang, Agisilaos Chartsias, Xinheng Zhang, Giorgos Papanastasiou, Scott Semple, Mark Dweck, David Semple, Rohan Dharmakumar, Sotirios A. Tsaftaris
Abstract: Automated pathology segmentation remains a valuable diagnostic tool in clinical practice. However, collecting training data is challenging. Semi-supervised approaches that combine labelled and unlabelled data can offer a solution to data scarcity. One approach to semi-supervised learning relies on reconstruction objectives (as self-supervision objectives) that learn suitable representations for the task in a joint fashion. Here, we propose the Anatomy-Pathology Disentanglement Network (APD-Net), a pathology segmentation model that, for the first time, attempts to jointly learn the disentanglement of anatomy, modality, and pathology. The model is trained in a semi-supervised fashion with new reconstruction losses directly aimed at improving pathology segmentation with limited annotations. In addition, a joint optimization strategy is proposed to take full advantage of the available annotations. We evaluate our method on two private cardiac infarction segmentation datasets with LGE-MRI scans. APD-Net can perform pathology segmentation with few annotations, maintain performance with different amounts of supervision, and outperform related deep learning methods.
摘要:自动病理分割依然在临床实践中一个有价值的诊断工具。然而,收集训练数据是一个挑战。通过结合标记的和未标记的数据半监督方法可以提供一种解决方案,以数据缺乏。半监督学习的方法依赖于重建目标(如自我监督的目标),在该任务的联合方式合适的表示获悉。在这里,我们提出解剖病理学退纠缠网(APD-NET)病理学细分模型,试图共同学习的第一次:解剖,形态和病理解开。该模型是在一个半监督的方式与新的重建损失直接旨在改善病理分割用限制注释训练。此外,联合优化策略,提出了充分利用现有的注释的优势。我们评估我们与LGE-MRI扫描两个私人心肌梗塞分割数据集的方法。 APD-网可以用一些注释进行病理分割,保持与不同量的监管性能,以及超越相关深刻的学习方法。
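The semi-supervised recipe, a reconstruction objective on every image plus a supervised segmentation objective on the labelled subset, can be sketched as a combined loss. The keys and the weight below are placeholders, not APD-Net's actual terms or tuned values:

```python
import torch.nn.functional as F

def semi_supervised_loss(batch, lam_seg=1.0):
    """Self-supervision on all images, supervision where masks exist."""
    # Reconstruction from the disentangled factors (all images).
    loss = F.l1_loss(batch["reconstruction"], batch["image"])
    if batch.get("mask") is not None:
        # Pathology segmentation on the labelled subset only.
        loss = loss + lam_seg * F.binary_cross_entropy_with_logits(
            batch["pred_logits"], batch["mask"])
    return loss
```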
60. CLEANN: Accelerated Trojan Shield for Embedded Neural Networks [PDF] 返回目录
Mojan Javaheripi, Mohammad Samragh, Gregory Fields, Tara Javidi, Farinaz Koushanfar
Abstract: We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications. A Trojan attack works by injecting a backdoor into the DNN during training; during inference, the Trojan can be activated by its specific backdoor trigger. What differentiates CLEANN from prior work is its lightweight methodology, which recovers the ground-truth class of Trojan samples without the need for labeled data, model retraining, or prior assumptions on the trigger or the attack. We leverage dictionary learning and sparse approximation to characterize the statistical behavior of benign data and identify Trojan triggers. CLEANN is devised based on algorithm/hardware co-design and is equipped with specialized hardware to enable efficient real-time execution on resource-constrained embedded platforms. Proof-of-concept evaluations of CLEANN against state-of-the-art neural Trojan attacks on visual benchmarks demonstrate its competitive advantage in terms of attack resiliency and execution overhead.
摘要:本文提出CLEANN,第一端至端框架,使木马程序的在线缓解嵌入式深层神经网络(DNN)的应用程序。特洛伊木马攻击的原理是注射在DNN一个后门,而培训;推理过程中,该木马可通过特定的后门触发激活。什么从以前的工作区分CLEANN是其轻量级方法用于恢复地面实况类木马样本,而不需要标记数据,模型重新训练,或在触发或者攻击之前的假设。我们利用词典学习和稀疏逼近表征良性数据的统计行为,并识别特洛伊木马触发。 CLEANN基于算法/硬件协同设计设计,并配有专门的硬件,使资源受限的嵌入式平台的高效实时执行。在CLEANN概念评估的视觉基准的国家的最先进的神经木马攻击的证据证明攻击的弹性和执行开销方面的竞争优势。
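The benign-data characterization can be sketched with off-the-shelf dictionary learning and orthogonal matching pursuit from scikit-learn. The patches below are random stand-ins, and the residual-based scoring is an assumption about how such a detector could flag triggers; CLEANN's actual detector and its hardware mapping are more involved:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import orthogonal_mp

# Learn a dictionary from patches of *benign* inputs only.
benign_patches = np.random.rand(1000, 64)        # stand-in for real data
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0)
D = dico.fit(benign_patches).components_         # (128, 64) atoms

def trigger_score(patch, n_nonzero=10):
    """Sparse-approximation residual of a patch under the benign
    dictionary: benign patches reconstruct well, whereas trigger
    patches should leave a large residual."""
    code = orthogonal_mp(D.T, patch, n_nonzero_coefs=n_nonzero)
    return np.linalg.norm(patch - D.T @ code)    # high -> likely trigger
```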
Note: the Chinese text is machine-translated. The cover image is a word cloud of the paper titles.