Abstracts

1. NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, Daniel Duckworth
Abstract: We present a learning-based method for synthesizing novel views of complex outdoor scenes using only unstructured collections of in-the-wild photographs. We build on neural radiance fields (NeRF), which uses the weights of a multilayer perceptron to implicitly model the volumetric density and color of a scene. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. In this work, we introduce a series of extensions to NeRF to address these issues, thereby allowing for accurate reconstructions from unstructured image collections taken from the internet. We apply our system, which we dub NeRF-W, to internet photo collections of famous landmarks, thereby producing photorealistic, spatially consistent scene representations despite unknown and confounding factors, resulting in significant improvement over the state of the art.

2. Learning Long-term Visual Dynamics with Region Proposal Interaction Networks
Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, Jitendra Malik
Abstract: Learning long-term dynamics models is the key to understanding physical common sense. Most existing approaches to learning dynamics from visual input sidestep long-term predictions by resorting to rapid re-planning with short-term models. This not only requires such models to be highly accurate but also limits them to tasks where an agent can continuously obtain feedback and take action at each step until completion. In this paper, we aim to leverage ideas from success stories in visual recognition tasks to build object representations that can capture inter-object and object-environment interactions over a long range. To this end, we propose Region Proposal Interaction Networks (RPIN), which reason about each object's trajectory in a latent region-proposal feature space. Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin both in terms of prediction quality and its ability to plan for downstream tasks, and it also generalizes well to novel environments. Our code is available at this https URL.

3. Performance Improvement of Path Planning algorithms with Deep Learning Encoder Model
Janderson Ferreira, Agostinho A. F. Júnior, Yves M. Galvão, Pablo Barros, Sergio Murilo Maciel Fernandes, Bruno J. T. Fernandes
Abstract: Currently, path planning algorithms are used in many daily tasks. They are used to find the best route in traffic and to enable autonomous robots to navigate. Path planning presents some issues in large and dynamic environments. In large environments, these algorithms spend much time finding the shortest path. Dynamic environments, on the other hand, require a new execution of the algorithm each time a change occurs in the environment, which increases the execution time. Dimensionality reduction appears as a solution to this problem; in this context it means removing useless paths present in those environments. Most algorithms that reduce dimensionality are limited to the linear correlation of the input data. Recently, a Convolutional Neural Network (CNN) Encoder was used to overcome this limitation, since it can use both linear and non-linear information for data reduction. This paper analyzes in depth how well this CNN Encoder model eliminates useless paths. To measure the model's efficiency, we combined it with different path planning algorithms. The final algorithms (combined and not combined) are then evaluated on a database composed of five scenarios. Each scenario contains fixed and dynamic obstacles. The proposed model, the CNN Encoder, combined with other existing path planning algorithms from the literature, reduced the time to find the shortest path compared with all the path planning algorithms analyzed. The average time reduction was 54.43%.

4. Fully Automated and Standardized Segmentation of Adipose Tissue Compartments by Deep Learning in Three-dimensional Whole-body MRI of Epidemiological Cohort Studies
Thomas Küstner, Tobias Hepp, Marc Fischer, Martin Schwartz, Andreas Fritsche, Hans-Ulrich Häring, Konstantin Nikolaou, Fabian Bamberg, Bin Yang, Fritz Schick, Sergios Gatidis, Jürgen Machann
Abstract: Purpose: To enable fast and reliable assessment of subcutaneous and visceral adipose tissue compartments derived from whole-body MRI. Methods: Quantification and localization of different adipose tissue compartments from whole-body MR images is of high interest for examining metabolic conditions. For correct identification and phenotyping of individuals at increased risk for metabolic diseases, a reliable automatic segmentation of adipose tissue into subcutaneous and visceral adipose tissue is required. In this work we propose a 3D convolutional neural network (DCNet) to provide a robust and objective segmentation. In this retrospective study, we collected 1000 cases ($66 \pm 13$ years; 523 women) from the Tuebingen Family Study and from the German Center for Diabetes Research (TUEF/DZD), as well as 300 cases ($53 \pm 11$ years; 152 women) from the German National Cohort (NAKO) database for model training, validation, and testing, with transfer learning between the cohorts. These datasets had variable imaging sequences, imaging contrasts, receiver coil arrangements, scanners and imaging field strengths. The proposed DCNet was compared against a comparable 3D UNet segmentation in terms of sensitivity, specificity, precision, accuracy, and Dice overlap. Results: Fast (5-7 seconds) and reliable adipose tissue segmentation can be obtained with high Dice overlap (0.94), sensitivity (96.6%), specificity (95.1%), precision (92.1%) and accuracy (98.4%) from 3D whole-body MR datasets (field of view coverage $450 \times 450 \times 2000$ mm$^3$). Segmentation masks and adipose tissue profiles are automatically reported back to the referring physician. Conclusion: Automatic adipose tissue segmentation is feasible in 3D whole-body MR datasets and is generalizable to different epidemiological cohort studies with the proposed DCNet.
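The five evaluation measures named in this abstract (Dice overlap, sensitivity, specificity, precision, accuracy) all follow from the voxel-wise confusion counts. The sketch below shows the standard definitions on flattened binary masks; the function name `segmentation_metrics` and the toy masks are illustrative assumptions, not the authors' evaluation code.

```python
def segmentation_metrics(pred, truth):
    """Overlap metrics for two equally long binary label sequences
    (for example, flattened segmentation masks)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Voxel-wise confusion counts.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)

    def ratio(num, den):
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Toy example: 8 voxels, hypothetical prediction and ground truth.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
```

Note that accuracy counts true negatives, so on volumes dominated by background it can be high even when Dice is moderate.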

5. Can You Read Me Now? Content Aware Rectification using Angle Supervision
Amir Markovitz, Inbal Lavi, Or Perel, Shai Mazor, Roee Litman
Abstract: The ubiquity of smartphone cameras has led to more and more documents being captured by cameras rather than scanned. Unlike flatbed scanners, photographed documents are often folded and crumpled, resulting in large local variance in text structure. The problem of document rectification is fundamental to the Optical Character Recognition (OCR) process on documents, and its ability to overcome geometric distortions significantly affects recognition accuracy. Despite the great progress in recent OCR systems, most still rely on a pre-process that ensures the text lines are straight and axis aligned. Recent works have tackled the problem of rectifying document images taken in-the-wild using various supervision signals and alignment means. However, they focused on global features that can be extracted from the document's boundaries, ignoring various signals that could be obtained from the document's content. We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification that relies on the document's content, the location of the words and specifically their orientation, as hints to assist in the rectification process. We utilize a novel pixel-wise angle regression approach and a curvature estimation side-task for optimizing our rectification model. Our method surpasses previous approaches in terms of OCR accuracy, geometric error and visual similarity.

6. Domain-Specific Mappings for Generative Adversarial Style Transfer
Hsin-Yu Chang, Zhixiang Wang, Yung-Yu Chuang
Abstract: Style transfer generates an image whose content comes from one image and whose style comes from another. Image-to-image translation approaches with disentangled representations have been shown to be effective for style transfer between two image categories. However, previous methods often assume a shared domain-invariant content space, which could compromise the content representation power. To address this issue, this paper leverages domain-specific mappings for remapping latent features in the shared content space to domain-specific content spaces. This way, images can be encoded more properly for style transfer. Experiments show that the proposed method outperforms previous style transfer methods, particularly on challenging scenarios that would require semantic correspondences between images. Code and results are available at this https URL.

7. Active Perception using Light Curtains for Autonomous Driving
Siddharth Ancha, Yaadhav Raaj, Peiyun Hu, Srinivasa G. Narasimhan, David Held
Abstract: Most real-world 3D sensors such as LiDARs perform fixed scans of the entire environment, while being decoupled from the recognition system that processes the sensor data. In this work, we propose a method for 3D object recognition using light curtains, a resource-efficient controllable sensor that measures depth at user-specified locations in the environment. Crucially, we propose using prediction uncertainty of a deep learning based 3D point cloud detector to guide active perception. Given a neural network's uncertainty, we derive an optimization objective to place light curtains using the principle of maximizing information gain. Then, we develop a novel and efficient optimization algorithm to maximize this objective by encoding the physical constraints of the device into a constraint graph and optimizing with dynamic programming. We show how a 3D detector can be trained to detect objects in a scene by sequentially placing uncertainty-guided light curtains to successively improve detection accuracy. Code and details can be found on the project webpage: this http URL.

8. Tiny-YOLO object detection supplemented with geometrical data
Ivan Khokhlov, Egor Davydenko, Ilia Osokin, Ilya Ryakin, Azer Babaev, Vladimir Litvinenko, Roman Gorbachev
Abstract: We propose a method of improving detection precision (mAP) with the help of prior knowledge about the scene geometry: we assume the scene to be a plane with objects placed on it. We focus on autonomous robots: given the robot's dimensions and the inclination angles of the camera, it is possible to predict the spatial scale for each pixel of the input frame. With a slightly modified YOLOv3-tiny, we demonstrate that detection supplemented by the scale channel, hereafter referred to as S, outperforms standard RGB-based detection with a small computational overhead.
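The abstract does not define the scale channel S precisely, so the following is only one plausible reading, sketched under stated assumptions: a pinhole camera with no roll, mounted at a known height above a flat ground plane and pitched downward by a known angle. For each image row, the viewing ray is intersected with the ground and the result is converted to an approximate metres-per-pixel value. The function name and all parameters are hypothetical.

```python
import math


def ground_scale_per_row(row, image_center_row, focal_px, camera_height_m, pitch_rad):
    """Approximate metres-per-pixel for a ground point seen at a given image row.

    Assumptions (not taken from the paper): pinhole camera, zero roll,
    flat ground, pitch measured downward from the horizontal.
    Returns None for rows whose viewing ray never reaches the ground.
    """
    # Angle of this row's ray relative to the optical axis (positive = lower in image).
    ray_offset = math.atan((row - image_center_row) / focal_px)
    # Angle of the ray below the horizontal.
    depression = pitch_rad + ray_offset
    if depression <= 0:
        return None  # at or above the horizon
    # Distance along the ray to the ground plane.
    ray_length = camera_height_m / math.sin(depression)
    # Depth along the optical axis, then pinhole scale: metres = pixels * depth / focal.
    depth = ray_length * math.cos(ray_offset)
    return depth / focal_px


# Hypothetical setup: 480-row image, 500 px focal length, camera 1 m above ground.
center_scale = ground_scale_per_row(240, 240, 500.0, 1.0, math.radians(45))
lower_scale = ground_scale_per_row(400, 240, 500.0, 1.0, math.radians(45))
sky_scale = ground_scale_per_row(0, 240, 500.0, 1.0, math.radians(10))
```

Evaluating this for every row would yield a single-channel map that could be stacked with the RGB input; rows above the horizon need a fill value chosen by the user.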

9. Self-supervised Temporal Discriminative Learning for Video Representation Learning
Jinpeng Wang, Yiqi Lin, Andy J. Ma, Pong C. Yuen
Abstract: Temporal cues in videos provide important information for recognizing actions accurately. However, temporal-discriminative features can hardly be extracted without using an annotated large-scale video action dataset for training. This paper proposes a novel Video-based Temporal-Discriminative Learning (VTDL) framework in a self-supervised manner. Without labelled data for network pretraining, a temporal triplet is generated for each anchor video by using segments of the same or a different time interval, so as to enhance the capacity for temporal feature representation. Measuring temporal information by the time derivative, Temporal Consistent Augmentation (TCA) is designed to ensure that the time derivative (in any order) of the augmented positive is invariant except for a scaling constant. Finally, temporal-discriminative features are learnt by minimizing the distance between each anchor and its augmented positive, while the distance between each anchor and its augmented negative, as well as other videos saved in the memory bank, is maximized to enrich the representation diversity. In the downstream action recognition task, the proposed method significantly outperforms existing related works. Surprisingly, the proposed self-supervised approach is better than fully-supervised methods on UCF101 and HMDB51 when a small-scale video dataset (with only thousands of videos) is used for pre-training. The code has been made publicly available on this https URL.

10. Duality Diagram Similarity: a generic framework for initialization selection in task transfer learning
Kshitij Dwivedi, Jiahui Huang, Radoslaw Martin Cichy, Gemma Roig
Abstract: In this paper, we tackle an open research question in transfer learning: selecting a model initialization to achieve high performance on a new task, given several pre-trained models. We propose a new, highly efficient and accurate approach based on duality diagram similarity (DDS) between deep neural networks (DNNs). DDS is a generic framework to represent and compare data of different feature dimensions. We validate our approach on the Taskonomy dataset by measuring the correspondence between actual transfer learning performance rankings on 17 Taskonomy tasks and predicted rankings. Computing DDS-based rankings for $17 \times 17$ transfers requires less than 2 minutes and shows a high correlation ($0.86$) with actual transfer learning rankings, outperforming state-of-the-art methods by a large margin ($10\%$) on the Taskonomy benchmark. We also demonstrate the robustness of our model selection approach on a new task, namely Pascal VOC semantic segmentation. Additionally, we show that our method can be applied to select the best layer locations within a DNN for transfer learning on 2D, 3D and semantic tasks on the NYUv2 and Pascal VOC datasets.

11. Point Proposal Network: Accelerating Point Source Detection Through Deep Learning
Duncan Tilley, Christopher W. Cleghorn, Kshitij Thorat, Roger Deane
Abstract: Point source detection techniques are used to identify and localise point sources in radio astronomical surveys. With the development of the Square Kilometre Array (SKA) telescope, survey images will see a massive increase in size from Gigapixels to Terapixels. Point source detection has already proven to be a challenge in recent surveys performed by SKA pathfinder telescopes. This paper proposes the Point Proposal Network (PPN): a point source detector that utilises deep convolutional neural networks for fast source detection. Results measured on simulated MeerKAT images show that, although less precise when compared to leading alternative approaches, PPN performs source detection faster and is able to scale to large images, unlike the alternative approaches.

12. Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition
Jinpeng Wang, Yiqi Lin, Andy J. Ma
Abstract: Self-supervised learning has shown great potential for improving deep learning models in an unsupervised manner by constructing surrogate supervision signals directly from unlabeled data. Different from existing works, we present a novel way to obtain the surrogate supervision signal based on high-level feature maps under consistency regularization. In this paper, we propose a Spatio-Temporal Consistency Regularization between different output features generated from a siamese network, which includes a clean path fed with the original video and a noise path fed with the corresponding augmented video. Based on the spatio-temporal characteristics of video, we develop two video-based data augmentation methods, i.e., Spatio-Temporal Transformation and Intra-Video Mixup. Consistency of the former is proposed to model the transformation consistency of features, while the latter aims at retaining spatial invariance to extract action-related features. Extensive experiments demonstrate that our method achieves substantial improvements compared with state-of-the-art self-supervised learning methods for action recognition. When using our method as an additional regularization term and combining it with current surrogate supervision signals, we achieve a 22% relative improvement over the previous state of the art on HMDB51 and 7% on UCF101.

13. Compact Graph Architecture for Speech Emotion Recognition
A. Shirian, T. Guha
Abstract: We propose a deep graph approach to address the task of speech emotion recognition. A compact, efficient and scalable way to represent data is in the form of graphs. Following the theory of graph signal processing, we propose to model a speech signal as a cycle graph or a line graph. Such a graph structure enables us to construct a graph convolution network (GCN)-based architecture that can perform an accurate graph convolution, in contrast to the approximate convolution used in standard GCNs. We evaluated the performance of our model for speech emotion recognition on the popular IEMOCAP database. Our model outperforms standard GCN and other relevant deep graph architectures, indicating the effectiveness of our approach. When compared with existing speech emotion recognition methods, our model achieves state-of-the-art performance (4-class, $65.29\%$) with significantly fewer learnable parameters.
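To make the two graph structures concrete, the sketch below builds adjacency matrices for a sequence of speech frames, connecting each frame to its temporal neighbours. It reads "line graph" as a simple chain of frames, which is an assumption on our part; the function name `frame_graph_adjacency` is also hypothetical, and the sketch does not reproduce the authors' architecture.

```python
def frame_graph_adjacency(num_frames, cyclic):
    """Adjacency matrix for a graph whose nodes are consecutive speech frames.

    cyclic=False: a chain (each frame linked to its predecessor and successor).
    cyclic=True: the same chain with the last frame also linked to the first.
    """
    adj = [[0] * num_frames for _ in range(num_frames)]
    for i in range(num_frames - 1):
        adj[i][i + 1] = 1
        adj[i + 1][i] = 1
    # Closing the cycle only adds a new edge when there are more than two nodes.
    if cyclic and num_frames > 2:
        adj[0][num_frames - 1] = 1
        adj[num_frames - 1][0] = 1
    return adj


line = frame_graph_adjacency(4, cyclic=False)
cycle = frame_graph_adjacency(4, cyclic=True)

# Node degrees: the chain has two end nodes of degree 1; the cycle is 2-regular.
line_degrees = [sum(row) for row in line]
cycle_degrees = [sum(row) for row in cycle]
```

The adjacency matrix of a cycle graph is circulant, which gives it a fixed, known spectral structure; that regularity is one reason such a graph is convenient for graph signal processing.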

14. Pose-based Modular Network for Human-Object Interaction Detection
Zhijun Liang, Junfa Liu, Yisheng Guan, Juan Rojas
Abstract: Human-object interaction (HOI) detection is a critical task in scene understanding. The goal is to infer the triplet <subject, predicate, object> in a scene. In this work, we note that the human pose itself, as well as the relative spatial information of the human pose with respect to the target object, can provide informative cues for HOI detection. We contribute a Pose-based Modular Network (PMN) which explores the absolute pose features and relative spatial pose features to improve HOI detection and is fully compatible with existing networks. Our module consists of a branch that first processes the relative spatial pose features of each joint independently. Another branch updates the absolute pose features via fully connected graph structures. The processed pose features are then fed into an action classifier. To evaluate our proposed method, we combine the module with the state-of-the-art model named VS-GATs and obtain significant improvement on two public benchmarks, V-COCO and HICO-DET, which shows its efficacy and flexibility. Code is available at this https URL.

15. Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes
Johanna Wald, Torsten Sattler, Stuart Golodetz, Tommaso Cavallari, Federico Tombari
Abstract: Long-term camera re-localization is an important task with numerous computer vision and robotics applications. Whilst various outdoor benchmarks exist that target lighting, weather and seasonal changes, far less attention has been paid to appearance changes that occur indoors. This has led to a mismatch between popular indoor benchmarks, which focus on static scenes, and indoor environments that are of interest for many real-world applications. In this paper, we adapt 3RScan - a recently introduced indoor RGB-D dataset designed for object instance re-localization - to create RIO10, a new long-term camera re-localization benchmark focused on indoor scenes. We propose new metrics for evaluating camera re-localization and explore how state-of-the-art camera re-localizers perform according to these metrics. We also examine in detail how different types of scene change affect the performance of different methods, based on novel ways of detecting such changes in a given RGB-D frame. Our results clearly show that long-term indoor re-localization is an unsolved problem. Our benchmark and tools are publicly available at this http URL

16. Fast top-K Cosine Similarity Search through XOR-Friendly Binary Quantization on GPUs
Xiaozheng Jian, Jianqiu Lu, Zexi Yuan, Ao Li
Abstract: We explore the use of GPUs for accelerating large-scale nearest neighbor search, and we propose a fast vector-quantization-based exhaustive nearest neighbor search algorithm, specifically designed for cosine similarity, that can achieve high accuracy without any index construction. This algorithm uses a novel XOR-friendly binary quantization method to encode floating-point numbers such that high-complexity multiplications can be optimized as low-complexity bitwise operations. Experiments show that our quantization method takes a short preprocessing time and helps make the search speed of our exhaustive search method much faster than that of popular approximate nearest neighbor algorithms when high accuracy is needed.
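The general idea of replacing multiplications with bitwise operations can be illustrated with the simplest possible scheme: keep one sign bit per dimension, so that the dot product of two sign vectors equals the dimension minus twice their Hamming distance, which is an XOR followed by a bit count. This generic 1-bit sketch is an assumption for illustration only; it is not the paper's quantization method, which the abstract does not specify, and it runs on the CPU rather than a GPU.

```python
def sign_bits(vector):
    """Pack the signs of a float vector into one integer (bit i set if x_i >= 0)."""
    code = 0
    for i, x in enumerate(vector):
        if x >= 0:
            code |= 1 << i
    return code


def binary_dot(code_a, code_b, dim):
    """Dot product of two +/-1 sign vectors from their packed codes.

    Matching signs contribute +1 and differing signs -1, so the result is
    dim - 2 * hamming_distance, with the distance computed by XOR + popcount.
    """
    hamming = bin(code_a ^ code_b).count("1")
    return dim - 2 * hamming


def top_k(query, database, k):
    """Exhaustive top-k search over sign codes (no index structure)."""
    dim = len(query)
    q = sign_bits(query)
    scores = [(binary_dot(q, sign_bits(v), dim), idx) for idx, v in enumerate(database)]
    # Highest score first; ties broken by the smaller index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores[:k]]


# Hypothetical toy data: three 4-dimensional vectors and one query.
database = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.8, 0.1],
    [0.6, 0.1, 0.2, -0.3],
]
query = [1.0, -0.1, 0.5, -0.6]
nearest = top_k(query, database, k=2)
```

With only one bit per dimension the ranking is coarse; a practical scheme would spend more bits per value to recover accuracy, while still keeping the inner loop bitwise.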

17. F2GAN: Fusing-and-Filling GAN for Few-shot Image Generation
Yan Hong, Li Niu, Jianfu Zhang, Weijie Zhao, Chen Fu, Liqing Zhang
Abstract: In order to generate images for a given category, existing deep generative models generally rely on abundant training images. However, extensive data acquisition is expensive, and the ability to learn quickly from limited data is required in real-world applications. Also, these existing methods are not well suited for fast adaptation to a new category. Few-shot image generation, which aims to generate images for a new category from only a few images, has attracted some research interest. In this paper, we propose a Fusing-and-Filling Generative Adversarial Network (F2GAN) to generate realistic and diverse images for a new category with only a few images. In our F2GAN, a fusion generator is designed to fuse the high-level features of conditional images with random interpolation coefficients, and then fill in attended low-level details with a non-local attention module to produce a new image. Moreover, our discriminator can ensure the diversity of generated images through a mode-seeking loss and an interpolation regression loss. Extensive experiments on five datasets demonstrate the effectiveness of our proposed method for few-shot image generation.

18. Subclass Contrastive Loss for Injured Face Recognition
Puspita Majumdar, Saheb Chhabra, Richa Singh, Mayank Vatsa
Abstract: Deaths and injuries are common in road accidents, violence, and natural disasters. In such cases, one of the main tasks of responders is to retrieve the identity of the victims to reunite families and ensure proper identification of deceased/injured individuals. Apart from this, identification of unidentified dead bodies due to violence and accidents is crucial for the police investigation. In the absence of identification cards, current practices for this task include DNA profiling and dental profiling. The face is one of the most commonly used and widely accepted biometric modalities for recognition. However, face recognition is challenging in the presence of facial injuries such as swelling, bruises, blood clots, laceration, and avulsion, which affect the features used in recognition. In this paper, for the first time, we address the problem of injured face recognition and propose a novel Subclass Contrastive Loss (SCL) for this task. A novel database, termed the Injured Face (IF) database, is also created to instigate research in this direction. Experimental analysis shows that the proposed loss function surpasses existing algorithms for injured face recognition.

19. MultiCheXNet: A Multi-Task Learning Deep Network For Pneumonia-like Diseases Diagnosis From X-ray Scans
Abdullah Tarek Farag, Ahmed Raafat Abd El-Wahab, Mahmoud Nada, Mohamed Yasser Abd El-Hakeem, Omar Sayed Mahmoud, Reem Khaled Rashwan, Ahmad El Sallab
Abstract: We present MultiCheXNet, an end-to-end multi-task learning (MTL) model that is able to take advantage of different X-ray datasets of pneumonia-like diseases in one neural architecture, performing three tasks at the same time: diagnosis, segmentation and localization. The common encoder in our architecture can capture useful common features present in the different tasks. The common encoder has the further advantage of efficient computation, which speeds up inference compared to separate models. The specialized decoder heads can then capture the task-specific features. We employ teacher forcing to address the issue of negative samples that hurt the segmentation and localization performance. Finally, we employ transfer learning to fine-tune the classifier on unseen pneumonia-like diseases. The MTL architecture can be trained on joint or disjoint labeled datasets. The training of the architecture follows a carefully designed protocol that pre-trains different sub-models on specialized datasets before they are integrated in the joint MTL model. Our experimental setup involves a variety of datasets, where the baseline performance of the three tasks is compared to the MTL architecture performance. Moreover, we evaluate the transfer learning mode on a COVID-19 dataset, both from the individual classifier model and from the MTL architecture classification head.

20. A feature-supervised generative adversarial network for environmental monitoring during hazy days
Ke Wang, Siyuan Zhang, Junlan Chen, Fan Ren, Lei Xiao
Abstract: Adverse haze weather conditions have brought considerable difficulties to vision-based environmental applications. Until now, most existing environmental monitoring studies have been conducted under ordinary conditions, and complex haze weather conditions have been ignored. Hence, this paper proposes a feature-supervised learning network based on generative adversarial networks (GAN) for environmental monitoring during hazy days. Its main idea is to train the model under the supervision of feature maps from the ground truth. Four key technical contributions are made in the paper. First, pairs of hazy and clean images are used as inputs to supervise the encoding process and obtain high-quality feature maps. Second, the basic GAN formulation is modified by introducing perception loss, style loss, and feature regularization loss to generate better results. Third, multi-scale images are applied as the input to enhance the performance of the discriminator. Finally, a hazy remote sensing dataset is created for testing our dehazing method and environmental detection. Extensive experimental results show that the proposed method achieves better performance than current state-of-the-art methods on both synthetic datasets and real-world remote sensing images.

21. COALESCE: Component Assembly by Learning to Synthesize Connections
Kangxue Yin, Zhiqin Chen, Siddhartha Chaudhuri, Matthew Fisher, Vladimir Kim, Hao Zhang
Abstract: We introduce COALESCE, the first data-driven framework for component-based shape assembly which employs deep learning to synthesize part connections. To handle geometric and topological mismatches between parts, we remove the mismatched portions via erosion, and rely on a joint synthesis step, which is learned from data, to fill the gap and arrive at a natural and plausible part joint. Given a set of input parts extracted from different objects, COALESCE automatically aligns them and synthesizes plausible joints to connect the parts into a coherent 3D object represented by a mesh. The joint synthesis network, designed to focus on joint regions, reconstructs the surface between the parts by predicting an implicit shape representation that agrees with existing parts, while generating a smooth and topologically meaningful connection. We employ test-time optimization to further ensure that the synthesized joint region closely aligns with the input parts to create realistic component assemblies from diverse input parts. We demonstrate that our method significantly outperforms prior approaches including baseline deep models for 3D shape synthesis, as well as state-of-the-art methods for shape completion.

22. Component Divide-and-Conquer for Real-World Image Super-Resolution
Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, Liang Lin
Abstract: In this paper, we present a large-scale Diverse Real-world image Super-Resolution dataset, i.e., DRealSR, as well as a divide-and-conquer Super-Resolution (SR) network, exploring the utility of guiding an SR model with low-level image components. DRealSR establishes a new SR benchmark with diverse real-world degradation processes, mitigating the limitations of conventional simulated image degradation. In general, the targets of SR vary across image regions with different low-level image components, e.g., preserving smoothness for flat regions, sharpening edges, and enhancing detail for textures. An SR model learned with a conventional pixel-wise loss is usually dominated by flat regions and edges, and fails to infer realistic details of complex textures. We propose a Component Divide-and-Conquer (CDC) model and a Gradient-Weighted (GW) loss for SR. Our CDC parses an image into three components, employs three Component-Attentive Blocks (CABs) to learn attentive masks and intermediate SR predictions with an intermediate supervision learning strategy, and trains an SR model following a divide-and-conquer learning principle. Our GW loss also provides a feasible way to balance the difficulties of image components for SR. Extensive experiments validate the superior performance of our CDC and the challenging aspects of our DRealSR dataset related to diverse real-world scenarios. Our dataset and codes are publicly available at this https URL.

23. Graph Signal Processing for Geometric Data and Beyond: Theory and Applications
Wei Hu, Jiahao Pang, Xianming Liu, Dong Tian, Chia-Wen Lin, Anthony Vetro
Abstract: Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited, while Graph Signal Processing (GSP)---a fast-developing field in the signal processing community---enables processing signals that reside on irregular domains and plays a critical role in numerous applications of geometric data from low-level processing to high-level analysis. To further advance the research in this field, we provide the first timely and comprehensive overview of GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs, among the various geometric data modalities, and with spectral/nodal graph filtering techniques. We also discuss the recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP. We conclude with a brief discussion of open problems and challenges.

24. Multimodality Biomedical Image Registration using Free Point Transformer Networks
Zachary M. C. Baum, Yipeng Hu, Dean C. Barratt
Abstract: We describe a point-set registration algorithm based on a novel free point transformer (FPT) network, designed for points extracted from multimodal biomedical images for registration tasks, such as those frequently encountered in ultrasound-guided interventional procedures. FPT is constructed with a global feature extractor which accepts unordered source and target point-sets of variable size. The extracted features are conditioned by a shared multilayer perceptron point transformer module to predict a displacement vector for each source point, transforming it into the target space. The point transformer module assumes no vicinity or smoothness in predicting spatial transformation and, together with the global feature extractor, is trained in a data-driven fashion with an unsupervised loss function. In a multimodal registration task using prostate MR and sparsely acquired ultrasound images, FPT yields comparable or improved results over other rigid and non-rigid registration methods. This demonstrates the versatility of FPT to learn registration directly from real, clinical training data and to generalize to a challenging task, such as the interventional application presented.

25. Synthetic to Real Unsupervised Domain Adaptation for Single-Stage Artwork Recognition in Cultural Sites
Giovanni Pasqualino, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella
Abstract: Recognizing artworks in a cultural site using images acquired from the user's point of view (First Person Vision) makes it possible to build interesting applications for both the visitors and the site managers. However, current object detection algorithms working in fully supervised settings need to be trained with large quantities of labeled data, whose collection requires a lot of time and high costs in order to achieve good performance. Using synthetic data generated from the 3D model of the cultural site to train the algorithms can reduce these costs. On the other hand, when these models are tested with real images, a significant drop in performance is observed due to the differences between real and synthetic images. In this study we consider the problem of Unsupervised Domain Adaptation for object detection in cultural sites. To address this problem, we created a new dataset containing both synthetic and real images of 16 different artworks. We then investigated different domain adaptation techniques based on one-stage and two-stage object detectors, image-to-image translation, and feature alignment. Based on the observation that single-stage detectors are more robust to the domain shift in the considered settings, we proposed a new method based on RetinaNet and feature alignment that we called DA-RetinaNet. The proposed approach achieves better results than the compared methods. To support research in this field we release the dataset at this https URL and the code of the proposed architecture at this https URL.

26. Implicit Saliency in Deep Neural Networks
Yutong Sun, Mohit Prabhushankar, Ghassan AlRegib
Abstract: In this paper, we show that existing deep architectures for recognition and localization, which have not been exposed to eye tracking data or any saliency datasets, are capable of predicting human visual saliency. We term this implicit saliency in deep neural networks. We calculate this implicit saliency using the expectancy-mismatch hypothesis in an unsupervised fashion. Our experiments show that extracting saliency in this fashion provides comparable performance when measured against state-of-the-art supervised algorithms. Additionally, its robustness exceeds that of those algorithms when we add large noise to the input images. We also show that semantic features contribute more than low-level features to human visual saliency detection.

27. From Human Mesenchymal Stromal Cells to Osteosarcoma Cells Classification by Deep Learning
Mario D'Acunto, Massimo Martinelli, Davide Moroni
Abstract: Early diagnosis of cancer often allows for a wider choice of therapy options. After a cancer diagnosis, staging provides essential information about the extent of disease in the body and the expected response to a particular treatment. The importance of classifying cancer patients at an early stage into high-risk or low-risk groups has led many research teams, from both the biomedical and the bioinformatics fields, to study the application of Deep Learning (DL) methods. The ability of DL to detect critical features in complex datasets is a significant achievement for early diagnosis and for tracking cancer cell progression. In this paper, we focus on osteosarcoma. Osteosarcoma is one of the primary malignant bone tumors, and it usually afflicts people in adolescence. Our contribution to the classification of osteosarcoma cells is as follows: a DL approach is applied to discriminate human Mesenchymal Stromal Cells (MSCs) from osteosarcoma cells and to classify the different cell populations under investigation. Glass slides of different cell populations were cultured, including MSCs, MSCs differentiated into healthy bone cells (osteoblasts), and osteosarcoma cells, either as single cell populations or mixed. Images of such samples of isolated cells (single-type or mixed) are recorded with traditional optical microscopy. DL is then applied to identify and classify single cells. Proper data augmentation techniques and cross-fold validation are used to assess the capabilities of a convolutional neural network to address the cell detection and classification problem. Based on the results obtained on individual cells, and on the versatility and scalability of our DL approach, the next step will be its application to discriminate and classify healthy or cancer tissues to advance digital pathology.

28. Importance of Self-Consistency in Active Learning for Semantic Segmentation
S. Alireza Golestaneh, Kris M. Kitani
Abstract: We address the task of active learning in the context of semantic segmentation and show that self-consistency can be a powerful source of self-supervision to greatly improve the performance of a data-driven model with access to only a small amount of labeled data. Self-consistency uses the simple observation that the results of semantic segmentation for a specific image should not change under transformations like horizontal flipping (i.e., the results should only be flipped). In other words, the output of a model should be consistent under equivariant transformations. The self-supervisory signal of self-consistency is particularly helpful during active learning since the model is prone to overfitting when there is only a small amount of labeled training data. In our proposed active learning framework, we iteratively extract small image patches that need to be labeled, by selecting image patches that have high uncertainty (high entropy) under equivariant transformations. We enforce pixel-wise self-consistency between the outputs of segmentation network for each image and its transformation (horizontally flipped) to utilize the rich self-supervisory information and reduce the uncertainty of the network. In this way, we are able to find the image patches over which the current model struggles the most to classify. By iteratively training over these difficult image patches, our experiments show that our active learning approach reaches $\sim96\%$ of the top performance of a model trained on all data, by using only $12\%$ of the total data on benchmark semantic segmentation datasets (e.g., CamVid and Cityscapes).
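The two quantities this abstract relies on, pixel-wise agreement between the prediction for an image and the prediction for its horizontal flip, and per-pixel entropy as an uncertainty score, can be illustrated with a minimal sketch. The function names, the squared-error form of the consistency term, and the averaging of the two predictions before taking the entropy are assumptions made for illustration; the abstract does not specify them.

```python
import math

def hflip(grid):
    """Flip a 2-D grid (a list of rows) left to right."""
    return [row[::-1] for row in grid]

def entropy(p):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def consistency_and_uncertainty(probs, probs_of_flipped):
    """probs: per-pixel class probabilities predicted for the image.
    probs_of_flipped: prediction for the horizontally flipped image.
    Returns (mean squared disagreement, per-pixel entropy map).
    The squared-error term is an illustrative assumption."""
    aligned = hflip(probs_of_flipped)  # undo the flip so pixels line up
    sq_err, count, ent_map = 0.0, 0, []
    for row_a, row_b in zip(probs, aligned):
        ent_row = []
        for pa, pb in zip(row_a, row_b):
            sq_err += sum((a - b) ** 2 for a, b in zip(pa, pb))
            count += len(pa)
            mean_p = [(a + b) / 2 for a, b in zip(pa, pb)]
            ent_row.append(entropy(mean_p))
        ent_map.append(ent_row)
    return sq_err / count, ent_map

# One row, two pixels, two classes; the flipped prediction is consistent.
pred = [[[0.9, 0.1], [0.5, 0.5]]]
pred_flipped = [[[0.5, 0.5], [0.9, 0.1]]]
loss, ent = consistency_and_uncertainty(pred, pred_flipped)
```

In an active learning loop of the kind described, patches whose entropy map is highest would be the candidates sent for labeling; how patches are aggregated and ranked is not stated in the abstract.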

29. Deep Multi Depth Panoramas for View Synthesis
Kai-En Lin, Zexiang Xu, Ben Mildenhall, Pratul P. Srinivasan, Yannick Hold-Geoffroy, Stephen DiVerdi, Qi Sun, Kalyan Sunkavalli, Ravi Ramamoorthi
Abstract: We propose a learning-based approach for novel view synthesis for multi-camera 360$^{\circ}$ panorama capture rigs. Previous work constructs RGBD panoramas from such data, allowing for view synthesis with small amounts of translation, but cannot handle the disocclusions and view-dependent effects that are caused by large translations. To address this issue, we present a novel scene representation - Multi Depth Panorama (MDP) - that consists of multiple RGBD$\alpha$ panoramas that represent both scene geometry and appearance. We demonstrate a deep neural network-based method to reconstruct MDPs from multi-camera 360$^{\circ}$ images. MDPs are more compact than previous 3D scene representations and enable high-quality, efficient new view rendering. We demonstrate this via experiments on both synthetic and real data and comparisons with previous state-of-the-art methods spanning both learning-based approaches and classical RGBD-based methods.

30. High resolution neural texture synthesis with long range constraints
Nicolas Gonthier, Yann Gousseau, Saïd Ladjal
Abstract: The field of texture synthesis has witnessed important progress in recent years, most notably through the use of Convolutional Neural Networks. However, neural synthesis methods still struggle to reproduce large-scale structures, especially with high-resolution textures. To address this issue, we first introduce a simple multi-resolution framework that efficiently accounts for long-range dependency. Then, we show that additional statistical constraints further improve the reproduction of textures with strong regularity. This can be achieved by constraining both the Gram matrices of a neural network and the power spectrum of the image. Alternatively, one may constrain only the autocorrelation of the features of the network and drop the Gram matrix constraints. In the experimental part, the proposed methods are extensively tested and compared to alternative approaches, both in an unsupervised way and through a user study. Experiments show the benefit of the multi-scale scheme for high-resolution textures and of combining it with additional constraints for regular textures.
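As background for the Gram matrix constraint mentioned in this abstract, the following sketch computes the Gram matrix of a set of feature channels and a squared-distance loss between two such matrices. It is a generic illustration, not the authors' implementation; the function names and the 1/N normalization are assumptions.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of N activations.
    Returns the C x C matrix G[i][j] = (1/N) * sum_x F_i[x] * F_j[x]."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]

def gram_loss(feats_synth, feats_ref):
    """Squared Frobenius distance between the two Gram matrices."""
    g_s, g_r = gram_matrix(feats_synth), gram_matrix(feats_ref)
    return sum((s - r) ** 2
               for row_s, row_r in zip(g_s, g_r)
               for s, r in zip(row_s, row_r))

# Two channels observed at two spatial positions.
feats = [[1.0, 2.0], [3.0, 4.0]]
# The same activations with the two spatial positions swapped.
swapped = [[2.0, 1.0], [4.0, 3.0]]
```

Because the sum runs over positions, `gram_loss(feats, swapped)` is zero: the Gram matrix does not change when spatial positions are permuted. This is one way to see why Gram statistics alone carry no information about long-range spatial arrangement, which the power spectrum or autocorrelation constraints named in the abstract are meant to supply.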

31. Deep Learning Based Early Diagnostics of Parkinsons Disease
Elcin Huseyn
Abstract: Worldwide, about 7 to 10 million elderly people suffer from Parkinson's Disease (PD). Parkinson's disease is a common neurodegenerative disease, and its clinical characteristics are tremors, rigidity, bradykinesia, and decreased autonomy. Its clinical manifestations are very similar to those of Multiple System Atrophy (MSA). Studies have shown that patients with Parkinson's disease have often reached an irreparable condition by the time they are diagnosed, so researchers are constantly exploring new methods to distinguish Parkinson's disease from MSA and to obtain an early diagnosis. With the advent of the era of big data, deep learning has made major breakthroughs in image recognition and classification. Therefore, this study proposes using a deep learning method to distinguish Parkinson's disease, multiple system atrophy, and healthy subjects. The data come from Istanbul University Cerrahpasa Faculty of Medicine Hospital. The processing of the original magnetic resonance images (MRI) was guided by a doctor from Istanbul University Cerrahpasa Faculty of Medicine Hospital. The focus of this experiment is to improve an existing neural network so that it obtains good results in medical image recognition and diagnosis. An improved algorithm was proposed based on the pathological characteristics of Parkinson's disease, and good experimental results were obtained by comparing indicators such as model loss and accuracy.

32. Entropy Guided Adversarial Model for Weakly Supervised Object Localization
Sabrina Narimene Benassou, Wuzhen Shi, Feng Jiang
Abstract: Weakly Supervised Object Localization is challenging because of the lack of bounding box annotations. Previous works tend to generate a class activation map (CAM) to localize the object. Unfortunately, the network activates only the features that discriminate the object and does not activate the whole object. Some methods remove parts of the object to force the CNN to detect other features, whereas others change the network structure to generate multiple CAMs from different levels of the model. In this article, we propose to take advantage of the generalization ability of the network and train the model using clean examples and adversarial examples to localize the whole object. Adversarial examples are images to which a perturbation has been added, and they are typically used to train robust models. To achieve good classification accuracy, the CNN trained with adversarial examples is forced to detect more features that discriminate the object. We further propose to apply the Shannon entropy to the CAMs generated by the network to guide it during training. Our method neither erases any part of the image nor changes the network architecture, and extensive experiments show that our Entropy Guided Adversarial model (EGA model) improves performance on state-of-the-art benchmarks for both localization and classification accuracy.
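To make the entropy term concrete, here is a minimal sketch of the Shannon entropy of a class activation map. Normalizing the map so that it sums to one and treating it as a distribution over pixels is an assumption made for illustration; the abstract does not say how the entropy is computed or how it enters the training objective. The function name `cam_entropy` is hypothetical.

```python
import math

def cam_entropy(cam):
    """cam: 2-D list of non-negative activations.
    Treats the normalized map as a probability distribution over pixels
    (an assumption) and returns its Shannon entropy in nats."""
    flat = [v for row in cam for v in row]
    total = sum(flat)
    if total <= 0:
        raise ValueError("CAM must contain positive activations")
    return -sum((v / total) * math.log(v / total) for v in flat if v > 0)

# A map concentrated on one pixel versus one spread over the whole patch.
peaked = [[1.0, 0.0], [0.0, 0.0]]
spread = [[1.0, 1.0], [1.0, 1.0]]
```

Under this normalization, a map that fires on a single discriminative pixel has zero entropy, while a map that covers the patch evenly reaches the maximum, log of the number of pixels. A higher value therefore corresponds to activation spread over more of the object.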

33. PAI-BPR: Personalized Outfit Recommendation Scheme with Attribute-wise Interpretability
Dikshant Sagar, Jatin Garg, Prarthana Kansal, Sejal Bhalla, Rajiv Ratn Shah, Yi Yu
Abstract: Fashion is an important part of human experience. Events such as interviews, meetings, marriages, etc. are often based on clothing styles. The rise of the fashion industry and its effect on social influencing have made outfit compatibility a need. This calls for an outfit compatibility model to aid people in clothing recommendation. However, due to the highly subjective nature of compatibility, it is necessary to account for personalization. Our paper devises an attribute-wise interpretable compatibility scheme with personal preference modelling, which captures user-item interaction along with general item-item interaction. Our work solves the problem of interpretability in clothing matching by locating the discordant and harmonious attributes between fashion items. Extensive experimental results on IQON3000, a publicly available real-world dataset, verify the effectiveness of the proposed model.

34. Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs
Robin Rombach, Patrick Esser, Björn Ommer
Abstract: To tackle increasingly complex tasks, it has become an essential ability of neural networks to learn abstract representations. These task-specific representations and, particularly, the invariances they capture turn neural networks into black box models that lack interpretability. To open such a black box, it is, therefore, crucial to uncover the different semantic concepts a model has learned as well as those that it has learned to be invariant to. We present an approach based on INNs that (i) recovers the task-specific, learned invariances by disentangling the remaining factor of variation in the data and that (ii) invertibly transforms these recovered invariances combined with the model representation into an equally expressive one with accessible semantic concepts. As a consequence, neural network representations become understandable by providing the means to (i) expose their semantic meaning, (ii) semantically modify a representation, and (iii) visualize individual learned semantic concepts and invariances. Our invertible approach significantly extends the abilities to understand black box models by enabling post-hoc interpretations of state-of-the-art networks without compromising their performance. Our implementation is available at this https URL .

35. COVID-19 in CXR: from Detection and Severity Scoring to Patient Disease Monitoring
Rula Amer, Maayan Frid-Adar, Ophir Gozes, Jannette Nassar, Hayit Greenspan
Abstract: In this work, we estimate the severity of pneumonia in COVID-19 patients and conduct a longitudinal study of disease progression. To achieve this goal, we developed a deep learning model for simultaneous detection and segmentation of pneumonia in chest X-ray (CXR) images and generalized it to COVID-19 pneumonia. The segmentations were used to calculate a "Pneumonia Ratio", which indicates the disease severity. Measuring disease severity makes it possible to build a disease extent profile over time for hospitalized patients. To validate the model's relevance to the patient monitoring task, we developed a validation strategy that involves synthesizing Digital Reconstructed Radiographs (DRRs, i.e., synthetic X-rays) from serial CT scans; we then compared the disease progression profiles generated from the DRRs to those generated from the CT volumes.
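The abstract does not define the "Pneumonia Ratio". One plausible reading, shown below purely as an assumption, is the fraction of the lung area that the pneumonia segmentation covers. The function name, the use of a separate lung mask, and the binary-mask representation are all hypothetical.

```python
def pneumonia_ratio(pneumonia_mask, lung_mask):
    """Both masks are 2-D lists of 0/1 values with the same shape.
    Returns (pneumonia pixels inside the lungs) / (lung pixels).
    This definition is an assumption, not taken from the abstract."""
    lung = sum(l for row in lung_mask for l in row)
    if lung == 0:
        raise ValueError("empty lung mask")
    inside = sum(p * l
                 for row_p, row_l in zip(pneumonia_mask, lung_mask)
                 for p, l in zip(row_p, row_l))
    return inside / lung

# A toy disease extent profile: one ratio per exam of the same patient.
lung = [[1, 1], [1, 1]]
exams = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
]
profile = [pneumonia_ratio(mask, lung) for mask in exams]
```

Computing one such value per exam and ordering the values by acquisition date gives the kind of severity-over-time profile the abstract describes for hospitalized patients.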

36. Structure Preserving Stain Normalization of Histopathology Images Using Self-Supervised Semantic Guidance
Dwarikanath Mahapatra, Behzad Bozorgtabar, Jean-Philippe Thiran, Ling Shao
Abstract: Although generative adversarial network (GAN) based style transfer is state of the art in histopathology color-stain normalization, such methods do not explicitly integrate structural information of tissues. We propose a self-supervised approach to incorporate semantic guidance into a GAN-based stain normalization framework and preserve detailed structural information. Our method does not require manual segmentation maps, which is a significant advantage over existing methods. We integrate semantic information at different layers between a pre-trained semantic network and the stain color normalization network. The proposed scheme outperforms other color normalization methods, leading to better classification and segmentation performance.

37. Learning Boost by Exploiting the Auxiliary Task in Multi-task Domain
Jonghwa Yim, Sang Hwan Kim
Abstract: Learning two tasks in a single shared function has some benefits. First, by acquiring information from the second task, the shared function leverages useful information that could have been neglected or underestimated in the first task. Second, it helps to generalize the function, which can be learned using information that is generally applicable to both tasks. To fully enjoy these benefits, Multi-task Learning (MTL) has long been researched in various domains such as computer vision, language understanding, and speech synthesis. While MTL benefits from the positive transfer of information from multiple tasks, in a real environment tasks inevitably conflict with each other during the learning phase, which is called negative transfer. Negative transfer prevents the function from reaching optimality and degrades performance. To solve the problem of task conflict, previous works have only suggested partial, ad hoc solutions rather than fundamental ones. A common approach is to use a weighted sum of losses, with the weights adjusted to induce positive transfer. Paradoxically, this kind of solution acknowledges the problem of negative transfer but cannot remove it unless the weight of the task is set to zero. Therefore, these previous methods had limited success. In this paper, we introduce a novel approach that can drive positive transfer and suppress negative transfer by leveraging class-wise weights in the learning process. The weights act as an arbitrator of the fundamental unit of information to determine its positive or negative status with respect to the main task.
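The weighted-sum baseline that this abstract criticizes can be written in a few lines. The sketch below shows only that baseline, not the class-wise weighting the authors propose, whose details are not given here; the function name and the example loss values are hypothetical.

```python
def weighted_total_loss(task_losses, task_weights):
    """Combine per-task losses into one training objective:
    total = sum_t w_t * loss_t (the common MTL baseline)."""
    if len(task_losses) != len(task_weights):
        raise ValueError("one weight per task is required")
    return sum(w * l for w, l in zip(task_weights, task_losses))

main_loss, aux_loss = 2.0, 4.0

# The auxiliary task contributes in proportion to its weight ...
with_aux = weighted_total_loss([main_loss, aux_loss], [1.0, 0.5])

# ... and stops contributing only when that weight is zero, which also
# discards whatever positive transfer the task offered.
without_aux = weighted_total_loss([main_loss, aux_loss], [1.0, 0.0])
```

A single scalar per task scales helpful and harmful parts of the auxiliary signal together, which is the limitation the abstract points to when it argues for weights at a finer, class-wise granularity.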

38. Extracting and Leveraging Nodule Features with Lung Inpainting for Local Feature Augmentation
Sebastian Guendel, Arnaud Arindra Adiyoso Setio, Sasa Grbic, Andreas Maier, Dorin Comaniciu
Abstract: Chest X-ray (CXR) is the most common examination for fast detection of pulmonary abnormalities. Recently, automated algorithms have been developed to classify multiple diseases and abnormalities in CXR scans. However, because of the limited availability of scans containing nodules and the subtle properties of nodules in CXRs, state-of-the-art methods do not perform well on nodule classification. To create additional data for the training process, standard augmentation techniques are applied. However, the variance introduced by these methods is limited, as the images are typically modified globally. In this paper, we propose a method for local feature augmentation by extracting local nodule features using a generative inpainting network. The network is applied to generate realistic, healthy tissue and structures in patches containing nodules. The nodules are entirely removed in the inpainted representation. The nodule features are extracted by subtracting the inpainted patch from the nodule patch. With arbitrary displacement of the extracted nodules in the lung area across different CXR scans and further local modifications during training, we significantly increase the nodule classification performance and outperform state-of-the-art augmentation methods.
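The extraction step described in this abstract, subtracting the inpainted patch from the nodule patch and then placing the result elsewhere, can be sketched as follows. The inpainted patch would come from the generative inpainting network; here it is simply given as input. The function names and the additive pasting are assumptions made for illustration.

```python
def extract_nodule(nodule_patch, inpainted_patch):
    """Residual = patch containing the nodule minus its inpainted,
    nodule-free version. Both are 2-D lists of the same shape."""
    return [[a - b for a, b in zip(row_n, row_i)]
            for row_n, row_i in zip(nodule_patch, inpainted_patch)]

def paste_nodule(image, residual, top, left):
    """Return a copy of `image` with `residual` added at (top, left).
    Additive blending is an illustrative assumption."""
    out = [row[:] for row in image]
    for r, res_row in enumerate(residual):
        for c, v in enumerate(res_row):
            out[top + r][left + c] += v
    return out

nodule_patch = [[5, 7], [6, 5]]      # patch with a nodule
inpainted_patch = [[5, 5], [5, 5]]   # same patch, nodule removed
residual = extract_nodule(nodule_patch, inpainted_patch)

# Move the nodule into a different scan at an arbitrary location.
other_scan = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
augmented = paste_nodule(other_scan, residual, top=1, left=1)
```

Because the residual is zero wherever the inpainting reproduces the original tissue, only the nodule itself is transferred, which is what makes the augmentation local rather than global.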

39. Multiple Sclerosis Lesion Activity Segmentation with Attention-Guided Two-Path CNNs [PDF] Back to contents
Nils Gessert, Julia Krüger, Roland Opfer, Ann-Christin Ostwaldt, Praveena Manogaran, Hagen H. Kitzler, Sven Schippling, Alexander Schlaefer
Abstract: Multiple sclerosis is an inflammatory autoimmune demyelinating disease that is characterized by lesions in the central nervous system. Typically, magnetic resonance imaging (MRI) is used for tracking disease progression. Automatic image processing methods can be used to segment lesions and derive quantitative lesion parameters. So far, methods have focused on lesion segmentation for individual MRI scans. However, for monitoring disease progression, lesion activity in terms of new and enlarging lesions between two time points is a crucial biomarker. For this problem, several classic methods have been proposed, e.g., using difference volumes. Despite their success for single-volume lesion segmentation, deep learning approaches are still rare for lesion activity segmentation. In this work, convolutional neural networks (CNNs) are studied for lesion activity segmentation from two time points. For this task, CNNs are designed and evaluated that combine the information from the two time points in different ways. In particular, two-path architectures with attention-guided interactions are proposed that enable effective information exchange between the two time points' processing paths. It is demonstrated that deep learning-based methods outperform classic approaches, and it is shown that attention-guided interactions significantly improve performance. Furthermore, the attention modules produce plausible attention maps that have a masking effect that suppresses old, irrelevant lesions. A lesion-wise false positive rate of 26.4% is achieved at a true positive rate of 74.2%, which is not significantly different from the interrater performance.
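Summary: Multiple sclerosis is an inflammatory autoimmune demyelinating disease characterized by lesions in the central nervous system, and MRI is typically used to track its progression. Existing automatic methods focus on lesion segmentation in individual scans, whereas lesion activity, meaning new and enlarging lesions between two time points, is the crucial biomarker for monitoring progression. Classic methods for this problem use, for example, difference volumes, and deep learning approaches remain rare. This work studies CNNs that combine information from two time points in different ways and proposes two-path architectures with attention-guided interactions that exchange information between the two processing paths. Deep learning methods outperform classic approaches, attention-guided interactions significantly improve performance, and the attention maps plausibly mask old, irrelevant lesions. The method reaches a lesion-wise false positive rate of 26.4% at a true positive rate of 74.2%, which is not significantly different from interrater performance.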

40. Unsupervised seismic facies classification using deep convolutional autoencoder [PDF] Back to contents
Vladimir Puzyrev, Chris Elders
Abstract: With the increased size and complexity of seismic surveys, manual labeling of seismic facies has become a significant challenge. Application of automatic methods for seismic facies interpretation could significantly reduce the manual labor and subjectivity of a particular interpreter present in conventional methods. A recently emerged group of methods is based on deep neural networks. These approaches are data-driven and require large labeled datasets for network training. We apply a deep convolutional autoencoder for unsupervised seismic facies classification, which does not require manually labeled examples. The facies maps are generated by clustering the deep-feature vectors obtained from the input data. Our method yields accurate results on real data and provides them instantaneously. The proposed approach opens up possibilities to analyze geological patterns in real time without human intervention.
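Summary: As seismic surveys grow in size and complexity, manual labeling of seismic facies has become a significant challenge. Automatic interpretation could significantly reduce manual labor and interpreter subjectivity, but recent methods based on deep neural networks are data-driven and require large labeled datasets. The authors apply a deep convolutional autoencoder for unsupervised seismic facies classification, which needs no manually labeled examples: facies maps are generated by clustering the deep-feature vectors obtained from the input data. The method yields accurate results on real data instantaneously, opening up real-time analysis of geological patterns without human intervention.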

41. More Than Accuracy: Towards Trustworthy Machine Learning Interfaces for Object Recognition [PDF] Back to contents
Hendrik Heuer, Andreas Breiter
Abstract: This paper investigates the user experience of visualizations of a machine learning (ML) system that recognizes objects in images. This is important since even good systems can fail in unexpected ways as misclassifications on photo-sharing websites showed. In our study, we exposed users with a background in ML to three visualizations of three systems with different levels of accuracy. In interviews, we explored how the visualization helped users assess the accuracy of systems in use and how the visualization and the accuracy of the system affected trust and reliance. We found that participants do not only focus on accuracy when assessing ML systems. They also take the perceived plausibility and severity of misclassification into account and prefer seeing the probability of predictions. Semantically plausible errors are judged as less severe than errors that are implausible, which means that system accuracy could be communicated through the types of errors.
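Summary: This paper investigates the user experience of visualizations of a machine learning (ML) system that recognizes objects in images, which matters because even good systems can fail in unexpected ways, as misclassifications on photo-sharing websites have shown. Users with an ML background were shown three visualizations of three systems with different accuracy levels. Interviews explored how the visualizations helped users assess accuracy and how visualization and accuracy affected trust and reliance. Participants did not focus on accuracy alone: they also considered the perceived plausibility and severity of misclassifications and preferred to see prediction probabilities. Semantically plausible errors were judged less severe than implausible ones, which suggests that system accuracy could be communicated through the types of errors.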

42. Hierarchical Amortized Training for Memory-efficient High Resolution 3D GAN [PDF] Back to contents
Li Sun, Junxiang Chen, Yanwu Xu, Mingming Gong, Ke Yu, Kayhan Batmanghelich
Abstract: Generative Adversarial Networks (GANs) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation. Due to the limited embedded memory of Graphical Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images. In this work, we propose a novel end-to-end GAN architecture that can generate high-resolution 3D images. We achieve this goal by separating training and inference. During training, we adopt a hierarchical structure that simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image. The hierarchical design has two advantages. First, the memory demand for training on high-resolution images is amortized among sub-volumes. Second, anchoring the high-resolution sub-volumes to a single low-resolution image ensures anatomical consistency between sub-volumes. During inference, our model can directly generate full high-resolution images. We also incorporate an encoder with a similar hierarchical structure into the model to extract features from the images. Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms the state of the art in image generation, image reconstruction, and prediction of clinically relevant variables.
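Summary: Generative adversarial networks (GANs) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation, but limited GPU memory means most current 3D GANs are trained on low-resolution images. The proposed end-to-end architecture generates high-resolution 3D images by separating training from inference. During training, a hierarchical structure simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image. This amortizes the memory demand across sub-volumes, and anchoring the sub-volumes to a single low-resolution image ensures anatomical consistency between them. At inference, the model directly generates full high-resolution images. An encoder with a similar hierarchical structure extracts features from the images. Experiments on 3D thorax CT and brain MRI show that the approach outperforms the state of the art in image generation, image reconstruction, and prediction of clinically relevant variables.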

43. Counterfactual Explanation Based on Gradual Construction for Deep Networks [PDF] Back to contents
Sin-Han Kang, Hong-Gyu Jung, Dong-Ok Won, Seong-Whan Lee
Abstract: To understand the black-box characteristics of deep networks, counterfactual explanation, which deduces not only the important features of an input space but also how those features should be modified to classify the input as a target class, has gained increasing interest. The patterns that deep networks have learned from a training dataset can be grasped by observing the feature variation among various classes. However, current approaches perform the feature modification to increase the classification probability for the target class irrespective of the internal characteristics of deep networks. This often leads to unclear explanations that deviate from real-world data distributions. To address this problem, we propose a counterfactual explanation method that exploits the statistics learned from a training dataset. Specifically, we gradually construct an explanation by iterating over masking and composition steps. The masking step aims to select an important feature from the input data to be classified as a target class. Meanwhile, the composition step aims to optimize the previously selected feature by ensuring that its output score is close to the logit space of the training data that are classified as the target class. Experimental results show that our method produces human-friendly interpretations on various classification datasets and verify that such interpretations can be achieved with fewer feature modifications.
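Summary: Counterfactual explanation, which deduces both the important features of an input and how they should be modified for the input to be classified as a target class, has attracted growing interest as a way to understand black-box deep networks. Current approaches modify features to raise the target-class probability without regard to the network's internal characteristics, which often yields unclear explanations that deviate from real-world data distributions. The proposed method instead exploits statistics learned from the training dataset and builds an explanation gradually by iterating masking and composition steps. The masking step selects an important feature of the input for classification as the target class. The composition step optimizes that feature so that its output score is close to the logit space of the training data classified as the target class. Experiments on various classification datasets show that the method produces human-friendly interpretations with fewer feature modifications.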

44. A coarse-to-fine framework for unsupervised multi-contrast MR image deformable registration with dual consistency constraint [PDF] Back to contents
Weijian Huang, Hao Yang, Xinfeng Liu, Cheng Li, Ian Zhang, Rongpin Wang, Hairong Zheng, Shanshan Wang
Abstract: Multi-contrast magnetic resonance (MR) image registration is essential in the clinic to achieve fast and accurate imaging-based disease diagnosis and treatment planning. Nevertheless, the efficiency and performance of the existing registration algorithms can still be improved. In this paper, we propose a novel unsupervised learning-based framework to achieve accurate and efficient multi-contrast MR image registration. Specifically, an end-to-end coarse-to-fine network architecture consisting of affine and deformable transformations is designed to eliminate both the multi-step iteration process and the complex image preprocessing operations. Furthermore, a dual consistency constraint and a new prior knowledge-based loss function are developed to enhance the registration performance. The proposed method has been evaluated on a clinical dataset that consists of 555 cases, with encouraging results. Compared to commonly utilized registration methods, including Voxelmorph, SyN, and LDDMM, the proposed method achieves the best registration performance, with a Dice score of 0.826 in identifying stroke lesions. More robust performance in low-signal areas is also observed. With regard to registration speed, our method is about 17 times faster than the most competitive method, SyN, when tested on the same CPU.
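Summary: Multi-contrast magnetic resonance (MR) image registration is essential for fast, accurate imaging-based diagnosis and treatment planning, but existing algorithms leave room for improvement in efficiency and performance. The proposed unsupervised learning framework uses an end-to-end coarse-to-fine network combining affine and deformable transformations, which removes both multi-step iteration and complex image preprocessing. A dual consistency constraint and a new prior knowledge-based loss function further improve registration. On a clinical dataset of 555 cases, the method outperforms Voxelmorph, SyN, and LDDMM, reaching a Dice score of 0.826 in identifying stroke lesions, and it is more robust in low-signal areas. It is also about 17 times faster than SyN, the most competitive method, when tested on the same CPU.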
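
For readers unfamiliar with the Dice score quoted in this abstract, the following sketch computes it for two binary masks. It uses the standard definition (twice the overlap divided by the total size of both masks); the toy masks and the empty-mask convention are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: each marks 3 voxels, and 2 of them overlap.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]
score = dice_score(pred, target)
```

A score of 1 means the two masks coincide exactly and 0 means they share no voxels, so the reported 0.826 indicates substantial but imperfect overlap.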

45. Stabilizing Deep Tomographic Reconstruction Networks [PDF] Back to contents
Weiwen Wu, Dianlin Hu, Shaoyu Wang, Hengyong Yu, Varut Vardhanabhuti, Ge Wang
Abstract: While the field of deep tomographic reconstruction has been advancing rapidly since 2016, there are constant debates and major challenges, with the recently published PNAS paper on instabilities of deep learning in image reconstruction as a primary example. That paper demonstrates three kinds of unstable phenomena: (1) tiny perturbations on the input generating strong output artifacts, (2) small structural features going undetected, and (3) increased input data leading to decreased performance. In this article, we show that key algorithmic ingredients of analytic inversion, compressed sensing, iterative reconstruction, and deep learning can be synergized to stabilize deep neural networks for optimal tomographic image reconstruction. With the same or similar datasets used in the PNAS paper and relative to the same state-of-the-art compressed sensing algorithm, our proposed analytic, compressed, iterative deep (ACID) network produces superior imaging performance that is both accurate and robust with respect to noise, under adversarial attack, and as the amount of input data is increased. We believe that deep tomographic reconstruction networks can be designed to produce accurate and robust results, improve clinical and other important applications, and eventually dominate the tomographic imaging field.
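Summary: Deep tomographic reconstruction has advanced rapidly since 2016 but faces ongoing debate, exemplified by a recent PNAS paper on instabilities of deep learning in image reconstruction. That paper demonstrates three unstable phenomena: tiny input perturbations producing strong output artifacts, small structural features going undetected, and more input data leading to worse performance. This article shows that analytic inversion, compressed sensing, iterative reconstruction, and deep learning can be combined to stabilize deep neural networks for tomographic reconstruction. On the same or similar datasets as the PNAS paper, and relative to the same state-of-the-art compressed sensing algorithm, the proposed analytic, compressed, iterative deep (ACID) network delivers superior imaging performance that is accurate and robust to noise, to adversarial attack, and to increasing amounts of input data. The authors believe such networks can be designed to give accurate and robust results, improve clinical and other applications, and eventually dominate tomographic imaging.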

46. Graph Neural Networks with Low-rank Learnable Local Filters [PDF] Back to contents
Xiuyuan Cheng, Zichen Miao, Qiang Qiu
Abstract: For the classification of graph data consisting of features sampled on an irregular coarse mesh, such as landmark points on the face and human body, graph neural network (GNN) models based on global graph Laplacians may lack the expressiveness to capture local features on the graph. This paper introduces a new GNN layer type with learnable low-rank local graph filters, which significantly reduces the complexity of traditional locally connected GNNs. The architecture provides a unified framework for both spectral and spatial convolutional GNN constructions. The new GNN layer is provably more expressive than GNNs based on global graph Laplacians, and regularization by local graph Laplacians is introduced to improve model robustness. The representation stability against input graph data perturbation is theoretically proved, making use of the graph filter locality and the local graph regularization. Experiments on spherical mesh data, real-world facial expression recognition and skeleton-based action recognition data, and data with simulated graph noise show the empirical advantage of the proposed model.
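Summary: When classifying graph data whose features are sampled on an irregular coarse mesh, such as landmark points on the face or human body, graph neural network (GNN) models based on global graph Laplacians may lack the expressiveness to capture local features. This paper introduces a GNN layer with learnable low-rank local graph filters, which significantly reduces the complexity of traditional locally connected GNNs and unifies spectral and spatial convolutional GNN constructions. The new layer is provably more expressive than GNNs based on global graph Laplacians, and regularization by local graph Laplacians improves robustness. Representation stability under perturbation of the input graph data is proved theoretically using the locality of the filters and the local graph regularization. Experiments on spherical mesh data, real-world facial expression and skeleton-based action recognition data, and data with simulated graph noise show the model's empirical advantage.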
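
As background for the graph Laplacians this abstract refers to, the sketch below builds the combinatorial Laplacian L = D - A of a small undirected graph and evaluates the quadratic form x^T L x that Laplacian-based regularizers penalize. It is generic background under stated assumptions (unweighted graph, no self-loops), not the paper's low-rank local filter or its local regularizer; the helper names are hypothetical.

```python
def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness(laplacian, signal):
    """Quadratic form x^T L x; small when neighbouring nodes agree."""
    n = len(signal)
    return sum(
        signal[i] * laplacian[i][j] * signal[j]
        for i in range(n)
        for j in range(n)
    )


# Path graph 0 - 1 - 2.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
L = graph_laplacian(A)

# Equals the sum of squared differences over edges: (1-2)^2 + (2-4)^2.
energy = smoothness(L, [1.0, 2.0, 4.0])
```

Restricting the same construction to a node's neighbourhood subgraph gives one natural reading of a "local" graph Laplacian, though the paper's exact definition should be taken from the paper itself.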

47. Fast Nonconvex $T_2^*$ Mapping Using ADMM [PDF] Back to contents
Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu
Abstract: Magnetic resonance (MR) $T_2^*$ mapping is widely used to study hemorrhage, calcification, and iron deposition in various clinical applications; it provides a direct and precise mapping of the desired contrast in the tissue. However, the long acquisition time required by conventional 3D high-resolution $T_2^*$ mapping methods causes discomfort to patients and introduces motion artifacts into reconstructed images, which limits their wider applicability. In this paper we address this issue by performing $T_2^*$ mapping from undersampled data using compressive sensing (CS). We formulate the reconstruction as a nonconvex problem that can be decomposed into two subproblems. They can be solved either separately via the standard approach or jointly via the alternating direction method of multipliers (ADMM). Compared to previous CS-based approaches that only apply sparse regularization on the spin density $\boldsymbol X_0$ and the relaxation rate $\boldsymbol R_2^*$, our formulation enforces additional sparse priors on the $T_2^*$-weighted images at multiple echoes to improve the reconstruction performance. We performed convergence analysis of the proposed algorithm, evaluated its performance on in vivo data, and studied the effects of different sampling schemes. Experimental results showed that the proposed joint-recovery approach generally outperforms the state-of-the-art method, especially in the low-sampling-rate regime, making it a preferred choice for fast 3D $T_2^*$ mapping in practice. The framework adopted in this work can be easily extended to other problems arising from MR or other imaging modalities with non-linearly coupled variables.
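Summary: Magnetic resonance (MR) $T_2^*$ mapping is widely used to study hemorrhage, calcification, and iron deposition, but the long acquisition time of conventional 3D high-resolution methods causes patient discomfort and motion artifacts. This paper performs $T_2^*$ mapping from undersampled data using compressive sensing (CS). The reconstruction is formulated as a nonconvex problem that decomposes into two subproblems, solved either separately via the standard approach or jointly via the alternating direction method of multipliers (ADMM). Whereas previous CS-based approaches regularize only the spin density $\boldsymbol X_0$ and the relaxation rate $\boldsymbol R_2^*$, this formulation adds sparse priors on the $T_2^*$-weighted images at multiple echoes. The authors analyze convergence, evaluate the algorithm on in vivo data, and study different sampling schemes. The joint-recovery approach generally outperforms the state-of-the-art method, especially at low sampling rates, and the framework extends to other imaging problems with non-linearly coupled variables.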
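
The spin density $X_0$ and relaxation rate $R_2^*$ named in this abstract are commonly related to the multi-echo signal by the mono-exponential model $S(TE) = X_0 \exp(-R_2^* \cdot TE)$, with $T_2^* = 1 / R_2^*$. The sketch below assumes that model and fits a single voxel from fully sampled, noiseless data by log-linear least squares. It is a plain voxel-wise fit for illustration, not the paper's CS or ADMM reconstruction; the function name `fit_t2_star` and the synthetic values are hypothetical.

```python
import math


def fit_t2_star(echo_times, signals):
    """Log-linear least-squares fit of S(TE) = X0 * exp(-R2s * TE)."""
    n = len(echo_times)
    logs = [math.log(s) for s in signals]  # requires strictly positive signals
    mean_t = sum(echo_times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in echo_times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(echo_times, logs))
    slope = sxy / sxx  # slope of log-signal versus echo time equals -R2*
    r2_star = -slope
    x0 = math.exp(mean_y - slope * mean_t)  # intercept gives log(X0)
    return x0, r2_star, 1.0 / r2_star


# Synthetic noiseless voxel: X0 = 100, T2* = 25 ms (R2* = 0.04 per ms).
echo_times_ms = [5.0, 10.0, 15.0, 20.0]
signals = [100.0 * math.exp(-0.04 * te) for te in echo_times_ms]
x0, r2_star, t2_star = fit_t2_star(echo_times_ms, signals)
```

The product of $X_0$ and the exponential is what makes the variables non-linearly coupled, which is why recovering them jointly from undersampled data is a nonconvex problem.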

48. An artificial intelligence system for predicting the deterioration of COVID-19 patients in the emergency department [PDF] Back to contents
Farah E. Shamout, Yiqiu Shen, Nan Wu, Aakash Kaku, Jungkyu Park, Taro Makino, Stanisław Jastrzębski, Duo Wang, Ben Zhang, Siddhant Dogra, Meng Cao, Narges Razavian, David Kudlowitz, Lea Azour, William Moore, Yvonne W. Lui, Yindalon Aphinyanaphongs, Carlos Fernandez-Granda, Krzysztof J. Geras
Abstract: During the COVID-19 pandemic, rapid and accurate triage of patients at the emergency department is critical to inform decision-making. We propose a data-driven approach for automatic prediction of deterioration risk using a deep neural network that learns from chest X-ray images, and a gradient boosting model that learns from routine clinical variables. Our AI prognosis system, trained using data from 3,661 patients, achieves an AUC of 0.786 (95% CI: 0.742-0.827) when predicting deterioration within 96 hours. The deep neural network extracts informative areas of chest X-ray images to assist clinicians in interpreting the predictions, and performs comparably to two radiologists in a reader study. In order to verify performance in a real clinical setting, we silently deployed a preliminary version of the deep neural network at NYU Langone Health during the first wave of the pandemic, which produced accurate predictions in real-time. In summary, our findings demonstrate the potential of the proposed system for assisting front-line physicians in the triage of COVID-19 patients.
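Summary: During the COVID-19 pandemic, rapid and accurate triage of patients in the emergency department is critical for decision-making. The authors propose a data-driven approach that predicts deterioration risk with a deep neural network that learns from chest X-ray images and a gradient boosting model that learns from routine clinical variables. Trained on data from 3,661 patients, the system achieves an AUC of 0.786 (95% CI: 0.742-0.827) when predicting deterioration within 96 hours. The deep neural network highlights informative areas of the chest X-ray to help clinicians interpret its predictions and performs comparably to two radiologists in a reader study. A preliminary version was silently deployed at NYU Langone Health during the first wave of the pandemic and produced accurate predictions in real time. The findings demonstrate the system's potential to assist front-line physicians in triaging COVID-19 patients.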
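
The AUC reported in this abstract can be read as the probability that a randomly chosen patient who deteriorated receives a higher risk score than a randomly chosen patient who did not. The sketch below computes it directly from that pairwise definition. The labels and scores are made-up toy values, not data from the study, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy risk scores: label 1 = deteriorated, label 0 = did not deteriorate.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; rank-based formulas give the same value more efficiently on large cohorts.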

Note: The Chinese text is machine-translated. The cover image is a word cloud of the paper titles.


[arXiv Papers] Computer Vision and Pattern Recognition 2020-08-06
