Abstracts

1. DASGIL: Domain Adaptation for Semantic and Geometric-aware Image-based Localization
Hanjiang Hu, Ming Cheng, Zhe Liu, Hesheng Wang
Abstract: Long-Term visual localization under changing environments is a challenging problem in autonomous driving and mobile robotics due to season, illumination variance, etc. Image retrieval for localization is an efficient and effective solution to the problem. In this paper, we propose a novel multi-task architecture to fuse the geometric and semantic information into the multi-scale latent embedding representation for visual place recognition. To use the high-quality ground truths without any human effort, depth and segmentation generator model is trained on virtual synthetic dataset and domain adaptation is adopted from synthetic to real-world dataset. The multi-scale model presents the strong generalization ability on real-world KITTI dataset though trained on the virtual KITTI 2 dataset. The proposed approach is validated on the Extended CMU-Seasons dataset through a series of crucial comparison experiments, where our performance outperforms state-of-the-art baselines for retrieval-based localization under the challenging environment.
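The retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It assumes every database image has already been reduced to an embedding vector with a known pose, and that cosine similarity is the matching score; the abstract does not state which similarity measure is used. The names `cosine_similarity` and `retrieve_pose`, the toy embeddings, and the pose labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def retrieve_pose(query_embedding, database):
    """Return the pose stored with the most similar database embedding.

    `database` is a list of (embedding, pose) pairs; this layout is an
    assumption made for illustration only.
    """
    best_embedding, best_pose = max(
        database, key=lambda entry: cosine_similarity(query_embedding, entry[0])
    )
    return best_pose


# Toy database with two reference images (hypothetical values).
reference_db = [([1.0, 0.0], "pose_a"), ([0.0, 1.0], "pose_b")]
matched_pose = retrieve_pose([0.9, 0.1], reference_db)
```

In this toy case the query is closest to the first reference, so `matched_pose` is `"pose_a"`. A real system would use an approximate nearest-neighbour index rather than a linear scan.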

2. A Multi-modal Machine Learning Approach and Toolkit to Automate Recognition of Early Stages of Dementia among British Sign Language Users
Xing Liang, Anastassia Angelopoulou, Epaminondas Kapetanios, Bencie Woll, Reda Al-batat, Tyron Woolfe
Abstract: The ageing population trend is correlated with an increased prevalence of acquired cognitive impairments such as dementia. Although there is no cure for dementia, a timely diagnosis helps in obtaining necessary support and appropriate medication. Researchers are working urgently to develop effective technological tools that can help doctors undertake early identification of cognitive disorder. In particular, screening for dementia in ageing Deaf signers of British Sign Language (BSL) poses additional challenges as the diagnostic process is bound up with conditions such as quality and availability of interpreters, as well as appropriate questionnaires and cognitive tests. On the other hand, deep learning based approaches for image and video analysis and understanding are promising, particularly the adoption of Convolutional Neural Network (CNN), which require large amounts of training data. In this paper, however, we demonstrate novelty in the following way: a) a multi-modal machine learning based automatic recognition toolkit for early stages of dementia among BSL users in that features from several parts of the body contributing to the sign envelope, e.g., hand-arm movements and facial expressions, are combined, b) universality in that it is possible to apply our technique to users of any sign language, since it is language independent, c) given the trade-off between complexity and accuracy of machine learning (ML) prediction models as well as the limited amount of training and testing data being available, we show that our approach is not over-fitted and has the potential to scale up.

3. Neural encoding with visual attention
Meenakshi Khosla, Gia H. Ngo, Keith Jamison, Amy Kuceyeski, Mert R. Sabuncu
Abstract: Visual perception is critically influenced by the focus of attention. Due to limited resources, it is well known that neural representations are biased in favor of attended locations. Using concurrent eye-tracking and functional Magnetic Resonance Imaging (fMRI) recordings from a large cohort of human subjects watching movies, we first demonstrate that leveraging gaze information, in the form of attentional masking, can significantly improve brain response prediction accuracy in a neural encoding model. Next, we propose a novel approach to neural encoding by including a trainable soft-attention module. Using our new approach, we demonstrate that it is possible to learn visual attention policies by end-to-end learning merely on fMRI response data, and without relying on any eye-tracking. Interestingly, we find that attention locations estimated by the model on independent data agree well with the corresponding eye fixation patterns, despite no explicit supervision to do so. Together, these findings suggest that attention modules can be instrumental in neural encoding models of visual stimuli.
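As a rough illustration of what a soft-attention module computes, the sketch below pools per-location feature vectors with softmax weights. This is a generic formulation and an assumption on our part; the abstract does not describe the authors' actual module. The function names and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention_pool(features, scores):
    """Weighted average of per-location features.

    `features` holds one feature vector per spatial location and `scores`
    holds one attention score per location (in a trainable module these
    scores would be produced by learned parameters).
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


# Two locations with equal scores: the result is the plain average.
pooled = soft_attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Raising the score of one location shifts the pooled vector towards that location's features, which is the sense in which the weights act as a soft mask.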

4. Linguistic Structure Guided Context Modeling for Referring Image Segmentation
Tianrui Hui, Si Liu, Shaofei Huang, Guanbin Li, Sansi Yu, Faxi Zhang, Jizhong Han
Abstract: Referring image segmentation aims to predict the foreground mask of the object referred by a natural language sentence. Multimodal context of the sentence is crucial to distinguish the referent from the background. Existing methods either insufficiently or redundantly model the multimodal context. To tackle this problem, we propose a "gather-propagate-distribute" scheme to model multimodal context by cross-modal interaction and implement this scheme as a novel Linguistic Structure guided Context Modeling (LSCM) module. Our LSCM module builds a Dependency Parsing Tree suppressed Word Graph (DPT-WG) which guides all the words to include valid multimodal context of the sentence while excluding disturbing ones through three steps over the multimodal feature, i.e., gathering, constrained propagation and distributing. Extensive experiments on four benchmarks demonstrate that our method outperforms all previous state-of-the-art methods.

5. Referring Image Segmentation via Cross-Modal Progressive Comprehension
Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li
Abstract: Referring image segmentation aims at segmenting the foreground masks of the entities that can well match the description given in the natural language expression. Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities, but usually fail to explore informative words of the expression to well align features from the two modalities for accurately identifying the referred entity. In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task. Concretely, the CMPC module first employs entity and attribute words to perceive all the related entities that might be considered by the expression. Then, the relational words are adopted to highlight the correct entity as well as suppress other irrelevant ones by multimodal graph reasoning. In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels with the guidance of textual information. In this way, features from multi-levels could communicate with each other and be refined based on the textual context. We conduct extensive experiments on four popular referring segmentation benchmarks and achieve new state-of-the-art performances.

6. Few-Shot Classification By Few-Iteration Meta-Learning
Ardhendu Shekhar Tripathi, Martin Danelljan, Luc Van Gool, Radu Timofte
Abstract: Learning in a low-data regime from only a few labeled examples is an important, but challenging problem. Recent advancements within meta-learning have demonstrated encouraging performance, in particular, for the task of few-shot classification. We propose a novel optimization-based meta-learning approach for few-shot classification. It consists of an embedding network, providing a general representation of the image, and a base learner module. The latter learns a linear classifier during the inference through an unrolled optimization procedure. We design an inner learning objective composed of (i) a robust classification loss on the support set and (ii) an entropy loss, allowing transductive learning from unlabeled query samples. By employing an efficient initialization module and a Steepest Descent based optimization algorithm, our base learner predicts a powerful classifier within only a few iterations. Further, our strategy enables important aspects of the base learner objective to be learned during meta-training. To the best of our knowledge, this work is the first to integrate both induction and transduction into the base learner in an optimization-based meta-learning framework. We perform a comprehensive experimental analysis, demonstrating the effectiveness of our approach on four few-shot classification datasets.
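The two-part inner objective described in the abstract (a classification loss on the support set plus an entropy loss on unlabeled queries) can be sketched as below. This is an illustration under stated assumptions: plain cross-entropy stands in for the paper's "robust classification loss", the weighting factor `entropy_weight` is hypothetical, and the optimization loop itself is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the true class (stand-in loss)."""
    return -math.log(softmax(logits)[label])


def entropy(logits):
    """Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def inner_objective(support_logits, support_labels, query_logits, entropy_weight=0.1):
    """Mean support loss plus weighted mean entropy of query predictions."""
    support_term = sum(
        cross_entropy(z, y) for z, y in zip(support_logits, support_labels)
    ) / len(support_labels)
    query_term = sum(entropy(z) for z in query_logits) / len(query_logits)
    return support_term + entropy_weight * query_term


# One support example and one query, both with uninformative logits.
objective_value = inner_objective([[0.0, 0.0]], [0], [[0.0, 0.0]], entropy_weight=1.0)
```

The entropy term needs no labels, which is why it can be evaluated on query samples: it is small when the classifier is confident on them and large when the predictions are close to uniform.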

7. An Ultra Lightweight CNN for Low Resource Circuit Component Recognition
Yingnan Ju, Yue Chen
Abstract: In this paper, we present an ultra lightweight system that can effectively recognize different circuit components in an image with very limited training data. Along with the system, we also release the data set we created for the task. A two-stage approach is employed by our system. Selective search was applied to find the location of each circuit component. Based on its result, we crop the original image into smaller pieces. The pieces are then fed to the Convolutional Neural Network (CNN) for classification to identify each circuit component. It is of engineering significance and works well in circuit component recognition in a low resource setting. The accuracy of our system reaches 93.4%, outperforming the support vector machine (SVM) baseline (75.00%) and the existing state-of-the-art RetinaNet solutions (92.80%).

8. Mini-DDSM: Mammography-based Automatic Age Estimation
Charitha Dissanayake Lekamlage, Fabia Afzal, Erik Westerberg, Abbas Cheddad
Abstract: Age estimation has attracted attention for its various medical applications. There are many studies on human age estimation from biomedical images. However, there is no research done on mammograms for age estimation, as far as we know. The purpose of this study is to devise an AI-based model for estimating age from mammogram images. Due to lack of public mammography data sets that have the age attribute, we resort to using a web crawler to download thumbnail mammographic images and their age fields from the public data set; the Digital Database for Screening Mammography. The original images in this data set unfortunately can only be retrieved by a software which is broken. Subsequently, we extracted deep learning features from the collected data set, by which we built a model using Random Forests regressor to estimate the age automatically. The performance assessment was measured using the mean absolute error values. The average error value out of 10 tests on random selection of samples was around 8 years. In this paper, we show the merits of this approach to fill up missing age values. We ran logistic and linear regression models on another independent data set to further validate the advantage of our proposed work. This paper also introduces the free-access Mini-DDSM data set.
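The evaluation metric named in the abstract, mean absolute error averaged over repeated tests, can be written out as a short sketch. The function names and the ages used below are made up for illustration and are not data from the paper.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def average_error_over_runs(runs):
    """Mean of the per-run MAE values.

    `runs` is a list of (true_ages, predicted_ages) pairs, one per
    random test split.
    """
    errors = [mean_absolute_error(t, p) for t, p in runs]
    return sum(errors) / len(errors)


# Hypothetical example: two test splits with invented ages.
example_runs = [([50, 60], [55, 52]), ([40, 70], [49, 61])]
example_error = average_error_over_runs(example_runs)
```

Because the metric is in the same unit as the target, a value of about 8 means the predicted age is off by roughly 8 years on average.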

9. From Handcrafted to Deep Features for Pedestrian Detection: A Survey
Jiale Cao, Yanwei Pang, Jin Xie, Fahad Shahbaz Khan, Ling Shao
Abstract: Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been witnessed with the help of handcrafted features and deep features. Here we present a comprehensive survey on recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection that includes handcrafted features based methods and deep features based approaches. For handcrafted features based methods, we present an extensive review of approaches and find that handcrafted features with large freedom degrees in shape and space have better performance. In the case of deep features based approaches, we split them into pure CNN based methods and those employing both handcrafted and CNN based features. We give the statistical analysis and tendency of these methods, where feature enhanced, part-aware, and post-processing methods have attracted main attention. In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides more robust features for illumination variance. Furthermore, we introduce some related datasets and evaluation metrics, and compare some representative methods. We conclude this survey by emphasizing open problems that need to be addressed and highlighting various future directions. Researchers can track an up-to-date list at this https URL.

10. X-Fields: Implicit Neural View-, Light- and Time-Image Interpolation
Mojtaba Bemana, Karol Myszkowski, Hans-Peter Seidel, Tobias Ritschel
Abstract: We suggest to represent an X-Field (a set of 2D images taken across different view, time or illumination conditions, i.e., video, light field, reflectance fields or combinations thereof) by learning a neural network (NN) to map their view, time or light coordinates to 2D images. Executing this NN at new coordinates results in joint view, time or light interpolation. The key idea to make this workable is a NN that already knows the "basic tricks" of graphics (lighting, 3D projection, occlusion) in a hard-coded and differentiable form. The NN represents the input to that rendering as an implicit map, that for any view, time, or light coordinate and for any pixel can quantify how it will move if view, time or light coordinates change (Jacobian of pixel position with respect to view, time, illumination, etc.). Our X-Field representation is trained for one scene within minutes, leading to a compact set of trainable parameters and hence real-time navigation in view, time and illumination.

11. DeepFakesON-Phys: DeepFakes Detection based on Heart Rate Estimation
Javier Hernandez-Ortega, Ruben Tolosana, Julian Fierrez, Aythami Morales
Abstract: This work introduces a novel DeepFake detection framework based on physiological measurement. In particular, we consider information related to the heart rate using remote photoplethysmography (rPPG). rPPG methods analyze video sequences looking for subtle color changes in the human skin, revealing the presence of human blood under the tissues. In this work we investigate to what extent rPPG is useful for the detection of DeepFake videos. The proposed fake detector named DeepFakesON-Phys uses a Convolutional Attention Network (CAN), which extracts spatial and temporal information from video frames, analyzing and combining both sources to better detect fake videos. This detection approach has been experimentally evaluated using the latest public databases in the field: Celeb-DF and DFDC. The results achieved, above 98% AUC (Area Under the Curve) on both databases, outperform the state of the art and prove the success of fake detectors based on physiological measurement to detect the latest DeepFake videos.
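The AUC figure quoted in the abstract is the probability that a randomly chosen fake video receives a higher detector score than a randomly chosen real one. A minimal pairwise sketch of that definition follows; the function name and the scores are illustrative, and library routines use faster rank-based computations.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` uses 1 for the positive class (here: fake) and 0 for the
    negative class. Ties between a positive and a negative count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented detector scores: one fake video is ranked below one real video.
example_auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

Here three of the four fake/real pairs are ordered correctly, so `example_auc` is 0.75; a perfect detector would score 1.0 and a random one about 0.5.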

12. Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue
Zipeng Xu, Fangxiang Feng, Xiaojie Wang, Yushu Yang, Huixing Jiang, Zhongyuan Ouyang
Abstract: A goal-oriented visual dialogue involves multi-turn interactions between two agents, Questioner and Oracle. During which, the answer given by Oracle is of great significance, as it provides golden response to what Questioner concerns. Based on the answer, Questioner updates its belief on target visual content and further raises another question. Notably, different answers drive into different visual beliefs and future questions. However, existing methods always indiscriminately encode answers after much longer questions, resulting in a weak utilization of answers. In this paper, we propose an Answer-Driven Visual State Estimator (ADVSE) to impose the effects of different answers on visual states. First, we propose an Answer-Driven Focusing Attention (ADFA) to capture the answer-driven effect on visual attention by sharpening question-related attention and adjusting it by answer-based logical operation at each turn. Then based on the focusing attention, we get the visual state estimation by Conditional Visual Information Fusion (CVIF), where overall information and difference information are fused conditioning on the question-answer state. We evaluate the proposed ADVSE to both question generator and guesser tasks on the large-scale GuessWhat?! dataset and achieve the state-of-the-art performances on both tasks. The qualitative results indicate that the ADVSE boosts the agent to generate highly efficient questions and obtains reliable visual attentions during the reasonable question generation and guess processes.

13. Meta-Consolidation for Continual Learning
K J Joseph, Vineeth N Balasubramanian
Abstract: The ability to continuously learn and adapt itself to new tasks, without losing grasp of already acquired knowledge is a hallmark of biological learning systems, which current deep learning systems fall short of. In this work, we present a novel methodology for continual learning called MERLIN: Meta-Consolidation for Continual Learning. We assume that weights of a neural network $\boldsymbol \psi$, for solving task $\boldsymbol t$, come from a meta-distribution $p(\boldsymbol{\psi|t})$. This meta-distribution is learned and consolidated incrementally. We operate in the challenging online continual learning setting, where a data point is seen by the model only once. Our experiments with continual learning benchmarks of MNIST, CIFAR-10, CIFAR-100 and Mini-ImageNet datasets show consistent improvement over five baselines, including a recent state-of-the-art, corroborating the promise of MERLIN.

14. Can You Trust Your Pose? Confidence Estimation in Visual Localization
Luca Ferranti, Xiaotian Li, Jani Boutellier, Juho Kannala
Abstract: Camera pose estimation in large-scale environments is still an open question and, despite recent promising results, it may still fail in some situations. The research so far has focused on improving subcomponents of estimation pipelines, to achieve more accurate poses. However, there is no guarantee for the result to be correct, even though the correctness of pose estimation is critically important in several visual localization applications, such as in autonomous navigation. In this paper we bring to attention a novel research question, pose confidence estimation, where we aim at quantifying how reliable the visually estimated pose is. We develop a novel confidence measure to fulfil this task and show that it can be flexibly applied to different datasets, indoor or outdoor, and for various visual localization pipelines. We also show that the proposed techniques can be used to accomplish a secondary goal: improving the accuracy of existing pose estimation pipelines. Finally, the proposed approach is computationally light-weight and adds only a negligible increase to the computational effort of pose estimation.

15. Training general representations for remote sensing using in-domain knowledge
Maxim Neumann, André Susano Pinto, Xiaohua Zhai, Neil Houlsby
Abstract: Automatically finding good and general remote sensing representations allows to perform transfer learning on a wide range of applications - improving the accuracy and reducing the required number of training samples. This paper investigates development of generic remote sensing representations, and explores which characteristics are important for a dataset to be a good source for representation learning. For this analysis, five diverse remote sensing datasets are selected and used for both, disjoint upstream representation learning and downstream model training and evaluation. A common evaluation protocol is used to establish baselines for these datasets that achieve state-of-the-art performance. As the results indicate, especially with a low number of available training samples a significant performance enhancement can be observed when including additionally in-domain data in comparison to training models from scratch or fine-tuning only on ImageNet (up to 11% and 40%, respectively, at 100 training samples). All datasets and pretrained representation models are published online.

16. Deep-3DAligner: Unsupervised 3D Point Set Registration Network With Optimizable Latent Vector
Lingjing Wang, Xiang Li, Yi Fang
Abstract: Point cloud registration is the process of aligning a pair of point sets via searching for a geometric transformation. Unlike classical optimization-based methods, recent learning-based methods leverage the power of deep learning for registering a pair of point sets. In this paper, we propose to develop a novel model that organically integrates the optimization to learning, aiming to address the technical challenges in 3D registration. More specifically, in addition to the deep transformation decoding network, our framework introduces an optimizable deep Spatial Correlation Representation (SCR) feature. The SCR feature and weights of the transformation decoder network are jointly updated towards the minimization of an unsupervised alignment loss. We further propose an adaptive Chamfer loss for aligning partial shapes. To verify the performance of our proposed method, we conducted extensive experiments on the ModelNet40 dataset. The results demonstrate that our method achieves significantly better performance than the previous state-of-the-art approaches in the full/partial point set registration task.
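For readers unfamiliar with the Chamfer loss mentioned in the abstract, the sketch below computes the standard symmetric Chamfer distance between two point sets. It is not the adaptive variant the paper proposes for partial shapes, whose details are not given in the abstract; the function names and the toy points are illustrative.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(source, target):
    """Symmetric Chamfer distance between two non-empty point sets.

    Each point is matched to its nearest neighbour in the other set, and
    the mean squared distances in both directions are summed.
    """
    source_to_target = sum(
        min(squared_distance(p, q) for q in target) for p in source
    ) / len(source)
    target_to_source = sum(
        min(squared_distance(q, p) for p in source) for q in target
    ) / len(target)
    return source_to_target + target_to_source


# Toy 3D example: the target is the source shifted by one unit along x.
shifted_distance = chamfer_distance([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)])
```

The loss needs no point correspondences, which is what makes it usable as an unsupervised alignment objective: it is zero when the transformed source coincides with the target and grows as the sets move apart.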

17. Open-Set Hypothesis Transfer with Semantic Consistency
Zeyu Feng, Chang Xu, Dacheng Tao
Abstract: Unsupervised open-set domain adaptation (UODA) is a realistic problem where unlabeled target data contain unknown classes. Prior methods rely on the coexistence of both source and target domain data to perform domain alignment, which greatly limits their applications when source domain data are restricted due to privacy concerns. This paper addresses the challenging hypothesis transfer setting for UODA, where data from source domain are no longer available during adaptation on target domain. We introduce a method that focuses on the semantic consistency under transformation of target data, which is rarely appreciated by previous domain adaptation methods. Specifically, our model first discovers confident predictions and performs classification with pseudo-labels. Then we enforce the model to output consistent and definite predictions on semantically similar inputs. As a result, unlabeled data can be classified into discriminative classes coincided with either source classes or unknown classes. Experimental results show that our model outperforms state-of-the-art methods on UODA benchmarks.

18. Cost-Sensitive Regularization for Diabetic Retinopathy Grading from Eye Fundus Images
Adrian Galdran, José Dolz, Hadi Chakor, Hervé Lombaert, Ismail Ben Ayed
Abstract: Assessing the degree of disease severity in biomedical images is a task similar to standard classification but constrained by an underlying structure in the label space. Such a structure reflects the monotonic relationship between different disease grades. In this paper, we propose a straightforward approach to enforce this constraint for the task of predicting Diabetic Retinopathy (DR) severity from eye fundus images based on the well-known notion of Cost-Sensitive classification. We expand standard classification losses with an extra term that acts as a regularizer, imposing greater penalties on predicted grades when they are farther away from the true grade associated to a particular image. Furthermore, we show how to adapt our method to the modelling of label noise in each of the sub-problems associated to DR grading, an approach we refer to as Atomic Sub-Task modeling. This yields models that can implicitly take into account the inherent noise present in DR grade annotations. Our experimental analysis on several public datasets reveals that, when a standard Convolutional Neural Network is trained using this simple strategy, improvements of 3-5% of quadratic-weighted kappa scores can be achieved at a negligible computational cost. Code to reproduce our results is released at this https URL.
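One way to read the "extra term that acts as a regularizer" is sketched below: cross-entropy plus the expected squared grade distance under the predicted distribution. The quadratic cost and the `weight` parameter are assumptions chosen for illustration; the abstract does not specify the exact cost matrix or weighting, so this should not be taken as the authors' formulation.

```python
import math


def cost_sensitive_loss(probs, true_grade, weight=1.0):
    """Cross-entropy plus a distance-aware penalty.

    `probs` is the predicted distribution over ordered grades 0..K-1.
    The penalty is the expected squared distance between the predicted
    grade and `true_grade` (an assumed cost function).
    """
    cross_entropy = -math.log(probs[true_grade])
    penalty = sum(p * (grade - true_grade) ** 2 for grade, p in enumerate(probs))
    return cross_entropy + weight * penalty


# Both predictions give the true grade (0) the same probability, but the
# second one places its remaining mass on the most distant grade.
near_miss = cost_sensitive_loss([0.5, 0.5, 0.0, 0.0, 0.0], true_grade=0)
far_miss = cost_sensitive_loss([0.5, 0.0, 0.0, 0.0, 0.5], true_grade=0)
```

Plain cross-entropy would score the two predictions identically, since it only looks at the probability of the true grade. The added term is what makes `far_miss` larger than `near_miss`, encoding the ordering of the grades.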

19. Action Units Recognition with Pairwise Deep Architecture
Junya Saito, Kentaro Murase
Abstract: In this paper, we propose a new automatic Action Units (AUs) recognition method used in a competition, Affective Behavior Analysis in-the-wild (ABAW). Our method uses pairwise deep architecture to tackle a problem of AUs label criteria change in different videos. While the baseline score is 0.31, our method achieved 0.65 in validation dataset of the competition.

20. RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation
Miriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, Xavier Giro-i-Nieto
Abstract: The task of video object segmentation with referring expressions (language-guided VOS) is to, given a linguistic phrase and a video, generate binary masks for the object to which the phrase refers. Our work argues that existing benchmarks used for this task are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the phrases in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, with the non-trivial REs annotated with seven RE semantic categories. We leverage this data to analyze the results of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state of the art results for language-guided VOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.

21. CariMe: Unpaired Caricature Generation with Multiple Exaggerations
Zheng Gu, Chuanqi Dong, Jing Huo, Wenbin Li, Yang Gao
Abstract: Caricature generation aims to translate real photos into caricatures with artistic styles and shape exaggerations while maintaining the identity of the subject. Unlike generic image-to-image translation, drawing a caricature automatically is a more challenging task due to the existence of various spatial deformations. Previous caricature generation methods are preoccupied with predicting a definite image warping from a given photo while ignoring the intrinsic representation and distribution of exaggerations in caricatures. This limits their ability to generate diverse exaggerations. In this paper, we generalize the caricature generation problem from instance-level warping prediction to distribution-level deformation modeling. Based on this assumption, we present the first exploration of unpaired CARIcature generation with Multiple Exaggerations (CariMe). Technically, we propose a Multi-exaggeration Warper network to learn the distribution-level mapping from photos to facial exaggerations. This makes it possible to generate diverse and reasonable exaggerations from randomly sampled warp codes given one input photo. To better represent the facial exaggeration and produce fine-grained warping, a deformation-field-based warping method is also proposed, which helps us to capture more detailed exaggerations than other point-based warping methods. Experiments and two perceptual studies prove the superiority of our method compared with other state-of-the-art methods, showing the improvement our work brings to caricature generation.

22. MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding
Xiaoman Qi, PanPan Zhu, Yuebin Wang, Liqiang Zhang, Junhuan Peng, Mengfan Wu, Jialong Chen, Xudong Zhao, Ning Zang, P.Takis Mathiopoulos
Abstract: To better understand scene images in the field of remote sensing, multi-label annotation of scene images is necessary. Moreover, to enhance the performance of deep learning models for dealing with semantic scene understanding tasks, it is vital to train them on large-scale annotated data. However, most existing datasets are annotated by a single label, which cannot describe the complex remote sensing images well because scene images might have multiple land cover classes. Few multi-label high spatial resolution remote sensing datasets have been developed to train deep learning models for multi-label based tasks, such as scene classification and image retrieval. To address this issue, in this paper, we construct a multi-label high spatial resolution remote sensing dataset named MLRSNet for semantic scene understanding with deep learning from the overhead perspective. It is composed of high-resolution optical satellite or aerial images. MLRSNet contains a total of 109,161 samples within 46 scene categories, and each image has at least one of 60 predefined labels. We have designed visual recognition tasks, including multi-label based image classification and image retrieval, in which a wide variety of deep learning approaches are evaluated with MLRSNet. The experimental results demonstrate that MLRSNet is a significant benchmark for future research, and it complements the current widely used datasets such as ImageNet, which fills gaps in multi-label image research. Furthermore, we will continue to expand the MLRSNet. MLRSNet and all related materials have been made publicly available at this https URL and this https URL.

23. Quantum Annealing Approaches to the Phase-Unwrapping Problem in Synthetic-Aperture Radar Imaging
Khaled A. Helal Kelany, Nikitas Dimopoulos, Clemens P. J. Adolphs, Bardia Barabadi, Amirali Baniasadi
Abstract: The focus of this work is to explore the use of quantum annealing solvers for the problem of phase unwrapping of synthetic aperture radar (SAR) images. Although solutions to this problem exist based on network programming, these techniques do not scale well to larger-sized images. Our approach involves formulating the problem as a quadratic unconstrained binary optimization (QUBO) problem, which can be solved using a quantum annealer. Given that present embodiments of quantum annealers remain limited in the number of qubits they possess, we decompose the problem into a set of subproblems that can be solved individually. These individual solutions are close to optimal up to an integer constant, with one constant per sub-image. In a second phase, these integer constants are determined as a solution to yet another QUBO problem. We test our approach with a variety of software-based QUBO solvers and on a variety of images, both synthetic and real. Additionally, we experiment using D-Wave Systems's quantum annealer, the D-Wave 2000Q. The software-based solvers obtain high-quality solutions comparable to state-of-the-art phase-unwrapping solvers. We are currently working on optimally mapping the problem onto the restricted topology of the quantum annealer to improve the quality of the solution.
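To make the QUBO formulation mentioned in this abstract concrete, the sketch below minimizes x^T Q x over binary vectors by exhaustive search. It is only a toy stand-in for a software QUBO solver or a quantum annealer: the matrix `Q` and the function name are hypothetical, and the search is exponential in the number of variables, which is one reason large problems are decomposed into subproblems.

```python
import itertools
import numpy as np

def solve_qubo_brute_force(Q):
    """Return the binary vector x minimizing x^T Q x, by exhaustive search."""
    n = Q.shape[0]
    best_x, best_val = None, float("inf")
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        val = float(x @ Q @ x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy instance: each variable is rewarded on its own (diagonal -1),
# but selecting both together is penalized (off-diagonal +2).
Q = np.array([[-1.0, 2.0],
              [0.0, -1.0]])
x_opt, energy = solve_qubo_brute_force(Q)
```

For this toy matrix the minimum energy is reached by selecting exactly one of the two variables.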

24. Multi-label Classification of Common Bengali Handwritten Graphemes: Dataset and Challenge
Samiul Alam, Tahsin Reasat, Asif Shahriyar Sushmit, Sadi Mohammad Siddiquee, Fuad Rahman, Mahady Hasan, Ahmed Imtiaz Humayun
Abstract: Latin has historically led the state-of-the-art in handwritten optical character recognition (OCR) research. Adapting existing systems from Latin to alpha-syllabary languages is particularly challenging due to a sharp contrast between their orthographies. The segmentation of graphical constituents corresponding to characters becomes significantly harder due to a cursive writing system and frequent use of diacritics in the alpha-syllabary family of languages. We propose a labeling scheme based on graphemes (linguistic segments of word formation) that makes segmentation inside alpha-syllabary words linear, and present the first dataset of Bengali handwritten graphemes that are commonly used in an everyday context. The dataset is open-sourced as a part of the Bengali.AI Handwritten Grapheme Classification Challenge on Kaggle to benchmark vision algorithms for multi-label grapheme classification. From competition proceedings, we see that deep learning methods can generalize to a large span of uncommon graphemes even when they are absent during training.

25. Deformable Kernel Convolutional Network for Video Extreme Super-Resolution
Xuan Xu, Xin Xiong, Jinge Wang, Xin Li
Abstract: Video super-resolution, which attempts to reconstruct high-resolution video frames from their corresponding low-resolution versions, has received increasingly more attention in recent years. Most existing approaches opt to use deformable convolution to temporally align neighboring frames and apply traditional spatial attention mechanism (convolution based) to enhance reconstructed features. However, such spatial-only strategies cannot fully utilize temporal dependency among video frames. In this paper, we propose a novel deep learning based VSR algorithm, named Deformable Kernel Spatial Attention Network (DKSAN). Thanks to newly designed Deformable Kernel Convolution Alignment (DKC_Align) and Deformable Kernel Spatial Attention (DKSA) modules, DKSAN can better exploit both spatial and temporal redundancies to facilitate the information propagation across different layers. We have tested DKSAN on AIM2020 Video Extreme Super-Resolution Challenge to super-resolve videos with a scale factor as large as 16. Experimental results demonstrate that our proposed DKSAN can achieve both better subjective and objective performance compared with the existing state-of-the-art EDVR on Vid3oC and IntVID datasets.

26. Self-Guided Multiple Instance Learning for Weakly Supervised Disease Classification and Localization in Chest Radiographs
Constantin Seibold, Jens Kleesiek, Heinz-Peter Schlemmer, Rainer Stiefelhagen
Abstract: The lack of fine-grained annotations hinders the deployment of automated diagnosis systems, which require human-interpretable justification for their decision process. In this paper, we address the problem of weakly supervised identification and localization of abnormalities in chest radiographs. To that end, we introduce a novel loss function for training convolutional neural networks that increases the localization confidence and assists the overall disease identification. The loss leverages both image- and patch-level predictions to generate auxiliary supervision. Rather than forming strictly binary targets from the predictions as done in previous loss formulations, we create targets in a more customized manner, which allows the loss to account for possible misclassification. We show that the supervision provided within the proposed learning scheme leads to better performance and more precise predictions than previously used losses, both on prevalent datasets for multiple-instance learning and on the NIH ChestX-Ray14 benchmark for disease recognition.

27. MaterialGAN: Reflectance Capture using a Generative SVBRDF Model
Yu Guo, Cameron Smith, Miloš Hašan, Kalyan Sunkavalli, Shuang Zhao
Abstract: We address the problem of reconstructing spatially-varying BRDFs from a small set of image measurements. This is a fundamentally under-constrained problem, and previous work has relied on using various regularization priors or on capturing many images to produce plausible results. In this work, we present MaterialGAN, a deep generative convolutional network based on StyleGAN2, trained to synthesize realistic SVBRDF parameter maps. We show that MaterialGAN can be used as a powerful material prior in an inverse rendering framework: we optimize in its latent representation to generate material maps that match the appearance of the captured images when rendered. We demonstrate this framework on the task of reconstructing SVBRDFs from images captured under flash illumination using a hand-held mobile phone. Our method succeeds in producing plausible material maps that accurately reproduce the target images, and outperforms previous state-of-the-art material capture methods in evaluations on both synthetic and real data. Furthermore, our GAN-based latent space allows for high-level semantic material editing operations such as generating material variations and material morphing.

28. The Importance of Balanced Data Sets: Analyzing a Vehicle Trajectory Prediction Model based on Neural Networks and Distributed Representations
Florian Mirus, Terrence C. Stewart, Jorg Conradt
Abstract: Predicting the future behavior of other traffic participants is an essential task that needs to be solved by automated vehicles and human drivers alike to achieve safe and situation-aware driving. Modern approaches to vehicle trajectory prediction typically rely on data-driven models like neural networks, in particular LSTMs (Long Short-Term Memory networks), achieving promising results. However, the question of the optimal composition of the underlying training data has received less attention. In this paper, we expand on previous work on vehicle trajectory prediction based on neural network models employing distributed representations to encode automotive scenes in a semantic vector substrate. We analyze the influence of variations in the training data on the performance of our prediction models. We show that the models employing our semantic vector representation outperform the numerical model when trained on an adequate data set, and thereby that the composition of training data in vehicle trajectory prediction is crucial for successful training. We conduct our analysis on challenging real-world driving data.

29. GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization
Ioannis Papakis, Abhijit Sarkar, Anuj Karpatne
Abstract: This paper proposes a novel method for online Multi-Object Tracking (MOT) using Graph Convolutional Neural Network (GCNN) based feature extraction and end-to-end feature matching for object association. The Graph based approach incorporates both appearance and geometry of objects at past frames as well as the current frame into the task of feature learning. This new paradigm enables the network to leverage the "context" information of the geometry of objects and allows us to model the interactions among the features of multiple objects. Another central innovation of our proposed framework is the use of the Sinkhorn algorithm for end-to-end learning of the associations among objects during model training. The network is trained to predict object associations by taking into account constraints specific to the MOT task. Experimental results demonstrate the efficacy of the proposed approach in achieving top performance on the MOT16 & 17 Challenge problems among state-of-the-art online and supervised approaches.
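The Sinkhorn normalization named in this title can be illustrated with a short sketch: a matrix of pairwise matching scores is made positive and then its rows and columns are normalized alternately, which drives a square matrix toward a doubly stochastic (soft assignment) matrix. This is a generic version of the algorithm under assumed names and defaults (`sinkhorn`, `n_iters`, `eps`), not the paper's training code.

```python
import numpy as np

def sinkhorn(scores, n_iters=50, eps=1e-9):
    """Alternately normalize rows and columns of a positive matrix."""
    # Exponentiate so that every entry is strictly positive.
    P = np.exp(scores)
    for _ in range(n_iters):
        P = P / (P.sum(axis=1, keepdims=True) + eps)  # rows sum to ~1
        P = P / (P.sum(axis=0, keepdims=True) + eps)  # columns sum to ~1
    return P

# Hypothetical similarity scores between 3 tracked objects and 3 detections.
scores = np.array([[1.0, 0.2, 0.1],
                   [0.3, 2.0, 0.5],
                   [0.2, 0.1, 1.5]])
assignment = sinkhorn(scores)
```

Because every step is differentiable, this kind of normalization can sit inside a network and be trained end to end, which is the property the abstract relies on for learning object associations.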

30. Depth Estimation from Monocular Images and Sparse Radar Data
Juan-Ting Lin, Dengxin Dai, Luc Van Gool
Abstract: In this paper, we explore the possibility of achieving a more accurate depth estimation by fusing monocular images and Radar points using a deep neural network. We give a comprehensive study of the fusion between RGB images and Radar measurements from different aspects and propose a working solution based on the observations. We find that the noise in Radar measurements is one of the main reasons that prevent one from applying the existing fusion methods developed for LiDAR data and images to the new fusion problem between Radar data and images. The experiments are conducted on the nuScenes dataset, which is one of the first datasets to feature Camera, Radar, and LiDAR recordings in diverse scenes and weather conditions. Extensive experiments demonstrate that our method outperforms existing fusion methods. We also provide detailed ablation studies to show the effectiveness of each component in our method.

31. DOT: Dynamic Object Tracking for Visual SLAM
Irene Ballester, Alejandro Fontan, Javier Civera, Klaus H. Strobl, Rudolph Triebel
Abstract: In this paper we present DOT (Dynamic Object Tracking), a front-end that, when added to existing SLAM systems, can significantly improve their robustness and accuracy in highly dynamic environments. DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects in order to allow SLAM systems based on rigid scene models to avoid such image areas in their optimizations. To determine which objects are actually moving, DOT first segments instances of potentially dynamic objects and then, with the estimated camera motion, tracks such objects by minimizing the photometric reprojection error. This short-term tracking improves the accuracy of the segmentation with respect to other approaches. In the end, only actually dynamic masks are generated. We have evaluated DOT with ORB-SLAM 2 on three public datasets. Our results show that our approach significantly improves the accuracy and robustness of ORB-SLAM 2, especially in highly dynamic scenes.

32. Utilizing Transfer Learning and a Customized Loss Function for Optic Disc Segmentation from Retinal Images
Abdullah Sarhan, Ali Al-KhazÁly, Adam Gorner, Andrew Swift, Jon Rokne, Reda Alhajj, Andrew Crichton
Abstract: Accurate segmentation of the optic disc from a retinal image is vital to extracting retinal features that may be highly correlated with retinal conditions such as glaucoma. In this paper, we propose a deep-learning based approach capable of segmenting the optic disc given a high-precision retinal fundus image. Our approach utilizes a UNET-based model with a VGG16 encoder trained on the ImageNet dataset. This study can be distinguished from other studies in the customization made for the VGG16 model, the diversity of the datasets adopted, the duration of disc segmentation, the loss function utilized, and the number of parameters required to train our model. Our approach was tested on seven publicly available datasets augmented by a dataset from a private clinic that was annotated by two Doctors of Optometry through a web portal built for this purpose. We achieved an accuracy of 99.78% and a Dice coefficient of 94.73% for a disc segmentation from a retinal image in 0.03 seconds. The results obtained from comprehensive experiments demonstrate the robustness of our approach to disc segmentation of retinal images obtained from different sources.
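The Dice coefficient reported in this abstract measures the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. The sketch below shows the standard definition for binary masks; the function name and the small smoothing constant `eps` are choices made for this example, not details from the paper.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    # eps avoids division by zero when both masks are empty.
    return float((2.0 * intersection + eps) / (pred.sum() + true.sum() + eps))
```

A value of 1 means the masks coincide exactly and a value near 0 means they barely overlap, so it is more informative than pixel accuracy when the optic disc occupies only a small part of the image.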

33. ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
Jose Manuel Gomez-Perez, Raul Ortega
Abstract: Textbook Question Answering is a complex task in the intersection of Machine Comprehension and Visual Question Answering that requires reasoning with multimodal information from text and diagrams. For the first time, this paper taps on the potential of transformer language models and bottom-up and top-down attention to tackle the language and visual understanding challenges this task entails. Rather than training a language-visual transformer from scratch we rely on pre-trained transformers, fine-tuning and ensembling. We add bottom-up and top-down attention to identify regions of interest corresponding to diagram constituents and their relationships, improving the selection of relevant visual information for each question and answer options. Our system ISAAQ reports unprecedented success in all TQA question types, with accuracies of 81.36%, 71.11% and 55.12% on true/false, text-only and diagram multiple choice questions. ISAAQ also demonstrates its broad applicability, obtaining state-of-the-art results in other demanding datasets.

34. Dynamic Facial Asset and Rig Generation from a Single Scan
Jiaman Li, Zhengfei Kuang, Yajie Zhao, Mingming He, Karl Bladin, Hao Li
Abstract: The creation of high-fidelity computer-generated (CG) characters used in film and gaming requires intensive manual labor and a comprehensive set of facial assets to be captured with complex hardware, resulting in high cost and long production cycles. In order to simplify and accelerate this digitization process, we propose a framework for the automatic generation of high-quality dynamic facial assets, including rigs which can be readily deployed for artists to polish. Our framework takes a single scan as input to generate a set of personalized blendshapes, dynamic and physically-based textures, as well as secondary facial components (e.g., teeth and eyeballs). Built upon a facial database consisting of pore-level details, with over 4,000 scans of varying expressions and identities, we adopt a self-supervised neural network to learn personalized blendshapes from a set of template expressions. We also model the joint distribution between identities and expressions, enabling the inference of the full set of personalized blendshapes with dynamic appearances from a single neutral input scan. Our generated personalized face rig assets are seamlessly compatible with cutting-edge industry pipelines for facial animation and rendering. We demonstrate that our framework is robust and effective by inferring on a wide range of novel subjects, and illustrate compelling rendering results while animating faces with generated customized physically-based dynamic textures.

35. Understanding the Role of Adversarial Regularization in Supervised Learning
Litu Rout
Abstract: Although numerous attempts have sought to provide empirical evidence of adversarial regularization outperforming sole supervision, the theoretical understanding of such phenomena remains elusive. In this study, we aim to resolve whether adversarial regularization indeed performs better than sole supervision at a fundamental level. To bring this insight to fruition, we study the vanishing gradient issue, asymptotic iteration complexity, gradient flow and provable convergence in the context of sole supervision and adversarial regularization. The key ingredient is a theoretical justification supported by empirical evidence of adversarial acceleration in gradient descent. In addition, motivated by a recently introduced unit-wise capacity based generalization bound, we analyze the generalization error in the adversarial framework. Guided by our observation, we cast doubts on the ability of this measure to explain generalization. We therefore leave it as an open question to explore new measures that can explain generalization behavior in adversarial learning. Furthermore, we observe an intriguing phenomenon in the neural embedded vector space while contrasting adversarial learning with sole supervision.

36. Why Adversarial Interaction Creates Non-Homogeneous Patterns: A Pseudo-Reaction-Diffusion Model for Turing Instability
Litu Rout
Abstract: Long after Turing's seminal Reaction-Diffusion (RD) model, the elegance of his fundamental equations alleviated much of the skepticism surrounding pattern formation. Though the Turing model is a simplification and an idealization, it is one of the best-known theoretical models for explaining patterns reminiscent of those observed in nature. Over the years, concerted efforts have been made to align theoretical models to explain patterns in real systems. The apparent difficulty in identifying the specific dynamics of the RD system makes the problem particularly challenging. Interestingly, we observe Turing-like patterns in a system of neurons with adversarial interaction. In this study, we establish the involvement of Turing instability in creating such patterns. Through theoretical and empirical studies, we present a pseudo-reaction-diffusion model to explain the mechanism that may underlie these phenomena. While supervised learning attains homogeneous equilibrium, this paper suggests that the introduction of an adversary helps break this homogeneity to create non-homogeneous patterns at equilibrium. Further, we prove that randomly initialized gradient descent with over-parameterization can converge exponentially fast to an $\epsilon$-stationary point even under adversarial interaction. In addition, unlike sole supervision, we show that the solutions obtained under adversarial interaction are not limited to a tiny subspace around initialization.

37. Ray-based classification framework for high-dimensional data
Justyna P. Zwolak, Sandesh S. Kalantre, Thomas McJunkin, Brian J. Weber, Jacob M. Taylor
Abstract: While classification of arbitrary structures in high dimensions may require complete quantitative information, for simple geometrical structures, low-dimensional qualitative information about the boundaries defining the structures can suffice. Rather than using dense, multi-dimensional data, we propose a deep neural network (DNN) classification framework that utilizes a minimal collection of one-dimensional representations, called rays, to construct the "fingerprint" of the structure(s) based on substantially reduced information. We empirically study this framework using a synthetic dataset of double and triple quantum dot devices and apply it to the classification problem of identifying the device state. We show that the performance of the ray-based classifier is already on par with traditional 2D images for low dimensional systems, while significantly cutting down the data acquisition cost.

38. Physical Exercise Recommendation and Success Prediction Using Interconnected Recurrent Neural Networks
Arash Mahyari, Peter Pirolli
Abstract: Unhealthy behaviors, e.g., physical inactivity and unhealthful food choice, are the primary healthcare cost drivers in developed countries. Pervasive computational, sensing, and communication technology provided by smartphones and smartwatches has made it possible to support individuals in their everyday lives to develop healthier lifestyles. In this paper, we propose an exercise recommendation system that also predicts individual success rates. The system, consisting of two inter-connected recurrent neural networks (RNNs), uses the history of workouts to recommend the next workout activity for each individual. The system then predicts the probability of successful completion of the predicted activity by the individual. The prediction accuracy of this interconnected-RNN model is assessed on previously published data from a four-week mobile health experiment and is shown to improve upon previous predictions from a computational cognitive model.

39. Bag of Tricks for Adversarial Training
Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, Jun Zhu
Abstract: Adversarial training (AT) is one of the most effective strategies for promoting model robustness. However, recent benchmarks show that most of the proposed improvements on AT are less effective than simply stopping the training procedure early. This counter-intuitive fact motivates us to investigate the implementation details of tens of AT methods. Surprisingly, we find that the basic training settings (e.g., weight decay, learning rate schedule, etc.) used in these methods are highly inconsistent, which could largely affect the model performance as shown in our experiments. For example, a slightly different value of weight decay can reduce the model's robust accuracy by more than 7%, which is likely to override the potential improvement induced by the proposed methods. In this work, we provide comprehensive evaluations on the effects of basic training tricks and hyperparameter settings for adversarially trained models. We provide a reasonable baseline setting and re-implement previous defenses to achieve new state-of-the-art results.
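Summary: Adversarial training (AT) is one of the most effective strategies for improving model robustness. However, recent benchmarks show that most proposed improvements to AT are less effective than simply stopping training early. This counter-intuitive finding motivates an investigation of the implementation details of dozens of AT methods. Surprisingly, the basic training settings used in these methods (e.g., weight decay and learning rate schedule) are highly inconsistent, which the experiments show can substantially affect model performance. For example, a slightly different weight decay value can reduce robust accuracy by more than 7%, which is likely to outweigh the gains from the proposed methods. The work comprehensively evaluates the effects of basic training tricks and hyperparameter settings on adversarially trained models, provides a reasonable baseline setting, and re-implements previous defenses to achieve new state-of-the-art results.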

40. GraphXCOVID: Explainable Deep Graph Diffusion Pseudo-Labelling for Identifying COVID-19 on Chest X-rays
Angelica I Aviles-Rivero, Philip Sellars, Carola-Bibiane Schönlieb, Nicolas Papadakis
Abstract: Can one learn to diagnose COVID-19 under extreme minimal supervision? Since the outbreak of the novel COVID-19 there has been a rush to develop Artificial Intelligence techniques for expert-level disease identification on Chest X-ray data. In particular, the use of deep supervised learning has become the go-to paradigm. However, the performance of such models is heavily dependent on the availability of a large and representative labelled dataset, the creation of which is an expensive and time-consuming task that poses a particular challenge for a novel disease. Semi-supervised learning has shown the ability to match the incredible performance of supervised models whilst requiring a small fraction of the labelled examples. This makes the semi-supervised paradigm an attractive option for identifying COVID-19. In this work, we introduce a graph based deep semi-supervised framework for classifying COVID-19 from chest X-rays. Our framework introduces an optimisation model for graph diffusion that reinforces the natural relation among the tiny labelled set and the vast unlabelled data. We then connect the diffusion prediction output as pseudo-labels that are used in an iterative scheme in a deep net. We demonstrate, through our experiments, that our model is able to outperform the current leading supervised model with a tiny fraction of the labelled examples. Finally, we provide attention maps to accommodate the radiologist's mental model, better fitting their perceptual and cognitive abilities. These visualisations aim to assist the radiologist in judging whether the diagnosis is correct or not, and in consequence to accelerate the decision.
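Summary: Can one learn to diagnose COVID-19 under extremely minimal supervision? Since the outbreak of the novel COVID-19, there has been a rush to develop artificial intelligence techniques for expert-level disease identification on chest X-ray data, with deep supervised learning as the go-to paradigm. The performance of such models, however, depends heavily on the availability of a large and representative labelled dataset, which is expensive and time-consuming to create and especially challenging for a novel disease. Semi-supervised learning has been shown to match the performance of supervised models while requiring only a small fraction of the labelled examples, which makes it an attractive option for identifying COVID-19. This work introduces a graph-based deep semi-supervised framework for classifying COVID-19 from chest X-rays. The framework introduces an optimisation model for graph diffusion that reinforces the natural relation between the tiny labelled set and the vast unlabelled data. The diffusion predictions are then used as pseudo-labels in an iterative scheme in a deep net. Experiments show that the model outperforms the current leading supervised model with a tiny fraction of the labelled examples. Finally, attention maps are provided to accommodate the radiologist's mental model and better fit their perceptual and cognitive abilities. These visualisations aim to help the radiologist judge whether the diagnosis is correct and thereby accelerate the decision.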

41. Improving spatial domain based image formation through compressed sensing
Gene Stoltz, André Leon Nel
Abstract: In this paper, we improve image reconstruction in a single-pixel scanning system by selecting an optimal detector field of view. Image reconstruction is based on compressed sensing and image quality is compared to interpolated staring arrays. The image quality comparisons use a "dead leaves" data set, Bayesian estimation and the Peak-Signal-to-Noise Ratio (PSNR) measure. Compressed sensing is explored as an interpolation algorithm and shows with high probability an improved performance compared to Lanczos interpolation. Furthermore, multi-level sampling in a single-pixel scanning system is simulated by dynamically altering the detector field of view. It was shown that multi-level sampling improves the distribution of the Peak-Signal-to-Noise Ratio. We further explore the expected sampling level distributions and PSNR distributions for multi-level sampling. The PSNR distribution indicates that there is a small set of levels which will improve image quality over interpolated staring arrays. We further conclude that multi-level sampling will outperform single-level uniform random sampling on average.
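Summary: This paper improves image reconstruction in a single-pixel scanning system by selecting an optimal detector field of view. Image reconstruction is based on compressed sensing, and image quality is compared with interpolated staring arrays. The image quality comparisons use a "dead leaves" data set, Bayesian estimation, and the Peak-Signal-to-Noise Ratio (PSNR) measure. Compressed sensing is explored as an interpolation algorithm and, with high probability, performs better than Lanczos interpolation. In addition, multi-level sampling in a single-pixel scanning system is simulated by dynamically altering the detector field of view, and is shown to improve the PSNR distribution. The paper further explores the expected sampling-level distributions and PSNR distributions for multi-level sampling. The PSNR distribution indicates that a small set of levels improves image quality over interpolated staring arrays. The authors conclude that multi-level sampling outperforms single-level uniform random sampling on average.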
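
For readers unfamiliar with the PSNR measure named in the abstract above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(peak^2 / MSE). It is not taken from the paper: the helper name `psnr`, the flat-list image representation, and the default peak value of 1.0 (pixel values assumed to lie in [0, 1]) are all assumptions made for illustration.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `peak` is the largest possible pixel value (assumed 1.0 here; use 255 for
    8-bit images). This is a hypothetical helper, not code from the paper.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the reconstruction.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # A reconstruction that is off by 0.1 at every pixel (illustrative values).
    reference = [0.0, 0.5, 1.0, 0.25]
    estimate = [0.1, 0.6, 0.9, 0.35]
    print(psnr(reference, estimate))
```

A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore a PSNR of about 20 dB; higher values indicate a closer reconstruction.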

42. Deep Group-wise Variational Diffeomorphic Image Registration
Tycho F.A. van der Ouderaa, Ivana Išgum, Wouter B. Veldhuis, Bob D. de Vos
Abstract: Deep neural networks are increasingly used for pair-wise image registration. We propose to extend current learning-based image registration to allow simultaneous registration of multiple images. To achieve this, we build upon the pair-wise variational and diffeomorphic VoxelMorph approach and present a general mathematical framework that enables both registration of multiple images to their geodesic average and registration in which any of the available images can be used as a fixed image. In addition, we provide a likelihood based on normalized mutual information, a well-known image similarity metric in registration, between multiple images, and a prior that allows for explicit control over the viscous fluid energy to effectively regularize deformations. We trained and evaluated our approach using intra-patient registration of breast MRI and Thoracic 4DCT exams acquired over multiple time points. Comparison with Elastix and VoxelMorph demonstrates competitive quantitative performance of the proposed method in terms of image similarity and reference landmark distances at significantly faster registration.
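Summary: Deep neural networks are increasingly used for pair-wise image registration. The paper extends current learning-based registration to register multiple images simultaneously. Building on the pair-wise variational and diffeomorphic VoxelMorph approach, it presents a general mathematical framework that supports both registration of multiple images to their geodesic average and registration in which any of the available images serves as the fixed image. It also provides a likelihood based on normalized mutual information, a well-known image similarity metric in registration, between multiple images, together with a prior that allows explicit control over the viscous fluid energy to regularize deformations effectively. The approach is trained and evaluated on intra-patient registration of breast MRI and thoracic 4DCT exams acquired at multiple time points. Compared with Elastix and VoxelMorph, it shows competitive quantitative performance in image similarity and reference landmark distances, with significantly faster registration.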

43. DEEPMIR: A DEEP convolutional neural network for differential detection of cerebral Microbleeds and IRon deposits in MRI
Tanweer Rashid, Ahmed Abdulkadir, Ilya M. Nasrallah, Jeffrey B. Ware, Pascal Spincemaille, J. Rafael Romero, R. Nick Bryan, Susan R. Heckbert, Mohamad Habes
Abstract: Background: Cerebral microbleeds (CMBs) and non-hemorrhage iron deposits in the basal ganglia have been associated with brain aging, vascular disease and neurodegenerative disorders. Recent advances using quantitative susceptibility mapping (QSM) make it possible to differentiate iron content from mineralization in-vivo using magnetic resonance imaging (MRI). However, automated detection of such lesions is still challenging, making quantification in large cohort-based studies rather limited. Purpose: Development of a fully automated method using deep learning for detecting CMBs and basal ganglia iron deposits using multimodal MRI. Materials and Methods: We included a convenience sample of 24 participants from the MESA cohort and used T2-weighted images, susceptibility weighted imaging (SWI), and QSM to segment the lesions. We developed a protocol for simultaneous manual annotation of CMBs and non-hemorrhage iron deposits in the basal ganglia, which resulted in defining the gold standard. This gold standard was then used to train a deep convolutional neural network (CNN) model. Specifically, we adapted the U-Net model with a higher number of resolution layers to be able to detect small lesions such as CMBs from standard resolution MRI which are used in cohort-based studies. The detection performance was then evaluated using the cross-validation principle in order to ensure generalization of the results. Results: With multi-class CNN models, we achieved an average sensitivity and precision of about 0.8 and 0.6, respectively, for detecting CMBs. The same framework detected non-hemorrhage iron deposits reaching an average sensitivity and precision of about 0.8. Conclusions: Our results showed that deep learning could automate the detection of small vessel disease lesions and that including multimodal MR data such as QSM can improve the detection of CMB and non-hemorrhage iron deposits.
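Summary: Background: Cerebral microbleeds (CMBs) and non-hemorrhage iron deposits in the basal ganglia have been associated with brain aging, vascular disease, and neurodegenerative disorders. Recent advances in quantitative susceptibility mapping (QSM) make it possible to differentiate iron content from mineralization in vivo using magnetic resonance imaging (MRI). Automated detection of such lesions remains challenging, however, which limits quantification in large cohort-based studies. Purpose: To develop a fully automated deep learning method for detecting CMBs and basal ganglia iron deposits using multimodal MRI. Materials and Methods: A convenience sample of 24 participants from the MESA cohort was included, and T2-weighted images, susceptibility weighted imaging (SWI), and QSM were used to segment the lesions. A protocol for simultaneous manual annotation of CMBs and non-hemorrhage iron deposits in the basal ganglia was developed to define the gold standard, which was then used to train a deep convolutional neural network (CNN) model. Specifically, the U-Net model was adapted with a higher number of resolution layers so that it can detect small lesions such as CMBs in the standard-resolution MRI used in cohort-based studies. Detection performance was evaluated with cross-validation to ensure that the results generalize. Results: With multi-class CNN models, the average sensitivity and precision for detecting CMBs were about 0.8 and 0.6, respectively. The same framework detected non-hemorrhage iron deposits with an average sensitivity and precision of about 0.8. Conclusions: Deep learning can automate the detection of small vessel disease lesions, and including multimodal MR data such as QSM can improve the detection of CMBs and non-hemorrhage iron deposits.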
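
The sensitivity and precision figures reported in the abstract above follow the usual detection definitions: sensitivity = TP / (TP + FN) and precision = TP / (TP + FP). The sketch below only illustrates those two formulas. The function name and the lesion counts are hypothetical and chosen so that the arithmetic lands on 0.8 and 0.6; they are not data from the study.

```python
def sensitivity_and_precision(true_pos, false_pos, false_neg):
    """Return (sensitivity, precision) from detection counts.

    sensitivity = TP / (TP + FN): fraction of real lesions that were detected.
    precision   = TP / (TP + FP): fraction of detections that are real lesions.
    Hypothetical helper for illustration; returns NaN when a ratio is undefined.
    """
    nan = float("nan")
    sensitivity = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else nan
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else nan
    return sensitivity, precision


if __name__ == "__main__":
    # Made-up counts: 15 annotated lesions, 12 of them found, plus 8 false alarms.
    sens, prec = sensitivity_and_precision(true_pos=12, false_pos=8, false_neg=3)
    print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

With these made-up counts, a detector that finds 12 of 15 lesions while raising 8 false alarms has high sensitivity but only moderate precision, which is the pattern the abstract describes for CMBs.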

44. Sampling possible reconstructions of undersampled acquisitions in MR imaging
Kerem C. Tezcan, Christian F. Baumgartner, Ender Konukoglu
Abstract: Undersampling the k-space during MR acquisitions saves time but results in an ill-posed inversion problem, leading to an infinite set of images as possible solutions. Traditionally, this is tackled as a reconstruction problem by searching for a single "best" image out of this solution set according to some chosen regularization or prior. This approach, however, misses the possibility of other solutions and hence ignores the uncertainty in the inversion process. In this paper, we propose a method that instead returns multiple images which are possible under the acquisition model and the chosen prior. To this end, we introduce a low dimensional latent space and model the posterior distribution of the latent vectors given the acquisition data in k-space, from which we can sample in the latent space and obtain the corresponding images. We use a variational autoencoder for the latent model and the Metropolis adjusted Langevin algorithm for the sampling. This approach allows us to obtain multiple possible images and capture the uncertainty in the inversion process under the prior used. We evaluate our method on images from the Human Connectome Project dataset as well as in-house measured multi-coil images and compare to two different methods. The results indicate that the proposed method is capable of producing images that match the ground truth in regions where acquired k-space data is informative and construct different possible reconstructions, which show realistic structural variations, in regions where acquired k-space data is not informative. Keywords: Magnetic Resonance image reconstruction, uncertainty estimation, inverse problems, sampling, MCMC, deep learning, unsupervised learning.
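Summary: Undersampling k-space during MR acquisition saves time but leads to an ill-posed inversion problem with an infinite set of images as possible solutions. Traditionally, this is treated as a reconstruction problem: a single "best" image is selected from the solution set according to a chosen regularization or prior. That approach overlooks the other possible solutions and therefore ignores the uncertainty in the inversion process. This paper proposes a method that instead returns multiple images that are possible under the acquisition model and the chosen prior. To this end, it introduces a low-dimensional latent space and models the posterior distribution of the latent vectors given the acquired k-space data, then samples in the latent space and obtains the corresponding images. A variational autoencoder serves as the latent model and the Metropolis-adjusted Langevin algorithm is used for sampling. This yields multiple possible images and captures the uncertainty in the inversion process under the prior used. The method is evaluated on images from the Human Connectome Project dataset and on in-house measured multi-coil images, and compared with two other methods. The results indicate that it produces images that match the ground truth in regions where the acquired k-space data is informative, and constructs different possible reconstructions with realistic structural variations in regions where the data is not informative. Keywords: magnetic resonance image reconstruction, uncertainty estimation, inverse problems, sampling, MCMC, deep learning, unsupervised learning.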
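
The abstract above names the Metropolis-adjusted Langevin algorithm (MALA) as its sampler. The following is a minimal, generic sketch of MALA on a one-dimensional toy target (a standard normal), not the paper's method: the paper samples latent vectors of a variational autoencoder given k-space data, which is not reproduced here. The function name `mala`, the step size, and the toy target are all assumptions made for illustration.

```python
import math
import random


def mala(log_p, grad_log_p, x0, step, n_samples, rng):
    """Draw samples with the Metropolis-adjusted Langevin algorithm (1-D sketch).

    log_p / grad_log_p: unnormalised log density of the target and its gradient.
    step: Langevin step size (a tuning parameter assumed for this example).
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        mean_fwd = x + 0.5 * step ** 2 * grad_log_p(x)
        prop = mean_fwd + step * rng.gauss(0.0, 1.0)
        # Mean of the reverse proposal, needed because the proposal is asymmetric.
        mean_rev = prop + 0.5 * step ** 2 * grad_log_p(prop)
        log_q_fwd = -((prop - mean_fwd) ** 2) / (2.0 * step ** 2)
        log_q_rev = -((x - mean_rev) ** 2) / (2.0 * step ** 2)
        # Metropolis-Hastings correction keeps the target distribution invariant.
        log_alpha = log_p(prop) - log_p(x) + log_q_rev - log_q_fwd
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = prop
        samples.append(x)
    return samples


if __name__ == "__main__":
    # Toy target: standard normal, log p(x) = -x^2 / 2 up to a constant.
    rng = random.Random(0)
    draws = mala(lambda x: -0.5 * x * x, lambda x: -x,
                 x0=3.0, step=0.9, n_samples=5000, rng=rng)
    kept = draws[500:]  # discard a short burn-in
    print(sum(kept) / len(kept))
```

The accept/reject step is what distinguishes MALA from plain Langevin dynamics: without it, the discretisation error of the proposal would bias the samples.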

45. RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior
Hong-Ye Hu, Dian Wu, Yi-Zhuang You, Bruno Olshausen, Yubei Chen
Abstract: Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key idea of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which can separate different scale information of images with disentangled representations at each scale. We demonstrate our method mainly on the CelebA dataset and show that the disentangled representation at different scales enables semantic manipulation and style mixing of the images. To visualize the latent representation, we introduce the receptive fields for flow-based models and find receptive fields learned by RG-Flow are similar to convolutional neural networks. In addition, we replace the widely adopted Gaussian prior distribution by sparse prior distributions to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has $O(\log L)$ complexity for image inpainting compared to previous flow-based models with $O(L^2)$ complexity.
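Summary: Flow-based generative models have become an important class of unsupervised learning approaches. This work combines the key idea of the renormalization group (RG) with a sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which separates the information in an image by scale and gives a disentangled representation at each scale. The method is demonstrated mainly on the CelebA dataset, where the disentangled representations at different scales enable semantic manipulation and style mixing of images. To visualize the latent representation, the paper introduces receptive fields for flow-based models and finds that those learned by RG-Flow are similar to the receptive fields of convolutional neural networks. In addition, the widely adopted Gaussian prior distribution is replaced by sparse prior distributions to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has $O(\log L)$ complexity for image inpainting, compared with $O(L^2)$ for previous flow-based models.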

Note: The Chinese text of the original post is machine-translated! The cover image is a word cloud of the paper titles!

[arXiv papers] Computer Vision and Pattern Recognition 2020-10-02
