Abstracts

1. SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction
Jaehoon Choi, Dongki Jung, Donghwan Lee, Changick Kim
Abstract: Self-supervised monocular depth estimation has emerged as a promising method because it does not require ground-truth depth maps during training. As an alternative to the ground-truth depth map, the photometric loss provides self-supervision for depth prediction by matching the input image frames. However, the photometric loss causes various problems, resulting in less accurate depth values than those of supervised approaches. In this paper, we propose SAFENet, which is designed to leverage semantic information to overcome the limitations of the photometric loss. Our key idea is to exploit semantic-aware depth features that integrate semantic and geometric knowledge. To this end, we introduce multi-task learning schemes that incorporate semantic awareness into the representation of depth features. Experiments on the KITTI dataset demonstrate that our method is competitive with, or even outperforms, state-of-the-art methods. Furthermore, extensive experiments on different datasets show its better generalization ability and robustness to various conditions, such as low light or adverse weather.
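The photometric loss described above compares a target frame with neighbouring frames that have been warped into the target view using the predicted depth and camera pose. The abstract does not give its exact form, so what follows is only a minimal sketch: a plain L1 photometric error on small grayscale images stored as nested lists, with a per-pixel minimum over several warped source frames. The L1-only form, the per-pixel minimum and the name `photometric_l1` are assumptions; SSIM terms, masking and the warping itself are omitted.

```python
def photometric_l1(target, reconstructions):
    """Mean per-pixel photometric error between a target frame and warped sources.

    target: 2D list of intensities (H x W).
    reconstructions: list of 2D lists, each a source frame assumed to be
    already warped into the target view (the warping is not shown).
    For each pixel only the smallest error over the sources is kept
    (an assumed per-pixel minimum, not confirmed for SAFENet).
    """
    if not reconstructions:
        raise ValueError("need at least one reconstructed frame")
    height, width = len(target), len(target[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            # keep the best-matching source frame for this pixel
            total += min(abs(target[y][x] - rec[y][x]) for rec in reconstructions)
    return total / (height * width)
```

Taking the per-pixel minimum lets a pixel that is occluded in one source frame be explained by another, which is one reason it is a popular choice; whether SAFENet uses it is not stated.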

2. CORE: Color Regression for Multiple Colors Fashion Garments
Alexandre Rame, Arthur Douillard, Charles Ollion
Abstract: Among all fashion attributes, color is challenging to detect because of its subjective perception. Existing classification approaches cannot go beyond a predefined list of discrete color names. In this paper, we argue that color detection is a regression problem. We therefore propose a new two-stage architecture based on attention modules. The first stage corrects the image illumination while detecting the main discrete color name. The second stage combines a color-name attention (dependent on the detected color) with an object attention (dependent on the clothing category) and finally weights a spatial pooling over the image pixels' RGB values. We further extend our work to garments with multiple colors. We collect a dataset in which each fashion item is labeled with a continuous color palette, and we empirically show the benefits of our approach.
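The last step of the second stage, an attention-weighted spatial pooling over pixel RGB values, can be sketched as follows. The two per-pixel score lists stand in for the color-name attention and the object attention. Merging them by adding logits before a softmax is an assumption, and `attention_pooled_color` is a hypothetical name, not the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pooled_color(pixels, color_logits, object_logits):
    """Regress one RGB color as an attention-weighted mean of pixel values.

    pixels: list of (r, g, b) tuples, one per pixel.
    color_logits / object_logits: per-pixel scores from hypothetical
    color-name and object attention heads; adding them is an assumption.
    """
    weights = softmax([c + o for c, o in zip(color_logits, object_logits)])
    return tuple(
        sum(w * px[channel] for w, px in zip(weights, pixels))
        for channel in range(3)
    )
```

Because the output is a weighted mean of real pixel values, it is a continuous RGB estimate rather than a choice from a fixed list of color names, which is the point of treating color detection as regression.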

3. Support-set bottlenecks for video-text representation learning
Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi
Abstract: The dominant paradigm for learning video-text representations, noise contrastive learning, increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs. We posit that this last behaviour is too strict: it enforces dissimilar representations even for samples that are semantically related, for example visually similar videos or ones that share the same depicted action. In this paper, we propose a novel method that alleviates this by leveraging a generative model to naturally push these related samples together: each sample's caption must be reconstructed as a weighted combination of other support samples' visual representations. This simple idea ensures that representations are not overly specialized to individual samples and are reusable across the dataset, and, unlike noise contrastive learning, it yields representations that explicitly encode semantics shared between samples. Our proposed method outperforms others by a large margin on MSR-VTT, VATEX and ActivityNet for video-to-text and text-to-video retrieval.
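The support-set idea, where a caption is represented as a weighted combination of other samples' visual representations, can be sketched as soft attention. The scaled dot-product weighting, the `temperature` parameter and the function name are assumptions for illustration. In the described method the mixed vector would then condition a generative model that reconstructs the caption, which is not shown.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def support_set_reconstruction(query, support, temperature=1.0):
    """Mix support visual embeddings with weights driven by a caption-side query.

    query: caption-side embedding (list of floats).
    support: list of visual embeddings from other samples in the batch.
    Weights are a softmax over scaled dot products (assumed attention form).
    Returns (weights, mixed_vector).
    """
    scores = [dot(query, s) / temperature for s in support]
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(support[0])
    mixed = [sum(w * s[i] for w, s in zip(weights, support)) for i in range(dim)]
    return weights, mixed
```

Since every caption must be explained by a blend of several videos, semantically similar videos are encouraged to carry shared information instead of being pushed apart.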

4. Microscopic fine-grained instance classification through deep attention
Mengran Fan, Tapabrata Chakrabort, Eric I-Chao Chang, Yan Xu, Jens Rittscher
Abstract: Fine-grained classification of microscopic image data with limited samples is an open problem in computer vision and biomedical imaging. Deep-learning-based vision systems mostly deal with large numbers of low-resolution images, whereas the subtle details in biomedical images require higher resolution. To bridge this gap, we propose a simple yet effective deep network that performs two tasks simultaneously in an end-to-end manner. First, it utilises a gated attention module that can focus on multiple key instances at high resolution without extra annotations or region proposals. Second, the global structural features and local instance features are fused for final image-level classification. The result is a robust but lightweight end-to-end trainable deep network that yields state-of-the-art results on two separate fine-grained multi-instance biomedical image classification tasks: a benchmark breast cancer histology dataset and our new fungi species mycology dataset. In addition, we demonstrate the interpretability of the proposed model by visualising the concordance of the learned features with clinically relevant features.
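The gated attention module is not specified in the abstract. A common formulation, assumed here, is the gated-attention multiple-instance pooling of Ilse et al. (2018): each instance feature h_k receives the score w^T(tanh(V h_k) ⊙ sigmoid(U h_k)), and the scores are softmax-normalised into attention weights. The sketch below implements that pooling with plain lists; `V`, `U` and `w` would be learned in a real model, and fusion with the global structural features is omitted.

```python
import math


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_pool(instances, V, U, w):
    """Gated attention pooling over instance features (Ilse et al. style, assumed).

    instances: list of feature vectors h_k (e.g. high-resolution patches).
    V, U: L x D weight matrices; w: length-L vector.
    Returns (attention_weights, bag_feature).
    """
    scores = []
    for h in instances:
        tanh_branch = [math.tanh(v) for v in matvec(V, h)]
        gate_branch = [sigmoid(u) for u in matvec(U, h)]
        gated = [t * g for t, g in zip(tanh_branch, gate_branch)]
        scores.append(sum(wi * gi for wi, gi in zip(w, gated)))
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[i] for a, h in zip(attn, instances)) for i in range(dim)]
    return attn, bag
```

The attention weights double as an interpretability signal: high-weight instances are the regions the model relied on.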

5. Representation learning from videos in-the-wild: An object-centric approach
Rob Romijnders, Aravindh Mahendran, Michael Tschannen, Josip Djolonga, Marvin Ritter, Neil Houlsby, Mario Lucic
Abstract: We propose a method to learn image representations from uncurated videos. We combine a supervised loss from off-the-shelf object detectors with self-supervised losses that naturally arise from the video-shot-frame-object hierarchy present in each video. We report competitive results on the 19 transfer learning tasks of the Visual Task Adaptation Benchmark (VTAB) and on 8 out-of-distribution generalization tasks, and discuss the benefits and shortcomings of the proposed approach. In particular, it improves over the baseline on 18 of the 19 few-shot learning tasks and on all 8 out-of-distribution generalization tasks. Finally, we perform several ablation studies and analyze the impact of the pretrained object detector on performance across this suite of tasks.
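The abstract does not say how the self-supervised losses are built from the video-shot-frame-object hierarchy. One plausible reading, assumed here purely for illustration, is that object crops sharing an ancestor (for example the same shot) are treated as positive pairs for a contrastive objective. The record layout and the name `positive_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def positive_pairs(records, level):
    """Pair object crops that share an ancestor in the video hierarchy.

    records: iterable of dicts with keys 'video', 'shot', 'frame', 'object'
    (a hypothetical layout). level: 'video', 'shot' or 'frame'; two records
    form a positive pair if they agree on every key down to that level.
    """
    order = ["video", "shot", "frame"]
    if level not in order:
        raise ValueError(f"unknown level: {level}")
    keys = order[: order.index(level) + 1]
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["object"])
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(members, 2))
    return pairs
```

Choosing a coarser level (video) gives more but noisier positives; a finer level (frame) gives fewer, tighter ones.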

6. Compressing Deep Convolutional Neural Networks by Stacking Low-dimensional Binary Convolution Filters
Weichao Lan, Liang Lan
Abstract: Deep Convolutional Neural Networks (CNNs) have been successfully applied to many real-life problems. However, the huge memory cost of deep CNN models poses a great challenge to deploying them on memory-constrained devices (e.g., mobile phones). One popular way to reduce the memory cost of deep CNN models is to train binary CNNs, in which the weights of the convolution filters are either 1 or -1, so that each weight can be stored in a single bit. However, the compression ratio of existing binary CNN models is upper-bounded at around 32. To address this limitation, we propose a novel method that compresses deep CNN models by stacking low-dimensional binary convolution filters. Our method approximates a standard convolution filter by selecting and stacking filters from a set of low-dimensional binary convolution filters. This set is shared across all filters of a given convolution layer. Therefore, our method achieves a much larger compression ratio than binary CNN models. To train the proposed model, we show theoretically that it is equivalent to selecting and stacking intermediate feature maps generated by low-dimensional binary filters. Therefore, the proposed model can be trained efficiently using the split-transform-merge strategy. We also provide a detailed analysis of the memory and computation cost of our model at inference. We compare the proposed method with five other popular model compression techniques on two benchmark datasets. Our experimental results demonstrate that the proposed method achieves a much higher compression ratio than existing methods while maintaining comparable accuracy.
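Storage arithmetic makes it easiest to see why stacking shared low-dimensional binary filters can beat the roughly 32x ceiling of binary CNNs. The 32x comes from replacing each 32-bit float weight with a single bit. The scheme below is a hypothetical concrete reading of the abstract, not the authors' exact storage format: chunk depth `d`, bank size `m` and per-chunk indices are all assumptions. It shows how sharing one small bank per layer and storing only indices decouples storage from the number of filters.

```python
import math


def compression_ratios(c_out, c_in, k, d, m):
    """Compare storage of float filters with binary and stacked binary filters.

    Assumed scheme (hypothetical): every c_in x k x k filter is split along the
    channel axis into c_in // d chunks of depth d; each chunk is stored as an
    index into a layer-wide bank of m binary d x k x k filters.
    Scaling factors and other small overheads are ignored.
    Returns (binary_ratio, stacked_ratio), both relative to 32-bit floats.
    """
    if c_in % d:
        raise ValueError("c_in must be divisible by d")
    float_bits = c_out * c_in * k * k * 32
    binary_bits = c_out * c_in * k * k          # one bit per weight
    bank_bits = m * d * k * k                   # shared binary filter bank
    index_bits = c_out * (c_in // d) * math.ceil(math.log2(m))
    return float_bits / binary_bits, float_bits / (bank_bits + index_bits)
```

For example, `compression_ratios(256, 256, 3, 16, 64)` returns 32.0 for the binary baseline and about 558.5 for the stacked scheme under these assumptions.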

7. High Speed Event Camera TRacking
William Chamorro, Juan Andrade-Cetto, Joan Solà
Abstract: Event cameras are bioinspired sensors with reaction times on the order of microseconds. This property makes them appealing for use in highly dynamic computer vision applications. In this work, we explore the limits of this sensing technology and present an ultra-fast tracking algorithm able to estimate six-degree-of-freedom motion with dynamics over 25.8 g, at a throughput of 10 kHz, processing over a million events per second. Our method can track either camera motion or the motion of an object in front of the camera, using an error-state Kalman filter formulated in a Lie-theoretic sense. The method includes a robust mechanism for matching events to projected line segments, with very fast outlier rejection. Meticulous treatment of sparse matrices is applied to achieve real-time performance. Different motion models of varying complexity are considered for the sake of comparison and performance analysis.
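A full error-state Kalman filter (prediction, Jacobians, and the event-to-line measurement model) is far beyond a snippet. The sketch below shows only the Lie-theoretic step in which an estimated orientation error is folded back into the nominal state, R <- R · Exp(δθ), with the SO(3) exponential computed by Rodrigues' formula. The right-perturbation convention and restriction to SO(3) are assumptions; the paper may use another convention or a larger group such as SE(3).

```python
import math


def so3_exp(phi):
    """Rodrigues' formula: rotation vector phi (3 floats) -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(p * p for p in phi))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (p / theta for p in phi)          # unit rotation axis
    c, s, t = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def inject_error(r_nominal, delta_theta):
    """Error-state injection on SO(3) with a right perturbation: R <- R * Exp(dtheta)."""
    return matmul(r_nominal, so3_exp(delta_theta))
```

Keeping the error in the 3-dimensional tangent space and mapping it back through Exp is what keeps the nominal rotation a valid rotation matrix after every update.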

8. Characterization of surface motion patterns in highly deformable soft tissue organs from dynamic Magnetic Resonance Imaging
Karim Makki, Amine Bohi, Marc Emmanuel Bellemare
Abstract: In this work, we present a pipeline for characterizing bladder surface dynamics during deep respiratory movements from dynamic Magnetic Resonance Imaging (MRI). Dynamic MRI can capture temporal anatomical changes in soft tissue organs with high contrast, but the obtained sequences usually suffer from limited volume coverage, which makes high-resolution reconstruction of organ shape trajectories a major challenge in temporal studies. For a compact shape representation, the reconstructed temporal data with full volume coverage are first used to establish subject-specific dynamical 4D mesh sequences using the large deformation diffeomorphic metric mapping (LDDMM) framework. Then, we perform a statistical characterization of organ shape changes from mechanical parameters such as mesh elongations and distortions. Since shape space is curved, we also use intrinsic curvature changes as a metric to quantify surface evolution. However, the numerical computation of curvature depends strongly on the surface parameterization (i.e., the mesh resolution). To cope with this dependency, we propose a non-parametric level set method to evaluate spatio-temporal surface evolution. Independent of parameterization and minimizing the length of the geodesic curves, it smoothly shrinks the surface curves towards a sphere by minimizing a Dirichlet energy. An Eulerian PDE approach is used to evaluate surface dynamics from the curve-shortening flow. Results demonstrate the numerical stability of the derived descriptor along smooth continuous-time organ trajectories. Intercorrelations between individuals' motion patterns from different geometric features are computed using the Laplace-Beltrami Operator (LBO) eigenfunctions for spherical mapping.
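The authors evaluate the curve-shortening flow with an Eulerian level-set PDE on surfaces. The sketch below is a much simpler Lagrangian analogue on a closed 2D polygon, included only to illustrate the property the method relies on: the flow smoothly shrinks a curve while never increasing its length. The uniform-Laplacian discretization, the step size `tau` and both function names are assumptions, and no curvature descriptor is computed.

```python
import math


def perimeter(points):
    """Length of a closed polygon given as a list of (x, y) vertices."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def shorten_step(points, tau=0.25):
    """One explicit step of a discrete curve-shortening flow on a closed polygon.

    Each vertex moves towards the midpoint of its neighbours (uniform Laplacian).
    For 0 <= tau <= 0.5 every new edge is a convex combination of three old
    edges, so by the triangle inequality the perimeter cannot increase.
    """
    if not 0.0 <= tau <= 0.5:
        raise ValueError("tau must lie in [0, 0.5]")
    n = len(points)
    new_points = []
    for i in range(n):
        prev_pt, cur, nxt = points[i - 1], points[i], points[(i + 1) % n]
        new_points.append(tuple(
            c + tau * (p + q - 2.0 * c) for p, c, q in zip(prev_pt, cur, nxt)
        ))
    return new_points
```

Unlike the authors' level-set approach, this explicit polygon scheme does depend on the vertex sampling, which is exactly the parameterization sensitivity the paper sets out to avoid.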

9. Assisted Probe Positioning for Ultrasound Guided Radiotherapy Using Image Sequence Classification
Alexander Grimwood, Helen McNair, Yipeng Hu, Ester Bonmati, Dean Barratt, Emma Harris
Abstract: Effective transperineal ultrasound image guidance in prostate external beam radiotherapy requires consistent alignment between probe and prostate at each session during patient set-up. Probe placement and ultrasound image interpretation are manual tasks contingent upon operator skill, leading to inter-operator uncertainties that degrade radiotherapy precision. We demonstrate a method for ensuring accurate probe placement through joint classification of images and probe position data. Using a multi-input multi-task algorithm, spatial coordinate data from an optically tracked ultrasound probe are combined with an image classifier using a recurrent neural network to generate two sets of predictions in real time. The first set identifies the relevant prostate anatomy visible in the field of view using the classes: outside prostate, prostate periphery, prostate centre. The second set recommends a probe angular adjustment to achieve alignment between the probe and prostate centre using the classes: move left, move right, stop. The algorithm was trained and tested on 9,743 clinical images from 61 treatment sessions across 32 patients. We evaluated classification accuracy against class labels derived from three experienced observers at 2/3 and 3/3 agreement thresholds. For images with unanimous consensus between observers, anatomical classification accuracy was 97.2% and probe adjustment accuracy was 94.9%. The algorithm identified optimal probe alignment within a mean (standard deviation) range of 3.7° (1.2°) from angle labels with full observer consensus, comparable to the 2.8° (2.6°) mean interobserver range. We propose that such an algorithm could assist radiotherapy practitioners with limited experience of ultrasound image interpretation by providing effective real-time feedback during patient set-up.
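Evaluation against labels from three observers at 2/3 and 3/3 agreement thresholds can be sketched as follows. The class names come from the abstract. Skipping images that do not reach the threshold is an assumption about how the paper handled disagreement, and both function names are hypothetical.

```python
from collections import Counter


def consensus_label(observer_labels, min_agreement):
    """Majority label if at least `min_agreement` observers agree, else None."""
    label, count = Counter(observer_labels).most_common(1)[0]
    return label if count >= min_agreement else None


def accuracy(predictions, observer_sets, min_agreement):
    """Accuracy over images whose labels reach the agreement threshold (others skipped)."""
    correct = total = 0
    for pred, labels in zip(predictions, observer_sets):
        target = consensus_label(labels, min_agreement)
        if target is None:
            continue
        total += 1
        correct += pred == target
    return correct / total if total else float("nan")
```

With three observers, a 2/3 threshold keeps every image with a majority, while 3/3 keeps only unanimous images, the subset on which the 97.2% and 94.9% figures are reported.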

10. Vec2Instance: Parameterization for Deep Instance Segmentation
N. Lakmal Deshapriya, Matthew N. Dailey, Manzul Kumar Hazarika, Hiroyuki Miyazaki
Abstract: Current advances in deep learning are leading to human-level accuracy in computer vision tasks such as object classification, localization, semantic segmentation, and instance segmentation. In this paper, we describe a new deep convolutional neural network architecture for instance segmentation called Vec2Instance. Vec2Instance provides a framework for parametrizing instances, allowing convolutional neural networks to efficiently estimate the complex shapes of instances around their centroids. We demonstrate the feasibility of the proposed architecture on instance segmentation tasks in satellite images, which have a wide range of applications. Moreover, we demonstrate the usefulness of the new method for extracting building footprints from satellite images. The total pixel-wise accuracy of our approach is 89%, close to that of the state-of-the-art Mask R-CNN (91%). Vec2Instance is an alternative to complex instance segmentation pipelines, offering simplicity and intuitiveness. The code developed in this study is available in the Vec2Instance GitHub repository, this https URL
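The abstract does not say how instances are parametrized around their centroids. One simple possibility, assumed here purely for illustration and resembling star-convex polar representations, is a vector of boundary distances sampled at equally spaced angles. A network would regress that vector for each instance; decoding it back into a polygon is shown below. Both function names are hypothetical.

```python
import math


def decode_radial_vector(center, radii):
    """Turn a centroid plus N radial distances into polygon vertices.

    Assumed parametrization (hypothetical): radii[i] is the distance from the
    centroid to the instance boundary along angle 2*pi*i/N.
    """
    cx, cy = center
    n = len(radii)
    return [
        (cx + r * math.cos(2 * math.pi * i / n), cy + r * math.sin(2 * math.pi * i / n))
        for i, r in enumerate(radii)
    ]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1] - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0
```

A fixed-length vector per instance is what lets an ordinary CNN output head predict arbitrary shapes; the trade-off is that such a radial encoding can only represent shapes whose boundary is visible from the centroid.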

11. Parallax Motion Effect Generation Through Instance Segmentation And Depth Estimation
Allan Pinto, Manuel A. Córdova, Luis G. L. Decker, Jose L. Flores-Campana, Marcos R. Souza, Andreza A. dos Santos, Jhonatas S. Conceição, Henrique F. Gagliardi, Diogo C. Luvizon, Ricardo da S. Torres, Helio Pedrini
Abstract: Stereo vision is a growing topic in computer vision because of the innumerable opportunities and applications this technology offers for developing modern solutions, such as virtual and augmented reality applications. Motion parallax estimation is a promising technique for enhancing the user's experience in three-dimensional virtual environments. In this paper, we propose an algorithm for generating parallax motion effects from a single image, taking advantage of state-of-the-art instance segmentation and depth estimation approaches. This work also compares such algorithms to investigate the trade-off between efficiency and the quality of the parallax motion effects, taking into consideration a multi-task learning network capable of performing instance segmentation and depth estimation at once. Experimental results and visual quality assessment indicate that the PyD-Net network (depth estimation) combined with Mask R-CNN or FBNet networks (instance segmentation) can produce parallax motion effects with good visual quality.
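How segmentation and depth combine into a parallax effect can be sketched with a pinhole camera model: a sideways camera move of t metres shifts a point at depth Z by f·t/Z pixels, so nearer layers slide farther than distant ones. Using one representative depth per segmented instance, a purely lateral camera move, and the function names are all simplifying assumptions; filling the disoccluded background is omitted.

```python
def parallax_shift_px(focal_px, camera_shift_m, depth_m):
    """Horizontal image shift (pixels) of a point at depth Z for a sideways camera move.

    Pinhole model: shift = f * t / Z, so nearer layers move more than far ones.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * camera_shift_m / depth_m


def layer_shifts(layer_depths, focal_px, camera_shift_m):
    """Shift for each segmented layer from its representative (e.g. median) depth."""
    return {
        name: parallax_shift_px(focal_px, camera_shift_m, z)
        for name, z in layer_depths.items()
    }
```

Animating `camera_shift_m` over a short range and re-compositing the shifted layers back to front produces the parallax motion effect.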

12. A Method for Tumor Treating Fields Fast Estimation
Reuben R Shamir, Zeev Bomzon
Abstract: Tumor Treating Fields (TTFields) is an FDA-approved treatment for specific types of cancer that significantly extends patients' lives. The intensity of TTFields within the tumor has been associated with treatment outcomes: the higher the intensity, the longer patients are likely to survive. It has therefore been suggested to optimize the location of the TTFields transducer arrays so that the intensity is maximized. Such optimization requires multiple computations of TTFields in a simulation framework. However, these computations are typically performed using finite element methods or similar approaches that are time-consuming, so only a limited number of transducer array locations can be examined in practice. To overcome this issue, we have developed a method for fast estimation of TTFields intensity. The method takes as input a segmentation of the patient's head, a table of tissue electrical properties, and the location of the transducer array. It outputs a spatial estimate of the TTFields intensity by incorporating a few relevant parameters in a random-forest regressor. The method was evaluated on 10 patients (20 transducer array (TA) layouts) in a leave-one-out framework. The computation time was 1.5 minutes with the suggested method and 180-240 minutes with the commercial simulation. The average error relative to the commercial simulation was 0.14 V/cm (SD = 0.06 V/cm). These results suggest that fast estimation of TTFields based on a few parameters is feasible. The presented method may facilitate treatment optimization and further extend patients' lives.
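The leave-one-out evaluation can be sketched generically. Whether the paper holds out one patient or one layout at a time is not stated; grouping by patient (so both layouts of a patient are held out together) is assumed here. `fit` and `predict` are caller-supplied stand-ins for the random-forest regressor, and the function name is hypothetical.

```python
from statistics import mean, stdev


def leave_one_patient_out(samples, fit, predict):
    """Leave-one-patient-out evaluation of a regressor.

    samples: list of (patient_id, features, target) tuples; all samples of the
    held-out patient are excluded from training together (assumed grouping).
    fit(train) -> model and predict(model, features) -> float are supplied by
    the caller. Returns (mean_abs_error, sd_of_abs_error).
    """
    errors = []
    for held_out in sorted({pid for pid, _, _ in samples}):
        model = fit([s for s in samples if s[0] != held_out])
        for pid, features, target in samples:
            if pid == held_out:
                errors.append(abs(predict(model, features) - target))
    return mean(errors), (stdev(errors) if len(errors) > 1 else 0.0)
```

Grouping by patient matters: if one layout of a patient were in training and the other in testing, anatomy shared between them could make the error look smaller than it would be on a new patient.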

13. How Convolutional Neural Network Architecture Biases Learned Opponency and Colour Tuning
Ethan Harris, Daniela Mihai, Jonathon Hare
Abstract: Recent work suggests that changing Convolutional Neural Network (CNN) architecture by introducing a bottleneck in the second layer can yield changes in the learned function. Fully understanding this relationship requires a way of quantitatively comparing trained networks. The fields of electrophysiology and psychophysics have developed a wealth of methods for characterising visual systems that permit such comparisons. Inspired by these methods, we propose an approach to obtaining spatial and colour tuning curves for convolutional neurons, which can be used to classify cells in terms of their spatial and colour opponency. We perform these classifications for a range of CNNs with different depths and bottleneck widths. Our key finding is that networks with a bottleneck show a strong functional organisation: almost all cells in the bottleneck layer become both spatially and colour opponent, while cells in the layer following the bottleneck become non-opponent. The colour tuning data can further be used to form a rich understanding of how colour is encoded by a network. As a concrete demonstration, we show that shallower networks without a bottleneck learn a complex non-linear colour system, whereas deeper networks with tight bottlenecks learn a simple channel-opponent code in the bottleneck layer. We further develop a method of obtaining a hue sensitivity curve for a trained CNN, which enables high-level insights that complement the low-level findings from the colour tuning data. We go on to train a series of networks under different conditions to ascertain the robustness of the discussed results. Ultimately, our methods and findings coalesce with prior art, strengthening our ability to interpret trained CNNs and furthering our understanding of the connection between architecture and learned representation. Code for all experiments is available at this https URL.
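Classifying a cell from its tuning curve can be sketched as follows. The criterion used here is a hypothetical stand-in for the paper's procedure: a unit counts as opponent if some probe stimuli excite it and others inhibit it by more than a threshold relative to baseline. `baseline`, `threshold` and the function name are assumptions.

```python
def classify_opponency(responses, baseline=0.0, threshold=0.1):
    """Label a unit from its tuning curve (hypothetical criterion).

    responses: activations to a set of probe stimuli (e.g. hues or
    spatial patterns). 'opponent' if some stimuli excite and others inhibit
    the unit relative to `baseline` by more than `threshold`.
    """
    excited = any(r - baseline > threshold for r in responses)
    inhibited = any(baseline - r > threshold for r in responses)
    if excited and inhibited:
        return "opponent"
    if excited or inhibited:
        return "non-opponent"
    return "unresponsive"
```

Running such a rule over every unit in a layer and counting labels is how a statement like "almost all cells in the bottleneck layer are opponent" can be quantified.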

14. Unfolding the Alternating Optimization for Blind Super Resolution
Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, Tieniu Tan
Abstract: Previous methods decompose the blind super-resolution (SR) problem into two sequential steps: (i) estimating the blur kernel from a given low-resolution (LR) image and (ii) restoring the SR image based on the estimated kernel. This two-step solution involves two independently trained models, which may not be well compatible with each other. A small estimation error in the first step can cause a severe performance drop in the second. On the other hand, the first step can only utilize limited information from the LR image, which makes it difficult to predict a highly accurate blur kernel. To address these issues, instead of considering the two steps separately, we adopt an alternating optimization algorithm that can estimate the blur kernel and restore the SR image in a single model. Specifically, we design two convolutional neural modules, namely Restorer and Estimator. Restorer restores the SR image based on the predicted kernel, and Estimator estimates the blur kernel with the help of the restored SR image. We alternate these two modules repeatedly and unfold this process to form an end-to-end trainable network. In this way, Estimator utilizes information from both LR and SR images, which makes estimating the blur kernel easier. More importantly, Restorer is trained with the kernel estimated by Estimator instead of the ground-truth kernel, so Restorer can be more tolerant of Estimator's estimation error. Extensive experiments on synthetic datasets and real-world images show that our model can largely outperform state-of-the-art methods and produce more visually favorable results at much higher speed. The source code is available at this https URL.
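The unfolded alternation can be sketched as a fixed-length loop. Here `restorer` and `estimator` are caller-supplied callables standing in for the two CNN modules, and the function name is hypothetical. In the real network each call would be a learned module, and the loop would be unrolled into one computation graph so that the whole chain is trained end to end; gradients are not modelled here.

```python
def unfolded_alternation(lr, restorer, estimator, init_kernel, n_iters):
    """Alternate Restorer and Estimator for a fixed number of iterations.

    restorer(lr, kernel) -> sr and estimator(lr, sr) -> kernel are stand-ins
    for the two modules. Returns (final_sr, final_kernel, history) where
    history holds the (sr, kernel) pair produced at each iteration.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    kernel = init_kernel
    history = []
    for _ in range(n_iters):
        sr = restorer(lr, kernel)      # restore with the current kernel estimate
        kernel = estimator(lr, sr)     # re-estimate the kernel from LR and SR
        history.append((sr, kernel))
    return sr, kernel, history
```

With toy scalar stand-ins such as `restorer = lambda lr, k: lr / k` and an estimator that moves the kernel halfway towards a fixed point, the estimates settle within a few dozen iterations; this says nothing about the convergence of the real networks.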

15. Comprehensive Online Network Pruning via Learnable Scaling Factors
Muhammad Umair Haider, Murtaza Taj
Abstract: One of the major challenges in deploying deep neural network architectures is their size, which adversely affects their inference time and memory requirements. Deep CNNs can be pruned either width-wise, by removing filters based on their importance, or depth-wise, by removing layers and blocks. Width-wise pruning (filter pruning) is commonly performed via learnable gates or switches and sparsity regularizers, whereas pruning of layers has so far been performed arbitrarily by manually designing a smaller network, usually referred to as a student network. We propose a comprehensive pruning strategy that can perform both width-wise and depth-wise pruning. This is achieved by introducing gates at different granularities (neuron, filter, layer, block), which are then controlled via an objective function that simultaneously performs pruning at different granularities during each forward pass. Our approach is applicable to a wide variety of architectures without any constraints on spatial dimensions or connection type (sequential, residual, parallel or inception). Our method achieves compression ratios of 70% to 90% without noticeable loss in accuracy when evaluated on benchmark datasets.
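Gates at several granularities can be sketched as learnable scaling factors, one list per granularity. Regularizing them with an L1 penalty and removing units whose factor falls below a threshold is an assumption, in the spirit of common sparsity-based pruning; the abstract does not give the objective. The per-granularity penalty weights and both function names are hypothetical.

```python
def gate_masks(gates, threshold):
    """Keep units whose learned scaling factor magnitude exceeds `threshold`.

    gates: dict mapping a granularity ('neuron', 'filter', 'layer', 'block')
    to a list of scaling factors. Returns boolean keep-masks with the same keys.
    """
    return {level: [abs(g) > threshold for g in values] for level, values in gates.items()}


def sparsity_penalty(gates, weights):
    """L1 penalty over all gates, weighted per granularity (weights are hypothetical)."""
    return sum(
        weights[level] * sum(abs(g) for g in values)
        for level, values in gates.items()
    )
```

A zeroed layer or block gate removes a whole unit of depth, which is how a single gating mechanism can cover both width-wise and depth-wise pruning.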

16. Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Wei Han, Hantao Huang, Tao Han
Abstract: Text in images carries essential information for understanding the scene and performing reasoning. The text-based visual question answering (text VQA) task focuses on visual questions that require reading text in images. Existing text VQA systems generate an answer by selecting from optical character recognition (OCR) texts or a fixed vocabulary. The positional information of text is underused, and there is a lack of evidence for the generated answer. This paper therefore proposes a localization-aware answer prediction network (LaAP-Net) to address this challenge. Our LaAP-Net not only generates the answer to the question but also predicts a bounding box as evidence for the generated answer. Moreover, a context-enriched OCR representation (COR) for multimodal fusion is proposed to facilitate the localization task. Our proposed LaAP-Net outperforms existing approaches on three benchmark datasets for the text VQA task by a noticeable margin.
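The abstract does not state how the predicted evidence box is scored. Intersection over union (IoU) is the standard overlap measure for axis-aligned boxes and is assumed here. The helper that picks the OCR token whose box best overlaps the prediction is a hypothetical illustration of linking a predicted box back to a token.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_matching_token(pred_box, ocr_boxes):
    """Index of the OCR box overlapping the evidence box most (None if none overlap)."""
    scores = [iou(pred_box, box) for box in ocr_boxes]
    if not scores or max(scores) == 0.0:
        return None
    return max(range(len(scores)), key=scores.__getitem__)
```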

17. Arbitrary Style Transfer using Graph Instance Normalization
Dongki Jung, Seunghan Yang, Jaehoon Choi, Changick Kim
Abstract: Style transfer is an image synthesis task that applies the style of one image to another while preserving the content. Among statistical methods, adaptive instance normalization (AdaIN) whitens the source images and applies the style of the target images by normalizing the mean and variance of features. However, computing feature statistics for each instance separately neglects the inherent relationships between features, so it is hard to learn global styles while fitting to the individual training dataset. In this paper, we present a novel learnable normalization technique for style transfer using graph convolutional networks, termed Graph Instance Normalization (GrIN). This algorithm makes the style transfer approach more robust by taking into account similar information shared between instances. Besides, this simple module is also applicable to other tasks such as image-to-image translation or domain adaptation.
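AdaIN, as described above, re-normalizes a content feature channel to the mean and standard deviation of the corresponding style channel: AdaIN(x, y) = σ(y)(x - μ(x))/σ(x) + μ(y). The sketch below implements that on a single channel stored as a list. The second function is a hypothetical stand-in for GrIN's graph convolution: it simply averages each instance's statistics with those of its graph neighbours, illustrating the idea of sharing information between instances rather than the authors' learned module.

```python
from statistics import fmean, pstdev


def adain(content, style, eps=1e-5):
    """AdaIN on one feature channel: give content the style's mean and std.

    eps is added to the content std to avoid division by zero.
    """
    mu_c, sigma_c = fmean(content), pstdev(content)
    mu_s, sigma_s = fmean(style), pstdev(style)
    return [sigma_s * (x - mu_c) / (sigma_c + eps) + mu_s for x in content]


def smooth_stats_over_graph(stats, neighbors):
    """Average each instance's (mean, std) with its graph neighbours.

    Hypothetical stand-in for GrIN: stats[i] is the (mean, std) of instance i,
    neighbors[i] lists the indices adjacent to i in a similarity graph.
    """
    smoothed = []
    for i, own in enumerate(stats):
        group = [own] + [stats[j] for j in neighbors[i]]
        smoothed.append((fmean(m for m, _ in group), fmean(s for _, s in group)))
    return smoothed
```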

18. Training Deep Neural Networks for Wireless Sensor Networks Using Loosely and Weakly Labeled Images
Qianwei Zhou, Yuhang Chen, Baoqing Li, Xiaoxin Li, Chen Zhou, Jingchang Huang, Haigen Hu
Abstract: Although deep learning has achieved remarkable successes over the past years, few reports have been published on applying deep neural networks to Wireless Sensor Networks (WSNs) for image target recognition, where data, energy, and computation resources are limited. In this work, a Cost-Effective Domain Generalization (CEDG) algorithm is proposed to train an efficient network with minimal labor requirements. CEDG transfers networks from a publicly available source domain to an application-specific target domain through an automatically allocated synthetic domain. The target domain is isolated from parameter tuning and used only for model selection and testing. The target domain differs significantly from the source domain because it has new target categories and consists of low-quality images that are out of focus, low in resolution, poorly illuminated, and taken at low photographing angles. The trained network requires about 7M multiplications per prediction (ResNet-20 requires about 41M), which is small enough to allow a digital signal processor chip to perform real-time recognition in our WSN. The category-level averaged error on the unseen and unbalanced target domain has been decreased by 41.12%.
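On an unbalanced target domain, an error averaged per category weights every class equally, so rare classes are not drowned out by frequent ones. A minimal sketch, assuming "category-level averaged error" means the mean of per-class error rates (this interpretation is an assumption, not stated in the abstract):

```python
import numpy as np

def category_averaged_error(y_true, y_pred):
    """Mean of per-category error rates (each class weighted equally)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    errors = []
    for c in np.unique(y_true):
        mask = y_true == c
        # Fraction of samples of class c that were misclassified
        errors.append(np.mean(y_pred[mask] != c))
    return float(np.mean(errors))
```

For labels `[0, 0, 0, 0, 1]` predicted as all zeros, the overall error is 0.2, but the category-averaged error is 0.5 because the single minority sample is always wrong.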

19. Mapping of Sparse 3D Data using Alternating Projection
Siddhant Ranade, Xin Yu, Shantnu Kakkar, Pedro Miraldo, Srikumar Ramalingam
Abstract: We propose a novel technique to register sparse 3D scans in the absence of texture. While existing methods such as KinectFusion or Iterative Closest Point (ICP) rely heavily on dense point clouds, this task is particularly challenging under sparse conditions without RGB data. Sparse texture-less data does not come with a high-quality boundary signal, which prohibits the use of correspondences from corners, junctions, or boundary lines. Moreover, with sparse data it is incorrect to assume that the same point will be captured in two consecutive scans. We take a different approach and first re-parameterize the point cloud using a large number of line segments. In this re-parameterized data, there exist a large number of line-intersection (not correspondence) constraints that allow us to solve the registration task. We propose a two-step alternating projection algorithm that formulates registration as the simultaneous satisfaction of intersection and rigidity constraints. The proposed approach outperforms other top-scoring algorithms on both Kinect and LiDAR datasets. On Kinect, we can use 100x downsampled sparse data and still outperform competing methods operating on full-resolution data.
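Alternating projection repeatedly projects the current estimate onto one constraint set, then onto the other. The toy sketch below is not the authors' algorithm: it assumes each transformed model point must lie on a known line (a simplified stand-in for the paper's line-intersection constraints), and uses the Kabsch method as the exact projection onto the set of rigid motions. All function names are hypothetical.

```python
import numpy as np

def project_to_lines(X, origins, dirs):
    # Orthogonal projection of each point X[i] onto its own line
    # origins[i] + s * dirs[i] (dirs are assumed to be unit vectors).
    s = np.sum((X - origins) * dirs, axis=1, keepdims=True)
    return origins + s * dirs

def project_to_rigid(Y, P):
    # Kabsch: rigid motion (R, t) of the model points P closest to Y in least squares.
    mp, my = P.mean(axis=0), Y.mean(axis=0)
    H = (P - mp).T @ (Y - my)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = my - R @ mp
    return P @ R.T + t, R, t

def register(P, origins, dirs, iters=100):
    X = P.copy()  # start from the identity pose
    gaps = []
    for _ in range(iters):
        Y = project_to_lines(X, origins, dirs)  # satisfy the line constraints
        X, R, t = project_to_rigid(Y, P)        # satisfy the rigidity constraint
        gaps.append(float(np.linalg.norm(X - Y)))
    return R, t, gaps
```

Since both steps are exact projections, the distance between successive iterates never increases; because the rigid set is non-convex, convergence to the true pose is not guaranteed from a poor start.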

20. Optimization over Random and Gradient Probabilistic Pixel Sampling for Fast, Robust Multi-Resolution Image Registration
Boris N. Oreshkin, Tal Arbel
Abstract: This paper presents an approach to fast image registration through probabilistic pixel sampling. We propose a practical scheme to leverage the benefits of two state-of-the-art pixel sampling approaches: gradient-magnitude-based pixel sampling and uniformly random sampling. Our framework learns the optimal balance between the two sampling schemes offline during training, based on a small training dataset, using particle swarm optimization. We then test the proposed sampling approach on 3D rigid registration against two state-of-the-art approaches on the popular, publicly available Vanderbilt RIRE dataset. Our results indicate that the proposed sampling approach yields much faster, more accurate, and more robust registration than the state of the art.
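One way to balance the two sampling schemes is a convex mixture of their probability distributions, with a single weight `alpha` that an optimizer such as particle swarm could tune offline. A minimal sketch, assuming a linear mixture (the exact parameterization used in the paper is not given here); works for 2D or 3D images.

```python
import numpy as np

def sampling_probabilities(image, alpha):
    """Mix gradient-magnitude and uniform sampling: alpha=1 is purely gradient-based."""
    grads = np.gradient(image.astype(float))
    mag = np.sqrt(sum(g ** 2 for g in grads)).ravel()
    uniform = np.full(mag.size, 1.0 / mag.size)
    grad = mag / mag.sum() if mag.sum() > 0 else uniform
    return alpha * grad + (1.0 - alpha) * uniform

def sample_pixels(image, alpha, n, rng):
    # Draw n distinct pixel indices according to the mixed distribution
    p = sampling_probabilities(image, alpha)
    idx = rng.choice(p.size, size=n, replace=False, p=p)
    return np.unravel_index(idx, image.shape)
```

Any `alpha < 1` keeps every pixel reachable, which avoids the failure mode of pure gradient sampling on flat regions.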

21. RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs
Zhiwei Xu, Thalaiyasingam Ajanthan, Vibhav Vineet, Richard Hartley
Abstract: Although 3D Convolutional Neural Networks (CNNs) are essential for most learning-based applications involving dense 3D data, their applicability is limited by excessive memory and computational requirements. Compressing such networks by pruning is therefore highly desirable. However, pruning 3D CNNs is largely unexplored, possibly because of the complex nature of typical pruning algorithms, which embed pruning into an iterative optimization paradigm. In this work, we introduce a Resource Aware Neuron Pruning (RANP) algorithm that prunes 3D CNNs at initialization to high sparsity levels. Specifically, the core idea is to obtain an importance score for each neuron based on its sensitivity to the loss function. This neuron importance is then reweighted according to the neuron's resource consumption in terms of FLOPs or memory. We demonstrate the effectiveness of our pruning method on 3D semantic segmentation with widely used 3D-UNets on ShapeNet and BraTS'18, as well as on video classification with MobileNetV2 and I3D on the UCF101 dataset. In these experiments, RANP leads to roughly a 50%-95% reduction in FLOPs and a 35%-80% reduction in memory, with negligible loss in accuracy compared to the unpruned networks. This significantly reduces the computational resources required to train 3D CNNs. The pruned network obtained by our algorithm can also be easily scaled up and transferred to another dataset for training.
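The two steps above (a sensitivity score per neuron, then reweighting by resource use) can be sketched as follows. This assumes a SNIP-style sensitivity |g * w| summed over each neuron's weights, and a simple importance-per-cost reweighting; RANP's actual reweighting formula may differ, and all names are hypothetical.

```python
import numpy as np

def neuron_scores(weights, grads, resource):
    """weights, grads: one array per neuron (its incoming weights and their loss gradients).
    resource: per-neuron cost, e.g. FLOPs or memory."""
    # Connection sensitivity |g * w| summed over each neuron's weights
    sens = np.array([np.abs(g * w).sum() for w, g in zip(weights, grads)])
    sens = sens / sens.sum()
    cost = np.asarray(resource, dtype=float)
    cost = cost / cost.sum()
    # Assumed reweighting: importance per unit of resource share
    return sens / cost

def prune_mask(scores, keep_ratio):
    # Keep the top-scoring fraction of neurons, decided once at initialization
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(-scores)[:k]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask
```

With equal sensitivities, a neuron that costs four times as many FLOPs is ranked below a cheaper one, which is the intended effect of resource-aware pruning.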

22. Self-supervised Exposure Trajectory Recovery for Dynamic Blur Estimation
Youjian Zhang, Chaoyue Wang, Stephen J. Maybank, Dacheng Tao
Abstract: Dynamic scene blurring is an important yet challenging topic. Recently, deep learning methods have achieved impressive performance in dynamic scene deblurring. However, the motion information contained in a blurry image has yet to be fully explored and accurately formulated because: (i) the ground truth of blurry motion is difficult to obtain; (ii) the temporal ordering is destroyed during the exposure; and (iii) motion estimation is highly ill-posed. By revisiting the principle of camera exposure, dynamic blur can be described by the relative motions of sharp content with respect to each exposed pixel. We define exposure trajectories, which record the trajectories of relative motions, to represent the motion information contained in a blurry image and explain the causes of dynamic blur. We propose a new blur representation, which we call motion offset, to model pixel-wise displacements of the latent sharp image at multiple timepoints. Under mild constraints, the learned motion offsets can recover dense, (non-)linear exposure trajectories, which significantly reduces temporal disorder and ill-posedness. Finally, we demonstrate that the estimated exposure trajectories can fit real-world dynamic blurs and further contribute to motion-aware image deblurring and warping-based video extraction from a single blurry image.
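Under the exposure model described above, a blurry pixel is the average of the latent sharp image sampled along that pixel's trajectory, B(x) = (1/T) Σ_t S(x + o_t(x)). A minimal sketch, assuming nearest-neighbor sampling and offsets given as (dy, dx) per timepoint; the paper learns the offsets, whereas here they are simply inputs.

```python
import numpy as np

def synthesize_blur(sharp, offsets):
    """sharp: (H, W) latent sharp image; offsets: (T, H, W, 2) motion offsets (dy, dx)."""
    H, W = sharp.shape
    ys, xs = np.mgrid[0:H, 0:W]
    acc = np.zeros((H, W))
    for off in offsets:  # one warp per timepoint in the exposure
        yy = np.clip(np.rint(ys + off[..., 0]).astype(int), 0, H - 1)
        xx = np.clip(np.rint(xs + off[..., 1]).astype(int), 0, W - 1)
        acc += sharp[yy, xx]
    # Averaging over timepoints destroys their temporal order
    return acc / len(offsets)
```

The final averaging step illustrates point (ii): reversing the order of the offsets yields the same blurry image.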

23. Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track
Zeming Li, Yuchen Ma, Yukang Chen, Xiangyu Zhang, Jian Sun
Abstract: In this report, we present our object detection/instance segmentation system, MegDetV2, which works in a two-pass fashion: it first detects instances and then obtains their segmentation. Our baseline detector is mainly built on a newly designed RPN, called RPN++. On the COCO-2019 detection/instance-segmentation test-dev dataset, our system achieves 61.0/53.1 mAP, surpassing our 2018 winning results by 5.0/4.2, respectively. We achieved the best results in COCO Challenge 2019 and 2020.

24. Visualizing Color-wise Saliency of Black-Box Image Classification Models
Yuhki Hatakeyama, Hiroki Sakuma, Yoshinori Konishi, Kohei Suenaga
Abstract: Image classification based on machine learning is widely used. However, a classification result given by an advanced method, including deep learning, is often hard to interpret. This lack of interpretability is one of the major obstacles to deploying a trained model in safety-critical systems. Several techniques have been proposed to address this problem; one of them is RISE, which explains a classification result with a heatmap, called a saliency map, that indicates the significance of each pixel. We propose MC-RISE (Multi-Color RISE), an enhancement of RISE that takes color information into account in an explanation. Our method shows not only the saliency of each pixel in a given image, as the original RISE does, but also the significance of the color components of each pixel; a saliency map with color information is especially useful in domains where color matters (e.g., traffic-sign recognition). We implemented MC-RISE and evaluated it on two datasets (GTSRB and ImageNet) to demonstrate the effectiveness of our method in comparison with existing techniques for interpreting image classification results.
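RISE estimates saliency by querying the black-box model on many randomly masked copies of the image and averaging the masks weighted by the model's score. A minimal sketch of that baseline, assuming pixel-level Bernoulli masks (the original RISE upsamples smooth low-resolution masks) and a model that returns one class score; MC-RISE's color-wise perturbations are not reproduced.

```python
import numpy as np

def rise_saliency(model, image, n_masks=1000, p=0.5, rng=None):
    """Pixel-level RISE: weight random binary masks by the model score on the masked image."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = image.shape[:2]
    saliency = np.zeros((H, W))
    for _ in range(n_masks):
        mask = (rng.random((H, W)) < p).astype(float)
        masked = image * (mask[..., None] if image.ndim == 3 else mask)
        saliency += model(masked) * mask
    # Normalize by the number of masks and the expected mask value p
    return saliency / (n_masks * p)
```

For a toy model whose score depends only on pixel (0, 0), that pixel's saliency approaches 1 while every other pixel approaches p.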

25. Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays
Jianmo Ni, Chun-Nan Hsu, Amilcare Gentili, Julian McAuley
Abstract: Automatic medical image report generation has drawn growing attention due to its potential to alleviate radiologists' workload. Existing work on report generation often trains encoder-decoder networks to generate complete reports. However, such models are affected by data bias (e.g., label imbalance) and face common issues inherent in text generation models (e.g., repetition). In this work, we focus on reporting abnormal findings on radiology images; instead of training on complete radiology reports, we propose a method to identify abnormal findings from the reports and to group them with unsupervised clustering and minimal rules. We formulate the task as cross-modal retrieval and propose Conditional Visual-Semantic Embeddings to align images and fine-grained abnormal findings in a joint embedding space. We demonstrate that our method can retrieve abnormal findings and outperforms existing generation models on both clinical correctness and text generation metrics.
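Once images and findings live in a joint embedding space, retrieval reduces to ranking candidate findings by similarity to the image embedding. A minimal sketch, assuming cosine similarity and precomputed embeddings; the conditioning mechanism of the proposed model is not reproduced.

```python
import numpy as np

def retrieve(image_emb, finding_embs, k=3):
    """Rank candidate findings by cosine similarity to an image in the joint space."""
    q = image_emb / np.linalg.norm(image_emb)
    F = finding_embs / np.linalg.norm(finding_embs, axis=1, keepdims=True)
    sims = F @ q
    order = np.argsort(-sims)[:k]  # indices of the k most similar findings
    return order, sims[order]
```

Because the output is a set of retrieved findings rather than generated text, issues such as repetition in free-text generation do not arise.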

26. Collaboratively boosting data-driven deep learning and knowledge-guided ontological reasoning for semantic segmentation of remote sensing imagery
Yansheng Li, Song Ouyang, Yongjun Zhang
Abstract: As one kind of architecture from the deep learning family, the deep semantic segmentation network (DSSN) achieves a certain degree of success on the semantic segmentation task and clearly outperforms traditional methods based on hand-crafted features. As a classic data-driven technique, a DSSN can be trained end to end and is competent at employing low-level and mid-level cues (i.e., the discriminative image structure) to understand images, but it lacks high-level inference ability. By contrast, human beings have an excellent inference capacity and can reliably interpret remote sensing (RS) imagery once they master basic RS domain knowledge. In the literature, ontological modeling and reasoning is an ideal way to imitate and employ human domain knowledge, but it is still rarely explored and adopted in the RS domain. To remedy the aforementioned critical limitation of DSSNs, this paper proposes a collaboratively boosting framework (CBF) that combines a data-driven deep learning module and a knowledge-guided ontological reasoning module in an iterative way.

27. Shot in the Dark: Few-Shot Learning with No Base-Class Labels
Zitian Chen, Subhransu Maji, Erik Learned-Miller
Abstract: Few-shot learning aims to learn classifiers for new objects from a small number of labeled examples. But it does not do this in a vacuum. Usually, a strong inductive bias is borrowed from the supervised learning of base classes. This inductive bias enables more statistically efficient learning of the new classes. In this work, we show that no labels are needed to develop such an inductive bias, and that self-supervised learning can provide a powerful inductive bias for few-shot learning. This is particularly effective when the unlabeled data for learning such a bias contains not only examples of the base classes but also examples of the novel classes. The setting in which unlabeled examples of the novel classes are available is known as the transductive setting. Our method outperforms state-of-the-art few-shot learning methods, including other transductive learning methods, by 3.9% in 5-shot accuracy on miniImageNet without using any base-class labels. By benchmarking unlabeled-base-class (UBC) few-shot learning and UBC transductive few-shot learning, we demonstrate the great potential of self-supervised feature learning: self-supervision alone is sufficient to create a remarkably good inductive bias for few-shot learning. This motivates rethinking whether base-class labels are necessary at all for few-shot learning. We also explore the relationship between self-supervised and supervised features, comparing both their transferability and their complementarity in the non-transductive setting. By combining supervised and self-supervised features learned from base classes, we also achieve a new state of the art in the non-transductive setting, outperforming all previous methods.
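A common way to evaluate a frozen feature extractor in few-shot learning is a nearest-prototype classifier: average each class's support embeddings, then assign queries to the closest mean. A minimal sketch, assuming embeddings come from some self-supervised encoder; the classifier actually used by the authors may differ.

```python
import numpy as np

def prototype_classify(support, support_labels, queries):
    """Nearest-class-mean classification on (already extracted) feature vectors."""
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every query to every class prototype
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]
```

No base-class labels are involved at this stage; the inductive bias lives entirely in the features, which is the point the abstract makes.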

28. Video Anomaly Detection Using Pre-Trained Deep Convolutional Neural Nets and Context Mining
Chongke Wu, Sicong Shao, Cihan Tunc, Salim Hariri
Abstract: Anomaly detection is critically important for intelligent surveillance systems to detect malicious activities in a timely manner. Many video anomaly detection approaches based on deep learning focus on a single camera video stream with a fixed scenario. These deep learning methods rely on large-scale training data and have high model complexity. As a solution, in this paper, we show how to use pre-trained convolutional neural network models to perform feature extraction and context mining, and then use a denoising autoencoder with relatively low model complexity to provide efficient and accurate surveillance anomaly detection, which can be useful for resource-constrained devices such as edge devices of the Internet of Things (IoT). Our anomaly detection model makes decisions based on high-level features derived from selected embedded computer vision models, such as object classification and object detection. Additionally, we derive contextual properties from the high-level features to further improve the performance of our video anomaly detection method. We use two UCSD datasets to demonstrate that our approach, with relatively low model complexity, achieves performance comparable to state-of-the-art approaches.
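A denoising autoencoder trained only on normal frames reconstructs normal feature vectors well and unusual ones poorly, so reconstruction error serves as an anomaly score. The sketch swaps in a closed-form linear (marginalized) denoising autoencoder for the paper's network, and assumes its inputs are feature vectors already extracted by a pre-trained CNN.

```python
import numpy as np

def fit_linear_dae(X, noise_var):
    """Closed-form linear denoising autoencoder fitted on normal feature vectors X (n, d)."""
    S = X.T @ X / len(X)  # second moment E[x x^T]
    # Minimizer of E||W(x + n) - x||^2 with n ~ N(0, noise_var * I)
    return S @ np.linalg.inv(S + noise_var * np.eye(X.shape[1]))

def anomaly_scores(W, X):
    R = X @ W.T  # reconstruction W x for each row
    return np.sum((R - X) ** 2, axis=1)
```

A frame would be flagged when its score exceeds a threshold chosen on validation data; the linear model keeps inference to a single matrix product, in line with the low-complexity goal.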

29. Adaptive Automotive Radar data Acquisition
Madhumitha Sakthi, Ahmed Tewfik
Abstract: In an autonomous driving scenario, it is vital to acquire and efficiently process data from various sensors to obtain a complete and robust perspective of the surroundings. Many studies have shown the importance of having radar data in addition to images, since radar is robust to weather conditions. We develop a novel algorithm for selecting radar return regions to be sampled at a higher rate based on prior reconstructed radar frames and image data. Our approach uses adaptive block-based Compressed Sensing (CS) to dynamically allocate higher sampling rates to "important" blocks while maintaining the overall sampling budget per frame. This improves over block-based CS, which parallelizes computation by dividing the radar frame into blocks. Additionally, we use the Faster R-CNN object detection network to determine these important blocks from previous radar and image information. This mitigates the potential loss of information about an object missed by the image or by the object detection network. We also develop an end-to-end transformer-based 2D object detection network using the NuScenes radar and image data. Finally, we compare the performance of our algorithm against that of standard CS on the Oxford Radar RobotCar dataset.
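Keeping a fixed per-frame budget while favoring important blocks is an allocation problem: split the total number of CS measurements across blocks in proportion to a weight. A minimal sketch, assuming a fixed boost factor for blocks flagged as important (the paper's allocation rule may differ); the function and parameter names are hypothetical.

```python
import numpy as np

def allocate_measurements(important, block_size, budget_ratio, boost=3.0):
    """Split a per-frame CS measurement budget across blocks, favoring important ones."""
    assert 0.0 < budget_ratio <= 1.0
    important = np.asarray(important, dtype=bool)
    n = important.size
    total = int(round(budget_ratio * n * block_size))
    w = np.where(important, boost, 1.0)
    raw = total * w / w.sum()
    m = np.minimum(np.floor(raw).astype(int), block_size)
    # Hand out measurements lost to flooring/capping, important blocks first
    order = np.argsort(-w, kind="stable")
    remaining, i = total - int(m.sum()), 0
    while remaining > 0:
        b = order[i % n]
        if m[b] < block_size:
            m[b] += 1
            remaining -= 1
        i += 1
    return m
```

Each block b would then be measured as y_b = Φ_b x_b with an m_b × block_size sensing matrix Φ_b, so the frame-level total stays equal to the budget.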

30. VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach
Mohamed Kerroumi, Othmane Sayem, Aymen Shabou
Abstract: We introduce a novel approach to scanned document representation for performing field extraction tasks. It allows the simultaneous encoding of textual, visual, and layout information in a 3D matrix used as input to a segmentation model. We improve the recent Chargrid and Wordgrid models in several directions: first by taking the visual modality into account, then by boosting robustness on small datasets, while keeping inference time low. Our approach is tested on public and private document image datasets, showing higher performance than recent state-of-the-art methods.
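The 3D input matrix can be built by painting each word's embedding into the pixels of its bounding box (layout plus text) and stacking those channels with the page's RGB channels (visual). A minimal sketch, assuming OCR output as (text, x0, y0, x1, y1) boxes and a caller-supplied `embed` function standing in for a word-embedding model; both are hypothetical.

```python
import numpy as np

def build_visual_word_grid(rgb, words, embed, dim):
    """rgb: (H, W, 3) page image in [0, 1]; words: list of (text, x0, y0, x1, y1) boxes.
    embed: callable text -> (dim,) vector (stand-in for a word-embedding model)."""
    H, W, _ = rgb.shape
    text_channels = np.zeros((H, W, dim))  # background pixels stay zero
    for text, x0, y0, x1, y1 in words:
        text_channels[y0:y1, x0:x1] = embed(text)
    # Result has shape (H, W, 3 + dim) and can feed a segmentation network
    return np.concatenate([rgb, text_channels], axis=-1)
```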

31. Deep Generative Modelling of Human Reach-and-Place Action
Connor Daly, Yuzuko Nakamura, Tobias Ritschel
Abstract: The motion of picking up and placing an object in 3D space is full of subtle detail. Typically, these motions are formed from the same constraints, optimizing for swiftness, energy efficiency, and physiological limits. Yet, even for identical goals, the realized motion is always subject to natural variation. To capture these aspects computationally, we suggest a deep generative model for human reach-and-place action, conditioned on a start and end position. We have captured a dataset of 600 such human 3D actions to sample the 2×3-D space of 3D sources and targets. While temporal variation is often modeled with complex learning machinery such as recurrent neural networks or networks with memory or attention, we demonstrate a much simpler approach that is convolutional in time and makes use of (periodic) temporal encoding. Given a latent code and conditioned on start and end positions, the model generates a complete 3D character motion in linear time as a sequence of convolutions. Our evaluation includes several ablations and an analysis of generative diversity and applications.
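A periodic temporal encoding gives a purely convolutional model a sense of where each frame sits in the motion. A minimal sketch, assuming sin/cos features of normalized time at octave-spaced frequencies; the exact encoding used by the authors is not specified in the abstract.

```python
import numpy as np

def periodic_time_encoding(n_frames, n_freqs):
    """Sin/cos features of normalized time t in [0, 1] at octave-spaced frequencies."""
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    freqs = (2.0 ** np.arange(n_freqs))[None, :] * np.pi
    # Shape (n_frames, 2 * n_freqs): sines first, then cosines
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)], axis=1)
```

These channels would be concatenated with the latent code and start/end conditions along the feature axis before the temporal convolutions, so every output frame is computed in a single pass.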

32. Multi-level Feature Learning on Embedding Layer of Convolutional Autoencoders and Deep Inverse Feature Learning for Image Clustering
Behzad Ghazanfari, Fatemeh Afghah
Abstract: This paper introduces multi-level feature learning alongside the embedding layer of a convolutional autoencoder (CAE-MLE) as a novel approach to deep clustering. We use agglomerative clustering as the multi-level feature learning, which provides a hierarchical structure on the latent feature space. We show that applying multi-level feature learning considerably improves the basic deep convolutional embedded clustering (DCEC). CAE-MLE considers the clustering loss of agglomerative clustering simultaneously with learning the latent features of the CAE. Following previous work on inverse feature learning, we show that learning a representation of error, as a general strategy, can be applied to different deep clustering approaches and leads to promising results. We develop deep inverse feature learning (deep IFL) on CAE-MLE as a novel approach that achieves state-of-the-art results among methods of the same category. The experimental results show that CAE-MLE improves the results of the basic method, DCEC, by around 7%-14% on two well-known datasets, MNIST and USPS. We also show that the proposed deep IFL improves the primary results by about 9%-17%. Therefore, both proposed approaches, CAE-MLE and deep IFL based on CAE-MLE, lead to notable performance improvements in comparison with the majority of existing techniques. Although they are based on a basic convolutional autoencoder, the proposed approaches achieve outstanding results, even in comparison with variational autoencoders or generative adversarial networks.
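Agglomerative clustering builds the hierarchy on the latent space bottom-up: start with every embedding as its own cluster and repeatedly merge the closest pair. A minimal, deliberately naive sketch with average linkage (cubic time, fine for small examples); the linkage choice is an assumption, and how CAE-MLE turns this into a training loss is not reproduced.

```python
import numpy as np

def agglomerative(Z, n_clusters):
    """Naive average-linkage agglomerative clustering of latent vectors Z (n, d)."""
    clusters = [[i] for i in range(len(Z))]
    D = np.linalg.norm(Z[:, None] - Z[None], axis=-1)  # pairwise distances
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()  # average linkage
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge b into a (b > a, so deleting b is safe)
        del clusters[b]
    labels = np.empty(len(Z), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Stopping at different values of `n_clusters` reads off different levels of the hierarchy, which is what makes the features "multi-level".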

33. A Benchmark and Baseline for Language-Driven Image Editing
Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu
Abstract: Language-driven image editing can significantly reduce laborious image editing work and is friendly to photography novices. However, most similar work can only deal with a specific image domain or can only perform global retouching. To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing, with editing-operation and mask annotations. Besides, we propose a baseline method that fully utilizes the annotations to solve this problem. Our new method treats each editing operation as a sub-module and can automatically predict operation parameters. Not only does it perform well on challenging user data, but the approach is also highly interpretable. We believe our work, including both the benchmark and the baseline, will advance image editing toward a more general and free-form level.
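Treating each editing operation as a sub-module means a request can be executed as a short program of (operation, mask, parameter) steps, where a mask covering the whole image gives a global edit and a partial mask gives a local one. A minimal sketch; the operation set, names, and formulas are hypothetical, and predicting the program from language is not shown.

```python
import numpy as np

# Hypothetical operation set; each op maps (image, mask, param) -> image in [0, 1]
def brightness(img, mask, amount):
    return np.clip(img + amount * mask[..., None], 0.0, 1.0)

def saturation(img, mask, factor):
    gray = img.mean(axis=-1, keepdims=True)
    edited = gray + factor * (img - gray)
    m = mask[..., None]
    return np.clip(m * edited + (1 - m) * img, 0.0, 1.0)

OPS = {"brightness": brightness, "saturation": saturation}

def apply_edits(img, program):
    """program: list of (op_name, mask, param), e.g. as predicted from a request."""
    for name, mask, param in program:
        img = OPS[name](img, mask, param)
    return img
```

Because every step names its operation and parameter, the executed program doubles as an explanation of what was changed, which is the interpretability the abstract mentions.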

34. Common CNN-based Face Embedding Spaces are (Almost) Equivalent
David McNeely-White, Benjamin Sattelberg, Nathaniel Blanchard, Ross Beveridge
Abstract: CNNs are the dominant method for creating face embeddings for recognition. It might be assumed that, since these networks are distinct, complex, nonlinear functions, their embeddings are network-specific and thus have some degree of anonymity. However, recent research has shown that distinct networks' features can be directly mapped with little performance penalty (median 1.9% reduction across 90 distinct mappings) in the context of the 1,000-class ImageNet recognition task. This finding revealed that embeddings coming from different systems can be meaningfully compared, provided the mapping is known. However, prior work only considered networks trained and tested on a closed-set classification task. Here, we present evidence that a linear mapping between feature spaces can be easily discovered in the context of open-set face recognition. Specifically, we demonstrate that the feature spaces of four face recognition models, with varying architectures and training datasets, can be mapped between with no more than a 1.0% penalty in recognition accuracy on LFW. This finding, which we also replicate on YouTube Faces, demonstrates that embeddings from different systems can be readily compared once the linear mapping is determined. In further analysis, fewer than 500 pairs of corresponding embeddings from two systems are required to calculate the full mapping between embedding spaces, and reducing the dimensionality of the mapping from 512 to 64 produces a negligible performance penalty.
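A linear mapping between two embedding spaces can be estimated by ordinary least squares from paired embeddings of the same faces, and a reduced-dimensionality mapping can be approximated by truncating its SVD. A minimal sketch, assuming no bias term and small synthetic dimensions in place of 512; whether the authors fit exactly this way is not stated in the abstract.

```python
import numpy as np

def fit_linear_map(A, B):
    """Least-squares M with A @ M ≈ B; rows of A and B are paired embeddings."""
    M, *_ = np.linalg.lstsq(A, B, rcond=None)
    return M

def low_rank(M, k):
    """Best rank-k approximation of a mapping (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]
```

Note that a d-by-d map needs at least d well-spread pairs to be fully determined, which is consistent with a few hundred pairs sufficing for 512-dimensional embeddings.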

35. Tensor Fields for Data Extraction from Chart Images: Bar Charts and Scatter Plots
Jaya Sreevalsan-Nair, Komal Dadhich, Siri Chandana Daggubati
Abstract: Charts are an essential part of both graphicacy (graphical literacy) and statistical literacy. As chart understanding has become increasingly relevant in data science, automating chart analysis by processing raster images of charts has become a significant problem. Automated chart reading involves data extraction and contextual understanding of the data from chart images. In this paper, we perform the first step, determining the computational model of chart images for data extraction for selected chart types, namely bar charts and scatter plots. We demonstrate the use of positive semidefinite second-order tensor fields as an effective model. We identify an appropriate tensor field as the model and propose a methodology that uses its degenerate point extraction for data extraction from chart images. Our results show that tensor voting is effective for data extraction from bar charts, scatter plots, and histograms (as a special case of bar charts).
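A positive semidefinite second-order tensor field assigns a symmetric 2×2 matrix to every pixel; its degenerate points are where the two eigenvalues coincide. The sketch swaps in the gradient structure tensor, a standard PSD tensor field, rather than the tensor-voting field the paper uses, and reports an anisotropy measure that is near 0 at degenerate points and near 1 along straight edges such as bar boundaries.

```python
import numpy as np

def box_blur(a, r):
    # Mean filter over a (2r+1) x (2r+1) window with edge padding
    p = np.pad(a, r, mode="edge")
    k = 2 * r + 1
    out = np.zeros(a.shape)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / k ** 2

def structure_tensor_field(img, r=1):
    gy, gx = np.gradient(img.astype(float))
    a, b, c = box_blur(gx * gx, r), box_blur(gx * gy, r), box_blur(gy * gy, r)
    # Closed-form eigenvalues of the symmetric tensor [[a, b], [b, c]]
    half_tr = (a + c) / 2
    root = np.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_tr + root, half_tr - root
    anisotropy = (lam1 - lam2) / (lam1 + lam2 + 1e-12)
    return lam1, lam2, anisotropy
```

Flat background pixels are also degenerate (both eigenvalues zero), so in practice degenerate points would be filtered by tensor magnitude as well.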

36. SMILE: Semantically-guided Multi-attribute Image and Layout Editing
Andrés Romero, Luc Van Gool, Radu Timofte
Abstract: Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs). Exploring the disentangled attribute space within a transformation is a very challenging task due to the multiple and mutually inclusive nature of facial images, where different labels (eyeglasses, hats, hair, identity, etc.) can co-exist at the same time. Several works address this issue either by exploiting the modality of each domain/attribute using a conditional random noise vector or by extracting the modality from an exemplar image. However, existing methods cannot handle both random and reference transformations for multiple attributes, which limits the generality of the solutions. In this paper, we successfully exploit a multimodal representation that handles all attributes, whether guided by random noise or by exemplar images, while using only the underlying domain information of the target domain. We present extensive qualitative and quantitative results on facial datasets and several different attributes that show the superiority of our method. Additionally, our method can add, remove, or change either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space, and it can be easily extended to head-swapping and face-reenactment applications without being trained on videos.

37. Smoother Network Tuning and Interpolation for Continuous-level Image Processing
Hyeongmin Lee, Taeoh Kim, Hanbin Son, Sangwook Baek, Minsu Cheon, Sangyoun Lee
Abstract: In Convolutional Neural Network (CNN)-based image processing, most studies propose networks optimized for a single level (or single objective); thus, they underperform on other levels and must be retrained to deliver optimal performance. Using multiple models to cover multiple levels involves very high computational costs. To solve these problems, recent approaches train networks on two different levels and propose their own interpolation methods to enable arbitrary intermediate levels. However, many of them fail to generalize or have certain side effects in practical usage. In this paper, we define these frameworks as network tuning and interpolation and propose a novel module for continuous-level learning, called Filter Transition Network (FTN). This module is structurally smoother than existing ones. Therefore, frameworks with FTN generalize well across various tasks and networks and cause fewer undesirable side effects. For stable learning of FTN, we additionally propose a method to initialize non-linear neural network layers with identity mappings. Extensive results on various image processing tasks indicate that the performance of FTN is comparable across multiple continuous levels, and that it is significantly smoother and lighter than other frameworks.
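Two ingredients mentioned above can be illustrated directly: linear interpolation between filters trained for two levels (the kind of baseline "network tuning and interpolation" framework FTN is compared against), and identity initialization, which makes a newly inserted convolution a no-op at the start of training. A minimal sketch; FTN's own learned filter transition is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def interpolate_filters(theta_a, theta_b, alpha):
    """Linear network interpolation between two levels: alpha=0 -> A, alpha=1 -> B."""
    return (1.0 - alpha) * theta_a + alpha * theta_b

def identity_kernel(channels, k=3):
    """Conv weights (out, in, k, k) that pass the input through unchanged."""
    w = np.zeros((channels, channels, k, k))
    for c in range(channels):
        w[c, c, k // 2, k // 2] = 1.0
    return w

def conv2d_same(x, w):
    """Naive cross-correlation with zero padding; x: (C_in, H, W)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for dy in range(k):
                for dx in range(k):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + H, dx:dx + W]
    return out
```

Starting from an identity mapping means the added module initially leaves the pretrained network's output untouched, which is what makes its training stable.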

38. CO2: Consistent Contrast for Unsupervised Visual Representation Learning
Chen Wei, Huiyu Wang, Wei Shen, Alan Yuille
Abstract: Contrastive learning has been adopted as a core method for unsupervised visual representation learning. Without human annotation, the common practice is to perform an instance discrimination task: given a query image crop, this task labels crops from the same image as positives and crops from other randomly sampled images as negatives. An important limitation of this label assignment strategy is that it cannot reflect the heterogeneous similarity between the query crop and each crop from other images. It treats them all as equally negative, even though some of them may belong to the same semantic class as the query. To address this issue, and inspired by consistency regularization in semi-supervised learning on unlabeled data, we propose Consistent Contrast (CO2), which introduces a consistency regularization term into the current contrastive learning framework. Regarding the similarity of the query crop to each crop from other images as "unlabeled", the consistency term takes the corresponding similarity of a positive crop as a pseudo label and encourages consistency between these two similarities. Empirically, CO2 improves Momentum Contrast (MoCo) by 2.9% top-1 accuracy on the ImageNet linear protocol, and by 3.8% and 1.1% top-5 accuracy in the 1% and 10% labeled semi-supervised settings. It also transfers to image classification, object detection, and semantic segmentation on PASCAL VOC, showing that CO2 learns better visual representations for these downstream tasks.
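
The NumPy sketch below shows the consistency term under several assumptions. Similarities are dot products of L2-normalized embeddings divided by a temperature `tau`. Each crop's similarities to the negatives are turned into a distribution with a softmax. The query's distribution and the positive's distribution are then compared with a symmetric KL divergence. The temperature value and the symmetric-KL choice are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def consistency_term(query, positive, negatives, tau=0.2):
    """Agreement between the query's and the positive's similarity distributions.

    query, positive: (d,) L2-normalized embeddings of two crops of the same image.
    negatives: (n, d) L2-normalized embeddings of crops from other images.
    """
    p_query = softmax(negatives @ query / tau)     # "unlabeled" similarities of the query
    p_pos = softmax(negatives @ positive / tau)    # pseudo label taken from the positive crop
    return 0.5 * (kl(p_pos, p_query) + kl(p_query, p_pos))
```

The term is zero when the two crops relate to the negatives identically. It grows as their similarity profiles diverge, and it would be added to the usual instance-discrimination loss.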

39. NCP-VAE: Variational Autoencoders with Noise Contrastive Priors
Jyoti Aneja, Alexander Schwing, Jan Kautz, Arash Vahdat
Abstract: Variational autoencoders (VAEs) are a powerful class of likelihood-based generative models with applications in various domains. However, they struggle to generate high-quality images, especially when samples are drawn from the prior without any tempering. One explanation for VAEs' poor generative quality is the prior hole problem: the prior distribution fails to match the aggregate approximate posterior. Due to this mismatch, there are regions in the latent space that have high density under the prior but do not correspond to any encoded image. Samples from those regions are decoded into corrupted images. To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior. We train the reweighting factor by noise contrastive estimation, and we generalize it to hierarchical VAEs with many latent variable groups. Our experiments confirm that the proposed noise contrastive priors improve the generative performance of state-of-the-art VAEs by a large margin on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ 256 datasets.
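
The reweighting idea can be sketched in one dimension with NumPy. Everything here is a toy assumption:

- the base prior is N(0, 1);
- a stand-in "aggregate posterior" is N(1, 0.5^2);
- the noise-contrastive classifier is a logistic regression on the features [1, z, z^2];
- samples from the reweighted prior are drawn by sampling-importance-resampling.

For a classifier D(z) trained on balanced data to separate posterior samples (label 1) from prior samples (label 0), the reweighting factor is r(z) = D(z) / (1 - D(z)). Its logarithm is simply the classifier's logit.

```python
import numpy as np

def features(z):
    # Quadratic features: enough to represent the log-ratio of two Gaussians exactly.
    return np.stack([np.ones_like(z), z, z ** 2], axis=1)

def fit_nce_classifier(z_prior, z_post, lr=0.1, steps=3000):
    # Logistic regression by gradient descent: 1 = aggregate-posterior sample, 0 = base-prior sample.
    X = features(np.concatenate([z_prior, z_post]))
    y = np.concatenate([np.zeros(len(z_prior)), np.ones(len(z_post))])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def log_reweight(z, w):
    # log r(z) = log D(z) / (1 - D(z)), i.e. the classifier logit
    return features(z) @ w

def sample_reweighted_prior(w, n, rng, n_proposals=20000):
    # Sampling-importance-resampling: propose from the base prior, resample in proportion to r(z).
    z = rng.standard_normal(n_proposals)
    logw = log_reweight(z, w)
    probs = np.exp(logw - logw.max())
    probs /= probs.sum()
    return rng.choice(z, size=n, p=probs)
```

Samples from the reweighted prior concentrate where the toy posterior has mass instead of spreading over the whole base prior. This is the "prior hole" correction in miniature.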

40. Iterative Methods for Computing Eigenvectors of Nonlinear Operators
Guy Gilboa
Abstract: In this chapter we examine several iterative methods for solving nonlinear eigenvalue problems. These arise in variational image processing, graph partitioning and classification, nonlinear physics, and more. The canonical eigenproblem we solve is $T(u)=\lambda u$, where $T:\mathbb{R}^n\to\mathbb{R}^n$ is some bounded nonlinear operator. Other variations of eigenvalue problems are also discussed. We present a progression of five algorithms, coauthored in recent years by the author and colleagues. Each algorithm attempts to solve a unique problem or to improve the theoretical foundations. The algorithms can be understood as nonlinear PDEs that converge to an eigenfunction in the continuous time domain. This offers a distinctive view and understanding of the discrete iterative process. Finally, it is shown how to evaluate the results numerically, along with some examples and insights related to priors of nonlinear denoisers, including both classical algorithms and ones based on deep networks.
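
A simple way to attack $T(u)=\lambda u$ numerically is a nonlinear power iteration. It repeatedly applies $T$ and renormalizes, then reads off $\lambda$ from the Rayleigh quotient $\langle u, T(u)\rangle / \|u\|^2$. The NumPy sketch below is not one of the chapter's five algorithms. It assumes a toy 1-homogeneous operator $T(u)_i = \sqrt{\sum_j A_{ij} u_j^2}$ with a positive symmetric matrix $A$. This operator was chosen because its nonnegative eigenpair is known in closed form ($\lambda^2$ is the largest eigenvalue of $A$), which makes the result easy to check.

```python
import numpy as np

def T(u, A):
    # Hypothetical 1-homogeneous nonlinear operator: T(u)_i = sqrt(sum_j A_ij * u_j^2)
    return np.sqrt(A @ (u ** 2))

def nonlinear_power_method(A, u0, iters=200, tol=1e-12):
    u = u0 / np.linalg.norm(u0)
    for _ in range(iters):
        Tu = T(u, A)
        u_next = Tu / np.linalg.norm(Tu)   # apply T, then renormalize
        converged = np.linalg.norm(u_next - u) < tol
        u = u_next
        if converged:
            break
    lam = u @ T(u, A) / (u @ u)            # Rayleigh-quotient estimate of lambda
    return u, lam
```

For this operator, squaring the iterates turns the scheme into an ordinary power iteration on $A$ applied to $u^2$, which is why it converges. For general nonlinear $T$, convergence is exactly the kind of question the chapter's flows address.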

41. LETI: Latency Estimation Tool and Investigation of Neural Networks inference on Mobile GPU
Evgeny Ponomarev, Sergey Matveev, Ivan Oseledets
Abstract: Many deep learning applications need to run on mobile devices, and for many of them both accuracy and inference time matter. While the number of FLOPs is usually used as a proxy for neural network latency, it may not be the best choice. To obtain a better approximation of latency, the research community uses look-up tables of all possible layers to calculate latency and predict the final inference time on mobile CPUs. This requires only a small number of experiments. Unfortunately, on mobile GPUs this method is not applicable in a straightforward way and shows low precision. In this work, we treat latency approximation on mobile GPUs as a data- and hardware-specific problem. Our main goal is to construct a convenient latency estimation tool for investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we build open-source tools that provide a convenient way to conduct massive experiments on different target devices, focusing on mobile GPUs. After evaluating the dataset, we learn a regression model on the experimental data and use it for future latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Benchmark 101 dataset and also evaluate the most popular neural network architectures on two mobile GPUs. As a result, we construct a latency prediction model with good precision on the target evaluation subset. We consider LETI a useful tool for neural architecture search or massive latency evaluation. The project is available at this https URL
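
The NumPy sketch below contrasts a look-up-table estimate with a learned regression model. The layer names, per-layer timings and the linear "ground truth" used to generate measurements are all hypothetical. The only point is that a least-squares model over per-network features can absorb effects that a pure per-layer sum misses, such as a fixed per-inference overhead.

```python
import numpy as np

# Hypothetical per-layer look-up table in milliseconds for some target device.
LUT_MS = {"conv3x3": 1.2, "conv1x1": 0.4, "maxpool": 0.1}
OPS = list(LUT_MS)

def lut_latency(layers):
    # LUT estimate: end-to-end latency approximated by the sum of isolated per-layer latencies.
    return sum(LUT_MS[name] for name in layers)

def op_counts(layers):
    # Feature vector: how many times each op type appears in the network.
    return np.array([layers.count(op) for op in OPS], dtype=float)

def fit_latency_model(networks, measured_ms):
    # Least-squares fit of latency ~ counts @ w + b on measured (network, latency) pairs.
    X = np.array([np.append(op_counts(n), 1.0) for n in networks])
    coef, *_ = np.linalg.lstsq(X, np.asarray(measured_ms, dtype=float), rcond=None)
    return coef

def predict_latency(coef, layers):
    return float(np.append(op_counts(layers), 1.0) @ coef)
```

Real mobile-GPU latency is rarely this linear. The same fit-then-predict workflow would simply swap in richer features and a stronger regressor.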

42. COVIDomaly: A Deep Convolutional Autoencoder Approach for Detecting Early Cases of COVID-19
Faraz Khoshbakhtian, Ahmed Bilal Ashraf, Shehroz S. Khan
Abstract: As of September 2020, the COVID-19 pandemic continues to devastate the health and well-being of the global population. With more than 33 million confirmed cases and over a million deaths, global health organizations are still a long way from fully containing the pandemic. This pandemic has raised serious questions about the emergency preparedness of health agencies, not only in terms of treating an unseen disease but also in identifying its early symptoms. In the particular case of COVID-19, several studies have indicated that chest radiography images of infected patients show characteristic abnormalities. However, at the onset of a given pandemic such as COVID-19, there may not be sufficient data on affected cases to train models for their robust detection. Hence, supervised classification is ill-posed for this problem, because the time spent collecting large amounts of data from infected people could lead to the loss of human lives and delays in preventive interventions. Therefore, we formulate this problem within a one-class classification framework, in which data for healthy patients is abundantly available, whereas no training data is present for the class of interest (COVID-19 in our case). To solve this problem, we present COVIDomaly, a convolutional autoencoder framework to detect unseen COVID-19 cases from chest radiographs. We tested two settings on a publicly available dataset (COVIDx) by training the model on chest X-rays from (i) only healthy adults, and (ii) healthy adults and patients with other, non-COVID-19 pneumonia, and detected COVID-19 as an anomaly. After performing 3-fold cross-validation, we obtained a pooled ROC-AUC of 0.7652 and 0.6902 in the two settings, respectively. These results are very encouraging and pave the way for research on ensuring emergency preparedness in future pandemics, especially those that could be detected from chest X-rays.
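
The one-class idea can be sketched without a deep-learning framework: train only on the normal class and flag inputs the model reconstructs poorly. This sketch makes several simplifying assumptions:

- The convolutional autoencoder is replaced by its linear counterpart, PCA, which is the optimal linear autoencoder under squared error.
- The "images" are synthetic 10-dimensional vectors.
- ROC-AUC is computed directly as the probability that an anomaly scores higher than a normal sample.

```python
import numpy as np

def fit_linear_autoencoder(X_train, k):
    # Top-k principal directions act as a tied-weight linear encoder/decoder.
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    return mu, Vt[:k]

def anomaly_score(X, mu, W):
    Z = (X - mu) @ W.T        # encode
    X_hat = Z @ W + mu        # decode
    return np.sum((X - X_hat) ** 2, axis=1)   # per-sample reconstruction error

def roc_auc(scores_normal, scores_anomaly):
    # P(anomaly score > normal score), counting ties as 1/2.
    diff = scores_anomaly[:, None] - scores_normal[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)
```

Only normal samples are used to fit the model. Anomalies never appear in training, matching the setting where no COVID-19 images are available.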

43. Image Translation for Medical Image Generation -- Ischemic Stroke Lesions
Moritz Platscher, Jonathan Zopes, Christian Federau
Abstract: Deep learning-based automated disease detection and segmentation algorithms promise to accelerate and improve many clinical processes. However, such algorithms require vast amounts of annotated training data, which are typically not available in a medical context, e.g., due to data privacy concerns, legal obstructions, and non-uniform data formats. Synthetic databases of annotated pathologies could provide the required amounts of training data. Here, we demonstrate with the example of ischemic stroke that a significant improvement in lesion segmentation is feasible using deep learning-based data augmentation. To this end, we train different image-to-image translation models to synthesize diffusion-weighted magnetic resonance images (DWIs) of brain volumes with and without stroke lesions from semantic segmentation maps. In addition, we train a generative adversarial network to generate synthetic lesion masks. Subsequently, we combine these two components to build a large database of synthetic stroke DWIs. The performance of the various generative models is evaluated on a clinical test set using a U-Net trained to segment stroke lesions. We compare the results to human expert inter-reader scores. For the model with the best performance, we report a maximum Dice score of 82.6%. This significantly outperforms the model trained on the clinical images alone (74.8%), as well as the inter-reader Dice score of two human readers (76.9%). Moreover, we show that for a very limited database of only 10 or 50 clinical cases, synthetic data can be used to pre-train the segmentation algorithms. This ultimately yields an improvement by a factor of up to 8 compared to a setting where no synthetic data is used.
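
The Dice score used above to compare models and human readers is 2|A ∩ B| / (|A| + |B|) for two binary masks A and B. A minimal NumPy version follows. Treating two empty masks as a perfect match (score 1.0) is a convention chosen here, not something the abstract specifies.

```python
import numpy as np

def dice_score(pred, target):
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0   # convention: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / total
```

The same function serves both for model-versus-ground-truth scores and for inter-reader agreement, where one reader's mask plays the role of `target`.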

44. Assessing Automated Machine Learning service to detect COVID-19 from X-Ray and CT images: A Real-time Smartphone Application case study
Razib Mustafiz, Khaled Mohsin
Abstract: The recent outbreak of SARS-CoV-2 gave us a unique opportunity to study a non-interventional and sustainable AI solution. Lung disease remains a major healthcare challenge with high morbidity and mortality worldwide. The predominant lung disease was lung cancer. More recently, the world has witnessed the global COVID-19 pandemic, caused by the novel coronavirus. We have seen how viral infection of the lungs and heart has claimed thousands of lives worldwide. With the unprecedented advancement of artificial intelligence in recent years, machine learning can be used to easily detect and classify medical imagery. It is much faster than human radiologists and, most of the time, more accurate. Once implemented, it is also more cost-effective and time-saving. In our study, we evaluated the efficacy of Microsoft Cognitive Service in detecting COVID-19-induced pneumonia and distinguishing it from other viral/bacterial pneumonia based on X-ray and CT images. We wanted to assess the implications and accuracy of an automated ML-based Rapid Application Development (RAD) environment in the field of medical image diagnosis. This study will better equip us to respond with an ML-based diagnostic Decision Support System (DSS) in a pandemic situation like COVID-19. After optimization, the trained network achieved 96.8% Average Precision and was implemented as a web application. However, the main interest of our study was real-time inference on a smartphone, and when ported there, the same trained network did not perform as well as the web application. The authors believe there is scope for further study on this issue. One of the main goals of this study was to develop and evaluate an AI-powered, smartphone-based real-time application. Such an application could facilitate primary diagnostic services in poorly equipped and understaffed rural healthcare centers around the world that have unreliable internet service.

45. Histopathological Stain Transfer using Style Transfer Network with Adversarial Loss
Harshal Nishar, Nikhil Chavanke, Nitin Singhal
Abstract: Deep learning models trained on histopathological images obtained from a single lab and/or scanner give poor inference performance on images obtained from another scanner or lab with a different staining protocol. In recent years, a considerable amount of research has been done on image stain normalization to address this issue. In this work, we present a novel approach to the stain normalization problem using fast neural style transfer coupled with an adversarial loss. We also propose a novel stain transfer generator network based on the High-Resolution Network (HRNet). It requires less training time and generalizes well from few paired training images of the reference stain and the test stain. This approach has been tested on Whole Slide Images (WSIs) obtained from 8 different labs, where images from one lab were treated as the reference stain. A deep learning model was trained on this stain, and the rest of the images were transferred to it using the corresponding stain transfer generator network. Experiments suggest that this approach successfully performs stain normalization with good visual quality and provides better inference performance than not applying stain normalization.
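
Neural style transfer typically measures style through Gram matrices of feature maps, which capture channel-by-channel correlations. It measures content through a direct feature difference. The NumPy sketch below shows both terms on a single feature map of shape (channels, height, width). The normalization by H*W and the loss weights are illustrative assumptions. The adversarial term is omitted, so this is not the paper's exact objective.

```python
import numpy as np

def gram_matrix(feat):
    # feat: (C, H, W) -> (C, C) channel correlations, normalized by spatial size
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_loss(feat_out, feat_style):
    return float(np.mean((gram_matrix(feat_out) - gram_matrix(feat_style)) ** 2))

def content_loss(feat_out, feat_content):
    return float(np.mean((feat_out - feat_content) ** 2))

def transfer_loss(feat_out, feat_content, feat_style, style_weight=10.0, content_weight=1.0):
    # Hypothetical weighting; an adversarial term would be added on top of this sum.
    return (content_weight * content_loss(feat_out, feat_content)
            + style_weight * style_loss(feat_out, feat_style))
```

The Gram matrix ignores where features occur and keeps only which channels co-activate. For that reason it suits stain, which is a color and texture property rather than a layout property.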

46. PCAL: A Privacy-preserving Intelligent Credit Risk Modeling Framework Based on Adversarial Learning
Yuli Zheng, Zhenyu Wu, Ye Yuan, Tianlong Chen, Zhangyang Wang
Abstract: Credit risk modeling has permeated our everyday life. Most banks and financial companies use this technique to model their clients' trustworthiness. Machine learning is increasingly used in this field, and the resulting large-scale collection of users' private information has reinvigorated the privacy debate. Every year there are dozens of data breach incidents caused by unauthorized hackers, and potentially even more cases of information misuse or abuse by authorized parties. To address these critical concerns, this paper proposes a framework for Privacy-preserving Credit risk modeling based on Adversarial Learning (PCAL). PCAL aims to mask the private information inside the original dataset while maintaining the information important for the target prediction task. It does so by iteratively balancing a privacy-risk loss against a utility-oriented loss. PCAL is compared against off-the-shelf options in terms of both utility and privacy protection. Results indicate that PCAL can learn an effective, privacy-free representation from user data, providing a solid foundation for privacy-preserving machine learning in credit risk analysis.

47. Adversarial Boot Camp: label free certified robustness in one epoch
Ryan Campbell, Chris Finlay, Adam M Oberman
Abstract: Machine learning models are vulnerable to adversarial attacks. One approach to addressing this vulnerability is certification, which focuses on models that are guaranteed to be robust for a given perturbation size. A drawback of recent certified models is that they are stochastic: they require multiple computationally expensive model evaluations with random noise added to a given input. In our work, we present a deterministic certification approach that results in a certifiably robust model. This approach is based on an equivalence between training with a particular regularized loss and the expected values of Gaussian averages. We achieve certified models on ImageNet-1k by retraining a model with this loss for one epoch without using label information.
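
For contrast, the stochastic certification that the abstract calls a drawback can be sketched as Monte Carlo Gaussian averaging with the randomized-smoothing certificate of Cohen et al. (2019). Suppose the smoothed classifier picks a class with probability p > 1/2 under noise N(0, σ²I). Then the prediction is certified within an L2 radius of σ·Φ⁻¹(p).

This is not the paper's deterministic method. The sketch assumes:

- a binary linear base classifier;
- a plain point estimate of p instead of a confidence lower bound;
- illustrative function names.

For a linear classifier, the certified radius should match the exact distance to the decision boundary, which gives a convenient check.

```python
import numpy as np
from statistics import NormalDist

def base_classifier(X, w, b):
    # Binary linear classifier: class 1 if w.x + b > 0, else class 0 (rows of X are inputs).
    return (X @ w + b > 0).astype(int)

def smoothed_certify(x, w, b, sigma, n_samples, rng):
    # Monte Carlo estimate of the Gaussian-averaged prediction at x.
    noise = rng.normal(scale=sigma, size=(n_samples, x.shape[0]))
    p1 = base_classifier(x + noise, w, b).mean()
    label, p = (1, p1) if p1 >= 0.5 else (0, 1.0 - p1)
    if p <= 0.5:
        return None, 0.0                      # abstain: no clear majority
    p = min(p, 1.0 - 1.0 / n_samples)         # crude cap so the radius stays finite
    return label, sigma * NormalDist().inv_cdf(p)
```

Every certificate here costs `n_samples` forward passes. That per-input cost is what a deterministic, one-evaluation certificate avoids.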

48. Denoising Diffusion Implicit Models
Jiaming Song, Chenlin Meng, Stefano Ermon
Abstract: Denoising diffusion probabilistic models (DDPMs) have achieved high-quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate three things about DDIMs. They can produce high-quality samples $10\times$ to $50\times$ faster in terms of wall-clock time compared to DDPMs. They allow us to trade off computation for sample quality. They can also perform semantically meaningful image interpolation directly in the latent space.
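
The speed-up comes from the deterministic ($\eta = 0$) DDIM update, which jumps along a short sub-sequence of timesteps. Each step first predicts $\hat{x}_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\hat\epsilon)/\sqrt{\bar\alpha_t}$ and then sets $x_{t'} = \sqrt{\bar\alpha_{t'}}\,\hat{x}_0 + \sqrt{1-\bar\alpha_{t'}}\,\hat\epsilon$. The NumPy sketch below assumes a linear $\beta$ schedule. In place of a trained network, it uses a hypothetical oracle noise predictor for a single known data point, so the output can be checked exactly.

```python
import numpy as np

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)            # \bar{alpha}_t for t = 0..T-1

def ddim_sample(x_T, eps_model, alpha_bars, num_steps):
    # Deterministic DDIM (eta = 0) over num_steps of the T training timesteps.
    T = len(alpha_bars)
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x
```

With a trained noise-prediction network in place of the oracle, `num_steps` is the knob that trades computation for sample quality.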

49. Downscaling Attacks: What You See is Not What You Get
Andrew J. Lohn
Abstract: The resizing of images, which is typically a required part of preprocessing for computer vision systems, is vulnerable to attack. We show that images can be crafted so that they look completely different at machine-vision scales than at other scales. The default settings for some common computer vision and machine learning systems are vulnerable. Defenses exist, however, and are trivial to administer provided that defenders are aware of the threat. These attacks and defenses help establish the role of input sanitization in machine learning.
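
The core of the attack can be shown with NumPy. Nearest-neighbor downscaling reads only one pixel per block. An attacker who overwrites just those pixels therefore controls the downscaled image while leaving the full-size image almost untouched. The strided sampling `img[::factor, ::factor]` is a simplifying assumption, since real libraries differ in which pixel they sample. Area averaging is shown as one straightforward defense.

```python
import numpy as np

def nearest_downscale(img, factor):
    # Simplified nearest-neighbour resize: keep the top-left pixel of each factor x factor block.
    return img[::factor, ::factor]

def area_downscale(img, factor):
    # Defense: average every pixel in each block, so a few planted pixels barely matter.
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def craft_attack(cover, payload, factor):
    # Overwrite only the pixels that nearest-neighbour downscaling will sample.
    attack = cover.copy()
    attack[::factor, ::factor] = payload
    return attack
```

With a factor of 8, only 1 in 64 pixels changes, yet the nearest-neighbor output equals the payload exactly. The area-averaged output stays close to the cover image.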

50. The Effectiveness of Memory Replay in Large Scale Continual Learning
Yogesh Balaji, Mehrdad Farajtabar, Dong Yin, Alex Mott, Ang Li
Abstract: We study continual learning in the large-scale setting, where tasks in the input sequence are not limited to classification and the outputs can be high-dimensional. Among multiple state-of-the-art methods, we found that vanilla experience replay (ER) is still very competitive in terms of both performance and scalability, despite its simplicity. However, ER's performance degrades when the memory is small. Further visualization of the feature space reveals that the intermediate representation undergoes a distributional drift. Existing methods usually replay only the input-output pairs, and we hypothesize that this regularization effect is inadequate for complex deep models and diverse tasks with a small replay buffer. Following this observation, we propose to replay the activations of the intermediate layers in addition to the input-output pairs. Since saving raw activation maps can dramatically increase memory and compute costs, we propose the Compressed Activation Replay technique, in which compressed representations of layer activations are saved to the replay buffer. We show that this approach achieves a superior regularization effect while adding negligible memory overhead to the replay method. Experiments on both the large-scale Taskonomy benchmark, with its diverse set of tasks, and standard common datasets (Split-CIFAR and Split-miniImageNet) demonstrate the effectiveness of the proposed method.
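
The NumPy sketch below shows a replay buffer that stores a compressed copy of an intermediate activation alongside each input-output pair. It also shows a distillation-style loss that compares the current activation with the stored one. Several choices are assumptions, not the paper's documented design:

- The compression is spatial average pooling followed by float16 storage.
- Reservoir sampling is the buffer policy.
- The loss is mean squared error.

```python
import numpy as np

def compress_activation(act, pool=4):
    # Hypothetical compression: average-pool a (C, H, W) map and store it as float16.
    c, h, w = act.shape
    pooled = act.reshape(c, h // pool, pool, w // pool, pool).mean(axis=(2, 4))
    return pooled.astype(np.float16)

class ReplayBuffer:
    """Reservoir-sampling buffer of (input, target, compressed activation) triples."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.seen = 0

    def add(self, x, y, act):
        entry = (x, y, compress_activation(act))
        if len(self.items) < self.capacity:
            self.items.append(entry)
        else:
            j = self.rng.integers(0, self.seen + 1)   # uniform over [0, seen]
            if j < self.capacity:
                self.items[j] = entry
        self.seen += 1

    def sample(self, k):
        idx = self.rng.choice(len(self.items), size=min(k, len(self.items)), replace=False)
        return [self.items[i] for i in idx]

def activation_replay_loss(current_act, stored, pool=4):
    # Penalize drift of the current intermediate activation away from the stored one.
    current = compress_activation(current_act, pool).astype(np.float32)
    return float(np.mean((current - stored.astype(np.float32)) ** 2))
```

For an (8, 16, 16) float32 activation, the stored copy here takes 256 bytes instead of 8192. That is the kind of saving that makes replaying activations affordable.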

51. ASDN: A Deep Convolutional Network for Arbitrary Scale Image Super-Resolution
Jialiang Shen, Yucheng Wang, Jian Zhang
Abstract: Deep convolutional neural networks have significantly improved the peak signal-to-noise ratio (PSNR) of super-resolution (SR). However, image viewer applications commonly allow users to zoom images to arbitrary magnification scales. Until now, supporting this has required training at a large number of scales, at a tremendous computational cost. To obtain a more computationally efficient model for arbitrary-scale SR, this paper employs a Laplacian pyramid method to reconstruct high-resolution (HR) images at any scale using the high-frequency image details in a Laplacian Frequency Representation. For small scales (between 1 and 2), images are constructed by interpolation from a sparse set of precalculated Laplacian pyramid levels. SR at larger scales is computed by recursion from small scales, which significantly reduces the computational cost. For a full comparison, fixed-scale and any-scale experiments are conducted on various benchmarks. At fixed scales, ASDN outperforms predefined upsampling methods (e.g., SRCNN, VDSR, DRRN) by about 1 dB in PSNR. At arbitrary scales, ASDN generally exceeds Meta-SR on many scales.
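
The Python sketch below covers two steps: splitting an arbitrary factor into small-scale steps, and blending a small scale from a sparse set of precomputed levels. Both rules are illustrative assumptions. The split uses n = ceil(log2 r) equal steps of size r^(1/n). The blend is linear between levels 1.0, 1.1, ..., 2.0. The abstract does not specify ASDN's exact recursion or level spacing.

```python
import math

def recursive_scale_plan(r):
    """Split an upscaling factor r > 1 into n equal steps, each in (1, 2].

    Hypothetical scheme: n = ceil(log2 r) and every step is r ** (1 / n).
    """
    if r <= 1.0:
        raise ValueError("scale must be > 1")
    n = max(1, math.ceil(math.log2(r)))
    return [r ** (1.0 / n)] * n

def level_weights(s, levels):
    """Linear-interpolation weights between the two precomputed levels bracketing scale s."""
    for lo, hi in zip(levels, levels[1:]):
        if lo <= s <= hi:
            t = (s - lo) / (hi - lo)
            return {lo: 1.0 - t, hi: t}
    raise ValueError("scale outside the precomputed levels")
```

For example, a factor of 5 becomes three steps of about 1.71 each. Every step lies in the small-scale range, where interpolation between neighboring pyramid levels applies.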

52. Fusion 360 Gallery: A Dataset and Environment for Programmatic CAD Reconstruction
Karl D.D. Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G. Lambourne, Armando Solar-Lezama, Wojciech Matusik
Abstract: Parametric computer-aided design (CAD) is a standard paradigm for designing manufactured objects. CAD designers perform modeling operations, such as sketch and extrude, to form a construction sequence that makes up a final design. Despite the pervasiveness of parametric CAD and growing interest from the research community, no dataset of human-designed 3D CAD construction sequences has been available to date. In this paper, we present the Fusion 360 Gallery reconstruction dataset and an environment for learning CAD reconstruction. We provide a dataset of 8,625 designs, comprising sequential sketch and extrude modeling operations. It comes with a complementary environment called the Fusion 360 Gym, which assists with performing CAD reconstruction. We outline a standard CAD reconstruction task together with evaluation metrics. We also present results from a novel method that uses neurally guided search to recover a construction sequence from raw geometry.
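
To make the reconstruction task concrete, the toy Python/NumPy sketch below builds shapes from sketch-and-extrude construction sequences. It scores a candidate sequence by voxel intersection-over-union (IoU) against a target. Several simplifications are assumed for illustration:

- the shapes live on a voxel grid;
- sketches are axis-aligned rectangles;
- IoU is the evaluation metric.

This is not the Fusion 360 Gym API, and the paper's own metrics may differ.

```python
import numpy as np

GRID = 8   # hypothetical voxel resolution

def apply_extrude(solid, rect, z0, height, operation="join"):
    """Extrude an axis-aligned sketch rectangle (x0, y0, x1, y1) upward from z0 by height cells."""
    x0, y0, x1, y1 = rect
    body = np.zeros_like(solid)
    body[x0:x1, y0:y1, z0:z0 + height] = True
    if operation == "join":
        return solid | body
    if operation == "cut":
        return solid & ~body
    raise ValueError(f"unknown operation: {operation}")

def build(sequence):
    # Replay a construction sequence (list of extrude parameter dicts) from an empty grid.
    solid = np.zeros((GRID, GRID, GRID), dtype=bool)
    for step in sequence:
        solid = apply_extrude(solid, **step)
    return solid

def iou(a, b):
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union
```

Different sequences can produce identical geometry: one tall extrude equals two stacked ones. A reconstruction is therefore best judged on the resulting shape rather than on matching the original steps.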

53. Winning Lottery Tickets in Deep Generative Models
Neha Mukund Kalibhat, Yogesh Balaji, Soheil Feizi
Abstract: The lottery ticket hypothesis suggests that sparse subnetworks of a given neural network, if initialized properly, can be trained to reach performance comparable to, or even better than, that of the original network. Prior work on lottery tickets has primarily focused on the supervised learning setup, with several papers proposing effective ways of finding "winning tickets" in classification problems. In this paper, we confirm the existence of winning tickets in deep generative models such as GANs and VAEs. We show that the popular iterative magnitude pruning approach (with late rewinding) can be used with generative losses to find the winning tickets. This approach effectively yields tickets with sparsity up to 99% for AutoEncoders, 93% for VAEs, and 89% for GANs on the CIFAR and Celeb-A datasets. We also demonstrate the transferability of winning tickets across different generative models (GANs and VAEs) sharing the same architecture. This suggests that winning tickets have inductive biases that could help train a wide range of deep generative models. Furthermore, we show the practical benefits of lottery tickets in generative models by detecting tickets at very early stages of training; we call these "early-bird tickets". With early-bird tickets, we can achieve up to 88% reduction in floating-point operations (FLOPs) and 54% reduction in training time, making it possible to train large-scale generative models under tight resource constraints. These results outperform existing early pruning methods like SNIP (Lee, Ajanthan, and Torr 2019) and GraSP (Wang, Zhang, and Grosse 2020). Our findings shed light on the existence of proper network initializations that could improve the convergence and stability of generative models.
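
Iterative magnitude pruning with late rewinding repeats three steps. It trains, prunes a fixed fraction of the surviving weights with the smallest magnitude, and resets the survivors to their values from an early training iteration rather than to initialization. The NumPy sketch below shows that loop on a single flattened weight tensor. `train_fn` is a hypothetical stand-in for training with a generative loss, and the 20% per-round rate is an assumed value.

```python
import numpy as np

def prune_smallest(weights, mask, rate):
    """Remove `rate` of the currently surviving weights, smallest magnitude first."""
    alive = np.flatnonzero(mask)
    k = int(round(rate * alive.size))
    order = np.argsort(np.abs(weights[alive]))
    new_mask = mask.copy()
    new_mask[alive[order[:k]]] = False
    return new_mask

def iterative_magnitude_pruning(rewind_weights, train_fn, rounds, rate=0.2):
    """IMP with late rewinding: train, prune, reset survivors to rewind_weights, repeat."""
    mask = np.ones_like(rewind_weights, dtype=bool)
    weights = rewind_weights * mask
    for _ in range(rounds):
        trained = train_fn(weights, mask)
        mask = prune_smallest(trained, mask, rate)
        weights = rewind_weights * mask   # late rewinding: early-iteration values, not init
    return weights, mask
```

After five rounds at 20%, about 0.8^5 ≈ 33% of the weights remain. Reaching the much higher sparsities quoted above simply takes more rounds.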

54. Early Detection of Myocardial Infarction in Low-Quality Echocardiography
Aysen Degerli, Morteza Zabihi, Serkan Kiranyaz, Tahir Hamid, Rashid Mazhar, Ridha Hamila, Moncef Gabbouj
Abstract: Myocardial infarction (MI), commonly known as a heart attack, is a life-threatening worldwide health problem affecting 32.4 million people each year. Early diagnosis and treatment of MI are crucial to prevent further damage to heart tissue. However, MI detection in early stages is challenging because the symptoms are not easy to distinguish in electrocardiography findings or in biochemical marker values found in the blood. Echocardiography, a noninvasive clinical tool used to analyze regional wall motion abnormalities, allows a more accurate early MI diagnosis. When echocardiography quality is poor, the diagnosis becomes a challenging and sometimes infeasible task, even for a cardiologist. In this paper, we introduce a three-phase approach for early MI detection in low-quality echocardiography: 1) segmentation of the entire left ventricle (LV) wall of the heart using a state-of-the-art deep learning model, 2) analysis of the segmented LV wall by feature engineering, and 3) early MI detection. This study makes three main contributions:
- highly accurate segmentation of the LV wall from low-resolution (both temporal and spatial) and noisy echocardiographic data;
- pixel-level segmentation ground truth for the unannotated dataset, generated with a pseudo-labeling approach;
- the first public echocardiographic dataset (HMC-QU), labeled by cardiologists at Hamad Medical Corporation Hospital in Qatar.

Furthermore, the outputs of the proposed approach can significantly help cardiologists better assess LV wall characteristics. The proposed method is evaluated with a 5-fold cross-validation scheme on the HMC-QU dataset. It achieved an average of 95.72% sensitivity and 99.58% specificity for LV wall segmentation, and 85.97% sensitivity, 74.03% specificity, and 86.85% precision for MI detection.
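
The sensitivity, specificity and precision reported above follow the standard confusion-matrix definitions: TP/(TP+FN), TN/(TN+FP) and TP/(TP+FP). A small NumPy helper for binary labels, such as per-pixel wall masks or per-patient MI labels, is sketched below. The function name is illustrative, and the paper may aggregate results per fold or per patient differently.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    # Metrics are undefined (nan, with a warning) when a denominator is zero.
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {
        "sensitivity": tp / (tp + fn),   # fraction of positives that are detected
        "specificity": tn / (tn + fp),   # fraction of negatives that are correctly rejected
        "precision": tp / (tp + fp),     # fraction of positive calls that are correct
    }
```

For segmentation, the arrays are flattened masks. For MI detection, they are one label per patient or per echocardiography recording.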

55. Multi-Resolution 3D Convolutional Neural Networks for Automatic Coronary Centerline Extraction in Cardiac CT Angiography Scans
Zohaib Salahuddin, Matthias Lenga, Hannes Nickisch
Abstract: We propose a deep learning-based automatic coronary artery tree centerline tracker (AuCoTrack) that extends the vessel tracker by Wolterink et al. (arXiv:1810.03143). A dual-pathway Convolutional Neural Network (CNN) operating on multi-scale 3D inputs predicts the direction of the coronary arteries as well as the presence of a bifurcation. A similar multi-scale dual-pathway 3D CNN is trained to identify coronary artery endpoints for terminating the tracking process. Two or more continuation directions are derived based on the bifurcation detection. The iterative tracker detects the entire left and right coronary artery trees based on only two ostium landmarks derived from a model-based segmentation of the heart. The 3D CNNs were trained on a proprietary dataset consisting of 43 CCTA scans. An average sensitivity of 87.1% and a clinically relevant overlap of 89.1% were obtained relative to a refined manual segmentation. In addition, the MICCAI 2008 Coronary Artery Tracking Challenge (CAT08) training and test datasets were used to benchmark the algorithm and to assess its generalization. An average overlap of 93.6% and a clinically relevant overlap of 96.4% were obtained. The proposed method achieved better overlap scores than the current state-of-the-art automatic centerline extraction techniques on the CAT08 dataset, with a vessel detection rate of 95%.
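
The iterative tracking loop can be sketched in NumPy. Starting from a seed (an ostium landmark in the paper), it repeatedly queries a direction predictor, steps along that direction, and stops when an endpoint detector fires. Several simplifications apply:

- `predict_direction` and `is_endpoint` are hypothetical callbacks standing in for the two CNNs.
- The fixed step size is an assumption.
- Bifurcation handling is omitted.
- Predicted directions are sign-flipped when needed, so the tracker never steps back the way it came.

```python
import numpy as np

def track(seed, predict_direction, is_endpoint, step=0.5, max_steps=1000):
    """Follow a vessel from `seed` until `is_endpoint` fires or max_steps is reached."""
    path = [np.asarray(seed, dtype=float)]
    prev_dir = None
    for _ in range(max_steps):
        p = path[-1]
        if is_endpoint(p):
            break
        d = np.asarray(predict_direction(p), dtype=float)
        d = d / np.linalg.norm(d)
        # A vessel direction is ambiguous up to sign; keep the one continuing forward.
        if prev_dir is not None and d @ prev_dir < 0:
            d = -d
        path.append(p + step * d)
        prev_dir = d
    return np.array(path)
```

Handling a bifurcation would mean pushing each extra continuation direction onto a queue and running the same loop from that branch point.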


[arXiv papers] Computer Vision and Pattern Recognition 2020-10-07

Contents

Abstracts