
[arXiv Papers] Computer Vision and Pattern Recognition 2020-09-09

Contents

1. Intraoperative Liver Surface Completion with Graph Convolutional VAE [PDF] Abstract
2. Understanding and Exploiting Dependent Variables with Deep Metric Learning [PDF] Abstract
3. VisCode: Embedding Information in Visualization Images using Encoder-Decoder Network [PDF] Abstract
4. Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [PDF] Abstract
5. Analysis and Prediction of Deforming 3D Shapes using Oriented Bounding Boxes and LSTM Autoencoders [PDF] Abstract
6. Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective [PDF] Abstract
7. Convolutional Neural Networks for Automatic Detection of Artifacts from Independent Components Represented in Scalp Topographies of EEG Signals [PDF] Abstract
8. Rain rendering for evaluating and improving robustness to bad weather [PDF] Abstract
9. A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification [PDF] Abstract
10. Region Comparison Network for Interpretable Few-shot Image Classification [PDF] Abstract
11. Few-Shot Hyperspectral Image Classification With Unknown Classes Using Multitask Deep Learning [PDF] Abstract
12. A Residual Solver and Its Unfolding Neural Network for Total Variation Regularized Models [PDF] Abstract
13. LaSOT: A High-quality Large-scale Single Object Tracking Benchmark [PDF] Abstract
14. ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation [PDF] Abstract
15. Convolution Neural Networks for diagnosing colon and lung cancer histopathological images [PDF] Abstract
16. TanhSoft -- a family of activation functions combining Tanh and Softplus [PDF] Abstract
17. Self-Supervised Scale Recovery for Monocular Depth and Egomotion Estimation [PDF] Abstract
18. Deep Cyclic Generative Adversarial Residual Convolutional Networks for Real Image Super-Resolution [PDF] Abstract
19. GPU-based Self-Organizing Maps for Post-Labeled Few-Shot Unsupervised Learning [PDF] Abstract
20. Learning more expressive joint distributions in multimodal variational methods [PDF] Abstract
21. Imbalanced Continual Learning with Partitioning Reservoir Sampling [PDF] Abstract
22. Horus: Using Sensor Fusion to Combine Infrastructure and On-board Sensing to Improve Autonomous Vehicle Safety [PDF] Abstract
23. Adversarial attacks on deep learning models for fatty liver disease classification by modification of ultrasound image reconstruction method [PDF] Abstract
24. Going deeper with brain morphometry using neural networks [PDF] Abstract
25. Sensors, Safety Models and A System-Level Approach to Safe and Scalable Automated Vehicles [PDF] Abstract

Abstracts

1. Intraoperative Liver Surface Completion with Graph Convolutional VAE [PDF] Back to contents
  Simone Foti, Bongjin Koo, Thomas Dowrick, Joao Ramalhinho, Moustafa Allam, Brian Davidson, Danail Stoyanov, Matthew J. Clarkson
Abstract: In this work we propose a method based on geometric deep learning to predict the complete surface of the liver, given a partial point cloud of the organ obtained during the surgical laparoscopic procedure. We introduce a new data augmentation technique that randomly perturbs shapes in their frequency domain to compensate for the limited size of our dataset. The core of our method is a variational autoencoder (VAE) that is trained to learn a latent space for complete shapes of the liver. At inference time, the generative part of the model is embedded in an optimisation procedure where the latent representation is iteratively updated to generate a model that matches the intraoperative partial point cloud. The effect of this optimisation is a progressive non-rigid deformation of the initially generated shape. Our method is qualitatively evaluated on real data and quantitatively evaluated on synthetic data. We compared it with a state-of-the-art rigid registration algorithm, which our method outperformed in visible areas.
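The inference-time optimisation described in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `decode` is a hypothetical stand-in for the trained VAE decoder (here it just draws a 2D ellipse from a two-value latent code), the loss is a one-sided chamfer distance chosen because the abstract does not name its loss, and gradients are taken by finite differences rather than backpropagation.

```python
import math

def decode(z):
    """Hypothetical stand-in for a trained decoder: an ellipse with radii set by z."""
    rx, ry = 1.0 + z[0], 1.0 + z[1]
    return [(rx * math.cos(2 * math.pi * k / 32), ry * math.sin(2 * math.pi * k / 32))
            for k in range(32)]

def partial_chamfer(partial, full):
    """Mean squared distance from each observed point to its nearest generated point."""
    return sum(min((px - qx) ** 2 + (py - qy) ** 2 for qx, qy in full)
               for px, py in partial) / len(partial)

def fit_latent(partial, steps=200, lr=0.2, eps=1e-4):
    """Iteratively update the latent code so the decoded shape matches the partial cloud."""
    z = [0.0, 0.0]  # start from the "mean shape"
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            z_plus, z_minus = list(z), list(z)
            z_plus[i] += eps
            z_minus[i] -= eps
            # Central finite difference of the loss with respect to z[i].
            grad.append((partial_chamfer(partial, decode(z_plus))
                         - partial_chamfer(partial, decode(z_minus))) / (2 * eps))
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

# Only part of the target shape is visible, mimicking a partial intraoperative view.
target = decode([0.5, -0.3])
partial = target[:12]
z_hat = fit_latent(partial)
```

Decoding `z_hat` yields a complete shape even though only 12 of the 32 target points were observed, which is the idea of completing the surface through the latent space.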

2. Understanding and Exploiting Dependent Variables with Deep Metric Learning [PDF] Back to contents
  Niall O' Mahony, Sean Campbell, Anderson Carvalho, Lenka Krpalkova, Gustavo Velasco-Hernandez, Daniel Riordan, Joseph Walsh
Abstract: Deep Metric Learning (DML) approaches learn to represent inputs to a lower-dimensional latent space such that the distance between representations in this space corresponds with a predefined notion of similarity. This paper investigates how the mapping element of DML may be exploited in situations where the salient features in arbitrary classification problems vary over time or due to changing underlying variables. Examples of such variable features include seasonal and time-of-day variations in outdoor scenes in place recognition tasks for autonomous navigation and age/gender variations in human/animal subjects in classification tasks for medical/ethological studies. Through the use of visualisation tools for observing the distribution of DML representations per each query variable for which prior information is available, the influence of each variable on the classification task may be better understood. Based on these relationships, prior information on these salient background variables may be exploited at the inference stage of the DML approach by using a clustering algorithm to improve classification performance. This research proposes such a methodology establishing the saliency of query background variables and formulating clustering algorithms for better separating latent-space representations at run-time. The paper also discusses online management strategies to preserve the quality and diversity of data and the representation of each class in the gallery of embeddings in the DML approach. We also discuss latent works towards understanding the relevance of underlying/multiple variables with DML.

3. VisCode: Embedding Information in Visualization Images using Encoder-Decoder Network [PDF] Back to contents
  Peiying Zhang, Chenhui Li, Changbo Wang
Abstract: We present an approach called VisCode for embedding information into visualization images. This technology can implicitly embed data information specified by the user into a visualization while ensuring that the encoded visualization image is not distorted. The VisCode framework is based on a deep neural network. We propose to use visualization images and QR codes data as training data and design a robust deep encoder-decoder network. The designed model considers the salient features of visualization images to reduce the explicit visual loss caused by encoding. To further support large-scale encoding and decoding, we consider the characteristics of information visualization and propose a saliency-based QR code layout algorithm. We present a variety of practical applications of VisCode in the context of information visualization and conduct a comprehensive evaluation of the perceptual quality of encoding, decoding success rate, anti-attack capability, time performance, etc. The evaluation results demonstrate the effectiveness of VisCode.

4. Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [PDF] Back to contents
  Prathmesh Madhu, Tilman Marquart, Ronak Kosti, Peter Bell, Andreas Maier, Vincent Christlein
Abstract: Image compositions as a tool for the analysis of artworks are of extreme significance for art historians. These compositions are useful in analyzing the interactions in an image to study artists and their artworks. Max Imdahl in his work called Ikonik, along with other prominent art historians of the 20th century, underlined the aesthetic and semantic importance of the structural composition of an image. Understanding underlying compositional structures within images is a challenging and time-consuming task. Generating these structures automatically using computer vision techniques (1) can help art historians in their sophisticated analysis by saving a lot of time and providing an overview of and access to huge image repositories, and (2) also provides an important step towards an understanding of man-made imagery by machines. In this work, we attempt to automate this process using existing state-of-the-art machine learning techniques, without involving any form of training. Our approach, inspired by Max Imdahl's pioneering work, focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background. Currently, our approach works for artworks comprising protagonists (persons) in an image. In order to validate our approach qualitatively and quantitatively, we conduct a user study involving experts and non-experts. The outcome of the study highly correlates with our approach and also demonstrates its domain-agnostic capability. We have open-sourced the code at this https URL.

5. Analysis and Prediction of Deforming 3D Shapes using Oriented Bounding Boxes and LSTM Autoencoders [PDF] Back to contents
  Sara Hahner, Rodrigo Iza-Teran, Jochen Garcke
Abstract: For sequences of complex 3D shapes in time we present a general approach to detect patterns for their analysis and to predict the deformation by making use of structural components of the complex shape. We incorporate long short-term memory (LSTM) layers into an autoencoder to create low dimensional representations that allow the detection of patterns in the data and additionally detect the temporal dynamics in the deformation behavior. This is achieved with two decoders, one for reconstruction and one for prediction of future time steps of the sequence. In a preprocessing step the components of the studied object are converted to oriented bounding boxes which capture the impact of plastic deformation and allow reducing the dimensionality of the data describing the structure. The architecture is tested on the results of 196 car crash simulations of a model with 133 different components, where material properties are varied. In the latent representation we can detect patterns in the plastic deformation for the different components. The predicted bounding boxes give an estimate of the final simulation result and their quality is improved in comparison to different baselines.
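To make the preprocessing step concrete, here is a minimal sketch of fitting an oriented bounding box to one component. It rests on two assumptions not stated in the abstract: the box is aligned with the principal axes of the point set (a common PCA-style construction), and the example works in 2D for brevity, whereas the paper deals with 3D shapes. The function name is hypothetical.

```python
import math

def oriented_bounding_box_2d(points):
    """Return (center, angle, (length, width)) of a PCA-aligned box around 2D points."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix of the point set.
    sxx = sum((p[0] - cx) ** 2 for p in points) / n
    syy = sum((p[1] - cy) ** 2 for p in points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    # Direction of the first principal axis (closed form for a 2x2 symmetric matrix).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    vx, vy = -uy, ux
    # Coordinates of every point along the two box axes.
    a = [(p[0] - cx) * ux + (p[1] - cy) * uy for p in points]
    b = [(p[0] - cx) * vx + (p[1] - cy) * vy for p in points]
    a_mid, b_mid = (max(a) + min(a)) / 2.0, (max(b) + min(b)) / 2.0
    center = (cx + a_mid * ux + b_mid * vx, cy + a_mid * uy + b_mid * vy)
    return center, theta, (max(a) - min(a), max(b) - min(b))
```

Each component is then described by a handful of numbers (center, orientation, extents) instead of all its mesh nodes, which is the dimensionality reduction the abstract refers to.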

6. Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective [PDF] Back to contents
  Gabriel Resende Machado, Eugênio Silva, Ronaldo Ribeiro Goldschmidt
Abstract: Deep Learning algorithms have achieved state-of-the-art performance for Image Classification and have been used even in security-critical applications, such as biometric recognition systems and self-driving cars. However, recent works have shown that those algorithms, which can even surpass human capabilities, are vulnerable to adversarial examples. In Computer Vision, adversarial examples are images containing subtle perturbations generated by malicious optimization algorithms in order to fool classifiers. As an attempt to mitigate these vulnerabilities, numerous countermeasures have been constantly proposed in the literature. Nevertheless, devising an efficient defense mechanism has proven to be a difficult task, since many approaches have already been shown to be ineffective against adaptive attackers. Thus, this self-contained paper aims to provide all readerships with a review of the latest research progress on Adversarial Machine Learning in Image Classification, albeit from a defender's perspective. Here, novel taxonomies for categorizing adversarial attacks and defenses are introduced and discussions about the existence of adversarial examples are provided. Further, in contrast to existing surveys, relevant guidance is also given that should be taken into consideration by researchers when devising and evaluating defenses. Finally, based on the reviewed literature, some promising paths for future research are discussed.

7. Convolutional Neural Networks for Automatic Detection of Artifacts from Independent Components Represented in Scalp Topographies of EEG Signals [PDF] Back to contents
  Giuseppe Placidi, Luigi Cinque, Matteo Polsinelli
Abstract: Electroencephalography (EEG) measures the electrical brain activity in real-time by using sensors placed on the scalp. Artifacts, due to eye movements and blinks, muscular/cardiac activity and generic electrical disturbances, have to be recognized and eliminated to allow a correct interpretation of the useful brain signals (UBS) of EEG. Independent Component Analysis (ICA) is effective at splitting the signal into independent components (ICs) whose re-projections on 2D scalp topographies (images), also called topoplots, make it possible to recognize and separate artifacts and UBS. Until now, IC topoplot analysis, a gold standard in EEG, has been carried out visually by human experts and, hence, has not been usable in automatic, fast-response EEG. We present a completely automatic and effective framework for EEG artifact recognition by IC topoplots, based on 2D Convolutional Neural Networks (CNNs), capable of dividing topoplots into 4 classes: 3 types of artifacts and UBS. The framework setup is described and results are presented, discussed and compared with those obtained by other competitive strategies. Experiments, carried out on public EEG datasets, have shown an overall accuracy of above 98%, taking 1.4 sec on a standard PC to classify 32 topoplots, that is, to drive an EEG system of 32 sensors. Though not real-time, the proposed framework is efficient enough to be used in fast-response EEG-based Brain-Computer Interfaces (BCI) and faster than other automatic methods based on ICs.

8. Rain rendering for evaluating and improving robustness to bad weather [PDF] Back to contents
  Maxime Tremblay, Shirsendu Sukanta Halder, Raoul de Charette, Jean-François Lalonde
Abstract: Rain fills the atmosphere with water particles, which breaks the common assumption that light travels unaltered from the scene to the camera. While it is well-known that rain affects computer vision algorithms, quantifying its impact is difficult. In this context, we present a rain rendering pipeline that enables the systematic evaluation of common computer vision algorithms under controlled amounts of rain. We present three different ways to add synthetic rain to existing image datasets: completely physics-based; completely data-driven; and a combination of both. The physics-based rain augmentation combines a physical particle simulator and accurate rain photometric modeling. We validate our rendering methods with a user study, demonstrating our rain is judged as much as 73% more realistic than the state of the art. Using our generated rain-augmented KITTI, Cityscapes, and nuScenes datasets, we conduct a thorough evaluation of object detection, semantic segmentation, and depth estimation algorithms and show that their performance decreases in degraded weather, on the order of 15% for object detection, 60% for semantic segmentation, and a 6-fold increase in depth estimation error. Finetuning on our augmented synthetic data results in improvements of 21% on object detection, 37% on semantic segmentation, and 8% on depth estimation.

9. A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification [PDF] Back to contents
  Haocong Rao, Siqi Wang, Xiping Hu, Mingkui Tan, Yi Guo, Jun Cheng, Bin Hu, Xinwang Liu
Abstract: Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages. Existing solutions either rely on hand-crafted descriptors or supervised gait representation learning. This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID. Specifically, we first create self-supervision by learning to reconstruct unlabeled skeleton sequences reversely, which involves richer high-level semantics to obtain better gait representations. Other pretext tasks are also explored to further improve self-supervised learning. Second, inspired by the fact that motion's continuity endows adjacent skeletons in one skeleton sequence and temporally consecutive skeleton sequences with higher correlations (referred as locality in 3D skeleton data), we propose a locality-aware attention mechanism and a locality-aware contrastive learning scheme, which aim to preserve locality-awareness on intra-sequence level and inter-sequence level respectively during self-supervised learning. Last, with context vectors learned by our locality-aware attention mechanism and contrastive learning scheme, a novel feature named Constrastive Attention-based Gait Encodings (CAGEs) is designed to represent gait effectively. Empirical evaluations show that our approach significantly outperforms skeleton-based counterparts by 15-40% Rank-1 accuracy, and it even achieves superior performance to numerous multi-modal methods with extra RGB or depth information. Our codes are available at this https URL.

10. Region Comparison Network for Interpretable Few-shot Image Classification [PDF] Back to contents
  Zhiyu Xue, Lixin Duan, Wen Li, Lin Chen, Jiebo Luo
Abstract: While deep learning has been successfully applied to many real-world computer vision tasks, training robust classifiers usually requires a large amount of well-labeled data. However, the annotation is often expensive and time-consuming. Few-shot image classification has thus been proposed to effectively use only a limited number of labeled examples to train models for new classes. Recent works based on transferable metric learning methods have achieved promising classification performance through learning the similarity between the features of samples from the query and support sets. However, few of them explicitly consider model interpretability, which can actually be revealed during the training phase. For that, in this work, we propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works in a neural network as well as to find out specific regions that are related to each other in images coming from the query and support sets. Moreover, we also present a visualization strategy named Region Activation Mapping (RAM) to intuitively explain what our method has learned by visualizing intermediate variables in our network. We also present a new way to generalize the interpretability from the level of tasks to categories, which can also be viewed as a method to find the prototypical parts for supporting the final decision of our RCN. Extensive experiments on four benchmark datasets clearly show the effectiveness of our method over existing baselines.

11. Few-Shot Hyperspectral Image Classification With Unknown Classes Using Multitask Deep Learning [PDF] Back to contents
  Shengjie Liu, Qian Shi, Liangpei Zhang
Abstract: Current hyperspectral image classification assumes that a predefined classification system is closed and complete, and there are no unknown or novel classes in the unseen data. However, this assumption may be too strict for the real world. Often, novel classes are overlooked when the classification system is constructed. The closed nature forces a model to assign a label given a new sample and may lead to overestimation of known land covers (e.g., crop area). To tackle this issue, we propose a multitask deep learning method that simultaneously conducts classification and reconstruction in the open world (named MDL4OW) where unknown classes may exist. The reconstructed data are compared with the original data; those failing to be reconstructed are considered unknown, based on the assumption that they are not well represented in the latent features due to the lack of labels. A threshold needs to be defined to separate the unknown and known classes; we propose two strategies based on the extreme value theory for few-shot and many-shot scenarios. The proposed method was tested on real-world hyperspectral images; state-of-the-art results were achieved, e.g., improving the overall accuracy by 4.94% for the Salinas data. By considering the existence of unknown classes in the open world, our method achieved more accurate hyperspectral image classification, especially under the few-shot context.
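The reject-by-reconstruction idea in this abstract can be sketched in a few lines. Note the swap: the paper derives its threshold from extreme value theory, while this sketch uses a plain empirical quantile of reconstruction errors on known-class samples as a simpler stand-in. All function names and the `unknown=-1` label are hypothetical.

```python
def reconstruction_error(x, x_hat):
    """Root-mean-square error between a spectrum and its reconstruction."""
    return (sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)) ** 0.5

def quantile_threshold(known_errors, q=0.95):
    """Empirical q-quantile of errors on known classes (stand-in for the EVT threshold)."""
    ordered = sorted(known_errors)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]

def label_with_unknown(predicted_class, error, threshold, unknown=-1):
    """Keep the classifier's label unless the sample is poorly reconstructed."""
    return unknown if error > threshold else predicted_class
```

A sample whose reconstruction error exceeds the threshold is assigned to the unknown class instead of being forced into one of the known land-cover labels.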

12. A Residual Solver and Its Unfolding Neural Network for Total Variation Regularized Models [PDF] Back to contents
  Yuanhao Gong
Abstract: This paper proposes to solve the Total Variation regularized models by finding the residual between the input and the unknown optimal solution. After analyzing a previous method, we developed a new iterative algorithm, named as Residual Solver, which implicitly solves the model in gradient domain. We theoretically prove the uniqueness of the gradient field in our algorithm. We further numerically confirm that the residual solver can reach the same global optimal solutions as the classical method on 500 natural images. Moreover, we unfold our iterative algorithm into a convolution neural network (named as Residual Solver Network). This network is unsupervised and can be considered as an "enhanced version" of our iterative algorithm. Finally, both the proposed algorithm and neural network are successfully applied on several problems to demonstrate their effectiveness and efficiency, including image smoothing, denoising, and biomedical image reconstruction. The proposed network is general and can be applied to solve other total variation regularized models.
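For readers unfamiliar with the objective, the following sketch evaluates a total variation regularized energy on a 1D signal. It assumes the standard ROF-type model, E(u) = 0.5 * sum((u - f)^2) + lam * sum(|u[i+1] - u[i]|), which the abstract does not spell out, and it only evaluates the energy and the residual f - u; it is not the paper's Residual Solver. The sign convention of the residual is also an assumption.

```python
def total_variation(u):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(b - a) for a, b in zip(u, u[1:]))

def tv_energy(u, f, lam):
    """Data fidelity to the input f plus lam times the total variation of u."""
    fidelity = 0.5 * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    return fidelity + lam * total_variation(u)

def residual(u, f):
    """Residual between the input f and a candidate solution u."""
    return [fi - ui for ui, fi in zip(u, f)]

# A noisy step and a clean piecewise-constant candidate.
noisy = [0.0, 0.2, 0.0, 1.0, 0.8, 1.0]
clean = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

With `lam = 1.0`, the clean step has lower energy than the noisy input itself, which is why minimising this energy smooths noise while keeping the edge.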

13. LaSOT: A High-quality Large-scale Single Object Tracking Benchmark [PDF] Back to contents
  Heng Fan, Hexin Bai, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Harshit, Mingzhen Huang, Juehuan Liu, Yong Xu, Chunyuan Liao, Lin Yuan, Haibin Ling
Abstract: Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, is limited due to lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes, and offers 1,550 videos totaling more than 3.87 million frames. Each video frame is carefully and manually annotated with a bounding box. This makes LaSOT, to our knowledge, the largest densely annotated tracking benchmark. Our goal in releasing LaSOT is to provide a dedicated high quality platform for both training and evaluation of trackers. The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real world video footage, such as the targets disappearing and re-appearing. These longer video lengths allow for the assessment of long-term trackers. To take advantage of the close connection between visual appearance and natural language, we provide language specification for each video in LaSOT. We believe such additions will allow for future research to use linguistic features to improve tracking. Two protocols, full-overlap and one-shot, are designated for flexible assessment of trackers. We extensively evaluate 48 baseline trackers on LaSOT with in-depth analysis, and results reveal that there still exists significant room for improvement. The complete benchmark, tracking results as well as analysis are available at this https URL.

14. ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation [PDF] Back to contents
  Sicheng Zhao, Yezhen Wang, Bo Li, Bichen Wu, Yang Gao, Pengfei Xu, Trevor Darrell, Kurt Keutzer
Abstract: Due to its robust and precise distance measurements, LiDAR plays an important role in scene understanding for autonomous driving. Training deep neural networks (DNNs) on LiDAR data requires large-scale point-wise annotations, which are time-consuming and expensive to obtain. Instead, simulation-to-real domain adaptation (SRDA) trains a DNN using unlimited synthetic data with automatically generated labels and transfers the learned model to real scenarios. Existing SRDA methods for LiDAR point cloud segmentation mainly employ a multi-stage pipeline and focus on feature-level alignment. They require prior knowledge of real-world statistics and ignore the pixel-level dropout noise gap and the spatial feature gap between different domains. In this paper, we propose a novel end-to-end framework, named ePointDA, to address the above issues. Specifically, ePointDA consists of three components: self-supervised dropout noise rendering, statistics-invariant and spatially-adaptive feature alignment, and transferable segmentation learning. The joint optimization enables ePointDA to bridge the domain shift at the pixel-level by explicitly rendering dropout noise for synthetic LiDAR and at the feature-level by spatially aligning the features between different domains, without requiring the real-world statistics. Extensive experiments adapting from synthetic GTA-LiDAR to real KITTI and SemanticKITTI demonstrate the superiority of ePointDA for LiDAR point cloud segmentation.

15. Convolution Neural Networks for diagnosing colon and lung cancer histopathological images [PDF] Back to contents
  Sanidhya Mangal, Aanchal Chaurasia, Ayush Khajanchi
Abstract: Lung and colon cancer are among the leading causes of mortality and morbidity in adults. Histopathological diagnosis is one of the key components to discern cancer type. The aim of the present research is to propose a computer aided diagnosis system for diagnosing squamous cell carcinomas and adenocarcinomas of lung as well as adenocarcinomas of colon using convolutional neural networks by evaluating the digital pathology images for these cancers, thereby rendering artificial intelligence a useful technology in the near future. A total of 2500 digital images were acquired from the LC25000 dataset containing 5000 images for each class. A shallow neural network architecture was used to classify the histopathological slides into squamous cell carcinomas, adenocarcinomas and benign for the lung. A similar model was used to classify adenocarcinomas and benign for colon. A diagnostic accuracy of more than 97% and 96% was recorded for lung and colon, respectively.

16. TanhSoft -- a family of activation functions combining Tanh and Softplus [PDF] Back to contents
  Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey
Abstract: Deep learning, at its core, contains functions that are the composition of a linear transformation with a non-linear function known as an activation function. In the past few years, there has been increasing interest in the construction of novel activation functions resulting in better learning. In this work, we propose a family of novel activation functions, namely TanhSoft, with four undetermined hyper-parameters of the form tanh and tune these hyper-parameters to obtain activation functions which are shown to outperform several well known activation functions. For instance, replacing ReLU with x tanh(0.6e^x) improves top-1 classification accuracy on CIFAR-10 by 0.46% for DenseNet-169 and 0.7% for Inception-v3, while with tanh(0.87x)ln(1 + e^x) top-1 classification accuracy on CIFAR-100 improves by 1.24% for DenseNet-169 and 2.57% for the SimpleNet model.
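The two concrete activations quoted in this abstract are easy to write down. The sketch below implements only those two formulas as scalar functions; the function names are made up for illustration, and the overflow guard at `x > 20` is an implementation detail added here, not something from the paper.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def tanhsoft_a(x: float) -> float:
    """x * tanh(0.6 * e^x), the first example in the abstract."""
    # math.exp overflows for very large x; tanh has already saturated at 1 by then.
    if x > 20.0:
        return x
    return x * math.tanh(0.6 * math.exp(x))

def tanhsoft_b(x: float) -> float:
    """tanh(0.87 * x) * ln(1 + e^x), the second example in the abstract."""
    return math.tanh(0.87 * x) * softplus(x)
```

Both functions behave like ReLU for large positive inputs but, unlike ReLU, are smooth and take small negative values for negative inputs.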
摘要:其核心深学习,包含与已知为活化功能的非线性函数的线性变换的组合物的功能。在过去的几年里,有正在施工中产生更好的学习新的激活功能的越来越大的兴趣。在这项工作中,我们提出了一个家族的新激活的功能,即TanhSoft,与以下形式的tanh四个未定超参数和调整这些超参数,以获得被证明优于几种公知激活函数激活功能。例如,对于xtanh替换RELU(0.6E ^ x)的0.46%提高顶部-1分类精度上CIFAR-10 DenseNet-169和用于启-V3 0.7%,而用的tanh(0.87x)LN(1个+ E ^ x)的上CIFAR-100顶1的分类精度提高了1.24%为DenseNet-169和用于SimpleNet模型2.57%。
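
The two activation functions quoted in the abstract can be written as special cases of one four-parameter function. The sketch below assumes the general form tanh(αx + βe^{γx})ln(δ + e^x), which reproduces both quoted examples; the function and parameter names are illustrative and are not taken from the authors' code.

```python
import math


def tanhsoft(x, alpha, beta, gamma, delta):
    """Assumed TanhSoft form: tanh(alpha*x + beta*exp(gamma*x)) * ln(delta + exp(x)).

    Scalar sketch for moderate |x| only: math.exp overflows for very large x,
    and with delta == 0 the logarithm fails once exp(x) underflows to zero.
    """
    return math.tanh(alpha * x + beta * math.exp(gamma * x)) * math.log(delta + math.exp(x))


def x_tanh_variant(x):
    # alpha=0, beta=0.6, gamma=1, delta=0: ln(e^x) = x, giving x * tanh(0.6 * e^x)
    return tanhsoft(x, 0.0, 0.6, 1.0, 0.0)


def softplus_variant(x):
    # alpha=0.87, beta=0, delta=1: tanh(0.87 * x) * ln(1 + e^x); gamma is unused
    return tanhsoft(x, 0.87, 0.0, 1.0, 1.0)
```

Setting β = 0 removes the exponential term inside tanh, and setting δ = 0 collapses the logarithm to x, which is how the same family yields both a Softplus-like and an x·tanh-like activation.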

17. Self-Supervised Scale Recovery for Monocular Depth and Egomotion Estimation [PDF]
  Brandon Wagstaff, Jonathan Kelly
Abstract: The self-supervised loss formulation for jointly training depth and egomotion neural networks with monocular images is well studied and has demonstrated state-of-the-art accuracy. One of the main limitations of this approach, however, is that the depth and egomotion estimates are only determined up to an unknown scale. In this paper, we present a novel scale recovery loss that enforces consistency between a known camera height and the estimated camera height, generating metric (scaled) depth and egomotion predictions. We show that our proposed method is competitive with other scale recovery techniques (i.e., pose supervision and stereo left/right consistency constraints). Further, we demonstrate how our method facilitates network retraining within new environments, whereas other scale-resolving approaches are incapable of doing so. Notably, our egomotion network is able to produce more accurate estimates than a similar method that only recovers scale at test time.
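
The abstract does not give the loss itself, so the following is only a rough sketch of the idea of tying an estimated camera height to a known one. The relative-error penalty and both function names are assumptions made for illustration; the second function shows the simpler test-time rescaling that the abstract contrasts against.

```python
def scale_recovery_penalty(estimated_height, known_height):
    """Hypothetical penalty: relative disagreement between the two heights."""
    return abs(estimated_height - known_height) / known_height


def rescale_depths(depths, estimated_height, known_height):
    """Test-time alternative: multiply unscaled depths by a single scale factor."""
    scale = known_height / estimated_height
    return [d * scale for d in depths]
```

A training-time penalty pushes the network itself toward metric predictions, whereas test-time rescaling leaves the network unchanged and corrects its output afterwards.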

18. Deep Cyclic Generative Adversarial Residual Convolutional Networks for Real Image Super-Resolution [PDF]
  Rao Muhammad Umer, Christian Micheloni
Abstract: Recent deep learning based single image super-resolution (SISR) methods mostly train their models in a clean data domain where the low-resolution (LR) and the high-resolution (HR) images come from noise-free settings (same domain) due to the bicubic down-sampling assumption. However, such a degradation process is not available in real-world settings. We consider a deep cyclic network structure to maintain the domain consistency between the LR and HR data distributions, inspired by the recent success of CycleGAN in image-to-image translation applications. We propose the Super-Resolution Residual Cyclic Generative Adversarial Network (SRResCycGAN), trained within a generative adversarial network (GAN) framework for LR to HR domain translation in an end-to-end manner. We demonstrate our proposed approach in quantitative and qualitative experiments, showing that it generalizes well to real image super-resolution and is easy to deploy on mobile/embedded devices. In addition, our SR results on the AIM 2020 Real Image SR Challenge datasets demonstrate that the proposed SR approach achieves results comparable to other state-of-the-art methods.
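
For orientation, the cycle-consistency term that CycleGAN introduced can be written as below. This is the standard CycleGAN form, not necessarily the exact loss used in SRResCycGAN; the symbols G (LR to HR generator), F (HR to LR generator), x (an LR image), and y (an HR image) are notation assumed here.

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x}\big[\, \lVert F(G(x)) - x \rVert_{1} \,\big]
  + \mathbb{E}_{y}\big[\, \lVert G(F(y)) - y \rVert_{1} \,\big]
```

The first term asks that super-resolving an LR image and then degrading it again returns the original LR image; the second term asks the same of the reverse round trip.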

19. GPU-based Self-Organizing Maps for Post-Labeled Few-Shot Unsupervised Learning [PDF]
  Lyes Khacef, Vincent Gripon, Benoit Miramond
Abstract: Few-shot classification is a challenge in machine learning where the goal is to train a classifier using a very limited number of labeled examples. This scenario is likely to occur frequently in real life, for example when data acquisition or labeling is expensive. In this work, we consider the problem of post-labeled few-shot unsupervised learning, a classification task where representations are learned in an unsupervised fashion, to be later labeled using very few annotated examples. We argue that this problem is very likely to occur on the edge, when the embedded device directly acquires the data, and the expert needed to perform labeling cannot be prompted often. To address this problem, we consider an algorithm consisting of the concatenation of transfer learning with clustering using Self-Organizing Maps (SOMs). We introduce a TensorFlow-based implementation to speed up the process on multi-core CPUs and GPUs. Finally, we demonstrate the effectiveness of the method using standard off-the-shelf few-shot classification benchmarks.
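
The post-labeling idea (cluster without labels first, then attach labels using a handful of annotated examples) can be sketched with a tiny one-dimensional SOM. This is a plain-Python illustration, not the TensorFlow implementation the abstract refers to; the function names, the deterministic initialization, and the linear decay schedules are all assumptions.

```python
import math


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def bmu(weights, x):
    """Index of the best-matching unit (the neuron closest to sample x)."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_som(data, n_neurons, epochs=20, lr0=0.5, sigma0=1.0):
    """Train a 1-D SOM on unlabeled data (neurons indexed 0..n_neurons-1)."""
    # Deterministic initialization from evenly spaced samples (an assumption).
    weights = [list(data[(i * len(data)) // n_neurons]) for i in range(n_neurons)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # linear decay, stays above zero
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 1e-3)
        for x in data:
            b = bmu(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood around the winning neuron
                h = math.exp(-((i - b) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


def post_label(weights, labeled_examples):
    """Give each neuron the label of its nearest annotated example."""
    return [min(labeled_examples, key=lambda item: sq_dist(w, item[0]))[1]
            for w in weights]


def classify(weights, neuron_labels, x):
    """Predict the label attached to the sample's best-matching unit."""
    return neuron_labels[bmu(weights, x)]
```

Labels enter only in `post_label`, so the annotation budget is limited to the few examples passed there, while `train_som` can consume as much unlabeled data as is available.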

20. Learning more expressive joint distributions in multimodal variational methods [PDF]
  Sasho Nedelkoski, Mihail Bogojeski, Odej Kao
Abstract: Data are often formed of multiple modalities, which jointly describe the observed phenomena. Modeling the joint distribution of multimodal data requires larger expressive power to capture high-level concepts and provide better data representations. However, multimodal generative models based on variational inference are limited due to the lack of flexibility of the approximate posterior, which is obtained by searching within a known parametric family of distributions. We introduce a method that improves the representational capacity of multimodal variational methods using normalizing flows. It approximates the joint posterior with a simple parametric distribution and subsequently transforms it into a more complex one. Through several experiments, we demonstrate that the model improves on state-of-the-art multimodal methods based on variational inference on various computer vision tasks such as colorization, edge and mask detection, and weakly supervised learning. We also show that learning more powerful approximate joint distributions improves the quality of the generated samples. The code of our model is publicly available at this https URL.
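
The step of transforming a simple posterior into a more complex one rests on the standard change-of-variables identity for normalizing flows. The notation below (base density q_0, invertible maps f_k, K flow steps, observed data x) is assumed here for illustration and is not taken from the paper.

```latex
z_K = f_K \circ \cdots \circ f_1(z_0), \qquad z_0 \sim q_0(z_0 \mid x)

\log q_K(z_K \mid x) = \log q_0(z_0 \mid x)
  - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial z_{k-1}} \right|
```

Each invertible map reshapes the density, and the log-determinant terms keep the result a properly normalized distribution.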

21. Imbalanced Continual Learning with Partitioning Reservoir Sampling [PDF]
  Chris Dongjoo Kim, Jinseo Jeong, Gunhee Kim
Abstract: Continual learning from a sequential stream of data is a crucial challenge for machine learning research. Most studies on this topic have been conducted under the single-label classification setting along with an assumption of balanced label distribution. This work expands this research horizon towards multi-label classification. In doing so, we identify an unanticipated adversity innately existent in many multi-label datasets, the long-tailed distribution. We jointly address the two independently solved problems, Catastrophic Forgetting and the long-tailed label distribution, by first empirically showing a new challenge of destructive forgetting of the minority concepts on the tail. Then, we curate two benchmark datasets, COCOseq and NUS-WIDEseq, that allow the study of both intra- and inter-task imbalances. Lastly, we propose a new sampling strategy for replay-based approaches named Partitioning Reservoir Sampling (PRS), which allows the model to maintain a balanced knowledge of both head and tail classes. We publicly release the dataset and the code on our project page.
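
PRS builds on reservoir sampling, so the classical baseline is worth seeing first. The sketch below is the textbook algorithm (often called Algorithm R), which keeps a uniform random sample of fixed size from a stream of unknown length; it is not the partitioned variant proposed in the paper, and the function name is illustrative.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of up to k items from an iterable stream."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the buffer with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Because every item is retained with equal probability, a long-tailed stream yields a long-tailed replay buffer, which is the imbalance the abstract sets out to address.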

22. Horus: Using Sensor Fusion to Combine Infrastructure and On-board Sensing to Improve Autonomous Vehicle Safety [PDF]
  Sanjay Seshan
Abstract: Studies predict that demand for autonomous vehicles will increase tenfold between 2019 and 2026. However, recent high-profile accidents have significantly impacted consumer confidence in this technology. The cause of many of these accidents can be traced back to the inability of these vehicles to correctly sense the impending danger. In response, manufacturers have been improving the already extensive on-vehicle sensor packages to ensure that the system always has access to the data necessary for safe navigation. However, these sensor packages only provide a view from the vehicle's perspective and, as a result, autonomous vehicles still require frequent human intervention to ensure safety. To address this issue, I developed a system, called Horus, that combines on-vehicle and infrastructure-based sensors to provide a more complete view of the environment, including areas not visible from the vehicle. I built a small-scale experimental testbed as a proof of concept. My measurements of the impact of sensor failures showed that even short outages (1 second) at slow speeds (25 km/hr scaled velocity) prevent vehicles that rely on on-vehicle sensors from navigating properly. My experiments also showed that Horus dramatically improves driving safety and that the sensor fusion algorithm selected plays a significant role in the quality of the navigation. With just a pair of infrastructure sensors, Horus could tolerate sensors that fail 40% of the time and still navigate safely. These results are a promising first step towards safer autonomous vehicles.

23. Adversarial attacks on deep learning models for fatty liver disease classification by modification of ultrasound image reconstruction method [PDF]
  Michal Byra, Grzegorz Styczynski, Cezary Szmigielski, Piotr Kalinowski, Lukasz Michalowski, Rafal Paluszkiewicz, Bogna Ziarkiewicz-Wroblewska, Krzysztof Zieniewicz, Andrzej Nowicki
Abstract: Convolutional neural networks (CNNs) have achieved remarkable success in medical image analysis tasks. In ultrasound (US) imaging, CNNs have been applied to object classification, image reconstruction and tissue characterization. However, CNNs can be vulnerable to adversarial attacks; even small perturbations applied to input data may significantly affect model performance and result in wrong output. In this work, we devise a novel adversarial attack specific to ultrasound (US) imaging. US images are reconstructed based on radio-frequency signals. Since the appearance of US images depends on the applied image reconstruction method, we explore the possibility of fooling a deep learning model by perturbing the US B-mode image reconstruction method. We apply zeroth-order optimization to find small perturbations of image reconstruction parameters, related to attenuation compensation and amplitude compression, which can result in wrong output. We illustrate our approach using a deep learning model developed for fatty liver disease diagnosis, where the proposed adversarial attack achieved a success rate of 48%.
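
Zeroth-order optimization uses only function evaluations, which suits a setting where the objective passes through a non-differentiable reconstruction pipeline. The sketch below is a generic two-point random-direction scheme on a toy objective, not the attack from the paper; the function name, step sizes, and iteration count are assumptions.

```python
import math
import random


def zeroth_order_minimize(f, x0, steps=200, mu=1e-2, lr=0.1, seed=0):
    """Minimize f using only evaluations of f (no analytic gradients)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Draw a random unit direction.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in u)) or 1.0
        u = [c / norm for c in u]
        # Central difference estimates the slope of f along u.
        f_plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        slope = (f_plus - f_minus) / (2.0 * mu)
        # Step against the estimated slope along the same direction.
        x = [xi - lr * slope * ui for xi, ui in zip(x, u)]
    return x
```

For example, `zeroth_order_minimize(lambda v: sum((c - 1.0) ** 2 for c in v), [0.0, 0.0])` moves toward the minimizer at (1, 1). In an attack setting the objective would instead score how strongly the perturbed reconstruction parameters change the model's output.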

24. Going deeper with brain morphometry using neural networks [PDF]
  Rodrigo Santa Cruz, Léo Lebrat, Pierrick Bourgeat, Vincent Doré, Jason Dowling, Jurgen Fripp, Clinton Fookes, Olivier Salvado
Abstract: Brain morphometry from magnetic resonance imaging (MRI) is a consolidated biomarker for many neurodegenerative diseases. Recent advances in this domain indicate that deep convolutional neural networks can infer morphometric measurements within a few seconds. Nevertheless, the accuracy of the devised model for insightful biomarkers (mean curvature and thickness) remains unsatisfactory. In this paper, we propose a more accurate and efficient neural network model for brain morphometry named HerstonNet. More specifically, we develop a 3D ResNet-based neural network to learn rich features directly from MRI, design a multi-scale regression scheme by predicting morphometric measures at feature maps of different resolutions, and leverage a robust optimization method to avoid poor quality minima and reduce the prediction variance. As a result, HerstonNet improves on the existing approach by 24.30% in terms of intraclass correlation coefficient (agreement measure) to FreeSurfer silver-standards while maintaining a competitive run-time.

25. Sensors, Safety Models and A System-Level Approach to Safe and Scalable Automated Vehicles [PDF]
  Jack Weast
Abstract: When considering the accuracy of sensors in an automated vehicle (AV), it is not sufficient to evaluate the performance of any given sensor in isolation. Rather, the performance of any individual sensor must be considered in the context of the overall system design. Techniques like redundancy and different sensing modalities can reduce the chances of a sensing failure. Additionally, the use of safety models is essential to understanding whether any particular sensing failure is relevant. Only when the entire system design is taken into account can one properly understand the meaning of safety-relevant sensing failures in an AV. In this paper, we will consider what should actually constitute a sensing failure, how safety models play an important role in mitigating potential failures, how a system-level approach to safety will deliver a safe and scalable AV, and what an acceptable sensing failure rate should be, considering the full picture of an AV's architecture.
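
The point about redundancy can be made concrete with a small calculation. The sketch below assumes that sensor failures are statistically independent, which is an idealization introduced here and not a claim from the paper; the function name is illustrative.

```python
def prob_all_fail(p_fail, n_sensors):
    """Probability that n independent sensors, each failing with p_fail, all fail at once."""
    return p_fail ** n_sensors
```

Under that assumption, three redundant sensors that each miss an object 10% of the time all miss it together with probability 0.1 ** 3, about one in a thousand. Correlated failures (for example, all cameras blinded by the same glare) would make the real figure worse, which is one reason the abstract also stresses different sensing modalities.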

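Note: The Chinese abstracts in the original post were machine-translated. The cover image is a word cloud of the paper titles.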