
[arXiv papers] Computer Vision and Pattern Recognition 2020-08-17

Contents

1. Feedback Attention for Cell Image Segmentation [PDF] Abstract
2. Abstracting Deep Neural Networks into Concept Graphs for Concept Level Interpretability [PDF] Abstract
3. Self-adapting confidence estimation for stereo [PDF] Abstract
4. RODEO: Replay for Online Object Detection [PDF] Abstract
5. Renormalization for Initialization of Rolling Shutter Visual-Inertial Odometry [PDF] Abstract
6. Deep Domain Adaptation for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labelled Videos [PDF] Abstract
7. PointMixup: Augmentation for Point Clouds [PDF] Abstract
8. An Overview of Deep Learning Architectures in Few-Shots Learning Domain [PDF] Abstract
9. Survey of XAI in digital pathology [PDF] Abstract
10. Not 3D Re-ID: a Simple Single Stream 2D Convolution for Robust Video Re-identification [PDF] Abstract
11. GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes [PDF] Abstract
12. Rb-PaStaNet: A Few-Shot Human-Object Interaction Detection Based on Rules and Part States [PDF] Abstract
13. Optimized Deep Encoder-Decoder Methods for Crack Segmentation [PDF] Abstract
14. A Learning-based Method for Online Adjustment of C-arm Cone-Beam CT Source Trajectories for Artifact Avoidance [PDF] Abstract
15. ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [PDF] Abstract
16. Deep Atrous Guided Filter for Image Restoration in Under Display Cameras [PDF] Abstract
17. BriNet: Towards Bridging the Intra-class and Inter-class Gaps in One-Shot Segmentation [PDF] Abstract
18. Parameters Sharing Exploration and Hetero-Center based Triplet Loss for Visible-Thermal Person Re-Identification [PDF] Abstract
19. Structure-Aware Network for Lane Marker Extraction with Dynamic Vision Sensor [PDF] Abstract
20. An Improved Deep Convolutional Neural Network-Based Autonomous Road Inspection Scheme Using Unmanned Aerial Vehicles [PDF] Abstract
21. Apparel-invariant Feature Learning for Apparel-changed Person Re-identification [PDF] Abstract
22. A Multimodal Late Fusion Model for E-Commerce Product Classification [PDF] Abstract
23. 3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View [PDF] Abstract
24. Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction [PDF] Abstract
25. Geometric Deep Learning for Post-Menstrual Age Prediction based on the Neonatal White Matter Cortical Surface [PDF] Abstract
26. Novelty Detection Through Model-Based Characterization of Neural Networks [PDF] Abstract
27. Semantically Adversarial Learnable Filters [PDF] Abstract
28. Self-Sampling for Neural Point Cloud Consolidation [PDF] Abstract
29. SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud [PDF] Abstract
30. Machine learning for COVID-19 detection and prognostication using chest radiographs and CT scans: a systematic methodological review [PDF] Abstract
31. Integrating uncertainty in deep neural networks for MRI based stroke analysis [PDF] Abstract
32. Automated detection and quantification of COVID-19 airspace disease on chest radiographs: A novel approach achieving radiologist-level performance using a CNN trained on digital reconstructed radiographs (DRRs) from CT-based ground-truth [PDF] Abstract
33. Homotopic Gradients of Generative Density Priors for MR Image Reconstruction [PDF] Abstract
34. Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images [PDF] Abstract
35. WAN: Watermarking Attack Network [PDF] Abstract
36. Unsupervised Image Restoration Using Partially Linear Denoisers [PDF] Abstract
37. Interpretation of Brain Morphology in Association to Alzheimer's Disease Dementia Classification Using Graph Convolutional Networks on Triangulated Meshes [PDF] Abstract
38. Landmark detection in Cardiac Magnetic Resonance Imaging Using A Convolutional Neural Network [PDF] Abstract
39. Can weight sharing outperform random architecture search? An investigation with TuNAS [PDF] Abstract
40. MIXCAPS: A Capsule Network-based Mixture of Experts for Lung Nodule Malignancy Prediction [PDF] Abstract

Abstracts

1. Feedback Attention for Cell Image Segmentation [PDF] Back to Contents
  Hiroki Tsuda, Eisuke Shibuya, Kazuhiro Hotta
Abstract: In this paper, we address the cell image segmentation task with a Feedback Attention mechanism inspired by feedback processing. Unlike conventional feedforward neural network models, we focus on the feedback processing in the human brain and assume that the network learns like a human by connecting feature maps from deep layers back to shallow layers. We propose several Feedback Attention modules that imitate the human brain by feeding the feature maps of the output layer back to a layer close to the input. A U-Net with Feedback Attention shows better results than conventional methods that use only feedforward processing.

2. Abstracting Deep Neural Networks into Concept Graphs for Concept Level Interpretability [PDF] Back to Contents
  Avinash Kori, Parth Natekar, Ganapathy Krishnamurthi, Balaji Srinivasan
Abstract: The black-box nature of deep learning models prevents them from being completely trusted in domains like biomedicine. Most explainability techniques do not capture the concept-based reasoning that human beings follow. In this work, we attempt to understand the behavior of trained models that perform image processing tasks in the medical domain by building a graphical representation of the concepts they learn. Extracting such a graphical representation of the model's behavior on an abstract, higher conceptual level would unravel the learnings of these models and would help us to evaluate the steps taken by the model for predictions. We show the application of our proposed implementation on two biomedical problems - brain tumor segmentation and fundus image classification. We provide an alternative graphical representation of the model by formulating a \textit{concept level graph} as discussed above, which makes the problem of intervention to find active inference trails more tractable. Understanding these trails would provide an understanding of the hierarchy of the model's decision-making process, as well as of its overall nature. Our framework is available at \url{this https URL}

3. Self-adapting confidence estimation for stereo [PDF] Back to Contents
  Matteo Poggi, Filippo Aleotti, Fabio Tosi, Giulio Zaccaroni, Stefano Mattoccia
Abstract: Estimating the confidence of disparity maps inferred by a stereo algorithm has become a very relevant task in recent years, due to the increasing number of applications leveraging such a cue. Although self-supervised learning has recently spread across many computer vision tasks, it has been barely considered in the field of confidence estimation. In this paper, we propose a flexible and lightweight solution enabling self-adapting confidence estimation agnostic to the stereo algorithm or network. Our approach relies on the minimum information available in any stereo setup (i.e., the input stereo pair and the output disparity map) to learn an effective confidence measure. This strategy allows not only a seamless integration with any stereo system, including consumer and industrial devices equipped with undisclosed stereo perception methods, but also, due to its self-adapting capability, out-of-the-box deployment in the field. Exhaustive experimental results on different standard datasets support our claims, showing how our solution is the first to enable online learning of accurate confidence estimation for any stereo system, without any requirement on the end-user.

4. RODEO: Replay for Online Object Detection [PDF] Back to Contents
  Manoj Acharya, Tyler L. Hayes, Christopher Kanan
Abstract: Humans can incrementally learn to do new visual detection tasks, which is a huge challenge for today's computer vision systems. Incrementally trained deep learning models lack backwards transfer to previously seen classes and suffer from a phenomenon known as "catastrophic forgetting." In this paper, we pioneer online streaming learning for object detection, where an agent must learn examples one at a time with severe memory and computational constraints. In object detection, a system must output all bounding boxes for an image with the correct label. Unlike earlier work, the system described in this paper can learn this task in an online manner with new classes being introduced over time. We achieve this capability by using a novel memory replay mechanism that efficiently replays entire scenes. We achieve state-of-the-art results on both the PASCAL VOC 2007 and MS COCO datasets.
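
The abstract does not detail the replay mechanism beyond its efficient replay of entire scenes, so the following is only a generic sketch of the online streaming pattern it relies on: a fixed-capacity reservoir-sampling buffer (class and function names are hypothetical), with each new example mixed with a few replayed ones per update.

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling, so every example
    seen so far has an equal chance of being retained (illustrative only)."""
    def __init__(self, capacity):
        self.capacity, self.items, self.seen = capacity, [], 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = random.randrange(self.seen)   # reservoir sampling step
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

# Online loop: learn one example at a time, replaying a few stored scenes.
# buffer.add(scene); batch = [scene] + buffer.sample(3); update_model(batch)
```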

5. Renormalization for Initialization of Rolling Shutter Visual-Inertial Odometry [PDF] Back to Contents
  Branislav Micusik, Georgios Evangelidis
Abstract: In this paper we deal with the initialization problem of a visual-inertial odometry system with rolling shutter cameras. The initialization is a prerequisite to utilizing inertial signals and fusing them with the visual data. We propose a novel way to solve this problem on visual and inertial data simultaneously, in a statistical sense, by casting it into the renormalization scheme of Kanatani. The renormalization is an optimization scheme which intends to reduce the inherent statistical bias of common linear systems. We derive and present the necessary steps and methodology specific to the initialization problem. Extensive evaluations on perfect ground truth exhibit superior performance and up to 20% accuracy gain over the originally proposed Least Squares solution. The renormalization performs similarly to the optimal Maximum Likelihood estimate, despite arriving at the solution by different means. By this, we extend the set of common Computer Vision problems which can be cast into the renormalization scheme.

6. Deep Domain Adaptation for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labelled Videos [PDF] Back to Contents
  Gnana Praveen R, Eric Granger, Patrick Cardinal
Abstract: Predicting the level of facial expression intensities based on videos allows capturing a representation of affect, which has many potential applications such as pain localisation, depression detection, etc. However, state-of-the-art deep learning (DL) models to predict these levels are typically formulated as regression problems, and do not leverage the data distribution, nor the ordinal relationship between levels. This translates to a limited robustness to noisy and uncertain labels. Moreover, annotating expression intensity levels for video frames is a costly undertaking, involving considerable labor by domain experts, and the labels are vulnerable to subjective bias due to ambiguity among adjacent intensity levels. This paper introduces a DL model for weakly-supervised domain adaptation with ordinal regression (WSDA-OR), where videos in the target domain have coarse labels representing ordinal intensity levels that are provided on a periodic basis. In particular, the proposed model learns discriminant and domain-invariant representations by integrating multiple instance learning with deep adversarial domain adaptation, where an Inflated 3D CNN (I3D) is trained using fully supervised source domain videos and weakly supervised target domain videos. The trained model is finally used to estimate the ordinal intensity levels of individual frames in the target operational domain. The proposed approach has been validated for pain intensity estimation using the RECOLA dataset as the labeled source domain and the UNBC-McMaster dataset as the weakly-labeled target domain. Experimental results show significant improvement over state-of-the-art models and achieve a higher level of localization accuracy.

7. PointMixup: Augmentation for Point Clouds [PDF] Back to Contents
  Yunlu Chen, Vincent Tao Hu, Efstratios Gavves, Thomas Mensink, Pascal Mettes, Pengwan Yang, Cees G.M. Snoek
Abstract: This paper introduces data augmentation for point clouds by interpolation between examples. Data augmentation by interpolation has shown to be a simple and effective approach in the image domain. Such a mixup is however not directly transferable to point clouds, as we do not have a one-to-one correspondence between the points of two different objects. In this paper, we define data augmentation between point clouds as a shortest path linear interpolation. To that end, we introduce PointMixup, an interpolation method that generates new examples through an optimal assignment of the path function between two point clouds. We prove that our PointMixup finds the shortest path between two point clouds and that the interpolation is assignment invariant and linear. With the definition of interpolation, PointMixup allows to introduce strong interpolation-based regularizers such as mixup and manifold mixup to the point cloud domain. Experimentally, we show the potential of PointMixup for point cloud classification, especially when examples are scarce, as well as increased robustness to noise and geometric transformations to points. The code for PointMixup and the experimental details are publicly available.
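
As a concrete illustration of the idea, here is a minimal sketch of assignment-based point cloud interpolation: points are matched by solving a linear assignment problem over pairwise squared distances (an assumption; the paper defines the interpolation through an optimal assignment of the path function), then interpolated linearly along the matched pairs. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def point_mixup(p1, p2, lam):
    """Interpolate two point clouds (N x 3 arrays): match points one-to-one
    by an optimal assignment, then interpolate linearly between pairs."""
    cost = ((p1[:, None, :] - p2[None, :, :]) ** 2).sum(-1)  # pairwise costs
    rows, cols = linear_sum_assignment(cost)                 # optimal matching
    return (1.0 - lam) * p1[rows] + lam * p2[cols]

# As in image mixup, the ratio is typically drawn from a Beta distribution:
# lam = np.random.beta(1.0, 1.0); mixed = point_mixup(cloud_a, cloud_b, lam)
```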

8. An Overview of Deep Learning Architectures in Few-Shots Learning Domain [PDF] Back to Contents
  Shruti Jadon
Abstract: Since 2012, deep learning has revolutionized Artificial Intelligence and has achieved state-of-the-art outcomes in different domains, ranging from Image Classification to Speech Generation. Though it has much potential, our current architectures come with the pre-requisite of large amounts of data. Few-Shot Learning is a subfield of machine learning that aims to create models that can learn the desired objective with less data, similar to how humans learn. In this paper, we have reviewed some of the well-known deep learning-based approaches towards few-shot learning. We have discussed the recent achievements, challenges, and possibilities of improvement of few-shot learning based deep learning architectures. Our aim for this paper is threefold: (i) Give a brief introduction to deep learning architectures for few-shot learning with pointers to core references. (ii) Indicate how deep learning has been applied to the low-data regime, from data preparation to model training. And, (iii) Provide a starting point for people interested in experimenting and perhaps contributing to the field of few-shot learning by pointing out some useful resources and open-source code. Our code is available at Github: \url{this https URL}

9. Survey of XAI in digital pathology [PDF] Back to Contents
  Milda Pocevičiūtė, Gabriel Eilertsen, Claes Lundström
Abstract: Artificial intelligence (AI) has shown great promise for diagnostic imaging assessments. However, the application of AI to support medical diagnostics in clinical routine comes with many challenges. The algorithms should have high prediction accuracy but also be transparent, understandable and reliable. Thus, explainable artificial intelligence (XAI) is highly relevant for this domain. We present a survey on XAI within digital pathology, a medical imaging sub-discipline with particular characteristics and needs. The review includes several contributions. Firstly, we give a thorough overview of current XAI techniques of potential relevance for deep learning methods in pathology imaging, and categorise them from three different aspects. In doing so, we incorporate uncertainty estimation methods as an integral part of the XAI landscape. We also connect the technical methods to the specific prerequisites in digital pathology and present findings to guide future research efforts. The survey is intended for both technical researchers and medical professionals, one of the objectives being to establish a common ground for cross-disciplinary discussions.

10. Not 3D Re-ID: a Simple Single Stream 2D Convolution for Robust Video Re-identification [PDF] Back to Contents
  Toby P. Breckon, Aishah Alsehaim
Abstract: Video-based person re-identification has received increasing attention recently, as it plays an important role within surveillance video analysis. Video-based Re-ID expands earlier image-based re-identification methods by learning features from a video via multiple image frames for each person. Most contemporary video Re-ID methods utilise complex CNN-based network architectures using 3D convolution or multi-branch networks to extract spatial-temporal video features. By contrast, in this paper, we illustrate superior performance from a simple single stream 2D convolution network leveraging the ResNet50-IBN architecture to extract frame-level features, followed by temporal attention for clip-level features. These clip-level features can be generalised to extract video-level features by averaging, without any significant additional cost. Our approach uses best video Re-ID practice and transfer learning between datasets to outperform existing state-of-the-art approaches on the MARS, PRID2011 and iLIDS-VID datasets with 89.62%, 97.75% and 97.33% rank-1 accuracy respectively, and with 84.61% mAP for MARS, without reliance on complex and memory-intensive 3D convolutions or multi-stream network architectures as found in other contemporary work. Conversely, our work shows that global features extracted by the 2D convolution network are a sufficient representation for robust state-of-the-art video Re-ID.

11. GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes [PDF] Back to Contents
  Weidong Zhang, Wei Zhang, Yinda Zhang
Abstract: The task of room layout estimation is to locate the wall-floor, wall-ceiling, and wall-wall boundaries. Most recent methods solve this problem based on edge/keypoint detection or semantic segmentation. However, these approaches have paid limited attention to the geometry of the dominant planes and the intersections between them, which have a significant impact on room layout. In this work, we propose to incorporate geometric reasoning into deep learning for layout estimation. Our approach learns to infer the depth maps of the dominant planes in the scene by predicting the pixel-level surface parameters, and the layout can be generated by the intersection of the depth maps. Moreover, we present a new dataset with pixel-level depth annotation of dominant planes. It is larger than the existing datasets and contains both cuboid and non-cuboid rooms. Experimental results show that our approach produces considerable performance gains on both 2D and 3D datasets.

12. Rb-PaStaNet: A Few-Shot Human-Object Interaction Detection Based on Rules and Part States [PDF] Back to Contents
  Shenyu Zhang, Zichen Zhu, Qingquan Bao
Abstract: Existing Human-Object Interaction (HOI) detection approaches have achieved great progress on non-rare classes, while rare HOI classes are still not well detected. In this paper, we intend to apply human prior knowledge to the existing work. So we add human-labeled rules to PaStaNet and propose Rb-PaStaNet, aimed at improving rare HOI class detection. Our results show a certain improvement on the rare classes, while the improvement on the non-rare classes and overall is more considerable.

13. Optimized Deep Encoder-Decoder Methods for Crack Segmentation [PDF] Back to Contents
  Jacob König, Mark Jenkins, Mike Mannion, Peter Barrie, Gordon Morison
Abstract: Continuous maintenance of concrete infrastructure is an important task which is needed to continue safe operations of these structures. One kind of defect that occurs on surfaces in these structures is cracks. Automatic detection of those cracks poses a challenging computer vision task, as the background, shape, colour and size of cracks vary. In this work we propose optimized deep encoder-decoder methods consisting of a combination of techniques which yield an increase in crack segmentation performance. Specifically, we propose a new design for the decoder part in encoder-decoder based deep learning architectures for semantic segmentation. We study its composition and how to achieve increased performance by exploring components such as deep supervision and upsampling strategies. Then we examine the optimal encoder to go in conjunction with this decoder and determine that pretrained encoders lead to an increase in performance. We propose a data augmentation strategy to increase the amount of available training data and carry out the performance evaluation of the designed architecture on four publicly available crack segmentation datasets. Additionally, we introduce two techniques into the field of surface crack segmentation, previously not used there: generating results using test-time augmentation and performing a statistical result analysis over multiple training runs. The former approach generally yields increased performance results, whereas the latter allows for more reproducible and better representability of a method's results. Using those aforementioned strategies with our proposed encoder-decoder architecture we are able to achieve new state-of-the-art results in all datasets.

14. A Learning-based Method for Online Adjustment of C-arm Cone-Beam CT Source Trajectories for Artifact Avoidance [PDF] Back to Contents
  Mareike Thies, Jan-Nico Zäch, Cong Gao, Russell Taylor, Nassir Navab, Andreas Maier, Mathias Unberath
Abstract: During spinal fusion surgery, screws are placed close to critical nerves suggesting the need for highly accurate screw placement. Verifying screw placement on high-quality tomographic imaging is essential. C-arm Cone-beam CT (CBCT) provides intraoperative 3D tomographic imaging which would allow for immediate verification and, if needed, revision. However, the reconstruction quality attainable with commercial CBCT devices is insufficient, predominantly due to severe metal artifacts in the presence of pedicle screws. These artifacts arise from a mismatch between the true physics of image formation and an idealized model thereof assumed during reconstruction. Prospectively acquiring views onto anatomy that are least affected by this mismatch can, therefore, improve reconstruction quality. We propose to adjust the C-arm CBCT source trajectory during the scan to optimize reconstruction quality with respect to a certain task, i.e. verification of screw placement. Adjustments are performed on-the-fly using a convolutional neural network that regresses a quality index for possible next views given the current x-ray image. Adjusting the CBCT trajectory to acquire the recommended views results in non-circular source orbits that avoid poor images, and thus, data inconsistencies. We demonstrate that convolutional neural networks trained on realistically simulated data are capable of predicting quality metrics that enable scene-specific adjustments of the CBCT source trajectory. Using both realistically simulated data and real CBCT acquisitions of a semi-anthropomorphic phantom, we show that tomographic reconstructions of the resulting scene-specific CBCT acquisitions exhibit improved image quality particularly in terms of metal artifacts. Since the optimization objective is implicitly encoded in a neural network, the proposed approach overcomes the need for 3D information at run-time.

15. ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [PDF] Back to Contents
  Ye Liu, Junsong Yuan, Chang Wen Chen
Abstract: We consider the problem of Human-Object Interaction (HOI) detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images. Most existing works treat HOIs as individual interaction categories, thus they cannot handle the problems of long-tail distribution and polysemy of action labels. We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs. Leveraging the compositional and relational peculiarities of HOI labels, we propose ConsNet, a knowledge-aware framework that explicitly encodes the relations among objects, actions and interactions into an undirected graph called the consistency graph, and exploits Graph Attention Networks (GATs) to propagate knowledge among HOI categories as well as their constituents. Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into the visual-semantic joint embedding space and obtains detection results by measuring their similarities. We extensively evaluate our model on the challenging V-COCO and HICO-DET datasets, and the results validate that our approach outperforms the state of the art under both fully-supervised and zero-shot settings.

16. Deep Atrous Guided Filter for Image Restoration in Under Display Cameras [PDF] Back to Contents
  Varun Sundar, Sumanth Hegde, Divya Kothandaraman, Kaushik Mitra
Abstract: Under Display Cameras present a promising opportunity for phone manufacturers to achieve bezel-free displays by positioning the camera behind semi-transparent OLED screens. Unfortunately, such imaging systems suffer from severe image degradation due to light attenuation and diffraction effects. In this work, we present Deep Atrous Guided Filter (DAGF), a two-stage, end-to-end approach for image restoration in UDC systems. A Low-Resolution Network first restores image quality at low-resolution, which is subsequently used by the Guided Filter Network as a filtering input to produce a high-resolution output. Besides the initial downsampling, our low-resolution network uses multiple, parallel atrous convolutions to preserve spatial resolution and emulates multi-scale processing. Our approach's ability to directly train on megapixel images results in significant performance improvement. We additionally propose a simple simulation scheme to pre-train our model and boost performance. Our overall framework ranks 2nd and 5th in the RLQ-TOD'20 UDC Challenge for POLED and TOLED displays, respectively.

17. BriNet: Towards Bridging the Intra-class and Inter-class Gaps in One-Shot Segmentation [PDF] Back to Contents
  Xianghui Yang, Bairun Wang, Kaige Chen, Xinchi Zhou, Shuai Yi, Wanli Ouyang, Luping Zhou
Abstract: Few-shot segmentation focuses on the generalization of models to segment unseen object instances with limited training samples. Although tremendous improvements have been achieved, existing methods are still constrained by two factors. (1) The information interaction between query and support images is not adequate, leaving intra-class gap. (2) The object categories at the training and inference stages have no overlap, leaving the inter-class gap. Thus, we propose a framework, BriNet, to bridge these gaps. First, more information interactions are encouraged between the extracted features of the query and support images, i.e., using an Information Exchange Module to emphasize the common objects. Furthermore, to precisely localize the query objects, we design a multi-path fine-grained strategy which is able to make better use of the support feature representations. Second, a new online refinement strategy is proposed to help the trained model adapt to unseen classes, achieved by switching the roles of the query and the support images at the inference stage. The effectiveness of our framework is demonstrated by experimental results, which outperforms other competitive methods and leads to a new state-of-the-art on both PASCAL VOC and MSCOCO dataset.

18. Parameters Sharing Exploration and Hetero-Center based Triplet Loss for Visible-Thermal Person Re-Identification [PDF] Back to Contents
  Haijun Liu, Xiaoheng Tan
Abstract: This paper focuses on the visible-thermal cross-modality person re-identification (VT Re-ID) task, whose goal is to match person images between the daytime visible modality and the nighttime thermal modality. The two-stream network is usually adopted to address the cross-modality discrepancy, the most challenging problem for VT Re-ID, by learning multi-modality person features. In this paper, we explore how many parameters of a two-stream network should be shared, which is still not well investigated in the existing literature. By well splitting the ResNet50 model to construct the modality-specific feature-extracting network and the modality-sharing feature-embedding network, we experimentally demonstrate the effect of parameter sharing in the two-stream network for VT Re-ID. Moreover, in the framework of part-level person feature learning, we propose the hetero-center based triplet loss to relax the strict constraint of the traditional triplet loss, by replacing the comparison of an anchor to all the other samples with that of an anchor center to all the other centers. With these extremely simple means, the proposed method can significantly improve the VT Re-ID performance. The experimental results on two datasets show that our proposed method distinctly outperforms the state-of-the-art methods by large margins, especially on the RegDB dataset, achieving superior performance of rank-1/mAP/mINP 91.05%/83.28%/68.84%. It can be a new baseline for VT Re-ID, with a simple but effective strategy.
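
A minimal sketch of a center-based triplet term along the lines described above: per-identity, per-modality feature centers are computed, and centers rather than individual samples are compared. The exact loss in the paper may differ, and all names here are hypothetical.

```python
import numpy as np

def hetero_center_triplet(feats, labels, modality, margin=0.3):
    """Triplet loss over per-identity, per-modality feature centers.
    Assumes at least two identities and both modalities in the batch."""
    centers, ids = [], []
    for c in np.unique(labels):
        for m in (0, 1):                       # 0 = visible, 1 = thermal
            sel = (labels == c) & (modality == m)
            if sel.any():
                centers.append(feats[sel].mean(axis=0))
                ids.append(c)
    centers, ids = np.stack(centers), np.array(ids)
    loss = 0.0
    for i in range(len(centers)):
        d = np.linalg.norm(centers - centers[i], axis=1)
        pos = d[ids == ids[i]].max()           # same identity, other modality
        neg = d[ids != ids[i]].min()           # nearest other-identity center
        loss += max(0.0, margin + pos - neg)
    return loss / len(centers)
```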

19. Structure-Aware Network for Lane Marker Extraction with Dynamic Vision Sensor [PDF] Back to Contents
  Wensheng Cheng, Hao Luo, Wen Yang, Lei Yu, Wei Li
Abstract: Lane marker extraction is a basic yet necessary task for autonomous driving. Although past years have witnessed major advances in lane marker extraction with deep learning models, they all aim at ordinary RGB images generated by frame-based cameras, which limits their performance in extreme cases such as huge illumination changes. To tackle this problem, we introduce the Dynamic Vision Sensor (DVS), a type of event-based sensor, to the lane marker extraction task and build a high-resolution DVS dataset for lane marker extraction. We collect the raw event data and generate 5,424 DVS images with a resolution of 1280×800 pixels, the highest among all DVS datasets available now. All images are annotated with a multi-class semantic segmentation format. We then propose a structure-aware network for lane marker extraction in DVS images. It can capture directional information comprehensively with multidirectional slice convolution. We evaluate our proposed network against other state-of-the-art lane marker extraction models on this dataset. Experimental results demonstrate that our method outperforms other competitors. The dataset is made publicly available, including the raw event data, accumulated images and labels.

20. An Improved Deep Convolutional Neural Network-Based Autonomous Road Inspection Scheme Using Unmanned Aerial Vehicles [PDF] Back to Contents
  Syed Ali Hassan, Tariq Rahim, Soo Young Shin
Abstract: Advancements in artificial intelligence (AI) give a great opportunity to develop autonomous devices. The contribution of this work is an improved convolutional neural network (CNN) model and its implementation for the detection of road cracks, potholes, and the yellow lane in the road. The purpose of yellow lane detection and tracking is to realize autonomous navigation of an unmanned aerial vehicle (UAV) by following the yellow lane while detecting and reporting the road cracks and potholes to the server through a WiFi or 5G medium. The fabrication of our own data set is a hectic and time-consuming task. The data set is created, labeled and trained using the default and an improved model. The performance of both these models is benchmarked with respect to accuracy, mean average precision (mAP) and detection time. In the testing phase, it was observed that the performance of the improved model is better in terms of accuracy and mAP. The improved model is implemented on a UAV using the Robot Operating System for the autonomous detection of potholes and cracks in roads via the UAV's front camera vision in real time.

21. Apparel-invariant Feature Learning for Apparel-changed Person Re-identification [PDF] Back to Contents
  Zhengxu Yu, Yilun Zhao, Bin Hong, Zhongming Jin, Jianqiang Huang, Deng Cai, Xian-Sheng Hua
Abstract: With the rise of deep learning methods, person Re-Identification (ReID) performance has been improved tremendously in many public datasets. However, most public ReID datasets are collected in a short time window in which persons' appearance rarely changes. In real-world applications such as in a shopping mall, the same person's clothing may change, and different persons may wearing similar clothes. All these cases can result in an inconsistent ReID performance, revealing a critical problem that current ReID models heavily rely on person's apparels. Therefore, it is critical to learn an apparel-invariant person representation under cases like cloth changing or several persons wearing similar clothes. In this work, we tackle this problem from the viewpoint of invariant feature representation learning. The main contributions of this work are as follows. (1) We propose the semi-supervised Apparel-invariant Feature Learning (AIFL) framework to learn an apparel-invariant pedestrian representation using images of the same person wearing different clothes. (2) To obtain images of the same person wearing different clothes, we propose an unsupervised apparel-simulation GAN (AS-GAN) to synthesize cloth changing images according to the target cloth embedding. It's worth noting that the images used in ReID tasks were cropped from real-world low-quality CCTV videos, making it more challenging to synthesize cloth changing images. We conduct extensive experiments on several datasets comparing with several baselines. Experimental results demonstrate that our proposal can improve the ReID performance of the baseline models.

22. A Multimodal Late Fusion Model for E-Commerce Product Classification [PDF] Back to Contents
  Ye Bi, Shuo Wang, Zhongrui Fan
Abstract: The cataloging of product listings is a fundamental problem for most e-commerce platforms. Despite promising results obtained by unimodal-based methods, it can be expected that their performance can be further boosted by the consideration of multimodal product information. In this study, we investigated a multimodal late fusion approach based on text and image modalities to categorize e-commerce products on Rakuten. Specifically, we developed modality-specific state-of-the-art deep neural networks for each input modality, and then fused them at the decision level. Experimental results on the Multimodal Product Classification Task of the SIGIR 2020 E-Commerce Workshop Data Challenge demonstrate the superiority and effectiveness of our proposed method compared with unimodal and other multimodal methods. Our team, named pa_curis, won the 1st place with a macro-F1 of 0.9144 on the final leaderboard.
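
Decision-level (late) fusion itself is straightforward; a minimal sketch is shown below, combining per-modality class probabilities with a weighted average. The abstract does not specify the fusion rule or weights, so both are assumptions here, and the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(text_logits, image_logits, w=0.5):
    """Fuse two modality-specific classifiers at the decision level by
    averaging their class probabilities (w is a tunable mixing weight)."""
    probs = w * softmax(text_logits) + (1.0 - w) * softmax(image_logits)
    return probs.argmax(axis=-1)
```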

23. 3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View [PDF] Back to Contents
  Marc Badger, Yufu Wang, Adarsh Modh, Ammon Perkes, Nikos Kolotouros, Bernd G. Pfrommer, Marc F. Schmidt, Kostas Daniilidis
Abstract: Automated capture of animal pose is transforming how we study neuroscience and social behavior. Movements carry important social cues, but current methods are not able to robustly estimate pose and shape of animals, particularly for social animals such as birds, which are often occluded by each other and objects in the environment. To address this problem, we first introduce a model and multi-view optimization approach, which we use to capture the unique shape and pose space displayed by live birds. We then introduce a pipeline and experiments for keypoint, mask, pose, and shape regression that recovers accurate avian postures from single views. Finally, we provide extensive multi-view keypoint and mask annotations collected from a group of 15 social birds housed together in an outdoor aviary. The project website with videos, results, code, mesh model, and the Penn Aviary Dataset can be found at this https URL.

24. Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction [PDF] Back to Contents
  M. Akin Yilmaz, A. Murat Tekalp
Abstract: We analyze the performance of feedforward vs. recurrent neural network (RNN) architectures and associated training methods for learned frame prediction. To this effect, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next frame prediction using the mean square loss. We performed both stateless and stateful training for recurrent networks. Experimental results show that the residual FCNN architecture performs the best in terms of peak signal to noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with an acceptable performance.
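
To illustrate the stateful training referred to above, here is a generic toy loop for stateful truncated backpropagation through time in PyTorch (not the paper's CRNN): the hidden state is carried across consecutive chunks of one sequence, but detached at each update so the computation graph is cut at the truncation boundary.

```python
import torch
import torch.nn as nn

rnn, head = nn.RNN(8, 16, batch_first=True), nn.Linear(16, 8)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))
seq = torch.randn(1, 40, 8)                # stand-in for a frame sequence
h = None
for t in range(0, 30, 10):                 # truncation length of 10 steps
    x, y = seq[:, t:t + 10], seq[:, t + 1:t + 11]  # predict the next step
    out, h = rnn(x, h)
    loss = nn.functional.mse_loss(head(out), y)
    opt.zero_grad(); loss.backward(); opt.step()
    h = h.detach()                         # stateful: keep state, cut graph
```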

25. Geometric Deep Learning for Post-Menstrual Age Prediction based on the Neonatal White Matter Cortical Surface [PDF] Back to Contents
  Vitalis Vosylius, Andy Wang, Cemlyn Waters, Alexey Zakharov, Francis Ward, Loic Le Folgoc, John Cupitt, Antonios Makropoulos, Andreas Schuh, Daniel Rueckert, Amir Alansary
Abstract: Accurate estimation of the age in neonates is essential for measuring neurodevelopmental, medical, and growth outcomes. In this paper, we propose a novel approach to predict the post-menstrual age (PA) at scan, using techniques from geometric deep learning, based on the neonatal white matter cortical surface. We utilize and compare multiple specialized neural network architectures that predict the age using different geometric representations of the cortical surface; we compare MeshCNN, Pointnet++, GraphCNN, and a volumetric benchmark. The dataset is part of the Developing Human Connectome Project (dHCP), and is a cohort of healthy and premature neonates. We evaluate our approach on 650 subjects (727 scans) with PA ranging from 27 to 45 weeks. Our results show accurate prediction of the estimated PA, with mean error less than one week.

26. Novelty Detection Through Model-Based Characterization of Neural Networks [PDF] Back to Contents
  Gukyeong Kwon, Mohit Prabhushankar, Dogancan Temel, Ghassan AlRegib
Abstract: In this paper, we propose a model-based characterization of neural networks to detect novel input types and conditions. Novelty detection is crucial to identify abnormal inputs that can significantly degrade the performance of machine learning algorithms. Majority of existing studies have focused on activation-based representations to detect abnormal inputs, which limits the characterization of abnormality from a data perspective. However, a model perspective can also be informative in terms of the novelties and abnormalities. To articulate the significance of the model perspective in novelty detection, we utilize backpropagated gradients. We conduct a comprehensive analysis to compare the representation capability of gradients with that of activation and show that the gradients outperform the activation in novel class and condition detection. We validate our approach using four image recognition datasets including MNIST, Fashion-MNIST, CIFAR-10, and CURE-TSR. We achieve a significant improvement on all four datasets with an average AUROC of 0.953, 0.918, 0.582, and 0.746, respectively.
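
As a rough illustration of using backpropagated gradients as a model-based characterization, the sketch below scores one input by the overall gradient magnitude of a loss with respect to the weights; the loss choice, the score statistic and all names are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def gradient_novelty_score(model, x, target):
    """Backpropagate a per-input loss and return the L2 norm of the
    weight gradients as an abnormality score (larger = more novel)."""
    model.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    sq = sum(float(p.grad.pow(2).sum()) for p in model.parameters()
             if p.grad is not None)
    return sq ** 0.5

# For an autoencoder, target = x (reconstruction loss): inputs far from the
# training distribution tend to induce unusually large weight gradients.
```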

27. Semantically Adversarial Learnable Filters [PDF] Back to Contents
  Ali Shahin Shamsabadi, Changjae Oh, Andrea Cavallaro
Abstract: We present the first adversarial framework that crafts perturbations that mislead classifiers by accounting for the content of the images and the semantics of the labels. The proposed framework combines deep neural networks and traditional image processing filters, which define the type and magnitude of the adversarial perturbation. We also introduce a semantic adversarial loss that guides the training of a fully convolutional neural network to generate adversarial images that will be classified with a label that is semantically different from the label of the original (clean) image. We analyse the limitations of existing methods that do not account for the semantics of the labels and evaluate the proposed framework, FilterFool, on ImageNet and with three object classifiers, namely ResNet50, ResNet18 and AlexNet. We discuss its success rate, robustness and transferability to unseen classifiers.

28. Self-Sampling for Neural Point Cloud Consolidation [PDF] Back to Contents
  Gal Metzer, Rana Hanocka, Raja Giryes, Daniel Cohen-Or
Abstract: In this paper, we introduce a deep learning technique for consolidating and sharp feature generation of point clouds using only the input point cloud itself. Rather than explicitly defining a prior that describes typical shape characteristics (i.e., piecewise-smoothness), or a heuristic policy for generating novel sharp points, we opt to learn both using a neural network with shared weights. Instead of relying on a large collection of manually annotated data, we use the self-supervision present within a single shape, i.e., self-prior, to train the network, and learn the underlying distribution of sharp features specific to the given input point cloud. By learning to map a low-curvature subset of the input point cloud to a disjoint high-curvature subset, the network formalizes the shape-specific characteristics and infers how to generate sharp points. During test time, the network is repeatedly fed a random subset of points from the input and displaces them to generate an arbitrarily large set of novel sharp feature points. The local shared weights are optimized over the entire shape, learning non-local statistics and exploiting the recurrence of local-scale geometries. We demonstrate the ability to generate coherent sets of sharp feature points on a variety of shapes, while eliminating outliers and noise.

29. SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud [PDF] Back to Contents
  Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane
Abstract: Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
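
The progressive inference idea can be sketched as a chain of early exits evaluated in order, stopping once an exit is confident enough. The stage interface and threshold rule below are assumptions for illustration only, not SPINN's actual scheduler, which also co-optimises the split point at run time.

```python
def progressive_inference(stages, x, threshold=0.9):
    """Run exits in order (on-device stages first, the cloud stage last)
    and stop as soon as an exit's top-1 confidence clears the threshold.
    Each stage maps features to (next_features, exit_probabilities)."""
    probs = None
    for stage in stages:
        x, probs = stage(x)
        if max(probs) >= threshold:
            break                      # confident enough: skip later stages
    return probs
```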

30. Machine learning for COVID-19 detection and prognostication using chest radiographs and CT scans: a systematic methodological review [PDF] Back to Contents
  Michael Roberts, Derek Driggs, Matthew Thorpe, Julian Gilbey, Michael Yeung, Stephan Ursprung, Angelica I. Aviles-Rivero, Christian Etmann, Cathal McCague, Lucian Beer, Jonathan R. Weir-McCall, Zhongzhao Teng, James H.F. Rudd, Evis Sala, Carola-Bibiane Schönlieb
Abstract: Background: Machine learning methods offer great potential for fast and accurate detection and prognostication of COVID-19 from standard-of-care chest radiographs (CXR) and computed tomography (CT) images. In this systematic review we critically evaluate the machine learning methodologies employed in the rapidly growing literature. Methods: In this systematic review we reviewed EMBASE via OVID, MEDLINE via PubMed, bioRxiv, medRxiv and arXiv for published papers and preprints uploaded from Jan 1, 2020 to June 24, 2020. Studies which consider machine learning models for the diagnosis or prognosis of COVID-19 from CXR or CT images were included. A methodology quality review of each paper was performed against established benchmarks to ensure the review focusses only on high-quality reproducible papers. This study is registered with PROSPERO [CRD42020188887]. Interpretation: Our review finds that none of the developed models discussed are of potential clinical use due to methodological flaws and underlying biases. This is a major weakness, given the urgency with which validated COVID-19 models are needed. Typically, we find that the documentation of a model's development is not sufficient to make the results reproducible and therefore of 168 candidate papers only 29 are deemed to be reproducible and subsequently considered in this review. We therefore encourage authors to use established machine learning checklists to ensure sufficient documentation is made available, and to follow the PROBAST (prediction model risk of bias assessment tool) framework to determine the underlying biases in their model development process and to mitigate these where possible. This is key to safe clinical implementation which is urgently needed.

31. Integrating uncertainty in deep neural networks for MRI based stroke analysis [PDF] Back to Contents
  Lisa Herzog, Elvis Murina, Oliver Dürr, Susanne Wegener, Beate Sick
Abstract: At present, the majority of the proposed Deep Learning (DL) methods provide point predictions without quantifying the model's uncertainty. However, a quantification of the reliability of automated image analysis is essential, in particular in medicine, where physicians rely on the results for making critical treatment decisions. In this work, we provide an entire framework to diagnose ischemic stroke patients incorporating Bayesian uncertainty into the analysis procedure. We present a Bayesian Convolutional Neural Network (CNN) yielding a probability for a stroke lesion on 2D Magnetic Resonance (MR) images, with corresponding uncertainty information about the reliability of the prediction. For patient-level diagnoses, different aggregation methods are proposed and evaluated, which combine the single image-level predictions. Those methods take advantage of the uncertainty in image predictions and report model uncertainty at the patient level. In a cohort of 511 patients, our Bayesian CNN achieved an accuracy of 95.33% at the image level, representing a significant improvement of 2% over a non-Bayesian counterpart. The best patient aggregation method yielded 95.89% accuracy. Integrating uncertainty information about image predictions into the aggregation models resulted in higher uncertainty measures for false patient classifications, which made it possible to filter critical patient diagnoses that should be examined more closely by a medical doctor. We therefore recommend using Bayesian approaches not only for improved image-level prediction and uncertainty estimation but also for the detection of uncertain aggregations at the patient level.
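
One plausible uncertainty-aware aggregation rule in the spirit of the above (the paper proposes and compares several; the weighting below and all names are assumptions):

```python
import numpy as np

def patient_level_diagnosis(image_probs, image_uncertainty):
    """Aggregate image-level lesion probabilities into one patient-level
    score, down-weighting images the model is uncertain about."""
    w = 1.0 / (np.asarray(image_uncertainty) + 1e-6)
    p = np.average(image_probs, weights=w)        # patient-level probability
    u = np.average(image_uncertainty, weights=w)  # patient-level uncertainty
    return p, u
```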

32. Automated detection and quantification of COVID-19 airspace disease on chest radiographs: A novel approach achieving radiologist-level performance using a CNN trained on digital reconstructed radiographs (DRRs) from CT-based ground-truth [PDF] [Back to contents]
  Eduardo Mortani Barbosa Jr., Warren B. Gefter, Rochelle Yang, Florin C. Ghesu, Siqi Liu, Boris Mailhe, Awais Mansoor, Sasa Grbic, Sebastian Piat, Guillaume Chabin, Vishwanath R S., Abishek Balachandran, Sebastian Vogt, Valentin Ziebandt, Steffen Kappler, Dorin Comaniciu
Abstract: Purpose: To leverage volumetric quantification of airspace disease (AD) derived from a superior modality (CT) serving as ground truth, projected onto digitally reconstructed radiographs (DRRs), to: 1) train a convolutional neural network to quantify airspace disease on paired CXRs; and 2) compare the DRR-trained CNN to expert human readers in the CXR evaluation of patients with confirmed COVID-19. Materials and Methods: We retrospectively selected a cohort of 86 COVID-19 patients (with positive RT-PCR), from March-May 2020 at a tertiary hospital in the northeastern USA, who underwent chest CT and CXR within 48 hrs. The ground-truth volumetric percentage of COVID-19 related AD (POv) was established by manual AD segmentation on CT. The resulting 3D masks were projected into 2D anterior-posterior digitally reconstructed radiographs (DRRs) to compute the area-based AD percentage (POa). A convolutional neural network (CNN) was trained with DRR images generated from a larger-scale CT dataset of COVID-19 and non-COVID-19 patients, automatically segmenting lungs and AD and quantifying POa on CXR. CNN POa results were compared to POa quantified on CXR by two expert readers and to the POv ground truth, by computing correlations and mean absolute errors. Results: Bootstrap mean absolute error (MAE) and correlations between POa and POv were 11.98% [11.05%-12.47%] and 0.77 [0.70-0.82] for the average of the expert readers, and 9.56%-9.78% [8.83%-10.22%] and 0.78-0.81 [0.73-0.85] for the CNN, respectively. Conclusion: Our CNN, trained with DRRs using CT-derived airspace quantification, achieved expert-radiologist-level accuracy in the quantification of airspace disease on CXR in patients with positive RT-PCR for COVID-19.
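
As a rough illustration of the mask-projection step described above, here is a minimal sketch, assuming binary 3D masks on a regular grid and that axis 1 is the anterior-posterior direction; actual DRR generation ray-casts through the CT volume, which is more involved than this.

```python
import numpy as np

def project_mask(mask3d, ap_axis=1):
    """Collapse a 3D binary mask along the (assumed) anterior-posterior
    axis to obtain a 2D, DRR-style mask."""
    return mask3d.any(axis=ap_axis)

def volume_percentage(ad3d, lung3d):
    """Volumetric AD percentage (POv), computed directly on the CT masks."""
    return 100.0 * ad3d.sum() / max(lung3d.sum(), 1)

def area_percentage(ad3d, lung3d, ap_axis=1):
    """Area-based AD percentage (POa), computed on the projected 2D masks."""
    ad2d = project_mask(ad3d, ap_axis)
    lung2d = project_mask(lung3d, ap_axis)
    return 100.0 * ad2d.sum() / max(lung2d.sum(), 1)
```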

33. Homotopic Gradients of Generative Density Priors for MR Image Reconstruction [PDF] [Back to contents]
  Cong Quan, Jinjie Zhou, Yuanzheng Zhu, Yang Chen, Shanshan Wang, Dong Liang, Qiegen Liu
Abstract: Deep learning, particularly generative models, has recently demonstrated tremendous potential to significantly speed up image reconstruction from reduced measurements. In contrast to existing generative models that often optimize the density priors, in this work, by taking advantage of denoising score matching, homotopic gradients of generative density priors (HGGDP) are proposed for magnetic resonance imaging (MRI) reconstruction. More precisely, to tackle the low-dimensional-manifold and low-data-density-region issues in the generative density prior, we estimate the target gradients in a higher-dimensional space. We train a more powerful noise-conditional score network by forming high-dimensional tensors as the network input at the training phase. More artificial noise is also injected in the embedding space. At the reconstruction stage, a homotopy method is employed to pursue the density prior, so as to boost the reconstruction performance. Experimental results demonstrate the remarkable performance of HGGDP in terms of reconstruction accuracy: with only 10% of the k-space data, it can still generate images of high quality, as effectively as standard MRI reconstruction with the fully sampled data.
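
The homotopy over the density prior resembles annealed Langevin sampling with a noise-conditional score network, as in denoising score matching; here is a minimal sketch under that assumption. `score(x, sigma)` stands in for the trained network, the step-size schedule is illustrative, and an actual MRI reconstruction would additionally interleave a k-space data-consistency step at each iteration.

```python
import numpy as np

def annealed_langevin(score, x, sigmas, steps_per_level=30, eps=2e-5):
    """Anneal from the largest to the smallest noise level, running
    Langevin updates at each level:
        x <- x + (a/2) * score(x, sigma) + sqrt(a) * noise."""
    for sigma in sigmas:  # sigmas sorted from largest to smallest
        alpha = eps * (sigma / sigmas[-1]) ** 2  # common step-size scaling
        for _ in range(steps_per_level):
            noise = np.random.randn(*x.shape)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * noise
    return x
```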

34. Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images [PDF] [Back to contents]
  Leanne Nortje, Herman Kamper
Abstract: We consider the task of multimodal one-shot speech-image matching. An agent is shown a picture along with a spoken word describing the object in the picture, e.g. cookie, broccoli and ice-cream. After observing one paired speech-image example per class, it is shown a new set of unseen pictures, and asked to pick the "ice-cream". Previous work attempted to tackle this problem using transfer learning: supervised models are trained on labelled background data not containing any of the one-shot classes. Here we compare transfer learning to unsupervised models trained on unlabelled in-domain data. On a dataset of paired isolated spoken and visual digits, we specifically compare unsupervised autoencoder-like models to supervised classifier and Siamese neural networks. In both unimodal and multimodal few-shot matching experiments, we find that transfer learning outperforms unsupervised training. We also present experiments towards combining the two methodologies, but find that transfer learning still performs best (despite idealised experiments showing the benefits of unsupervised learning).
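
Once embeddings are in hand, one-shot matching reduces to nearest-neighbour search in the embedding space; the following is a minimal sketch assuming precomputed speech and image embeddings. How those embeddings are trained, via transfer learning or unsupervised autoencoder-like models, is exactly what the paper compares.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def one_shot_match(query_emb, support_embs):
    """Return the index of the support item closest to the query; in the
    multimodal task the query is a speech embedding (e.g. "ice-cream")
    and the support set contains image embeddings."""
    sims = [cosine_sim(query_emb, s) for s in support_embs]
    return int(np.argmax(sims))
```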

35. WAN: Watermarking Attack Network [PDF] [Back to contents]
  Seung-Hun Nam, Wonhyuk Ahn, In-Jae Yu, Seung-Min Mun, Heung-Kyu Lee
Abstract: Multi-bit watermarking (MW) has been developed to improve robustness against signal processing operations and geometric distortions. To this end, several benchmark tools that simulate possible attacks on images to test robustness are available. However, these general attacks are limited because they cannot exploit specific characteristics of the targeted MW. In addition, these attacks are usually devised without consideration for visual quality, which rarely occurs in the real world. To address these limitations, we propose a watermarking attack network (WAN), a fully trainable watermarking benchmark tool that exploits the weak points of the target MW, removes the inserted watermark, and embeds inverted bit information, thereby considerably reducing watermark extractability. To hinder the extraction of hidden information while ensuring high visual quality, we utilize an architecture based on residual dense blocks, which is specialized for local and global feature learning. A novel watermarking attack loss is introduced to break MW systems. We empirically demonstrate that the WAN can successfully fool a variety of MW systems.
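
The abstract does not give the form of the attack loss; the following is one plausible sketch, assuming a differentiable watermark extractor that outputs bit logits, a binary cross-entropy term pushing extraction toward the inverted bits, and an L2 fidelity term for visual quality. The function name and the weighting are hypothetical, not the paper's.

```python
import torch.nn.functional as F

def wan_style_loss(attacked, original, extracted_logits, true_bits, lam=0.01):
    """Hypothetical attack objective: drive the extractor toward the
    inverted bit string while keeping the attacked image close to the
    original (visual quality). `true_bits` is a float tensor of 0s/1s."""
    inverted = 1.0 - true_bits
    bit_loss = F.binary_cross_entropy_with_logits(extracted_logits, inverted)
    fidelity = F.mse_loss(attacked, original)
    return bit_loss + lam * fidelity
```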

36. Unsupervised Image Restoration Using Partially Linear Denoisers [PDF] [Back to contents]
  Rihuan Ke, Carola-Bibiane Schönlieb
Abstract: Deep neural network based methods are the state of the art in various image restoration problems. Standard supervised learning frameworks require a set of pairs of noisy measurements and clean images, for which a distance between the output of the restoration model and the ground-truth clean images is minimized. The ground-truth images, however, are often unavailable or very expensive to acquire in real-world applications. We circumvent this problem by proposing a class of structured denoisers that can be decomposed as the sum of a nonlinear image-dependent mapping, a linear noise-dependent term and a small residual term. We show that these denoisers can be trained with only noisy images, under the condition that the noise has zero mean and known variance. The exact distribution of the noise, however, is not assumed to be known. We show the superiority of our approach for image denoising, and demonstrate its extension to solving other restoration problems, such as blind deblurring, where the ground truth is not available. Our method outperforms some recent unsupervised and self-supervised deep denoising models that do not require clean images for their training. For blind deblurring problems, the method, using only one noisy and blurry observation per image, reaches a quality not far from that of its fully supervised counterparts on a benchmark dataset.
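
A minimal sketch of the decomposition described above, written as a PyTorch module; the backbone architecture, the scalar forms chosen for the linear and residual terms, and the unsupervised training objective are all assumptions not spelled out in the abstract.

```python
import torch
import torch.nn as nn

class PartiallyLinearDenoiser(nn.Module):
    """Sketch of the structured denoiser f(y) = g(y) + a*y + r, with g a
    nonlinear image-dependent CNN, a*y a linear noise-dependent term and
    r a small residual term (here both taken as learned scalars)."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.g = backbone                       # nonlinear image-dependent part
        self.a = nn.Parameter(torch.zeros(1))   # linear-term coefficient
        self.r = nn.Parameter(torch.zeros(1))   # small residual term

    def forward(self, y):
        return self.g(y) + self.a * y + self.r
```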

37. Interpretation of Brain Morphology in Association to Alzheimer's Disease Dementia Classification Using Graph Convolutional Networks on Triangulated Meshes [PDF] [Back to contents]
  Emanuel A. Azcona, Pierre Besson, Yunan Wu, Arjun Punjabi, Adam Martersteck, Amil Dravid, Todd B. Parrish, S. Kathleen Bandt, Aggelos K. Katsaggelos
Abstract: We propose a mesh-based technique to aid in the classification of Alzheimer's disease dementia (ADD) using mesh representations of the cortex and subcortical structures. Deep learning methods for classification tasks that utilize structural neuroimaging often require extensive learning parameters to optimize. Frequently, these approaches for automated medical diagnosis also lack visual interpretability for the areas of the brain involved in making a diagnosis. This work: (a) analyzes brain shape using surface information of the cortex and subcortical structures, (b) proposes a residual learning framework for state-of-the-art graph convolutional networks which offers a significant reduction in learnable parameters, and (c) offers visual interpretability of the network via class-specific gradient information that localizes important regions of interest in our inputs. With our proposed method leveraging cortical and subcortical surface information, we outperform other machine learning methods with a 96.35% testing accuracy for the ADD vs. healthy control problem. We confirm the validity of our model by observing its performance in a 25-trial Monte Carlo cross-validation. The generated visualization maps in our study show correspondences with current knowledge regarding the structural localization of pathological changes in the brain associated with dementia of the Alzheimer's type.
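
The class-specific gradient visualization is in the spirit of Grad-CAM adapted to the vertices of a mesh; here is a minimal sketch under that assumption, operating on vertex activations from some graph convolutional layer. The paper's exact formulation may differ.

```python
import torch

def graph_grad_cam(node_feats, class_score):
    """Grad-CAM-style saliency on a mesh: weight each vertex's feature
    activations by the gradient of the class score w.r.t. them, sum over
    channels and keep positive contributions.
    node_feats: (V, C) activations with requires_grad; class_score: scalar."""
    grads, = torch.autograd.grad(class_score, node_feats, retain_graph=True)
    weights = grads.mean(dim=0)                           # per-channel importance
    cam = torch.relu((node_feats * weights).sum(dim=1))   # (V,) vertex saliency
    return cam / (cam.max() + 1e-8)
```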

38. Landmark detection in Cardiac Magnetic Resonance Imaging Using A Convolutional Neural Network [PDF] [Back to contents]
  Hui Xue, Jessica Artico, Marianna Fontana, James C Moon, Rhodri H Davies, Peter Kellman
Abstract: Purpose: To develop a convolutional neural network (CNN) solution for robust landmark detection in cardiac MR images. Methods: This retrospective study included cine, LGE and T1 mapping scans from two hospitals. The training set included 2,329 patients and 34,019 images. A hold-out test set included 531 patients and 7,723 images. CNN models were developed to detect the two mitral valve plane points and the apical point on long-axis (LAX) images. On short-axis (SAX) images, the anterior and posterior RV insertion points and the LV center were detected. Model outputs were compared for accuracy against manual labels from two operators, with a t-test for statistical significance. The trained model was deployed to MR scanners. Results: For the LAX images, the detection success rate was 99.8% for cine and 99.4% for LGE. For the SAX images, the success rates were 96.6%, 97.6% and 98.9% for cine, LGE and T1 mapping, respectively. The L2 distances between model and manual labels were 2 to 3.5 mm, indicating close agreement between model landmarks and manual labels. No significant differences were found in the anterior RV insertion angle and LV length between the models and the operators for all views and imaging sequences. Model inference on the MR scanner took 610 ms / 5.6 s on GPU/CPU, respectively, for a typical cardiac cine series. Conclusions: This study developed, validated and deployed a CNN solution for robust landmark detection in both long- and short-axis CMR images for cine, LGE and T1 mapping sequences, with accuracy comparable to the inter-operator variation.
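
Landmark CNNs typically regress one heatmap per landmark and decode its argmax; the following is a minimal sketch under that assumption (the paper's exact decoding is not given in the abstract), together with the L2 error in millimetres used to compare model landmarks with manual labels.

```python
import numpy as np

def extract_landmark(heatmap, spacing_mm=(1.0, 1.0)):
    """Assumed decoding step: take the argmax of a CNN landmark heatmap
    as the landmark position, returned in pixels and in millimetres."""
    iy, ix = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return (iy, ix), (iy * spacing_mm[0], ix * spacing_mm[1])

def l2_error_mm(pred_mm, label_mm):
    """L2 distance between a predicted landmark and a manual label (mm)."""
    return float(np.hypot(pred_mm[0] - label_mm[0], pred_mm[1] - label_mm[1]))
```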

39. Can weight sharing outperform random architecture search? An investigation with TuNAS [PDF] [Back to contents]
  Gabriel Bender, Hanxiao Liu, Bo Chen, Grace Chu, Shuyang Cheng, Pieter-Jan Kindermans, Quoc Le
Abstract: Efficient Neural Architecture Search methods based on weight sharing have shown good promise in democratizing Neural Architecture Search for computer vision models. There is, however, an ongoing debate about whether these efficient methods are significantly better than random search. Here we perform a thorough comparison between efficient and random search methods on a family of progressively larger and more challenging search spaces for image classification and detection on ImageNet and COCO. While the efficacies of both methods are problem-dependent, our experiments demonstrate that there are large, realistic tasks where efficient search methods can provide substantial gains over random search. In addition, we propose and evaluate techniques which improve the quality of searched architectures and reduce the need for manual hyper-parameter tuning. Source code and experiment data are available at this https URL
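
For reference, the random-search baseline being compared against is conceptually simple: sample architectures uniformly from the search space, train and evaluate each one independently, and keep the best. A minimal sketch, with `sample_arch` and `train_and_eval` as assumed callables:

```python
def random_search(sample_arch, train_and_eval, budget=20):
    """Random-search baseline: `sample_arch` draws an architecture
    uniformly from the search space; `train_and_eval` trains it and
    returns a validation score. Returns the best architecture found."""
    best_arch, best_score = None, float("-inf")
    for _ in range(budget):
        arch = sample_arch()
        score = train_and_eval(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```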

40. MIXCAPS: A Capsule Network-based Mixture of Experts for Lung Nodule Malignancy Prediction [PDF] [Back to contents]
  Parnian Afshar, Farnoosh Naderkhani, Anastasia Oikonomou, Moezedin Javad Rafiee, Arash Mohammadi, Konstantinos N. Plataniotis
Abstract: Lung diseases, including infections such as Pneumonia, Tuberculosis, and novel Coronavirus (COVID-19), together with Lung Cancer, are significantly widespread and typically considered life-threatening. In particular, lung cancer is among the most common and deadliest cancers, with a low 5-year survival rate. Timely diagnosis of lung cancer is therefore of paramount importance, as it can save countless lives. In this regard, deep learning radiomics solutions have the promise of extracting the most useful features on their own in an end-to-end fashion, without having access to annotated boundaries. Among different deep learning models, Capsule Networks have been proposed to overcome shortcomings of Convolutional Neural Networks (CNNs), such as their inability to recognize detailed spatial relations. Capsule networks have so far shown satisfactory performance in medical imaging problems. Capitalizing on their success, in this study we propose a novel capsule network-based mixture of experts, referred to as MIXCAPS. The proposed MIXCAPS architecture takes advantage not only of the capsule network's capability to handle small datasets, but also of automatically splitting the dataset through a convolutional gating network. MIXCAPS enables capsule network experts to specialize on different subsets of the data. Our results show that MIXCAPS outperforms a single capsule network and a mixture of CNNs, with an accuracy of 92.88%, sensitivity of 93.2%, specificity of 92.3% and an area under the curve of 0.963. Our experiments also show that there is a relation between the gate outputs and a couple of hand-crafted features, illustrating the explainable nature of the proposed MIXCAPS. To further evaluate the generalization capability of the proposed MIXCAPS architecture, additional experiments on a brain tumor dataset are performed, showing the potential of MIXCAPS for the detection of tumors related to other organs.
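
A minimal mixture-of-experts sketch in the spirit of the convolutional gating described above; the expert and gate modules are placeholders (the paper's experts are capsule networks), and the tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class GatedMixture(nn.Module):
    """Mixture of experts: a small gating network produces soft weights
    that combine the experts' predictions, letting each expert specialize
    on the subset of inputs the gate routes to it."""
    def __init__(self, experts, gate):
        super().__init__()
        self.experts = nn.ModuleList(experts)  # each maps image -> (B, C) logits
        self.gate = gate                       # maps image -> (B, E) gate scores

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)         # (B, E)
        preds = torch.stack([e(x) for e in self.experts], 1)  # (B, E, C)
        return (weights.unsqueeze(-1) * preds).sum(dim=1)     # (B, C)
```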
