
[arXiv papers] Computer Vision and Pattern Recognition 2020-05-29

Contents

1. Self-supervised Modal and View Invariant Feature Learning [PDF] Abstract
2. Modeling the Distribution of Normal Data in Pre-Trained Deep Features for Anomaly Detection [PDF] Abstract
3. Unsupervised learning of multimodal image registration using domain adaptation with projected Earth Move's discrepancies [PDF] Abstract
4. Uncertainty-Aware Blind Image Quality Assessment in the Laboratory and Wild [PDF] Abstract
5. Robust Modeling of Epistemic Mental States [PDF] Abstract
6. Improving Generalized Zero-Shot Learning by Semantic Discriminator [PDF] Abstract
7. Disentanglement Then Reconstruction: Learning Compact Features for Unsupervised Domain Adaptation [PDF] Abstract
8. Quantifying the Complexity of Standard Benchmarking Datasets for Long-Term Human Trajectory Prediction [PDF] Abstract
9. CNN-based Approach for Cervical Cancer Classification in Whole-Slide Histopathology Images [PDF] Abstract
10. P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds [PDF] Abstract
11. CGGAN: A Context Guided Generative Adversarial Network For Single Image Dehazing [PDF] Abstract
12. Traditional Method Inspired Deep Neural Network for Edge Detection [PDF] Abstract
13. Boosting Few-Shot Learning With Adaptive Margin Loss [PDF] Abstract
14. TOAN: Target-Oriented Alignment Network for Fine-Grained Image Categorization with Few Labeled Samples [PDF] Abstract
15. Explainable deep learning models in medical image analysis [PDF] Abstract
16. 3D human pose estimation with adaptive receptive fields and dilated temporal convolutions [PDF] Abstract
17. Stereo Vision Based Single-Shot 6D Object Pose Estimation for Bin-Picking by a Robot Manipulator [PDF] Abstract
18. Universal Lesion Detection by Learning from Multiple Heterogeneously Labeled Datasets [PDF] Abstract
19. L^2UWE: A Framework for the Efficient Enhancement of Low-Light Underwater Images Using Local Contrast and Multi-Scale Fusion [PDF] Abstract
20. Anomaly Detection Based on Deep Learning Using Video for Prevention of Industrial Accidents [PDF] Abstract
21. Few-Shot Open-Set Recognition using Meta-Learning [PDF] Abstract
22. AFAT: Adaptive Failure-Aware Tracker for Robust Visual Object Tracking [PDF] Abstract
23. Detecting Scatteredly-Distributed, Small, and Critically Important Objects in 3D Oncology Imaging via Decision Stratification [PDF] Abstract
24. D2D: Keypoint Extraction with Describe to Detect Approach [PDF] Abstract
25. Network Fusion for Content Creation with Conditional INNs [PDF] Abstract
26. QEBA: Query-Efficient Boundary-Based Blackbox Attack [PDF] Abstract
27. Heatmap-Based Method for Estimating Drivers' Cognitive Distraction [PDF] Abstract
28. A Normalized Fully Convolutional Approach to Head and Neck Cancer Outcome Prediction [PDF] Abstract
29. Perception-aware time optimal path parameterization for quadrotors [PDF] Abstract
30. Early Screening of SARS-CoV-2 by Intelligent Analysis of X-Ray Images [PDF] Abstract
31. Deep Learning for Automatic Pneumonia Detection [PDF] Abstract
32. Learning Various Length Dependence by Dual Recurrent Neural Networks [PDF] Abstract
33. A Feature-map Discriminant Perspective for Pruning Deep Neural Networks [PDF] Abstract
34. Graph-based Proprioceptive Localization Using a Discrete Heading-Length Feature Sequence Matching Approach [PDF] Abstract
35. Towards the Infeasibility of Membership Inference on Deep Models [PDF] Abstract
36. An ENAS Based Approach for Constructing Deep Learning Models for Breast Cancer Recognition from Ultrasound Images [PDF] Abstract
37. Multiple resolution residual network for automatic thoracic organs-at-risk segmentation from CT [PDF] Abstract
38. Segmentation of the Myocardium on Late-Gadolinium Enhanced MRI based on 2.5 D Residual Squeeze and Excitation Deep Learning Model [PDF] Abstract
39. Looking back to lower-level information in few-shot learning [PDF] Abstract

Abstracts

1. Self-supervised Modal and View Invariant Feature Learning [PDF] Back to contents
  Longlong Jing, Yucheng Chen, Ling Zhang, Mingyi He, Yingli Tian
Abstract: Most of the existing self-supervised feature learning methods for 3D data either learn 3D features from point cloud data or from multi-view images. By exploring the inherent multi-modality attributes of 3D objects, in this paper, we propose to jointly learn modal-invariant and view-invariant features from different modalities including image, point cloud, and mesh with heterogeneous networks for 3D data. In order to learn modal- and view-invariant features, we propose two types of constraints: cross-modal invariance constraint and cross-view invariant constraint. Cross-modal invariance constraint forces the network to maximum the agreement of features from different modalities for same objects, while the cross-view invariance constraint forces the network to maximum agreement of features from different views of images for same objects. The quality of learned features has been tested on different downstream tasks with three modalities of data including point cloud, multi-view images, and mesh. Furthermore, the invariance cross different modalities and views are evaluated with the cross-modal retrieval task. Extensive evaluation results demonstrate that the learned features are robust and have strong generalizability across different tasks.
Summary: Most self-supervised feature learning methods for 3D data use either point clouds or multi-view images alone. This work jointly learns modal-invariant and view-invariant features from images, point clouds, and meshes with heterogeneous networks, using a cross-modal invariance constraint and a cross-view invariance constraint. The learned features are evaluated on downstream tasks for all three modalities and on cross-modal retrieval, and are reported to be robust and to generalize well across tasks.

2. Modeling the Distribution of Normal Data in Pre-Trained Deep Features for Anomaly Detection [PDF] Back to contents
  Oliver Rippel, Patrick Mertens, Dorit Merhof
Abstract: Anomaly Detection (AD) in images is a fundamental computer vision problem and refers to identifying images and/or image substructures that deviate significantly from the norm. Popular AD algorithms commonly try to learn a model of normality from scratch using task specific datasets, but are limited to semi-supervised approaches employing mostly normal data due to the inaccessibility of anomalies on a large scale combined with the ambiguous nature of anomaly appearance. We follow an alternative approach and demonstrate that deep feature representations learned by discriminative models on large natural image datasets are well suited to describe normality and detect even subtle anomalies. Our model of normality is established by fitting a multivariate Gaussian to deep feature representations of classification networks trained on ImageNet using normal data only in a transfer learning setting. By subsequently applying the Mahalanobis distance as the anomaly score we outperform the current state of the art on the public MVTec AD dataset, achieving an Area Under the Receiver Operating Characteristic curve of $95.8 \pm 1.2$ (mean $\pm$ SEM) over all 15 classes. We further investigate why the learned representations are discriminative to the AD task using Principal Component Analysis. We find that the principal components containing little variance in normal data are the ones crucial for discriminating between normal and anomalous instances. This gives a possible explanation to the often sub-par performance of AD approaches trained from scratch using normal data only. By selectively fitting a multivariate Gaussian to these most relevant components only we are able to further reduce model complexity while retaining AD performance. We also investigate setting the working point by selecting acceptable False Positive Rate thresholds based on the multivariate Gaussian assumption.
Summary: Instead of learning a model of normality from scratch, the authors fit a multivariate Gaussian to deep features of ImageNet-trained classification networks using normal data only, and use the Mahalanobis distance as the anomaly score. On the public MVTec AD dataset this reaches an AUROC of $95.8 \pm 1.2$ (mean $\pm$ SEM) over all 15 classes. A Principal Component Analysis shows that the components with little variance in normal data are the ones that matter most for separating normal from anomalous instances; fitting the Gaussian to those components alone reduces model complexity while retaining performance. False Positive Rate thresholds derived from the Gaussian assumption are also studied for setting the working point.
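
The scoring step described for this paper can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: tiny hand-written 2-D vectors stand in for deep features that would come from a pre-trained network, and the names `fit_gaussian`, `solve`, `mahalanobis`, and the ridge term `eps` are hypothetical.

```python
import math

def fit_gaussian(features, eps=1e-6):
    """Estimate the mean and covariance of feature vectors from normal data only."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (f[i] - mean[i]) * (f[j] - mean[j]) / (n - 1)
    for i in range(d):
        cov[i][i] += eps  # small ridge (an assumption) keeps the matrix invertible
    return mean, cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (m[i][d] - sum(m[i][j] * x[j] for j in range(i + 1, d))) / m[i][i]
    return x

def mahalanobis(x, mean, cov):
    """Anomaly score: sqrt((x - mean)^T cov^{-1} (x - mean))."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, diff)  # z = cov^{-1} diff without forming the inverse
    return math.sqrt(sum(di * zi for di, zi in zip(diff, z)))
```

A test sample is then flagged as anomalous when its score exceeds a threshold chosen on normal data; how that threshold is picked is left open here.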

3. Unsupervised learning of multimodal image registration using domain adaptation with projected Earth Move's discrepancies [PDF] Back to contents
  Mattias P Heinrich, Lasse Hansen
Abstract: Multimodal image registration is a very challenging problem for deep learning approaches. Most current work focuses on either supervised learning that requires labelled training scans and may yield models that bias towards annotated structures or unsupervised approaches that are based on hand-crafted similarity metrics and may therefore not outperform their classical non-trained counterparts. We believe that unsupervised domain adaptation can be beneficial in overcoming the current limitations for multimodal registration, where good metrics are hard to define. Domain adaptation has so far been mainly limited to classification problems. We propose the first use of unsupervised domain adaptation for discrete multimodal registration. Based on a source domain for which quantised displacement labels are available as supervision, we transfer the output distribution of the network to better resemble the target domain (other modality) using classifier discrepancies. To improve upon the sliced Wasserstein metric for 2D histograms, we present a novel approximation that projects predictions into 1D and computes the L1 distance of their cumulative sums. Our proof-of-concept demonstrates the applicability of domain transfer from mono- to multimodal (multi-contrast) 2D registration of canine MRI scans and improves the registration accuracy from 33% (using sliced Wasserstein) to 44%.
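Summary: The paper proposes what it describes as the first use of unsupervised domain adaptation for discrete multimodal image registration. A network supervised with quantised displacement labels in a source domain is adapted to the target modality using classifier discrepancies. To improve on the sliced Wasserstein metric for 2D histograms, predictions are projected into 1D and compared by the L1 distance of their cumulative sums. In a proof of concept on mono- to multimodal (multi-contrast) 2D registration of canine MRI scans, registration accuracy rises from 33% (sliced Wasserstein) to 44%.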
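
The "L1 distance of cumulative sums" in this abstract is the standard closed form of the Earth Mover's distance between two 1-D histograms of equal mass. The sketch below illustrates it. The projection used here (summing a 2-D histogram over its rows and over its columns) is an assumption made for illustration, since the abstract does not say which projections the authors use; the function names are hypothetical.

```python
from itertools import accumulate

def emd_1d(p, q):
    """Earth Mover's distance between two 1-D histograms of equal total mass,
    computed as the L1 distance between their cumulative sums."""
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

def projected_emd_2d(P, Q):
    """Assumed projection: marginalise each 2-D histogram onto rows and columns,
    then add the two 1-D distances."""
    rows = emd_1d([sum(r) for r in P], [sum(r) for r in Q])
    cols = emd_1d([sum(c) for c in zip(*P)], [sum(c) for c in zip(*Q)])
    return rows + cols
```

For example, moving all mass from the first to the last of three bins costs 2 bin widths, which is what `emd_1d([1, 0, 0], [0, 0, 1])` evaluates to.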

4. Uncertainty-Aware Blind Image Quality Assessment in the Laboratory and Wild [PDF] Back to contents
  Weixia Zhang, Kede Ma, Guangtao Zhai, Xiaokang Yang
Abstract: Performance of blind image quality assessment (BIQA) models has been significantly boosted by end-to-end optimization of feature engineering and quality regression. Nevertheless, due to the distributional shifts between images simulated in the laboratory and captured in the wild, models trained on databases with synthetic distortions remain particularly weak at handling realistic distortions (and vice versa). To confront the cross-distortion-scenario challenge, we develop a unified BIQA model and an effective approach of training it for both synthetic and realistic distortions. We first sample pairs of images from the same IQA databases and compute a probability that one image of each pair is of higher quality as the supervisory signal. We then employ the fidelity loss to optimize a deep neural network for BIQA over a large number of such image pairs. We also explicitly enforce a hinge constraint to regularize uncertainty estimation during optimization. Extensive experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in the laboratory and wild. In addition, we demonstrate the universality of the proposed training strategy by using it to improve existing BIQA models.
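Summary: Blind image quality assessment (BIQA) models trained on synthetic distortions handle realistic distortions poorly, and vice versa. The authors train one unified model for both: they sample image pairs within each IQA database, compute the probability that one image of a pair is of higher quality as the supervisory signal, and optimize a deep network with the fidelity loss, adding a hinge constraint to regularize uncertainty estimation. Experiments on six IQA databases support the approach, and the same training strategy also improves existing BIQA models.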
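
The pairwise supervision described for this paper can be illustrated with the fidelity loss, which compares a target probability with a predicted one. This is a sketch under assumptions: the abstract does not state how the probability is computed, so the Thurstone-style Gaussian model in `preference_probability` is an assumption, and both function names are hypothetical.

```python
import math

def preference_probability(mu_x, sigma_x, mu_y, sigma_y):
    """Assumed Thurstone-style model: probability that image x is of higher
    quality than image y, given a mean score and standard deviation for each."""
    z = (mu_x - mu_y) / math.sqrt(sigma_x ** 2 + sigma_y ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

def fidelity_loss(p, p_hat):
    """Fidelity loss between target probability p and prediction p_hat.
    It is 0 when the two agree and 1 when they are 0 and 1."""
    return 1.0 - math.sqrt(p * p_hat) - math.sqrt((1.0 - p) * (1.0 - p_hat))
```

Unlike cross-entropy, this loss stays bounded when the prediction is confidently wrong, which is one common reason for choosing it in pairwise ranking.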

5. Robust Modeling of Epistemic Mental States [PDF] Back to contents
  AKMMahbubur Rahman, ASM Iftekhar Anam, Mohammed Yeasin
Abstract: This work identifies and advances some research challenges in the analysis of facial features and their temporal dynamics with epistemic mental states in dyadic conversations. Epistemic states are: Agreement, Concentration, Thoughtful, Certain, and Interest. In this paper, we perform a number of statistical analyses and simulations to identify the relationship between facial features and epistemic states. Non-linear relations are found to be more prevalent, while temporal features derived from original facial features have demonstrated a strong correlation with intensity changes. Then, we propose a novel prediction framework that takes facial features and their nonlinear relation scores as input and predict different epistemic states in videos. The prediction of epistemic states is boosted when the classification of emotion changing regions such as rising, falling, or steady-state are incorporated with the temporal features. The proposed predictive models can predict the epistemic states with significantly improved accuracy: correlation coefficient (CoERR) for Agreement is 0.827, for Concentration 0.901, for Thoughtful 0.794, for Certain 0.854, and for Interest 0.913.
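Summary: The work analyzes how facial features and their temporal dynamics relate to five epistemic mental states in dyadic conversations: Agreement, Concentration, Thoughtful, Certain, and Interest. Non-linear relations are found to be more prevalent, and temporal features derived from the original facial features correlate strongly with intensity changes. A prediction framework takes facial features and their non-linear relation scores as input; incorporating the classification of emotion-changing regions (rising, falling, steady-state) improves prediction. Reported correlation coefficients (CoERR) are 0.827 for Agreement, 0.901 for Concentration, 0.794 for Thoughtful, 0.854 for Certain, and 0.913 for Interest.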

6. Improving Generalized Zero-Shot Learning by Semantic Discriminator [PDF] Back to contents
  Xinpeng Li, Mao Ye, Lihua Zhou, Dan Zhang, Ce Zhu, Yiguang Liu
Abstract: It is a recognized fact that the classification accuracy of unseen classes in the setting of Generalized Zero-Shot Learning (GZSL) is much lower than that of traditional Zero-Shot Leaning (ZSL). One of the reasons is that an instance is always misclassified to the wrong domain. Here we refer to the seen and unseen classes as two domains respectively. We propose a new approach to distinguish whether the instances come from the seen or unseen classes. First the visual feature of instance is projected into the semantic space. Then the absolute norm difference between the projected semantic vector and the class semantic embedding vector, and the minimum distance between the projected semantic vectors and the semantic embedding vectors of the seen classes are used as discrimination basis. This approach is termed as SD (Semantic Discriminator) because domain judgement of instance is performed in the semantic space. Our approach can be combined with any existing ZSL method and fully supervision classification model to form a new GZSL method. Furthermore, our approach is very simple and does not need any fixed parameters. A large number of experiments show that the accuracy of our approach is 8.5% to 21.9% higher than the current best method.
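Summary: In Generalized Zero-Shot Learning (GZSL), accuracy on unseen classes is much lower than in traditional Zero-Shot Learning (ZSL), partly because instances are assigned to the wrong domain (seen versus unseen classes). The proposed Semantic Discriminator (SD) projects the visual feature of an instance into the semantic space and decides its domain from two quantities: the absolute norm difference between the projected vector and the class semantic embedding vector, and the minimum distance between the projected vector and the semantic embeddings of the seen classes. SD can be combined with any existing ZSL method and a fully supervised classifier to form a GZSL method, and needs no fixed parameters. The authors report accuracy 8.5% to 21.9% higher than the best method at the time.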

7. Disentanglement Then Reconstruction: Learning Compact Features for Unsupervised Domain Adaptation [PDF] Back to contents
  Lihua Zhou, Mao Ye, Xinpeng Li, Ce Zhu, Yiguang Liu, Xue Li
Abstract: Recent works in domain adaptation always learn domain invariant features to mitigate the gap between the source and target domains by adversarial methods. The category information are not sufficiently used which causes the learned domain invariant features are not enough discriminative. We propose a new domain adaptation method based on prototype construction which likes capturing data cluster centers. Specifically, it consists of two parts: disentanglement and reconstruction. First, the domain specific features and domain invariant features are disentangled from the original features. At the same time, the domain prototypes and class prototypes of both domains are estimated. Then, a reconstructor is trained by reconstructing the original features from the disentangled domain invariant features and domain specific features. By this reconstructor, we can construct prototypes for the original features using class prototypes and domain prototypes correspondingly. In the end, the feature extraction network is forced to extract features close to these prototypes. Our contribution lies in the technical use of the reconstructor to obtain the original feature prototypes which helps to learn compact and discriminant features. As far as we know, this idea is proposed for the first time. Experiment results on several public datasets confirm the state-of-the-art performance of our method.
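Summary: Adversarial domain adaptation methods learn domain-invariant features but make little use of category information, so the features are not discriminative enough. The proposed method builds prototypes, which act like data cluster centers, in two stages. Disentanglement separates domain-specific from domain-invariant features while estimating domain prototypes and class prototypes for both domains. Reconstruction trains a reconstructor to recover the original features from the disentangled ones; the reconstructor then builds prototypes for the original features from the class and domain prototypes, and the feature extractor is pushed to produce features close to them. Experiments on several public datasets are reported to confirm state-of-the-art performance.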

8. Quantifying the Complexity of Standard Benchmarking Datasets for Long-Term Human Trajectory Prediction [PDF] Back to contents
  Ronny Hug, Stefan Becker, Wolfgang Hübner, Michael Arens
Abstract: Methods to quantify the complexity of trajectory datasets are still a missing piece in benchmarking human trajectory prediction models. In order to gain a better understanding of the complexity of trajectory datasets, an approach for deriving complexity scores from a prototype-based dataset representation is proposed. The dataset representation is obtained by first employing a non-trivial spatial sequence alignment, which enables a following learning vector quantization (LVQ) stage. A large-scale complexity analysis is conducted on several human trajectory prediction benchmarking datasets, followed by a brief discussion on indications for human trajectory prediction and benchmarking.
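Summary: Methods for quantifying the complexity of trajectory datasets are still missing from the benchmarking of human trajectory prediction models. The paper derives complexity scores from a prototype-based dataset representation, obtained by a non-trivial spatial sequence alignment followed by a learning vector quantization (LVQ) stage. A large-scale complexity analysis covers several benchmarking datasets, followed by a brief discussion of what the results indicate for human trajectory prediction and benchmarking.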

9. CNN-based Approach for Cervical Cancer Classification in Whole-Slide Histopathology Images [PDF] Back to contents
  Ferdaous Idlahcen, Mohammed Majid Himmi, Abdelhak Mahmoudi
Abstract: Cervical cancer will cause 460 000 deaths per year by 2040, approximately 90% are Sub-Saharan African women. A constantly increasing incidence in Africa making cervical cancer a priority by the World Health Organization (WHO) in terms of screening, diagnosis, and treatment. Conventionally, cancer diagnosis relies primarily on histopathological assessment, a deeply error-prone procedure requiring intelligent computer-aided systems as low-cost patient safety mechanisms but lack of labeled data in digital pathology limits their applicability. In this study, few cervical tissue digital slides from TCGA data portal were pre-processed to overcome whole-slide images obstacles and included in our proposed VGG16-CNN classification approach. Our results achieved an accuracy of 98,26% and an F1-score of 97,9%, which confirm the potential of transfer learning on this weakly-supervised task.
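Summary: Cervical cancer is projected to cause 460 000 deaths per year by 2040, approximately 90% of them among Sub-Saharan African women. Histopathological diagnosis is error-prone, and the lack of labeled data in digital pathology limits computer-aided systems. In this study a small number of cervical tissue digital slides from the TCGA data portal are pre-processed and fed to a VGG16-CNN classification approach, which reaches an accuracy of 98.26% and an F1-score of 97.9%. The authors take this as confirmation of the potential of transfer learning on this weakly-supervised task.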

10. P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds [PDF] Back to contents
  Haozhe Qi, Chen Feng, Zhiguo Cao, Feng Zhao, Yang Xiao
Abstract: Towards 3D object tracking in point clouds, a novel point-to-box network termed P2B is proposed in an end-to-end learning manner. Our main idea is to first localize potential target centers in 3D search area embedded with target information. Then point-driven 3D target proposal and verification are executed jointly. In this way, the time-consuming 3D exhaustive search can be avoided. Specifically, we first sample seeds from the point clouds in template and search area respectively. Then, we execute permutation-invariant feature augmentation to embed target clues from template into search area seeds and represent them with target-specific features. Consequently, the augmented search area seeds regress the potential target centers via Hough voting. The centers are further strengthened with seed-wise targetness scores. Finally, each center clusters its neighbors to leverage the ensemble power for joint 3D target proposal and verification. We apply PointNet++ as our backbone and experiments on KITTI tracking dataset demonstrate P2B's superiority (~10%'s improvement over state-of-the-art). Note that P2B can run with 40FPS on a single NVIDIA 1080Ti GPU. Our code and model are available at this https URL.
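Summary: P2B is an end-to-end point-to-box network for 3D object tracking in point clouds. It first localizes potential target centers in a 3D search area embedded with target information, then runs point-driven 3D target proposal and verification jointly, avoiding a time-consuming exhaustive 3D search. Seeds are sampled from the template and search-area point clouds; permutation-invariant feature augmentation embeds target clues from the template into the search-area seeds; the augmented seeds regress potential target centers via Hough voting; the centers are strengthened with seed-wise targetness scores; and each center clusters its neighbors for joint proposal and verification. With a PointNet++ backbone, P2B improves on the state of the art by about 10% on the KITTI tracking dataset and runs at 40 FPS on a single NVIDIA 1080Ti GPU. Code and model are available at this https URL.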

11. CGGAN: A Context Guided Generative Adversarial Network For Single Image Dehazing [PDF] Back to contents
  Zhaorun Zhou, Zhenghao Shi, Mingtao Guo, Yaning Feng, Minghua Zhao
Abstract: Image haze removal is highly desired for the application of computer vision. This paper proposes a novel Context Guided Generative Adversarial Network (CGGAN) for single image dehazing. Of which, an novel new encoder-decoder is employed as the generator. And it consists of a feature-extraction-net, a context-extractionnet, and a fusion-net in sequence. The feature extraction-net acts as a encoder, and is used for extracting haze features. The context-extraction net is a multi-scale parallel pyramid decoder, and is used for extracting the deep features of the encoder and generating coarse dehazing image. The fusion-net is a decoder, and is used for obtaining the final haze-free image. To obtain more better results, multi-scale information obtained during the decoding process of the context extraction decoder is used for guiding the fusion decoder. By introducing an extra coarse decoder to the original encoder-decoder, the CGGAN can make better use of the deep feature information extracted by the encoder. To ensure our CGGAN work effectively for different haze scenarios, different loss functions are employed for the two decoders. Experiments results show the advantage and the effectiveness of our proposed CGGAN, evidential improvements over existing state-of-the-art methods are obtained.
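Summary: CGGAN is a Context Guided Generative Adversarial Network for single image dehazing. Its generator is an encoder-decoder made of three parts in sequence: a feature-extraction net (the encoder, which extracts haze features), a context-extraction net (a multi-scale parallel pyramid decoder that extracts the deep features of the encoder and produces a coarse dehazed image), and a fusion net (a decoder that produces the final haze-free image). Multi-scale information from the context-extraction decoder guides the fusion decoder, and the two decoders use different loss functions so that the network works across haze scenarios. Experiments are reported to show improvements over existing state-of-the-art methods.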

12. Traditional Method Inspired Deep Neural Network for Edge Detection [PDF] Back to contents
  Jan Kristanto Wibisono, Hsueh-Ming Hang
Abstract: Recently, Deep-Neural-Network (DNN) based edge prediction is progressing fast. Although the DNN based schemes outperform the traditional edge detectors, they have much higher computational complexity. It could be that the DNN based edge detectors often adopt the neural net structures designed for high-level computer vision tasks, such as image segmentation and object recognition. Edge detection is a rather local and simple job, the over-complicated architecture and massive parameters may be unnecessary. Therefore, we propose a traditional method inspired framework to produce good edges with minimal complexity. We simplify the network architecture to include Feature Extractor, Enrichment, and Summarizer, which roughly correspond to gradient, low pass filter, and pixel connection in the traditional edge detection schemes. The proposed structure can effectively reduce the complexity and retain the edge prediction quality. Our TIN2 (Traditional Inspired Network) model has an accuracy higher than the recent BDCN2 (Bi-Directional Cascade Network) but with a smaller model.
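Summary: DNN-based edge detectors outperform traditional ones but are far more computationally expensive, possibly because they adopt architectures designed for high-level vision tasks such as image segmentation and object recognition. Since edge detection is a local and simple task, the authors propose a framework inspired by traditional methods with three modules, Feature Extractor, Enrichment, and Summarizer, which roughly correspond to gradient, low-pass filter, and pixel connection in traditional schemes. Their TIN2 (Traditional Inspired Network) model is reported to be more accurate than the recent BDCN2 (Bi-Directional Cascade Network) while being smaller.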

13. Boosting Few-Shot Learning With Adaptive Margin Loss [PDF] Back to contents
  Aoxue Li, Weiran Huang, Xu Lan, Jiashi Feng, Zhenguo Li, Liwei Wang
Abstract: Few-shot learning (FSL) has attracted increasing attention in recent years but remains challenging, due to the intrinsic difficulty in learning to generalize from a few examples. This paper proposes an adaptive margin principle to improve the generalization ability of metric-based meta-learning approaches for few-shot learning problems. Specifically, we first develop a class-relevant additive margin loss, where semantic similarity between each pair of classes is considered to separate samples in the feature embedding space from similar classes. Further, we incorporate the semantic context among all classes in a sampled training task and develop a task-relevant additive margin loss to better distinguish samples from different classes. Our adaptive margin method can be easily extended to a more realistic generalized FSL setting. Extensive experiments demonstrate that the proposed method can boost the performance of current metric-based meta-learning approaches, under both the standard FSL and generalized FSL settings.
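Summary: The paper proposes an adaptive margin principle to improve the generalization of metric-based meta-learning for few-shot learning (FSL). A class-relevant additive margin loss uses the semantic similarity between each pair of classes to separate samples of similar classes in the feature embedding space. A task-relevant additive margin loss additionally incorporates the semantic context among all classes in a sampled training task. The method extends to the generalized FSL setting, and experiments show gains for current metric-based meta-learning approaches under both standard and generalized FSL.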
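
The idea of an additive margin that grows with class similarity can be sketched as a softmax loss in which each wrong class receives an extra margin on its logit. This is one plausible form written for illustration, not the loss from the paper: the linear margin generator `margin_from_similarity` with parameters `alpha` and `beta`, and the function `margin_softmax_loss`, are hypothetical.

```python
import math

def margin_from_similarity(semantic_similarity, alpha=1.0, beta=0.0):
    """Hypothetical margin generator: more similar classes get a larger margin."""
    return alpha * semantic_similarity + beta

def margin_softmax_loss(similarities, true_class, margins):
    """Cross-entropy over class similarities, with margins[k] added to the
    logit of every wrong class k so that the true class must win by a margin."""
    logits = [s if k == true_class else s + margins[k]
              for k, s in enumerate(similarities)]
    top = max(logits)  # subtract the maximum for numerical stability
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[true_class]
```

With all margins set to zero this reduces to ordinary softmax cross-entropy; a positive margin on a confusable class raises the loss until the query sits further from that class.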

14. TOAN: Target-Oriented Alignment Network for Fine-Grained Image Categorization with Few Labeled Samples [PDF] Back to contents
  Huaxi Huang, Junjie Zhang, Jian Zhang, Qiang Wu, Chang Xu
Abstract: The challenges of high intra-class variance yet low inter-class fluctuations in fine-grained visual categorization are more severe with few labeled samples, i.e., Fine-Grained categorization problems under the Few-Shot setting (FGFS). High-order features are usually developed to uncover subtle differences between sub-categories in FGFS, but they are less effective in handling the high intra-class variance. In this paper, we propose a Target-Oriented Alignment Network (TOAN) to investigate the fine-grained relation between the target query image and support classes. The feature of each support image is transformed to match the query ones in the embedding feature space, which reduces the disparity explicitly within each category. Moreover, different from existing FGFS approaches devise the high-order features over the global image with less explicit consideration of discriminative parts, we generate discriminative fine-grained features by integrating compositional concept representations to global second-order pooling. Extensive experiments are conducted on four fine-grained benchmarks to demonstrate the effectiveness of TOAN compared with the state-of-the-art models.
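Summary: Fine-grained visual categorization suffers from high intra-class variance and low inter-class variation, and the problem is more severe in the few-shot setting (FGFS). High-order features reveal subtle differences between sub-categories but handle intra-class variance poorly. The Target-Oriented Alignment Network (TOAN) transforms the feature of each support image to match the query image in the embedding space, explicitly reducing the disparity within each category. It also generates discriminative fine-grained features by integrating compositional concept representations into global second-order pooling. Experiments on four fine-grained benchmarks compare TOAN with state-of-the-art models.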

15. Explainable deep learning models in medical image analysis [PDF] Back to contents
  Amitojdeep Singh, Sourya Sengupta, Vasudevan Lakshminarayanan
Abstract: Deep learning methods have been very effective for a variety of medical diagnostic tasks and has even beaten human experts on some of those. However, the black-box nature of the algorithms has restricted clinical use. Recent explainability studies aim to show the features that influence the decision of a model the most. The majority of literature reviews of this area have focused on taxonomy, ethics, and the need for explanations. A review of the current applications of explainable deep learning for different medical imaging tasks is presented here. The various approaches, challenges for clinical deployment, and the areas requiring further research are discussed here from a practical standpoint of a deep learning researcher designing a system for the clinical end-users.
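Summary: Deep learning methods have been very effective for many medical diagnostic tasks, even outperforming human experts on some, but their black-box nature has restricted clinical use. Whereas most literature reviews of explainability focus on taxonomy, ethics, and the need for explanations, this review covers current applications of explainable deep learning to medical imaging tasks. Approaches, challenges for clinical deployment, and areas needing further research are discussed from the practical standpoint of a deep learning researcher designing a system for clinical end-users.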

16. 3D human pose estimation with adaptive receptive fields and dilated temporal convolutions [PDF] Back to contents
  Michael Shin, Eduardo Castillo, Irene Font Peradejordi, Shobhna Jayaraman
Abstract: In this work, we demonstrate that receptive fields in 3D pose estimation can be effectively specified using optical flow. We introduce adaptive receptive fields, a simple and effective method to aid receptive field selection in pose estimation models based on optical flow inference. We contrast the performance of a benchmark state-of-the-art model running on fixed receptive fields with their adaptive field counterparts. By using a reduced receptive field, our model can process slow-motion sequences (10x longer) 23% faster than the benchmark model running at regular speed. The reduction in computational cost is achieved while producing a pose prediction accuracy to within 0.36% of the benchmark model.
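Summary: The authors show that receptive fields in 3D pose estimation can be specified effectively using optical flow. Their adaptive receptive fields select the receptive field of a pose estimation model based on optical flow inference. Compared with a state-of-the-art benchmark model using fixed receptive fields, the model with a reduced receptive field processes slow-motion sequences (10x longer) 23% faster than the benchmark running at regular speed, with pose prediction accuracy within 0.36% of the benchmark.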

17. Stereo Vision Based Single-Shot 6D Object Pose Estimation for Bin-Picking by a Robot Manipulator [PDF] Back to contents
  Yoshihiro Nakano
Abstract: We propose a fast and accurate method of 6D object pose estimation for bin-picking of mechanical parts by a robot manipulator. We extend the single-shot approach to stereo vision by application of attention architecture. Our convolutional neural network model regresses to object locations and rotations from either a left image or a right image without depth information. Then, a stereo feature matching module, designated as Stereo Grid Attention, generates stereo grid matching maps. The important point of our method is only to calculate disparity of the objects found by the attention from stereo images, instead of calculating a point cloud over the entire image. The disparity value is then used to calculate the depth to the objects by the principle of triangulation. Our method also achieves a rapid processing speed of pose estimation by the single-shot architecture and it is possible to process a 1024 x 1024 pixels image in 75 milliseconds on the Jetson AGX Xavier implemented with half-float model. Weakly textured mechanical parts are used to exemplify the method. First, we create original synthetic datasets for training and evaluating of the proposed model. This dataset is created by capturing and rendering numerous 3D models of several types of mechanical parts in virtual space. Finally, we use a robotic manipulator with an electromagnetic gripper to pick up the mechanical parts in a cluttered state to verify the validity of our method in an actual scene. When a raw stereo image is used by the proposed method from our stereo camera to detect black steel screws, stainless screws, and DC motor parts, i.e., cases, rotor cores and commutator caps, the bin-picking tasks are successful with 76.3%, 64.0%, 50.5%, 89.1% and 64.2% probability, respectively.
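Summary: A fast, accurate 6D pose estimation method for robotic bin-picking of mechanical parts. A single-shot CNN regresses object locations and rotations from either the left or the right image without depth input; a Stereo Grid Attention module then matches stereo features so that disparity is computed only for the detected objects rather than as a point cloud over the whole image, and depth follows by triangulation. The half-float model processes a 1024 x 1024 image in 75 ms on a Jetson AGX Xavier. Trained and evaluated on synthetic data rendered from 3D models of weakly textured parts, the method picked black steel screws, stainless screws, and DC motor cases, rotor cores and commutator caps from clutter with 76.3%, 64.0%, 50.5%, 89.1% and 64.2% success, respectively, using a robot with an electromagnetic gripper.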
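
The abstract states that depth is obtained from disparity "by the principle of triangulation". For a rectified stereo pair this is commonly written as Z = f * B / d. The sketch below illustrates only that textbook relation; the function name and the camera parameters are assumptions for illustration and are not taken from the paper.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair (standard triangulation)."""
    if disparity_px <= 0:
        # Zero or negative disparity has no finite positive depth.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 1000 px focal length, 0.125 m baseline, 50 px disparity.
print(depth_from_disparity(1000.0, 0.125, 50.0))
```

Because depth is inversely proportional to disparity, computing disparity only for the detected objects, as the abstract describes, is enough to place each object in 3D.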

18. Universal Lesion Detection by Learning from Multiple Heterogeneously Labeled Datasets [PDF] Back to contents
  Ke Yan, Jinzheng Cai, Adam P. Harrison, Dakai Jin, Jing Xiao, Le Lu
Abstract: Lesion detection is an important problem within medical imaging analysis. Most previous work focuses on detecting and segmenting a specialized category of lesions (e.g., lung nodules). However, in clinical practice, radiologists are responsible for finding all possible types of anomalies. The task of universal lesion detection (ULD) was proposed to address this challenge by detecting a large variety of lesions from the whole body. There are multiple heterogeneously labeled datasets with varying label completeness: DeepLesion, the largest dataset of 32,735 annotated lesions of various types, but with even more missing annotation instances; and several fully-labeled single-type lesion datasets, such as LUNA for lung nodules and LiTS for liver tumors. In this work, we propose a novel framework to leverage all these datasets together to improve the performance of ULD. First, we learn a multi-head multi-task lesion detector using all datasets and generate lesion proposals on DeepLesion. Second, missing annotations in DeepLesion are retrieved by a new method of embedding matching that exploits clinical prior knowledge. Last, we discover suspicious but unannotated lesions using knowledge transfer from single-type lesion detectors. In this way, reliable positive and negative regions are obtained from partially-labeled and unlabeled images, which are effectively utilized to train ULD. To assess the clinically realistic protocol of 3D volumetric ULD, we fully annotated 1071 CT sub-volumes in DeepLesion. Our method outperforms the current state-of-the-art approach by 29% in the metric of average sensitivity.
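Summary: Universal lesion detection (ULD) aims to find all lesion types across the whole body, as radiologists do in practice. The proposed framework combines heterogeneously labeled datasets: DeepLesion (32,735 annotated lesions of various types, with even more missing annotations) and fully labeled single-type datasets such as LUNA (lung nodules) and LiTS (liver tumors). It trains a multi-head multi-task detector on all datasets to generate proposals on DeepLesion, retrieves missing annotations by embedding matching that exploits clinical prior knowledge, and uses knowledge transfer from single-type detectors to find suspicious unannotated lesions. The resulting reliable positive and negative regions are used to train ULD. On 1071 fully annotated CT sub-volumes from DeepLesion, the method outperforms the current state of the art by 29% in average sensitivity.

19. L^2UWE: A Framework for the Efficient Enhancement of Low-Light Underwater Images Using Local Contrast and Multi-Scale Fusion [PDF] Back to contents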
  Tunai Porto Marques, Alexandra Branzan Albu
Abstract: Images captured underwater often suffer from suboptimal illumination settings that can hide important visual features, reducing their quality. We present a novel single-image low-light underwater image enhancer, L^2UWE, that builds on our observation that an efficient model of atmospheric lighting can be derived from local contrast information. We create two distinct models and generate two enhanced images from them: one that highlights finer details, the other focused on darkness removal. A multi-scale fusion process is employed to combine these images while emphasizing regions of higher luminance, saliency and local contrast. We demonstrate the performance of L^2UWE by using seven metrics to test it against seven state-of-the-art enhancement methods specific to underwater and low-light scenes.
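Summary: L^2UWE is a single-image enhancer for low-light underwater images, built on the observation that an efficient model of atmospheric lighting can be derived from local contrast information. Two distinct models yield two enhanced images, one highlighting finer details and one focused on darkness removal, which are combined by multi-scale fusion that emphasizes regions of higher luminance, saliency and local contrast. It is tested with seven metrics against seven state-of-the-art enhancement methods for underwater and low-light scenes.

20. Anomaly Detection Based on Deep Learning Using Video for Prevention of Industrial Accidents [PDF] Back to contents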
  Satoshi Hashimoto, Yonghoon Ji, Kenichi Kudo, Takayuki Takahashi, Kazunori Umeda
Abstract: This paper proposes an anomaly detection method for the prevention of industrial accidents using machine learning technology.
Summary: This paper proposes an anomaly detection method that uses machine learning to prevent industrial accidents.

21. Few-Shot Open-Set Recognition using Meta-Learning [PDF] Back to contents
  Bo Liu, Hao Kang, Haoxiang Li, Gang Hua, Nuno Vasconcelos
Abstract: The problem of open-set recognition is considered. While previous approaches only consider this problem in the context of large-scale classifier training, we seek a unified solution for this and the low-shot classification setting. It is argued that the classic softmax classifier is a poor solution for open-set recognition, since it tends to overfit on the training classes. Randomization is then proposed as a solution to this problem. This suggests the use of meta-learning techniques, commonly used for few-shot classification, for the solution of open-set recognition. A new oPen sEt mEta LEaRning (PEELER) algorithm is then introduced. This combines the random selection of a set of novel classes per episode, a loss that maximizes the posterior entropy for examples of those classes, and a new metric learning formulation based on the Mahalanobis distance. Experimental results show that PEELER achieves state of the art open set recognition performance for both few-shot and large-scale recognition. On CIFAR and miniImageNet, it achieves substantial gains in seen/unseen class detection AUROC for a given seen-class classification accuracy.
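Summary: The paper seeks a unified solution to open-set recognition in both the large-scale and few-shot classification settings. It argues that the classic softmax classifier is a poor fit for open-set recognition because it tends to overfit the training classes, and proposes randomization as a remedy, which motivates meta-learning. The resulting oPen sEt mEta LEaRning (PEELER) algorithm combines random selection of a set of novel classes per episode, a loss that maximizes the posterior entropy for examples of those classes, and a new metric learning formulation based on the Mahalanobis distance. PEELER achieves state-of-the-art open-set recognition for both few-shot and large-scale recognition, with substantial gains in seen/unseen class detection AUROC on CIFAR and miniImageNet at a given seen-class classification accuracy.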
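
The abstract mentions a loss that maximizes the posterior entropy for examples of novel classes. The sketch below only illustrates what "posterior entropy" means for a softmax output; it is not the PEELER loss, and the helper names and logits are made up for illustration. A uniform posterior over K classes has the largest possible entropy, log(K), which is the behaviour one would want for an input that belongs to none of the seen classes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits: a confident prediction versus a maximally uncertain one.
confident = softmax([10.0, 0.0, 0.0, 0.0])
uncertain = softmax([0.0, 0.0, 0.0, 0.0])
print(entropy(confident), entropy(uncertain))
```

Under this view, a low-entropy posterior signals a seen class and a high-entropy posterior signals a possible unseen one.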

22. AFAT: Adaptive Failure-Aware Tracker for Robust Visual Object Tracking [PDF] Back to contents
  Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler
Abstract: Siamese approaches have achieved promising performance in visual object tracking recently. The key to the success of Siamese trackers is to learn appearance-invariant feature embedding functions via pair-wise offline training on large-scale video datasets. However, the Siamese paradigm uses one-shot learning to model the online tracking task, which impedes online adaptation in the tracking process. Additionally, the uncertainty of an online tracking response is not measured, leading to the problem of ignoring potential failures. In this paper, we advocate online adaptation in the tracking stage. To this end, we propose a failure-aware system, realised by a Quality Prediction Network (QPN), based on convolutional and LSTM modules in the decision stage, enabling online reporting of potential tracking failures. Specifically, sequential response maps from previous successive frames as well as the current frame are collected to predict the tracking confidence, realising spatio-temporal fusion at the decision level. In addition, we further provide an Adaptive Failure-Aware Tracker (AFAT) by combining the state-of-the-art Siamese trackers with our system. The experimental results obtained on standard benchmarking datasets demonstrate the effectiveness of the proposed failure-aware system and the merits of our AFAT tracker, with outstanding and balanced performance in both accuracy and speed.
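Summary: Siamese trackers learn appearance-invariant feature embeddings through pair-wise offline training on large-scale video datasets, but their one-shot formulation impedes online adaptation and they do not measure the uncertainty of the tracking response, so potential failures are ignored. The paper advocates online adaptation during tracking and proposes a failure-aware system, a Quality Prediction Network (QPN) built from convolutional and LSTM modules, that reports potential tracking failures online. It predicts tracking confidence from the response maps of previous successive frames and the current frame, achieving spatio-temporal fusion at the decision level. Combining state-of-the-art Siamese trackers with this system gives the Adaptive Failure-Aware Tracker (AFAT), which shows outstanding and balanced accuracy and speed on standard benchmarks.

23. Detecting Scatteredly-Distributed, Small, and Critically Important Objects in 3D Oncology Imaging via Decision Stratification [PDF] Back to contents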
  Zhuotun Zhu, Ke Yan, Dakai Jin, Jinzheng Cai, Tsung-Ying Ho, Adam P Harrison, Dazhou Guo, Chun-Hung Chao, Xianghua Ye, Jing Xiao, Alan Yuille, Le Lu
Abstract: Finding and identifying scatteredly-distributed, small, and critically important objects in 3D oncology images is very challenging. We focus on the detection and segmentation of oncology-significant (or suspicious cancer metastasized) lymph nodes (OSLNs), which has not been studied before as a computational task. Determining and delineating the spread of OSLNs is essential in defining the corresponding resection/irradiating regions for the downstream workflows of surgical resection and radiotherapy of various cancers. For patients who are treated with radiotherapy, this task is performed by experienced radiation oncologists and involves high-level reasoning on whether LNs are metastasized, which is subject to high inter-observer variations. In this work, we propose a divide-and-conquer decision stratification approach that divides OSLNs into tumor-proximal and tumor-distal categories. This is motivated by the observation that each category has its own underlying distributions in appearance, size and other characteristics. Two separate detection-by-segmentation networks are trained per category and fused. To further reduce false positives (FP), we present a novel global-local network (GLNet) that combines high-level lesion characteristics with features learned from localized 3D image patches. Our method is evaluated on a dataset of 141 esophageal cancer patients with PET and CT modalities (the largest to date). Our results significantly improve the recall from 45% to 67% at 3 FPs per patient as compared to previous state-of-the-art methods. The highest achieved OSLN recall of 0.828 is clinically relevant and valuable.
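Summary: The paper addresses detection and segmentation of oncology-significant (suspicious metastasized) lymph nodes (OSLNs) in 3D images, a task not previously studied computationally and essential for defining resection and irradiation regions. Manual delineation by radiation oncologists requires high-level reasoning and is subject to high inter-observer variation. The proposed divide-and-conquer decision stratification splits OSLNs into tumor-proximal and tumor-distal categories, which differ in appearance, size and other characteristics, and trains and fuses two separate detection-by-segmentation networks. A global-local network (GLNet) that combines high-level lesion characteristics with features from localized 3D image patches further reduces false positives. On a dataset of 141 esophageal cancer patients with PET and CT (the largest to date), recall improves from 45% to 67% at 3 false positives per patient over previous state-of-the-art methods, with a highest OSLN recall of 0.828.

24. D2D: Keypoint Extraction with Describe to Detect Approach [PDF] Back to contents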
  Yurun Tian, Vassileios Balntas, Tony Ng, Axel Barroso-Laguna, Yiannis Demiris, Krystian Mikolajczyk
Abstract: In this paper, we present a novel approach that exploits the information within the descriptor space to propose keypoint locations. Detect then describe, or detect and describe jointly are two typical strategies for extracting local descriptors. In contrast, we propose an approach that inverts this process by first describing and then detecting the keypoint locations. Describe-to-Detect (D2D) leverages successful descriptor models without the need for any additional training. Our method selects keypoints as salient locations with high information content which is defined by the descriptors rather than some independent operators. We perform experiments on multiple benchmarks including image matching, camera localisation, and 3D reconstruction. The results indicate that our method improves the matching performance of various descriptors and that it generalises across methods and tasks.
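Summary: Describe-to-Detect (D2D) uses information in the descriptor space to propose keypoint locations. Instead of the usual detect-then-describe or joint detect-and-describe strategies, it first describes and then detects, reusing successful descriptor models without additional training. Keypoints are chosen as salient locations with high information content as defined by the descriptors rather than by independent operators. Experiments on image matching, camera localisation and 3D reconstruction indicate that the method improves the matching performance of various descriptors and generalises across methods and tasks.

25. Network Fusion for Content Creation with Conditional INNs [PDF] Back to contents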
  Robin Rombach, Patrick Esser, Björn Ommer
Abstract: Artificial Intelligence for Content Creation has the potential to reduce the amount of manual content creation work significantly. While automation of laborious work is welcome, it is only useful if it allows users to control aspects of the creative process when desired. Furthermore, widespread adoption of semi-automatic content creation depends on low barriers regarding the expertise, computational budget and time required to obtain results and experiment with new techniques. With state-of-the-art approaches relying on task-specific models, multi-GPU setups and weeks of training time, we must find ways to reuse and recombine them to meet these requirements. Instead of designing and training methods for controllable content creation from scratch, we thus present a method to repurpose powerful, existing models for new tasks, even though they have never been designed for them. We formulate this problem as a translation between expert models, which includes common content creation scenarios, such as text-to-image and image-to-image translation, as a special case. As this translation is ambiguous, we learn a generative model of hidden representations of one expert conditioned on hidden representations of the other expert. Working on the level of hidden representations makes optimal use of the computational effort that went into the training of the expert model to produce these efficient, low-dimensional representations. Experiments demonstrate that our approach can translate from BERT, a state-of-the-art expert for text, to BigGAN, a state-of-the-art expert for images, to enable text-to-image generation, which neither of the experts can perform on its own. Additional experiments show the wide applicability of our approach across different conditional image synthesis tasks and improvements over existing methods for image modifications.
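Summary: AI for content creation can significantly cut manual work, but it is useful only if users keep control over the creative process, and adoption depends on low barriers in expertise, compute and time. Because state-of-the-art approaches rely on task-specific models, multi-GPU setups and weeks of training, the paper proposes repurposing existing powerful models for tasks they were never designed for. The problem is framed as translation between expert models, with text-to-image and image-to-image translation as special cases. Since the translation is ambiguous, the method learns a generative model of one expert's hidden representations conditioned on those of the other, reusing the compute already spent on producing these efficient low-dimensional representations. Experiments translate from BERT to BigGAN to enable text-to-image generation, which neither expert can do alone, and show wide applicability across conditional image synthesis tasks and improvements over existing image modification methods.

26. QEBA: Query-Efficient Boundary-Based Blackbox Attack [PDF] Back to contents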
  Huichen Li, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, Bo Li
Abstract: Machine learning (ML), especially deep neural networks (DNNs), has been widely used in various applications, including several safety-critical ones (e.g. autonomous driving). As a result, recent research about adversarial examples has raised great concerns. Such adversarial attacks can be achieved by adding a small magnitude of perturbation to the input to mislead model prediction. While several whitebox attacks, which assume that the attackers have full access to the machine learning models, have demonstrated their effectiveness, blackbox attacks are more realistic in practice. In this paper, we propose a Query-Efficient Boundary-based blackbox Attack (QEBA) based only on the model's final prediction labels. We theoretically show why previous boundary-based attacks with gradient estimation on the whole gradient space are not efficient in terms of query numbers, and provide optimality analysis for our dimension reduction-based gradient estimation. We also conducted extensive experiments on ImageNet and CelebA datasets to evaluate QEBA. We show that compared with the state-of-the-art blackbox attacks, QEBA is able to use a smaller number of queries to achieve a lower magnitude of perturbation with 100% attack success rate. We also show case studies of attacks on real-world APIs including MEGVII Face++ and Microsoft Azure.
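Summary: Adversarial examples, which add a small perturbation to the input to mislead a model, are a serious concern for safety-critical uses of deep neural networks such as autonomous driving. Whitebox attacks assume full access to the model, so blackbox attacks are more realistic in practice. QEBA is a Query-Efficient Boundary-based blackbox Attack that relies only on the model's final prediction labels. The paper shows theoretically why earlier boundary-based attacks that estimate gradients over the whole gradient space are query-inefficient and gives an optimality analysis for its dimension-reduction-based gradient estimation. On ImageNet and CelebA, QEBA needs fewer queries than state-of-the-art blackbox attacks to reach a lower perturbation magnitude with a 100% attack success rate; case studies attack real-world APIs including MEGVII Face++ and Microsoft Azure.

27. Heatmap-Based Method for Estimating Drivers' Cognitive Distraction [PDF] Back to contents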
  Antonyo Musabini, Mounsif Chetitah
Abstract: In order to increase road safety, among the visual and manual distractions, modern intelligent vehicles need also to detect cognitive distracted driving (i.e., the driver's mind wandering). In this study, the influence of cognitive processes on the driver's gaze behavior is explored. A novel image-based representation of the driver's eye-gaze dispersion is proposed to estimate cognitive distraction. Data are collected on open highway roads, with a tailored protocol to create cognitive distraction. The visual difference of created shapes shows that a driver explores a wider area in neutral driving compared to distracted driving. Thus, support vector machine (SVM)-based classifiers are trained, and 85.2% accuracy is achieved for a two-class problem, even with a small dataset. Thus, the proposed method has the discriminative power to recognize cognitive distraction using gaze information. Finally, this work details how this image-based representation could be useful for other cases of distracted driving detection.
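Summary: Beyond visual and manual distraction, intelligent vehicles also need to detect cognitive distraction (the driver's mind wandering). The study explores how cognitive processes influence gaze behavior and proposes a novel image-based representation of the driver's eye-gaze dispersion. Data were collected on open highways with a protocol tailored to induce cognitive distraction. The resulting shapes show that a driver explores a wider area in neutral driving than in distracted driving. SVM-based classifiers reach 85.2% accuracy on the two-class problem even with a small dataset, and the paper details how the representation could help detect other kinds of distracted driving.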
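
The abstract describes an image-based representation of gaze dispersion but does not give its construction. One simple way to picture the idea is a 2D histogram of gaze points, where a wider scan pattern occupies more cells. The sketch below is only that illustration: the function names, grid size and sample points are assumptions, not the authors' representation.

```python
def gaze_heatmap(points, width, height, bins_x, bins_y):
    """Count gaze points per cell of a bins_y x bins_x grid over a width x height view."""
    grid = [[0] * bins_x for _ in range(bins_y)]
    for x, y in points:
        # Points outside the field of view are ignored.
        if 0 <= x < width and 0 <= y < height:
            col = int(x * bins_x / width)
            row = int(y * bins_y / height)
            grid[row][col] += 1
    return grid


def occupied_cells(grid):
    """Number of grid cells that received at least one gaze point."""
    return sum(1 for row in grid for count in row if count > 0)


# Hypothetical gaze samples in a 100 x 100 view, binned on a 10 x 10 grid.
samples = [(10, 10), (12, 11), (90, 90), (150, 20)]
heatmap = gaze_heatmap(samples, 100, 100, 10, 10)
print(occupied_cells(heatmap))
```

A classifier such as an SVM could then be trained on the flattened grid, which is consistent with the two-class setup the abstract reports.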

28. A Normalized Fully Convolutional Approach to Head and Neck Cancer Outcome Prediction [PDF] Back to contents
  William Le, Francisco Perdigón Romero
Abstract: In medical imaging, radiological scans of different modalities serve to enhance different sets of features for clinical diagnosis and treatment planning. This variety enriches the source information that could be used for outcome prediction. Deep learning methods are particularly well-suited for feature extraction from high-dimensional inputs such as images. In this work, we apply a CNN classification network augmented with a FCN preprocessor sub-network to a public TCIA head and neck cancer dataset. The training goal is survival prediction of radiotherapy cases based on pre-treatment FDG PET-CT scans, acquired across 4 different hospitals. We show that the preprocessor sub-network in conjunction with aggregated residual connection leads to improvements over state-of-the-art results when combining both CT and PET input images.
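Summary: Radiological scans of different modalities highlight different features, enriching the information available for outcome prediction, and deep learning is well suited to extracting features from such high-dimensional inputs. The paper applies a CNN classification network augmented with an FCN preprocessor sub-network to a public TCIA head and neck cancer dataset. The goal is survival prediction for radiotherapy cases from pre-treatment FDG PET-CT scans acquired across 4 hospitals. The preprocessor sub-network together with aggregated residual connections improves on state-of-the-art results when CT and PET inputs are combined.

29. Perception-aware time optimal path parameterization for quadrotors [PDF] Back to contents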
  Igor Spasojevic, Varun Murali, Sertac Karaman
Abstract: The increasing popularity of quadrotors has given rise to a class of predominantly vision-driven vehicles. This paper addresses the problem of perception-aware time optimal path parametrization for quadrotors. Although many different choices of perceptual modalities are available, the low weight and power budgets of quadrotor systems makes a camera ideal for on-board navigation and estimation algorithms. However, this does come with a set of challenges. The limited field of view of the camera can restrict the visibility of salient regions in the environment, which dictates the necessity to consider perception and planning jointly. The main contribution of this paper is an efficient time optimal path parametrization algorithm for quadrotors with limited field of view constraints. We show in a simulation study that a state-of-the-art controller can track planned trajectories, and we validate the proposed algorithm on a quadrotor platform in experiments.
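Summary: The paper addresses perception-aware time-optimal path parametrization for quadrotors. The low weight and power budgets of quadrotors make a camera ideal for on-board navigation and estimation, but its limited field of view can restrict the visibility of salient regions of the environment, so perception and planning must be considered jointly. The main contribution is an efficient time-optimal path parametrization algorithm under limited field-of-view constraints. A simulation study shows that a state-of-the-art controller can track the planned trajectories, and experiments validate the algorithm on a quadrotor platform.

30. Early Screening of SARS-CoV-2 by Intelligent Analysis of X-Ray Images [PDF] Back to contents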
  D. Gil, K. Díaz-Chito, C. Sánchez, A. Hernández-Sabaté
Abstract: Future SARS-CoV-2 virus outbreaks (COVID-XX) might occur during the next years. However, the pathology in humans is so recent that many clinical aspects, like early detection of complications, side effects after recovery or early screening, are currently unknown. In spite of the number of cases of COVID-19, its rapid spread, which put many sanitary systems on the edge of collapse, has hindered proper collection and analysis of the data related to COVID-19 clinical aspects. We describe an interdisciplinary initiative that integrates clinical research with image diagnostics and the use of new technologies such as artificial intelligence and radiomics, with the aim of clarifying some of the open questions about SARS-CoV-2. The whole initiative addresses 3 main points: 1) collection of standardized data including images, clinical data and analytics; 2) COVID-19 screening for its early diagnosis at primary care centers; 3) definition of radiomic signatures of COVID-19 evolution and associated pathologies for the early treatment of complications. In particular, in this paper we present a general overview of the project, the experimental design and first results of X-ray COVID-19 detection using a classic approach based on HoG and feature selection. Our experiments include a comparison to some recent methods for COVID-19 screening in X-ray and an exploratory analysis of the feasibility of X-ray COVID-19 screening. Results show that classic approaches can outperform deep-learning methods in this experimental setting, indicate the feasibility of early COVID-19 screening, and show that non-COVID infiltration is the group of patients most similar to COVID-19 in terms of radiological description of X-ray. Therefore, an efficient COVID-19 screening should be complemented with other clinical data to better discriminate these cases.
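Summary: Future SARS-CoV-2 outbreaks (COVID-XX) may occur in the coming years, yet many clinical aspects, such as early detection of complications, side effects after recovery and early screening, remain unknown, and the rapid spread of COVID-19 has hindered proper data collection and analysis. The paper describes an interdisciplinary initiative combining clinical research, image diagnostics, artificial intelligence and radiomics. It covers three points: 1) collecting standardized data including images, clinical data and analytics; 2) COVID-19 screening for early diagnosis at primary care centers; 3) defining radiomic signatures of COVID-19 evolution and associated pathologies for early treatment of complications. The paper gives a project overview, the experimental design and first results of X-ray COVID-19 detection with a classic approach based on HoG and feature selection. Results show that classic approaches can outperform deep-learning methods in this setting, that early screening is feasible, and that non-COVID infiltration is the patient group radiologically most similar to COVID-19, so screening should be complemented with other clinical data.

31. Deep Learning for Automatic Pneumonia Detection [PDF] Back to contents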
  Tatiana Gabruseva, Dmytro Poplavskiy, Alexandr A. Kalinin
Abstract: Pneumonia is the leading cause of death among young children and one of the top causes of mortality worldwide. Pneumonia detection is usually performed through examination of chest X-ray radiographs by highly trained specialists. This process is tedious and often leads to disagreement between radiologists. Computer-aided diagnosis systems have shown the potential to improve diagnostic accuracy. In this work, we develop a computational approach for pneumonia region detection based on single-shot detectors, squeeze-and-excitation deep convolutional neural networks, augmentations and multi-task learning. The proposed approach was evaluated in the context of the Radiological Society of North America Pneumonia Detection Challenge, achieving one of the best results in the challenge.
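Summary: Pneumonia is the leading cause of death among young children and one of the top causes of mortality worldwide. Detection usually relies on highly trained specialists examining chest X-ray radiographs, a tedious process that often leads to disagreement between radiologists. The paper develops a computational approach to detecting pneumonia regions based on single-shot detectors, squeeze-and-excitation deep convolutional neural networks, augmentations and multi-task learning. Evaluated in the Radiological Society of North America Pneumonia Detection Challenge, it achieved one of the best results.

32. Learning Various Length Dependence by Dual Recurrent Neural Networks [PDF] Back to contents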
  Chenpeng Zhang, Shuai Li, Mao Ye, Ce Zhu, Xue Li
Abstract: Recurrent neural networks (RNNs) are widely used as a memory model for sequence-related problems. Many variants of RNN have been proposed to solve the gradient problems of training RNNs and process long sequences. Although some classical models have been proposed, capturing long-term dependence while responding to short-term changes remains a challenge. To this problem, we propose a new model named Dual Recurrent Neural Networks (DuRNN). The DuRNN consists of two parts to learn the short-term dependence and progressively learn the long-term dependence. The first part is a recurrent neural network with constrained full recurrent connections to deal with short-term dependence in sequence and generate short-term memory. Another part is a recurrent neural network with independent recurrent connections which helps to learn long-term dependence and generate long-term memory. A selection mechanism is added between two parts to help the needed long-term information transfer to the independent neurons. Multiple modules can be stacked to form a multi-layer model for better performance. Our contributions are: 1) a new recurrent model developed based on the divide-and-conquer strategy to learn long and short-term dependence separately, and 2) a selection mechanism to enhance the separating and learning of different temporal scales of dependence. Both theoretical analysis and extensive experiments are conducted to validate the performance of our model, and we also conduct simple visualization experiments and ablation analyses for the model interpretability. Experimental results indicate that the proposed DuRNN model can handle not only very long sequences (over 5000 time steps), but also short sequences very well. Compared with many state-of-the-art RNN models, our model has demonstrated efficient and better performance.
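Summary: Recurrent neural networks (RNNs) are widely used as memory models for sequence problems, but capturing long-term dependence while responding to short-term changes remains a challenge. Dual Recurrent Neural Networks (DuRNN) has two parts: a recurrent network with constrained full recurrent connections that handles short-term dependence and generates short-term memory, and a recurrent network with independent recurrent connections that learns long-term dependence and generates long-term memory. A selection mechanism between the two parts helps transfer the needed long-term information to the independent neurons, and multiple modules can be stacked into a multi-layer model. Theoretical analysis, experiments, visualizations and ablations indicate that DuRNN handles both very long sequences (over 5000 time steps) and short sequences well, and performs better and more efficiently than many state-of-the-art RNN models.

33. A Feature-map Discriminant Perspective for Pruning Deep Neural Networks [PDF] Back to contents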
  Zejiang Hou, Sun-Yuan Kung
Abstract: Network pruning has become the de facto tool to accelerate deep neural networks for mobile and edge applications. Recently, feature-map discriminant based channel pruning has shown promising results, as it aligns well with the CNN objective of differentiating multiple classes and offers better interpretability of the pruning decision. However, existing discriminant-based methods are challenged by computation inefficiency, as there is a lack of theoretical guidance on quantifying the feature-map discriminant power. In this paper, we present a new mathematical formulation to accurately and efficiently quantify the feature-map discriminativeness, which gives rise to a novel criterion, Discriminant Information (DI). We analyze the theoretical property of DI, specifically the non-decreasing property, that makes DI a valid selection criterion. DI-based pruning removes channels with minimum influence on the DI value, as they contain little information regarding the discriminant power. The versatility of the DI criterion also enables an intra-layer mixed precision quantization to further compress the network. Moreover, we propose a DI-based greedy pruning algorithm and structure distillation technique to automatically decide the pruned structure that satisfies a certain resource budget, which is a common requirement in reality. Extensive experiments demonstrate the effectiveness of our method: our pruned ResNet50 on ImageNet achieves 44% FLOPs reduction without any Top-1 accuracy loss compared to the unpruned model.
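Summary: Network pruning is the de facto tool for accelerating deep neural networks for mobile and edge applications. Channel pruning based on feature-map discriminants aligns well with the CNN objective of separating classes and makes pruning decisions more interpretable, but existing methods are computationally inefficient and lack theoretical guidance for quantifying discriminant power. The paper presents a new formulation that quantifies feature-map discriminativeness accurately and efficiently, yielding the Discriminant Information (DI) criterion, whose non-decreasing property makes it a valid selection criterion. DI-based pruning removes the channels with the least influence on the DI value. The criterion also enables intra-layer mixed-precision quantization, and a DI-based greedy pruning algorithm with structure distillation automatically finds a pruned structure that meets a resource budget. The pruned ResNet50 on ImageNet achieves a 44% FLOPs reduction with no Top-1 accuracy loss compared to the unpruned model.

34. Graph-based Proprioceptive Localization Using a Discrete Heading-Length Feature Sequence Matching Approach [PDF] Back to contents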
  Hsin-Min Cheng, Dezhen Song
Abstract: Proprioceptive localization refers to a new class of robot egocentric localization methods that do not rely on the perception and recognition of external landmarks. These methods are naturally immune to bad weather, poor lighting conditions, or other extreme environmental conditions that may hinder exteroceptive sensors such as a camera or a laser range finder. These methods depend on proprioceptive sensors such as inertial measurement units (IMUs) and/or wheel encoders. Assisted by magnetoreception, the sensors can provide a rudimentary estimation of vehicle trajectory which is used to query a prior known map to obtain location. Named graph-based proprioceptive localization (GBPL), our method provides a low-cost fallback solution for localization under challenging environmental conditions. As a robot/vehicle travels, we extract a sequence of heading-length values for straight segments from the trajectory and match the sequence with a pre-processed heading-length graph (HLG) abstracted from the prior known map to localize the robot under a graph-matching approach. Using the information from HLG, our location alignment and verification module compensates for trajectory drift, wheel slip, or tire inflation level. We have implemented our algorithm and tested it in both simulated and physical experiments. The algorithm runs successfully in finding robot location continuously and achieves localization accuracy at the level that the prior map allows (less than 10 m).
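Summary: Proprioceptive localization is a class of egocentric robot localization methods that do not rely on perceiving external landmarks, so they are immune to bad weather, poor lighting and other conditions that hinder exteroceptive sensors such as cameras or laser range finders. They depend on proprioceptive sensors such as inertial measurement units (IMUs) and/or wheel encoders which, assisted by magnetoreception, give a rudimentary trajectory estimate used to query a prior map. Graph-based proprioceptive localization (GBPL) is a low-cost fallback for challenging conditions: it extracts a sequence of heading-length values for straight segments from the trajectory and matches it against a pre-processed heading-length graph (HLG) abstracted from the prior map. A location alignment and verification module uses HLG information to compensate for trajectory drift, wheel slip and tire inflation level. In simulated and physical experiments the algorithm localizes the robot continuously, with accuracy at the level the prior map allows (less than 10 m).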
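
The abstract says a sequence of heading-length values is extracted for the straight segments of the trajectory, without giving the procedure. The sketch below shows one plausible way to form such a sequence from odometry samples. Everything in it is an assumption for illustration: the function name, the input format of (heading in degrees, step length in metres) pairs, the tolerance and the minimum length. It also ignores heading wraparound at 360 degrees for brevity.

```python
def heading_length_sequence(samples, tol_deg=5.0, min_len=1.0):
    """Merge consecutive (heading_deg, step_m) samples into straight segments.

    A segment continues while the heading stays within tol_deg of the heading
    at which the segment started; segments shorter than min_len are treated
    as turns and dropped.
    """
    segments = []
    cur_heading, cur_len = None, 0.0
    for heading, step in samples:
        if cur_heading is not None and abs(heading - cur_heading) <= tol_deg:
            cur_len += step  # still on the same straight segment
        else:
            if cur_heading is not None and cur_len >= min_len:
                segments.append((cur_heading, cur_len))
            cur_heading, cur_len = heading, step  # start a new segment
    if cur_heading is not None and cur_len >= min_len:
        segments.append((cur_heading, cur_len))
    return segments


# Hypothetical odometry: east for 4 m, a brief turn, then north for 6 m.
odometry = [(0.0, 2.0), (1.0, 2.0), (45.0, 0.5), (90.0, 3.0), (91.0, 3.0)]
print(heading_length_sequence(odometry))
```

The resulting list of (heading, length) pairs is the kind of sequence that could be matched against a map-derived graph; the matching step itself is not sketched here.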

35. Towards the Infeasibility of Membership Inference on Deep Models [PDF] Back to contents
  Shahbaz Rezaei, Xin Liu
Abstract: Recent studies propose membership inference (MI) attacks on deep models. Despite the moderate accuracy of such MI attacks, we show that the way the attack accuracy is reported is often misleading and a simple blind attack which is highly unreliable and inefficient in reality can often represent similar accuracy. We show that the current MI attack models can only identify the membership of misclassified samples with mediocre accuracy at best, which only constitute a very small portion of training samples. We analyze several new features that have not been explored for membership inference before, including distance to the decision boundary and gradient norms, and conclude that deep models' responses are mostly indistinguishable among train and non-train samples. Moreover, in contrast with general intuition that deeper models have a capacity to memorize training samples, and, hence, they are more vulnerable to membership inference, we find no evidence to support that and in some cases deeper models are often harder to launch membership inference attack on. Furthermore, despite the common belief, we show that overfitting does not necessarily lead to higher degree of membership leakage. We conduct experiments on MNIST, CIFAR-10, CIFAR-100, and ImageNet, using various model architectures, including LeNet, ResNet, DenseNet, InceptionV3, and Xception. Source code: this https URL.
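Summary: Recent work proposes membership inference (MI) attacks on deep models, but the way their accuracy is reported is often misleading: a simple blind attack that is unreliable in practice can reach similar accuracy. Current attacks identify membership only for misclassified samples, a very small portion of the training set, and only with mediocre accuracy. New features such as distance to the decision boundary and gradient norms show that model responses are mostly indistinguishable between training and non-training samples, and neither deeper models nor overfitting necessarily leak more membership information. Experiments cover MNIST, CIFAR-10, CIFAR-100, and ImageNet with LeNet, ResNet, DenseNet, InceptionV3, and Xception.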

36. An ENAS Based Approach for Constructing Deep Learning Models for Breast Cancer Recognition from Ultrasound Images [PDF] Back to contents
  Mohammed Ahmed, Hongbo Du, Alaa AlZoubi
Abstract: Deep Convolutional Neural Networks (CNNs) provide an "end-to-end" solution for image pattern recognition, with impressive performance in many application areas including medical imaging. Most high-performance CNN models use hand-crafted network architectures that require expertise in CNNs to realize their potential. In this paper, we apply the Efficient Neural Architecture Search (ENAS) method to find optimal CNN architectures for classifying breast lesions from ultrasound (US) images. Our empirical study with a dataset of 524 US images shows that the optimal models generated by ENAS achieve an average accuracy of 89.3%, surpassing hand-crafted alternatives. Furthermore, the models are less complex and more efficient. Our study demonstrates that the ENAS approach to CNN model design is a promising direction for classifying ultrasound images of breast lesions.
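Summary: High-performance CNNs usually rely on hand-crafted architectures that require expert knowledge. This paper applies Efficient Neural Architecture Search (ENAS) to find CNN architectures for classifying breast lesions in ultrasound images. On a dataset of 524 ultrasound images, the ENAS-generated models reach an average accuracy of 89.3%, surpassing hand-crafted alternatives while being less complex and more efficient.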

37. Multiple resolution residual network for automatic thoracic organs-at-risk segmentation from CT [PDF] Back to contents
  Hyemin Um, Jue Jiang, Maria Thor, Andreas Rimner, Leo Luo, Joseph O. Deasy, Harini Veeraraghavan
Abstract: We implemented and evaluated a multiple resolution residual network (MRRN) for multiple normal organs-at-risk (OAR) segmentation from computed tomography (CT) images for thoracic radiotherapy treatment (RT) planning. Our approach simultaneously combines feature streams computed at multiple image resolutions and feature levels through residual connections. The feature streams at each level are updated as the images are passed through various feature levels. We trained our approach using 206 thoracic CT scans of lung cancer patients with 35 scans held out for validation to segment the left and right lungs, heart, esophagus, and spinal cord. This approach was tested on 60 CT scans from the open-source AAPM Thoracic Auto-Segmentation Challenge dataset. Performance was measured using the Dice Similarity Coefficient (DSC). Our approach outperformed the best-performing method in the grand challenge for hard-to-segment structures like the esophagus and achieved comparable results for all other structures. Median DSC using our method was 0.97 (interquartile range [IQR]: 0.97-0.98) for the left and right lungs, 0.93 (IQR: 0.93-0.95) for the heart, 0.78 (IQR: 0.76-0.80) for the esophagus, and 0.88 (IQR: 0.86-0.89) for the spinal cord.
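Summary: A multiple resolution residual network (MRRN) is implemented and evaluated for segmenting multiple organs at risk from CT images for thoracic radiotherapy planning. It combines feature streams computed at multiple image resolutions and feature levels through residual connections. Trained on 206 thoracic CT scans (35 held out for validation) and tested on 60 scans from the AAPM Thoracic Auto-Segmentation Challenge dataset, it achieved median DSC values of 0.97 for the left and right lungs, 0.93 for the heart, 0.78 for the esophagus, and 0.88 for the spinal cord, outperforming the best challenge method on hard-to-segment structures such as the esophagus.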
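The Dice Similarity Coefficient used as the metric above is defined as DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The following Python sketch shows one way to compute it and to summarize per-scan scores as a median with interquartile range. The function names, the flat 0/1 mask representation, and the empty-mask convention are assumptions for illustration; the paper's own evaluation code is not shown in the abstract.

```python
import statistics


def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient between two binary masks.

    Masks are flat sequences of 0/1 values (for example, a flattened 3D volume).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


def median_and_iqr(scores):
    """Return (median, (q1, q3)) using the default method of statistics.quantiles."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    return q2, (q1, q3)


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    truth = [1, 0, 0, 0]
    print(dice_coefficient(pred, truth))  # one shared voxel, three foreground voxels in total
    print(median_and_iqr([0.76, 0.78, 0.80, 0.77, 0.79]))
```

Note that quartile values depend on the interpolation method, so an IQR computed this way may differ slightly from one computed with another tool.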

38. Segmentation of the Myocardium on Late-Gadolinium Enhanced MRI based on 2.5 D Residual Squeeze and Excitation Deep Learning Model [PDF] Back to contents
  Abdul Qayyum, Alain Lalande, Thomas Decourselle, Thibaut Pommier, Alexandre Cochet, Fabrice Meriaudeau
Abstract: Cardiac left ventricular (LV) segmentation from short-axis MRI acquired 10 minutes after the injection of a contrast agent (LGE-MRI) is a necessary processing step for the identification and diagnosis of cardiac diseases such as myocardial infarction. However, this segmentation is challenging due to high variability across subjects and the potential lack of contrast between structures. The main objective of this work is therefore to develop an accurate automatic segmentation method, based on deep learning models, for the myocardial borders on LGE-MRI. To this end, a 2.5D residual neural network with squeeze-and-excitation blocks integrated in the encoder side, together with specialized convolutions, is proposed. Late fusion is used to merge the outputs of the best trained proposed models obtained from different sets of hyperparameters. A total of 320 exams (with a mean of 6 slices per exam) were used for training and 28 exams for testing. The performance of the proposed ensemble model in the basal and middle slices was similar to that of the intra-observer study, and slightly lower at apical slices. The overall Dice score was 82.01% for our proposed method, compared with a Dice score of 83.22% obtained from the intra-observer study. The proposed model could be used for the automatic segmentation of the myocardial border, which is a very important step for accurate quantification of no-reflow, myocardial infarction, myocarditis, and hypertrophic cardiomyopathy, among others.
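Summary: Left ventricular segmentation on LGE-MRI is a necessary step for diagnosing cardiac diseases such as myocardial infarction, but it is difficult because of high variability across subjects and low contrast between structures. The paper proposes a 2.5D residual network with squeeze-and-excitation blocks in the encoder and uses late fusion to merge the outputs of the best models trained with different hyperparameters. Trained on 320 exams and tested on 28, the ensemble reaches an overall Dice score of 82.01%, compared with 83.22% in the intra-observer study, with similar performance on basal and middle slices and slightly lower performance on apical slices.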

39. Looking back to lower-level information in few-shot learning [PDF] Back to contents
  Zhongjie Yu, Sebastian Raschka
Abstract: Humans are capable of learning new concepts from small numbers of examples. In contrast, supervised deep learning models usually lack the ability to extract reliable predictive rules from limited data scenarios when attempting to classify new examples. This challenging scenario is commonly known as few-shot learning. Few-shot learning has garnered increased attention in recent years due to its significance for many real-world problems. Recently, new methods relying on meta-learning paradigms combined with graph-based structures, which model the relationship between examples, have shown promising results on a variety of few-shot classification tasks. However, existing work on few-shot learning is only focused on the feature embeddings produced by the last layer of the neural network. In this work, we propose the utilization of lower-level, supporting information, namely the feature embeddings of the hidden neural network layers, to improve classifier accuracy. Based on a graph-based meta-learning framework, we develop a method called Looking-Back, where such lower-level information is used to construct additional graphs for label propagation in limited data settings. Our experiments on two popular few-shot learning datasets, miniImageNet and tieredImageNet, show that our method can utilize the lower-level information in the network to improve state-of-the-art classification performance.
Summary: Humans can learn new concepts from a few examples, whereas supervised deep learning models struggle to extract reliable rules from limited data, a setting known as few-shot learning. Existing graph-based meta-learning methods use only the feature embeddings of the last network layer. The proposed Looking-Back method also uses the embeddings of hidden layers to construct additional graphs for label propagation, and experiments on miniImageNet and tieredImageNet show that this lower-level information improves state-of-the-art classification performance.
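The idea of building one graph per layer and propagating labels on each can be sketched in plain Python. This is a generic illustration of label propagation, not the Looking-Back method itself: the Gaussian affinity, the update rule F <- alpha * S * F + (1 - alpha) * Y, the averaging of scores across layers, and the names `propagate_labels` and `predict_from_layers` are all assumptions, since the abstract does not specify these details.

```python
import math


def propagate_labels(features, labels, num_classes, alpha=0.5, sigma=1.0, steps=50):
    """Iterative label propagation on a fully connected similarity graph.

    features: list of embedding vectors (lists of floats), one per example.
    labels: class index for labeled (support) examples, None for query examples.
    Returns one list of class scores per example.
    """
    n = len(features)
    # Gaussian affinity between every pair of distinct examples (no self-loops).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalization S = D^(-1/2) W D^(-1/2); guard against zero degree.
    deg = [sum(row) or 1.0 for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # One-hot label matrix Y; rows of unlabeled examples stay all-zero.
    y = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(steps):
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
              for c in range(num_classes)] for i in range(n)]
    return f


def predict_from_layers(layer_features, labels, num_classes):
    """Sum propagated scores over one graph per layer (an assumed fusion rule)."""
    n = len(labels)
    totals = [[0.0] * num_classes for _ in range(n)]
    for features in layer_features:
        scores = propagate_labels(features, labels, num_classes)
        for i in range(n):
            for c in range(num_classes):
                totals[i][c] += scores[i][c]
    return [max(range(num_classes), key=lambda c: totals[i][c]) for i in range(n)]


if __name__ == "__main__":
    labels = [0, None, 1, None]  # two support examples, two queries
    last_layer = [[0.0], [0.1], [5.0], [5.1]]
    hidden_layer = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(predict_from_layers([last_layer, hidden_layer], labels, num_classes=2))
```

In the toy example each query sits next to one support example in both embedding spaces, so both graphs agree and the queries inherit the label of their neighbor.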

Note: the Chinese text is machine-translated!