Abstracts

1. Semi-supervised Medical Image Segmentation through Dual-task Consistency
Xiangde Luo, Jieneng Chen, Tao Song, Yinan Chen, Guotai Wang, Shaoting Zhang
Abstract: Deep learning-based semi-supervised learning (SSL) algorithms have achieved promising results in medical image segmentation and can reduce the burden of costly annotation by doctors by leveraging unlabeled data. However, most existing SSL algorithms in the literature regularize model training by perturbing networks and/or data. Observing that multi/dual-task learning attends to different levels of information, which carry inherent prediction perturbation, we ask in this work: can we explicitly build task-level regularization for SSL, rather than implicitly constructing network- and/or data-level perturbations and transformations? To answer this question, we propose, for the first time, a novel dual-task-consistency semi-supervised framework. Concretely, we use a dual-task deep network that jointly predicts a pixel-wise segmentation map and a geometry-aware level set representation of the target. The level set representation is converted to an approximate segmentation map through a differentiable task transform layer. Simultaneously, we introduce a dual-task consistency regularization between the level set-derived segmentation maps and the directly predicted segmentation maps, for both labeled and unlabeled data. Extensive experiments on two public datasets show that our method can largely improve performance by incorporating unlabeled data. Meanwhile, our framework outperforms state-of-the-art semi-supervised medical image segmentation methods. Code is available at: this https URL
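
The task transform layer can be pictured as a steep sigmoid applied to the predicted level set. The sketch below is a minimal pure-Python illustration, not the authors' code. It assumes the level set is negative inside the target and uses an illustrative steepness `k`. It measures dual-task consistency as the mean squared difference between the directly predicted map and the level set-derived map.

```python
import math


def level_set_to_seg(phi, k=1500.0):
    """Map level-set values to approximate foreground probabilities.

    Assumption: phi < 0 inside the target, so sigmoid(-k * phi) is close
    to 1 inside and close to 0 outside; k controls the steepness.
    """
    probs = []
    for v in phi:
        z = -k * v
        # Numerically stable logistic sigmoid (avoids overflow in exp).
        if z >= 0:
            probs.append(1.0 / (1.0 + math.exp(-z)))
        else:
            e = math.exp(z)
            probs.append(e / (1.0 + e))
    return probs


def dual_task_consistency(seg_probs, phi, k=1500.0):
    """Mean squared error between direct and level-set-derived maps.

    Neither input is a label, so the term can also be computed on unlabeled images.
    """
    approx = level_set_to_seg(phi, k)
    return sum((s - a) ** 2 for s, a in zip(seg_probs, approx)) / len(seg_probs)
```

Because the loss compares two network outputs rather than an output and a label, the same function applies unchanged to labeled and unlabeled batches.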

2. Plant Diseases recognition on images using Convolutional Neural Networks: A Systematic Review
Andre S. Abade, Paulo Afonso Ferreira, Flavio de Barros Vidal
Abstract: Plant diseases are considered one of the main factors affecting food production, and to minimize production losses it is essential to detect and recognize crop diseases quickly. The recent expansion of deep learning methods has found application in plant disease detection, offering a robust tool with highly accurate results. In this context, this work presents a systematic literature review that aims to identify the state of the art in the use of convolutional neural networks (CNNs) for identifying and classifying plant diseases, delimiting trends and indicating gaps. To this end, we present 121 papers selected from the last ten years. They take different approaches to disease detection, and we examine the characteristics of their datasets and the crops and pathogens they investigate. The results of the systematic review make it possible to understand innovative trends in the use of CNNs to identify plant diseases, and to identify the gaps that need the research community's attention.

3. HSFM-Σnn: Combining a Feedforward Motion Prediction Network and Covariance Prediction
A. Postnikov, A. Gamayunov, G. Ferrer
Abstract: In this paper, we propose a new method for motion prediction: HSFM-Σnn. Our method combines two different approaches: a feedforward network whose layers are model-based transition functions using the HSFM, and a neural network (NN) attached to each of these layers to predict covariance. We compare our method with classical covariance-estimation methods and show their limitations. We also compare it with a learning-based approach, Social-LSTM, showing that our method is more precise and efficient.
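
For context on the classical baselines mentioned above: a common way to carry uncertainty through a motion model is to linearize the transition and propagate the covariance as Σ' = F Σ Fᵀ + Q. The sketch below does this for an assumed 1D constant-velocity state [position, velocity]. It illustrates only the classical approach, not HSFM or the learned covariance network. The process-noise matrix `q` is an illustrative input.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def propagate_covariance(sigma, dt, q):
    """One linearized prediction step for a constant-velocity model.

    State is [position, velocity]; F is the Jacobian of the transition
    x' = x + v * dt, v' = v (exact here, since that model is linear).
    q is an assumed process-noise covariance added after the transform.
    """
    f = [[1.0, dt], [0.0, 1.0]]
    fsf = matmul(matmul(f, sigma), transpose(f))
    return [[fsf[i][j] + q[i][j] for j in range(2)] for i in range(2)]
```

Starting from the identity with no process noise and dt = 1, the position variance grows to 2 and a position/velocity correlation of 1 appears. This is the kind of fixed, hand-modelled growth a learned covariance predictor aims to replace.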

4. Online trajectory recovery from offline handwritten Japanese kanji characters
Hung Tuan Nguyen, Tsubasa Nakamura, Cuong Tuan Nguyen, Masaki Nakagawa
Abstract: In general, it is straightforward to render an offline handwriting image from an online handwriting pattern. However, it is challenging to reconstruct an online handwriting pattern from an offline handwriting image, especially for multi-stroke characters such as Japanese kanji. Multi-stroke characters require recovering not only point coordinates but also the stroke order, whose difficulty grows exponentially with the number of strokes. In addition, crossing and touching points can make the recovery task harder. We propose a deep neural network-based method that solves the recovery task using a large online handwriting database. Our proposed model has two main components: a Convolutional Neural Network-based encoder and a Long Short-Term Memory Network-based decoder with an attention layer. The encoder focuses on feature extraction, while the decoder refers to the extracted features and generates time sequences of coordinates. We also demonstrate how the attention layer guides the decoder during reconstruction. We evaluate the proposed method by both visual verification and handwritten character recognition. Although visual verification reveals some problems, the recognition experiments show that trajectory recovery improves the accuracy of offline handwritten character recognition when it is combined with online recognition of the recovered trajectories.
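
The attention layer lets the decoder weight encoder features differently at each output step. Below is a minimal dot-product attention sketch in pure Python. The abstract does not specify the scoring function, so scaled dot-product scoring is an assumption here, and the feature vectors in the check are toy values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return (context, weights) for one decoder step.

    query: decoder hidden state; keys/values: encoder feature vectors
    (e.g. one per image region). Scores are scaled dot products.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights
```

At each step the decoder would feed `context`, a weighted mix of region features, into the LSTM alongside its previous output to emit the next pen coordinate.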

5. Unsupervised Part Discovery by Unsupervised Disentanglement
Sandro Braun, Patrick Esser, Björn Ommer
Abstract: We address the problem of discovering part segmentations of articulated objects without supervision. In contrast to keypoints, part segmentations provide information about part localizations at the level of individual pixels. Because they capture both locations and semantics, they are an attractive target for supervised learning approaches. However, large annotation costs limit the scalability of supervised algorithms to object categories other than humans. Unsupervised approaches potentially allow the use of much more data at a lower cost. Most existing unsupervised approaches focus on learning abstract representations that must then be refined with supervision into the final representation. Our approach leverages a generative model consisting of two disentangled representations, for an object's shape and appearance, and a latent variable for the part segmentation. From a single image, the trained model infers a semantic part segmentation map. In experiments, we compare our approach with previous state-of-the-art approaches and observe significant gains in segmentation accuracy and shape consistency. Our work demonstrates the feasibility of discovering semantic part segmentations without supervision.

6. Binarized Neural Architecture Search for Efficient Object Recognition
Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, Rongrong Ji, David Doermann, Guodong Guo
Abstract: Traditional neural architecture search (NAS) has had a significant impact on computer vision by automatically designing network architectures for various tasks. In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models that reduce the huge computational cost on embedded devices for edge computing. BNAS is more challenging than NAS for two reasons. First, optimization requirements and the huge architecture space make learning inefficient. Second, performance degrades when handling wild data in various computing applications. To address these issues, we introduce operation space reduction and channel sampling into BNAS to significantly reduce the cost of searching. This is accomplished through a performance-based strategy that is robust to wild data and is further used to abandon less promising operations. Furthermore, we introduce the Upper Confidence Bound (UCB) to solve 1-bit BNAS. Two optimization methods for binarized neural networks are used to validate the effectiveness of our BNAS. Extensive experiments demonstrate that the proposed BNAS achieves performance comparable to NAS on both the CIFAR and ImageNet databases. On the CIFAR-10 dataset it achieves an accuracy of 96.53% vs. 97.22%, but with a significantly compressed model and a 40% faster search than the state-of-the-art PC-DARTS. On the wild face recognition task, our binarized models achieve performance similar to their full-precision counterparts.
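
The Upper Confidence Bound idea can be illustrated with the standard UCB1 rule. It scores each candidate operation by its mean observed reward plus an exploration bonus. The sketch below shows the generic UCB1 formula plus a hypothetical pruning step that drops the lowest-scoring operation. The exact reward and bound used in BNAS may differ, and the operation names and rewards are made up for illustration.

```python
import math


def ucb1_scores(total_rewards, counts, c=math.sqrt(2)):
    """UCB1 score per operation: mean reward + c * sqrt(ln N / n_i).

    With c = sqrt(2) this is the textbook UCB1 bound. Operations never
    tried get +inf so each one is sampled at least once.
    """
    n_total = sum(counts.values())
    scores = {}
    for op, n in counts.items():
        if n == 0:
            scores[op] = math.inf
        else:
            mean = total_rewards[op] / n
            scores[op] = mean + c * math.sqrt(math.log(n_total) / n)
    return scores


def prune_worst(total_rewards, counts, c=math.sqrt(2)):
    """Hypothetical space-reduction step: drop the lowest-UCB operation."""
    scores = ucb1_scores(total_rewards, counts, c)
    worst = min(scores, key=scores.get)
    return [op for op in counts if op != worst]
```

Scoring with an upper bound rather than the raw mean keeps rarely sampled operations from being abandoned too early. This is the trade-off a reduction schedule has to manage.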

7. Temporal Attribute-Appearance Learning Network for Video-based Person Re-Identification
Jiawei Liu, Xierong Zhu, Zheng-Jun Zha
Abstract: Video-based person re-identification aims to match a specific pedestrian in surveillance videos across different times and locations. Human attributes and appearance are complementary, and both contribute to pedestrian matching. In this work, we propose a novel Temporal Attribute-Appearance Learning Network (TALNet) for video-based person re-identification. TALNet simultaneously exploits human attributes and appearance to learn comprehensive and effective pedestrian representations from videos. It explores hard visual attention and temporal-semantic context for attributes, and spatial-temporal dependencies among body parts for appearance, to strengthen the learning of both. Specifically, an attribute branch network is proposed with a spatial attention block and a temporal-semantic context block for learning robust attribute representations. The spatial attention block focuses the network on the regions within video frames related to each attribute. The temporal-semantic context block learns both the temporal context of each attribute across video frames and the semantic context among attributes within each video frame. The appearance branch network is designed to learn effective appearance representations from both the whole body and body parts, with spatial-temporal dependencies among them. TALNet leverages the complementarity between attribute and appearance representations and jointly optimizes them in a multi-task learning fashion. Moreover, we annotate ID-level attributes for each pedestrian in the two commonly used video datasets. Extensive experiments on these datasets have verified the superiority of TALNet over state-of-the-art methods.

8. MU-GAN: Facial Attribute Editing based on Multi-attention Mechanism
Ke Zhang, Yukun Su, Xiwang Guo, Liang Qi, Zhenbing Zhao
Abstract: Facial attribute editing has two main objectives: 1) translating an image from a source domain to a target domain, and 2) changing only the facial regions related to a target attribute while preserving attribute-excluding details. In this work, we propose a Multi-attention U-Net-based Generative Adversarial Network (MU-GAN). First, we replace the classic convolutional encoder-decoder in the generator with a symmetric U-Net-like structure. We then apply an additive attention mechanism to build attention-based U-Net connections. These connections adaptively transfer encoder representations, complementing the decoder with attribute-excluding details and enhancing attribute editing ability. Second, a self-attention mechanism is incorporated into the convolutional layers to model long-range and multi-level dependencies across image regions. Experimental results indicate that our method balances attribute editing ability and detail preservation ability, and can decouple the correlation among attributes. It outperforms state-of-the-art methods in terms of attribute manipulation accuracy and image quality.
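
An additive attention gate of the kind used in attention U-Nets works per spatial position. It computes a weight α = σ(ψ · ReLU(W_x x + W_g g)) from the encoder feature x and the decoder (gating) feature g, then passes α·x through the skip connection. The per-position sketch below uses tiny hand-picked weights and omits biases. It is an assumption about the general mechanism, not MU-GAN's exact layer.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def attention_gate(x, g, w_x, w_g, psi):
    """Gate one encoder feature vector x with the decoder feature g.

    Returns (alpha, gated_x), where alpha in (0, 1) scales the skip connection.
    """
    projected = zip(matvec(w_x, x), matvec(w_g, g))
    hidden = [max(0.0, a + b) for a, b in projected]  # ReLU(W_x x + W_g g)
    alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)))
    return alpha, [alpha * xi for xi in x]
```

In a full network this runs at every pixel with learned 1×1 convolutions in place of `w_x`, `w_g`, and `psi`. A small α suppresses encoder detail that the decoder should not copy across.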

9. Diversified Mutual Learning for Deep Metric Learning
Wonpyo Park, Wonjae Kim, Kihyun You, Minsu Cho
Abstract: Mutual learning is an ensemble training strategy that improves generalization by training multiple models simultaneously and having them transfer their individual knowledge to each other. In this work, we propose an effective mutual learning method for deep metric learning, called Diversified Mutual Metric Learning, which enhances embedding models through diversified mutual learning. We transfer relational knowledge for deep metric learning by leveraging three kinds of diversity in mutual learning: (1) model diversity from different initializations of models, (2) temporal diversity from different frequencies of parameter updates, and (3) view diversity from different augmentations of inputs. Our method is particularly suited to inductive transfer learning when large-scale data are lacking. In that setting, the embedding model is initialized with a pretrained model and then fine-tuned on a target dataset. Extensive experiments show that our method significantly improves individual models as well as their ensemble. Finally, the proposed method with a conventional triplet loss achieves state-of-the-art Recall@1 on standard datasets: 69.9 on CUB-200-2011 and 89.1 on CARS-196.
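
The two quantities named at the end of the abstract, a triplet loss and Recall@1, are simple to state. The sketch below shows a standard margin-based triplet loss with Euclidean distance, and a Recall@1 evaluation in which each query's nearest other sample must share its label. The margin value of 0.2 is an assumption, and the mutual-learning transfer itself is not shown.

```python
import math


def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing d(a, n) to exceed d(a, p) by at least margin."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)


def recall_at_1(embeddings, labels):
    """Fraction of samples whose nearest neighbour (excluding itself) shares its label."""
    hits = 0
    for i, e in enumerate(embeddings):
        nearest = min((j for j in range(len(embeddings)) if j != i),
                      key=lambda j: dist(e, embeddings[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(embeddings)
```

A Recall@1 of 69.9 on CUB-200-2011 therefore means that for 69.9% of test images, the single closest embedding belongs to the same bird species.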

10. One-shot Text Field Labeling using Attention and Belief Propagation for Structure Information Extraction
Mengli Cheng, Minghui Qiu, Xing Shi, Jun Huang, Wei Lin
Abstract: Structured information extraction from document images usually consists of three steps: text detection, text recognition, and text field labeling. While text detection and text recognition have been heavily studied and greatly improved in the literature, text field labeling is less explored and still faces many challenges. Existing learning-based methods for text field labeling usually require a large number of labeled examples to train a specific model for each type of document. However, collecting large amounts of document images and labeling them is difficult, and sometimes impossible due to privacy issues. Deploying a separate model for each type of document also consumes a lot of resources. Facing these challenges, we explore one-shot learning for the text field labeling task. Existing one-shot learning methods for the task are mostly rule-based. They have difficulty labeling fields in crowded regions with few landmarks, and fields consisting of multiple separate text regions. To alleviate these problems, we propose a novel deep, end-to-end trainable approach for one-shot text field labeling that uses an attention mechanism to transfer layout information between document images. We further apply a conditional random field to the transferred layout information to refine the field labeling. We collected and annotated a real-world one-shot field labeling dataset covering a wide variety of document types. We conducted extensive experiments on it to examine the effectiveness of the proposed model. To stimulate research in this direction, the collected dataset and the one-shot model will be released.

11. Real-time Plant Health Assessment Via Implementing Cloud-based Scalable Transfer Learning On AWS DeepLens
Asim Khan, Umair Nawaz, Anwaar Ulhaq, Randall W. Robinson
Abstract: In the agriculture sector, controlling plant leaf diseases is crucial because they affect the quality and production of plant species, with an impact on the economy of any country. Therefore, automated identification and classification of plant leaf diseases at an early stage are essential to reduce economic loss and to conserve specific species. Various machine learning models have previously been proposed to detect and classify plant leaf diseases; however, they lack usability due to hardware incompatibility, limited scalability, and inefficiency in practical use. Our proposed DeepLens Classification and Detection Model (DCDM) approach addresses these limitations. It automatically detects and classifies leaf diseases in fruits (apple, grapes, peach, and strawberry) and vegetables (potato and tomato) via scalable transfer learning on AWS SageMaker, and imports the model into AWS DeepLens for real-time practical use. Cloud integration provides scalability and ubiquitous access to our approach. Our experiments on an extensive image dataset of healthy and unhealthy fruit and vegetable leaves showed an accuracy of 98.78%, with real-time diagnosis of plant leaf diseases. We used forty thousand images to train the deep learning model and then evaluated it on ten thousand images. Diagnosing and classifying a test image on AWS DeepLens took 0.349 s on average, providing disease information to the user in less than a second.

12. Deep Metric Learning Meets Deep Clustering: An Novel Unsupervised Approach for Feature Embedding
Binh X. Nguyen, Binh D. Nguyen, Gustavo Carneiro, Erman Tjiputra, Quang D. Tran, Thanh-Toan Do
Abstract: Unsupervised Deep Distance Metric Learning (UDML) aims to learn sample similarities in the embedding space from an unlabeled dataset. Traditional UDML methods usually use the triplet loss or a pairwise loss, which requires mining positive and negative samples with respect to anchor data points. This is, however, challenging in an unsupervised setting because label information is not available. In this paper, we propose a new UDML method that overcomes this challenge. In particular, we propose to use a deep clustering loss to learn centroids, i.e., pseudo labels, that represent semantic classes. During learning, these centroids are also used to reconstruct the input samples. This ensures the representativeness of the centroids: each centroid represents visually similar samples. Therefore, the centroids give information about positive (visually similar) and negative (visually dissimilar) samples. Based on the pseudo labels, we propose a novel unsupervised metric loss that enforces positive concentration and negative separation of samples in the embedding space. Experimental results on benchmark datasets show that the proposed approach outperforms other UDML methods.
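
Once centroids are available, pseudo labels and the positive/negative split follow directly. The sketch below assigns each embedding to its nearest centroid, which is the k-means assignment step, and treats pairs with the same pseudo label as positives. It assumes the centroids are already given. It does not show the deep clustering loss or the reconstruction term.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_labels(embeddings, centroids):
    """Index of the nearest centroid for each embedding."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(e, centroids[c]))
            for e in embeddings]


def split_pairs(labels):
    """Return (positive_pairs, negative_pairs) as lists of index pairs i < j."""
    pos, neg = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (pos if labels[i] == labels[j] else neg).append((i, j))
    return pos, neg
```

A metric loss can then pull the positive pairs together and push the negative pairs apart, exactly as a supervised loss would with true labels.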

13. View-consistent 4D Light Field Depth Estimation
Numair Khan, Min H. Kim, James Tompkin
Abstract: We propose a method to compute depth maps for every sub-aperture image in a light field in a view-consistent way. Previous light field depth estimation methods typically estimate a depth map only for the central sub-aperture view and struggle with view-consistent estimation. Our method precisely defines depth edges via epipolar plane images (EPIs), then diffuses these edges spatially within the central view. These depth estimates are then propagated to all other views in an occlusion-aware way. Finally, disoccluded regions are completed by diffusion in EPI space. Our method runs efficiently compared with both classical and deep learning-based approaches. It achieves competitive quantitative metrics and qualitative performance on both synthetic and real-world light fields.
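
EPIs carry depth for a geometric reason: a scene point traces a straight line across an epipolar plane image. The line's slope, in pixels of shift per unit of view offset, is its disparity. The sketch below fits that slope by least squares and converts disparity to depth with the standard pinhole relation z = f·b/d. The focal length and baseline values in the check are illustrative assumptions. The method's edge detection and diffusion steps are not shown.

```python
def epi_slope(view_positions, pixel_positions):
    """Least-squares slope of pixel position vs. view position along one EPI line."""
    n = len(view_positions)
    mean_u = sum(view_positions) / n
    mean_x = sum(pixel_positions) / n
    num = sum((u - mean_u) * (x - mean_x)
              for u, x in zip(view_positions, pixel_positions))
    den = sum((u - mean_u) ** 2 for u in view_positions)
    return num / den


def depth_from_disparity(disparity, focal_px, baseline):
    """Pinhole relation z = f * b / d (depth in the baseline's units)."""
    return focal_px * baseline / disparity
```

Steeper EPI lines mean larger disparity and therefore a closer point. Depth edges show up where lines of different slope meet, which is why they can be located precisely in EPIs.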

14. Improved Trainable Calibration Method for Neural Networks on Medical Imaging Classification
Gongbo Liang, Yu Zhang, Xiaoqin Wang, Nathan Jacobs
Abstract: Recent works have shown that deep neural networks can achieve super-human performance in a wide range of image classification tasks in the medical imaging domain. However, these works have primarily focused on classification accuracy, ignoring the important role of uncertainty quantification. Empirically, neural networks are often miscalibrated and overconfident in their predictions. This miscalibration could be problematic in any automatic decision-making system, but we focus on the medical field, in which neural network miscalibration has the potential to lead to significant treatment errors. We propose a novel calibration approach that maintains the overall classification accuracy while significantly improving model calibration. The proposed approach is based on expected calibration error, a common metric for quantifying miscalibration. Our approach can be easily integrated into any classification task as an auxiliary loss term, and thus does not require an explicit training round for calibration. We show that our approach significantly reduces calibration error across various architectures and datasets.
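
Expected calibration error bins predictions by confidence and averages the gap between accuracy and mean confidence in each bin, weighted by bin size: ECE = Σ_b (|B_b|/n)·|acc(B_b) − conf(B_b)|. A plain-Python version with equal-width bins is sketched below. The bin count of 10 is a common default rather than a value taken from the paper. The paper's trainable auxiliary loss is not reproduced here.

```python
import math


def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """ECE with equal-width confidence bins over (0, 1].

    confidences: predicted-class probability per sample (e.g. max softmax).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, pred, y in zip(confidences, predictions, labels):
        # Bin b covers (b/n_bins, (b+1)/n_bins]; a confidence of 0 goes to bin 0.
        b = min(n_bins - 1, max(0, math.ceil(conf * n_bins) - 1))
        bins[b].append((conf, pred == y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(correct for _, correct in members) / len(members)
        avg_conf = sum(c for c, _ in members) / len(members)
        ece += len(members) / n * abs(acc - avg_conf)
    return ece
```

For example, two predictions made with 0.9 confidence, of which only one is correct, give an ECE of |0.5 − 0.9| = 0.4: the model is overconfident by 40 percentage points.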

15. Unconstrained Text Detection in Manga: a New Dataset and Baseline
Julián Del Gobbo, Rosana Matuk Herrera
Abstract: The detection and recognition of unconstrained text is an open research problem. Text in comic books has unusual styles that pose many challenges for text detection. This work aims to binarize text in a comic genre with highly sophisticated text styles: Japanese manga. To overcome the lack of a manga dataset with pixel-level text annotations, we created our own. To improve the evaluation of models and the search for an optimal one, we implement other special metrics in addition to the standard binarization metrics. Using these resources, we designed and evaluated a deep network model that outperforms current methods for text binarization in manga on most metrics.
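
Pixel-level binarization is usually scored by comparing the predicted text mask with the ground-truth mask. The sketch below computes precision, recall, F-measure, and IoU on flattened 0/1 masks. The abstract does not say which additional metrics the authors implemented, so only these standard ones are shown, for illustration.

```python
def binarization_scores(pred, truth):
    """Precision, recall, F-measure and IoU for flattened binary masks (1 = text)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "iou": iou}
```

Text pixels are a small fraction of a manga page, so these foreground-only scores are more informative than plain pixel accuracy. Plain accuracy would reward predicting "background" everywhere.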

16. Modeling Wildfire Perimeter Evolution using Deep Neural Networks
Maxfield E. Green, Karl Kaiser, Nat Shenton
Abstract: With the increasing size and frequency of wildfire events worldwide, accurate real-time prediction of evolving wildfire fronts is a crucial component of firefighting efforts and forest management practices. We propose a wildfire spreading model that predicts the evolution of the wildfire perimeter over 24-hour periods. The fire-spreading simulation is based on a deep convolutional neural network (CNN) trained on remotely sensed atmospheric and environmental time-series data. We show that the model is able to learn wildfire spreading dynamics from real historic data sets from a series of wildfires in the Western Sierra Nevada Mountains in California. We validate the model on a previously unseen wildfire and produce realistic results that significantly outperform historic alternatives, with validation accuracies ranging from 78% to 98%.

17. Joint Pose and Shape Estimation of Vehicles from LiDAR Data
Hunter Goforth, Xiaoyan Hu, Michael Happold, Simon Lucey
Abstract: We address the problem of estimating the pose and shape of vehicles from LiDAR scans, a common problem faced by the autonomous vehicle community. Recent work has tended to address pose and shape estimation in isolation, despite the inherent connection between the two. We investigate a method of jointly estimating shape and pose, in which a single encoding is learned from which both shape and pose can be decoded efficiently and effectively. We additionally introduce a novel joint pose and shape loss, and show that this joint training method produces better results than independently trained pose and shape estimators. We evaluate our method on both synthetic data and real-world data, and show superior performance against a state-of-the-art baseline.

18. Towards Unique and Informative Captioning of Images
Zeyu Wang, Berthy Feng, Karthik Narasimhan, Olga Russakovsky
Abstract: Despite considerable progress, state-of-the-art image captioning models produce generic captions, leaving out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption consisting of common concepts. In this paper, we first analyze both modern captioning systems and evaluation metrics through empirical experiments to quantify these phenomena. We find that modern captioning systems assign higher likelihoods to incorrect distractor sentences than to ground-truth captions. We also find that evaluation metrics like SPICE can be 'topped' by simple captioning systems that rely on object detectors. Inspired by these observations, we design a new metric (SPICE-U) by introducing a notion of uniqueness over the concepts generated in a caption. We show that SPICE-U correlates better with human judgements than SPICE does, and effectively captures notions of diversity and descriptiveness. Finally, we also demonstrate a general technique to improve any existing captioning model: using mutual information as a re-ranking objective during decoding. Empirically, this results in more unique and informative captions. It also improves three different state-of-the-art models on SPICE-U, as well as their average score over existing metrics.
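
Re-ranking with mutual information typically replaces the caption log-likelihood log p(c|I) with a pointwise-mutual-information style score, log p(c|I) − λ·log p(c). This penalizes captions that are likely regardless of the image. The sketch below applies that score to a list of candidates with precomputed log-probabilities. The value of λ and the numbers in the check are assumptions, and the paper's exact objective may differ.

```python
def rerank_by_mutual_information(candidates, lam=0.5):
    """Sort captions by log p(c|I) - lam * log p(c), best first.

    candidates: list of (caption, log_p_given_image, log_p_prior), where
    log_p_prior would come from an unconditional language model.
    """
    return sorted(candidates, key=lambda c: c[1] - lam * c[2], reverse=True)
```

With λ = 0 the ranking reduces to plain likelihood, which tends to favour short generic captions. Raising λ shifts the preference toward captions that are specific to the image.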

19. Map-Adaptive Goal-Based Trajectory Prediction
Lingyao Zhang, Po-Hsun Su, Jerrick Hoang, Galen Clark Haynes, Micol Marchetti-Bowick
Abstract: We present a new method for multi-modal, long-term vehicle trajectory prediction. Our approach uses lane centerlines captured in rich maps of the environment to generate a set of proposed goal paths for each vehicle. These paths are generated at run time and therefore dynamically adapt to the scene. Using them as spatial anchors, we predict a set of goal-based trajectories along with a categorical distribution over the goals. This approach allows us to directly model the goal-directed behavior of traffic actors, which unlocks the potential for more accurate long-term prediction. Our experimental results on both a large-scale internal driving dataset and the public nuScenes dataset show that our model outperforms state-of-the-art approaches for vehicle trajectory prediction over a 6-second horizon. We also empirically demonstrate that our model generalizes better than existing methods to road scenes from a completely new city.

20. not-so-BigGAN: Generating High-Fidelity Images on a Small Compute Budget
Seungwook Han, Akash Srivastava, Cole Hurwitz, Prasanna Sattigeri, David D. Cox
Abstract: BigGAN is the state of the art in high-resolution image generation. It successfully leverages advances in scalable computing and in the theoretical understanding of generative adversarial methods to set new records in conditional image generation. A major part of BigGAN's success is due to its use of large mini-batch sizes during training in high dimensions. While effective, this technique requires an enormous amount of compute resources and/or time (256 TPU-v3 cores), putting the model out of reach for the broader research community. In this paper, we present not-so-BigGAN, a simple and scalable framework for training deep generative models on high-dimensional natural images. Instead of modelling the image in pixel space as BigGAN does, not-so-BigGAN uses wavelet transformations to bypass the curse of dimensionality, significantly reducing the overall compute requirement. Through extensive empirical evaluation, we demonstrate that for a fixed compute budget, not-so-BigGAN converges several times faster than BigGAN. It reaches competitive image quality with an order of magnitude lower compute budget (4 Tesla V100 GPUs).

21. Cephalogram Synthesis and Landmark Detection in Dental Cone-Beam CT Systems
Yixing Huang, Fuxin Fan, Christopher Syben, Philipp Roser, Leonid Mill, Andreas Maier
Abstract: Due to the lack of a standardized 3D cephalometric analysis methodology, 2D cephalograms synthesized from 3D cone-beam computed tomography (CBCT) volumes are widely used for cephalometric analysis in dental CBCT systems. However, such synthetic cephalograms lack the image contrast and resolution of conventional X-ray-film-based cephalograms. In addition, the radiation dose of the scan needed for 3D reconstruction poses potential health risks. In this work, we propose a sigmoid-based intensity transform that uses the nonlinear optical property of X-ray films to increase the image contrast of synthetic cephalograms. To improve image resolution, we investigate super-resolution deep learning techniques. To reduce the dose, we propose the pixel-to-pixel generative adversarial network (pix2pixGAN) for synthesizing 2D cephalograms directly from two CBCT projections. For landmark detection in the synthetic cephalograms, we propose an efficient automatic landmark detection method that combines LeNet-5 and ResNet50. Our experiments demonstrate the efficacy of pix2pixGAN in 2D cephalogram synthesis, achieving an average peak signal-to-noise ratio (PSNR) of 33.8 with reference to the cephalograms synthesized from 3D CBCT volumes. Pix2pixGAN also achieves the best performance in super resolution, with an average PSNR of 32.5 and without introducing checkerboard or jagging artifacts. Our automatic landmark detection method achieves an 86.7% successful detection rate within the clinically acceptable 2 mm range on the ISBI Test1 data, which is comparable to state-of-the-art methods. The method trained on conventional cephalograms can be applied directly to landmark detection in the synthetic cephalograms. It achieves successful detection rates of 93.0% and 80.7% within the 4 mm precision range for synthetic cephalograms from 3D volumes and from 2D projections, respectively.
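The sketch below illustrates two ideas from this abstract. X-ray film responds to exposure along an S-shaped characteristic curve, which is why a sigmoid can mimic film contrast. The successful detection rate (SDR) is the fraction of predicted landmarks that fall within a given radius of the ground truth. The abstract gives neither the exact sigmoid parameterization nor the SDR code, so the steepness `k`, the midpoint `p0` and all function names are hypothetical choices. Coordinates are assumed to be in millimetres already.

```python
import math


def sigmoid_intensity(p, k=10.0, p0=0.5):
    """Map a normalized intensity p in [0, 1] to a film-like gray value.

    k (steepness) and p0 (midpoint) are hypothetical parameters. Larger k
    stretches contrast around p0 and compresses it near 0 and 1.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - p0)))


def successful_detection_rate(pred, truth, radius_mm):
    """Fraction of landmarks whose prediction lies within radius_mm of truth.

    pred and truth are equal-length lists of (x, y) positions in millimetres.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= radius_mm
    )
    return hits / len(truth)


if __name__ == "__main__":
    print([round(sigmoid_intensity(p / 4), 3) for p in range(5)])
    pred = [(0.0, 0.0), (3.0, 0.0)]
    truth = [(0.0, 1.0), (0.0, 0.0)]
    print(successful_detection_rate(pred, truth, radius_mm=2.0))
```

The same SDR function covers both the 2 mm and the 4 mm figures quoted above. Only `radius_mm` changes.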

22. NTGAN: Learning Blind Image Denoising without Clean Reference
Rui Zhao, Daniel P.K. Lun, Kin-Man Lam
Abstract: Recent studies on learning-based image denoising have achieved promising performance on various noise reduction tasks. Most of these deep denoisers are trained either under the supervision of clean references or without supervision on synthetic noise. The synthetic-noise assumption leads to poor generalization on real photographs. To address this issue, we propose a novel deep unsupervised image-denoising method that treats noise reduction as a special case of noise transference. Learning noise transference lets the network acquire denoising ability by observing only corrupted samples. Results on real-world denoising benchmarks demonstrate that our method achieves state-of-the-art performance in removing realistic noise, making it a potential solution to practical noise reduction problems.

23. Small-floating Target Detection in Sea Clutter via Visual Feature Classifying in the Time-Doppler Spectra
Yi Zhou, Yin Cui, Xiaoke Xu, Jidong Suo, Xiaoming Liu
Abstract: Detecting small floating objects in sea clutter is challenging for a surface radar. In this paper, we observe that the backscatter from the target breaks the continuity of the underlying sea-surface motion in time-Doppler spectra (TDS) images. Following this visual clue, we use the local binary pattern (LBP) to measure texture variations in the TDS images. We show that radar returns containing a target and those containing only clutter are separable in the LBP feature space. An unsupervised one-class support vector machine (SVM) is then used to detect deviations from the LBP histogram of the clutter. Outliers flagged by the detector are classified as targets. On the real-life IPIX radar datasets, our visual-feature-based detector achieves a favorable detection rate compared with three other existing approaches.
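The LBP feature can be sketched in its basic form. Each interior pixel is compared with its eight neighbours, and every neighbour that is at least as bright as the centre sets one bit of an 8-bit code. The normalized 256-bin histogram of these codes then describes the local texture of a TDS image patch. The paper may use a different LBP variant, for example a uniform or rotation-invariant one. Treat the 3x3, 256-bin version and the function names below as assumptions.

```python
# Eight neighbours, clockwise from the top-left corner; bit i <- OFFSETS[i].
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Basic 3x3 LBP code (0..255) for every interior pixel of a 2D list."""
    codes = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes (the texture feature vector)."""
    codes = lbp_codes(img)
    if not codes:
        raise ValueError("image must be at least 3x3")
    hist = [0] * 256
    for code in codes:
        hist[code] += 1
    return [count / len(codes) for count in hist]


if __name__ == "__main__":
    smooth = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]    # flat texture
    spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]     # isolated bright return
    print(lbp_histogram(smooth).index(1.0), lbp_histogram(spike).index(1.0))
```

In a pipeline like the one described, histograms computed from clutter-only data could train a one-class classifier such as scikit-learn's `OneClassSVM`. Cells whose histograms it rejects as outliers would then be reported as targets.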

24. Revealing Lung Affections from CTs. A Comparative Analysis of Various Deep Learning Approaches for Dealing with Volumetric Data
Radu Miron, Cosmin Moisii, Mihaela Breaban
Abstract: The paper presents and comparatively analyses several deep learning approaches for automatically detecting tuberculosis-related lesions in lung CTs, in the context of the ImageCLEF 2020 Tuberculosis task. Three classes of methods are discussed and evaluated. They differ in how the volumetric data are fed as input to neural-network-based classifiers. All of them come with a rich experimental analysis covering a variety of neural network architectures, segmentation algorithms and data augmentation schemes. The reported work belongs to the SenticLab.UAIC team, which obtained the best results in the competition.

25. Single Image Super-Resolution for Domain-Specific Ultra-Low Bandwidth Image Transmission
Jesper Haahr Christensen, Lars Valdemar Mogensen, Ole Ravn
Abstract: Low-bandwidth communication, such as underwater acoustic communication, is limited by best-case data rates of 30-50 kbit/s. This makes such channels unusable, or at best inefficient, for single-image, video or other bandwidth-demanding sensor-data transmission. To combat data-transmission bottlenecks, we consider practical use cases within the maritime domain and investigate the prospects of single image super-resolution methodologies. We study this on a large, diverse dataset collected over years of trawl fishing with cameras placed in the fishing nets. We propose down-sampling images to a low-resolution version of about 1 kB, small enough to satisfy underwater acoustic bandwidth requirements even at several frames per second. A neural network is then trained to up-sample these images, attempting to reconstruct the originals. We aim to investigate the quality of the reconstructed images and, more generally, the prospects of such methods in practical use cases. Our focus in this work is solely on learning to reconstruct high-resolution images from "real-world" data. We show that our method achieves better perceptual quality and reconstruction than generic bicubic up-sampling, which motivates further work in this area for underwater applications.
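The "several frames per second" claim can be checked with simple arithmetic. A 1 kB frame is 8 kbit, so a 30-50 kbit/s link carries roughly 3.75 to 6.25 such frames per second. The helper below makes that explicit. It assumes decimal units (1 kB = 1000 bytes, 1 kbit = 1000 bits) and ignores protocol overhead, error correction and packet loss, so real throughput would be lower.

```python
def frames_per_second(link_kbit_per_s, frame_kB, bytes_per_kB=1000):
    """Frames per second that fit through a link, ignoring all overhead.

    Assumes decimal kilobits; pass bytes_per_kB=1024 for binary kilobytes.
    """
    bits_per_frame = frame_kB * bytes_per_kB * 8
    return link_kbit_per_s * 1000 / bits_per_frame


if __name__ == "__main__":
    for rate in (30, 50):  # best-case acoustic data rates quoted in the abstract
        print(rate, "kbit/s ->", frames_per_second(rate, 1), "frames/s")
```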

26. Generalizing Complex/Hyper-complex Convolutions to Vector Map Convolutions
Chase J Gaudet, Anthony S Maida
Abstract: We show that complex- and hypercomplex-valued neural networks improve on their real-valued counterparts for two core reasons: the weight-sharing mechanism and the treatment of multidimensional data as a single entity. Their algebra linearly combines the dimensions, relating each dimension to the others. However, both are constrained to a fixed number of dimensions: two for complex numbers and four for quaternions. Here we introduce novel vector map convolutions, which capture both of these properties of complex/hypercomplex convolutions while dropping the unnatural dimensionality constraints they impose. We achieve this with a system that mimics the unique linear combination of input dimensions, such as the Hamilton product for quaternions. Three experiments show that these vector map convolutions appear to capture all the benefits of complex and hypercomplex networks, such as their ability to capture internal latent relations, while avoiding the dimensionality restriction.
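The weight sharing mentioned above is easiest to see in the Hamilton product. Multiplying a 4-component input by a quaternion weight is a linear map given by a 4x4 real matrix, but that matrix is built from only 4 distinct weights (with signs). A dense real layer from 4 inputs to 4 outputs would need 16 independent weights. The sketch below shows only this quaternion property. It is not the vector map convolution itself, and the function names are illustrative.

```python
def hamilton_product(p, q):
    """Hamilton product p * q of quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def left_mult_matrix(w):
    """4x4 real matrix W such that W @ x == hamilton_product(w, x).

    Only the four entries of w appear, so every output mixes every input
    component while the layer keeps 4 weights instead of 16.
    """
    a, b, c, d = w
    return [
        [a, -b, -c, -d],
        [b, a, -d, c],
        [c, d, a, -b],
        [d, -c, b, a],
    ]


def matvec(m, x):
    """Plain matrix-vector product returning a tuple."""
    return tuple(sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m)


if __name__ == "__main__":
    w = (1, -2, 3, 4)   # quaternion weight
    x = (5, 6, -7, 8)   # 4-channel input treated as one entity
    print(hamilton_product(w, x))
    print(matvec(left_mult_matrix(w), x))  # same result via the shared-weight matrix
```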

27. Learning joint segmentation of tissues and brain lesions from task-specific hetero-modal domain-shifted datasets
Reuben Dorent, Thomas Booth, Wenqi Li, Carole H. Sudre, Sina Kafiabadi, Jorge Cardoso, Sebastien Ourselin, Tom Vercauteren
Abstract: Brain tissue segmentation from multimodal MRI is a key building block of many neuroimaging analysis pipelines. However, established tissue segmentation approaches were not developed to cope with large anatomical changes resulting from pathology, such as white matter lesions or tumours, and often fail in these cases. Meanwhile, with the advent of deep neural networks (DNNs), the segmentation of brain lesions has matured significantly. Yet few existing approaches allow for the joint segmentation of normal tissue and brain lesions. Developing a DNN for such a joint task is currently hampered by the fact that annotated datasets typically address only one specific task. They also rely on task-specific imaging protocols, including a task-specific set of imaging modalities. In this work, we propose a novel approach to building a joint tissue and lesion segmentation model from aggregated task-specific, hetero-modal, domain-shifted and partially annotated datasets. Starting from a variational formulation of the joint problem, we show how the expected risk can be decomposed and optimised empirically. We exploit an upper bound of the risk to deal with heterogeneous imaging modalities across datasets. To deal with potential domain shift, we integrate and test three conventional techniques based on data augmentation, adversarial learning and pseudo-healthy generation. For each individual task, our joint approach reaches performance comparable to that of task-specific, fully supervised models. The proposed framework is assessed on two different types of brain lesions: white matter lesions and gliomas. In the latter case, joint ground truth for quantitative assessment is lacking, so we propose and use a novel, clinically relevant qualitative assessment methodology.

28. Fuzzy Unique Image Transformation: Defense Against Adversarial Attacks On Deep COVID-19 Models
Achyut Mani Tripathi, Ashish Mishra
Abstract: Early identification of COVID-19 with deep models trained on chest X-ray and CT images has gained considerable attention from researchers as a way to speed up the identification of active COVID-19 cases. These deep models serve as an aid to hospitals that lack specialists or radiologists, especially in remote areas. Various deep models have been proposed to detect COVID-19 cases, but little work has been done to defend them against adversarial attacks, which can fool a deep model through small perturbations of image pixels. This paper evaluates the performance of deep COVID-19 models under adversarial attacks. It also proposes an efficient and effective Fuzzy Unique Image Transformation (FUIT) technique that downsamples image pixel values into an interval. The images obtained after the FUIT transformation are then used to train a secure deep model that preserves high diagnostic accuracy for COVID-19 cases and provides a reliable defense against adversarial attacks. The experimental results show that the proposed model protects the deep model against six adversarial attacks while maintaining high accuracy in classifying COVID-19 cases on the chest X-ray and CT image datasets. The results also suggest that careful inspection is required before deep models are applied in practice to diagnose COVID-19 cases.
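The abstract says only that FUIT downsamples pixel values into an interval. It does not describe the fuzzy membership rule, so the sketch below is not FUIT. It is a plain, non-fuzzy stand-in: uniform interval quantization of 8-bit pixels, which shows why coarsening pixel values can absorb small adversarial perturbations. The number of intervals and the function name are hypothetical.

```python
def quantize_to_intervals(pixels, n_intervals=8, max_value=255):
    """Map each pixel to the midpoint of its interval (uniform, non-fuzzy).

    Perturbations smaller than the distance to the nearest interval boundary
    leave the output unchanged; n_intervals is a hypothetical setting.
    """
    width = (max_value + 1) / n_intervals
    out = []
    for p in pixels:
        idx = min(int(p // width), n_intervals - 1)  # interval index, clamped
        out.append(round((idx + 0.5) * width))       # representative value
    return out


if __name__ == "__main__":
    clean = [100, 37, 250]
    perturbed = [102, 35, 252]  # small, adversarial-style pixel changes
    print(quantize_to_intervals(clean), quantize_to_intervals(perturbed))
```

A perturbation that pushes a pixel across an interval boundary still changes the output. This is one reason the abstract's call for careful inspection before deployment applies to any such preprocessing defense.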

29. Tangent Space Based Alternating Projections for Nonnegative Low Rank Matrix Approximation
Guangjing Song, Michael K. Ng, Tai-Xiang Jiang
Abstract: In this paper, we develop a new alternating projection method for computing nonnegative low-rank approximations of nonnegative matrices. In nonnegative low-rank matrix approximation, the projection onto the manifold of fixed-rank matrices can be expensive because it requires a singular value decomposition. To reduce the computational cost, we propose using the tangent space at a point of the manifold to approximate the projection onto the manifold. We show that the sequence generated by alternating projections onto the tangent spaces of the fixed-rank matrix manifold and the nonnegative matrix manifold converges linearly to a point in the intersection of the two manifolds, and that this point is sufficiently close to optimal solutions. This convergence result, based on inexact projection onto the manifold, is new and has not been studied in the literature. Numerical examples in data clustering, pattern recognition and hyperspectral data analysis demonstrate that the proposed method outperforms nonnegative matrix factorization methods in both computational time and accuracy.
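For orientation, the plain alternating-projection scheme that the abstract improves on can be written as below. This is an assumed baseline form, since the abstract gives no formulas. It alternates a rank-r truncated SVD, which needs a new SVD at every step, with entrywise clipping to the nonnegative orthant. The second formula is the standard orthogonal projection onto the tangent space of the rank-r manifold at a point Y. Computing that projection needs only products with the thin factors U and V. The abstract does not state how the paper maps the tangent-space point back onto the manifold.

```latex
% Baseline alternating projections (assumed form):
Y_k = \mathcal{P}_{\mathcal{M}_r}(X_k) = U_r \Sigma_r V_r^{\top}
  \quad \text{(rank-$r$ truncated SVD of $X_k$)}, \qquad
X_{k+1} = \mathcal{P}_{+}(Y_k) = \max(Y_k, 0) \quad \text{(entrywise)}.

% Tangent space of \mathcal{M}_r at Y = U \Sigma V^{\top},
% with U \in \mathbb{R}^{m \times r} and V \in \mathbb{R}^{n \times r} having orthonormal columns:
\mathcal{P}_{T_Y}(Z) = U U^{\top} Z + Z V V^{\top} - U U^{\top} Z V V^{\top}.
```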

Note: the Chinese text is machine-translated. The cover image is a word cloud of the paper titles.

WITH LOVE OF WORLD

[arXiv papers] Computer Vision and Pattern Recognition 2020-09-10

Contents

Abstracts