Abstracts

1. AViD Dataset: Anonymized Videos from Diverse Countries
AJ Piergiovanni, Michael S. Ryoo
Abstract: We introduce a new public video dataset for action recognition: Anonymized Videos from Diverse countries (AViD). Unlike existing public video datasets, AViD is a collection of action videos from many different countries. The motivation is to create a public dataset that benefits the training and pretraining of action recognition models for everybody, rather than one that is useful only for a limited number of countries. Further, all face identities in the AViD videos are properly anonymized to protect privacy. AViD is also a static dataset in which each video is licensed under a Creative Commons license. We confirm that most existing video datasets are statistically biased toward action videos from a limited number of countries. We experimentally show that models trained on such biased datasets do not transfer perfectly to action videos from other countries, and that AViD addresses this problem. We also confirm that AViD can serve as a good dataset for pretraining, performing comparably to or better than prior datasets.

2. Scientific Discovery by Generating Counterfactuals using Image Translation
Arunachalam Narayanaswamy, Subhashini Venugopalan, Dale R. Webster, Lily Peng, Greg Corrado, Paisan Ruamviboonsuk, Pinal Bavishi, Michael Brenner, Philip Nelson, Avinash V. Varadarajan
Abstract: Model explanation techniques play a critical role in understanding the source of a model's performance and in making its decisions transparent. Here we investigate whether explanation techniques can also be used as a mechanism for scientific discovery. We make three contributions. First, we propose a framework to convert predictions from explanation techniques into a mechanism of discovery. Second, we show how generative models combined with black-box predictors can be used to generate hypotheses (without human priors) that can be critically examined. Third, with these techniques we study classification models for retinal images predicting Diabetic Macular Edema (DME), where recent work showed that a CNN trained on these images is likely learning novel features in the image. We demonstrate that the proposed framework is able to explain the underlying scientific mechanism, thus bridging the gap between the model's performance and human understanding.

3. Camera-Lidar Integration: Probabilistic sensor fusion for semantic mapping
Julie Stephany Berrio, Mao Shan, Stewart Worrall, Eduardo Nebot
Abstract: An automated vehicle operating in an urban environment must be able to perceive and recognise objects/obstacles in a three-dimensional world while navigating in a constantly changing environment. In order to plan and execute accurate, sophisticated driving maneuvers, a high-level contextual understanding of the surroundings is essential. Due to recent progress in image processing, it is now possible to obtain high-definition semantic information in 2D from monocular cameras, though cameras cannot reliably provide the highly accurate 3D information provided by lasers. Fusing these two sensor modalities can overcome the shortcomings of each individual sensor, though a number of important challenges need to be addressed in a probabilistic manner. In this paper, we address the common yet challenging lidar/camera/semantic fusion problems, which are seldom approached in a wholly probabilistic manner. Our approach uses a multi-sensor platform to build a three-dimensional semantic voxelized map that considers the uncertainty of all the processes involved. We present a probabilistic pipeline that incorporates uncertainties from the sensor readings (cameras, lidar, IMU and wheel encoders), compensation for the motion of the vehicle, and heuristic label probabilities for the semantic images. We also present a novel and efficient viewpoint validation algorithm to check for occlusions from the camera frames. A probabilistic projection is performed from the camera images to the lidar point cloud. Each labelled lidar scan then feeds into an octree map building algorithm that updates the class probabilities of the map voxels every time a new observation is available. We validate our approach using a set of qualitative and quantitative experimental tests on the USyd Dataset.
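The per-voxel class-probability update described above can be illustrated with a recursive Bayes rule. This is a minimal sketch, not the authors' implementation: it assumes each labelled lidar point arrives with a per-class likelihood vector (already folding in projection and sensor uncertainty) and that successive observations are conditionally independent. The function name `update_voxel` and the three-class example are hypothetical.

```python
def update_voxel(prior, likelihood):
    """Fuse one labelled observation into a voxel's class distribution.

    prior:      class probabilities currently stored in the voxel.
    likelihood: per-class probabilities carried by the projected label
                (assumed to already include sensor/projection uncertainty).
    Returns the normalised posterior of a recursive Bayes update.
    """
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    if total == 0.0:
        # Contradictory evidence: keep the prior instead of dividing by zero.
        return list(prior)
    return [u / total for u in unnormalised]


# Hypothetical voxel with three classes (road, car, vegetation), uniform prior.
voxel = [1 / 3, 1 / 3, 1 / 3]
for observation in ([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]):
    voxel = update_voxel(voxel, observation)
print([round(p, 3) for p in voxel])
```

Two agreeing observations concentrate the mass on the first class (about 0.857 here). Real mapping systems often store log-probabilities instead, to avoid numerical underflow after many updates.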

4. STaRFlow: A SpatioTemporal Recurrent Cell for Lightweight Multi-Frame Optical Flow Estimation
Pierre Godet, Alexandre Boulch, Aurélien Plyer, Guy Le Besnerais
Abstract: We present a new lightweight CNN-based algorithm for multi-frame optical flow estimation. Our solution introduces a double recurrence over spatial scale and time through repeated use of a generic "STaR" (SpatioTemporal Recurrent) cell. It includes (i) a temporal recurrence based on conveying learned features rather than optical flow estimates; and (ii) an occlusion detection process that is coupled with optical flow estimation and therefore uses very few extra parameters. The resulting STaRFlow algorithm achieves state-of-the-art performance on MPI Sintel and Kitti2015 and uses significantly fewer parameters than all other methods with comparable results.

5. Geometric Style Transfer
Xiao-Chang Liu, Xuan-Yi Li, Ming-Ming Cheng, Peter Hall
Abstract: Neural style transfer (NST), in which an input image is rendered in the style of another image, has seen considerable progress in recent years. Research over that time has been dominated by transferring aspects of color and texture, yet these factors are only one component of style. Other factors of style include composition, the projection system used, and the way in which artists warp and bend objects. Our contribution is to introduce a neural architecture that supports the transfer of geometric style. Unlike recent work in this area, our approach is unique in being general: it is not restricted by semantic content. This new architecture runs before a network that transfers texture style, enabling us to transfer texture to a warped image. This form of network supports a second novelty: we extend the NST input paradigm. Users can input a content/style pair, as is common, or they can choose to input a content/texture-style/geometry-style triple. This three-image input paradigm divides style into two parts and so provides significantly greater versatility in the output we can produce. We provide user studies that show the quality of our output, and quantify the importance of geometric style transfer to style recognition by humans.

6. Grading video interviews with fairness considerations
Abhishek Singhania, Abhishek Unnam, Varun Aggarwal
Abstract: There has been considerable interest in predicting human emotions and traits from facial images and videos. Lately, such work has been criticized for poor labeling practices, inconclusive prediction results, and fairness concerns. We present a careful methodology to automatically derive the social skills of candidates from their video responses to interview questions. For the first time, we include video data from multiple countries encompassing multiple ethnicities. Also, the videos were rated by individuals from multiple racial backgrounds, following several best practices, to achieve a consensus and unbiased measure of social skills. We develop two machine-learning models to predict social skills. The first model uses expert guidance to select plausibly causal features. The second uses deep learning and depends solely on the empirical correlations present in the data. We compare the errors of both models, study their specificity, and make recommendations. We further analyze fairness by studying the errors of the models by race and gender. We verify the usefulness of our models by determining how well they predict interview outcomes for candidates. Overall, the study provides strong support for using artificial intelligence to score video interviews, while taking care of fairness and ethical considerations.
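A per-group error breakdown of the kind used to study errors "by race and gender" can be sketched as follows. This is an illustrative assumption rather than the authors' protocol: it uses mean absolute error as the error measure, and the scores, group labels and the function name `mean_abs_error_by_group` are hypothetical.

```python
from collections import defaultdict


def mean_abs_error_by_group(y_true, y_pred, groups):
    """Return {group: mean absolute error} so error levels can be compared across groups."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        sums[group] += abs(truth - pred)
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical social-skill scores and group labels.
y_true = [3.0, 4.0, 2.5, 3.5, 4.5, 2.0]
y_pred = [3.2, 3.5, 2.5, 3.0, 4.0, 2.6]
groups = ["A", "A", "A", "B", "B", "B"]
errors = mean_abs_error_by_group(y_true, y_pred, groups)
gap = max(errors.values()) - min(errors.values())
print(errors, round(gap, 3))
```

A large gap between groups would flag the model for closer inspection; a fuller analysis would also check whether such a gap is statistically significant given the group sizes.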

7. Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images
Hui Qu, Pengxiang Wu, Qiaoying Huang, Jingru Yi, Zhennan Yan, Kang Li, Gregory M. Riedlinger, Subhajyoti De, Shaoting Zhang, Dimitris N. Metaxas
Abstract: Nuclei segmentation is a fundamental task in histopathology image analysis. Typically, such segmentation tasks require significant effort to manually generate accurate pixel-wise annotations for fully supervised training. To alleviate this tedious manual effort, in this paper we propose a novel weakly supervised segmentation framework based on partial points annotation, i.e., only a small portion of the nuclei locations in each image is labeled. The framework consists of two learning stages. In the first stage, we design a semi-supervised strategy to learn a detection model from partially labeled nuclei locations. Specifically, an extended Gaussian mask is designed to train an initial model with partially labeled data. Then, self-training with background propagation is proposed to make use of the unlabeled regions to boost nuclei detection and suppress false positives. In the second stage, a segmentation model is trained from the detected nuclei locations in a weakly supervised fashion. Two types of coarse labels with complementary information are derived from the detected points and are then used to train a deep neural network. A fully connected conditional random field loss is used in training to further refine the model without introducing extra computational complexity during inference. The proposed method is extensively evaluated on two nuclei segmentation datasets. The experimental results demonstrate that our method achieves competitive performance compared to its fully supervised counterpart and state-of-the-art methods while requiring significantly less annotation effort.
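A point-to-mask conversion gives a feel for how sparse point labels become a dense training target. This sketch builds a plain Gaussian peak around each labelled nucleus centre; the paper's "extended" Gaussian mask differs in details the abstract does not give, so `gaussian_point_mask` and its `sigma` parameter are hypothetical.

```python
import math


def gaussian_point_mask(height, width, points, sigma=2.0):
    """Dense target in which each labelled (row, col) centre becomes a Gaussian peak.

    Overlapping peaks are merged with max, so every labelled centre keeps value 1.0.
    Pixels far from all labelled points tend to 0; unlabelled nuclei are NOT marked.
    """
    mask = [[0.0] * width for _ in range(height)]
    for cy, cx in points:
        for y in range(height):
            for x in range(width):
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                mask[y][x] = max(mask[y][x], math.exp(-d2 / (2 * sigma ** 2)))
    return mask


mask = gaussian_point_mask(5, 5, [(2, 2)], sigma=1.0)
for row in mask:
    print(" ".join(f"{v:.2f}" for v in row))
```

Note that this plain version implicitly treats unlabelled nuclei as background, which is a limitation when only some nuclei in an image are labelled.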

8. Impression Space from Deep Template Network
Gongfan Fang, Xinchao Wang, Haofei Zhang, Jie Song, Mingli Song
Abstract: Humans have an innate ability to imagine something based only on their impression of it, without having to memorize all the details of what they have seen. In this work, we show that a trained convolutional neural network also has the capability to "remember" its input images. To achieve this, we propose a simple but powerful framework that establishes an Impression Space upon an off-the-shelf pretrained network. This network is referred to as the Template Network because its filters are used as templates to reconstruct images from the impression. In our framework, the impression space and the image space are bridged by a layer-wise encoding and iterative decoding process. The impression space indeed captures the salient features of images, and it can be applied directly to tasks such as unpaired image translation and image synthesis through impression matching, without further network training. Furthermore, the impression naturally constructs a high-level common space for different data. Based on this, we propose a mechanism to model the data relations inside the impression space, which reveals the feature similarity between images. Our code will be released.

9. VRUNet: Multi-Task Learning Model for Intent Prediction of Vulnerable Road Users
Adithya Ranga, Filippo Giruzzi, Jagdish Bhanushali, Emilie Wirbel, Patrick Pérez, Tuan-Hung Vu, Xavier Perrotton
Abstract: Advanced perception and path planning are at the core of any self-driving vehicle. Autonomous vehicles need to understand the scene and the intentions of other road users for safe motion planning. In urban use cases, it is very important to perceive and predict the intentions of pedestrians, cyclists, scooters, etc., classified as vulnerable road users (VRUs). Intent is a combination of pedestrian activities and long-term trajectories defining their future motion. In this paper we propose a multi-task learning model that predicts pedestrian actions and crossing intent, and forecasts their future paths from video sequences. We trained the model on the open-source naturalistic driving dataset JAAD, which is rich in behavioral annotations and real-world scenarios. Experimental results show state-of-the-art performance on the JAAD dataset and demonstrate the benefit of jointly learning and predicting actions and trajectories using 2D human pose features and scene context.

10. Context-Aware Refinement Network Incorporating Structural Connectivity Prior for Brain Midline Delineation
Shen Wang, Kongming Liang, Yiming Li, Yizhou Yu, Yizhou Wang
Abstract: Brain midline delineation can facilitate the clinical evaluation of brain midline shift, which plays an important role in the diagnosis and prognosis of various brain pathologies. Nevertheless, brain midline delineation still faces great challenges, such as the largely deformed midline caused by the mass effect and the possible morphological failure in which the predicted midline is not a connected curve. To address these challenges, we propose a context-aware refinement network (CAR-Net) to refine and integrate the feature pyramid representation generated by the UNet. Consequently, the proposed CAR-Net explores more discriminative contextual features and a larger receptive field, which are of great importance for predicting a largely deformed midline. To preserve the structural connectivity of the brain midline, we introduce a novel connectivity regular loss (CRL) that penalizes disconnectivity between adjacent coordinates. Moreover, we address a prerequisite ignored by previous regression-based methods: the brain CT image must be in the standard pose. A simple pose rectification network is presented to align the source input image to the standard pose. Extensive experimental results on the CQ dataset and one in-house dataset show that the proposed method requires fewer parameters and outperforms three state-of-the-art methods in terms of four evaluation metrics. Code is available at this https URL.
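One way to picture a penalty on "disconnectivity between adjacent coordinates" is a hinge on the horizontal jump between neighbouring rows. This is a hypothetical form, not the paper's CRL definition: it assumes the midline is predicted as one x-coordinate per image row and treats any jump larger than `tolerance` pixels between adjacent rows as a break in the curve.

```python
def connectivity_penalty(xs, tolerance=1.0):
    """Hypothetical connectivity penalty for a midline given as one x-coordinate per row.

    If neighbouring rows differ by more than `tolerance` pixels horizontally, the
    rasterised curve has a gap; each pixel of excess jump is penalised linearly.
    """
    return sum(max(0.0, abs(b - a) - tolerance) for a, b in zip(xs, xs[1:]))


connected = [50, 50, 51, 52, 52]
broken = [50, 50, 55, 52, 52]
print(connectivity_penalty(connected))  # 0.0
print(connectivity_penalty(broken))     # 6.0
```

With `tolerance=1.0`, any two consecutive row pixels that satisfy the constraint touch under 8-connectivity, so a zero penalty means the rasterised midline has no gaps.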

11. Progressive Point Cloud Deconvolution Generation Network
Le Hui, Rui Xu, Jin Xie, Jianjun Qian, Jian Yang
Abstract: In this paper, we propose an effective point cloud generation method that can generate multi-resolution point clouds of the same shape from a latent vector. Specifically, we develop a novel progressive deconvolution network with learning-based bilateral interpolation. The learning-based bilateral interpolation is performed in both the spatial and feature spaces of point clouds so that the local geometric structure information of point clouds can be exploited. Starting from low-resolution point clouds, the deconvolution network uses bilateral interpolation and max-pooling operations to progressively output high-resolution local and global feature maps. By concatenating local and global feature maps of different resolutions, we employ a multi-layer perceptron as the generation network to generate multi-resolution point clouds. To keep the shapes of point clouds at different resolutions consistent, we propose a shape-preserving adversarial loss to train the point cloud deconvolution generation network. Experimental results demonstrate the effectiveness of our proposed method.

12. Spine Landmark Localization with combining of Heatmap Regression and Direct Coordinate Regression
Wanhong Huang, Chunxi Yang, TianHong Hou
Abstract: Landmark localization plays a very important role in processing medical images as well as in disease identification. However, in the medical field it is a challenging task because of the complexity of medical images and the high accuracy required for disease identification and treatment. There are two dominant ways to regress landmark coordinates. One uses a fully convolutional network to regress heatmaps of the landmarks, which is more complex and requires heatmap post-processing strategies. The other regresses the coordinates directly with a CNN followed by fully connected layers, which is very simple and faster to train, but needs a larger dataset and a deeper model to achieve higher accuracy. Although data augmentation and a deeper network can reach reasonable accuracy, it still does not meet the requirements of the medical field. In addition, a deeper network also means larger space consumption. To achieve higher accuracy, we devised a new landmark regression method that combines heatmap regression and direct coordinate regression based on probability methods and system control theory.
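Heatmap-based methods must convert each heatmap into coordinates, which is one place the two regression styles meet. The soft-argmax below is a widely used differentiable way to do that; it is shown only to illustrate the heatmap-to-coordinate step and is not the combined method the authors propose. `soft_argmax_2d` and its `beta` sharpness parameter are illustrative names.

```python
import math


def soft_argmax_2d(heatmap, beta=1.0):
    """Turn a 2-D heatmap into (y, x) as the softmax-weighted mean position.

    Larger beta sharpens the softmax so the result approaches the hard argmax.
    """
    flat = [beta * v for row in heatmap for v in row]
    m = max(flat)  # subtract the max before exp for numerical stability
    weights = [math.exp(v - m) for v in flat]
    total = sum(weights)
    width = len(heatmap[0])
    y = sum(w * (i // width) for i, w in enumerate(weights)) / total
    x = sum(w * (i % width) for i, w in enumerate(weights)) / total
    return y, x


heatmap = [[0, 0, 0], [0, 0, 10], [0, 0, 0]]
print(soft_argmax_2d(heatmap, beta=5.0))
```

Unlike a hard argmax, this output can fall between pixels and passes gradients back to every heatmap cell, which is why it is often used when heatmap and coordinate losses are trained together.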

13. Are pathologist-defined labels reproducible? Comparison of the TUPAC16 mitotic figure dataset with an alternative set of labels
Christof A. Bertram, Mitko Veta, Christian Marzahl, Nikolas Stathonikos, Andreas Maier, Robert Klopfleisch, Marc Aubreville
Abstract: Pathologist-defined labels are the gold standard for histopathological datasets, despite well-known limitations in consistency for some tasks. To date, several datasets of mitotic figures are available and have been used to develop promising deep learning-based algorithms. To assess the robustness of those algorithms and the reproducibility of their methods, it is necessary to test them on several independent datasets. The influence of the different labeling methods used for these datasets is currently unknown. To tackle this, we present an alternative set of labels for the images of the auxiliary mitosis dataset of the TUPAC16 challenge. In addition to manual mitotic figure screening, we used a novel, algorithm-aided labeling process that minimized the risk of missing rare mitotic figures in the images. All potential mitotic figures were independently assessed by two pathologists. The novel, publicly available label set contains 1,999 mitotic figures (+28.80%) and additionally includes 10,483 labels of cells that closely resemble mitotic figures (hard examples). Using a standard deep learning object detection architecture, we found a significant difference in F_1 scores between the original label set (0.549) and the new alternative label set (0.735). The models trained on the alternative set showed higher overall confidence values, suggesting higher overall label consistency. The findings of the present study show that pathologist-defined labels may vary significantly, resulting in notable differences in model performance. Comparisons of deep learning-based algorithms across independent datasets with different labeling methods should therefore be made with caution.
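The F_1 scores quoted above combine detection precision and recall, and the "+28.80%" figure implies the approximate size of the original label set. The sketch below computes both; the true/false positive counts in the example are made up, and only the 1,999 and 28.80% values come from the abstract.

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R) with P = tp / (tp + fp) and R = tp / (tp + fn).

    Algebraically this equals 2tp / (2tp + fp + fn), which also avoids
    dividing by zero when there are no predictions at all.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical detector counts: 80 correct detections, 20 spurious, 20 missed.
print(f1_score(tp=80, fp=20, fn=20))  # 0.8

# 1,999 labels described as +28.80% implies an original count of about 1999 / 1.288.
print(round(1999 / 1.288))  # 1552
```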

14. DECAPS: Detail-Oriented Capsule Networks
Aryan Mobiny, Pengyu Yuan, Pietro Antonio Cicalese, Hien Van Nguyen
Abstract: Capsule Networks (CapsNets) have been shown to be a promising alternative to Convolutional Neural Networks (CNNs). However, they often fall short of state-of-the-art accuracy on large-scale, high-dimensional datasets. We propose a Detail-Oriented Capsule Network (DECAPS) that combines the strength of CapsNets with several novel techniques to boost classification accuracy. First, DECAPS uses an Inverted Dynamic Routing (IDR) mechanism to group lower-level capsules into heads before sending them to higher-level capsules. This strategy enables capsules to selectively attend to small but informative details within the data that may be lost during pooling operations in CNNs. Second, DECAPS employs a Peekaboo training procedure, which encourages the network to focus on fine-grained information through a second-level attention scheme. Finally, a distillation process improves the robustness of DECAPS by averaging over the predictions for the original and attended image regions. We provide extensive experiments on the CheXpert and RSNA Pneumonia datasets to validate the effectiveness of DECAPS. Our networks achieve state-of-the-art accuracy not only in classification (increasing the average area under the ROC curve from 87.24% to 92.82% on the CheXpert dataset) but also in weakly supervised localization of diseased areas (increasing average precision from 41.7% to 80% on the RSNA Pneumonia detection dataset).

15. Data-Efficient Ranking Distillation for Image Retrieval
Zakaria Laskar, Juho Kannala
Abstract: Recent advances in deep learning have led to rapid developments in image retrieval. However, the best-performing architectures incur significant computational cost. Recent approaches tackle this issue by using knowledge distillation to transfer knowledge from a deeper, heavier architecture to a much smaller network. In this paper we address knowledge distillation for metric learning problems. Unlike previous approaches, our proposed method jointly addresses the following constraints: i) limited queries to the teacher model, ii) a black-box teacher model with access to its final output representation, and iii) a small fraction of the original training data without any ground-truth labels. In addition, the distillation method does not require the student and teacher to have the same dimensionality. Addressing these constraints reduces computation requirements and dependence on large-scale training datasets, and covers practical scenarios with limited or partial access to private data such as teacher models or the corresponding training data/labels. The key idea is to augment the original training set with additional samples by performing linear interpolation in the final output representation space. Distillation is then performed in the joint space of original and augmented teacher-student sample representations. Results demonstrate that our approach can match baseline models trained with full supervision. In low training sample settings, our approach outperforms the fully supervised approach on two challenging image retrieval datasets, ROxford5k and RParis6k \cite{Roxf}, with the least possible teacher supervision.
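The interpolation-based augmentation can be sketched as mixup-style mixing of output representations. Assumptions: `teacher` and `student` hold representations of the same images in the same order, their dimensions may differ (as the abstract allows), and the same mixing coefficient is applied to both so the augmented teacher and student samples stay paired. The function `mix_representations` is hypothetical, and the paper's exact sampling scheme is not given in the abstract.

```python
import random


def mix_representations(teacher, student, num_new, rng):
    """Return `num_new` augmented (teacher, student) pairs built by linear interpolation.

    For a random pair of images (i, j) and lam ~ Uniform(0, 1):
        t_new = lam * teacher[i] + (1 - lam) * teacher[j]
        s_new = lam * student[i] + (1 - lam) * student[j]
    """
    augmented = []
    for _ in range(num_new):
        i, j = rng.sample(range(len(teacher)), 2)
        lam = rng.random()
        t_new = [lam * a + (1 - lam) * b for a, b in zip(teacher[i], teacher[j])]
        s_new = [lam * a + (1 - lam) * b for a, b in zip(student[i], student[j])]
        augmented.append((t_new, s_new))
    return augmented


rng = random.Random(0)
teacher = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]                  # 2-D teacher outputs
student = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # 3-D student outputs
for t_new, s_new in mix_representations(teacher, student, 3, rng):
    print([round(v, 3) for v in t_new], [round(v, 3) for v in s_new])
```

Because the teacher and student dimensions differ, a loss on these pairs would presumably compare relations between samples (for example distances or rankings) rather than the raw vectors.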

16. Using Machine Learning to Detect Ghost Images in Automotive Radar
Florian Kraus, Nicolas Scheiner, Werner Ritter, Klaus Dietmayer
Abstract: Radar sensors are an important part of driver assistance systems and intelligent vehicles due to their robustness against all kinds of adverse conditions, e.g., fog, snow, rain, or even direct sunlight. This robustness is achieved by a substantially larger wavelength compared to light-based sensors such as cameras or lidars. As a side effect, many surfaces act like mirrors at this wavelength, resulting in unwanted ghost detections. In this article, we present a novel approach to detect these ghost objects by applying data-driven machine learning algorithms. For this purpose, we use a large-scale automotive dataset with annotated ghost objects. We show that a state-of-the-art automotive radar classifier can be used to detect ghost objects alongside real objects. Furthermore, we are able to reduce the number of false-positive detections caused by ghost images in some settings.

17. Continual Adaptation for Deep Stereo
Matteo Poggi, Alessio Tonioni, Fabio Tosi, Stefano Mattoccia, Luigi Di Stefano
Abstract: Depth estimation from stereo images is carried out with unmatched results by convolutional neural networks trained end-to-end to regress dense disparities. As for most tasks, this is possible if large amounts of labelled samples are available for training, ideally covering the whole data distribution encountered at deployment time. Since this assumption is systematically unmet in real applications, the capacity to adapt to any unseen setting becomes of paramount importance. To this end, we propose a continual adaptation paradigm for deep stereo networks designed to deal with challenging and ever-changing environments. We design a lightweight and modular architecture, Modularly ADaptive Network (MADNet), and formulate Modular ADaptation algorithms (MAD, MAD++) that permit efficient optimization of independent sub-portions of the entire network. In our paradigm, the learning signals needed to continuously adapt models online can be sourced from self-supervision via right-to-left image warping or from traditional stereo algorithms. With either source, no data other than the input images gathered at deployment time is needed. Thus, our network architecture and adaptation algorithms realize the first real-time self-adaptive deep stereo system and pave the way for a new paradigm that can facilitate practical deployment of end-to-end architectures for dense disparity regression.
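The self-supervised signal from right-to-left image warping can be illustrated on a single image row: the right row is resampled at x - d to reconstruct the left row, and the photometric error scores the disparity. This is a simplified 1-D sketch under assumptions (rectified images, disparity in pixels defined on the left image, plain L1 error); MADNet's actual loss may combine other terms, and `warp_right_to_left` and `photometric_l1` are hypothetical names.

```python
def warp_right_to_left(right_row, disparity_row):
    """Reconstruct a left-image row by sampling the right row at x - d.

    Uses linear interpolation between neighbouring pixels; positions whose
    source falls outside the row are returned as None (no valid match).
    """
    width = len(right_row)
    out = []
    for x, d in enumerate(disparity_row):
        src = x - d
        if src < 0 or src > width - 1:
            out.append(None)
            continue
        x0 = int(src)  # floor, since src >= 0
        x1 = min(x0 + 1, width - 1)
        frac = src - x0
        out.append((1 - frac) * right_row[x0] + frac * right_row[x1])
    return out


def photometric_l1(left_row, right_row, disparity_row):
    """Mean absolute difference between the left row and its reconstruction."""
    recon = warp_right_to_left(right_row, disparity_row)
    diffs = [abs(l - r) for l, r in zip(left_row, recon) if r is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


right = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
left = [9.0, 9.0, 0.0, 1.0, 2.0, 3.0]  # the right row shifted by 2 pixels
print(photometric_l1(left, right, [2] * 6))  # 0.0 for the correct disparity
print(photometric_l1(left, right, [1] * 6))  # larger for a wrong disparity
```

Minimising this error with respect to the predicted disparities needs only the stereo pair itself, which is what makes online adaptation without ground truth possible.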

18. Miss the Point: Targeted Adversarial Attack on Multiple Landmark Detection
Qingsong Yao, Zecheng He, Hu Han, S. Kevin Zhou
Abstract: Recent methods for multiple landmark detection based on deep convolutional neural networks (CNNs) reach high accuracy and improve traditional clinical workflows. However, the vulnerability of CNNs to adversarial-example attacks can easily be exploited to break classification and segmentation tasks. This paper is the first to study how fragile CNN-based models for multiple landmark detection are to adversarial perturbations. Specifically, we propose a novel Adaptive Targeted Iterative FGSM (ATI-FGSM) attack against state-of-the-art models for multiple landmark detection. By adding imperceptible perturbations to the original image, the attacker can use ATI-FGSM to precisely control the model's predictions for arbitrarily selected landmarks while keeping the other landmarks stationary. A comprehensive evaluation on a public dataset for cephalometric landmark detection demonstrates that the adversarial examples generated by ATI-FGSM break the CNN-based network more effectively and efficiently than the original Iterative FGSM attack. Our work reveals serious threats to patients' health. Furthermore, we discuss the limitations of our method and suggest potential defense directions by investigating the coupling effect of nearby landmarks, i.e., a major source of divergence in our experiments. Our source code is available at this https URL.
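A targeted iterative FGSM attack can be sketched on a toy linear landmark regressor. This shows plain targeted I-FGSM, not the paper's adaptive ATI-FGSM variant, whose adaptive rule the abstract does not describe. Assumptions: each landmark coordinate is predicted as the dot product of a weight row `W[k]` with the input `x`, the loss is the squared distance to per-landmark targets, and landmarks that should stay put use their clean prediction as target. `targeted_ifgsm` and all numbers are hypothetical.

```python
def sign(v):
    """Return 1, -1 or 0 for a single number."""
    return (v > 0) - (v < 0)


def predict(W, x):
    """Toy linear regressor: one output coordinate per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def targeted_ifgsm(x0, W, targets, eps=0.1, alpha=0.02, steps=20):
    """Targeted iterative FGSM under an L-infinity budget eps.

    Loss = sum_k (pred_k - target_k)^2. Stepping against the sign of its
    gradient pulls each prediction toward its target; stationary landmarks
    use their clean prediction as target, so the loss also resists moving them.
    """
    x = list(x0)
    for _ in range(steps):
        pred = predict(W, x)
        grad = [sum(2 * (p - t) * row[d] for p, t, row in zip(pred, targets, W))
                for d in range(len(x))]
        x = [xi - alpha * sign(g) for xi, g in zip(x, grad)]
        # Project back into the eps-ball around the clean input.
        x = [min(max(xi, oi - eps), oi + eps) for xi, oi in zip(x, x0)]
    return x


W = [[1.0, 0.0], [0.0, 1.0]]  # landmark 0 reads x[0], landmark 1 reads x[1]
x0 = [0.0, 0.0]
clean = predict(W, x0)
targets = [0.05, clean[1]]    # move landmark 0, keep landmark 1 where it is
x_adv = targeted_ifgsm(x0, W, targets)
print(predict(W, x_adv))
```

In a real CNN the gradient would come from automatic differentiation, and landmarks would typically share input pixels, which is where the coupling effect between nearby landmarks mentioned above comes into play.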
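The core of a targeted iterative FGSM attack on landmark regression can be sketched as follows. This is a simplified illustration, not ATI-FGSM itself: a hypothetical linear regressor `W @ x` stands in for the CNN so the gradient is analytic, the step size is fixed (the adaptive part of ATI-FGSM is not reproduced), and the landmarks that should stay put simply keep their clean predictions as targets.

```python
import numpy as np

def targeted_ifgsm(x, W, target, eps=0.1, alpha=0.01, steps=20):
    """Targeted iterative FGSM against a linear landmark regressor y = W @ x.

    x: flattened image in [0, 1]; W: (2 * n_landmarks, x.size) hypothetical model;
    target: desired output (moved coordinates for attacked landmarks,
    clean predictions for landmarks that should stay stationary).
    """
    x_adv = x.copy()
    for _ in range(steps):
        grad = 2 * W.T @ (W @ x_adv - target)      # gradient of ||W x - target||^2
        x_adv = x_adv - alpha * np.sign(grad)      # signed step towards the target
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay inside the L-infinity ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid image
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(0, 1 / 8, size=(4, 64))   # 2 landmarks x (row, col), hypothetical
x = rng.random(64)
target = W @ x
target[0:2] += 0.5                       # move landmark 0; landmark 1 should not move
x_adv = targeted_ifgsm(x, W, target)
print(np.sum((W @ x - target) ** 2), np.sum((W @ x_adv - target) ** 2))
```

Putting the unattacked landmarks into the target vector is what lets a single loss both move the selected points and hold the others still.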

19. Distillation Guided Residual Learning for Binary Convolutional Neural Networks
Jianming Ye, Shiliang Zhang, Jingdong Wang
Abstract: Bridging the performance gap between Binary CNNs (BCNNs) and Floating-point CNNs (FCNNs) is challenging. We observe that this performance gap leads to substantial residuals between the intermediate feature maps of the BCNN and the FCNN. To minimize the gap, we force the BCNN to produce intermediate feature maps similar to those of the FCNN. This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to more effective optimization of the BCNN. It also motivates us to update the binary convolutional block architecture to facilitate the optimization of the block-wise distillation loss. Specifically, a lightweight shortcut branch is inserted into each binary convolutional block to complement the residuals at that block. Benefiting from its Squeeze-and-Interaction (SI) structure, this shortcut branch introduces only a small fraction of additional parameters (e.g., 10% overhead) but effectively complements the residuals. Extensive experiments on ImageNet demonstrate the superior classification efficiency and accuracy of our method; e.g., a BCNN trained with our method achieves 60.45% accuracy on ImageNet.
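The block-wise distillation loss described above can be sketched with toy linear+ReLU "blocks" standing in for convolutional blocks. Assumptions: the BCNN here simply binarizes the FCNN's weights (sign times mean absolute value, an XNOR-Net-style choice), so the residuals are visible without any training, and the SI shortcut branch is not modeled. In real training the loss would be minimized with respect to the BCNN's own parameters.

```python
import numpy as np

def binarize(w):
    """Sign binarization scaled by mean |w| (an XNOR-Net-style choice, assumed here)."""
    return np.mean(np.abs(w)) * np.where(w >= 0, 1.0, -1.0)

def forward_blocks(x, weights, binary=False):
    """Run a toy stack of linear+ReLU blocks and return every intermediate feature map."""
    feats = []
    for w in weights:
        x = np.maximum((binarize(w) if binary else w) @ x, 0.0)
        feats.append(x)
    return feats

def blockwise_distillation_loss(bcnn_feats, fcnn_feats):
    """Sum over blocks of the mean squared residual between BCNN and FCNN feature maps."""
    return sum(float(np.mean((b - f) ** 2)) for b, f in zip(bcnn_feats, fcnn_feats))

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 16)) for _ in range(3)]
x = rng.normal(size=16)
loss = blockwise_distillation_loss(forward_blocks(x, weights, binary=True),
                                   forward_blocks(x, weights))
print(loss)
```

Supervising every block, rather than only the final output, gives each binary block its own target, which is the point of the block-wise formulation.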

20. Affine Non-negative Collaborative Representation Based Pattern Classification
He-Feng Yin, Xiao-Jun Wu, Zhen-Hua Feng, Josef Kittler
Abstract: During the past decade, representation-based classification methods have received considerable attention in pattern recognition. In particular, the recently proposed non-negative representation based classification (NRC) method has been reported to achieve promising results in a wide range of classification tasks. However, NRC has two major drawbacks. First, there is no regularization term in the formulation of NRC, which may result in unstable solutions and misclassification. Second, NRC ignores the fact that, in practical applications, data usually lie in a union of multiple affine subspaces rather than linear subspaces. To address these issues, this paper presents an affine non-negative collaborative representation (ANCR) model for pattern classification. Specifically, ANCR imposes a regularization term on the coding vector. Moreover, ANCR introduces an affine constraint to better represent data from affine subspaces. Experimental results on several benchmark datasets demonstrate the merits of the proposed ANCR method. The source code of ANCR is publicly available at this https URL.
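Combining a ridge penalty, non-negativity, and the affine (sum-to-one) constraint gives a coding problem over the probability simplex: minimize ||y - Xc||^2 + lam * ||c||^2 subject to c >= 0 and sum(c) = 1. The sketch below solves it with projected gradient descent and classifies by the smallest class-wise reconstruction residual. Both the solver and the decision rule are assumptions (common choices for representation-based classifiers), not necessarily those in the released ANCR code.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {c : c >= 0, sum(c) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1) / k > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def ancr_code(X, y, lam=0.01, iters=500):
    """Minimize ||y - X c||^2 + lam * ||c||^2 s.t. c >= 0, 1^T c = 1 by projected gradient."""
    n = X.shape[1]
    c = np.full(n, 1.0 / n)
    step = 1.0 / (2 * (np.linalg.norm(X, 2) ** 2 + lam))   # 1 / Lipschitz constant
    for _ in range(iters):
        grad = 2 * X.T @ (X @ c - y) + 2 * lam * c
        c = project_to_simplex(c - step * grad)
    return c

def ancr_classify(X, labels, y, lam=0.01):
    """Assign y to the class whose columns (with their coefficients) reconstruct it best."""
    c = ancr_code(X, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - X[:, labels == k] @ c[labels == k]) for k in classes]
    return classes[int(np.argmin(residuals))], c

# Columns are training samples; two per class.
X = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 0.0, 0.1],
              [0.0, 0.0, 1.0, 0.9]])
labels = np.array([0, 0, 1, 1])
pred, c = ancr_classify(X, labels, np.array([0.0, 0.05, 0.95]))
print(pred, c)
```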

21. SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation
John Yang, Hyung Jin Chang, Seungeui Lee, Nojun Kwak
Abstract: 3D hand pose estimation from RGB images has been studied for a long time. Most studies, however, perform frame-by-frame estimation on independent static images. In this paper, we consider not only the appearance of a hand but also incorporate the temporal movement information of a hand in motion into the learning framework for better 3D hand pose estimation, which necessitates a large-scale dataset of sequential RGB hand images. We propose a novel method that generates a synthetic dataset mimicking natural human hand movements by re-engineering the annotations of an extant static hand pose dataset into pose-flows. With the generated dataset, we train a newly proposed recurrent framework that exploits visuo-temporal features from sequential images of synthetic hands in motion and emphasizes the temporal smoothness of estimations with a temporal consistency constraint. Our novel training strategy of detaching the recurrent layer of the framework during synthetic-to-real domain fine-tuning preserves the visuo-temporal features learned from sequential synthetic hand images. The sequentially estimated hand poses consequently produce natural and smooth hand movements, which lead to more robust estimations. We show that utilizing temporal information for 3D hand pose estimation significantly improves general pose estimation, outperforming state-of-the-art methods in experiments on hand pose estimation benchmarks.
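One simple way to express a temporal consistency constraint is to penalize joint displacement between consecutive frames. The sketch below assumes 3D poses stored as a (frames, joints, 3) array; the squared-displacement form, the weight `lam`, and the way it is added to the per-frame pose error are illustrative assumptions, since the abstract does not give the exact formulation.

```python
import numpy as np

def temporal_consistency_loss(poses):
    """poses: (T, J, 3) per-frame 3D joint estimates.

    Penalizes squared joint displacement between consecutive frames,
    averaged over the T - 1 frame pairs.
    """
    diffs = poses[1:] - poses[:-1]
    return float(np.mean(np.sum(diffs ** 2, axis=(1, 2))))

def total_loss(pred, gt, lam=0.1):
    """Hypothetical combination: per-frame pose error plus a weighted smoothness term."""
    pose_err = float(np.mean(np.sum((pred - gt) ** 2, axis=(1, 2))))
    return pose_err + lam * temporal_consistency_loss(pred)

# A 21-joint hand translating by 1 unit per frame along every axis.
moving = np.arange(5)[:, None, None] * np.ones((5, 21, 3))
print(temporal_consistency_loss(moving))   # 21 joints * 3 coords * 1^2 = 63.0
```

The smoothness term only discourages jitter; it is the per-frame error term that keeps estimates anchored to the image evidence.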

22. Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer
Xinghao Chen, Yiman Zhang, Yunhe Wang, Han Shu, Chunjing Xu, Chang Xu
Abstract: Video style transfer techniques inspire many exciting applications on mobile devices. However, their efficiency and stability are still far from satisfactory. To boost transfer stability across frames, optical flow is widely adopted despite its high computational complexity, e.g., occupying over 97% of the inference time. This paper proposes to learn a lightweight video style transfer network via a knowledge distillation paradigm. We adopt two teacher networks, one of which takes optical flow during inference while the other does not. The output difference between these two teacher networks highlights the improvements made by optical flow, which are then used to distill the target student network. Furthermore, a low-rank distillation loss is employed to stabilize the output of the student network by mimicking the rank of the input videos. Extensive experiments demonstrate that our student network, without an optical flow module, is still able to generate stable video and runs much faster than the teacher network.
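The abstract says the difference between the two teachers highlights where optical flow helps. One hypothetical way to turn that into a distillation signal is to weight the student's imitation error by the normalized teacher difference, so that mistakes in flow-sensitive regions cost more. The `(1 + weight)` scheme and the function name are assumptions for illustration only; the low-rank loss is not shown.

```python
import numpy as np

def flow_guided_distillation_loss(student, teacher_flow, teacher_noflow, eps=1e-8):
    """Student imitates the flow-based teacher, with extra weight where flow changed the output.

    All inputs are stylized frames of the same shape. The (1 + weight)
    scheme is a hypothetical choice for illustration.
    """
    diff = np.abs(teacher_flow - teacher_noflow)
    weight = diff / (diff.max() + eps)            # in [0, 1]; ~1 where flow mattered most
    return float(np.mean((1.0 + weight) * (student - teacher_flow) ** 2))

# The teachers disagree only at the top-left pixel.
t_noflow = np.zeros((2, 2))
t_flow = np.array([[1.0, 0.0], [0.0, 0.0]])
err_a = t_flow + np.array([[0.5, 0.0], [0.0, 0.0]])   # error where flow mattered
err_b = t_flow + np.array([[0.0, 0.0], [0.0, 0.5]])   # same error elsewhere
print(flow_guided_distillation_loss(err_a, t_flow, t_noflow),
      flow_guided_distillation_loss(err_b, t_flow, t_noflow))
```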

23. FC2RN: A Fully Convolutional Corner Refinement Network for Accurate Multi-Oriented Scene Text Detection
Xugong Qin, Yu Zhou, Dayan Wu, Yinliang Yue, Weiping Wang
Abstract: Recent scene text detection works mainly focus on curved text detection. However, in real applications, curved text is scarcer than multi-oriented text. Accurate detection of multi-oriented text with large variations in scale, orientation, and aspect ratio is of great significance. Among multi-oriented detection methods, direct regression of the geometry of scene text has a simple yet powerful pipeline and is popular in academic and industrial communities, but it may produce imperfect detections, especially for long text, due to the limited receptive field. In this work, we aim to improve on this while keeping the pipeline simple. We propose a fully convolutional corner refinement network (FC2RN) for accurate multi-oriented text detection, in which an initial corner prediction and a refined corner prediction are obtained in one pass. With a novel quadrilateral RoI convolution operation tailored for multi-oriented scene text, the initial quadrilateral prediction is encoded into the feature maps, which can then be used to predict the offsets between the initial prediction and the ground truth as well as to output a refined confidence score. Experimental results on four public datasets, MSRA-TD500, ICDAR2017-RCTW, ICDAR2015, and COCO-Text, demonstrate that FC2RN can outperform state-of-the-art methods. The ablation study shows the effectiveness of corner refinement and scoring for accurate text localization.
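The geometric bookkeeping of the refinement stage is simple: the regressed per-corner offsets are added to the initial quadrilateral. The sketch below shows only that step plus a shoelace-area helper that can be handy for sanity-checking refined boxes; the quadrilateral RoI convolution that produces the offsets is not modeled, and the function names are hypothetical.

```python
import numpy as np

def refine_quad(initial, offsets):
    """initial, offsets: (4, 2) corner coordinates (x, y). Refined corners = initial + offsets."""
    return np.asarray(initial, dtype=float) + np.asarray(offsets, dtype=float)

def quad_area(quad):
    """Shoelace formula for a simple quadrilateral with corners listed in order."""
    x, y = quad[:, 0], quad[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
grown = refine_quad(square, [[0, 0], [1, 0], [1, 1], [0, 1]])   # stretch to a 2 x 2 box
print(quad_area(refine_quad(square, np.zeros((4, 2)))), quad_area(grown))   # 1.0 4.0
```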

24. n-Reference Transfer Learning for Saliency Prediction
Yan Luo, Yongkang Wong, Mohan S. Kankanhalli, Qi Zhao
Abstract: Benefiting from deep learning research and large-scale datasets, saliency prediction has achieved significant success in the past decade. However, it remains challenging to predict saliency maps on images in new domains that lack sufficient data for data-hungry models. To solve this problem, we propose a few-shot transfer learning paradigm for saliency prediction, which enables efficient transfer of knowledge learned from existing large-scale saliency datasets to a target domain with limited labeled examples. Specifically, very few target domain examples are used as the reference to train a model with a source domain dataset, such that the training process can converge to a local minimum in favor of the target domain. Then, the learned model is further fine-tuned with the reference. The proposed framework is gradient-based and model-agnostic. We conduct comprehensive experiments and ablation studies on various source and target domain pairs. The results show that the proposed framework achieves a significant performance improvement. The code is publicly available at this https URL.
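The idea of using a few reference examples to steer source-domain training can be sketched with a linear model. The gating rule below (accept a source step only if its gradient agrees with the reference gradient, i.e. it also lowers the reference loss to first order) is a hypothetical stand-in chosen for illustration; the abstract only states that the framework is gradient-based and followed by fine-tuning on the reference.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of the mean squared error for a linear model y ~ X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def reference_guided_step(w, Xs, ys, Xr, yr, lr=0.1):
    """Apply the source-domain step only if it also lowers the reference loss (to first order)."""
    g_src = mse_grad(w, Xs, ys)
    g_ref = mse_grad(w, Xr, yr)
    if g_src @ g_ref > 0:        # -g_src is then a descent direction for the reference loss
        return w - lr * g_src
    return w                     # hypothetical choice: skip source batches that conflict

def finetune(w, Xr, yr, lr=0.1, steps=10):
    """Second stage: plain gradient descent on the few reference examples."""
    for _ in range(steps):
        w = w - lr * mse_grad(w, Xr, yr)
    return w

w0 = np.zeros(1)
X = np.array([[1.0]])
print(reference_guided_step(w0, X, np.array([1.0]), X, np.array([2.0])))   # aligned: step taken
print(reference_guided_step(w0, X, np.array([-1.0]), X, np.array([2.0])))  # conflicting: skipped
```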

25. Learnable Hollow Kernels for Anatomical Segmentation
Elizaveta Lazareva, Oleg Rogov, Olga Shegai, Denis Larionov, Dmitry V. Dylov
Abstract: Segmentation of certain hollow organs, such as the bladder, is especially hard to automate due to their complex geometry, vague intensity gradients in the soft tissues, and the tedious manual data annotation routine. Yet, accurate localization of the walls and the cancer regions in radiologic images of such organs is an essential step in oncology. To address this issue, we propose a new class of hollow kernels that learn to 'mimic' the contours of the segmented organ, effectively replicating its shape and structural complexity. We train a series of U-Net-like neural networks using the proposed kernels and demonstrate the superiority of the idea in various spatio-temporal convolution scenarios. Specifically, the dilated hollow-kernel architecture outperforms state-of-the-art spatial segmentation models, whereas the addition of temporal blocks (e.g., Bi-LSTM) establishes a new multi-class baseline for the bladder segmentation challenge. Our spatio-temporal model based on hollow kernels reaches mean Dice scores of 0.936, 0.736, and 0.712 for the bladder's inner wall, outer wall, and tumor regions, respectively. The results pave the way towards other domain-specific deep learning applications where the shape of the segmented object could be used to form a proper convolution kernel for boosting the segmentation outcome.
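A minimal reading of a "hollow kernel" is a learnable kernel whose central region is fixed at zero, so that only a ring of weights responds to the input, loosely resembling a wall around a cavity. This interpretation, the square ring shape, and the function names are assumptions; the paper's kernels may be shaped and parameterized differently. Because the mask multiplies the weights, gradients with respect to the masked entries are zero, so the hollow region stays empty during training.

```python
import numpy as np

def hollow_mask(size, inner):
    """size x size mask with a zeroed inner x inner centre, leaving a square ring of ones."""
    m = np.ones((size, size))
    s = (size - inner) // 2
    m[s:s + inner, s:s + inner] = 0.0
    return m

def conv2d_valid(img, k):
    """Plain 'valid' 2-D cross-correlation, as computed by deep learning conv layers."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def hollow_conv(img, weights, inner):
    """Convolve with a learnable kernel restricted to its hollow (ring) support."""
    return conv2d_valid(img, weights * hollow_mask(weights.shape[0], inner))

out = hollow_conv(np.ones((8, 8)), np.ones((5, 5)), inner=3)
print(out)   # every output sums the 25 - 9 = 16 ring weights
```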

26. DCANet: Learning Connected Attentions for Convolutional Neural Networks
Xu Ma, Jingda Guo, Sihai Tang, Zhinan Qiao, Qi Chen, Qing Yang, Song Fu
Abstract: While self-attention mechanisms have shown promising results for many vision tasks, they only consider the current features at a time. We show that such a manner cannot take full advantage of the attention mechanism. In this paper, we present Deep Connected Attention Network (DCANet), a novel design that boosts attention modules in a CNN model without any modification of the internal structure. To achieve this, we interconnect adjacent attention blocks, allowing information to flow among them. With DCANet, all attention blocks in a CNN model are trained jointly, which improves the ability of attention learning. DCANet is generic: it is not limited to a specific attention module or base network architecture. Experimental results on the ImageNet and MS COCO benchmarks show that DCANet consistently outperforms state-of-the-art attention modules with minimal additional computational overhead in all test cases. All code and models are publicly available.
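Interconnecting adjacent attention blocks can be illustrated with a Squeeze-and-Excitation style channel attention whose pre-sigmoid logits are blended with those of the previous block. The choice of SE as the base module, the weighted-sum connection, and the fixed `alpha`/`beta` values (standing in for learnable connection weights) are assumptions for illustration; DCANet's actual connection may fuse attention information differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_logits(x, W1, W2):
    """Squeeze-and-Excitation style channel logits for x of shape (C, H, W)."""
    z = x.mean(axis=(1, 2))                 # squeeze: global average pool per channel
    return W2 @ np.maximum(W1 @ z, 0.0)     # excitation MLP, before the sigmoid

def connected_attention(x, prev_logits, W1, W2, alpha=0.5, beta=0.5):
    """Blend the previous block's attention logits with the current ones before gating.

    alpha and beta stand in for learnable connection weights (hypothetical values).
    Returns the re-weighted features and the logits to pass to the next block.
    """
    cur = se_logits(x, W1, W2)
    logits = cur if prev_logits is None else alpha * prev_logits + beta * cur
    return x * sigmoid(logits)[:, None, None], logits

C = 4
x = np.ones((C, 3, 3))
W1, W2 = np.zeros((2, C)), np.zeros((C, 2))     # zero MLP: current logits are all 0
y0, l0 = connected_attention(x, None, W1, W2)   # first block: plain SE gating
y1, l1 = connected_attention(x, 2.0 * np.ones(C), W1, W2)   # carries previous logits
```

Passing the logits forward is what lets gradients from later attention blocks reach earlier ones, so the blocks are trained jointly.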

27. A Benchmark for Inpainting of Clothing Images with Irregular Holes
Furkan Kınlı, Barış Özcan, Furkan Kıraç
Abstract: Fashion image understanding is an active research field with a large number of practical applications in industry. Despite its practical impact on intelligent fashion analysis systems, clothing image inpainting has not yet been extensively examined. To this end, we present an extensive benchmark of clothing image inpainting on well-known fashion datasets. Furthermore, we introduce a dilated version of partial convolutions, which efficiently derives the mask update step, and empirically show that the proposed method reduces the number of layers required to form fully transparent masks. Experiments show that dilated partial convolutions (DPConv) improve quantitative inpainting performance compared to other inpainting strategies, especially when the mask covers 20% or more of the image.
Keywords: image inpainting, fashion image understanding, dilated convolutions, partial convolutions
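The claim that dilation reduces the number of layers needed to form a fully valid mask can be checked with the usual partial-convolution mask update, in which a location becomes valid if any pixel under the window was valid. The sketch applies that rule with a dilated 3 x 3 window; the kernel size, hole geometry, and function names are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

def update_mask(mask, k=3, dilation=1):
    """Partial-convolution mask update: a location becomes valid (1) if any
    pixel under the (dilated) k x k window is valid. mask: (H, W) of 0/1."""
    h, w = mask.shape
    r = (k // 2) * dilation                      # radius of the dilated window
    padded = np.pad(mask, r)                     # zero padding = invalid border
    new = np.zeros_like(mask)
    for di in range(-r, r + 1, dilation):
        for dj in range(-r, r + 1, dilation):
            new = np.maximum(new, padded[r + di:r + di + h, r + dj:r + dj + w])
    return new

def layers_to_fill(mask, k=3, dilation=1, max_layers=100):
    """Number of stacked layers until the hole is fully filled (mask all ones)."""
    n = 0
    while mask.min() < 1 and n < max_layers:
        mask = update_mask(mask, k, dilation)
        n += 1
    return n

mask = np.ones((16, 16))
mask[3:13, 3:13] = 0.0                           # a 10 x 10 hole
print(layers_to_fill(mask, dilation=1), layers_to_fill(mask, dilation=2))   # 5 3
```

With dilation 2, each layer extends the valid region by two pixels instead of one, so the same hole closes in fewer layers.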

28. Automatic Detection of Major Freeway Congestion Events Using Wireless Traffic Sensor Data: A Machine Learning Approach
Sanaz Aliari, Kaveh F. Sadabadi
Abstract: Monitoring the dynamics of traffic in major corridors can provide invaluable insight for traffic planning purposes. An important requirement for this monitoring is the availability of methods that automatically detect major traffic events and annotate the abundance of travel data. This paper introduces a machine learning based approach for reliable detection and characterization of highway traffic congestion events from hundreds of hours of traffic speed data. Indeed, the proposed approach is a generic approach for detecting changes in any given time series, which in the present study is wireless traffic sensor data. The speed data are first segmented with a ten-hour sliding window and fed into three neural networks that detect the existence and duration of congestion events (slowdowns) in each window. The sliding window captures each slowdown event multiple times, which increases confidence in congestion detection. Training and parameter tuning are performed on 17,483 hours of data that include 168 slowdown events. These data were collected and labeled as part of the ongoing probe data validation studies at the Center for Advanced Transportation Technologies (CATT) at the University of Maryland. The neural networks are carefully trained to reduce the chance of over-fitting to the training data. The experimental results show that this approach successfully detects most of the congestion events while significantly outperforming a heuristic rule-based approach. Moreover, the proposed approach is shown to estimate the start and end times of congestion events more accurately.
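The way an overlapping sliding window captures each slowdown several times, and how those repeated detections can be turned into a confidence vote, can be sketched as follows. The trained networks are replaced here by a hypothetical speed threshold, and the window length, stride, and vote count are arbitrary sample-based values rather than the paper's ten-hour setting.

```python
import numpy as np

def toy_window_detector(window, threshold):
    """Hypothetical stand-in for the trained networks: flag samples below a speed threshold."""
    return window < threshold

def detect_congestion(speeds, window, stride, threshold, min_votes=2):
    """Slide a window over the series, collect per-sample votes, keep samples flagged often enough."""
    votes = np.zeros(len(speeds), dtype=int)
    for start in range(0, len(speeds) - window + 1, stride):
        flags = toy_window_detector(speeds[start:start + window], threshold)
        votes[start:start + window] += flags.astype(int)
    return votes >= min_votes

speeds = np.full(60, 65.0)
speeds[20:30] = 20.0                                 # a synthetic slowdown
congested = detect_congestion(speeds, window=12, stride=2, threshold=40.0)
print(np.nonzero(congested)[0])                      # indices 20..29
```

The first and last flagged indices give the estimated start and end of the event.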

29. Learning Representations that Support Extrapolation
Taylor W. Webb, Zachary Dulberg, Steven M. Frankland, Alexander A. Petrov, Randall C. O'Reilly, Jonathan D. Cohen
Abstract: Extrapolation, the ability to make inferences that go beyond the scope of one's experiences, is a hallmark of human intelligence. By contrast, the generalization exhibited by contemporary neural network algorithms is largely limited to interpolation between data points in their training corpora. In this paper, we consider the challenge of learning representations that support extrapolation. We introduce a novel visual analogy benchmark that allows graded evaluation of extrapolation as a function of distance from the convex domain defined by the training data. We also introduce a simple technique, context normalization, that encourages representations emphasizing the relations between objects. We find that this technique significantly improves the ability to extrapolate, considerably outperforming a number of competitive techniques.
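Context normalization can be sketched as normalizing each feature across the objects that make up one context (for example, the items of a single analogy problem), rather than across a batch. The axis convention and the affine parameters `gamma`/`beta` are assumptions for illustration. The invariance shown below, where scaling and shifting a whole context leaves the output unchanged, is one way such normalization emphasizes relations over absolute magnitudes.

```python
import numpy as np

def context_norm(z, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize each feature across the objects of one context.

    z: (n_objects, n_features). gamma and beta are the usual affine parameters.
    """
    mu = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return gamma * (z - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))                  # 6 objects, 4 features each
far = 3.0 * z + 10.0                         # same relations, far outside the original range
print(np.abs(context_norm(far) - context_norm(z)).max())
```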

30. Multimodal price prediction
Aidin Zehtab-Salmasi, Ali-Reza Feizi-Derakhshi, Narjes Nikzad-Khasmakhi, Meysam Asgari-Chenaghlu, Saeideh Nabipour
Abstract: Valorization is one of the most hotly debated topics in the business community, and commodity valorization is a subset of this task. A product's features are essential to its valorization, and they can be categorized into two classes: graphical and non-graphical. Nowadays, the value of products is measured by price. The goal of this research is to predict the price of a product from its specifications. We propose five deep learning models to predict the price range of a product: one unimodal and four multimodal systems. The multimodal methods make predictions from both the product image and its non-graphical specifications. As a platform to evaluate the methods, a cellphone dataset was collected from GSMArena. The proposed methods are built on convolutional neural networks. The best method achieves an F1-score of 88.3%.
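A common multimodal design is late fusion by concatenation: image features and encoded non-graphical specifications are joined into one vector and classified into price ranges. The abstract does not say which fusion the four multimodal models use, so this is an assumed design; the feature vectors and the linear head `W, b` are hypothetical placeholders for a CNN backbone and a trained classifier.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_price_range(img_feat, spec_feat, W, b):
    """Concatenate image and specification features, then classify into price ranges.

    img_feat would come from a CNN backbone and spec_feat from encoded
    non-graphical specifications (both hypothetical here); W, b form a linear head.
    """
    x = np.concatenate([img_feat, spec_feat])
    probs = softmax(W @ x + b)
    return int(np.argmax(probs)), probs

img = np.array([1.0, 0.0])                   # toy image embedding
spec = np.array([0.0, 2.0, 1.0])             # toy encoded specs (e.g., RAM, battery, ...)
W = np.zeros((3, 5))
W[2, 3] = 1.0                                # class 2 responds to the second spec feature
cls, p = predict_price_range(img, spec, W, np.zeros(3))
print(cls, p)
```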

31. A Quick Review on Recent Trends in 3D Point Cloud Data Compression Techniques and the Challenges of Direct Processing in 3D Compressed Domain
Mohammed Javed, MD Meraz, Pavan Chakraborty
Abstract: Automatic processing of 3D point cloud data for object detection, tracking, and segmentation is one of the latest research trends in AI and data science, aimed specifically at solving the different challenges of autonomous driving and achieving real-time performance. However, the amount of data produced in the form of 3D point clouds (with LiDAR) is enormous, so researchers are now inventing new data compression algorithms to handle the huge volumes of data generated. Compression helps overcome storage requirements, but processing becomes expensive because of decompression, which demands additional computing resources. Therefore, a novel direction is to develop algorithms that operate on and analyze the compressed data directly, without the decompression and recompression stages (which would otherwise be required every time the compressed data need to be operated on or analyzed). This research field is termed Compressed Domain Processing. In this paper, we briefly review a few recent state-of-the-art developments in LiDAR-generated 3D point cloud data compression and highlight the future challenges of compressed domain processing of 3D point cloud data.
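As a toy illustration of compressed domain processing, the sketch below uses uniform quantization to integer voxel codes as a stand-in for a real point cloud codec (an assumption; practical codecs such as octree coders are far more elaborate). An axis-aligned bounding box is then computed directly on the integer codes, and only its two corners are mapped back to metric coordinates instead of decompressing every point.

```python
import numpy as np

def quantize(points, origin, step):
    """Lossy 'compression' by uniform quantization to integer voxel indices."""
    return np.floor((points - origin) / step).astype(np.int32)

def bbox_compressed(q, origin, step):
    """Axis-aligned bounding box computed on integer codes; only two corners are decoded."""
    lo = q.min(axis=0)
    hi = q.max(axis=0) + 1                   # upper edge of the last occupied voxel
    return origin + lo * step, origin + hi * step

rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, size=(1000, 3))
origin, step = np.array([-5.0, -5.0, -5.0]), 0.1
q = quantize(pts, origin, step)
lo, hi = bbox_compressed(q, origin, step)
print(lo, hi)
```

The box is conservative: it always contains the original points and is at most one voxel larger per side, which is the usual accuracy trade-off of operating on compressed data.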

32. Few Is Enough: Task-Augmented Active Meta-Learning for Brain Cell Classification
Pengyu Yuan, Aryan Mobiny, Jahandar Jahanipour, Xiaoyang Li, Pietro Antonio Cicalese, Badrinath Roysam, Vishal Patel, Maric Dragan, Hien Van Nguyen
Abstract: Deep neural networks (DNNs) must constantly cope with distribution changes in the input data when the task of interest or the data collection protocol changes. Retraining a network from scratch to combat this issue incurs a significant cost. Meta-learning aims to deliver an adaptive model that is sensitive to these underlying distribution changes, but it requires many tasks during the meta-training process. In this paper, we propose a tAsk-auGmented actIve meta-LEarning (AGILE) method to efficiently adapt DNNs to new tasks using a small number of training examples. AGILE combines a meta-learning algorithm with a novel task augmentation technique, which we use to generate an initial adaptive model. It then uses Bayesian dropout uncertainty estimates to actively select the most difficult samples when updating the model to a new task. This allows AGILE to learn with fewer tasks and a few informative samples, achieving high performance with a limited dataset. We perform our experiments on a brain cell classification task and compare the results to a plain meta-learning model trained from scratch. We show that the proposed task-augmented meta-learning framework can learn to classify new cell types after a single gradient step with a limited number of training samples. We also show that active learning with Bayesian uncertainty can further improve performance when the number of training samples is extremely small. Using only 1% of the training data and a single update step, we achieve 90% accuracy on the new cell type classification task, a 50-percentage-point improvement over a state-of-the-art meta-learning algorithm.
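Selecting the most difficult samples with dropout-based uncertainty can be sketched as Monte Carlo dropout: run several stochastic forward passes, average the predicted probabilities, and pick the samples with the highest predictive entropy. The linear softmax "network", the input-dropout placement, and entropy as the acquisition score are all assumptions for illustration; AGILE's exact uncertainty estimate is not specified in the abstract.

```python
import numpy as np

def mc_dropout_probs(X, W, p_drop, T, rng):
    """T stochastic forward passes of a linear softmax classifier with input dropout."""
    probs = []
    for _ in range(T):
        mask = rng.random(X.shape) >= p_drop
        logits = (X * mask / (1 - p_drop)) @ W    # inverted-dropout scaling
        logits -= logits.max(axis=1, keepdims=True)
        e = np.exp(logits)
        probs.append(e / e.sum(axis=1, keepdims=True))
    return np.stack(probs)                        # (T, N, C)

def select_most_uncertain(probs, k):
    """Indices of the k samples with the highest entropy of the MC-averaged prediction."""
    mean = probs.mean(axis=0)
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[::-1][:k]

rng = np.random.default_rng(0)
X = np.array([[5.0, 0.0], [0.0, 0.0], [0.0, 5.0]])   # sample 1 carries no evidence
probs = mc_dropout_probs(X, np.eye(2), p_drop=0.2, T=50, rng=rng)
print(select_most_uncertain(probs, 1))               # [1]
```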

33. StyPath: Style-Transfer Data Augmentation For Robust Histology Image Classification
Pietro Antonio Cicalese, Aryan Mobiny, Pengyu Yuan, Jan Becker, Chandra Mohan, Hien Van Nguyen
Abstract: The classification of antibody-mediated rejection (AMR) in kidney transplants remains challenging even for experienced nephropathologists; this is partly because histological tissue stain analysis is often characterized by low inter-observer agreement and poor reproducibility. One cause implicated in inter-observer disagreement is the variability of tissue stain quality between (and within) pathology labs, coupled with the gradual fading of archival sections. Variations in stain colors and intensities can make tissue evaluation difficult for pathologists, ultimately affecting their ability to describe relevant morphological features. Accurately predicting AMR status from kidney histology images is crucial for improving patient treatment and care. We propose a novel pipeline to build robust deep neural networks for AMR classification based on StyPath, a histological data augmentation technique that leverages a lightweight style-transfer algorithm as a means to reduce sample-specific bias. Each image was generated in 1.84 ± 0.03 seconds using a single GTX TITAN V GPU and PyTorch, making StyPath faster than other popular histological data augmentation techniques. We evaluated our model using a Monte Carlo (MC) estimate of Bayesian performance and generated an epistemic measure of uncertainty to compare the baseline and StyPath-augmented models. We also generated Grad-CAM representations of the results, which were assessed by an experienced nephropathologist; we used this qualitative analysis to elucidate the assumptions made by each model. Our results imply that our style-transfer augmentation technique improves histological classification performance (reducing error from 14.8% to 11.5%) and generalization ability.
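An epistemic measure of uncertainty from Monte Carlo samples is often computed as the mutual information between predictions and model parameters: the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch assumes the MC samples come from a stochastic model such as MC dropout; whether StyPath uses exactly this measure is not stated in the abstract.

```python
import numpy as np

def uncertainty_decomposition(probs, eps=1e-12):
    """probs: (T, N, C) class probabilities from T Monte Carlo forward passes.

    Returns per-sample total predictive entropy, expected (aleatoric) entropy,
    and their difference, the mutual information used as an epistemic measure.
    """
    mean = probs.mean(axis=0)
    total = -(mean * np.log(mean + eps)).sum(axis=-1)
    expected = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return total, expected, total - expected

# Sample 0: every pass is unsure in the same way (aleatoric).
# Sample 1: passes are confident but disagree (epistemic).
probs = np.array([[[0.5, 0.5], [1.0, 0.0]],
                  [[0.5, 0.5], [0.0, 1.0]]])     # shape (T=2, N=2, C=2)
total, expected, epistemic = uncertainty_decomposition(probs)
print(epistemic)                                 # approximately [0, ln 2]
```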

34. Automatic Detection of COVID-19 Cases on X-ray images Using Convolutional Neural Networks
Lucas P. Soares, Cesar P. Soares
Abstract: In recent months, the world has been surprised by the rapid advance of COVID-19. To face this disease and minimize its socio-economic impacts, diagnosis is a crucial procedure in addition to surveillance and treatment. However, diagnosis is hampered by delays in, and limited access to, laboratory tests, which demands new strategies for case triage. In this scenario, deep learning models are being proposed as a possible option to assist the diagnostic process based on chest X-ray and computed tomography images. Therefore, this research aims to automate the detection of COVID-19 cases from chest images using convolutional neural networks (CNNs). The results can help expand access to other forms of COVID-19 detection and speed up the identification of this disease. All databases used, the code built, and the results obtained from training the models are openly accessible. This facilitates the involvement of other researchers in enhancing these models, which can contribute to better results and, consequently, to progress in confronting COVID-19.

35. SIMBA: Specific Identity Markers for Bone Age Assessment
Cristina González, María Escobar, Laura Daza, Felipe Torres, Gustavo Triana, Pablo Arbeláez
Abstract: Bone Age Assessment (BAA) is a task performed by radiologists to diagnose abnormal growth in children. In manual approaches, radiologists take into account different identity markers, namely chronological age and gender, when calculating bone age. However, current automated BAA methods do not fully exploit the information present in the patient's metadata. Motivated by this lack of available methods, we present SIMBA: Specific Identity Markers for Bone Age Assessment. SIMBA is a novel approach for BAA based on the use of identity markers. For this purpose, we build upon a state-of-the-art model, fusing the information present in the identity markers with the visual features extracted from the original hand radiograph. We then use this robust representation to estimate the patient's relative bone age: the difference between chronological age and bone age. We validate SIMBA on the Radiological Hand Pose Estimation dataset and find that it outperforms previous state-of-the-art methods. SIMBA sets a trend for a new wave of computer-aided diagnosis methods that incorporate all of the data available about a patient. To promote further research in this area and ensure reproducibility, we will provide the source code as well as the pre-trained models of SIMBA.
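The relative bone age formulation can be made concrete with a small sketch: image features are concatenated with the identity markers, a regression head predicts relative bone age, and bone age is recovered as chronological age minus relative bone age, following the definition given above. The linear head `w, b`, the 0/1 gender encoding, and the use of months as the unit are hypothetical choices for illustration.

```python
import numpy as np

def predict_bone_age(visual_feat, chrono_age, is_male, w, b):
    """Fuse image features with identity markers, regress relative bone age, recover bone age.

    visual_feat: features from a hand-radiograph backbone (hypothetical);
    w, b: a hypothetical linear regression head over the fused vector.
    """
    fused = np.concatenate([visual_feat, [chrono_age, 1.0 if is_male else 0.0]])
    relative = float(w @ fused + b)       # relative bone age = chronological age - bone age
    return chrono_age - relative

# A 120-month-old whose predicted relative bone age is 12 months has a bone age of 108 months.
print(predict_bone_age(np.zeros(4), 120.0, True, np.zeros(6), 12.0))   # 108.0
```

Predicting the difference rather than the absolute age lets the chronological age, which is known exactly, carry most of the prediction.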

36. Evaluation of Big Data based CNN Models in Classification of Skin Lesions with Melanoma
Prasitthichai Naronglerdrit, Iosif Mporas
Abstract: This chapter presents a methodology for diagnosing pigmented skin lesions using convolutional neural networks. The architecture is based on convolutional neural networks and is evaluated using new CNN models as well as re-trained, modified versions of pre-existing CNN models. The experimental results showed that CNN models pre-trained on big datasets for general-purpose image classification, when re-trained to identify skin lesion types, offer more accurate results than convolutional neural network models trained explicitly on the dermatoscopic images. The best performance, an accuracy of 93.89%, was achieved by re-training a modified version of the ResNet-50 convolutional neural network. An analysis by skin lesion pathology type was also performed, with classification accuracies for melanoma and basal cell carcinoma of 79.13% and 82.88%, respectively.

37. Joint Blind Deconvolution and Robust Principal Component Analysis for Blood Flow Estimation in Medical Ultrasound Imaging
Duong-Hung Pham, Adrian Basarab, Ilyess Zemmoura, Jean-Pierre Remenieras, Denis Kouame
Abstract: This paper addresses the problem of high-resolution Doppler blood flow estimation from an ultrafast sequence of ultrasound images. Formulating the separation of clutter and blood components as an inverse problem has been shown in the literature to be a good alternative to spatio-temporal singular value decomposition (SVD)-based clutter filtering. In particular, a deconvolution step has recently been embedded in such a problem to mitigate the influence of the experimentally measured point spread function (PSF) of the imaging system. Deconvolution was shown in this context to improve the accuracy of the blood flow reconstruction. However, measuring the PSF requires non-trivial experimental setups. To overcome this limitation, we propose herein a blind deconvolution method able to estimate both the blood component and the PSF from Doppler data. Numerical experiments conducted on simulated and in vivo data demonstrate, qualitatively and quantitatively, the effectiveness of the proposed approach compared with the previous method based on an experimentally measured PSF and two other state-of-the-art approaches.
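For context, the spatio-temporal SVD clutter filter that the abstract names as the conventional baseline can be sketched in a few lines of NumPy. This is not the proposed blind-deconvolution method; the `(n_z, n_x, n_t)` frame layout and the fixed clutter rank `n_clutter` are assumptions for illustration (in practice the cutoff is often tuned per acquisition).

```python
import numpy as np


def svd_clutter_filter(frames, n_clutter):
    """Remove the n_clutter largest singular components (slow, coherent tissue).

    frames: array of shape (n_z, n_x, n_t), one image per time sample.
    Returns the residual (blood) signal with the same shape.
    """
    n_z, n_x, n_t = frames.shape
    casorati = frames.reshape(n_z * n_x, n_t)  # space x time matrix
    u, s, vh = np.linalg.svd(casorati, full_matrices=False)
    s_blood = s.copy()
    s_blood[:n_clutter] = 0.0  # zero the dominant (tissue clutter) components
    blood = (u * s_blood) @ vh  # same as u @ diag(s_blood) @ vh
    return blood.reshape(n_z, n_x, n_t)
```

Tissue clutter is assumed to be highly coherent over time and therefore concentrated in the first singular components; a purely rank-1 input is removed entirely with `n_clutter=1`.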

38. Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets
Chinedu Innocent Nwoye, Cristians Gonzalez, Tong Yu, Pietro Mascagni, Didier Mutter, Jacques Marescaux, Nicolas Padoy
Abstract: Recognition of surgical activity is an essential component for developing context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity. To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80, in which all frames have been annotated using 128 triplet classes. Furthermore, we present an approach to recognize these triplets directly from the video data. It relies on a module called Class Activation Guide (CAG), which uses the instrument activation maps to guide the verb and target recognition. To model the recognition of multiple triplets in the same frame, we also propose a trainable 3D Interaction Space, which captures the associations between the triplet components. Finally, we demonstrate the significance of these contributions via several ablation studies and comparisons to baselines on CholecT40.
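A frame-level triplet label can be represented as below: each distinct <instrument, verb, target> combination gets a class index, and because several triplets can occur in one frame the training target is multi-hot. The class names used with it are hypothetical, not the actual CholecT40 vocabulary, and this sketch does not model CAG or the 3D Interaction Space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class Triplet:
    instrument: str
    verb: str
    target: str


def build_vocabulary(triplets):
    """Assign a class index to each distinct triplet, in sorted order."""
    unique = sorted(set(triplets), key=lambda t: (t.instrument, t.verb, t.target))
    return {t: i for i, t in enumerate(unique)}


def multi_hot(frame_triplets, vocab):
    """Multi-label target vector: one slot per triplet class."""
    target = [0] * len(vocab)
    for t in frame_triplets:
        target[vocab[t]] = 1
    return target
```

With 128 triplet classes, as in CholecT40, the vocabulary would have 128 entries and each frame target 128 slots.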

39. Semi-supervised Task-driven Data Augmentation for Medical Image Segmentation
Krishna Chaitanya, Neerav Karani, Christian F. Baumgartner, Anton Becker, Olivio Donati, Ender Konukoglu
Abstract: Supervised learning-based segmentation methods typically require a large amount of annotated training data to generalize well at test time. In medical applications, curating such datasets is not a favourable option because acquiring a large number of annotated samples from experts is time-consuming and expensive. Consequently, numerous methods have been proposed in the literature for learning with limited annotated examples. Unfortunately, these approaches have not yet yielded significant gains over random data augmentation for image segmentation, while random augmentations themselves do not yield high accuracy. In this work, we propose a novel task-driven data augmentation method for learning with limited labeled data, in which the synthetic data generator is optimized for the segmentation task. The generator of the proposed method models intensity and shape variations using two sets of transformations, namely additive intensity transformations and deformation fields. Both transformations are optimized using labeled as well as unlabeled examples in a semi-supervised framework. Our experiments on three medical datasets, namely cardiac, prostate and pancreas, show that the proposed approach significantly outperforms standard augmentation and semi-supervised approaches for image segmentation in the limited annotation setting. The code is made publicly available at this https URL_driven_data_augmentation.
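The two transformation families can be illustrated with fixed fields. In the paper the fields are produced by a generator optimized for the segmentation task; here `delta`, `dy` and `dx` are simply given arrays, and nearest-neighbour sampling is an assumption chosen so that integer label maps stay integer. Only the image receives the intensity change, while image and label share the same deformation.

```python
import numpy as np


def apply_intensity(image, delta):
    """Additive intensity transformation: I' = I + delta (same shape)."""
    return image + delta


def apply_deformation(image, dy, dx):
    """Warp a 2D array with a dense displacement field (nearest neighbour).

    Output pixel (y, x) samples the input at (y + dy, x + dx), clipped to bounds.
    """
    h, w = image.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(yy + dy).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xx + dx).astype(int), 0, w - 1)
    return image[src_y, src_x]


def augment_pair(image, label, delta, dy, dx):
    """Intensity change on the image only; identical deformation on both."""
    warped_image = apply_deformation(apply_intensity(image, delta), dy, dx)
    warped_label = apply_deformation(label, dy, dx)
    return warped_image, warped_label
```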

40. A distance-based loss for smooth and continuous skin layer segmentation in optoacoustic images
Stefan Gerl, Johannes C. Paetzold, Hailong He, Ivan Ezhov, Suprosanna Shit, Florian Kofler, Amirhossein Bayat, Giles Tetteh, Vasilis Ntziachristos, Bjoern Menze
Abstract: Raster-scan optoacoustic mesoscopy (RSOM) is a powerful, non-invasive optical imaging technique for functional, anatomical, and molecular skin and tissue analysis. However, both manual and automated analysis of such images are challenging, because RSOM images have very low contrast, a poor signal-to-noise ratio, and systematic overlaps between the absorption spectra of melanin and hemoglobin. Nonetheless, segmentation of the epidermis layer is a crucial step for many downstream medical and diagnostic tasks, such as vessel segmentation or monitoring of cancer progression. We propose a novel, shape-specific loss function that overcomes discontinuous segmentations and achieves smooth segmentation surfaces while preserving the same volumetric Dice and IoU. Further, we validate our epidermis segmentation through the sensitivity of vessel segmentation. We found a 20% improvement in Dice for vessel segmentation tasks when the epidermis mask is provided as additional information to the vessel segmentation network.
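Dice and IoU, the volumetric scores the abstract says are preserved, can be computed from flattened binary masks as below. Both measure overlap only, so a handful of misplaced voxels that break the surface barely changes them, which is consistent with the need for a separate shape-specific loss. This is a generic sketch, not the paper's code; the empty-mask convention is an assumption.

```python
def dice_and_iou(pred, truth):
    """Dice and IoU for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in truth if t)
    if p_sum + t_sum == 0:
        return 1.0, 1.0  # both masks empty: perfect agreement by convention
    dice = 2 * inter / (p_sum + t_sum)
    iou = inter / (p_sum + t_sum - inter)  # union = |P| + |T| - |P and T|
    return dice, iou
```

The two scores are monotonically related, Dice = 2 IoU / (1 + IoU), so preserving one essentially preserves the other.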

41. TIMELY: Improving Labeling Consistency in Medical Imaging for Cell Type Classification
Yushan Liu, Markus M. Geipel, Christoph Tietz, Florian Buettner
Abstract: Diagnosing diseases such as leukemia or anemia requires reliable counts of blood cells. Hematologists usually label and count microscopy images of blood cells manually. In many cases, however, cells in different maturity states are difficult to distinguish, and, combined with image noise and subjectivity, this makes humans prone to labeling mistakes. As a result, labels are often not reproducible, which can directly affect diagnoses. We introduce TIMELY, a probabilistic model that combines pseudotime inference methods with inhomogeneous hidden Markov trees to address this challenge of label inconsistency. We first show on simulated data that TIMELY identifies and corrects wrong labels with higher precision and recall than baseline methods for label correction. We then apply our method to two real-world datasets of blood cell data and show that TIMELY successfully finds inconsistent labels, thereby improving the quality of human-generated labels.

42. Deep Learning-Based Regression and Classification for Automatic Landmark Localization in Medical Images
Julia M. H. Noothout, Bob D. de Vos, Jelmer M. Wolterink, Elbrich M. Postma, Paul A. M. Smeets, Richard A. P. Takx, Tim Leiner, Max A. Viergever, Ivana Išgum
Abstract: In this study, we propose a fast and accurate method to automatically localize anatomical landmarks in medical images. We employ a global-to-local localization approach using fully convolutional neural networks (FCNNs). First, a global FCNN localizes multiple landmarks through the analysis of image patches, performing regression and classification simultaneously. In regression, displacement vectors pointing from the center of image patches towards landmark locations are determined. In classification, the presence of landmarks of interest in the patch is established. Global landmark locations are obtained by averaging the predicted displacement vectors, where the contribution of each displacement vector is weighted by the posterior classification probability of the patch it points from. Subsequently, local analysis is performed for each landmark found by global localization. Specialized FCNNs refine the global landmark locations by analyzing local sub-images in a similar manner, i.e., by performing regression and classification simultaneously and combining the results. Evaluation was performed through localization of 8 anatomical landmarks in CCTA scans, 2 landmarks in olfactory MR scans, and 19 landmarks in cephalometric X-rays. We demonstrate that the method performs similarly to a second observer and is able to localize landmarks in a diverse set of medical images differing in image modality, image dimensionality, and anatomical coverage.
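The global aggregation step can be read as a probability-weighted average of the positions the displacement vectors point to, i.e. patch centre plus predicted displacement. The sketch below assumes that reading; the function name and argument layout are illustrative, and the local refinement stage is omitted. It works for 2D or 3D coordinates.

```python
def aggregate_landmark(centers, displacements, probs):
    """Weighted average of candidate landmark positions.

    centers[i]: centre of patch i; displacements[i]: predicted vector from that
    centre to the landmark; probs[i]: posterior that patch i contains the landmark.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("no patch supports this landmark")
    dim = len(centers[0])
    return tuple(
        sum(p * (c[k] + d[k]) for c, d, p in zip(centers, displacements, probs)) / total
        for k in range(dim)
    )
```

Patches with a low posterior contribute little, so a confidently wrong vector from a patch far from the landmark is down-weighted rather than discarded outright.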

43. Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution
Jing Yao, Danfeng Hong, Jocelyn Chanussot, Deyu Meng, Xiaoxiang Zhu, Zongben Xu
Abstract: The recent advancement of deep learning techniques has brought great progress in hyperspectral image super-resolution (HSI-SR). Yet the development of unsupervised deep networks for this task remains challenging. To this end, we propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet for short, to enhance the spatial resolution of HSI by means of a higher-spatial-resolution multispectral image (MSI). Inspired by coupled spectral unmixing, a two-stream convolutional autoencoder framework is taken as the backbone to jointly decompose MS and HS data into a spectrally meaningful basis and corresponding coefficients. CUCaNet is capable of adaptively learning spectral and spatial response functions from HS-MS correspondences by enforcing reasonable consistency assumptions on the networks. Moreover, a cross-attention module is devised to yield more effective spatial-spectral information transfer in the networks. Extensive experiments on three widely used HS-MS datasets, in comparison with state-of-the-art HSI-SR models, demonstrate the superiority of CUCaNet in the HSI-SR application. Furthermore, the code and datasets will be available at: this https URL.
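Coupled unmixing rests on the linear mixing model: the high-resolution HSI is abundances times endmember spectra, the MSI is that HSI passed through a spectral response function, and the low-resolution HSI is a spatially downsampled copy. The NumPy sketch below simulates these observations; the block-averaging downsampler and the random response matrix are assumptions, not the learned response functions of CUCaNet.

```python
import numpy as np


def spatial_downsample(cube, factor):
    """Average non-overlapping factor x factor blocks: (H, W, B) -> (H/f, W/f, B)."""
    h, w, b = cube.shape
    if h % factor or w % factor:
        raise ValueError("spatial size must be divisible by factor")
    return cube.reshape(h // factor, factor, w // factor, factor, b).mean(axis=(1, 3))


def simulate_observations(abundances, endmembers, srf, factor):
    """abundances: (H, W, P); endmembers: (P, B); srf: (B, M) spectral response."""
    hr_hsi = abundances @ endmembers             # (H, W, B): linear mixing
    msi = hr_hsi @ srf                           # (H, W, M): spectral degradation
    lr_hsi = spatial_downsample(hr_hsi, factor)  # (H/f, W/f, B): spatial degradation
    return hr_hsi, msi, lr_hsi
```

Because both degradations are linear, spatially degrading the MSI gives the same result as spectrally degrading the LR-HSI; this is one example of the kind of cross-consistency such coupled models can exploit.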

44. Automatic Segmentation of Non-Tumor Tissues in Glioma MR Brain Images Using Deformable Registration with Partial Convolutional Networks
Zhongqiang Liu
Abstract: In brain tumor diagnosis and surgical planning, physicians need both segmentation of tumor regions and accurate analysis of the surrounding normal tissues. Pathological variability often makes it difficult to register a well-labeled normal atlas to such images and to automatically segment/label the surrounding normal brain tissues. In this paper, we propose a new registration approach that first segments the brain tumor using a U-Net and then synthesizes the missing normal tissues within the tumor region using a partial convolutional network. Then, a standard normal brain atlas image is registered onto such tumor-removed images in order to segment/label the normal brain tissues. In this way, our new approach greatly reduces the effects of pathological variability in deformable registration and segments the normal tissues surrounding the brain tumor well. In experiments, we used MICCAI BraTS2018 T1 tumor images to evaluate the proposed algorithm. Compared with direct registration, the proposed algorithm significantly improved the Dice coefficient for gray matter in the surrounding normal brain tissues.

45. Efficient Unpaired Image Dehazing with Cyclic Perceptual-Depth Supervision
Chen Liu, Jiaqi Fan, Guosheng Yin
Abstract: Image dehazing without paired haze-free images is of immense importance, as acquiring paired images often entails significant cost. However, we observe that previous unpaired image dehazing approaches tend to suffer from performance degradation near depth borders, where depth tends to vary abruptly. Hence, we propose to alleviate the depth border degradation in unpaired image dehazing with cyclic perceptual-depth supervision. Coupled with dual-path feature-reusing backbones for the generators and discriminators, our model achieves a Peak Signal-to-Noise Ratio (PSNR) of 20.36 dB on the NYU Depth V2 dataset, significantly outperforming its predecessors while requiring fewer Floating Point Operations (FLOPs).
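For reference, PSNR is 10 log10(MAX^2 / MSE), in decibels. A minimal sketch over flattened pixel values follows, assuming intensities scaled to [0, 1] (MAX = 1); the abstract does not state the exact evaluation protocol (color space, border handling) behind the reported figure.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two flattened images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 0.1 on [0, 1] data gives MSE = 0.01 and therefore 20 dB.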

46. OT-driven Multi-Domain Unsupervised Ultrasound Image Artifact Removal using a Single CNN
Jaeyoung Huh, Shujaat Khan, Jong Chul Ye
Abstract: Ultrasound (US) imaging often suffers from distinct image artifacts from various sources. Classic approaches for solving these problems are usually model-based iterative approaches developed specifically for each type of artifact, which are often computationally intensive. Recently, deep learning approaches have been proposed as computationally efficient and high-performance alternatives. Unfortunately, in current deep learning approaches, a dedicated neural network must be trained with matched training data for each specific artifact type. This poses a fundamental limitation on the practical use of deep learning for US, since a large number of models must be stored to deal with the various US image artifacts. Inspired by the recent success of multi-domain image transfer, here we propose a novel, unsupervised deep learning approach in which a single neural network can deal with different types of US artifacts simply by changing a mask vector that switches between different target domains. Our algorithm is rigorously derived using optimal transport (OT) theory for cascaded probability measures. Experimental results using phantom and in vivo data demonstrate that the proposed method can generate high-quality images by removing distinct artifacts, with results comparable to those obtained by multiple separately trained neural networks.
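One common way to let a single network serve several target domains, used by multi-domain translation models such as StarGAN, is to tile a one-hot mask vector over the image and concatenate it as extra input channels. The sketch below shows only that conditioning step; the artifact-type names are hypothetical, and the paper may inject its mask vector differently.

```python
import numpy as np

# Hypothetical target domains; the real set of artifact types may differ.
ARTIFACT_TYPES = ("deblur", "despeckle", "deconvolve")


def domain_mask(name):
    """One-hot mask vector selecting a target domain (ValueError if unknown)."""
    mask = np.zeros(len(ARTIFACT_TYPES))
    mask[ARTIFACT_TYPES.index(name)] = 1.0
    return mask


def condition_input(image, mask):
    """Tile the mask spatially and stack it as channels: (C, H, W) -> (C + K, H, W)."""
    _, h, w = image.shape
    planes = np.broadcast_to(mask[:, None, None], (mask.size, h, w))
    return np.concatenate([image, planes], axis=0)
```

Switching the output domain then only requires changing the mask, not loading a different set of weights.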

47. ROSE: A Retinal OCT-Angiography Vessel Segmentation Dataset and New Model
Yuhui Ma, Huaying Hao, Huazhu Fu, Jiong Zhang, Jianlong Yang, Jiang Liu, Yalin Zheng, Yitian Zhao
Abstract: Optical Coherence Tomography Angiography (OCT-A) is a non-invasive imaging technique that has been increasingly used to image the retinal vasculature at capillary-level resolution. However, automated segmentation of retinal vessels in OCT-A has been under-studied due to various challenges such as low capillary visibility and high vessel complexity, despite its significance in understanding many eye-related diseases. In addition, there is no publicly available OCT-A dataset with manually graded vessels for training and validation. To address these issues, for the first time in the field of retinal image analysis, we construct a dedicated Retinal OCT-A SEgmentation dataset (ROSE), which consists of 229 OCT-A images with vessel annotations at either centerline level or pixel level. This dataset has been released for public access to assist researchers in the community in undertaking research on related topics. Secondly, we propose a novel Split-based Coarse-to-Fine vessel segmentation network (SCF-Net), with the ability to detect thick and thin vessels separately. In SCF-Net, a split-based coarse segmentation (SCS) module is first introduced to produce a preliminary confidence map of vessels, and a split-based refinement (SRN) module is then used to optimize the shape/contour of the retinal microvasculature. Thirdly, we perform a thorough evaluation of state-of-the-art vessel segmentation models and our SCF-Net on the proposed ROSE dataset. The experimental results demonstrate that SCF-Net yields better vessel segmentation performance in OCT-A than both traditional methods and other deep learning methods.

48. Hyperspectral Imaging to detect Age, Defects and Individual Nutrient Deficiency in Grapevine Leaves
Manoranjan Paul, Sourabhi Debnath, Tanmoy Debnath, Suzy Rogiers, Tintu Baby, DM Motiur Rahaman, Lihong Zheng, Leigh Schmidtke
Abstract: Hyperspectral (HS) imaging in the 380 nm to 1000 nm wavelength range was successfully employed to investigate the efficacy of detecting the age, healthiness and individual nutrient deficiencies of grapevine leaves collected from vineyards located in central west NSW, Australia. For age detection, the appearance of many healthy grapevine leaves was examined. Then, visually defective leaves were compared with healthy leaves. Control leaves and individual nutrient-deficient leaves (e.g., N, K and Mg) were also analysed. Several features were employed at various stages in the Ultraviolet (UV), Visible (VIS) and Near Infrared (NIR) regions to evaluate the experimental data: mean brightness, mean 1st derivative brightness, variation index, mean spectral ratio, normalised difference vegetation index (NDVI) and standard deviation (SD). Experimental results demonstrate that these features can be used with a high degree of effectiveness to compare age, to identify unhealthy samples, and not only to distinguish control leaves from nutrient-deficient ones but also to identify individual nutrient deficiencies. Therefore, our work corroborates that HS imaging has excellent potential as a non-destructive and non-contact method to detect the age, healthiness and individual nutrient deficiencies of grapevine leaves.
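Several of the listed features are simple per-spectrum statistics. The sketch below computes mean brightness, the mean first derivative (read here as the mean slope of reflectance with respect to wavelength), the standard deviation, and NDVI = (NIR - Red) / (NIR + Red). The 670 nm red and 800 nm NIR bands are assumed choices, and the variation index and mean spectral ratio are omitted because the abstract does not define them.

```python
import statistics


def band_value(wavelengths, reflectance, target_nm):
    """Reflectance at the sampled band closest to target_nm."""
    i = min(range(len(wavelengths)), key=lambda k: abs(wavelengths[k] - target_nm))
    return reflectance[i]


def spectral_features(wavelengths, reflectance, red_nm=670.0, nir_nm=800.0):
    """Per-spectrum features; band choices are assumptions, not the paper's."""
    slopes = [
        (reflectance[k + 1] - reflectance[k]) / (wavelengths[k + 1] - wavelengths[k])
        for k in range(len(reflectance) - 1)
    ]
    red = band_value(wavelengths, reflectance, red_nm)
    nir = band_value(wavelengths, reflectance, nir_nm)
    return {
        "mean_brightness": statistics.fmean(reflectance),
        "mean_first_derivative": statistics.fmean(slopes),
        "std": statistics.pstdev(reflectance),
        "ndvi": (nir - red) / (nir + red),
    }
```

Healthy vegetation reflects strongly in the NIR and absorbs red light, which is why NDVI tends to fall as leaves lose vigour.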

49. Rain Streak Removal in a Video to Improve Visibility by TAWL Algorithm
Muhammad Rafiqul Islam, Manoranjan Paul
Abstract: In computer vision applications, the visibility of the video content is crucial for accurate analysis. Visibility can be affected by several atmospheric interferences in challenging weather; one of them is the appearance of rain streaks. Recently, rain streak removal has attracted much interest from researchers, as it has exciting applications such as autonomous cars, intelligent traffic monitoring systems, and multimedia. In this paper, we propose a novel and simple method that combines three novel extracted features focusing on the temporal appearance, width and relative location of rain streaks; we call it the TAWL (Temporal Appearance, Width, and Location) method. The proposed TAWL method adaptively uses features from different resolutions and frame rates. Moreover, it progressively processes features from upcoming frames so that it can remove rain in real time. Experiments were conducted using video sequences with both real and synthetic rain to compare the performance of the proposed method against relevant state-of-the-art methods. The experimental results demonstrate that the proposed method outperforms the state-of-the-art methods by removing more rain streaks while preserving other moving regions.

50. Self-Reflective Variational Autoencoder
Ifigeneia Apostolopoulou, Elan Rosenfeld, Artur Dubrawski
Abstract: The Variational Autoencoder (VAE) is a powerful framework for learning probabilistic latent variable generative models. However, typical assumptions on the approximate posterior distribution of the encoder and/or the prior seriously restrict its capacity for inference and generative modeling. Variational inference based on neural autoregressive models respects the conditional dependencies of the exact posterior, but this flexibility comes at a cost: such models are expensive to train in high-dimensional regimes and can be slow to produce samples. In this work, we introduce an orthogonal solution, which we call self-reflective inference. By redesigning the hierarchical structure of existing VAE architectures, self-reflection ensures that the stochastic flow preserves the factorization of the exact posterior, sequentially updating the latent codes in a recurrent manner consistent with the generative model. We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior: on binarized MNIST, self-reflective inference achieves state-of-the-art performance without resorting to complex, computationally expensive components such as autoregressive layers. Moreover, we design a variational normalizing flow that employs the proposed architecture, yielding predictive benefits compared to its purely generative counterpart. Our proposed modification is quite general and complements the existing literature; self-reflective inference can naturally leverage advances in distribution estimation and generative modeling to improve the capacity of each layer in the hierarchy.

51. Localized Motion Artifact Reduction on Brain MRI Using Deep Learning with Effective Data Augmentation Techniques
Yijun Zhao, Jacek Ossowski, Xuming Wang, Shangjin Li, Orrin Devinsky, Samantha P. Martin, Heath R. Pardoe
Abstract: In-scanner motion degrades the quality of magnetic resonance imaging (MRI), thereby reducing its utility in the detection of clinically relevant abnormalities. We introduce a deep learning-based MRI artifact reduction model (DMAR) to localize and correct head motion artifacts in brain MRI scans. Our approach integrates the latest advances in object detection and noise reduction in computer vision. Specifically, DMAR employs a two-stage approach: in the first stage, degraded regions are detected using the Single Shot Multibox Detector (SSD); in the second, the artifacts within the detected regions are reduced using a convolutional autoencoder (CAE). We further introduce a set of novel data augmentation techniques to address the high dimensionality of MRI images and the scarcity of available data. As a result, our model was trained on a large synthetic dataset of 217,000 images generated from six whole-brain T1-weighted MRI scans obtained from three subjects. DMAR produces convincing visual results when applied both to synthetic test images and to 55 real-world motion-affected slices from 18 subjects in the multi-center Autism Brain Imaging Data Exchange study. Quantitatively, depending on the level of degradation, our model achieves a 14.3%-25.6% reduction in RMSE and a 1.38-2.68 dB gain in PSNR on a 5000-sample set of synthetic images. For real-world scans where the ground truth is unavailable, our model produces a 3.65% reduction in regional standard deviations of image intensity.

52. Improving Adversarial Robustness by Enforcing Local and Global Compactness
Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham, Dinh Phung
Abstract: The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain application domains. Among the many defense models developed against such attacks, adversarial training emerges as the most successful method, consistently resisting a wide range of attacks. In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network, which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. We conduct comprehensive experiments to understand the isolated behavior of each component (i.e., local/global compactness and the clustering assumption) and compare our proposed model with state-of-the-art adversarial training methods. The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network, leading to higher predictive performance on both unperturbed and adversarial examples.

53. Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models
Adarsh K. Jeewajee, Leslie P. Kaelbling
Abstract: Undirected graphical models are compact representations of joint probability distributions over random variables. Given a distribution over inference tasks, graphical models of arbitrary topology can be trained using empirical risk minimization. However, when faced with new task distributions, these models (EGMs) often need to be re-trained. Instead, we propose an inference-agnostic adversarial training framework for producing an ensemble of graphical models (AGMs). The ensemble is optimized to generate data, and inference is learned as a by-product of this endeavor. AGMs perform comparably with EGMs on the inference tasks for which the latter were specifically optimized. Most importantly, AGMs show significantly better generalization capabilities across distributions of inference tasks. AGMs are also on par with GibbsNet, a state-of-the-art deep neural architecture which, like AGMs, allows conditioning on any subset of random variables. Finally, AGMs allow fast data sampling, competitive with Gibbs sampling from EGMs.

54. Multi-view Orthonormalized Partial Least Squares: Regularizations and Deep Extensions
Li Wang, Ren-Cang Li, Wen-Wei
Abstract: We establish a family of subspace-based learning methods for multi-view learning, using least squares as the fundamental basis. Specifically, we investigate orthonormalized partial least squares (OPLS) and study its important properties for both multivariate regression and classification. Building on the least squares reformulation of OPLS, we propose a unified multi-view learning framework to learn a classifier over a common latent space shared by all views. Regularization is further leveraged to unleash the power of the proposed framework by providing three generic types of regularizers on its inherent ingredients: model parameters, decision values and latent projected points. We instantiate a set of regularizers in terms of various priors. With proper choices of regularizers, the proposed framework not only can recast existing methods but also inspires new models. To further improve the performance of the proposed framework on complex real problems, we propose to learn nonlinear transformations parameterized by deep networks. Extensive experiments compare various methods on nine data sets with different numbers of views in terms of both feature extraction and cross-modal retrieval.
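One standard way to state OPLS is: find a projection W of the inputs that maximizes tr(W^T C_xy C_yx W) subject to W^T C_xx W = I. Whitening with C_xx^(-1/2) turns this into an ordinary symmetric eigenproblem. The NumPy sketch below assumes that single-view formulation with centred data and a small ridge term for numerical stability; it is not the paper's regularized multi-view framework.

```python
import numpy as np


def opls(x, y, k, ridge=1e-8):
    """Single-view OPLS projection (sketch).

    x: (n, d) centred inputs; y: (n, c) centred targets (e.g. one-hot labels).
    Returns W of shape (d, k) with W^T C_xx W = I.
    """
    cxx = x.T @ x + ridge * np.eye(x.shape[1])
    cxy = x.T @ y
    evals, evecs = np.linalg.eigh(cxx)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T  # C_xx^(-1/2)
    m = inv_sqrt @ cxy @ cxy.T @ inv_sqrt  # whitened objective matrix
    _, m_evecs = np.linalg.eigh(m)  # eigenvalues in ascending order
    top = m_evecs[:, ::-1][:, :k]  # k leading eigenvectors
    return inv_sqrt @ top
```

Substituting V = C_xx^(1/2) W turns the constraint into V^T V = I, so the optimum is the leading eigenvectors of the whitened matrix, mapped back with C_xx^(-1/2).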

55. Neural Architecture Search with GBDT
Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu
Abstract: Neural architecture search (NAS) with an accuracy predictor that predicts the accuracy of candidate architectures has drawn increasing interest due to its simplicity and effectiveness. Previous works employ neural-network-based predictors, which unfortunately cannot fully exploit the tabular data representations of network architectures. As decision-tree-based models handle tabular data better, in this paper we propose to leverage a gradient boosting decision tree (GBDT) as the predictor for NAS and demonstrate that it can improve prediction accuracy and help find better architectures than neural-network-based predictors. Moreover, considering that a better, more compact search space can ease the search process, we propose to prune the search space gradually according to important features derived from the GBDT using an interpretation tool named SHAP. In this way, NAS can be performed by first pruning the search space (using GBDT as a pruner) and then searching for a neural architecture (using GBDT as a predictor), which is more efficient and effective. Experiments on NASBench-101 and ImageNet demonstrate the effectiveness of GBDT for NAS: (1) NAS with the GBDT predictor finds a top-10 architecture (among all the architectures in the search space) with $0.18\%$ test regret on NASBench-101, and achieves a $24.2\%$ top-1 error rate on ImageNet; and (2) GBDT-based search space pruning and neural architecture search further achieve a $23.5\%$ top-1 error rate on ImageNet.
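
The predictor-based search loop described above can be sketched in a few lines. The example below is a hypothetical toy, not the authors' code: architectures are tuples choosing one of three made-up operations for each of four layers, they are encoded as one-hot tabular features, and `true_accuracy` is a synthetic stand-in for actually training each network. A small gradient-boosted ensemble of depth-1 regression trees (stumps) fitted to squared-loss residuals plays the role of the GBDT predictor; it is trained on every architecture except one held-out candidate and then used to rank the whole space. The SHAP-based search-space pruning step is not shown.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

OPS = ("conv3x3", "conv5x5", "maxpool")  # hypothetical per-layer operation choices
N_LAYERS = 4
Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def encode(arch: Sequence[int]) -> List[float]:
    """One-hot tabular encoding: one 0/1 column per (layer, operation) pair."""
    return [1.0 if op == k else 0.0 for op in arch for k in range(len(OPS))]


def true_accuracy(arch: Sequence[int]) -> float:
    """Synthetic stand-in for training a network: conv5x5 helps, maxpool hurts."""
    return 0.70 + 0.04 * arch.count(1) - 0.02 * arch.count(2)


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Depth-1 regression tree minimizing squared error on the residuals r."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # every feature is constant: fall back to a no-op stump
        return (0, math.inf, 0.0, 0.0)
    return best[1:]


def fit_gbdt(X: List[List[float]], y: List[float], n_trees: int = 200, lr: float = 0.5):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5 * (y - pred)^2
        j, thr, lv, rv = fit_stump(X, resid)
        stumps.append((j, thr, lv, rv))
        pred = [p + lr * (lv if row[j] <= thr else rv) for p, row in zip(pred, X)]
    return base, lr, stumps


def predict(model, x: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(lv if x[j] <= thr else rv for j, thr, lv, rv in stumps)


space = list(product(range(len(OPS)), repeat=N_LAYERS))  # all 3**4 = 81 architectures
held_out = (1,) * N_LAYERS  # keep the true best architecture out of the training data
train = [a for a in space if a != held_out]
model = fit_gbdt([encode(a) for a in train], [true_accuracy(a) for a in train])

ranked = sorted(space, key=lambda a: predict(model, encode(a)), reverse=True)
best = ranked[0]
print("predicted best:", [OPS[op] for op in best], round(predict(model, encode(best)), 3))
```

Because the synthetic target is additive over layers, stumps are enough here; real accuracy landscapes contain interactions between operations, which deeper trees can capture.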

[arXiv Papers] Computer Vision and Pattern Recognition 2020-07-13

Contents

Abstracts