Abstracts

1. Focus on defocus: bridging the synthetic to real domain gap for depth estimation
Maxim Maximov, Kevin Galim, Laura Leal-Taixé
Abstract: Data-driven depth estimation methods struggle with the generalization outside their training scenes due to the immense variability of the real-world scenes. This problem can be partially addressed by utilising synthetically generated images, but closing the synthetic-real domain gap is far from trivial. In this paper, we tackle this issue by using domain invariant defocus blur as direct supervision. We leverage defocus cues by using a permutation invariant convolutional neural network that encourages the network to learn from the differences between images with a different point of focus. Our proposed network uses the defocus map as an intermediate supervisory signal. We are able to train our model completely on synthetic data and directly apply it to a wide range of real-world images. We evaluate our model on synthetic and real datasets, showing compelling generalization results and state-of-the-art depth prediction.

2. CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks
Maxim Maximov, Ismail Elezi, Laura Leal-Taixé
Abstract: The unprecedented increase in the usage of computer vision technology in society goes hand in hand with an increased concern in data privacy. In many real-world scenarios like people tracking or action recognition, it is important to be able to process the data while taking careful consideration in protecting people's identity. We propose and develop CIAGAN, a model for image and video anonymization based on conditional generative adversarial networks. Our model is able to remove the identifying characteristics of faces and bodies while producing high-quality images and videos that can be used for any computer vision task, such as detection or tracking. Unlike previous methods, we have full control over the de-identification (anonymization) procedure, ensuring both anonymization as well as diversity. We compare our method to several baselines and achieve state-of-the-art results.

3. Ultrasound Video Summarization using Deep Reinforcement Learning
Tianrui Liu, Qingjie Meng, Athanasios Vlontzos, Jeremy Tan, Daniel Rueckert, Bernhard Kainz
Abstract: Video is an essential imaging modality for diagnostics, e.g. in ultrasound imaging, endoscopy, or movement assessment. However, video has not received much attention in the medical image analysis community. In clinical practice, it is challenging to utilise raw diagnostic video data efficiently, as video data takes a long time to process, annotate or audit. In this paper we introduce a novel, fully automatic video summarization method that is tailored to the needs of medical video data. Our approach is framed as a reinforcement learning problem and produces agents that focus on preserving important diagnostic information. We evaluate our method on videos from fetal ultrasound screening, where commonly only a small amount of the recorded data is used diagnostically. We show that our method is superior to alternative video summarization methods and that it preserves essential information required by clinical diagnostic standards.

4. Differentiable Mapping Networks: Learning Structured Map Representations for Sparse Visual Localization
Peter Karkus, Anelia Angelova, Vincent Vanhoucke, Rico Jonschkowski
Abstract: Mapping and localization, preferably from a small number of observations, are fundamental tasks in robotics. We address these tasks by combining spatial structure (differentiable mapping) and end-to-end learning in a novel neural network architecture: the Differentiable Mapping Network (DMN). The DMN constructs a spatially structured view-embedding map and uses it for subsequent visual localization with a particle filter. Since the DMN architecture is end-to-end differentiable, we can jointly learn the map representation and localization using gradient descent. We apply the DMN to sparse visual localization, where a robot needs to localize in a new environment with respect to a small number of images from known viewpoints. We evaluate the DMN using simulated environments and a challenging real-world Street View dataset. We find that the DMN learns effective map representations for visual localization. The benefit of spatial structure increases with larger environments, more viewpoints for mapping, and when training data is scarce. Project website: this http URL

5. Toward Automated Classroom Observation: Multimodal Machine Learning to Estimate CLASS Positive Climate and Negative Climate
Anand Ramakrishnan, Brian Zylich, Erin Ottmar, Jennifer LoCasale-Crouch, Jacob Whitehill
Abstract: In this work we present a multi-modal machine learning-based system, which we call ACORN, to analyze videos of school classrooms for the Positive Climate (PC) and Negative Climate (NC) dimensions of the CLASS observation protocol that is widely used in educational research. ACORN uses convolutional neural networks to analyze spectral audio features, the faces of teachers and students, and the pixels of each image frame, and then integrates this information over time using Temporal Convolutional Networks. The audiovisual ACORN's PC and NC predictions have Pearson correlations of $0.55$ and $0.63$ with ground-truth scores provided by expert CLASS coders on the UVA Toddler dataset (cross-validation on $n=300$ 15-min video segments), and a purely auditory ACORN predicts PC and NC with correlations of $0.36$ and $0.41$ on the MET dataset (test set of $n=2000$ video segments). These numbers are similar to the inter-coder reliability of human coders. Finally, using Graph Convolutional Networks we make early strides (AUC=$0.70$) toward predicting the specific moments (45-90 sec clips) when the PC is particularly weak or strong. Our findings inform the design of automatic classroom observation and also more general video activity recognition and summary recognition systems.
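
The agreement figures quoted in this abstract are Pearson correlations between model predictions and expert scores. As a minimal sketch of how such a number is computed (not the authors' evaluation code), the function below implements the textbook formula with the standard library only. The `predicted` and `expert` lists are made-up illustrative values, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalised (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-segment scores, invented for illustration only.
predicted = [4.1, 5.0, 3.2, 6.3, 5.5]
expert = [4.0, 5.5, 3.0, 6.0, 5.0]
r = pearson_r(predicted, expert)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means predictions rise and fall with the expert scores; the abstract compares its values against the agreement between human coders rather than against 1.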

6. RoadText-1K: Text Detection & Recognition Dataset for Driving Videos
Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas, C.V. Jawahar
Abstract: Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new "RoadText-1K" dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at this http URL

7. Built Infrastructure Monitoring and Inspection Using UAVs and Vision-based Algorithms
Khai Ky Ly, Manh Duong Phung
Abstract: This study presents an inspection system that uses real-time controlled unmanned aerial vehicles (UAVs) to investigate structural surfaces. The system operates under favourable weather conditions to inspect a target structure, which in this study is the Wentworth light rail base structure. The system includes a drone, a GoPro HERO4 camera, a controller and a mobile phone. The drone is flown manually in the testing field to collect the data required for later analysis. The images are taken with the HERO4 camera and then transferred in real time to a remote processing unit, such as a ground control station, over a wireless connection established by a Wi-Fi router. An image processing method is proposed to detect defects or damage such as cracks. The method is based on intensity histogram algorithms that isolate the pixel group belonging to a crack, which lies in the low-intensity interval. Experiments, simulations and comparisons have been conducted to evaluate the performance and validity of the proposed system.
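
The crack detector in this abstract relies on crack pixels falling in the low-intensity interval of the image histogram. The sketch below illustrates that idea on a tiny grayscale grid using plain Python lists. The abstract does not say how the interval is chosen, so the cumulative-fraction rule, the function names and the `fraction` parameter are all assumptions made for illustration.

```python
def intensity_histogram(gray, levels=256):
    """Count how many pixels take each intensity value in 0..levels-1."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    return hist


def low_intensity_mask(gray, fraction=0.05, levels=256):
    """Mark pixels in the darkest part of the histogram.

    Assumption: the low-intensity interval is [0, t], where t is the first
    level at which the cumulative pixel count reaches `fraction` of the image.
    """
    hist = intensity_histogram(gray, levels)
    budget = fraction * sum(hist)
    cumulative = 0
    threshold = 0
    for level, count in enumerate(hist):
        cumulative += count
        threshold = level
        if cumulative >= budget:
            break
    mask = [[1 if v <= threshold else 0 for v in row] for row in gray]
    return mask, threshold


# Hypothetical 4x4 patch: bright concrete (about 200) with a dark diagonal crack (20).
gray = [
    [200, 200, 20, 200],
    [200, 20, 200, 200],
    [20, 200, 200, 200],
    [200, 200, 200, 210],
]
mask, threshold = low_intensity_mask(gray, fraction=0.15)
for row in mask:
    print(row)
```

On real images the candidate mask would normally need further filtering (for example by connected-component shape), since shadows and stains also fall in the dark interval.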

8. Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
Andreas Eitel, Nico Hauff, Wolfram Burgard
Abstract: Instance segmentation of unknown objects from images is regarded as relevant for several robot skills including grasping, tracking and object sorting. Recent results in computer vision have shown that large hand-labeled datasets enable high segmentation performance. To overcome the time-consuming process of manually labeling data for new environments, we present a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner. Our robot pushes unknown objects on a table and uses information from optical flow to create training labels in the form of object masks. To achieve this, we fine-tune an existing DeepMask network for instance segmentation on the self-labeled training data acquired by the robot. We evaluate our trained network (SelfDeepMask) on a set of real images showing challenging and cluttered scenes with novel objects. Here, SelfDeepMask outperforms the DeepMask network trained on the COCO dataset by 9.5% in average precision. Furthermore, we combine our approach with recent approaches for training with noisy labels in order to better cope with induced label noise.

9. MaskFace: multi-task face and landmark detector
Dmitry Yashunin, Tamir Baydasov, Roman Vlasov
Abstract: Currently, single-task approaches to face detection and landmark localization dominate the domain of facial analysis. In this paper we draw attention to multi-task models that solve both tasks simultaneously. We present a highly accurate model for face and landmark detection. The method, called MaskFace, extends previous face detection approaches by adding a keypoint prediction head. The new keypoint head adopts ideas from Mask R-CNN by extracting facial features with a RoIAlign layer. The keypoint head adds little computational overhead when the image contains few faces, while improving accuracy dramatically. We evaluate MaskFace's performance on a face detection task on the AFW, PASCAL face, FDDB and WIDER FACE datasets and on a landmark localization task on the AFLW and 300W datasets. For both tasks MaskFace achieves state-of-the-art results, outperforming many single-task and multi-task models.

10. Uncertainty Estimation in Deep 2D Echocardiography Segmentation
Lavsen Dahal, Aayush Kafle, Bishesh Khanal
Abstract: 2D echocardiography is the most common imaging modality for cardiovascular diseases. The portability and relatively low cost of ultrasound (US) enable the US devices needed for performing echocardiography to be made widely available. However, acquiring and interpreting cardiac US images is operator dependent, limiting its use to places where experts are present. Recently, Deep Learning (DL) has been used in 2D echocardiography for automated view classification and for structure and function assessment. Although these recent works show promise in developing computer-guided acquisition and automated interpretation of echocardiograms, most of these methods do not model and estimate uncertainty, which can be important when testing on data coming from a distribution further away from that of the training data. Uncertainty estimates can be beneficial both during the image acquisition phase (by providing real-time feedback to the operator on the quality of the acquired image) and during automated measurement and interpretation. The performance of uncertainty models and quantification metrics may depend on the prediction task and the models being compared. Hence, to gain insight into uncertainty modelling for left ventricular segmentation from US images, we compare three ensembling-based uncertainty models quantified using four different metrics (one newly proposed) on state-of-the-art baseline networks using two publicly available echocardiogram datasets. We further demonstrate how uncertainty estimation can be used to automatically reject poor quality images and improve state-of-the-art segmentation results.

11. Localizing Firearm Carriers by Identifying Human-Object Pairs
Abdul Basit, Muhammad Akhtar Munir, Mohsen Ali, Arif Mahmood
Abstract: Visual identification of gunmen in a crowd is a challenging problem that requires resolving the association of a person with an object (firearm). We present a novel approach to address this problem by defining human-object interaction (and non-interaction) bounding boxes. In a given image, humans and firearms are detected separately. Each detected human is paired with each detected firearm, allowing us to create a paired bounding box that contains both the object and the human. A network is trained to classify these paired bounding boxes according to whether or not the human is carrying the identified firearm. Extensive experiments were performed to evaluate the effectiveness of the algorithm, including exploiting the full pose of the human, hand key-points, and their association with the firearm. Knowledge of spatially localized features is key to the success of our method, which uses multi-size proposals with adaptive average pooling. We have also extended a previous firearm detection dataset by adding more images and tagging the human-firearm pairs in the extended dataset (including bounding boxes for firearms and gunmen). The experimental results (78.5 $AP_{hold}$) demonstrate the effectiveness of the proposed method.
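
The pairing step described in this abstract (every detected human combined with every detected firearm, each pair enclosed in one box) can be sketched in a few lines. This is only an illustration of the box construction, not the authors' pipeline: the `(x1, y1, x2, y2)` corner format, the function names and the sample coordinates are assumptions, and the classifier that labels each pair is not shown.

```python
from itertools import product


def union_box(a, b):
    """Smallest axis-aligned box (x1, y1, x2, y2) that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def paired_boxes(humans, firearms):
    """Pair every detected human with every detected firearm."""
    return [
        {"human": h, "firearm": f, "pair_box": union_box(h, f)}
        for h, f in product(humans, firearms)
    ]


# Hypothetical detections, invented for illustration only.
humans = [(10, 10, 50, 120), (80, 15, 120, 125)]
firearms = [(45, 60, 70, 75)]

pairs = paired_boxes(humans, firearms)
for pair in pairs:
    print(pair["pair_box"])
```

With H humans and F firearms this produces H x F candidate boxes, each of which would then be passed to the carrying / not-carrying classifier.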

12. Patch Attack for Automatic Check-out
Aishan Liu, Jiakai Wang, Xianglong Liu, Chongzhi Zhang, Bowen Cao, Hang Yu
Abstract: Adversarial examples are inputs with imperceptible perturbations that easily mislead deep neural networks (DNNs). Recently, the adversarial patch, with noise confined to a small and localized patch, has emerged because it is easy to apply in real-world scenarios. However, existing strategies fail to generate adversarial patches with strong generalization ability. In other words, the adversarial patches are input-specific and fail to attack images from all classes, especially ones unseen during training. To address the problem, this paper proposes a bias-based framework to generate class-agnostic universal adversarial patches with strong generalization ability, which exploits both the perceptual and semantic bias of models. Regarding the perceptual bias, since DNNs are strongly biased towards textures, we exploit the hard examples, which convey strong model uncertainties, and extract a textural patch prior from them by adopting the style similarities. The patch prior is closer to decision boundaries and would promote attacks. To alleviate the heavy dependency on large amounts of data in training universal attacks, we further exploit the semantic bias. As the class-wise preference, prototypes are introduced and pursued by maximizing the multi-class margin to help universal training. Taking Automatic Check-out (ACO) as the typical scenario, we conduct extensive experiments, including white-box and black-box settings, in both the digital world (RPC, the largest ACO-related dataset) and physical-world scenarios (Taobao and JD, the world's largest online shopping platforms). Experimental results demonstrate that our proposed framework outperforms state-of-the-art adversarial patch attack methods.

13. On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel
Abstract: Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on "inverting" the distribution of labels, e.g. answering mostly "yes" when the common training answer is "no". Second, the OOD test set is used for model selection. Third, a model's in-domain performance is assessed after retraining it on in-domain splits (VQA v2) that exhibit a more balanced distribution of labels. These three practices defeat the objective of evaluating generalization, and put into question the value of methods specifically designed for this dataset. We show that embarrassingly simple methods, including one that generates answers at random, surpass the state of the art on some question types. We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation.

14. An Auto-Context Deformable Registration Network for Infant Brain MRI
Dongming Wei, Sahar Ahmad, Yunzhi Huang, Lei Ma, Qian Wang, Pew-Thian Yap, Dinggang Shen
Abstract: Deformable image registration is fundamental to longitudinal and population analysis. Geometric alignment of infant brain MR images is challenging, owing to rapid changes in image appearance associated with brain development. In this paper, we propose an infant-dedicated deep registration network that uses the auto-context strategy to gradually refine the deformation fields and obtain highly accurate correspondences. Instead of training multiple registration networks, our method estimates the deformation fields by invoking a single network multiple times for iterative deformation refinement. The final deformation field is obtained by the incremental composition of the deformation fields. Experimental results in comparison with state-of-the-art registration methods indicate that our method achieves higher accuracy while at the same time preserving the smoothness of the deformation fields. Our implementation is available online.

15. Holistic Parameteric Reconstruction of Building Models from Point Clouds
Zhixin Li, Wenyuan Zhang, Jie Shan
Abstract: Building models are conventionally reconstructed by planar segmentation of building roof points, followed by a topology graph that groups the planes together. Roof edges and vertices are then mathematically represented by intersecting the segmented planes. Technically, such a solution is based on sequential local fitting, i.e., the entire data of one building do not simultaneously participate in determining the building model. As a consequence, the solution lacks topological integrity and geometric rigor. Fundamentally different from this traditional approach, we propose a holistic parametric reconstruction method, which means taking into consideration the entire point cloud of one building simultaneously. In our work, building models are reconstructed from predefined parametric (roof) primitives. We first use a well-designed deep neural network to segment and identify primitives in the given building point clouds. A holistic optimization strategy is then introduced to simultaneously determine the parameters of a segmented primitive. In the last step, the optimal parameters are used to generate a watertight building model in CityGML format. The airborne LiDAR dataset RoofN3D with predefined roof types is used for our test. It is shown that PointNet++ applied to the entire dataset can achieve an accuracy of 83% for primitive classification. For a subset of 910 buildings in RoofN3D, the holistic approach is then used to determine the parameters of primitives and reconstruct the buildings. The achieved overall quality of reconstruction is 0.08 meters for point-surface-distance or 0.7 times RMSE of the input LiDAR points. The study demonstrates the efficiency and capability of the proposed approach and its potential to handle large scale urban point clouds.

16. Deep Learning Guided Building Reconstruction from Satellite Imagery-derived Point Clouds
Bo Xu, Xu Zhang, Zhixin Li, Matt Leotta, Shih-Fu Chang, Jie Shan
Abstract: 3D urban reconstruction of buildings from remotely sensed imagery has drawn significant attention during the past two decades. While aerial imagery and LiDAR provide higher resolution, satellite imagery is cheaper and more efficient to acquire for large-scale needs. However, the high orbital altitude of satellite observation brings intrinsic challenges, such as unpredictable atmospheric effects, multiple view angles, significant radiometric differences due to the necessary multiple views, diverse land covers and urban structures in a scene, and a small base-height ratio or narrow field of view, all of which may degrade 3D reconstruction quality. To address these major challenges, we present a reliable and effective approach for building model reconstruction from point clouds generated from multi-view satellite images. We utilize multiple types of primitive shapes to fit the input point cloud. Specifically, a deep-learning approach is adopted to distinguish the shape of building roofs in complex and noisy scenes. For points that belong to the same roof shape, a multi-cue, hierarchical RANSAC approach is proposed for efficient and reliable segmentation and reconstruction of the building point cloud. Experimental results over four selected urban areas (0.34 to 2.04 sq km in size) demonstrate that the proposed method can generate detailed roof structures under noisy data environments. The average success rate for building shape recognition is 83.0%, while the overall completeness and correctness are over 70% with reference to ground truth created from airborne lidar. As the first effort to address the public need for large-scale city model generation, the development is deployed as open source software.

17. Retrieving and Highlighting Action with Spatiotemporal Reference
Seito Kasai, Yuchi Ishikawa, Masaki Hayashi, Yoshimitsu Aoki, Kensho Hara, Hirokatsu Kataoka
Abstract: In this paper, we present a framework that jointly retrieves and spatiotemporally highlights actions in videos by enhancing current deep cross-modal retrieval methods. Our work takes on the novel task of action highlighting, which visualizes where and when actions occur in an untrimmed video setting. Action highlighting is a fine-grained task compared to conventional action recognition tasks, which focus on classification or window-based localization. Leveraging weak supervision from annotated captions, our framework acquires spatiotemporal relevance maps and generates local embeddings which relate to the nouns and verbs in captions. Through experiments, we show that our model generates various maps conditioned on different actions, whereas conventional visual reasoning methods only go as far as showing a single deterministic saliency map. Also, our model improves retrieval recall over our baseline without alignment by 2-3% on the MSR-VTT dataset.

18. MOTS: Multiple Object Tracking for General Categories Based On Few-Shot Method
Xixi Xu, Chao Lu, Liang Zhu, Xiangyang Xue, Guanxian Chen, Qi Guo, Yining Lin, Zhijian Zhao
Abstract: Most modern Multi-Object Tracking (MOT) systems apply a REID-based paradigm to balance computational efficiency and performance. In the past few years, numerous attempts have been made to perfect these systems. Although they achieve favorable performance, they are constrained to tracking a specified category. Drawing on ideas from few-shot methods, we pioneer a new multi-target tracking system, named MOTS, which is metric-based and not limited to tracking a specific category. It contains two stages in series. In the first stage, we design a self-adaptive matching module to perform simple target matching, which can complete 88.76% of assignments without sacrificing performance on the MOT16 training set. In the second stage, a Fine-match Network is carefully designed for the unmatched targets. With a newly built TRACK-REID dataset, the Fine-match Network can match targets from 31 categories and even generalizes to unseen categories.

19. Learning from a Lightweight Teacher for Efficient Knowledge Distillation
Yuang Liu, Wei Zhang, Jun Wang
Abstract: Knowledge Distillation (KD) is an effective framework for compressing deep learning models, realized by a student-teacher paradigm that requires small student networks to mimic the soft targets generated by well-trained teachers. However, the teachers are commonly assumed to be complex and need to be trained on the same datasets as the students. This leads to a time-consuming training process. A recent study shows that vanilla KD plays a similar role to label smoothing and develops teacher-free KD, which is efficient and mitigates the issue of learning from heavy teachers. But because teacher-free KD relies on manually crafted output distributions that are kept the same for all data instances belonging to the same class, its flexibility and performance are relatively limited. To address the above issues, this paper proposes an efficient knowledge distillation learning framework, LW-KD, short for lightweight knowledge distillation. It first trains a lightweight teacher network on a synthesized simple dataset, with an adjustable class number equal to that of the target dataset. The teacher then generates soft targets, from which an enhanced KD loss guides student learning; this loss is a combination of the KD loss and an adversarial loss that makes the student output indistinguishable from the output of the teacher. Experiments on several public datasets with different modalities demonstrate that LW-KD is effective and efficient, showing the rationality of its main design principles.
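
The soft-target term that the abstract refers to as the KD loss can be illustrated with a minimal sketch. It assumes the usual temperature-softened KL formulation; the function names `softmax` and `kd_loss`, the temperature value, and the T^2 scaling are illustrative assumptions, and the adversarial term of LW-KD is not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2 (assumed form)."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [5.0, 1.0, 0.5]
print(kd_loss(teacher, teacher))          # identical outputs give zero loss
print(kd_loss([0.5, 1.0, 5.0], teacher))  # disagreeing outputs give a positive loss
```

A higher temperature flattens both distributions, so the student is also penalized for mismatching the teacher's relative scores on the non-target classes.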

20. Adversarial Attacks for Embodied Agents
Aishan Liu, Tairan Huang, Xianglong Liu, Yitao Xu, Yuqing Ma, Xinyun Chen, Stephen J. Maybank, Dacheng Tao
Abstract: Adversarial attacks are valuable for providing insights into the blind spots of deep learning models and for helping to improve their robustness. Existing work on adversarial attacks has mainly focused on static scenes; however, it remains unclear whether such attacks are effective against embodied agents, which can navigate and interact with a dynamic environment. In this work, we take the first step in studying adversarial attacks for embodied agents. In particular, we generate spatiotemporal perturbations to form 3D adversarial examples, which exploit the interaction history in both the temporal and spatial dimensions. Regarding the temporal dimension, since agents make predictions based on historical observations, we develop a trajectory attention module to explore scene view contributions, which further helps localize the 3D objects that appear with the highest stimuli. Along the spatial dimension, guided by the clues from the temporal dimension, we adversarially perturb the physical properties (e.g., texture and 3D shape) of the contextual objects that appear in the most important scene views. Extensive experiments on the EQA-v1 dataset for several embodied tasks, in both white-box and black-box settings, demonstrate that our perturbations have strong attack and generalization abilities.

21. Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from Transformers by Self-supervised Learning of Sketch Gestalt
Hangyu Lin, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue
Abstract: Previous research on sketches often treated sketches as pixel images and leveraged CNN-based models for sketch understanding. Fundamentally, however, a sketch is stored as a sequence of data points, a vector-format representation, rather than as a photo-realistic image of pixels. SketchRNN studied a generative neural representation for vector-format sketches using Long Short-Term Memory (LSTM) networks. Unfortunately, the representation learned by SketchRNN is primarily suited to generation tasks rather than to other tasks such as sketch recognition and retrieval. To this end, and inspired by the recent BERT model, we present a model for learning Sketch Bidirectional Encoder Representation from Transformers (Sketch-BERT). We generalize BERT to the sketch domain with newly proposed components and pre-training algorithms, including newly designed sketch embedding networks and the self-supervised learning of sketch gestalt. In particular, for the pre-training task, we present a novel Sketch Gestalt Model (SGM) to help train Sketch-BERT. Experimentally, we show that the representation learned by Sketch-BERT improves the performance of the downstream tasks of sketch recognition, sketch retrieval, and sketch gestalt.

22. Associating Multi-Scale Receptive Fields for Fine-grained Recognition
Zihan Ye, Fuyuan Hu, Yin Liu, Zhenping Xia, Fan Lyu, Pengqing Liu
Abstract: Extracting and fusing part features has become the key to fine-grained image recognition. Recently, the Non-local (NL) module has shown excellent improvement in image recognition. However, it lacks a mechanism to model the interactions between multi-scale part features, which is vital for fine-grained recognition. In this paper, we propose a novel cross-layer non-local (CNL) module that associates multi-scale receptive fields through two operations. First, CNL computes correlations between the features of a query layer and those of all response layers. Second, all response features are weighted according to the correlations and added to the query features. Thanks to the interactions of cross-layer features, our model builds spatial dependencies among multi-level layers and learns more discriminative features. In addition, we can reduce the aggregation cost by setting a low-dimensional deep layer as the query layer. Experiments show that our model matches or surpasses state-of-the-art results on three benchmark datasets for fine-grained classification. Our code can be found at this http URL.
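
The two CNL operations described above (correlate, then weight and add) can be sketched in a few lines. This is only an illustration under stated assumptions: features are plain lists of equal-length vectors, the correlation is a dot product normalized with a softmax per response layer, and the name `cross_layer_attend` is hypothetical. The abstract does not specify these details.

```python
import math


def cross_layer_attend(query, response_layers):
    """query: list of C-dim vectors; response_layers: list of layers,
    each a list of C-dim vectors. Returns the updated query features."""
    updated_query = []
    for q in query:
        updated = list(q)
        for layer in response_layers:
            # Step 1: correlation between this query feature and each response feature.
            scores = [sum(a * b for a, b in zip(q, r)) for r in layer]
            peak = max(scores)
            weights = [math.exp(s - peak) for s in scores]
            norm = sum(weights)
            # Step 2: add the correlation-weighted response features to the query.
            for w, r in zip(weights, layer):
                for c in range(len(updated)):
                    updated[c] += (w / norm) * r[c]
        updated_query.append(updated)
    return updated_query


result = cross_layer_attend([[1.0, 0.0]], [[[1.0, 0.0], [1.0, 0.0]]])
print(result)
```

Because the outer loop runs over query positions, choosing a small deep layer as the query keeps the number of correlation computations low, which is the cost argument made in the abstract.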

23. Increasing-Margin Adversarial (IMA) Training to Improve Adversarial Robustness of Neural Networks
Linhai Ma, Liang Liang
Abstract: Convolutional neural networks (CNNs) have surpassed traditional methods for medical image classification. However, CNNs are vulnerable to adversarial attacks, which may lead to disastrous consequences in medical applications. Although adversarial noise is usually generated by attack algorithms, white-noise-induced adversarial samples can exist, and therefore the threats are real. In this study, we propose a novel training method, named IMA, to improve the robustness of CNNs against adversarial noise. During training, the IMA method increases the margins of training samples in the input space, i.e., it moves the CNN decision boundaries far away from the training samples to improve robustness. The IMA method is evaluated on four publicly available datasets under strong 100-PGD white-box adversarial attacks, and the results show that the proposed method significantly improves CNN classification accuracy on noisy data while keeping a relatively high accuracy on clean data. We hope our approach may facilitate the development of robust applications in the medical field.

24. Domain Adaptive Relational Reasoning for 3D Multi-Organ Segmentation
Shuhao Fu, Yongyi Lu, Yan Wang, Yuyin Zhou, Wei Shen, Elliot Fishman, Alan Yuille
Abstract: In this paper, we present a novel unsupervised domain adaptation (UDA) method, named Domain Adaptive Relational Reasoning (DARR), to generalize 3D multi-organ segmentation models to medical data collected from different scanners and/or protocols (domains). Our method is inspired by the fact that the spatial relationship between internal structures in medical images is relatively fixed, e.g., a spleen is always located at the tail of a pancreas; this relationship serves as a latent variable for transferring the knowledge shared across multiple domains. We formulate the spatial relationship by solving a jigsaw puzzle task, i.e., recovering a CT scan from its shuffled patches, and train it jointly with the organ segmentation task. To guarantee the transferability of the learned spatial relationship to multiple domains, we additionally introduce two schemes: 1) employing a super-resolution network, also jointly trained with the segmentation model, to standardize medical images from different domains to a certain spatial resolution; 2) adapting the spatial relationship for a test image by test-time jigsaw puzzle training. Experimental results show that our method improves performance by 29.60% DSC on target datasets on average without using any data from the target domain during training.
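
The jigsaw puzzle task mentioned above needs no labels: shuffling the patches produces both the input and the target. The sketch below shows only that data-preparation step, assuming the scan has already been cut into an ordered list of patches; `make_jigsaw_sample` and `restore` are hypothetical helper names, and the network that predicts the permutation is not shown.

```python
import random


def make_jigsaw_sample(patches, rng):
    """Shuffle patches; the permutation is the self-supervised target."""
    perm = list(range(len(patches)))
    rng.shuffle(perm)
    shuffled = [patches[i] for i in perm]  # position k holds original patch perm[k]
    return shuffled, perm


def restore(shuffled, perm):
    """Undo the shuffle given the (predicted or true) permutation."""
    restored = [None] * len(shuffled)
    for position, source_index in enumerate(perm):
        restored[source_index] = shuffled[position]
    return restored


patches = ["patch0", "patch1", "patch2", "patch3"]  # stand-ins for CT sub-volumes
shuffled, perm = make_jigsaw_sample(patches, random.Random(0))
print(shuffled, perm)
print(restore(shuffled, perm) == patches)
```

Since the target is generated from the input itself, the same procedure can be run on a single unlabeled test image, which is what makes the test-time jigsaw training in scheme 2 possible.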

25. Two-View Fine-grained Classification of Plant Species
Voncarlos M. Araujo, Alceu S. Britto Jr., Luiz E. S. Oliveira, Alessandro L. Koerich
Abstract: Automatic plant classification is a challenging problem due to the wide biodiversity of the existing plant species in a fine-grained scenario. Powerful deep learning architectures have been used to improve the classification performance in such a fine-grained problem, but they usually yield models that depend heavily on a large training dataset and are not scalable. In this paper, we propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species. It uses the botanical taxonomy as the basis for a coarse-to-fine strategy applied to identify the plant genus and species. The two-view representation provides complementary global and local features of leaf images. A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and to make the method scalable to new plant species. The experimental results on two challenging fine-grained datasets of leaf images (i.e., LifeCLEF 2015 and LeafSnap) show the effectiveness of the proposed method, which achieves recognition accuracies of 0.87 and 0.96, respectively.

26. Cross-filter compression for CNN inference acceleration
Fuyuan Lyu, Shien Zhu, Weichen Liu
Abstract: Convolutional neural networks demonstrate great capability for multiple tasks, such as image classification and many others. However, substantial resources are required to train a network. Hence, much effort has been made to accelerate neural networks by reducing the precision of weights, activations, and gradients. However, these filter-wise quantization methods have a natural upper limit, caused by the size of the kernel. Meanwhile, with the popularity of small kernels, this natural limit decreases further. To address this issue, we propose a new cross-filter compression method that can provide $\sim32\times$ memory savings and a $122\times$ speed-up in convolution operations. In our method, all convolution filters are quantized to a given number of bits, and spatially adjacent filters share the same scaling factor. Our compression method, built on Binary-Weight and XNOR-Net separately, is evaluated on the CIFAR-10 and ImageNet datasets with widely used network structures, such as ResNet and VGG, and shows tolerable accuracy loss compared to state-of-the-art quantization methods.
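
The idea of several filters sharing one scaling factor can be illustrated for the 1-bit case. The sketch assumes a Binary-Weight-style scale (the mean absolute weight) computed over a group of adjacent filters; the grouping, the function names `binarize_group` and `dequantize`, and the choice of scale are assumptions, not the paper's exact scheme.

```python
def binarize_group(filters):
    """filters: list of flattened weight lists for adjacent filters.
    Returns one shared scale and the per-weight signs (+1 / -1)."""
    flat = [w for f in filters for w in f]
    alpha = sum(abs(w) for w in flat) / len(flat)  # one scale for the whole group
    signs = [[1 if w >= 0 else -1 for w in f] for f in filters]
    return alpha, signs


def dequantize(alpha, signs):
    """Approximate the original weights as alpha * sign."""
    return [[alpha * s for s in f] for f in signs]


group = [[0.5, -1.5], [1.0, -1.0]]  # two tiny "filters" in one group
alpha, signs = binarize_group(group)
print(alpha, signs)
print(dequantize(alpha, signs))
```

Storing one floating-point scale per group instead of per filter reduces the overhead on top of the 1-bit weights, which matters most when kernels are small.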

27. Efficient Image Gallery Representations at Scale Through Multi-Task Learning
Benjamin Gutelman, Pavel Levin
Abstract: Image galleries provide a rich source of diverse information about a product, which can be leveraged across many recommendation and retrieval applications. We study the problem of building a universal image gallery encoder through a multi-task learning (MTL) approach and demonstrate that it is indeed a practical way to achieve generalizability of learned representations to new downstream tasks. Additionally, we analyze the relative predictive performance of MTL-trained solutions against optimal and substantially more expensive solutions, and find signals that MTL can be a useful mechanism to address sparsity in low-resource binary tasks.

28. A Novel Technique Combining Image Processing, Plant Development Properties, and the Hungarian Algorithm, to Improve Leaf Detection in Maize
Nazifa Khan, Oliver A.S. Lyon, Mark Eramian, Ian McQuillan
Abstract: Manual determination of plant phenotypic properties such as plant architecture, growth, and health is very time-consuming and sometimes destructive. Automatic image analysis has become a popular approach. This research aims to identify the position (and number) of leaves from a temporal sequence of high-quality indoor images consisting of multiple views, focusing in particular on images of maize. The procedure segments the images, uses the convex hull to pick the best view at each time step, and then skeletonizes the corresponding image. To remove skeleton spurs, a discrete skeleton evolution pruning process is applied. Pre-existing statistics regarding maize development are incorporated to help differentiate between true leaves and false leaves. Furthermore, for each time step, leaves are matched to those of the previous and next three days using the graph-theoretic Hungarian algorithm. This matching can be used both to remove false positives and to predict true leaves, even if they are completely occluded in the image itself. The algorithm was evaluated using an open dataset consisting of 13 maize plants across 27 days from two different views. The total number of true leaves in the dataset is 1843, and our proposed techniques detect a total of 1690 leaves, including 1674 true leaves and only 16 false leaves, giving a recall of 90.8% and a precision of 99.0%.
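
The reported recall and precision follow directly from the leaf counts in the abstract. The short check below uses only those counts; the variable names are illustrative.

```python
true_leaves_total = 1843  # true leaves in the dataset
detected_total = 1690     # all leaves reported by the method
detected_true = 1674      # reported leaves that are true leaves

detected_false = detected_total - detected_true  # false leaves
recall = detected_true / true_leaves_total       # share of true leaves that were found
precision = detected_true / detected_total       # share of detections that are correct

print(f"false leaves: {detected_false}")
print(f"recall:    {recall:.4f}")
print(f"precision: {precision:.4f}")
```

Recall comes out at about 0.908 and precision at about 0.9905, in line with the 90.8% and 99.0% quoted in the abstract.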

29. Patch based Colour Transfer using SIFT Flow
Hana Alghamdi, Rozenn Dahyot
Abstract: We propose a new colour transfer method with Optimal Transport (OT) to transfer the colour of a source image to match the colour of a target image of the same scene that may exhibit large motion changes between images. By definition, OT does not take into account any available information about correspondences when computing the optimal solution. To tackle this problem, we propose to encode overlapping neighborhoods of pixels using both their colour and spatial correspondences estimated using motion estimation. We solve the high-dimensional problem in 1D space using an iterative projection approach. We further introduce smoothing as part of the iterative algorithms for solving optimal transport, namely Iterative Distribution Transport (IDT) and its variant, the Sliced Wasserstein Distance (SWD). Experiments show quantitative and qualitative improvements over previous state-of-the-art colour transfer methods.
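
Solving the transport problem in 1D is attractive because the optimal map between two equally sized 1D sample sets is obtained by sorting. The sketch below shows that single 1D step under the assumption of equal sample counts; `transport_1d` is a hypothetical name. In IDT-style algorithms a step of this kind is applied repeatedly to projections of the data onto different directions, which is not shown here.

```python
def transport_1d(source, target):
    """Map each source value to the target value with the same rank."""
    assert len(source) == len(target), "sketch assumes equal sample counts"
    order = sorted(range(len(source)), key=lambda i: source[i])
    sorted_target = sorted(target)
    mapped = [0.0] * len(source)
    for rank, index in enumerate(order):
        mapped[index] = sorted_target[rank]
    return mapped


source = [0.9, 0.1, 0.5]     # e.g. projected source pixel values
target = [10.0, 30.0, 20.0]  # e.g. projected target pixel values
print(transport_1d(source, target))  # [30.0, 10.0, 20.0]
```

The largest source value receives the largest target value and so on, so the mapped samples have exactly the target distribution while the ordering of the source is preserved.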

30. U$^2$-Net: Going Deeper with Nested U-Structure for Salient Object Detection
Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, Martin Jagersand
Abstract: In this paper, we design a simple yet powerful deep network architecture, U$^2$-Net, for salient object detection (SOD). The architecture of our U$^2$-Net is a two-level nested U-structure. The design has the following advantages: (1) it is able to capture more contextual information from different scales thanks to the mixture of receptive fields of different sizes in our proposed ReSidual U-blocks (RSU), (2) it increases the depth of the whole architecture without significantly increasing the computational cost because of the pooling operations used in these RSU blocks. This architecture enables us to train a deep network from scratch without using backbones from image classification tasks. We instantiate two models of the proposed architecture, U$^2$-Net (176.3 MB, 30 FPS on GTX 1080Ti GPU) and U$^2$-Net$^{\dagger}$ (4.7 MB, 40 FPS), to facilitate the usage in different environments. Both models achieve competitive performance on six SOD datasets. The code is available: this https URL.

31. Identifying Statistical Bias in Dataset Replication
Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry
Abstract: Dataset replication is a useful tool for assessing whether improvements in test accuracy on a specific benchmark correspond to improvements in models' ability to generalize reliably. In this work, we present unintuitive yet significant ways in which standard approaches to dataset replication introduce statistical bias, skewing the resulting observations. We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for a standard human-in-the-loop measure of data quality. We show that after correcting for the identified statistical bias, only an estimated $3.6\% \pm 1.5\%$ of the original $11.7\% \pm 1.0\%$ accuracy drop remains unaccounted for. We conclude with concrete recommendations for recognizing and avoiding bias in dataset replication. Code for our study is publicly available at this http URL.

32. Advances in Computer Vision in Gastric Cancer: Potential Efficient Tools for Diagnosis
Yihua Sun
Abstract: Early and rapid diagnosis of gastric cancer is a great challenge for clinical doctors. Dramatic progress in computer vision for gastric cancer has been made recently, and this review focuses on advances during the past five years. Different methods for data generation and augmentation are presented, and various approaches to extracting discriminative features are compared and evaluated. Classification and segmentation techniques are carefully discussed with a view to assisting more precise diagnosis and timely treatment. Application of these methods will greatly reduce the labor and time consumed in the diagnosis of gastric cancers.

33. The Skincare project, an interactive deep learning system for differential diagnosis of malignant skin lesions. Technical Report
Daniel Sonntag, Fabrizio Nunnari, Hans-Jürgen Profitlich
Abstract: A shortage of dermatologists causes long wait times for patients who seek dermatologic care. In addition, the diagnostic accuracy of general practitioners has been reported to be lower than the accuracy of artificial intelligence software. This article describes the Skincare project (H2020, EIT Digital). Contributions include enabling technology for clinical decision support based on interactive machine learning (IML), a reference architecture towards a Digital European Healthcare Infrastructure (also cf. EIT MCPS), technical components for aggregating digitised patient information, and the integration of decision support technology into clinical test-bed environments. However, the main contribution is a diagnostic and decision support system in dermatology for patients and doctors, an interactive deep learning system for differential diagnosis of malignant skin lesions. In this article, we describe its functionalities and the user interfaces that facilitate machine learning from human input. The baseline deep learning system, which delivers state-of-the-art results and has the potential to augment general practitioners and even dermatologists, was developed and validated using de-identified cases from a dermatology image database (ISIC), which has about 20000 cases for development and validation, provided by board-certified dermatologists who defined the reference standard for every case. ISIC allows for differential diagnosis, a ranked list of eight diagnoses that is used to plan treatments in the common setting of diagnostic ambiguity. We give an overall description of the outcome of the Skincare project, and we focus on the steps to support communication and coordination between humans and machines in IML. This is an integral part of the development of future cognitive assistants in the medical domain, and we describe the necessary intelligent user interfaces.

34. hidden markov random fields and cuckoo search method for medical image segmentation
EL-Hachemi Guerrout, Ramdane Mahiou, Dominique Michelucci, Boukabene Randa, Ouali Assia
Abstract: Segmentation of medical images is an essential part of the diagnostic process. Physicians require automatic, robust and valid results. Hidden Markov Random Fields (HMRF) provide a powerful model, which casts the segmentation problem as the minimization of an energy function. The cuckoo search (CS) algorithm is one of the recent nature-inspired meta-heuristic algorithms and has shown its efficiency in many engineering optimization problems. In this paper, we use three cuckoo search algorithms to achieve medical image segmentation.

35. Learning to segment clustered amoeboid cells from brightfield microscopy via multi-task learning with adaptive weight selection
Rituparna Sarkar, Suvadip Mukherjee, Elisabeth Labruyère, Jean-Christophe Olivo-Marin
Abstract: Detecting and segmenting individual cells from microscopy images is critical to various life science applications. Traditional cell segmentation tools are often ill-suited for applications in brightfield microscopy due to poor contrast and intensity heterogeneity, and only a small subset are applicable to segment cells in a cluster. In this regard, we introduce a novel supervised technique for cell segmentation in a multi-task learning paradigm. A combination of a multi-task loss, based on the region and cell boundary detection, is employed for an improved prediction efficiency of the network. The learning problem is posed in a novel min-max framework which enables adaptive estimation of the hyper-parameters in an automatic fashion. The region and cell boundary predictions are combined via morphological operations and active contour model to segment individual cells. The proposed methodology is particularly suited to segment touching cells from brightfield microscopy images without manual interventions. Quantitatively, we observe an overall Dice score of 0.93 on the validation set, which is an improvement of over 15.9% on a recent unsupervised method, and outperforms the popular supervised U-net algorithm by at least $5.8\%$ on average.
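
The Dice score used to report the segmentation quality is twice the overlap divided by the total size of the two masks. A minimal sketch for flattened binary masks follows; the function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_score(pred, truth):
    """Dice coefficient for two flattened binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * intersection / total


prediction = [1, 1, 0, 0]
ground_truth = [1, 0, 1, 0]
print(dice_score(prediction, ground_truth))    # 0.5
print(dice_score(ground_truth, ground_truth))  # 1.0
```

A score of 0.93 therefore means that the overlap covers 93% of the average size of the predicted and reference masks.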

36. Adaptive Weighted Attention Network with Camera Spectral Sensitivity Prior for Spectral Reconstruction from RGB Images
Jiaojiao Li, Chaoxiong Wu, Rui Song, Yunsong Li, Fei Liu
Abstract: Recent promising efforts in spectral reconstruction (SR) focus on learning a complicated mapping using deeper and wider convolutional neural networks (CNNs). Nevertheless, most CNN-based SR algorithms neglect to explore the camera spectral sensitivity (CSS) prior and the interdependencies among intermediate features, thus limiting the representation ability of the network and the performance of SR. To overcome these issues, we propose a novel adaptive weighted attention network (AWAN) for SR, whose backbone is a stack of multiple dual residual attention blocks (DRAB) equipped with long and short skip connections to form dual residual learning. Concretely, we investigate an adaptive weighted channel attention (AWCA) module that reallocates channel-wise feature responses by integrating correlations between channels. Furthermore, a patch-level second-order non-local (PSNL) module is developed to capture long-range spatial contextual information through second-order non-local operations for more powerful feature representations. Based on the fact that the recovered RGB images can be projected from the reconstructed hyperspectral image (HSI) and the given CSS function, we incorporate the discrepancies of both the RGB images and the HSIs as a finer constraint for more accurate reconstruction. Experimental results demonstrate the effectiveness of our proposed AWAN network in terms of quantitative comparison and perceptual quality over other state-of-the-art SR methods. In the NTIRE 2020 Spectral Reconstruction Challenge, our entries ranked 1st on the Clean track and 3rd on the Real World track. Codes are available at this https URL.

37. Synthesizing Unrestricted False Positive Adversarial Objects Using Generative Models
Martin Kotuliak, Sandro E. Schoenborn, Andrei Dan
Abstract: Adversarial examples are data points misclassified by neural networks. Originally, adversarial examples were limited to adding small perturbations to a given image. Recent work introduced the generalized concept of unrestricted adversarial examples, without limits on the added perturbations. In this paper, we introduce a new category of attacks that create unrestricted adversarial examples for object detection. Our key idea is to generate adversarial objects that are unrelated to the classes identified by the target object detector. Different from previous attacks, we use off-the-shelf Generative Adversarial Networks (GAN), without requiring any further training or modification. Our method consists of searching over the latent normal space of the GAN for adversarial objects that are wrongly identified by the target object detector. We evaluate this method on the commonly used Faster R-CNN ResNet-101, Inception v2 and SSD Mobilenet v1 object detectors using logo generative iWGAN-LC and SNGAN trained on CIFAR-10. The empirical results show that the generated adversarial objects are indistinguishable from non-adversarial objects generated by the GANs, transferable between the object detectors and robust in the physical world. This is the first work to study unrestricted false positive adversarial examples for object detection.

38. Assertion Detection in Multi-Label Clinical Text using Scope Localization
Rajeev Bhatt Ambati, Ahmed Ada Hanifi, Ramya Vunikili, Puneet Sharma, Oladimeji Farri
Abstract: Multi-label sentences (text) in the clinical domain result from the rich description of scenarios during patient care. The state-of-the-art methods for assertion detection mostly address this task in the setting of a single assertion label per sentence (text). In addition, a few rule-based and deep learning methods perform negation/assertion scope detection on single-label text. Extending these methods to address multi-label sentences without diminishing performance is a significant challenge. Therefore, we developed a convolutional neural network (CNN) architecture to localize multiple labels and their scopes in a single-stage, end-to-end fashion, and demonstrate that our model performs at least 12% better than the state of the art on multi-label clinical text.

39. Structural Residual Learning for Single Image Rain Removal
Hong Wang, Yichen Wu, Qi Xie, Qian Zhao, Yong Liang, Deyu Meng
Abstract: To alleviate the adverse effect of rain streaks in image processing tasks, CNN-based single image rain removal methods have recently been proposed. However, the performance of these deep learning methods largely relies on the range of rain shapes covered by the pre-collected rainy-clean training image pairs. This makes them prone to overfitting the training samples, so that they cannot generalize well to practical rainy images with complex and diverse rain streaks. To address this generalization issue, this study proposes a new network architecture that enforces the output residual of the network to possess intrinsic rain structures. Such a structural residual setting guarantees that the rain layer extracted by the network complies with the prior knowledge of general rain streaks, and thus ensures that sound rain shapes can be well extracted from rainy images in both the training and prediction stages. Such a general regularization function naturally leads to both better training accuracy and better testing generalization, even for unseen rain configurations. This superiority is comprehensively substantiated, both visually and quantitatively, by experiments on synthetic and real datasets in comparison with current state-of-the-art methods.

40. A Self-ensembling Framework for Semi-supervised Knee Osteoarthritis Localization and Classification with Dual-Consistency
Jiayu Huo, Liping Si, Xi Ouyang, Kai Xuan, Weiwu Yao, Zhong Xue, Lichi Zhang, Qian Wang
Abstract: Knee osteoarthritis (OA) is one of the most common musculoskeletal disorders and requires early-stage diagnosis. Deep convolutional neural networks have recently achieved great success in the field of computer-aided diagnosis. However, building deep learning models usually requires large amounts of annotated data, which are generally costly to obtain. In this paper, we propose a novel approach for knee OA diagnosis, including severity classification and lesion localization. In particular, we design a self-ensembling framework composed of a student network and a teacher network with the same structure. The student network learns from both labeled and unlabeled data, and the teacher network averages the student model weights over the course of training. A novel attention loss function is developed to obtain accurate attention masks. With dual-consistency checking of the attention in lesion classification and localization, the two networks can gradually optimize the attention distribution and improve each other's performance, while the training relies only on partially labeled data and proceeds in a semi-supervised manner. Experiments show that the proposed method can significantly improve the self-ensembling performance in both knee OA classification and localization, and also greatly reduce the need for annotated data.
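The teacher-student weight averaging mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes an exponential moving average (EMA), which is a common choice in self-ensembling but is not stated in the abstract. The function name `ema_update`, the dict-of-lists weight layout, and the decay `alpha` are all hypothetical.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student.

    Assumption: weights are stored as {name: list of floats}; a real
    framework would hold tensors instead.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        name: [alpha * t + (1.0 - alpha) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


# One update step: the teacher drifts slowly toward the student.
teacher_weights = {"w": [1.0, 2.0]}
student_weights = {"w": [0.0, 4.0]}
teacher_weights = ema_update(teacher_weights, student_weights, alpha=0.9)
```

A decay close to 1 makes the teacher a smoothed, slowly changing copy of the student, which is the usual motivation for using it as a consistency target.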

41. Regularization Methods for Generative Adversarial Networks: An Overview of Recent Studies
Minhyeok Lee, Junhee Seok
Abstract: Despite its short history, the Generative Adversarial Network (GAN) has been extensively studied and used for various tasks, including its original purpose, i.e., synthetic sample generation. However, applying GANs to different data types with diverse neural network architectures has been hindered by their limitation in training, where the model easily diverges. This notoriously difficult training of GANs is well known and has been addressed in numerous studies. Consequently, in order to make the training of GANs stable, numerous regularization methods have been proposed in recent years. This paper reviews the regularization methods that have been recently introduced, most of which have been published in the last three years. Specifically, we focus on general methods that can be commonly used regardless of neural network architectures. To explore the latest research trends in regularization for GANs, the methods are classified into several groups by their operation principles, and the differences between the methods are analyzed. Furthermore, to provide practical knowledge of using these methods, we investigate popular methods that have been frequently employed in state-of-the-art GANs. In addition, we discuss the limitations of existing methods and propose future research directions.

42. A New Validity Index for Fuzzy-Possibilistic C-Means Clustering
Mohammad Hossein Fazel Zarandi, Shahabeddin Sotudian, Oscar Castillo
Abstract: In some complicated datasets, due to the presence of noisy data points and outliers, cluster validity indices can give conflicting results in determining the optimal number of clusters. This paper presents a new validity index for fuzzy-possibilistic c-means (FPCM) clustering, called the Fuzzy-Possibilistic (FP) index, which works well in the presence of clusters that vary in shape and density. Moreover, FPCM, like most clustering algorithms, is sensitive to some initial parameters. In this regard, in addition to the number of clusters, FPCM requires a priori selection of the degree of fuzziness and the degree of typicality. Therefore, we present an efficient procedure for determining their optimal values. The proposed approach has been evaluated using several synthetic and real-world datasets. Final computational results demonstrate the capabilities and reliability of the proposed approach compared with several well-known fuzzy validity indices in the literature. Furthermore, to clarify the ability of the proposed method in real applications, the proposed method is applied to microarray gene expression data clustering and medical image segmentation.
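To make the role of the degree of fuzziness concrete, here is a minimal Python sketch of the membership rule from standard fuzzy c-means. It is not the FP index or the FPCM procedure of this paper, and it omits the typicality term entirely. The function name `fcm_memberships` and the restriction to scalar (1-D) data are assumptions made for brevity.

```python
def fcm_memberships(x, centers, m=2.0):
    """Memberships of scalar point x in each cluster (standard fuzzy c-means).

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j = |x - c_j|
    and m > 1 is the degree of fuzziness.
    """
    if m <= 1.0:
        raise ValueError("the degree of fuzziness m must exceed 1")
    dists = [abs(x - c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


# A point at 1.0 is much closer to the center at 0.0 than to the one at 4.0.
memberships = fcm_memberships(1.0, [0.0, 4.0], m=2.0)
```

Larger values of `m` flatten the memberships toward equal shares, which is why the choice of this parameter affects any validity index computed from them.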

43. Improve robustness of DNN for ECG signal classification: a noise-to-signal ratio perspective
Linhai Ma, Liang Liang
Abstract: The electrocardiogram (ECG) is the most widely used diagnostic tool to monitor the condition of the cardiovascular system. Deep neural networks (DNNs) have been developed in many research labs for automatic interpretation of ECG signals to identify potential abnormalities in patient hearts. Studies have shown that, given a sufficiently large amount of data, the classification accuracy of DNNs could reach the level of human-expert cardiologists. A DNN-based automated ECG diagnostic system would be an affordable solution for patients in developing countries where human-expert cardiologists are lacking. However, despite the excellent performance in classification accuracy, it has been shown that DNNs are highly vulnerable to adversarial attacks: subtle changes in the input of a DNN can lead to a wrong classification output with high confidence. Thus, it is challenging and essential to improve the adversarial robustness of DNNs for ECG signal classification, a life-critical application. In this work, we proposed to improve DNN robustness from the perspective of noise-to-signal ratio (NSR) and developed two methods to minimize NSR during the training process. We evaluated the proposed methods on PhysioNet's MIT-BIH dataset, and the results show that our proposed methods lead to an enhancement in robustness against the PGD adversarial attack and the SPSA attack, with a minimal change in accuracy on clean data.
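The kind of adversarial perturbation this abstract refers to can be sketched in a few lines of Python. The example below is a toy L-infinity PGD (projected gradient descent) attack on a linear classifier with a logistic loss. It illustrates the attack only, not the authors' NSR-based defenses, and the function name `pgd_attack`, the linear model, and all parameter values are assumptions.

```python
import math


def pgd_attack(x, y, w, b, eps=0.1, step=0.02, iters=10):
    """L-infinity PGD against a linear classifier f(x) = w.x + b.

    y is the true label in {-1, +1}. The attack ascends the logistic loss
    log(1 + exp(-y * f(x))) and keeps every coordinate within eps of x.
    Toy scale only: very large scores would overflow math.exp.
    """
    adv = list(x)
    for _ in range(iters):
        score = sum(wi * ai for wi, ai in zip(w, adv)) + b
        # d(loss)/d(adv_i) = -y * sigmoid(-y * score) * w_i
        coeff = -y / (1.0 + math.exp(y * score))
        grad = [coeff * wi for wi in w]
        # Signed gradient ascent step on the loss.
        adv = [ai + step * math.copysign(1.0, g) if g != 0.0 else ai
               for ai, g in zip(adv, grad)]
        # Project back onto the eps-ball around the clean input.
        adv = [min(max(ai, xi - eps), xi + eps) for ai, xi in zip(adv, x)]
    return adv


clean = [1.0, -1.0]
weights = [2.0, -1.0]
adversarial = pgd_attack(clean, 1, weights, 0.0)
```

Each coordinate moves by at most `eps`, yet the classifier's margin on the true label shrinks, which is the "subtle change, wrong output" effect described above.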

44. An Artificial-intelligence/Statistics Solution to Quantify Material Distortion for Thermal Compensation in Additive Manufacturing
Chao Wang, Shaofan Li, Danielle Zeng, Xinhai Zhu
Abstract: In this paper, we introduce a probabilistic statistics solution, or artificial intelligence (AI) approach, to identify and quantify permanent (non-zero strain) continuum/material deformation based only on the scanned material data in the spatial configuration and the shape of the initial design configuration, or material configuration. The challenge of this problem is that we only know the scanned material data in the spatial configuration and the shape of the design configuration of three-dimensional (3D) printed products, whereas for a specific scanned material point we do not know its corresponding material coordinates in the initial or designed referential configuration, given that we do not know the detailed information on the actual physical deformation process. Unlike physics-based modeling, the method developed here is a data-driven artificial intelligence method, which solves the problem with incomplete deformation data or with missing information about the actual physical deformation process. We call the method an AI-based material deformation finding algorithm. This method has practical significance and important applications in finding and designing the thermal compensation configuration of a 3D printed product in additive manufacturing, which is at the heart of cutting-edge 3D printing technology. In this paper, we demonstrate that the proposed AI continuum/material deformation finding approach can accurately find the permanent thermal deformation configuration for a complex 3D printed structural component, and hence identify the thermal compensation design configuration in order to minimize the impact of temperature fluctuations on 3D printed structural components that are sensitive to changes in temperature.

45. Translating Video Recordings of Mobile App Usages into Replayable Scenarios
Carlos Bernal-Cárdenas, Nathan Cooper, Kevin Moran, Oscar Chaparro, Andrian Marcus, Denys Poshyvanyk
Abstract: Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically analyzing screen recordings presents serious challenges, due to their graphical nature, compared to other types of (textual) artifacts. To address these challenges, this paper introduces V2S, a lightweight, automated approach for translating video recordings of Android app usages into replayable scenarios. V2S is based primarily on computer vision techniques and adapts recent solutions for object detection and image classification to detect and classify user actions captured in a video, and convert these into a replayable test scenario. We performed an extensive evaluation of V2S involving 175 videos depicting 3,534 GUI-based actions collected from users exercising features and reproducing bugs from over 80 popular Android apps. Our results illustrate that V2S can accurately replay scenarios from screen recordings, and is capable of reproducing $\approx$ 89% of our collected videos with minimal overhead. A case study with three industrial partners illustrates the potential usefulness of V2S from the viewpoint of developers.

46. On the effectiveness of GAN generated cardiac MRIs for segmentation
Youssef Skandarani, Nathan Painchaud, Pierre-Marc Jodoin, Alain Lalande
Abstract: In this work, we propose a Variational Autoencoder (VAE) - Generative Adversarial Network (GAN) model that can produce highly realistic MRI together with its pixel-accurate ground truth for the application of cine-MR image cardiac segmentation. On one side of our model is a Variational Autoencoder (VAE) trained to learn the latent representations of cardiac shapes. On the other side is a GAN that uses "SPatially-Adaptive (DE)Normalization" (SPADE) modules to generate realistic MR images tailored to a given anatomical map. At test time, sampling the VAE latent space allows an arbitrarily large number of cardiac shapes to be generated, which are fed to the GAN, which in turn generates MR images whose cardiac structure fits that of the cardiac shapes. In other words, our system can generate a large volume of realistic yet labeled cardiac MR images. We show that segmentation with CNNs trained on our synthetic annotated images achieves competitive results compared to traditional techniques. We also show that combining data augmentation with our GAN-generated images leads to an improvement in the Dice score of up to 12 percent while allowing for better generalization capabilities on other datasets.
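For readers unfamiliar with the Dice score reported in this abstract, the following minimal Python sketch computes it for two binary segmentation masks. The function name `dice_score`, the flat 0/1 list representation, and the convention for two empty masks are assumptions for illustration, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for flat binary masks (0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping pixel, two predicted pixels, one ground-truth pixel.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

The score ranges from 0 (no overlap) to 1 (identical masks), so an improvement of 12 percent corresponds to a substantially larger overlap between predicted and reference structures.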

47. Saving the Sonorine: Audio Recovery Using Image Processing and Computer Vision
Kai Ji, Feng, Adam Finkelstein
Abstract: This paper presents a novel technique to recover audio from sonorines, an early 20th century form of analogue sound storage. Our method uses high resolution photographs of sonorines under different lighting conditions to observe the change in reflection behavior of the physical surface features and create a three-dimensional height map of the surface. Sound can then be extracted using height information within the surface's grooves, mimicking a physical stylus on a phonograph. Unlike traditional playback methods, our method has the advantage of being contactless: the medium will not incur damage and wear from being played repeatedly. We compare the results of our technique to a previously successful contactless method using flatbed scans of the sonorines, and conclude with future research that can be applied to this photovisual approach to audio recovery.

Note: the Chinese text is machine-translated!


[arXiv papers] Computer Vision and Pattern Recognition 2020-05-20

Contents

Abstracts