Contents
5. Boundary Regularized Building Footprint Extraction From Satellite Images Using Deep Neural Network [PDF] Abstract
10. Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time [PDF] Abstract
11. DCNNs: A Transfer Learning comparison of Full Weapon Family threat detection for Dual-Energy X-Ray Baggage Imagery [PDF] Abstract
16. PFGDF: Pruning Filter via Gaussian Distribution Feature for Deep Neural Networks Acceleration [PDF] Abstract
17. Probabilistic Crowd GAN: Multimodal Pedestrian Trajectory Prediction using a Graph Vehicle-Pedestrian Attention Network [PDF] Abstract
18. Scribble2Label: Scribble-Supervised Cell Segmentation via Self-Generating Pseudo-Labels with Consistency [PDF] Abstract
22. NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks [PDF] Abstract
25. Surpassing Real-World Source Training Data: Random 3D Characters for Generalizable Person Re-Identification [PDF] Abstract
26. Discriminative Feature Alignment: Improving Transferability of Unsupervised Domain Adaptation by Gaussian-guided Latent Alignment [PDF] Abstract
27. Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative [PDF] Abstract
31. Deep Learning of Unified Region, Edge, and Contour Models for Automated Image Segmentation [PDF] Abstract
38. A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence [PDF] Abstract
39. Frost filtered scale-invariant feature extraction and multilayer perceptron for hyperspectral image classification [PDF] Abstract
42. MANTRA: A Machine Learning reference lightcurve dataset for astronomical transient event recognition [PDF] Abstract
46. Joint Left Atrial Segmentation and Scar Quantification Based on a DNN with Spatial Encoding and Shape Attention [PDF] Abstract
47. Deep Attentive Wasserstein Generative Adversarial Networks for MRI Reconstruction with Recurrent Context-Awareness [PDF] Abstract
49. Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks [PDF] Abstract
55. Generalized Grasping for Mechanical Grippers for Unknown Objects with Partial Point Cloud Representations [PDF] Abstract
58. Semantic Features Aided Multi-Scale Reconstruction of Inter-Modality Magnetic Resonance Images [PDF] Abstract
59. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks [PDF] Abstract
Abstracts
1. Instant 3D Object Tracking with Applications in Augmented Reality [PDF] Back to Contents
Adel Ahmadyan, Tingbo Hou, Jianing Wei, Liangkai Zhang, Artsiom Ablavatski, Matthias Grundmann
Abstract: Tracking object poses in 3D is a crucial building block for Augmented Reality applications. We propose an instant motion tracking system that tracks an object's pose in space (represented by its 3D bounding box) in real-time on mobile devices. Our system does not require any prior sensory calibration or initialization to function. We employ a deep neural network to detect objects and estimate their initial 3D pose. Then the estimated pose is tracked using a robust planar tracker. Our tracker is capable of performing relative-scale 9-DoF tracking in real-time on mobile devices. By combining use of CPU and GPU efficiently, we achieve 26-FPS+ performance on mobile devices.
2. Towards Robust Sensor Fusion in Visual Perception [PDF] Back to Contents
Shaojie Wang, Tong Wu, Yevgeniy Vorobeychik
Abstract: We study the problem of robust sensor fusion in visual perception, especially under autonomous driving settings. We evaluate the robustness of RGB camera and LiDAR sensor fusion for binary classification and object detection. In this work, we are interested in the behavior of different fusion methods under adversarial attacks on different sensors. We first train both classification and detection models with early fusion and late fusion, then apply different combinations of adversarial attacks on both sensor inputs for evaluation. We also study the effectiveness of adversarial attacks with varying budgets. Experimental results show that while sensor fusion models are generally vulnerable to adversarial attacks, the late fusion method is more robust than early fusion. The results also provide insights into further obtaining robust sensor fusion models.
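To make the early/late distinction above concrete, here is a minimal PyTorch sketch of the two fusion styles; the tiny backbones and the LiDAR-as-image projection are illustrative assumptions, not the authors' architectures. The structural point matters for robustness: an attack on one sensor perturbs every feature in the early-fusion backbone, whereas in late fusion the clean sensor's stream is computed independently before the heads are joined.

```python
# Hypothetical sketch contrasting the two fusion styles studied above
# (not the authors' exact models).
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Concatenate RGB and LiDAR (projected to the image plane) at the input."""
    def __init__(self, lidar_channels=1, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3 + lidar_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, rgb, lidar):
        return self.head(self.backbone(torch.cat([rgb, lidar], dim=1)))

class LateFusionNet(nn.Module):
    """Separate stream per sensor, fused only at the feature level."""
    def __init__(self, lidar_channels=1, num_classes=2):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.rgb_stream = stream(3)
        self.lidar_stream = stream(lidar_channels)
        self.head = nn.Linear(128, num_classes)

    def forward(self, rgb, lidar):
        feats = torch.cat([self.rgb_stream(rgb), self.lidar_stream(lidar)], dim=1)
        return self.head(feats)
```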
3. Facing the Hard Problems in FGVC [PDF] Back to Contents
Connor Anderson, Matt Gwilliam, Adam Teuscher, Andrew Merrill, Ryan Farrell
Abstract: In fine-grained visual categorization (FGVC), there is a near-singular focus in pursuit of attaining state-of-the-art (SOTA) accuracy. This work carefully analyzes the performance of recent SOTA methods, quantitatively, but more importantly, qualitatively. We show that these models universally struggle with certain "hard" images, while also making complementary mistakes. We underscore the importance of such analysis, and demonstrate that combining complementary models can improve accuracy on the popular CUB-200 dataset by over 5%. In addition to detailed analysis and characterization of the errors made by these SOTA methods, we provide a clear set of recommended directions for future FGVC researchers.
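Since the headline result above comes from combining complementary models, the sketch below shows one standard way to do so, averaging softmax probabilities across models; the authors' exact combination rule may differ, so treat this as an illustration.

```python
# Minimal sketch of combining complementary FGVC models by averaging
# per-model class probabilities; model choices and weights are illustrative.
import torch

@torch.no_grad()
def ensemble_predict(models, images):
    """Average each model's softmax output and take the argmax class."""
    probs = None
    for m in models:
        m.eval()
        p = torch.softmax(m(images), dim=1)
        probs = p if probs is None else probs + p
    return (probs / len(models)).argmax(dim=1)
```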
4. Efficient Spatially Adaptive Convolution and Correlation [PDF] Back to Contents
Thomas W. Mitchel, Benedict Brown, David Koller, Tim Weyrich, Szymon Rusinkiewicz, Michael Kazhdan
Abstract: Fast methods for convolution and correlation underlie a variety of applications in computer vision and graphics, including efficient filtering, analysis, and simulation. However, standard convolution and correlation are inherently limited to fixed filters: spatial adaptation is impossible without sacrificing efficient computation. In early work, Freeman and Adelson have shown how steerable filters can address this limitation, providing a way for rotating the filter as it is passed over the signal. In this work, we provide a general, representation-theoretic, framework that allows for spatially varying linear transformations to be applied to the filter. This framework allows for efficient implementation of extended convolution and correlation for transformation groups such as rotation (in 2D and 3D) and scale, and provides a new interpretation for previous methods including steerable filters and the generalized Hough transform. We present applications to pattern matching, image feature description, vector field visualization, and adaptive image filtering.
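To pin down what "spatially varying linear transformations applied to the filter" means, the brute-force NumPy sketch below rotates the kernel by a per-pixel angle field before applying it at each location. The paper's contribution is computing such extended correlations efficiently via representation theory; this per-pixel loop merely defines the operation.

```python
# Brute-force spatially varying correlation: at each pixel the kernel is
# rotated by a local angle before being applied. O(H*W) kernel rotations,
# so this is purely illustrative, not an efficient implementation.
import numpy as np
from scipy.ndimage import rotate

def adaptive_correlate(image, kernel, angles_deg):
    """image: (H, W); kernel: (k, k) with k odd; angles_deg: (H, W) field."""
    k = kernel.shape[0]
    r = k // 2
    padded = np.pad(image, r, mode="reflect")
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            rk = rotate(kernel, angles_deg[y, x], reshape=False, order=1)
            patch = padded[y:y + k, x:x + k]
            out[y, x] = (patch * rk).sum()
    return out
```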
5. Boundary Regularized Building Footprint Extraction From Satellite Images Using Deep Neural Network [PDF] Back to Contents
Kang Zhao, Muhammad Kamran, Gunho Sohn
Abstract: In recent years, an ever-increasing number of remote sensing satellites orbit the Earth, streaming vast amounts of visual data to support a wide range of civil, public and military applications. One key use of satellite imagery is producing and updating spatial maps of the built environment, owing to its wide coverage and high-resolution data. However, reconstructing spatial maps from satellite imagery is not a trivial vision task, as it requires reconstructing a scene or object with a high-level representation such as primitives. Over the last decade, significant advances in object detection and representation from visual data have been achieved, but primitive-based object representation still remains a challenging vision task. Thus, high-quality spatial maps are mainly produced through complex, labour-intensive processes. In this paper, we propose a novel deep neural network that jointly detects building instances and regularizes noisy building boundary shapes from a single satellite image. The proposed deep learning method consists of a two-stage object detection network that produces region of interest (RoI) features, and a building boundary extraction network that uses graph models to learn the geometric information of the polygon shapes. Extensive experiments show that our model can accomplish the multiple tasks of object localization, recognition, semantic labelling and geometric shape extraction simultaneously. In terms of building extraction accuracy, computational efficiency and boundary regularization performance, our model outperforms the state-of-the-art baseline models.
6. ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects [PDF] Back to Contents
Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans
Abstract: We revisit the problem of Object-Goal Navigation (ObjectNav). In its simplest form, ObjectNav is defined as the task of navigating to an object, specified by its label, in an unexplored environment. In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it. As the community begins to show increased interest in semantic goal specification for navigation tasks, a number of different often-inconsistent interpretations of this task are emerging. This document summarizes the consensus recommendations of this working group on ObjectNav. In particular, we make recommendations on subtle but important details of evaluation criteria (for measuring success when navigating towards a target object), the agent's embodiment parameters, and the characteristics of the environments within which the task is carried out. Finally, we provide a detailed description of the instantiation of these recommendations in challenges organized at the Embodied AI workshop at CVPR 2020 \url{this http URL} .
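For reference, the sketch below shows one widely used formulation of the success criterion and of Success weighted by Path Length (SPL, after Anderson et al., 2018), which ObjectNav evaluations typically build on. The stopping rule and distance threshold here are illustrative assumptions; pinning down exactly such details is what the working group's recommendations address.

```python
# Common navigation metrics, sketched under assumed conventions: success
# requires the agent to call stop near the target; SPL weights success by
# how efficient the taken path was relative to the shortest path.
def episode_success(dist_to_object, stopped, threshold=1.0):
    """True if the agent stopped within `threshold` meters of the target."""
    return stopped and dist_to_object <= threshold

def spl(successes, shortest_paths, taken_paths):
    """Success weighted by Path Length, averaged over episodes."""
    total = 0.0
    for s, l, p in zip(successes, shortest_paths, taken_paths):
        total += float(s) * l / max(p, l)
    return total / len(successes)
```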
7. Joint Detection and Multi-Object Tracking with Graph Neural Networks [PDF] Back to Contents
Yongxin Wang, Xinshuo Weng, Kris Kitani
Abstract: Object detection and data association are critical components in multi-object tracking (MOT) systems. Despite the fact that these two components are highly dependent on each other, one popular trend in MOT is to perform detection and data association as separate modules, processed in a cascaded order. Due to this cascaded process, the resulting MOT system can only perform forward inference and cannot back-propagate error through the entire pipeline and correct them. This leads to sub-optimal performance over the total pipeline. To address this issue, recent work jointly optimizes detection and data association and forms an integrated MOT approach, which has been shown to improve performance in both detection and tracking. In this work, we propose a new approach for joint MOT based on Graph Neural Networks (GNNs). The key idea of our approach is that GNNs can explicitly model complex interactions between multiple objects in both the spatial and temporal domains, which is essential for learning discriminative features for detection and data association. We also leverage the fact that motion features are useful for MOT when used together with appearance features. So our proposed joint MOT approach also incorporates appearance and motion features within our graph-based feature learning framework, leading to better feature learning for MOT. Through extensive experiments on the MOT challenge dataset, we show that our proposed method achieves state-of-the-art performance on both object detection and MOT.
8. Calibrated Adversarial Refinement for Multimodal Semantic Segmentation [PDF] Back to Contents
Elias Kassapis, Georgi Dikov, Deepak K. Gupta, Cedric Nugteren
Abstract: Ambiguities in images or unsystematic annotation can lead to multiple valid solutions in semantic segmentation. To learn a distribution over predictions, recent work has explored the use of probabilistic networks. However, these do not necessarily capture the empirical distribution accurately. In this work, we aim to learn a calibrated multimodal predictive distribution, where the empirical frequency of the sampled predictions closely reflects that of the corresponding labels in the training set. To this end, we propose a novel two-stage, cascaded strategy for calibrated adversarial refinement. In the first stage, we explicitly model the data with a categorical likelihood. In the second, we train an adversarial network to sample from it an arbitrary number of coherent predictions. The model can be used independently or integrated into any black-box segmentation framework to enable the synthesis of diverse predictions. We demonstrate the utility and versatility of the approach by achieving competitive results on the multigrader LIDC dataset and a modified Cityscapes dataset. In addition, we use a toy regression dataset to show that our framework is not confined to semantic segmentation, and the core design can be adapted to other tasks requiring learning a calibrated predictive distribution.
9. Distilling Object Detectors with Task Adaptive Regularization [PDF] Back to Contents
Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian
Abstract: Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices. Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization. In this paper, we investigate each module of a typical detector in depth, and propose a general distillation framework that adaptively transfers knowledge from teacher to student according to task-specific priors. The intuition is that simply distilling all information from teacher to student is not advisable; instead, we should only borrow priors from the teacher model where the student cannot perform well. Towards this goal, we propose a region proposal sharing mechanism to interflow region responses between the teacher and student models. Based on this, we adaptively transfer knowledge at three levels, \emph{i.e.}, feature backbone, classification head, and bounding box regression head, according to which model performs more reasonably. Furthermore, considering that it would introduce an optimization dilemma when minimizing distillation loss and detection loss simultaneously, we propose a distillation decay strategy to help improve model generalization via gradually reducing the distillation penalty. Experiments on widely used detection benchmarks demonstrate the effectiveness of our method. In particular, using Faster R-CNN with FPN as an instantiation, we achieve an accuracy of $39.0\%$ with ResNet-50 on the COCO dataset, which surpasses the baseline $36.3\%$ by $2.7\%$ points, and is even better than the teacher model with $38.5\%$ mAP.
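The distillation decay strategy reduces the distillation penalty as training progresses, so optimization gradually shifts toward the pure detection loss. A minimal sketch, assuming a linear schedule (the paper's exact schedule may differ):

```python
# Sketch of the distillation-decay idea: anneal the weight on the
# distillation term so the detection loss dominates late in training.
# The linear schedule and w0 are assumptions, not the paper's recipe.
def total_loss(det_loss, distill_loss, step, total_steps, w0=1.0):
    decay = w0 * (1.0 - step / total_steps)  # anneal from w0 down to 0
    return det_loss + decay * distill_loss
```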
10. Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time [PDF] Back to Contents
Nils Gählert, Jun-Jun Wan, Nicolas Jourdan, Jan Finkbeiner, Uwe Franke, Joachim Denzler
Abstract: In this paper we propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images. Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters and hence keeping the runtime close to pure 2D object detection. The additional parameters are transformed to 3D bounding box keypoints within the network under geometric constraints. Our proposed method features a full 3D description including all three angles of rotation without supervision by any labeled ground truth data for the object's orientation, as it focuses on certain keypoints within the image plane. While our approach can be combined with any modern object detection framework with only little computational overhead, we exemplify the extension of SSD for the prediction of 3D bounding boxes. We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection as well as the novel nuScenes Object Detection benchmarks. While we achieve competitive results on both benchmarks, we outperform current state-of-the-art methods in terms of speed with more than 20 FPS for all tested datasets and image resolutions.
11. DCNNs: A Transfer Learning comparison of Full Weapon Family threat detection for Dual-Energy X-Ray Baggage Imagery [PDF] Back to Contents
A. Williamson, P. Dickinson, T. Lambrou, J. C. Murray
Abstract: Recent advancements in Convolutional Neural Networks have yielded super-human levels of performance in image recognition tasks [13, 25]; however, with increasing volumes of parcels crossing UK borders each year, classification of threats becomes integral to the smooth operation of UK borders. In this work we propose the first pipeline to effectively process Dual-Energy X-Ray scanner output, and perform classification capable of distinguishing between firearm families (Assault Rifle, Revolver, Self-Loading Pistol, Shotgun, and Sub-Machine Gun) from this output. With this pipeline we compare recent Convolutional Neural Network architectures against the X-Ray baggage domain via Transfer Learning and show ResNet50 to be most suitable to classification - outlining a number of considerations for operational success within the domain.
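A minimal transfer-learning setup of the kind compared above might look as follows: an ImageNet-pretrained ResNet50 from torchvision, re-headed for the five weapon families. The freezing policy and other training details are assumptions for illustration.

```python
# Transfer-learning sketch: reuse ImageNet weights, replace the classifier
# head for the five firearm families named in the abstract.
import torch.nn as nn
from torchvision import models

NUM_FAMILIES = 5  # Assault Rifle, Revolver, Self-Loading Pistol, Shotgun, Sub-Machine Gun

model = models.resnet50(pretrained=True)
for p in model.parameters():      # optionally freeze the backbone first
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_FAMILIES)  # new head trains from scratch
```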
12. Rotation Invariant Deep CBIR [PDF] Back to Contents
Subhadip Maji, Smarajit Bose
Abstract: The introduction of Convolutional Neural Networks has improved results on almost every image-based problem, and Content-Based Image Retrieval is not an exception. But CNN features, not being rotation invariant, create problems for building a rotation-invariant CBIR system. Though rotation-invariant features can be hand-engineered, the retrieval accuracy is very low, because hand engineering can create only low-level features, unlike deep learning models that create high-level features along with low-level ones. This paper presents a novel method to build a rotation-invariant CBIR system by introducing a deep learning orientation angle detection model alongside the CBIR feature extraction model. The paper also highlights that this rotation-invariant deep CBIR can retrieve images from a large dataset in real time.
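The described pipeline reduces to three steps: predict the orientation angle, de-rotate the image, then extract retrieval features. A hypothetical sketch, where angle_model and feature_model stand in for the paper's two networks:

```python
# Hypothetical two-model pipeline: de-rotate to a canonical orientation
# before feature extraction, so the descriptor becomes rotation invariant.
import numpy as np
from scipy.ndimage import rotate

def rotation_invariant_descriptor(image, angle_model, feature_model):
    angle = angle_model(image)                      # predicted orientation (degrees)
    upright = rotate(image, -angle, reshape=False)  # undo the rotation
    return feature_model(upright)                   # descriptor used for retrieval

def retrieve(query_desc, index_descs, k=5):
    """Rank database images by Euclidean distance in descriptor space."""
    d = np.linalg.norm(index_descs - query_desc, axis=1)
    return np.argsort(d)[:k]
```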
13. Motion Representation Using Residual Frames with 3D CNN [PDF] Back to Contents
Li Tao, Xueting Wang, Toshihiko Yamasaki
Abstract: Recently, 3D convolutional networks (3D ConvNets) have yielded good performance in action recognition. However, an optical flow stream is still needed to ensure better performance, the cost of which is very high. In this paper, we propose a fast but effective way to extract motion features from videos, utilizing residual frames as the input data for 3D ConvNets. By replacing traditional stacked RGB frames with residual ones, improvements of 35.6% and 26.6% points in top-1 accuracy can be obtained on the UCF101 and HMDB51 datasets when ResNet-18 models are trained from scratch, and we achieved state-of-the-art results in this training mode. Analysis shows that better motion features can be extracted using residual frames compared to their RGB counterpart. By combining this with a simple appearance path, our proposal can be even better than some methods using optical flow streams.
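Assuming residual frames are plain differences of adjacent frames (the usual definition), the input transformation is a one-liner; the resulting stack simply replaces the RGB clip fed to the 3D ConvNet:

```python
# Residual-frame clip, assuming simple adjacent-frame differencing.
import numpy as np

def residual_frames(clip):
    """clip: (T, H, W, C) video array -> (T-1, H, W, C) residual stack."""
    clip = clip.astype(np.float32)
    return clip[1:] - clip[:-1]  # feeds the 3D ConvNet in place of RGB frames
```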
14. Exemplar Loss for Siamese Network in Visual Tracking [PDF] Back to Contents
Shuo Chang, YiFan Zhang, Sai Huang, Yuanyuan Yao, Zhiyong Feng
Abstract: Visual tracking plays an important role in perception systems, which are a crucial part of intelligent transportation. Recently, Siamese networks have become a hot topic in visual tracking for estimating moving targets' trajectories, owing to their superior accuracy and simple framework. In general, Siamese tracking algorithms, supervised by logistic loss and triplet loss, increase the value of the inner product between the exemplar template and positive samples while reducing the value of the inner product with background samples. However, the distractors from different exemplars are not considered by the aforementioned loss functions, which limits the feature model's discrimination. In this paper, a new exemplar loss integrated with logistic loss is proposed to enhance the feature model's discrimination by reducing inner products among exemplars. Without bells and whistles, the proposed algorithm outperforms the methods supervised by logistic loss or triplet loss. Numerical results suggest that the newly developed algorithm achieves comparable performance in public benchmarks.
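One plausible reading of the exemplar loss, a term that penalizes large inner products between different exemplars' templates on top of the usual logistic loss, is sketched below. This illustrates the idea only; the authors' exact formulation may differ.

```python
# Illustrative exemplar term: push different exemplars' embeddings apart
# by penalizing their pairwise inner products, added to a logistic loss.
import torch
import torch.nn.functional as F

def exemplar_loss(templates):
    """templates: (N, D) embeddings of N exemplars from the batch."""
    t = F.normalize(templates, dim=1)
    sim = t @ t.t()                               # pairwise inner products
    off_diag = sim - torch.diag(torch.diag(sim))  # drop self-similarity
    return off_diag.clamp(min=0).mean()           # penalize similar exemplars

def tracking_loss(scores, labels, templates, alpha=0.1):
    """scores/labels: per-location logits and {0,1} float targets."""
    return F.binary_cross_entropy_with_logits(scores, labels) \
        + alpha * exemplar_loss(templates)
```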
15. FNA++: Fast Network Adaptation via Parameter Remapping and Architecture Search [PDF] Back to Contents
Jiemin Fang, Yuzhu Sun, Qian Zhang, Kangjian Peng, Yuan Li, Wenyu Liu, Xinggang Wang
Abstract: Deep neural networks achieve remarkable performance in many computer vision tasks. Most state-of-the-art (SOTA) semantic segmentation and object detection approaches reuse neural network architectures designed for image classification as the backbone, commonly pre-trained on ImageNet. However, performance gains can be achieved by designing network architectures specifically for detection and segmentation, as shown by recent neural architecture search (NAS) research for detection and segmentation. One major challenge though is that ImageNet pre-training of the search space representation (a.k.a. super network) or the searched networks incurs huge computational cost. In this paper, we propose a Fast Network Adaptation (FNA++) method, which can adapt both the architecture and parameters of a seed network (e.g. an ImageNet pre-trained network) to become a network with different depths, widths, or kernel sizes via a parameter remapping technique, making it possible to use NAS for segmentation/detection tasks a lot more efficiently. In our experiments, we conduct FNA++ on MobileNetV2 to obtain new networks for semantic segmentation, object detection, and human pose estimation that clearly outperform existing networks designed both manually and by NAS. We also implement FNA++ on ResNets and NAS networks, which demonstrates a great generalization ability. The total computation cost of FNA++ is significantly less than SOTA segmentation/detection NAS approaches: 1737x less than DPC, 6.8x less than Auto-DeepLab, and 8.0x less than DetNAS. The code will be released at this https URL.
16. PFGDF: Pruning Filter via Gaussian Distribution Feature for Deep Neural Networks Acceleration [PDF] Back to Contents
Jianrong Xu, Chao Li, Bifeng Cui, Kang Yang, Yongjun Xu
Abstract: The existence of substantial redundant information in convolutional neural networks leads to slow deployment on edge devices. To solve this issue, we propose a novel deep learning model compression and acceleration method based on data distribution characteristics, namely Pruning Filter via Gaussian Distribution Feature (PFGDF), which finds a smaller interval of a given convolutional layer's filter distribution that describes the original filters, on the grounds of their distribution characteristics. Compared with previous advanced methods, PFGDF compresses the model by pruning filters that are insignificant in distribution, regardless of the contribution and sensitivity information of the convolution filter. The pruning process of the model is automated, and always ensures that the compressed model can restore the performance of the original model. Notably, on CIFAR-10, PFGDF compressed the convolution filters of VGG-16 by 66.62%, with the parameters reduced by more than 90% and FLOPs reduced by 70.27%. On ResNet-32, PFGDF reduced the convolution filters by 21.92%; the parameters were reduced to 54.64%, and the FLOPs savings exceeded 42%.
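A loose sketch of pruning by distribution feature: fit a Gaussian to a per-filter statistic and drop the filters that are insignificant in distribution. The choice of statistic (filter L1 norm) and the central-interval rule below are assumptions made purely for illustration.

```python
# Distribution-based filter selection: model the per-filter L1 norms as a
# Gaussian and flag filters lying inside a central interval as prunable.
# The statistic and interval rule are illustrative assumptions.
import numpy as np

def select_filters_to_prune(conv_weight, z=0.5):
    """conv_weight: (out_ch, in_ch, k, k); returns indices of prunable filters."""
    stats = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    mu, sigma = stats.mean(), stats.std()
    inside = np.abs(stats - mu) <= z * sigma  # near the Gaussian center
    return np.nonzero(inside)[0]
```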
17. Probabilistic Crowd GAN: Multimodal Pedestrian Trajectory Prediction using a Graph Vehicle-Pedestrian Attention Network [PDF] Back to Contents
Stuart Eiffert, Kunming Li, Mao Shan, Stewart Worrall, Salah Sukkarieh, Eduardo Nebot
Abstract: Understanding and predicting the intention of pedestrians is essential to enable autonomous vehicles and mobile robots to navigate crowds. This problem becomes increasingly complex when we consider the uncertainty and multimodality of pedestrian motion, as well as the implicit interactions between members of a crowd, including any response to a vehicle. Our approach, Probabilistic Crowd GAN, extends recent work in trajectory prediction, combining Recurrent Neural Networks (RNNs) with Mixture Density Networks (MDNs) to output probabilistic multimodal predictions, from which likely modal paths are found and used for adversarial training. We also propose the use of Graph Vehicle-Pedestrian Attention Network (GVAT), which models social interactions and allows input of a shared vehicle feature, showing that inclusion of this module leads to improved trajectory prediction both with and without the presence of a vehicle. Through evaluation on various datasets, we demonstrate improvements on the existing state of the art methods for trajectory prediction and illustrate how the true multimodal and uncertain nature of crowd interactions can be directly modelled.
18. Scribble2Label: Scribble-Supervised Cell Segmentation via Self-Generating Pseudo-Labels with Consistency [PDF] Back to Contents
Hyeonsoo Lee, Won-Ki Jeong
Abstract: Segmentation is a fundamental process in microscopic cell image analysis. With the advent of recent advances in deep learning, more accurate and high-throughput cell segmentation has become feasible. However, most existing deep learning-based cell segmentation algorithms require fully annotated ground-truth cell labels, which are time-consuming and labor-intensive to generate. In this paper, we introduce Scribble2Label, a novel weakly-supervised cell segmentation framework that exploits only a handful of scribble annotations without full segmentation labels. The core idea is to combine pseudo-labeling and label filtering to generate reliable labels from weak supervision. For this, we leverage the consistency of predictions by iteratively averaging the predictions to improve pseudo labels. We demonstrate the performance of Scribble2Label by comparing it to several state-of-the-art cell segmentation methods with various cell image modalities, including bright-field, fluorescence, and electron microscopy. We also show that our method performs robustly across different levels of scribble details, which confirms that only a few scribble annotations are required in real-use cases.
19. SLV: Spatial Likelihood Voting for Weakly Supervised Object Detection [PDF] 返回目录
Ze Chen, Zhihang Fu, Rongxin Jiang, Yaowu Chen, Xian-sheng Hua
Abstract: Based on the framework of multiple instance learning (MIL), numerous works have advanced weakly supervised object detection (WSOD). However, most MIL-based methods tend to localize instances to their discriminative parts instead of the whole content. In this paper, we propose a spatial likelihood voting (SLV) module that converges the proposal localization process without any bounding box annotations. Specifically, all region proposals in a given image play the role of voters at every training iteration, voting for the likelihood of each category in the spatial dimensions. After dilating alignment on the area with large likelihood values, the voting results are regularized as bounding boxes, which are used for the final classification and localization. Based on SLV, we further propose an end-to-end training framework for multi-task learning. The classification and localization tasks promote each other, which further improves the detection performance. Extensive experiments on the PASCAL VOC 2007 and 2012 datasets demonstrate the superior performance of SLV.
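A much-simplified NumPy sketch of the voting idea: every proposal adds its per-class score to the pixels it covers, and the thresholded likelihood map is re-boxed. The threshold and the bounding-box extraction are illustrative stand-ins for the paper's dilating alignment and regularization steps.

    import numpy as np

    def spatial_likelihood_voting(boxes, scores, im_h, im_w, thresh=0.5):
        """boxes: (N, 4) as x1, y1, x2, y2; scores: (N, n_cls).
        Returns one coarse box per class with non-zero support."""
        n_cls = scores.shape[1]
        vote = np.zeros((n_cls, im_h, im_w), dtype=np.float32)
        for (x1, y1, x2, y2), s in zip(boxes.astype(int), scores):
            vote[:, y1:y2, x1:x2] += s[:, None, None]   # each proposal votes
        out = []
        for c in range(n_cls):
            if vote[c].max() <= 0:
                continue
            ys, xs = np.nonzero(vote[c] > thresh * vote[c].max())
            out.append((c, xs.min(), ys.min(), xs.max(), ys.max()))
        return out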
20. Non-parametric spatially constrained local prior for scene parsing on real-world data [PDF] 返回目录
Ligang Zhang
Abstract: Scene parsing aims to recognize the object category of every pixel in scene images, and it plays a central role in image content understanding and computer vision applications. However, accurate scene parsing from unconstrained real-world data is still a challenging task. In this paper, we present the non-parametric Spatially Constrained Local Prior (SCLP) for scene parsing on realistic data. For a given query image, the non-parametric SCLP is learnt by first retrieving a subset of the training images most similar to the query image and then collecting prior information about object co-occurrence statistics between spatial image blocks and between adjacent superpixels from the retrieved subset. The SCLP is powerful in capturing both long- and short-range context about inter-object correlations in the query image and can be effectively integrated with traditional visual features to refine the classification results. Our experiments on the SIFT Flow and PASCAL-Context benchmark datasets show that the non-parametric SCLP used in conjunction with superpixel-level visual features achieves one of the top performances compared with state-of-the-art approaches.
21. Object recognition through pose and shape estimation [PDF] 返回目录
Anitta D, Annis Fathima A
Abstract: Computer vision helps machines or computers see like humans: a computer takes information from images and then extracts useful information from them. Gesture recognition and movement recognition are current areas of research in computer vision. For both gesture and movement recognition, finding the pose of an object is of great importance. The purpose of this paper is to review the state of the art already available for finding the pose of an object based on shape, appearance, or features, and to compare these approaches in terms of accuracy, complexity, and performance.
22. NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks [PDF] 返回目录
Eugene Lee, Chen-Yi Lee
Abstract: Deciding the number of neurons during the design of a deep neural network to maximize performance is not intuitive. In this work, we attempt to search for the neuron (filter) configuration of a fixed network architecture that maximizes accuracy. Using iterative pruning methods as a proxy, we parameterize the change of the neuron (filter) number of each layer with respect to the change in parameters, allowing us to efficiently scale an architecture across arbitrary sizes. We also introduce architecture descent, which iteratively refines the parameterized function used for model scaling. The combination of both proposed methods is coined NeuralScale. To prove the efficiency of NeuralScale in terms of parameters, we show empirical simulations on VGG11, MobileNetV2 and ResNet18 using CIFAR10, CIFAR100 and TinyImageNet as benchmark datasets. Our results show an increase in accuracy of 3.04%, 8.56% and 3.41% for VGG11, MobileNetV2 and ResNet18 on CIFAR10, CIFAR100 and TinyImageNet respectively under a parameter-constrained setting (output neurons (filters) of the default configuration with a scaling factor of 0.25).
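One way to picture "parameterizing filter counts with respect to a parameter budget" is the sketch below: each layer's width is scaled by a per-layer exponent (here a power law, which is our assumption, fit for example from iterative-pruning trajectories), and a global multiplier is bisected until the total parameter count hits the target ratio.

    import numpy as np

    def scale_filters(base_filters, alphas, target_ratio):
        """base_filters: original per-layer filter counts; alphas: per-layer
        sensitivity exponents (assumed fit from pruning); target_ratio:
        desired fraction of the original parameter count."""
        base = np.asarray(base_filters, dtype=float)

        def widths(tau):
            return np.maximum(1, np.round(base * tau ** np.asarray(alphas))).astype(int)

        def params(f):  # rough conv cost: products of consecutive widths
            return float(np.sum(f[:-1] * f[1:]))

        budget = target_ratio * params(base)
        lo, hi = 1e-3, 1.0
        for _ in range(50):                 # bisection on the global multiplier
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if params(widths(mid)) < budget else (lo, mid)
        return widths(0.5 * (lo + hi))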
23. MSMD-Net: Deep Stereo Matching with Multi-scale and Multi-dimension Cost Volume [PDF] 返回目录
Zhelun Shen, Yuchao Dai, Zhibo Rao
Abstract: Deep end-to-end learning based stereo matching methods have achieved great success, as witnessed by the leaderboards across different benchmarking datasets (KITTI, Middlebury, ETH3D, etc.), where the cost volume representation is an indispensable step to the success. However, most existing work only employs a single cost volume, which cannot fully exploit the multi-scale cues in stereo matching and provide guidance for disparity refinement. Moreover, the single cost volume representation also limits the disparity range and the resolution of the disparity estimation. In this paper, we propose MSMD-Net (Multi-Scale and Multi-Dimension) to construct multi-scale and multi-dimension cost volumes. At the multi-scale level, we generate four 4D combination volumes at different scales and integrate them in 3D cost aggregation to predict an initial disparity estimate. At the multi-dimension level, we construct a 3D warped correlation volume and use it to refine the initial disparity map with residual learning. These two kinds of cost volumes are complementary to each other and can boost the performance of disparity estimation. Additionally, we propose a switch training strategy to further improve the accuracy of disparity estimation, where we switch between two different activation functions to alleviate the overfitting issue in the pre-training process. Our proposed method was evaluated on several benchmark datasets and ranked first on the KITTI 2012 leaderboard and second on the KITTI 2015 leaderboard as of June 23. The code of MSMD-Net is available at this https URL.
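For readers unfamiliar with cost volumes, here is a minimal single-scale concatenation volume in PyTorch. The paper builds four such 4D combination volumes at different scales plus a separate 3D correlation volume, so this is only the basic building block, not MSMD-Net itself.

    import torch

    def build_cost_volume(feat_l, feat_r, max_disp):
        """Concatenation cost volume: for each candidate disparity d,
        pair each left feature with the right feature shifted by d."""
        B, C, H, W = feat_l.shape
        cost = feat_l.new_zeros(B, 2 * C, max_disp, H, W)
        for d in range(max_disp):
            cost[:, :C, d, :, d:] = feat_l[:, :, :, d:]
            cost[:, C:, d, :, d:] = feat_r[:, :, :, :W - d]
        return cost  # (B, 2C, D, H, W), fed to 3D cost aggregation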
24. Increased-Range Unsupervised Monocular Depth Estimation [PDF] 返回目录
Saad Imran, Muhammad Umar Karim Khan, Sikander Bin Mukarram, Chong-Min Kyung
Abstract: Unsupervised deep learning methods have shown promising performance for single-image depth estimation. Since most of these methods use binocular stereo pairs for self-supervision, the depth range is generally limited. Small-baseline stereo pairs provide a small depth range but handle occlusions well. On the other hand, stereo images acquired with a wide-baseline rig cause occlusion-related errors in the near range but estimate depth well in the far range. In this work, we propose to integrate the advantages of the small and wide baselines. By training the network using three horizontally aligned views, we obtain accurate depth predictions for both close and far ranges. Our strategy allows us to infer multi-baseline depth from a single image. This is unlike previous multi-baseline systems, which employ more than two cameras. The qualitative and quantitative results show the superior performance of the multi-baseline approach over previous stereo-based monocular methods. For the 0.1 to 80 meters depth range, our approach decreases the absolute relative error of depth by 24% compared to Monodepth2. Our approach provides 21 frames per second on a single Nvidia 1080 GPU, making it useful for practical applications.
25. Surpassing Real-World Source Training Data: Random 3D Characters for Generalizable Person Re-Identification [PDF] 返回目录
Yanan Wang, Shengcai Liao, Ling Shao
Abstract: Person re-identification has seen significant advancement in recent years. However, the ability of learned models to generalize to unknown target domains still remains limited. One possible reason for this is the lack of large-scale and diverse source training data, since manually labeling such a dataset is very expensive. To address this, we propose to automatically synthesize a large-scale person re-identification dataset following a set-up similar to real surveillance but with virtual environments, and then use the synthesized person images to train a generalizable person re-identification model. Specifically, we design a method to generate a large number of random UV texture maps and use them to create different 3D clothing models. Then, an automatic code is developed to randomly generate various different 3D characters with diverse clothes, races and attributes. Next, we simulate a number of different virtual environments using Unity3D, with customized camera networks similar to real surveillance systems, and import multiple 3D characters at the same time, with various movements and interactions along different paths through the camera networks. As a result, we obtain a virtual dataset, called RandPerson, with 1,756,759 person images of 8,000 identities. By training person re-identification models on these synthesized person images, we demonstrate, for the first time, that models trained on virtual data can generalize well to unseen target images, surpassing the models trained on various real-world datasets, including CUHK03, Market-1501, DukeMTMC-reID, and MSMT17. The RandPerson dataset will be released at this https URL.
26. Discriminative Feature Alignment: Improving Transferability of Unsupervised Domain Adaptation by Gaussian-guided Latent Alignment [PDF] 返回目录
Jing Wang, Jiahong Chen, Jianzhe Lin, Leonid Sigal, Clarence W. de Silva
Abstract: In this study, we focus on the unsupervised domain adaptation problem where an approximate inference model is to be learned from a labeled data domain and expected to generalize well to an unlabeled data domain. The success of unsupervised domain adaptation largely relies on the cross-domain feature alignment. Previous work has attempted to directly align latent features by the classifier-induced discrepancies. Nevertheless, a common feature space cannot always be learned via this direct feature alignment especially when a large domain gap exists. To solve this problem, we introduce a Gaussian-guided latent alignment approach to align the latent feature distributions of the two domains under the guidance of the prior distribution. In such an indirect way, the distributions over the samples from the two domains will be constructed on a common feature space, i.e., the space of the prior, which promotes better feature alignment. To effectively align the target latent distribution with this prior distribution, we also propose a novel unpaired L1-distance by taking advantage of the formulation of the encoder-decoder. The extensive evaluations on eight benchmark datasets validate the superior knowledge transferability through outperforming state-of-the-art methods and the versatility of the proposed method by improving the existing work significantly.
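As a rough illustration of Gaussian-guided alignment, the sketch below pulls both domains' latent codes toward a standard-normal prior via moment matching and adds an unpaired L1 term computed after sorting each feature dimension. This is our stand-in for the paper's prior-guided alignment and encoder-decoder-derived unpaired L1-distance, whose exact formulations differ.

    import torch

    def gaussian_alignment_loss(z_src, z_tgt):
        """z_src, z_tgt: (B, D) latent codes from source and target batches
        (equal batch sizes assumed for the sort-based unpaired term)."""
        def to_prior(z):  # match first and second moments of N(0, I)
            return z.mean(0).abs().mean() + (z.var(0) - 1).abs().mean()

        sorted_src, _ = torch.sort(z_src, dim=0)    # order statistics per dim
        sorted_tgt, _ = torch.sort(z_tgt, dim=0)
        unpaired_l1 = (sorted_src - sorted_tgt).abs().mean()
        return to_prior(z_src) + to_prior(z_tgt) + unpaired_l1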
27. Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative [PDF] 返回目录
Mingxi Lei, Bino Varghese, Darryl Hwang, Steven Cen, Xiaomeng Lei, Afshin Azadikhah, Bhushan Desai, Assad Oberai, Vinay Duddalwar
Abstract: There is no consensus regarding the radiomic feature terminology, the underlying mathematics, or their implementation. This creates a scenario where features extracted using different toolboxes could not be used to build or validate the same model, leading to a non-generalization of radiomic results. In this study, the Image Biomarker Standardization Initiative (IBSI) established phantom and benchmark values were used to compare the variation of the radiomic features while using 6 publicly available software programs and 1 in-house radiomics pipeline. All IBSI-standardized features (11 classes, 173 in total) were extracted. The relative differences between the extracted feature values from the different software and the IBSI benchmark values were calculated to measure the inter-software agreement. To better understand the variations, features are further grouped into 3 categories according to their properties: 1) morphology, 2) statistic/histogram and 3) texture features. While good agreement was observed for a majority of radiomics features across the various programs, relatively poor agreement was observed for morphology features. Significant differences were also found in programs that use different gray level discretization approaches. Since these programs do not include all IBSI features, the level of quantitative assessment for each category was analyzed using Venn and UpSet diagrams and also quantified using two ad hoc metrics. Morphology features earn the lowest scores for both metrics, indicating that morphological features are not consistently evaluated among software programs. We conclude that radiomic features calculated using different software programs may not be identical and reliable. Further studies are needed to standardize the workflow of radiomic feature extraction.
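The inter-software agreement measure reduces to a simple relative difference against the IBSI benchmark. A NumPy version follows; the guard for zero-valued benchmarks is our assumption.

    import numpy as np

    def relative_difference_pct(extracted, benchmark):
        """Percent relative difference of extracted feature values
        against the corresponding IBSI benchmark values."""
        extracted = np.asarray(extracted, dtype=float)
        benchmark = np.asarray(benchmark, dtype=float)
        denom = np.where(benchmark != 0, np.abs(benchmark), 1.0)
        return 100.0 * (extracted - benchmark) / denom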
28. PoseGAN: A Pose-to-Image Translation Framework for Camera Localization [PDF] 返回目录
Kanglin Liu, Qing Li, Guoping Qiu
Abstract: Camera localization is a fundamental requirement in robotics and computer vision. This paper introduces a pose-to-image translation framework to tackle the camera localization problem. We present PoseGANs, a conditional generative adversarial networks (cGANs) based framework for the implementation of pose-to-image translation. PoseGANs feature a number of innovations, including a distance-metric-based conditional discriminator to conduct camera localization and a pose estimation technique for generated camera images as a stronger constraint to improve camera localization performance. Compared with learning-based regression methods such as PoseNet, PoseGANs can achieve better performance with model sizes that are 70% smaller. In addition, PoseGANs introduce the view synthesis technique to establish the correspondence between the 2D images and the scene, i.e., given a pose, PoseGANs are able to synthesize its corresponding camera images. Furthermore, we demonstrate that PoseGANs differ in principle from structure-based localization and learning-based regressions for camera localization, and show that PoseGANs exploit the geometric structures to accomplish the camera localization task and are therefore more stable than, and superior to, learning-based regressions, which rely on local texture features instead. In addition to camera localization and view synthesis, we also demonstrate that PoseGANs can be successfully used for other interesting applications such as moving object elimination and frame interpolation in video sequences.
29. CIE XYZ Net: Unprocessing Images for Low-Level Computer Vision Tasks [PDF] 返回目录
Mahmoud Afifi, Abdelrahman Abdelhamed, Abdullah Abuolaim, Abhijith Punnappurath, Michael S. Brown
Abstract: Cameras currently allow access to two image states: (i) a minimally processed linear raw-RGB image state (i.e., raw sensor data) or (ii) a highly-processed nonlinear image state (e.g., sRGB). There are many computer vision tasks that work best with a linear image state, such as image deblurring and image dehazing. Unfortunately, the vast majority of images are saved in the nonlinear image state. Because of this, a number of methods have been proposed to "unprocess" nonlinear images back to a raw-RGB state. However, existing unprocessing methods have a drawback because raw-RGB images are sensor-specific. As a result, it is necessary to know which camera produced the sRGB output and use a method or network tailored for that sensor to properly unprocess it. This paper addresses this limitation by exploiting another camera image state that is not available as an output, but it is available inside the camera pipeline. In particular, cameras apply a colorimetric conversion step to convert the raw-RGB image to a device-independent space based on the CIE XYZ color space before they apply the nonlinear photo-finishing. Leveraging this canonical image state, we propose a deep learning framework, CIE XYZ Net, that can unprocess a nonlinear image back to the canonical CIE XYZ image. This image can then be processed by any low-level computer vision operator and re-rendered back to the nonlinear image. We demonstrate the usefulness of the CIE XYZ Net on several low-level vision tasks and show significant gains that can be obtained by this processing framework. Code and dataset are publicly available at this https URL.
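For reference, the ideal colorimetric mapping that the paper's network learns to emulate and invert for real camera pipelines is the standard sRGB-to-XYZ conversion (D65 white point): undo the sRGB gamma, then apply a fixed 3x3 matrix.

    import numpy as np

    SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                            [0.2126, 0.7152, 0.0722],
                            [0.0193, 0.1192, 0.9505]])  # D65 white point

    def srgb_to_xyz(img):
        """img: float array in [0, 1] with a trailing RGB axis, e.g. (H, W, 3)."""
        linear = np.where(img <= 0.04045, img / 12.92,
                          ((img + 0.055) / 1.055) ** 2.4)  # undo sRGB gamma
        return linear @ SRGB_TO_XYZ.T                       # per-pixel 3-vector

The point of CIE XYZ Net is precisely that real cameras deviate from this ideal mapping in sensor-specific ways, which is what the learned model accounts for.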
30. iffDetector: Inference-aware Feature Filtering for Object Detection [PDF] 返回目录
Mingyuan Mao, Yuxin Tian, Baochang Zhang, Qixiang Ye, Wanquan Liu, Guodong Guo, David Doermann
Abstract: Modern CNN-based object detectors focus on feature configuration during training but often ignore feature optimization during inference. In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages. We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with modern detectors, resulting in our iffDetector. Unlike conventional open-loop feature calculation approaches without feedback, the IFF module performs closed-loop optimization by leveraging high-level semantics to enhance the convolutional features. By applying Fourier transform analysis, we demonstrate that the IFF module acts as a negative feedback that theoretically guarantees the stability of feature learning. IFF can be fused with CNN-based object detectors in a plug-and-play manner with negligible computational cost overhead. Experiments on the PASCAL VOC and MS COCO datasets demonstrate that our iffDetector consistently outperforms state-of-the-art methods by significant margins (the test code and model are anonymously available at this https URL).
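The closed-loop filtering can be pictured as a feedback modulation of convolutional features by high-level semantics. The sketch below is a generic attention-style stand-in (our construction for illustration, not the published IFF module):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureFilter(nn.Module):
        """Generic feedback filtering: high-level semantic maps are
        projected to a per-channel gate that re-weights the backbone
        feature map; the residual connection preserves stability."""
        def __init__(self, feat_ch, sem_ch):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Conv2d(sem_ch, feat_ch, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, feat, semantics):
            # upsample semantics to the feature resolution, gate, and re-add
            sem = F.interpolate(semantics, size=feat.shape[-2:],
                                mode='bilinear', align_corners=False)
            return feat * self.gate(sem) + feat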
31. Deep Learning of Unified Region, Edge, and Contour Models for Automated Image Segmentation [PDF] 返回目录
Ali Hatamizadeh
Abstract: Image segmentation is a fundamental and challenging problem in computer vision with applications spanning multiple areas, such as medical imaging, remote sensing, and autonomous vehicles. Recently, convolutional neural networks (CNNs) have gained traction in the design of automated segmentation pipelines. Although CNN-based models are adept at learning abstract features from raw image data, their performance is dependent on the availability and size of suitable training datasets. Additionally, these models are often unable to capture the details of object boundaries and generalize poorly to unseen classes. In this thesis, we devise novel methodologies that address these issues and establish robust representation learning frameworks for fully-automatic semantic segmentation in medical imaging and mainstream computer vision. In particular, our contributions include (1) state-of-the-art 2D and 3D image segmentation networks for computer vision and medical image analysis, (2) an end-to-end trainable image segmentation framework that unifies CNNs and active contour models with learnable parameters for fast and robust object delineation, (3) a novel approach for disentangling edge and texture processing in segmentation networks, and (4) a novel few-shot learning model in both supervised settings and semi-supervised settings where synergies between latent and image spaces are leveraged to learn to segment images given limited training data.
32. Contrastive Generative Adversarial Networks [PDF] 返回目录
Minguk Kang, Jaesik Park
Abstract: Conditional image synthesis is the task of generating high-fidelity diverse images using class label information. Although many studies have shown realistic results, there is room for improvement if the number of classes increases. In this paper, we propose a novel conditional contrastive loss to maximize a lower bound on the mutual information between samples from the same class. Our framework, called Contrastive Generative Adversarial Networks (ContraGAN), learns to synthesize images using class information and data-to-data relations of training examples. The discriminator in ContraGAN discriminates the authenticity of given samples and maximizes the mutual information between embeddings of real images from the same class. Simultaneously, the generator attempts to synthesize images that fool the discriminator and maximize the mutual information of fake images from the same class prior. The experimental results show that ContraGAN is robust to network architecture selection and outperforms state-of-the-art models by 3.7% and 11.2% on the CIFAR10 and Tiny ImageNet datasets, respectively, without any data augmentation. For a fair comparison, we re-implement nine state-of-the-art approaches to test various methods under the same conditions. The software package that can reproduce all experiments is available at this https URL.
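The conditional contrastive idea can be sketched as follows: for each image embedding, the positives are its class-proxy embedding and the other same-class embeddings in the batch, and the loss is the usual -log(sum over positives / sum over all candidates). The temperature, the normalization, and the use of an nn.Embedding(num_classes, D) as the class proxy are our assumptions, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def conditional_contrastive_loss(feat, labels, class_embed, t=0.07):
        feat = F.normalize(feat, dim=1)                      # (B, D) image embeddings
        proxy = F.normalize(class_embed(labels), dim=1)      # (B, D) class embeddings
        sim = feat @ feat.t() / t                            # pairwise similarities
        sim_proxy = (feat * proxy).sum(1, keepdim=True) / t  # (B, 1) proxy similarity
        same = labels.unsqueeze(0) == labels.unsqueeze(1)    # (B, B) same-class mask
        eye = torch.eye(len(feat), dtype=torch.bool, device=feat.device)
        pos = torch.cat([sim.masked_fill(~(same & ~eye), -1e9), sim_proxy], dim=1)
        all_ = torch.cat([sim.masked_fill(eye, -1e9), sim_proxy], dim=1)
        # -log( sum over positives exp(sim) / sum over all candidates exp(sim) )
        return (torch.logsumexp(all_, 1) - torch.logsumexp(pos, 1)).mean()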
33. AFDet: Anchor Free One Stage 3D Object Detection [PDF] 返回目录
Runzhou Ge, Zhuangzhuang Ding, Yihan Hu, Yu Wang, Sijia Chen, Li Huang, Yuan Li
Abstract: High-efficiency point cloud 3D object detection operating on embedded systems is important for many robotics applications, including autonomous driving. Most previous works try to solve it using anchor-based detection methods, which come with two drawbacks: post-processing is relatively complex and computationally expensive, and tuning anchor parameters is tricky. We are the first to address these drawbacks with an anchor-free and Non-Maximum-Suppression-free one-stage detector called AFDet. The entire AFDet can be processed efficiently on a CNN accelerator or a GPU with the simplified post-processing. Without bells and whistles, our proposed AFDet performs competitively with other one-stage anchor-based methods on the KITTI validation set and the Waymo Open Dataset validation set.
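The NMS-free property in anchor-free detectors typically comes from keypoint-style decoding: keep only local maxima of the class heatmap via a 3x3 max-pool, then read off the top-k peaks. The sketch below follows that generic recipe (as in CenterNet-style detectors) rather than AFDet's exact head.

    import torch
    import torch.nn.functional as F

    def decode_heatmap(heat, k=50):
        """heat: (B, C, H, W) per-class center heatmap after sigmoid."""
        peaks = F.max_pool2d(heat, kernel_size=3, stride=1, padding=1)
        heat = heat * (peaks == heat).float()      # zero out non-maxima
        B, C, H, W = heat.shape
        scores, idx = heat.view(B, -1).topk(k)     # flatten classes and space
        cls = idx // (H * W)
        ys = (idx % (H * W)) // W
        xs = idx % W
        return scores, cls, ys, xs                 # no NMS pass needed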
34. RP2K: A Large-Scale Retail Product Dataset for Fine-Grained Image Classification [PDF] 返回目录
Jingtian Peng, Chang Xiao, Xun Wei, Yifan Li
Abstract: We introduce RP2K, a new large-scale retail product dataset for fine-grained image classification. Unlike previous datasets focusing on relatively few products, we collect more than 500,000 images of retail products on shelves, covering 2,000 different products. Our dataset aims to advance research in retail object recognition, which has massive applications such as automatic shelf auditing and image-based product information retrieval. Our dataset enjoys the following properties: (1) It is by far the largest-scale dataset in terms of product categories. (2) All images are captured manually in physical retail stores with natural lighting, matching the scenario of real applications. (3) We provide rich annotations for each object, including sizes, shapes and flavors/scents. We believe our dataset could benefit both computer vision research and the retail industry.
35. Drive-Net: Convolutional Network for Driver Distraction Detection [PDF] 返回目录
Mohammed S. Majdi, Sundaresh Ram, Jonathan T. Gill, Jeffery J. Rodriguez
Abstract: To help prevent motor vehicle accidents, there has been significant interest in finding an automated method to recognize signs of driver distraction, such as talking to passengers, fixing hair and makeup, eating and drinking, and using a mobile phone. In this paper, we present an automated supervised learning method called Drive-Net for driver distraction detection. Drive-Net uses a combination of a convolutional neural network (CNN) and a random decision forest for classifying images of a driver. We compare the performance of our proposed Drive-Net to two other popular machine-learning approaches: a recurrent neural network (RNN) and a multi-layer perceptron (MLP). We test the methods on a publicly available database of images acquired under a controlled environment containing about 22425 images manually annotated by an expert. Results show that Drive-Net achieves a detection accuracy of 95%, which is 2% higher than the best results obtained on the same database using other methods.
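A hedged sketch of the CNN-plus-random-forest combination described above. The ResNet-18 backbone and the 224x224 input size are stand-ins, not the paper's architecture; the point is only the division of labor between a frozen feature extractor and a scikit-learn forest:

```python
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestClassifier

# Frozen CNN backbone as a feature extractor (hypothetical stand-in for
# Drive-Net's own convolutional stage).
backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()       # expose the 512-d pooled features
backbone.eval()

@torch.no_grad()
def featurize(images):                  # images: (N, 3, 224, 224) tensor
    return backbone(images).numpy()

forest = RandomForestClassifier(n_estimators=200)
# forest.fit(featurize(train_images), train_labels)   # hypothetical tensors
# preds = forest.predict(featurize(test_images))
```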
36. Laplacian Mixture Model Point Based Registration [PDF] 返回目录
Mohammad Sadegh Majdi, Emad Fatemizadeh
Abstract: Point-based registration is an important part of many machine vision applications, medical diagnostics, agricultural studies, etc. The goal of point set registration is to find correspondences between different data sets and estimate the appropriate transformation that can map one set to another. Here we introduce a novel method for matching different data sets based on the Laplacian distribution. We consider the alignment of two point sets as a probability density estimation problem. Using maximum likelihood methods, we fit the Laplacian mixture model (LMM) centroids (the source point set) to the data point set.
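The model class can be written down explicitly. A sketch under simplifying assumptions (equal mixing weights, a shared scale $b$, factorized Laplacian components), analogous to the Gaussian mixtures of classical probabilistic registration:

```latex
p(x) = \sum_{m=1}^{M} \frac{1}{M} \prod_{j=1}^{d}
       \frac{1}{2b}\exp\!\left(-\frac{\lvert x_j - T(y_m)_j \rvert}{b}\right),
\qquad
T^{\ast} = \arg\max_{T} \sum_{n=1}^{N} \log p(x_n)
```

Here the $y_m$ are the source points acting as LMM centroids, $T$ is the sought transformation, and the $x_n$ are the target points; maximizing the likelihood aligns the two sets.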
37. LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation [PDF] 返回目录
Wentao Zhu, Can Zhao, Wenqi Li, Holger Roth, Ziyue Xu, Daguang Xu
Abstract: Deep Learning (DL) models are becoming larger, because the increase in model size might offer significant accuracy gain. To enable the training of large deep networks, data parallelism and model parallelism are two well-known approaches for parallel training. However, data parallelism does not help reduce memory footprint per device. In this work, we introduce Large deep 3D ConvNets with Automated Model Parallelism (LAMP) and investigate the impact of both input's and deep 3D ConvNets' size on segmentation accuracy. Through automated model parallelism, it is feasible to train large deep 3D ConvNets with a large input patch, even the whole image. Extensive experiments demonstrate that, facilitated by the automated model parallelism, the segmentation accuracy can be improved through increasing model size and input context size, and large input yields significant inference speedup compared with sliding window of small patches in the inference. Code is available at this https URL.
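A minimal hand-rolled illustration of model parallelism for a 3D ConvNet: stages live on different GPUs, so no single device has to hold all weights and activations. LAMP automates this partitioning; the fixed two-way split below is only for illustration and assumes two CUDA devices:

```python
import torch.nn as nn

class TwoStageParallel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(                 # first half on GPU 0
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Sequential(                 # second half on GPU 1
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 2, 1)).to("cuda:1")

    def forward(self, x):                            # x: (B, 1, D, H, W)
        h = self.stage0(x.to("cuda:0"))
        return self.stage1(h.to("cuda:1"))           # logits end up on cuda:1
```

Splitting by depth like this is what allows whole-volume inputs that would not fit on one card, at the cost of inter-device transfers at the stage boundaries.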
38. A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence [PDF] 返回目录
Changhao Chen, Bing Wang, Chris Xiaoxuan Lu, Niki Trigoni, Andrew Markham
Abstract: Deep learning based localization and mapping has recently attracted great attention. Instead of creating hand-designed algorithms by exploiting physical models or geometry theory, deep learning based solutions provide an alternative that solves the problem in a data-driven way. Benefiting from the ever-increasing amount of data and computational power, these methods are fast evolving into a new area that offers accurate and robust systems to track motion and estimate scene structure for real-world applications. In this work, we provide a comprehensive survey, and propose a new taxonomy of the existing approaches to localization and mapping using deep learning. We also discuss the limitations of current models, and indicate possible future directions. A wide range of topics is covered, from learning odometry estimation and mapping, to global localization and simultaneous localization and mapping (SLAM). We revisit the problem of perceiving self-motion and scene with on-board sensors, and show how to solve it by integrating these modules into a prospective spatial machine intelligence system (SMIS). It is our hope that this work can connect the emerging works from the robotics, computer vision and machine learning communities, and serve as a guide for future researchers to learn about the possible ways of applying deep learning to tackle localization and mapping problems.
39. Frost filtered scale-invariant feature extraction and multilayer perceptron for hyperspectral image classification [PDF] 返回目录
G. Kalaiarasi, S. Maheswari
Abstract: Hyperspectral image (HSI) classification plays a significant role in the field of remote sensing due to its ability to provide spatial and spectral information. Owing to the rapid development and growth of hyperspectral remote sensing technology, many methods have been developed for HSI classification, but they still fall short of achieving good performance. A Frost Filtered Scale-Invariant Feature Transformation based MultiLayer Perceptron Classification (FFSIFT-MLPC) technique is introduced for classifying hyperspectral images with higher accuracy and minimum time consumption. The FFSIFT-MLPC technique performs three major processes, namely preprocessing, feature extraction and classification, using multiple layers. Initially, the hyperspectral image is divided into a number of spectral bands. These bands are given as input to the input layer of the perceptron. The Frost filter is then used in the FFSIFT-MLPC technique for preprocessing the input bands, which helps to remove the noise from the hyperspectral image at the first hidden layer. After the preprocessing task, texture, color and object features of the hyperspectral image are extracted at the second hidden layer using the Gaussian distributive scale-invariant feature transform. At the third hidden layer, the Euclidean distance is measured between the extracted features and testing features. Finally, feature matching is carried out at the output layer for hyperspectral image classification. The classified outputs are produced in terms of spectral bands (i.e., different colors). Experimental analysis is performed with PSNR, classification accuracy, false positive rate and classification time over the number of spectral bands. The results show that the presented FFSIFT-MLPC technique improves hyperspectral image classification accuracy and PSNR, and minimizes the false positive rate as well as classification time compared with the state-of-the-art methods.
40. Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [PDF] 返回目录
Karl Pertsch, Oleh Rybkin, Frederik Ebert, Chelsea Finn, Dinesh Jayaraman, Sergey Levine
Abstract: The ability to predict and plan into the future is fundamental for agents acting in the world. To reach a faraway goal, we predict trajectories at multiple timescales, first devising a coarse plan towards the goal and then gradually filling in details. In contrast, current learning approaches for visual prediction and planning fail on long-horizon tasks as they generate predictions (1) without considering goal information, and (2) at the finest temporal resolution, one step at a time. In this work we propose a framework for visual prediction and planning that is able to overcome both of these limitations. First, we formulate the problem of predicting towards a goal and propose the corresponding class of latent space goal-conditioned predictors (GCPs). GCPs significantly improve planning efficiency by constraining the search space to only those trajectories that reach the goal. Further, we show how GCPs can be naturally formulated as hierarchical models that, given two observations, predict an observation between them, and by recursively subdividing each part of the trajectory generate complete sequences. This divide-and-conquer strategy is effective at long-term prediction, and enables us to design an effective hierarchical planning algorithm that optimizes trajectories in a coarse-to-fine manner. We show that by using both goal-conditioning and hierarchical prediction, GCPs enable us to solve visual planning tasks with much longer horizon than previously possible.
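The recursive subdivision is simple to make concrete. Assuming a predictor gcp(o_a, o_b) that returns the estimated observation midway between its two arguments (the goal-conditioned predictor described above), a sketch of the divide-and-conquer rollout:

```python
def predict_sequence(gcp, o_start, o_goal, depth):
    """Recursively fill in the trajectory between two observations;
    each level of recursion doubles the temporal resolution."""
    if depth == 0:
        return []
    o_mid = gcp(o_start, o_goal)                 # predict the midpoint
    left = predict_sequence(gcp, o_start, o_mid, depth - 1)
    right = predict_sequence(gcp, o_mid, o_goal, depth - 1)
    return left + [o_mid] + right

# depth=3 expands (o_start, o_goal) into a 7-step in-between trajectory.
```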
41. Simple and Effective VAE Training with Calibrated Decoders [PDF] 返回目录
Oleh Rybkin, Kostas Daniilidis, Sergey Levine
Abstract: Variational autoencoders (VAEs) provide an effective and simple method for modeling complex distributions. However, training VAEs often requires considerable hyperparameter tuning, and often utilizes a heuristic weight on the prior KL-divergence term. In this work, we study how the performance of VAEs can be improved while not requiring the use of this heuristic hyperparameter, by learning calibrated decoders that accurately model the decoding distribution. While in some sense it may seem obvious that calibrated decoders should perform better than uncalibrated decoders, much of the recent literature that employs VAEs uses uncalibrated Gaussian decoders with constant variance. We observe empirically that the naïve way of learning variance in Gaussian decoders does not lead to good results. However, other calibrated decoders, such as discrete decoders or decoders with a learned shared variance, can substantially improve performance. To further improve results, we propose a simple but novel modification to the commonly used Gaussian decoder, which represents the prediction variance non-parametrically. We observe empirically that using the heuristic weight hyperparameter is not necessary with our method. We analyze the performance of various discrete and continuous decoders on a range of datasets and several single-image and sequential VAE models. Project website: this https URL
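As one concrete instance, the gap between an uncalibrated constant-variance Gaussian decoder and a calibrated one with a learned shared variance comes down to the reconstruction NLL. A sketch (the additive log 2*pi constant is omitted; a single shared log_sigma is just one of the calibrated choices the paper compares):

```python
import torch

log_sigma = torch.nn.Parameter(torch.zeros(()))   # shared, learned log-scale

def gaussian_decoder_nll(x, mu):
    """With log_sigma frozen at 0 this collapses to 0.5 * MSE, i.e. the
    usual uncalibrated decoder whose implicit KL weight must be hand-tuned;
    learning log_sigma lets the data set that trade-off instead."""
    inv_var = torch.exp(-2.0 * log_sigma)
    per_dim = 0.5 * inv_var * (x - mu) ** 2 + log_sigma
    return per_dim.flatten(1).sum(dim=1).mean()   # sum pixels, average batch

# log_sigma is optimized jointly with the VAE weights under the ELBO.
```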
42. MANTRA: A Machine Learning reference lightcurve dataset for astronomical transient event recognition [PDF] 返回目录
Mauricio Neira, Catalina Gómez, John F. Suárez-Pérez, Diego A. Gómez, Juan Pablo Reyes, Marcela Hernández Hoyos, Pablo Arbeláez, Jaime E. Forero-Romero
Abstract: We introduce MANTRA, an annotated dataset of 4869 transient and 71207 non-transient object lightcurves built from the Catalina Real Time Transient Survey. We provide public access to this dataset as a plain text file to facilitate standardized quantitative comparison of astronomical transient event recognition algorithms. Some of the classes included in the dataset are: supernovae, cataclysmic variables, active galactic nuclei, high proper motion stars, blazars and flares. As an example of the tasks that can be performed on the dataset we experiment with multiple data pre-processing methods, feature selection techniques and popular machine learning algorithms (Support Vector Machines, Random Forests and Neural Networks). We assess quantitative performance in two classification tasks: binary (transient/non-transient) and eight-class classification. The best performing algorithm in both tasks is the Random Forest Classifier. It achieves an F1-score of 96.25% in the binary classification and 52.79% in the eight-class classification. For the eight-class classification, non-transients (96.83%) is the class with the highest F1-score, while the lowest corresponds to high-proper-motion stars (16.79%); for supernovae it achieves a value of 54.57%, close to the average across classes. The next release of MANTRA includes images and benchmarks with deep learning models.
43. Deep Polynomial Neural Networks [PDF] 返回目录
Grigorios Chrysos, Stylianos Moschoglou, Giorgos Bouritsas, Jiankang Deng, Yannis Panagakis, Stefanos Zafeiriou
Abstract: Deep Convolutional Neural Networks (DCNNs) are currently the method of choice both for generative, as well as for discriminative learning in computer vision and machine learning. The success of DCNNs can be attributed to the careful selection of their building blocks (e.g., residual blocks, rectifiers, sophisticated normalization schemes, to mention but a few). In this paper, we propose $\Pi$-Nets, a new class of DCNNs. $\Pi$-Nets are polynomial neural networks, i.e., the output is a high-order polynomial of the input. The unknown parameters, which are naturally represented by high-order tensors, are estimated through a collective tensor factorization with factors sharing. We introduce three tensor decompositions that significantly reduce the number of parameters and show how they can be efficiently implemented by hierarchical neural networks. We empirically demonstrate that $\Pi$-Nets are very expressive and they even produce good results without the use of non-linear activation functions in a large battery of tasks and signals, i.e., images, graphs, and audio. When used in conjunction with activation functions, $\Pi$-Nets produce state-of-the-art results in three challenging tasks, i.e. image generation, face verification and 3D mesh representation learning.
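A toy block in the spirit of the factorized parametrizations proposed here: each step multiplies in a fresh linear projection of the input via a Hadamard product, raising the polynomial degree by one, with no activation function anywhere. The paper's actual tensor decompositions differ; this is only a sketch:

```python
import torch.nn as nn

class PolyBlock(nn.Module):
    def __init__(self, dim_in, dim_hidden, dim_out, degree=3):
        super().__init__()
        self.maps = nn.ModuleList(
            [nn.Linear(dim_in, dim_hidden, bias=False) for _ in range(degree)])
        self.out = nn.Linear(dim_hidden, dim_out)

    def forward(self, z):
        x = self.maps[0](z)
        for lin in self.maps[1:]:
            x = lin(z) * x + x   # Hadamard product raises the degree by one
        return self.out(x)       # output: a degree-`degree` polynomial of z
```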
44. Bridging the Theoretical Bound and Deep Algorithms for Open Set Domain Adaptation [PDF] 返回目录
Li Zhong, Zhen Fang, Feng Liu, Bo Yuan, Guangquan Zhang, Jie Lu
Abstract: In unsupervised open set domain adaptation (UOSDA), the target domain contains unknown classes that are not observed in the source domain. Researchers in this area aim to train a classifier that can accurately 1) recognize unknown target data (data with unknown classes) and 2) classify other target data. To achieve this aim, a previous study has proven an upper bound of the target-domain risk, and the open set difference, as an important term in the upper bound, is used to measure the risk on unknown target data. By minimizing the upper bound, a shallow classifier can be trained to achieve the aim. However, if the classifier is very flexible (e.g., deep neural networks (DNNs)), the open set difference will converge to a negative value when minimizing the upper bound, which causes an issue where most target data are recognized as unknown data. To address this issue, we propose a new upper bound of target-domain risk for UOSDA, which includes four terms: source-domain risk, $\epsilon$-open set difference ($\Delta_\epsilon$), a distributional discrepancy between domains, and a constant. Compared to the open set difference, $\Delta_\epsilon$ is more robust against the issue when it is being minimized, and thus we are able to use very flexible classifiers (i.e., DNNs). Then, we propose a new principle-guided deep UOSDA method that trains DNNs via minimizing the new upper bound. Specifically, source-domain risk and $\Delta_\epsilon$ are minimized by gradient descent, and the distributional discrepancy is minimized via a novel open-set conditional adversarial training strategy. Finally, compared to existing shallow and deep UOSDA methods, our method shows the state-of-the-art performance on several benchmark datasets, including digit recognition (MNIST, SVHN, USPS), object recognition (Office-31, Office-Home), and face recognition (PIE).
45. Rotation-Equivariant Neural Networks for Privacy Protection [PDF] 返回目录
Hao Zhang, Yiting Chen, Haotian Ma, Xu Cheng, Qihan Ren, Liyao Xiang, Jie Shi, Quanshi Zhang
Abstract: In order to prevent leaking input information from intermediate-layer features, this paper proposes a method to revise the traditional neural network into the rotation-equivariant neural network (RENN). Compared to the traditional neural network, the RENN uses d-ary vectors/tensors as features, in which each element is a d-ary number. These d-ary features can be rotated (analogous to the rotation of a d-dimensional vector) with a random angle as the encryption process. Input information is hidden in this target phase of d-ary features for attribute obfuscation. Even if attackers have obtained network parameters and intermediate-layer features, they cannot extract input information without knowing the target phase. Hence, the input privacy can be effectively protected by the RENN. Besides, the output accuracy of RENNs only degrades mildly compared to traditional neural networks, and the computational cost is significantly less than the homomorphic encryption.
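The encryption step is easiest to picture in the 2-ary case, where paired feature channels act like the real and imaginary parts of a complex number and the secret is a rotation angle; the paper's d-ary tensors generalize this. A hedged sketch of that special case:

```python
import torch

def encrypt(features, theta):
    """Rotate paired channels by a secret angle; magnitudes are preserved,
    so intermediate features leak nothing about the hidden phase."""
    a, b = features.chunk(2, dim=1)              # (real, imaginary) halves
    cos, sin = torch.cos(theta), torch.sin(theta)
    return torch.cat([a * cos - b * sin, a * sin + b * cos], dim=1)

def decrypt(features, theta):
    return encrypt(features, -theta)             # rotation is invertible

# theta = 2 * torch.pi * torch.rand(())          # per-sample secret phase
```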
46. Joint Left Atrial Segmentation and Scar Quantification Based on a DNN with Spatial Encoding and Shape Attention [PDF] 返回目录
Lei Li, Xin Weng, Julia A. Schnabel, Xiahai Zhuang
Abstract: We propose an end-to-end deep neural network (DNN) which can simultaneously segment the left atrial (LA) cavity and quantify LA scars. The framework incorporates the continuous spatial information of the target by introducing a spatially encoded (SE) loss based on the distance transform map. Compared to conventional binary label based loss, the proposed SE loss can reduce noisy patches in the resulting segmentation, which is commonly seen for deep learning-based methods. To fully utilize the inherent spatial relationship between LA and LA scars, we further propose a shape attention (SA) mechanism through an explicit surface projection to build an end-to-end-trainable model. Specifically, the SA scheme is embedded into a two-task network to perform the joint LA segmentation and scar quantification. Moreover, the proposed method can alleviate the severe class-imbalance problem when detecting small and discrete targets like scars. We evaluated the proposed framework on 60 LGE MRI data from the MICCAI2018 LA challenge. For LA segmentation, the proposed method reduced the mean Hausdorff distance from 36.4 mm to 20.0 mm compared to the 3D basic U-Net using the binary cross-entropy loss. For scar quantification, the method was compared with the results or algorithms reported in the literature and demonstrated better performance.
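The spatially encoded term is built on the distance transform of the ground-truth mask. A sketch of one simple penalty in that spirit, using scipy's Euclidean distance transform (the paper's exact formulation may differ):

```python
import torch
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    """Signed distance map of a binary numpy mask: negative inside the
    target, positive outside, near zero on the boundary."""
    inside = distance_transform_edt(mask)
    outside = distance_transform_edt(1 - mask)
    return torch.from_numpy(outside - inside).float()

def spatially_encoded_loss(probs, sdm):
    # Foreground probability placed far outside the target is penalized
    # in proportion to its distance, suppressing isolated noisy patches.
    return (probs * sdm).mean()
```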
47. Deep Attentive Wasserstein Generative Adversarial Networks for MRI Reconstruction with Recurrent Context-Awareness [PDF] 返回目录
Yifeng Guo, Chengjia Wang, Heye Zhang, Guang Yang
Abstract: The performance of traditional compressive sensing-based MRI (CS-MRI) reconstruction is affected by its slow iterative procedure and noise-induced artefacts. Although many deep learning-based CS-MRI methods have been proposed to mitigate the problems of traditional methods, they have not been able to achieve more robust results at higher acceleration factors. Most of the deep learning-based CS-MRI methods still cannot fully mine the information in k-space, which leads to unsatisfactory results in MRI reconstruction. In this study, we propose a new deep learning-based CS-MRI reconstruction method that fully utilises the relationship among sequential MRI slices by coupling Wasserstein Generative Adversarial Networks (WGAN) with Recurrent Neural Networks. Further development of an attentive unit enables our model to reconstruct more accurate anatomical structures for the MRI data. By experimenting on different MRI datasets, we have demonstrated that our method can not only achieve better results compared to the state-of-the-art but can also effectively reduce residual noise generated during the reconstruction process.
48. Scale-Space Autoencoders for Unsupervised Anomaly Segmentation in Brain MRI [PDF] 返回目录
Christoph Baur, Benedikt Wiestler, Shadi Albarqouni, Nassir Navab
Abstract: Brain pathologies can vary greatly in size and shape, ranging from a few pixels (i.e. MS lesions) to large, space-occupying tumors. Recently proposed Autoencoder-based methods for unsupervised anomaly segmentation in brain MRI have shown promising performance, but face difficulties in modeling distributions with high fidelity, which is crucial for accurate delineation of particularly small lesions. Here, similar to these previous works, we model the distribution of healthy brain MRI to localize pathologies from erroneous reconstructions. However, to achieve improved reconstruction fidelity at higher resolutions, we learn to compress and reconstruct different frequency bands of healthy brain MRI using the Laplacian pyramid. In a range of experiments comparing our method to different State-of-the-Art approaches on three different brain MR datasets with MS lesions and tumors, we show improved anomaly segmentation performance and the general capability to obtain much crisper reconstructions of input data at native resolution. The modeling of the Laplacian pyramid further enables the delineation and aggregation of lesions at multiple scales, which allows the model to effectively cope with different pathologies and lesion sizes using a single model.
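The frequency-band decomposition used here is the classic Laplacian pyramid. A compact sketch of building the band-pass residuals that per-scale autoencoders would then compress, assuming spatial sizes divisible by 2^levels:

```python
import torch.nn.functional as F

def laplacian_pyramid(img, levels=3):
    """Split (B, C, H, W) into high-frequency residuals plus a low-pass top."""
    bands, cur = [], img
    for _ in range(levels):
        down = F.avg_pool2d(cur, 2)
        up = F.interpolate(down, scale_factor=2, mode="bilinear",
                           align_corners=False)
        bands.append(cur - up)          # band-pass residual at this scale
        cur = down
    return bands, cur

def reconstruct(bands, top):
    cur = top
    for band in reversed(bands):        # exact inverse of the decomposition
        cur = F.interpolate(cur, scale_factor=2, mode="bilinear",
                            align_corners=False) + band
    return cur
```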
49. Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks [PDF] 返回目录
Francesco Croce, Maksym Andriushchenko, Naman D. Singh, Nicolas Flammarion, Matthias Hein
Abstract: A large body of research has focused on adversarial attacks which require modifying all input features with small $l_2$- or $l_\infty$-norms. In this paper we instead focus on query-efficient sparse attacks in the black-box setting. Our versatile framework, Sparse-RS, based on random search, achieves state-of-the-art success rate and query efficiency for different sparse attack models such as $l_0$-bounded perturbations (outperforming established white-box methods), adversarial patches, and adversarial framing. We show the effectiveness of Sparse-RS on different datasets, considering problems from image recognition and malware detection and multiple variations of sparse threat models, including targeted and universal perturbations. In particular, Sparse-RS can be used for realistic attacks such as universal adversarial patch attacks without requiring a substitute model. The code of our framework is available at this https URL.
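A bare-bones illustration of random search under an $l_0$ budget: maintain a set of k perturbed pixels, mutate one at a time, and keep a candidate only when a black-box loss (e.g., the margin of the true class) improves. The real Sparse-RS adds schedules and smarter proposal distributions; everything below is a simplified sketch:

```python
import numpy as np

def sparse_rs_l0(x, loss_fn, k=10, queries=1000, seed=0):
    """x: (H, W, C) uint8 image; loss_fn: black-box scalar loss (one query
    per call); returns the best perturbed image found and its loss."""
    rng = np.random.default_rng(seed)
    h, w, c = x.shape

    def apply(idx, vals):
        flat = x.reshape(-1, c).copy()
        flat[idx] = vals
        return flat.reshape(h, w, c)

    idx = rng.choice(h * w, size=k, replace=False)   # perturbed pixel set
    vals = rng.integers(0, 2, size=(k, c)) * 255     # extreme colors
    best = loss_fn(apply(idx, vals))
    for _ in range(queries - 1):
        cand_idx, cand_vals = idx.copy(), vals.copy()
        j = rng.integers(k)                          # mutate one pixel
        cand_idx[j] = rng.integers(h * w)
        cand_vals[j] = rng.integers(0, 2, size=c) * 255
        cand = loss_fn(apply(cand_idx, cand_vals))
        if cand < best:                              # greedy acceptance
            idx, vals, best = cand_idx, cand_vals, cand
    return apply(idx, vals), best
```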
50. 3D Probabilistic Segmentation and Volumetry from 2D projection images [PDF] 返回目录
Athanasios Vlontzos, Samuel Budd, Benjamin Hou, Daniel Rueckert, Bernhard Kainz
Abstract: X-Ray imaging is quick, cheap and useful for front-line care assessment and intra-operative real-time imaging (e.g., C-Arm Fluoroscopy). However, it suffers from projective information loss and lacks vital volumetric information on which many essential diagnostic biomarkers are based. In this paper we explore probabilistic methods to reconstruct 3D volumetric images from 2D imaging modalities and measure the models' performance and confidence. We show our models' performance on large connected structures and we test for limitations regarding fine structures and image domain sensitivity. We utilize fast end-to-end training of 2D-3D convolutional networks, and evaluate our method on 117 CT scans, segmenting 3D structures from digitally reconstructed radiographs (DRRs) with a Dice score of $0.91 \pm 0.0013$. Source code will be made available by the time of the conference.
51. Post-hoc Calibration of Neural Networks [PDF] 返回目录
Amir Rahimi, Kartik Gupta, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley
Abstract: Calibration of neural networks is a critical aspect to consider when incorporating machine learning models in real-world decision-making systems, where the confidence of decisions is as important as the decisions themselves. In recent years, there has been a surge of research on neural network calibration, and the majority of works can be categorized as post-hoc calibration methods, defined as methods that learn an additional function to calibrate an already trained base network. In this work, we aim to understand post-hoc calibration methods from a theoretical point of view. In particular, it is known that minimizing the Negative Log-Likelihood (NLL) will lead to a calibrated network on the training set if the global optimum is attained (Bishop, 1994). Nevertheless, it is not clear that learning an additional function in a post-hoc manner leads to calibration in this theoretical sense. To this end, we prove that even though the base network ($f$) does not attain the global optimum of the NLL, by adding additional layers ($g$) and minimizing the NLL over the parameters of $g$, one can obtain a calibrated network $g \circ f$. This not only provides a less stringent condition for obtaining a calibrated network but also gives a theoretical justification of post-hoc calibration methods. Our experiments on various image classification benchmarks confirm the theory.
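The construction $g \circ f$ can be sketched directly: freeze the trained base network $f$ and fit the extra layers $g$ on a held-out set by minimizing the NLL (cross-entropy). The architecture of $g$ below is an illustrative assumption, not the paper's.

```python
import torch
import torch.nn as nn

def fit_calibrator(f, calib_loader, num_classes, epochs=10, lr=1e-3):
    """Fit extra layers g on top of a frozen base network f by minimizing
    the NLL on a held-out calibration set; the calibrated predictor is
    softmax(g(f(x))). The architecture of g is an illustrative assumption."""
    f.eval()
    for p in f.parameters():
        p.requires_grad_(False)            # base network stays fixed
    g = nn.Sequential(
        nn.Linear(num_classes, num_classes), nn.ReLU(),
        nn.Linear(num_classes, num_classes),
    )
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    nll = nn.CrossEntropyLoss()            # NLL of softmax(g(f(x)))
    for _ in range(epochs):
        for x, y in calib_loader:
            loss = nll(g(f(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return g
```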
52. Calibration of Neural Networks using Splines [PDF] 返回目录
Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley
Abstract: Calibrating neural networks is of utmost importance when employing them in safety-critical applications where the downstream decision making depends on the predicted probabilities. Measuring calibration error amounts to comparing two empirical distributions. In this work, we introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test, in which the main idea is to compare the respective cumulative probability distributions. From this, by approximating the empirical cumulative distribution with a differentiable function via splines, we obtain a recalibration function, which maps the network outputs to actual (calibrated) class assignment probabilities. The spline-fitting is performed using a held-out calibration set, and the obtained recalibration function is evaluated on an unseen test set. We tested our method against existing calibration approaches on various image classification datasets, and our spline-based recalibration approach consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
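The KS idea admits a compact binning-free sketch: sort samples by predicted confidence and take the maximum gap between the cumulative confidence and the cumulative accuracy. This illustrates the measure being compared; the paper's spline-based recalibration is not reproduced here.

```python
import numpy as np

def ks_calibration_error(confidences, correct):
    """Binning-free KS-style calibration error: sort samples by predicted
    confidence and take the maximum gap between cumulative confidence and
    cumulative accuracy (an empirical-CDF comparison)."""
    order = np.argsort(confidences)
    conf = np.asarray(confidences, dtype=float)[order]
    acc = np.asarray(correct, dtype=float)[order]
    return float(np.max(np.abs(np.cumsum(conf) - np.cumsum(acc))) / len(conf))
```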
53. Learning Physical Constraints with Neural Projections [PDF] 返回目录
Shuqi Yang, Xingzhe He, Bo Zhu
Abstract: We propose a new family of neural networks to predict the behaviors of physical systems by learning their underpinning constraints. A neural projection operator lies at the heart of our approach, composed of a lightweight network with an embedded recursive architecture that interactively enforces learned underpinning constraints and predicts the various governed behaviors of different physical systems. Our neural projection operator is motivated by the position-based dynamics model that has been used widely in game and visual effects industries to unify various fast physics simulators. Our method can automatically and effectively uncover a broad range of constraints from observation point data, such as length, angle, bending, collision, boundary effects, and their arbitrary combinations, without any connectivity priors. We provide a multi-group point representation in conjunction with a configurable network connection mechanism to incorporate prior inputs for processing complex physical systems. We demonstrate the efficacy of our approach by learning a set of challenging physical systems, all in a unified and simple fashion, including: rigid bodies with complex geometries, ropes with varying length and bending, articulated soft and rigid bodies, and multi-object collisions with complex boundaries.
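A minimal sketch of the recursive projection idea, with an assumed MLP corrector and a fixed number of projection steps (both illustrative choices, not the paper's architecture):

```python
import torch
import torch.nn as nn

class NeuralProjection(nn.Module):
    """Recursive projection sketch in the spirit of position-based dynamics:
    a small corrector network is applied repeatedly to pull predicted
    positions toward the learned constraint manifold."""
    def __init__(self, dim, hidden=64, steps=5):
        super().__init__()
        self.steps = steps
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        for _ in range(self.steps):   # recursive constraint enforcement
            x = x + self.net(x)       # additive correction toward feasibility
        return x
```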
54. Semi-Supervised Learning for Fetal Brain MRI Quality Assessment with ROI consistency [PDF] 返回目录
Junshen Xu, Sayeri Lala, Borjan Gagoski, Esra Abaci Turk, P. Ellen Grant, Polina Golland, Elfar Adalsteinsson
Abstract: Fetal brain MRI is useful for diagnosing brain abnormalities but is challenged by fetal motion. The current protocol for T2-weighted fetal brain MRI is not robust to motion, so image volumes are degraded by inter- and intra-slice motion artifacts. Besides, manual annotation for fetal MR image quality assessment is usually time-consuming. Therefore, in this work, we propose a semi-supervised deep learning method that detects slices with artifacts during the brain volume scan. Our method is based on the mean teacher model, where we not only enforce consistency between student and teacher models on the whole image, but also adopt an ROI consistency loss to guide the network to focus on the brain region. The proposed method is evaluated on a fetal brain MR dataset with 11,223 labeled images and more than 200,000 unlabeled images. Results show that, compared with supervised learning, the proposed method can improve model accuracy by about 6\% and outperform other state-of-the-art semi-supervised learning methods. The proposed method is also implemented and evaluated on an MR scanner, which demonstrates the feasibility of online image quality assessment and image reacquisition during fetal MR scans.
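A sketch of the two ingredients, under assumed shapes and weighting: a teacher updated as an exponential moving average (EMA) of the student, and a consistency term computed on both the full image and an ROI-restricted view (`x_roi` is a hypothetical brain-region crop of the same batch; the paper's exact ROI term may differ).

```python
import torch
import torch.nn.functional as F

def consistency_loss(student, teacher, x, x_roi):
    """Mean-teacher consistency on the full image plus an ROI term; labeled
    samples would additionally receive a standard cross-entropy term."""
    with torch.no_grad():                       # teacher provides targets only
        t_full, t_roi = teacher(x), teacher(x_roi)
    s_full, s_roi = student(x), student(x_roi)
    return (F.mse_loss(torch.softmax(s_full, 1), torch.softmax(t_full, 1)) +
            F.mse_loss(torch.softmax(s_roi, 1), torch.softmax(t_roi, 1)))

def ema_update(teacher, student, alpha=0.99):
    """Teacher weights track an exponential moving average of the student."""
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(alpha).add_(ps, alpha=1.0 - alpha)
```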
55. Generalized Grasping for Mechanical Grippers for Unknown Objects with Partial Point Cloud Representations [PDF] 返回目录
Michael Hegedus, Kamal Gupta, Mehran Mehrandezh
Abstract: We present a generalized grasping algorithm that uses point clouds (i.e. a group of points and their respective surface normals) to discover grasp pose solutions for multiple grasp types, executed by a mechanical gripper, in near real-time. The algorithm introduces two ideas: 1) a histogram of finger contact normals is used to represent a grasp 'shape' to guide a gripper orientation search in a histogram of object(s) surface normals, and 2) voxel grid representations of gripper and object(s) are cross-correlated to match finger contact points, i.e. grasp 'size', to discover a grasp pose. Constraints, such as collisions with neighbouring objects, are optionally incorporated in the cross-correlation computation. We show via simulations and experiments that 1) grasp poses for three grasp types can be found in near real-time, 2) grasp pose solutions are consistent with respect to voxel resolution changes for both partial and complete point cloud scans, and 3) a planned grasp is executed with a mechanical gripper.
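Idea 1) can be sketched as follows: bin unit surface normals into a 2D spherical histogram and score grasp-to-object orientation matches by correlating histograms under cyclic azimuth shifts. The binning scheme and the shift-only search are simplifying assumptions, not the paper's full procedure.

```python
import numpy as np

def normal_histogram(normals, n_bins=16):
    """Bin unit surface normals by (azimuth, elevation) into a normalized
    2D histogram; normals is an (N, 3) array of unit vectors."""
    az = np.arctan2(normals[:, 1], normals[:, 0])        # [-pi, pi]
    el = np.arcsin(np.clip(normals[:, 2], -1.0, 1.0))    # [-pi/2, pi/2]
    h, _, _ = np.histogram2d(az, el, bins=n_bins,
                             range=[[-np.pi, np.pi], [-np.pi/2, np.pi/2]])
    return h / max(h.sum(), 1.0)

def match_grasp_orientation(grasp_hist, object_hist):
    """Score grasp-shape-to-object matches under cyclic azimuth shifts
    (a crude stand-in for the paper's gripper orientation search)."""
    scores = [np.sum(np.roll(object_hist, s, axis=0) * grasp_hist)
              for s in range(object_hist.shape[0])]
    return int(np.argmax(scores)), float(np.max(scores))
```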
56. Inexact Derivative-Free Optimization for Bilevel Learning [PDF] 返回目录
Matthias J. Ehrhardt, Lindon Roberts
Abstract: Variational regularization techniques are dominant in the field of mathematical imaging. A drawback of these techniques is that they depend on a number of parameters which have to be set by the user. A by-now common strategy to resolve this issue is to learn these parameters from data. While mathematically appealing, this strategy leads to a nested optimization problem (known as bilevel optimization) which is computationally very difficult to handle. A key ingredient in solving the upper-level problem is the exact solution of the lower-level problem, which is practically infeasible. In this work we propose to solve these problems using inexact derivative-free optimization algorithms which never require the lower-level problem to be solved exactly. We provide global convergence and worst-case complexity analysis of our approach, and test our proposed framework on ROF denoising and learning MRI sampling patterns. Dynamically adjusting the lower-level accuracy yields learned parameters with reconstruction quality similar to high-accuracy evaluations but with dramatic reductions in computational work (up to 100 times faster in some cases).
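A toy sketch of the inexact bilevel structure, on an illustrative quadratic denoising problem rather than the paper's imaging problems: the lower level is solved only to a tolerance, and the derivative-free upper level (a plain grid search here, as the simplest stand-in for their DFO algorithm) sweeps the regularization weight cheaply before one tight final solve.

```python
import numpy as np

def lower_level(theta, y, tol, max_iter=10_000):
    """Inexact lower-level solver: minimize 0.5*||x - y||^2 + 0.5*theta*||x||^2
    by gradient descent, stopping once the gradient norm drops below tol."""
    x = y.copy()
    step = 1.0 / (1.0 + theta)
    for _ in range(max_iter):
        g = (x - y) + theta * x
        if np.linalg.norm(g) < tol:
            break
        x -= step * g
    return x

def learn_theta(y_noisy, x_clean, thetas=np.logspace(-3, 1, 30)):
    """Derivative-free upper level: sweep theta with a loose lower-level
    tolerance, then re-solve the winner tightly."""
    best = min(thetas, key=lambda t:
               np.linalg.norm(lower_level(t, y_noisy, tol=1e-2) - x_clean))
    return best, lower_level(best, y_noisy, tol=1e-8)
```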
57. Perceptual Adversarial Robustness: Defense Against Unseen Threat Models [PDF] 返回目录
Cassidy Laidlaw, Sahil Singla, Soheil Feizi
Abstract: We present adversarial attacks and defenses for the perceptual adversarial threat model: the set of all perturbations to natural images which can mislead a classifier but are imperceptible to human eyes. The perceptual threat model is broad and encompasses $L_2$, $L_\infty$, spatial, and many other existing adversarial threat models. However, it is difficult to determine if an arbitrary perturbation is imperceptible without humans in the loop. To solve this issue, we propose to use a {\it neural perceptual distance}, an approximation of the true perceptual distance between images using internal activations of neural networks. In particular, we use the Learned Perceptual Image Patch Similarity (LPIPS) distance. We then propose the {\it neural perceptual threat model} that includes adversarial examples with a bounded neural perceptual distance to natural images. Under the neural perceptual threat model, we develop two novel perceptual adversarial attacks to find any imperceptible perturbations to images which can fool a classifier. Through an extensive perceptual study, we show that the LPIPS distance correlates well with human judgements of perceptibility of adversarial examples, validating our threat model. Because the LPIPS threat model is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against 12 types of adversarial attacks and find that, for each attack, PAT achieves close to the accuracy of adversarial training against just that perturbation type. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial defense with this property.
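A sketch of an LPIPS-bounded attack via a Lagrangian penalty, illustrating the threat model rather than the authors' two attacks; it uses the `lpips` package for the distance, while the classifier `model`, the bound, and the penalty weight `lam` are placeholders.

```python
import torch
import lpips  # pip install lpips

def perceptual_attack(model, x, y, bound=0.5, steps=40, lr=0.01, lam=10.0):
    """Maximize the classification loss while penalizing LPIPS distance
    above `bound`; x is a batch of shape (N, 3, H, W) with values in [0, 1]."""
    dist_fn = lpips.LPIPS(net="alex")
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        adv = (x + delta).clamp(0.0, 1.0)
        dist = dist_fn(x, adv, normalize=True).mean()  # normalize: [0,1] inputs
        loss = -ce(model(adv), y) + lam * torch.relu(dist - bound)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach().clamp(0.0, 1.0)
```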
58. Semantic Features Aided Multi-Scale Reconstruction of Inter-Modality Magnetic Resonance Images [PDF] 返回目录
Preethi Srinivasan, Prabhjot Kaur, Aditya Nigam, Arnav Bhavsar
Abstract: Long acquisition time (AQT), due to the serial acquisition of multi-modality MR images (especially T2-weighted images (T2WI), which have longer AQT), though beneficial for disease diagnosis, is practically undesirable. We propose a novel deep-network-based solution to reconstruct T2W images from T1W images (T1WI) using an encoder-decoder architecture. The proposed learning is aided with semantic features by using a multi-channel input consisting of intensity values and the gradients of the image in two orthogonal directions. A reconstruction module (RM) augments the network along with a domain adaptation module (DAM), an encoder-decoder model with a built-in sharp bottleneck module (SBM), and the whole is trained via modular training. The proposed network significantly reduces the total AQT with negligible qualitative artifacts and quantitative loss (it reconstructs one volume in approximately 1 second). The testing is done on a publicly available dataset with real MR images, and the proposed network shows an approximately 1 dB increase in PSNR over the state of the art (SOTA).
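The multi-channel input described above is straightforward to assemble; a minimal sketch for a single 2D slice (channel ordering is an assumption):

```python
import numpy as np

def multichannel_input(t1w):
    """Stack the T1W slice with its gradients along the two in-plane
    orthogonal directions to form the semantic-feature-aided input."""
    t1w = t1w.astype(np.float32)
    gy, gx = np.gradient(t1w)                # d/d(row), d/d(column)
    return np.stack([t1w, gx, gy], axis=0)   # shape (3, H, W)
```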
59. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks [PDF] 返回目录
Avi Schwarzschild, Micah Goldblum, Arjun Gupta, John P Dickerson, Tom Goldstein
Abstract: Data poisoning and backdoor attacks manipulate training data in order to cause models to fail during inference. A recent survey of industry practitioners found that data poisoning is the number one concern among threats ranging from model stealing to adversarial attacks. However, we find that the impressive performance evaluations from data poisoning attacks are, in large part, artifacts of inconsistent experimental design. Moreover, we find that existing poisoning methods have been tested in contrived scenarios, and they fail in realistic settings. In order to promote fair comparison in future work, we develop unified benchmarks for data poisoning and backdoor attacks.