
[arXiv Papers] Computer Vision and Pattern Recognition 2020-04-20

Contents

1. Motion and Region Aware Adversarial Learning for Fall Detection with Thermal Imaging [PDF] Abstract
2. An integrated light management system with real-time light measurement and human perception [PDF] Abstract
3. Data-driven Flood Emulation: Speeding up Urban Flood Predictions by Deep Convolutional Neural Networks [PDF] Abstract
4. IDDA: a large-scale multi-domain dataset for autonomous driving [PDF] Abstract
5. Weakly Supervised Geodesic Segmentation of Egyptian Mummy CT Scans [PDF] Abstract
6. Res-CR-Net, a residual network with a novel architecture optimized for the semantic segmentation of microscopy images [PDF] Abstract
7. Learning to Predict Context-adaptive Convolution for Semantic Segmentation [PDF] Abstract
8. Vehicle Position Estimation with Aerial Imagery from Unmanned Aerial Vehicles [PDF] Abstract
9. The FaceChannel: A Light-weight Deep Neural Network for Facial Expression Recognition [PDF] Abstract
10. Structured Landmark Detection via Topology-Adapting Deep Graph Learning [PDF] Abstract
11. MOPT: Multi-Object Panoptic Tracking [PDF] Abstract
12. Detailed 2D-3D Joint Representation for Human-Object Interaction [PDF] Abstract
13. Modeling Extent-of-Texture Information for Ground Terrain Recognition [PDF] Abstract
14. Object Detection and Recognition of Swap-Bodies using Camera mounted on a Vehicle [PDF] Abstract
15. Fast Soft Color Segmentation [PDF] Abstract
16. Image Processing Based Scene-Text Detection and Recognition with Tesseract [PDF] Abstract
17. Adaptive Neuron-wise Discriminant Criterion and Adaptive Center Loss at Hidden Layer for Deep Convolutional Neural Network [PDF] Abstract
18. Transform and Tell: Entity-Aware News Image Captioning [PDF] Abstract
19. One-vs-Rest Network-based Deep Probability Model for Open Set Recognition [PDF] Abstract
20. YuruGAN: Yuru-Chara Mascot Generator Using Generative Adversarial Networks With Clustering Small Dataset [PDF] Abstract
21. Generative Adversarial Networks for Video-to-Video Domain Adaptation [PDF] Abstract
22. Self-Learning with Rectification Strategy for Human Parsing [PDF] Abstract
23. Conservative Plane Releasing for Spatial Privacy Protection in Mixed Reality [PDF] Abstract
24. CPARR: Category-based Proposal Analysis for Referring Relationships [PDF] Abstract
25. DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for Monocular Depth Estimation [PDF] Abstract
26. ViBE: A Tool for Measuring and Mitigating Bias in Image Datasets [PDF] Abstract
27. A generic ensemble based deep convolutional neural network for semi-supervised medical image segmentation [PDF] Abstract
28. Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence [PDF] Abstract
29. ALCN: Adaptive Local Contrast Normalization [PDF] Abstract
30. Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding [PDF] Abstract
31. Continual Learning for Anomaly Detection in Surveillance Videos [PDF] Abstract
32. Unsupervised Learning of Landmarks based on Inter-Intra Subject Consistencies [PDF] Abstract
33. Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [PDF] Abstract
34. Models Genesis [PDF] Abstract
35. Iteratively Pruned Deep Learning Ensembles for COVID-19 Detection in Chest X-rays [PDF] Abstract
36. Robotic Room Traversal using Optical Range Finding [PDF] Abstract
37. Complexity Analysis of an Edge Preserving CNN SAR Despeckling Algorithm [PDF] Abstract
38. MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [PDF] Abstract
39. A Cross-Stitch Architecture for Joint Registration and Segmentation in Adaptive Radiotherapy [PDF] Abstract
40. Triplet Loss for Knowledge Distillation [PDF] Abstract
41. LiteDenseNet: A Lightweight Network for Hyperspectral Image Classification [PDF] Abstract
42. Cascaded Context Enhancement for Automated Skin Lesion Segmentation [PDF] Abstract
43. Meta-Meta-Classification for One-Shot Learning [PDF] Abstract
44. A New Modified Deep Convolutional Neural Network for Detecting COVID-19 from X-ray Images [PDF] Abstract
45. Smartphone camera based pointer [PDF] Abstract
46. Developing and Deploying Machine Learning Pipelines against Real-Time Image Streams from the PACS [PDF] Abstract
47. Targeted Attack for Deep Hashing based Retrieval [PDF] Abstract
48. Learning visual policies for building 3D shape categories [PDF] Abstract
49. Deep Neural Network (DNN) for Water/Fat Separation: Supervised Training, Unsupervised Training, and No Training [PDF] Abstract
50. Divergent Search for Few-Shot Image Classification [PDF] Abstract
51. Symmetry as an Organizing Principle for Geometric Intelligence [PDF] Abstract

Abstracts

1. Motion and Region Aware Adversarial Learning for Fall Detection with Thermal Imaging [PDF] Back to Contents
  Vineet Mehta, Abhinav Dhall, Sujata Pal, Shehroz Khan
Abstract: Automatic fall detection is a vital technology for ensuring the health and safety of people. Home-based camera systems for fall detection often put people's privacy at risk. Thermal cameras can partially/fully obfuscate facial features, thus preserving the privacy of a person. Another challenge is that falls occur far less often than normal activities of daily living. Because falls occur rarely, learning algorithms is non-trivial due to class imbalance. To handle these problems, we formulate fall detection as anomaly detection within an adversarial framework using a thermal imaging camera. We present a novel adversarial network that comprises two-channel 3D convolutional autoencoders, one handling video sequences and the other optical flow, which reconstruct the thermal data and the optical flow input sequences. We introduce a differential constraint, a technique to track the region of interest, and a joint discriminator to compute the reconstruction error. A larger reconstruction error indicates the occurrence of a fall in a video sequence. Experiments on a publicly available thermal fall dataset show superior results in comparison to a standard baseline.
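The scoring recipe behind such reconstruction-based detectors is compact enough to sketch. The following PyTorch snippet is illustrative only: the layer sizes, clip shapes and threshold are made up, and the paper's adversarial discriminator, differential constraint and region tracking are omitted.

    import torch
    import torch.nn as nn

    class Conv3dAutoencoder(nn.Module):
        """Toy 3D convolutional autoencoder for (B, C, T, H, W) clips."""
        def __init__(self, channels):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv3d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose3d(16, channels, 4, stride=2, padding=1),
            )
        def forward(self, x):
            return self.decoder(self.encoder(x))

    def anomaly_scores(thermal_ae, flow_ae, thermal, flow):
        """Per-clip reconstruction error, summed over the two streams."""
        err_t = ((thermal_ae(thermal) - thermal) ** 2).mean(dim=(1, 2, 3, 4))
        err_f = ((flow_ae(flow) - flow) ** 2).mean(dim=(1, 2, 3, 4))
        return err_t + err_f

    # After training on normal activities only, clips whose score exceeds a
    # threshold chosen on normal data are flagged as falls.
    thermal_ae, flow_ae = Conv3dAutoencoder(1), Conv3dAutoencoder(2)
    thermal = torch.rand(4, 1, 8, 64, 64)   # batch of thermal clips
    flow = torch.rand(4, 2, 8, 64, 64)      # corresponding optical flow
    is_fall = anomaly_scores(thermal_ae, flow_ae, thermal, flow) > 0.1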

2. An integrated light management system with real-time light measurement and human perception [PDF] Back to Contents
  Theodore Tsesmelis, Irtiza Hasan, Marco Cristani, Alessio Del Bue, Fabio Galasso
Abstract: Illumination is important for well-being, productivity and safety across several environments, including offices, retail shops and industrial warehouses. Current techniques for setting up lighting require extensive expert support and need to be repeated if the scene changes. Here we propose the first fully-automated light management system (LMS), which measures lighting in real-time, leveraging an RGBD sensor and a radiosity-based light propagation model. Thanks to the integration of light distribution and perception curves into the radiosity, we outperform a commercial software (Relux) on a newly introduced dataset. Furthermore, our proposed LMS is the first to estimate both the presence and the attention of the people in the environment, as well as their light perception. Our new LMS therefore adapts lighting to the scene and human activity, and it is capable of saving up to 66%, as we experimentally quantify, without compromising the lighting quality.

3. Data-driven Flood Emulation: Speeding up Urban Flood Predictions by Deep Convolutional Neural Networks [PDF] Back to Contents
  Zifeng Guo, Joao P. Leitao, Nuno E. Simoes, Vahid Moosavi
Abstract: Computational complexity has been the bottleneck for applying physically-based simulations to large urban areas with high spatial resolution for efficient and systematic flooding analyses and risk assessments. To address this issue of long computational time, this paper proposes that the prediction of maximum water depth rasters can be considered as an image-to-image translation problem, where the results are generated from input elevation rasters using information learned from data rather than by conducting simulations, which can significantly accelerate the prediction process. The proposed approach was implemented by a deep convolutional neural network trained on flood simulation data of 18 designed hyetographs on three selected catchments. Multiple tests with both designed and real rainfall events were performed, and the results show that the flood prediction by the neural network uses only 0.5% of the time of physically-based approaches, with promising accuracy and generalization ability. The proposed neural network can also potentially be applied to different but relevant problems, including flood predictions for urban layout planning.

4. IDDA: a large-scale multi-domain dataset for autonomous driving [PDF] Back to Contents
  Emanuele Alberti, Antonio Tavera, Carlo Masone, Barbara Caputo
Abstract: Semantic segmentation is key in autonomous driving. Using deep visual learning architectures is not trivial in this context, because of the challenges in creating suitable large-scale annotated datasets. This issue has traditionally been circumvented through the use of synthetic datasets, which have become a popular resource in this field. They have been released with the need to develop semantic segmentation algorithms able to close the visual domain shift between the training and test data. Although exacerbated by the use of artificial data, the problem is extremely relevant in this field even when training on real data. Indeed, weather conditions, viewpoint changes and variations in the city appearances can vary considerably from car to car, and even at test time for a single, specific vehicle. How to deal with domain adaptation in semantic segmentation, and how to leverage effectively several different data distributions (source domains), are important research questions in this field. To support work in this direction, this paper contributes a new large-scale, synthetic dataset for semantic segmentation with more than 100 different source visual domains. The dataset has been created to explicitly address the challenges of domain shift between training and test data in various weather and viewpoint conditions, in seven different city types. Extensive benchmark experiments assess the dataset, showcasing open challenges for the current state of the art. The dataset will be available at: this https URL.

5. Weakly Supervised Geodesic Segmentation of Egyptian Mummy CT Scans [PDF] Back to Contents
  Avik Hati, Matteo Bustreo, Diego Sona, Vittorio Murino, Alessio Del Bue
Abstract: In this paper, we tackle the task of automatically analyzing 3D volumetric scans obtained from computed tomography (CT) devices. In particular, we address a task for which data is very limited: the segmentation of ancient Egyptian mummy CT scans. We aim to digitally unwrap the mummy and identify different segments such as body, bandages and jewelry. The problem is complex because of the lack of annotated data for the different semantic regions to segment, thus discouraging the use of strongly supervised approaches. We, therefore, propose a weakly supervised and efficient interactive segmentation method to solve this challenging problem. After segmenting the wrapped mummy from its exterior region using histogram analysis and template matching, we first design a voxel distance measure to find an approximate solution for the body and bandage segments. Here, we use geodesic distances, since voxel features as well as spatial relationships among voxels are incorporated in this measure. Next, we refine the solution using a GrabCut based segmentation together with a tracking method on the slices of the scan that assigns labels to different regions in the volume, using limited supervision in the form of scribbles drawn by the user. The efficiency of the proposed method is demonstrated using visualizations and validated through quantitative measures and qualitative unwrapping of the mummy.
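The GrabCut refinement step can be driven through OpenCV's built-in implementation from a scribble-initialized mask. A minimal sketch, assuming one CT slice has already been rendered to an 8-bit image; the synthetic data and scribble positions below are placeholders:

    import cv2
    import numpy as np

    # Synthetic stand-in for one CT slice rendered to 8-bit BGR.
    slice_img = np.random.randint(0, 255, (256, 256, 3), np.uint8)

    # Start from "probable background" everywhere, then burn in user
    # scribbles as definite foreground/background labels.
    mask = np.full(slice_img.shape[:2], cv2.GC_PR_BGD, np.uint8)
    cv2.line(mask, (60, 128), (200, 128), cv2.GC_FGD, 5)  # body scribble
    cv2.line(mask, (10, 20), (250, 20), cv2.GC_BGD, 5)    # exterior scribble

    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(slice_img, mask, None, bgd_model, fgd_model, 5,
                cv2.GC_INIT_WITH_MASK)

    # Definite plus probable foreground forms the refined segment.
    segment = ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)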

6. Res-CR-Net, a residual network with a novel architecture optimized for the semantic segmentation of microscopy images [PDF] Back to Contents
  Hassan Abdallah, Asiri Liyanaarachchi, Maranda Saigh, Samantha Silvers, Suzan Arslanturk, Douglas J. Taatjes, Lars Larsson, Bhanu P. Jena, Domenico L. Gatti
Abstract: Deep Neural Networks (DNN) have been widely used to carry out segmentation tasks in both electron and light microscopy. Most DNNs developed for this purpose are based on some variation of the encoder-decoder type U-Net architecture, in combination with residual blocks to increase ease of training and resilience to gradient degradation. Here we introduce Res-CR-Net, a type of DNN that features residual blocks with either a bundle of separable atrous convolutions with different dilation rates or a convolutional LSTM. The number of filters used in each residual block and the number of blocks are the only hyperparameters that need to be modified in order to optimize the network training for a variety of different microscopy images.
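A residual block that bundles separable atrous convolutions with different dilation rates might look like the following PyTorch sketch. The channel counts, dilation rates and normalization choices are illustrative and not necessarily the paper's exact configuration:

    import torch
    import torch.nn as nn

    class AtrousResBlock(nn.Module):
        """Residual block: a bundle of depthwise (atrous) + pointwise
        convolution branches at several dilation rates, summed with the
        identity shortcut."""
        def __init__(self, channels, dilations=(1, 3, 5)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                              groups=channels, bias=False),  # depthwise atrous
                    nn.Conv2d(channels, channels, 1, bias=False),  # pointwise
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                )
                for d in dilations
            ])

        def forward(self, x):
            return x + sum(branch(x) for branch in self.branches)

    x = torch.rand(1, 32, 128, 128)
    y = AtrousResBlock(32)(x)   # same shape as the input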

7. Learning to Predict Context-adaptive Convolution for Semantic Segmentation [PDF] Back to Contents
  Jianbo Liu, Junjun He, Jimmy S. Ren, Yu Qiao, Hongsheng Li
Abstract: Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods demonstrate that using global context for re-weighting feature channels can effectively improve the accuracy of semantic segmentation. However, the globally-sharing feature re-weighting vector might not be optimal for regions of different classes in the input image. In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps. In CaC-Net, a set of context-adaptive convolution kernels are predicted from the global contextual information in a parameter-efficient manner. When used for convolution with the semantic feature maps, the predicted convolutional kernels can generate the spatially-varying feature weighting factors capturing both global and local contextual information. Comprehensive experimental results show that our CaC-Net achieves superior segmentation performance on three public datasets, PASCAL Context, PASCAL VOC 2012 and ADE20K.
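The core mechanism, predicting convolution kernels from globally pooled context and applying them per image, can be sketched naively as below. The class name and the plain linear kernel predictor are assumptions for illustration; CaC-Net predicts its kernels in a more parameter-efficient way and uses the result to produce spatially-varying weighting factors rather than output features directly.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContextAdaptiveConv(nn.Module):
        """Predict a 3x3 kernel per image from global context, then apply it
        to that image's feature map (per-sample conv via grouped conv)."""
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.out_ch, self.k = out_ch, k
            self.predict = nn.Linear(in_ch, out_ch * in_ch * k * k)

        def forward(self, x):
            b, c, h, w = x.shape
            context = x.mean(dim=(2, 3))               # global average pooling
            kernels = self.predict(context).view(b * self.out_ch, c,
                                                 self.k, self.k)
            # Fold the batch into channels so each sample gets its own kernel.
            out = F.conv2d(x.reshape(1, b * c, h, w), kernels,
                           padding=self.k // 2, groups=b)
            return out.view(b, self.out_ch, h, w)

    feats = torch.rand(2, 64, 32, 32)
    out = ContextAdaptiveConv(64, 64)(feats)   # (2, 64, 32, 32)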

8. Vehicle Position Estimation with Aerial Imagery from Unmanned Aerial Vehicles [PDF] Back to Contents
  Friedrich Kruber, Eduardo Sánchez Morales, Samarjit Chakraborty, Michael Botsch
Abstract: The availability of real-world data is a key element for novel developments in the fields of automotive and traffic research. Aerial imagery has the major advantage of recording multiple objects simultaneously and overcomes limitations such as occlusions. However, only few data sets are available. This work describes a process to estimate a precise vehicle position from aerial imagery. A robust object detection is crucial for reliable results, hence the state-of-the-art deep neural network Mask-RCNN is applied for that purpose. Two training data sets are employed: the first one is optimized for detecting the test vehicle, while the second one consists of randomly selected images recorded on public roads. To reduce errors, several aspects are accounted for, such as the drone movement and the perspective projection from a photograph. The estimated position is compared with a reference system installed in the test vehicle. It is shown that a mean accuracy of 20 cm can be achieved with flight altitudes up to 100 m, Full-HD resolution and a frame-by-frame detection. A reliable position estimation is the basis for further data processing, such as obtaining additional vehicle state variables. The source code, training weights, labeled data and example videos are made publicly available. This supports researchers in creating new traffic data sets with specific local conditions.
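The detection stage builds on Mask-RCNN, which is available off the shelf in torchvision. A minimal inference sketch; the confidence threshold is illustrative, and the paper's fine-tuning on the two aerial training sets is omitted:

    import torch
    import torchvision

    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    image = torch.rand(3, 1080, 1920)      # one Full-HD frame, values in [0, 1]
    with torch.no_grad():
        pred = model([image])[0]           # dict: boxes, labels, scores, masks

    keep = pred["scores"] > 0.8            # confidence threshold
    boxes = pred["boxes"][keep]            # (N, 4) pixel coordinates
    # Box centers in image space would then be mapped to world coordinates
    # using the drone pose and the camera's perspective projection.
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2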

9. The FaceChannel: A Light-weight Deep Neural Network for Facial Expression Recognition [PDF] Back to Contents
  Pablo Barros, Nikhil Churamani, Alessandra Sciutti
Abstract: Current state-of-the-art models for automatic FER are based on very deep neural networks that are difficult to train. This makes it challenging to adapt these models to changing conditions, a requirement for FER models given the subjective nature of affect perception and understanding. In this paper, we address this problem by formalizing the FaceChannel, a light-weight neural network that has far fewer parameters than common deep neural networks. We perform a series of experiments on different benchmark datasets to demonstrate how the FaceChannel achieves comparable, if not better, performance compared to the current state-of-the-art in FER.

10. Structured Landmark Detection via Topology-Adapting Deep Graph Learning [PDF] Back to Contents
  Weijian Li, Yuhang Lu, Kang Zheng, Haofu Liao, Chihung Lin, Jiebo Luo, Chi-Tung Cheng, Jing Xiao, Le Lu, Chang-Fu Kuo, Shun Miao
Abstract: Image landmark detection aims to automatically identify the locations of predefined fiducial points. Despite recent success in this field, higher-order structural modeling to capture implicit or explicit relationships among anatomical landmarks has not been adequately exploited. In this work, we present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical (e.g., hand, pelvis) landmark detection. The proposed method constructs graph signals leveraging both local image features and global shape features. The adaptive graph topology naturally explores and lands on task-specific structures which are learned end-to-end with two Graph Convolutional Networks (GCNs). Extensive experiments are conducted on three public facial image datasets (WFLW, 300W and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand and Pelvis). Quantitative comparisons with the previous state-of-the-art approaches across all studied datasets indicate superior performance in both robustness and accuracy. Qualitative visualizations of the learned graph topologies demonstrate a physically plausible connectivity lying behind the landmarks.

11. MOPT: Multi-Object Panoptic Tracking [PDF] Back to Contents
  Juana Valeria Hurtado, Rohit Mohan, Abhinav Valada
Abstract: Comprehensive understanding of dynamic scenes is a critical prerequisite for intelligent robots to autonomously operate in our environment. Research in this domain which encompasses diverse perception problems has primarily been limited to addressing specific tasks individually and thus has contributed very little towards modeling the ability to understand dynamic scenes holistically. As a step towards encouraging research in this direction, we introduce a new perception task that we name Multi-Object Panoptic Tracking (MOPT). MOPT unifies the conventionally disjoint tasks of semantic segmentation, instance segmentation, and multi-object tracking. MOPT allows for exploiting pixel-level semantic information of 'thing' and 'stuff' classes, temporal coherence, and pixel-level associations over time, for the mutual benefit of each of these sub-problems. In order to facilitate quantitative evaluations of MOPT in a unified manner, we propose the soft Panoptic Tracking Quality (sPTQ) metric. As a first step towards addressing this task, we propose the novel PanopticTrackNet architecture that builds upon the state-of-the-art top-down panoptic segmentation network EfficientPS by adding a new tracking head to simultaneously learn all subtasks in an end-to-end manner. Additionally, we present several strong baselines that combine predictions from state-of-the-art panoptic segmentation and multi-object tracking models for comparison. We present extensive quantitative and qualitative evaluations for both vision-based and LiDAR-based MOPT on the challenging Virtual KITTI 2 and SemanticKITTI datasets, which demonstrates encouraging results.

12. Detailed 2D-3D Joint Representation for Human-Object Interaction [PDF] Back to Contents
  Yong-Lu Li, Xinpeng Liu, Han Lu, Shiyi Wang, Junqi Liu, Jiefeng Li, Cewu Lu
Abstract: Human-Object Interaction (HOI) detection lies at the core of action understanding. Besides 2D information such as human/object appearance and locations, 3D pose is also usually utilized in HOI learning owing to its view-independence. However, rough 3D body joints carry only sparse body information and are not sufficient to understand complex interactions. Thus, we need detailed 3D body shape to go further. Meanwhile, the interacted object in 3D is also not fully studied in HOI learning. In light of these, we propose a detailed 2D-3D joint representation learning method. First, we utilize a single-view human body capture method to obtain detailed 3D body, face and hand shapes. Next, we estimate the 3D object location and size with reference to the 2D human-object spatial configuration and object category priors. Finally, a joint learning framework and cross-modal consistency tasks are proposed to learn the joint HOI representation. To better evaluate the 2D ambiguity processing capacity of models, we propose a new benchmark named Ambiguous-HOI consisting of hard ambiguous images. Extensive experiments on the large-scale HOI benchmark and Ambiguous-HOI show the impressive effectiveness of our method. Code and data are available at this https URL.

13. Modeling Extent-of-Texture Information for Ground Terrain Recognition [PDF] Back to Contents
  Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal
Abstract: Ground terrain recognition is a difficult task, as the context information varies significantly over the regions of a ground terrain image. In this paper, we propose a novel approach to ground-terrain recognition via modeling the Extent-of-Texture information to establish a balance between the order-less texture component and ordered spatial information locally. At first, the proposed method uses a CNN backbone feature extractor network to capture meaningful information of a ground terrain image, and models the extent of texture and shape information locally. Then, the order-less texture information and ordered shape information are encoded in a patch-wise manner, which is utilized by an intra-domain message passing module to make every patch aware of every other patch for rich feature learning. Next, the Extent-of-Texture (EoT) Guided Inter-domain Message Passing module combines the extent of texture and shape information with the encoded texture and shape information in a patch-wise fashion, sharing knowledge to balance out the order-less texture information with ordered shape information. Further, a bilinear model generates a pairwise correlation between the order-less texture information and ordered shape information. Finally, ground-terrain image classification is performed by a fully connected layer. The experimental results indicate superior performance of the proposed model over existing state-of-the-art techniques on publicly available datasets like DTD, MINC and GTOS-mobile.

14. Object Detection and Recognition of Swap-Bodies using Camera mounted on a Vehicle [PDF] Back to Contents
  Ebin Zacharias, Didier Stricker, Martin Teuchler, Kripasindhu Sarkar
Abstract: Object detection and identification is a challenging area of computer vision and a fundamental requirement for autonomous cars. This project aims to jointly perform object detection of a swap-body and to find the type of swap-body by reading an ILU code using an efficient optical character recognition (OCR) method. Recent research activities have drastically improved deep learning techniques, which in turn have advanced the field of computer vision. Collecting enough images for training the model is a critical step towards achieving good results. The data for training were collected from different locations with the maximum possible variations, and the details are explained. In addition, the data augmentation methods applied for training proved to be effective in improving the performance of the trained model. Training the model achieved good results, and the test results are also provided. The final model was tested with images and videos. Finally, this paper also draws attention to some of the major challenges faced during various stages of the project and the possible solutions applied.

15. Fast Soft Color Segmentation [PDF] Back to Contents
  Naofumi Akimoto, Huachun Zhu, Yanghua Jin, Yoshimitsu Aoki
Abstract: We address the problem of soft color segmentation, defined as decomposing a given image into several RGBA layers, each containing only homogeneous color regions. The resulting layers from decomposition pave the way for applications that benefit from layer-based editing, such as recoloring and compositing of images and videos. The current state-of-the-art approach for this problem is hindered by slow processing time due to its iterative nature, and consequently does not scale to certain real-world scenarios. To address this issue, we propose a neural network based method for this task that decomposes a given image into multiple layers in a single forward pass. Furthermore, our method separately decomposes the color layers and the alpha channel layers. By leveraging a novel training objective, our method achieves proper assignment of colors amongst layers. As a consequence, our method achieves promising quality without the inference-speed issue of iterative approaches. Our thorough experimental analysis shows that our method produces qualitative and quantitative results comparable to previous methods while achieving a 300,000x speed improvement. Finally, we utilize our proposed method on several applications, and demonstrate its speed advantage, especially in video editing.
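Whatever method produces the RGBA layers, the decomposition is only useful if compositing the layers reproduces the input image. A small numpy sketch of back-to-front alpha compositing over toy layers; editing a layer's RGB values and re-compositing is exactly the recoloring use case:

    import numpy as np

    def composite(layers):
        """Back-to-front alpha compositing of RGBA layers in [0, 1]."""
        out = np.zeros(layers[0].shape[:2] + (3,))
        for layer in layers:
            rgb, alpha = layer[..., :3], layer[..., 3:4]
            out = alpha * rgb + (1.0 - alpha) * out
        return out

    h = w = 4
    blue = np.concatenate([np.zeros((h, w, 2)), np.ones((h, w, 1)),
                           np.ones((h, w, 1))], axis=-1)     # opaque blue base
    red = np.concatenate([np.ones((h, w, 1)), np.zeros((h, w, 2)),
                          np.full((h, w, 1), 0.6)], axis=-1)  # soft red layer
    image = composite([blue, red])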

16. Image Processing Based Scene-Text Detection and Recognition with Tesseract [PDF] Back to Contents
  Ebin Zacharias, Martin Teuchler, Bénédicte Bernier
Abstract: Text recognition is one of the challenging tasks of computer vision with considerable practical interest. Optical character recognition (OCR) enables different applications for automation. This project focuses on word detection and recognition in natural images. In comparison to reading text in scanned documents, the targeted problem is significantly more challenging. The use case in focus makes it possible to detect the text area in natural scenes with greater accuracy because the images are captured under constraints. This is achieved using a camera mounted on a truck capturing such images round-the-clock. The detected text area is then recognized using the Tesseract OCR engine. Even though it benefits from low computational power requirements, the model is limited to only specific use cases. This paper discusses a critical false-positive case scenario that occurred during testing and elaborates on the strategy used to alleviate the problem. The project achieved a correct character recognition rate of more than 80%. This paper outlines the stages of development, the major challenges and some of the interesting findings of the project.
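With the text area already localized, the recognition step reduces to preprocessing the crop and handing it to Tesseract. A minimal sketch using OpenCV and the pytesseract bindings; the file name is a placeholder and the project's exact preprocessing chain may differ:

    import cv2
    import pytesseract

    img = cv2.imread("text_region.png")    # placeholder path to a text crop
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 3)         # denoise
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # --psm 7 treats the crop as a single line of text.
    text = pytesseract.image_to_string(binary, config="--psm 7")
    print(text.strip())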

17. Adaptive Neuron-wise Discriminant Criterion and Adaptive Center Loss at Hidden Layer for Deep Convolutional Neural Network [PDF] Back to Contents
  Motoshi Abe, Junichi Miyao, Takio Kurita
Abstract: A deep convolutional neural network (CNN) has been widely used in image classification and gives better classification accuracy than other techniques. The softmax cross-entropy loss function is often used for classification tasks. Some works introduce additional terms into the objective function for training to make the features of the output layer more discriminative. The neuron-wise discriminant criterion makes the input feature of each neuron in the output layer discriminative by introducing the discriminant criterion to each of the features. Similarly, the center loss was introduced to the features before the softmax activation function for face recognition to make the deep features discriminative. The ReLU function is often used as an activation function in the hidden layers of a CNN. However, it is observed that the deep features trained by using the ReLU function are not discriminative enough and show elongated shapes. In this paper, we propose to use the neuron-wise discriminant criterion at the output layer and the center loss at the hidden layer. Also, we introduce the online computation of the means of each class with exponential forgetting. We name these the adaptive neuron-wise discriminant criterion and the adaptive center loss, respectively. The effectiveness of integrating the adaptive neuron-wise discriminant criterion and the adaptive center loss is shown by experiments with MNIST, FashionMNIST, CIFAR10, CIFAR100, and STL10. Source code is at this https URL
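For reference, the plain center loss that the paper builds on fits in a few lines of PyTorch. The update_means method sketches one plausible reading of "online computation of the means of each class with exponential forgetting"; the forgetting factor and loss weight are illustrative.

    import torch
    import torch.nn as nn

    class CenterLoss(nn.Module):
        """Pull each deep feature toward the center of its class."""
        def __init__(self, num_classes, feat_dim):
            super().__init__()
            self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

        def forward(self, features, labels):
            # features: (B, D), labels: (B,) integer class ids
            return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

        @torch.no_grad()
        def update_means(self, features, labels, gamma=0.99):
            # Exponential-forgetting estimate of the per-class means.
            for c in labels.unique():
                mean_c = features[labels == c].mean(dim=0)
                self.centers[c] = gamma * self.centers[c] + (1 - gamma) * mean_c

    center_loss = CenterLoss(num_classes=10, feat_dim=128)
    feats, labels = torch.randn(32, 128), torch.randint(0, 10, (32,))
    # Total objective: softmax cross-entropy plus a weighted center term.
    loss = 0.01 * center_loss(feats, labels)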

18. Transform and Tell: Entity-Aware News Image Captioning [PDF] Back to Contents
  Alasdair Tran, Alexander Mathews, Lexing Xie
Abstract: We propose an end-to-end model which generates captions for images embedded in news articles. News images present two key challenges: they rely on real-world knowledge, especially about named entities; and they typically have linguistically rich captions that include uncommon words. We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism. We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts. On the GoodNews dataset, our model outperforms the previous state of the art by a factor of four in CIDEr score (13 to 54). This performance gain comes from a unique combination of language models, word representation, image embeddings, face embeddings, object embeddings, and improvements in neural network design. We also introduce the NYTimes800k dataset which is 70% larger than GoodNews, has higher article quality, and includes the locations of images within articles as an additional contextual cue.

19. One-vs-Rest Network-based Deep Probability Model for Open Set Recognition [PDF] Back to Contents
  Jaeyeon Jang, Chang Ouk Kim
Abstract: Unknown examples that are unseen during training often appear in real-world computer vision tasks, and an intelligent self-learning system should be able to differentiate between known and unknown examples. Open set recognition, which addresses this problem, has been studied for approximately a decade. However, conventional open set recognition methods based on deep neural networks (DNNs) lack a foundation for post-recognition score analysis. In this paper, we propose a DNN structure in which multiple one-vs-rest sigmoid networks follow a convolutional neural network feature extractor. A one-vs-rest network, which is composed of rectified linear unit activation functions for the hidden layers and a single sigmoid target class output node, can maximize the ability to learn information from nonmatch examples. Furthermore, the network yields a sophisticated nonlinear features-to-output mapping that is explainable in the feature space. By introducing extreme value theory-based calibration techniques, the nonlinear and explainable mapping provides well-grounded class membership probability models. Our experiments show that one-vs-rest networks can provide more informative hidden representations for unknown examples than the commonly used softmax layer. In addition, the proposed probability model outperformed the state-of-the-art methods in open set classification scenarios.
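The structural idea, several small one-vs-rest sigmoid heads on a shared feature vector, each trained with a binary match/nonmatch objective, can be sketched as follows. The hidden size and rejection rule are illustrative, and the paper's extreme-value-theory calibration is omitted:

    import torch
    import torch.nn as nn

    class OneVsRestHeads(nn.Module):
        """One ReLU network with a single sigmoid output per known class;
        uniformly low scores suggest an unknown (open-set) input."""
        def __init__(self, feat_dim, num_classes, hidden=64):
            super().__init__()
            self.heads = nn.ModuleList([
                nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                              nn.Linear(hidden, 1))
                for _ in range(num_classes)
            ])

        def forward(self, feats):
            logits = torch.cat([head(feats) for head in self.heads], dim=1)
            return torch.sigmoid(logits)      # (B, num_classes) in [0, 1]

    probs = OneVsRestHeads(feat_dim=128, num_classes=10)(torch.randn(4, 128))
    is_unknown = probs.max(dim=1).values < 0.5   # illustrative rejection rule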

20. YuruGAN: Yuru-Chara Mascot Generator Using Generative Adversarial Networks With Clustering Small Dataset [PDF] Back to Contents
  Yuki Hagiwara, Toshihisa Tanaka
Abstract: A yuru-chara is a mascot character created by local governments and companies for publicizing information on areas and products. Because creating a yuru-chara incurs various costs, the utilization of machine learning techniques such as generative adversarial networks (GANs) can be expected. In recent years, it has been reported that the use of class conditions in a dataset for GAN training stabilizes learning and improves the quality of the generated images. However, it is difficult to apply class-conditional GANs when the amount of original data is small and when a clear class is not given, as is the case for yuru-chara images. In this paper, we propose a class-conditional GAN based on clustering and data augmentation. Specifically, we first performed clustering based on K-means++ on the yuru-chara image dataset and converted it into a class-conditional dataset. Next, data augmentation was performed on the class-conditional dataset so that the amount of data was increased fivefold. In addition, we built a model that incorporates ResBlock and self-attention into a network based on the class-conditional GAN and trained it on the class-conditional yuru-chara dataset. Evaluating the generated images confirmed the effect of the choice of clustering method on the generated images.

21. Generative Adversarial Networks for Video-to-Video Domain Adaptation [PDF] Back to Contents
  Jiawei Chen, Yuexiang Li, Kai Ma, Yefeng Zheng
Abstract: Endoscopic videos from multiple centres often have different imaging conditions, e.g., color and illumination, which make models trained on one domain usually fail to generalize well to another. Domain adaptation is one of the potential solutions to address the problem. However, few existing works have focused on the translation of video-based data. In this work, we propose a novel generative adversarial network (GAN), namely VideoGAN, to transfer video-based data across different domains. As the frames of a video may have similar content and imaging conditions, the proposed VideoGAN has an X-shape generator to preserve the intra-video consistency during translation. Furthermore, a loss function, namely the color histogram loss, is proposed to tune the color distribution of each translated frame. Two colonoscopic datasets from different centres, i.e., CVC-Clinic and ETIS-Larib, are adopted to evaluate the domain adaptation performance of our VideoGAN. Experimental results demonstrate that the adapted colonoscopic video generated by our VideoGAN can significantly boost the segmentation accuracy of colorectal polyps, i.e., an improvement of 5%, on multicentre datasets. As our VideoGAN is a general network architecture, we also evaluate its performance with the CamVid driving video dataset on the cloudy-to-sunny translation task. Comprehensive experiments show that the domain gap can be substantially narrowed by our VideoGAN.

22. Self-Learning with Rectification Strategy for Human Parsing [PDF] Back to Contents
  Tao Li, Zhiyuan Liang, Sanyuan Zhao, Jiahao Gong, Jianbing Shen
Abstract: In this paper, we solve the sample shortage problem in the human parsing task. We begin with the self-learning strategy, which generates pseudo-labels for unlabeled data to retrain the model. However, directly using noisy pseudo-labels will cause error amplification and accumulation. Considering the topology structure of the human body, we propose a trainable graph reasoning method that establishes internal structural connections between graph nodes to correct two typical errors in the pseudo-labels, i.e., the global structural error and the local consistency error. For the global error, we first transform category-wise features into a high-level graph model with coarse-grained structural information, and then decouple the high-level graph to reconstruct the category features. The reconstructed features have a stronger ability to represent the topology structure of the human body. Enlarging the receptive field of features can effectively reduce the local error. We first project feature pixels into a local graph model to capture pixel-wise relations in a hierarchical graph manner, then reverse the relation information back to the pixels. With the global structural and local consistency modules, these errors are rectified and confident pseudo-labels are generated for retraining. Extensive experiments on the LIP and ATR datasets demonstrate the effectiveness of our global and local rectification modules. Our method outperforms other state-of-the-art methods in supervised human parsing tasks.

23. Conservative Plane Releasing for Spatial Privacy Protection in Mixed Reality [PDF] Back to Contents
  Jaybie A. de Guzman, Kanchana Thilakarathna, Aruna Seneviratne
Abstract: Augmented reality (AR) or mixed reality (MR) platforms require spatial understanding to detect objects or surfaces, often including their structural (i.e. spatial geometry) and photometric (e.g. color, and texture) attributes, to allow applications to place virtual or synthetic objects seemingly "anchored" on to real world objects; in some cases, even allowing interactions between the physical and virtual objects. These functionalities require AR/MR platforms to capture the 3D spatial information with high resolution and frequency; however, these pose unprecedented risks to user privacy. Aside from objects being detected, spatial information also reveals the location of the user with high specificity, e.g. in which part of the house the user is. In this work, we propose to leverage spatial generalizations coupled with conservative releasing to provide spatial privacy while maintaining data utility. We designed an adversary that builds on existing place and shape recognition methods over 3D data, against which the proposed spatial privacy approach can be evaluated. Then, we simulate user movement within spaces, which reveals more of the space as the user moves around, utilizing 3D point clouds collected from the Microsoft HoloLens. Results show that revealing no more than 11 generalized planes, accumulated from successively revealed spaces with a large enough radius (i.e. $r\leq1.0m$), can make an adversary fail in identifying the spatial location of the user at least half of the time. Furthermore, if the accumulated spaces are of smaller radius, i.e. each successively revealed space is $r\leq 0.5m$, we can release up to 29 generalized planes while enjoying both better data utility and privacy.

24. CPARR: Category-based Proposal Analysis for Referring Relationships [PDF] Back to Contents
  Chuanzi He, Haidong Zhu, Jiyang Gao, Kan Chen, Ram Nevatia
Abstract: The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of <subject, predicate, object>. This requires simultaneous localization of the subject and object entities in a specified relationship. We introduce a simple yet effective proposal-based method for referring relationships. Different from existing methods such as SSAS, our method can generate a high-resolution result while reducing its complexity and ambiguity. Our method is composed of two modules: a category-based proposal generation module to select the proposals related to the entities, and a predicate analysis module to score the compatibility of pairs of selected proposals. We show state-of-the-art performance on the referring relationship task on two public datasets: Visual Relationship Detection and Visual Genome.

25. DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for Monocular Depth Estimation [PDF] Back to Contents
  Linda Wang, Mahmoud Famouri, Alexander Wong
Abstract: Depth estimation is an active area of research in the field of computer vision, and has garnered significant interest due to its rising demand in a large number of applications ranging from robotics and unmanned aerial vehicles to autonomous vehicles. A particularly challenging problem in this area is monocular depth estimation, where the goal is to infer depth from a single image. An effective strategy that has shown considerable promise in recent years for tackling this problem is the utilization of deep convolutional neural networks. Despite these successes, the memory and computational requirements of such networks have made widespread deployment in embedded scenarios very challenging. In this study, we introduce DepthNet Nano, a highly compact self-normalizing network for monocular depth estimation designed using a human-machine collaborative design strategy, where principled network design prototyping based on encoder-decoder design principles is coupled with machine-driven design exploration. The result is a compact deep neural network with highly customized macroarchitecture and microarchitecture designs, as well as self-normalizing characteristics, that are highly tailored for the task of embedded depth estimation. The proposed DepthNet Nano possesses a highly efficient network architecture (e.g., 24x smaller and 42x fewer MAC operations than Alhashim et al. on KITTI), while still achieving comparable performance with state-of-the-art networks on the NYU-Depth V2 and KITTI datasets. Furthermore, experiments on inference speed and energy efficiency on a Jetson AGX Xavier embedded module further illustrate the efficacy of DepthNet Nano at different resolutions and power budgets (e.g., ~14 FPS and >0.46 images/sec/watt at 384 x 1280 at a 30W power budget on KITTI).

26. ViBE: A Tool for Measuring and Mitigating Bias in Image Datasets [PDF] Back to Contents
  Angelina Wang, Arvind Narayanan, Olga Russakovsky
Abstract: Machine learning models are known to perpetuate the biases present in the data, but oftentimes these biases aren't known until after the models are deployed. We present the Visual Bias Extraction (ViBE) Tool that assists in the investigation of a visual dataset, surfacing potential dataset biases along three dimensions: (1) object-based, (2) gender-based, and (3) geography-based. Object-based biases relate to things like size, context, or diversity of object representation in the dataset; gender-based metrics aim to reveal the stereotypical portrayal of people of different genders within the dataset, with future iterations of our tool extending the analysis to additional axes of identity; geography-based analysis considers the representation of different geographic locations. Our tool is designed to shed light on the dataset along these three axes, allowing both dataset creators and users to gain a better understanding of what exactly is portrayed in their dataset. The responsibility then lies with the tool user to determine which of the revealed biases may be problematic, taking into account the cultural and historical context, as this is difficult to determine automatically. Nevertheless, the tool also provides actionable insights that may be helpful for mitigating the revealed concerns. Overall, our work allows for the machine learning bias problem to be addressed early in the pipeline at the dataset stage. ViBE is available at this https URL.

27. A generic ensemble based deep convolutional neural network for semi-supervised medical image segmentation [PDF] Back to Contents
  Ruizhe Li, Dorothee Auer, Christian Wagner, Xin Chen
Abstract: Deep learning based image segmentation has achieved state-of-the-art performance in many medical applications, such as lesion quantification, organ detection, etc. However, most of the methods rely on supervised learning, which requires a large set of high-quality labeled data. Data annotation is generally an extremely time-consuming process. To address this problem, we propose a generic semi-supervised learning framework for image segmentation based on a deep convolutional neural network (DCNN). An encoder-decoder based DCNN is initially trained using a few annotated training samples. This initially trained model is then copied into sub-models and improved iteratively using random subsets of unlabeled data with pseudo labels generated from models trained in the previous iteration. The number of sub-models is gradually decreased to one in the final iteration. We evaluate the proposed method on a public grand-challenge dataset for skin lesion segmentation. Our method is able to significantly improve beyond fully supervised model learning by incorporating unlabeled data.
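The pseudo-labeling core of such a framework is compact. A hedged PyTorch sketch, assuming a segmentation model that outputs per-pixel class logits; the confidence filtering via an ignore index is a common safeguard rather than necessarily the paper's exact recipe, and the copy-into-sub-models ensemble loop is omitted:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def make_pseudo_labels(model, images, threshold=0.9):
        """Predict on unlabeled images, keeping only confident pixels."""
        model.eval()
        probs = torch.softmax(model(images), dim=1)   # (B, C, H, W)
        conf, labels = probs.max(dim=1)               # both (B, H, W)
        labels[conf < threshold] = 255                # 255 = ignore_index
        return labels

    def retrain_step(model, optimizer, images, targets):
        """One supervised step on a mixed annotated/pseudo-labeled batch."""
        model.train()
        loss = F.cross_entropy(model(images), targets, ignore_index=255)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()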

28. Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence [PDF] Back to Contents
  Huy Manh Nguyen, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi
Abstract: Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences. A single space is not enough to accommodate various videos and sentences. In this paper, we propose a novel framework that maps instances into multiple individual embedding spaces so that we can capture multiple relationships between instances, leading to compelling video retrieval. We propose to produce a final similarity between instances by fusing the similarities measured in each embedding space using a weighted sum strategy, where the weights are determined according to the query sentence. Therefore, we can flexibly emphasize an embedding space. We conducted sentence-to-video retrieval experiments on a benchmark dataset. The proposed method achieved superior performance, and the results are competitive with state-of-the-art methods. These experimental results demonstrate the effectiveness of the proposed multiple embedding approach compared to existing methods.
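The weighted-sum fusion itself reduces to a few tensor operations. A sketch assuming K embedding spaces, with a random softmax standing in for the small network that would produce the sentence-dependent weights:

    import torch
    import torch.nn.functional as F

    def fused_similarity(video_embs, sent_embs, weights):
        """Cosine similarity per embedding space, fused by weighted sum.
        video_embs/sent_embs: lists of K tensors (B_v, D_k) / (B_s, D_k);
        weights: (B_s, K) sentence-dependent fusion weights."""
        sims = torch.stack([
            F.normalize(s, dim=1) @ F.normalize(v, dim=1).t()  # (B_s, B_v)
            for v, s in zip(video_embs, sent_embs)
        ], dim=-1)                                             # (B_s, B_v, K)
        return (sims * weights[:, None, :]).sum(dim=-1)        # (B_s, B_v)

    K, num_videos, num_queries = 3, 5, 2
    videos = [torch.randn(num_videos, 256) for _ in range(K)]
    sents = [torch.randn(num_queries, 256) for _ in range(K)]
    w = torch.softmax(torch.randn(num_queries, K), dim=1)
    ranking = fused_similarity(videos, sents, w).argsort(dim=1, descending=True)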

29. ALCN: Adaptive Local Contrast Normalization [PDF] Back to Contents
  Mahdi Rad, Peter M. Roth, Vincent Lepetit
Abstract: To make robotics and augmented reality applications robust to illumination changes, the current trend is to train a deep network with training images captured under many different lighting conditions. Unfortunately, creating such a training set is a very unwieldy and complex task. We therefore propose a novel illumination normalization method that can easily be used for different problems with challenging illumination conditions. Our preliminary experiments show that among current normalization methods, the Difference-of-Gaussians method remains a very good baseline, and we introduce a novel illumination normalization model that generalizes it. Our key insight is that the normalization parameters should depend on the input image, and we aim to train a convolutional neural network to predict these parameters from the input image. This, however, cannot be done in a supervised manner, as the optimal parameters are not known a priori. We thus designed a method to train this network jointly with another network that aims to recognize objects under different illuminations: the latter network performs well when the former network predicts good values for the normalization parameters. We show that our method significantly outperforms standard normalization methods and also appears to be universal, since it does not have to be re-trained for each new application. Our method improves the robustness to light changes of state-of-the-art 3D object detection and face recognition methods.

30. Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding [PDF] Back to contents
  Panagiotis Meletis, Xiaoxiao Wen, Chenyang Lu, Daan de Geus, Gijs Dubbelman
Abstract: In this technical report, we present two novel datasets for image scene understanding. Both datasets have annotations compatible with panoptic segmentation and additionally they have part-level labels for selected semantic classes. This report describes the format of the two datasets, the annotation protocols, the merging strategies, and presents the datasets statistics. The datasets labels together with code for processing and visualization will be published at this https URL.

31. Continual Learning for Anomaly Detection in Surveillance Videos [PDF] Back to contents
  Keval Doshi, Yasin Yilmaz
Abstract: Anomaly detection in surveillance videos has been recently gaining attention. A challenging aspect of high-dimensional applications such as video surveillance is continual learning. While current state-of-the-art deep learning approaches perform well on existing public datasets, they fail to work in a continual learning framework due to computational and storage issues. Furthermore, online decision making is an important but mostly neglected factor in this domain. Motivated by these research gaps, we propose an online anomaly detection method for surveillance videos using transfer learning and continual learning, which in turn significantly reduces the training complexity and provides a mechanism for continually learning from recent data without suffering from catastrophic forgetting. Our proposed algorithm leverages the feature extraction power of neural network-based models for transfer learning, and the continual learning capability of statistical detection methods.
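A schematic of the two-stage design the abstract describes: a frozen pretrained feature extractor feeding a simple statistical detector whose memory can grow continually without retraining. The k-nearest-neighbour statistic below is an illustrative choice, not necessarily the paper's exact detector.

    import numpy as np

    class KNNAnomalyDetector:
        """Statistical detector over deep features with continual updates."""

        def __init__(self, k=5):
            self.k = k
            self.memory = []  # features of frames deemed normal so far

        def update(self, feat):
            self.memory.append(np.asarray(feat))  # continual learning step

        def score(self, feat):
            # Mean distance to the k nearest stored features; high => anomalous.
            dists = sorted(np.linalg.norm(m - feat) for m in self.memory)
            return float(np.mean(dists[:self.k]))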

32. Unsupervised Learning of Landmarks based on Inter-Intra Subject Consistencies [PDF] Back to contents
  Weijian Li, Haofu Liao, Shun Miao, Le Lu, Jiebo Luo
Abstract: We present a novel unsupervised learning approach to image landmark discovery by incorporating the inter-subject landmark consistencies on facial images. This is achieved via an inter-subject mapping module that transforms original subject landmarks based on an auxiliary subject-related structure. To recover from the transformed images back to the original subject, the landmark detector is forced to learn spatial locations that contain the consistent semantic meanings both for the paired intra-subject images and between the paired inter-subject images. Our proposed method is extensively evaluated on two public facial image datasets (MAFL, AFLW) with various settings. Experimental results indicate that our method can extract the consistent landmarks for both datasets and achieve better performances compared to the previous state-of-the-art methods quantitatively and qualitatively.

33. Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [PDF] Back to contents
  Zheng Dang, Kwang Moo Yi, Yinlin Hu, Fei Wang, Pascal Fua, Mathieu Salzmann
Abstract: Many classical Computer Vision problems, such as essential matrix computation and pose estimation from 3D to 2D correspondences, can be tackled by solving a linear least-square problem, which can be done by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. While theoretically doable, this introduces numerical instability in the optimization process in practice. In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate that our approach is much more robust than explicit differentiation of the eigendecomposition using two general tasks, outlier rejection and denoising, with several practical examples including wide-baseline stereo, the perspective-n-point problem, and ellipse fitting. Empirically, our method has better convergence properties and yields state-of-the-art results.
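The core trick can be stated in a few lines: since the ground-truth solution x* lies in the null space of the ideal matrix, one can penalize the residual ||A(θ)x*||² directly and never differentiate through an eigendecomposition. A PyTorch sketch of that idea, not the paper's exact implementation:

    import torch

    def eigfree_loss(A_pred, x_gt):
        """Eigendecomposition-free loss  x*^T A^T A x*  (sketch).

        A_pred: matrix predicted by the network, shape (m, n).
        x_gt:   ground-truth eigenvector of the zero eigenvalue, shape (n,).
        """
        x = x_gt / x_gt.norm()  # remove scale ambiguity
        r = A_pred @ x          # residual is zero iff x lies in null(A_pred)
        return (r * r).sum()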

34. Models Genesis [PDF] Back to contents
  Zongwei Zhou, Vatsal Sodha, Jiaxuan Pang, Michael B. Gotway, Jianming Liang
Abstract: Transfer learning from natural image to medical image has been established as one of the most practical paradigms in deep learning for medical image analysis. To fit this paradigm, however, 3D imaging tasks in the most prominent imaging modalities (e.g., CT and MRI) have to be reformulated and solved in 2D, losing rich 3D anatomical information, thereby inevitably compromising its performance. To overcome this limitation, we have built a set of models, called Generic Autodidactic Models, nicknamed Models Genesis, because they are created ex nihilo (with no manual labeling), self-taught (learnt by self-supervision), and generic (served as source models for generating application-specific target models). Our extensive experiments demonstrate that our Models Genesis significantly outperform learning from scratch in all five target 3D applications covering both segmentation and classification. More importantly, learning a model from scratch simply in 3D may not necessarily yield performance better than transfer learning from ImageNet in 2D, but our Models Genesis consistently top any 2D/2.5D approaches including fine-tuning the models pre-trained from ImageNet as well as fine-tuning the 2D versions of our Models Genesis, confirming the importance of 3D anatomical information and significance of Models Genesis for 3D medical imaging. This performance is attributed to our unified self-supervised learning framework, built on a simple yet powerful observation: the sophisticated and recurrent anatomy in medical images can serve as strong yet free supervision signals for deep models to learn common anatomical representation automatically via self-supervision. As open science, all codes and pre-trained Models Genesis are available at this https URL

35. Iteratively Pruned Deep Learning Ensembles for COVID-19 Detection in Chest X-rays [PDF] Back to contents
  Sivaramakrishnan Rajaraman, Jen Siegelman, Philip O. Alderson, Lucas S. Folio, Les R. Folio, Sameer K. Antani
Abstract: We demonstrate use of iteratively pruned deep learning model ensembles for detecting the coronavirus disease 2019 (COVID-19) infection with chest X-rays (CXRs). The disease is caused by the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) virus, also known as the novel Coronavirus (2019-nCoV). A custom convolutional neural network (CNN) and a selection of pretrained CNN models are trained on publicly available CXR collections to learn CXR modality-specific feature representations and the learned knowledge is transferred and fine-tuned to improve performance and generalization in the related task of classifying normal, bacterial pneumonia, and CXRs exhibiting COVID-19 abnormalities. The best performing models are iteratively pruned to identify optimal number of neurons in the convolutional layers to reduce complexity and improve memory efficiency. The predictions of the best-performing pruned models are combined through different ensemble strategies to improve classification performance. The custom and pretrained CNNs are evaluated at the patient-level to alleviate issues due to information leakage and reduce generalization errors. Empirical evaluations demonstrate that the weighted average of the best-performing pruned models significantly improves performance resulting in an accuracy of 99.01% and area under the curve (AUC) of 0.9972 in detecting COVID-19 findings on CXRs as compared to the individual constituent models. The combined use of modality-specific knowledge transfer, iterative model pruning, and ensemble learning resulted in improved predictions. We expect that this model can be quickly adopted for COVID-19 screening using chest radiographs.
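The final ensembling step is a weighted average of per-model class probabilities. A minimal NumPy sketch; the abstract implies the weights are derived from validation performance, so they are assumed given here.

    import numpy as np

    def weighted_average_ensemble(prob_list, weights):
        """Combine per-model probability vectors by a weighted average.

        prob_list: list of (n_classes,) probability vectors, one per model.
        weights:   per-model weights (e.g., validation scores), any scale.
        """
        w = np.asarray(weights, dtype=np.float64)
        return np.average(np.stack(prob_list), axis=0, weights=w / w.sum())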

36. Robotic Room Traversal using Optical Range Finding [PDF] Back to contents
  Cole Smith, Eric Lin, Dennis Shasha
Abstract: Consider the goal of visiting every part of a room that is not blocked by obstacles. Doing so efficiently requires both sensors and planning. Our findings suggest a method of inexpensive optical range finding for robotic room traversal. Our room traversal algorithm relies upon the approximate distance from the robot to the nearest obstacle in 360 degrees. We then choose the path with the furthest approximate distance. Since millimeter-precision is not required for our problem, we have opted to develop our own laser range finding solution, in lieu of using more common, but also expensive solutions like light detection and ranging (LIDAR). Rather, our solution uses a laser that casts a visible dot on the target and a common camera (an iPhone, for example). Based upon where in the camera frame the laser dot is detected, we may calculate an angle between our target and the laser aperture. Using this angle and the known distance between the camera eye and the laser aperture, we may solve all sides of a trigonometric model which provides the distance between the robot and the target.
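The trigonometric model reduces to a right triangle between the camera eye, the laser aperture, and the dot. A sketch, assuming a pinhole camera and a laser beam parallel to the optical axis; names and units are illustrative.

    import math

    def range_from_laser_dot(px_offset, focal_px, baseline_m):
        """Triangulated distance to the laser dot (sketch).

        px_offset:  dot's pixel offset from the image centre.
        focal_px:   focal length expressed in pixels.
        baseline_m: camera-eye-to-laser-aperture distance in metres.
        """
        theta = math.atan2(abs(px_offset), focal_px)  # dot ray vs. optical axis
        if theta == 0.0:
            return math.inf  # dot at the centre: target effectively at infinity
        return baseline_m / math.tan(theta)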

37. Complexity Analysis of an Edge Preserving CNN SAR Despeckling Algorithm [PDF] Back to contents
  Sergio Vitale, Giampaolo Ferraioli, Vito Pascazio
Abstract: SAR images are affected by multiplicative noise that impairs their interpretation. Several methods for SAR denoising have been proposed over the last decades, and in recent years attention has shifted towards deep-learning-based solutions. Building on our previously proposed convolutional neural network for SAR despeckling, here we investigate the effect of the network's complexity. More precisely, once a dataset has been fixed, we analyze network performance with respect to the number of layers and the number of features the network is composed of. Evaluations on simulated and real data are carried out. The results show that deeper networks generalize better on both simulated and real images.

38. MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [PDF] Back to contents
  Siddharth Tourani, Alexander Shekhovtsov, Carsten Rother, Bogdan Savchynskyy
Abstract: Dense, discrete Graphical Models with pairwise potentials are a powerful class of models which are employed in state-of-the-art computer vision and bio-imaging applications. This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, by making a small change to the low-performing solver, the Max Product Linear Programming (MPLP) algorithm, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin, including the state-of-the-art solver Tree-Reweighted Sequential (TRWS) message-passing algorithm. Additionally, our solver is highly parallel, in contrast to TRWS, which gives a further boost in performance with the proposed GPU and multi-thread CPU implementations. We verify the superiority of our algorithm on dense problems from publicly available benchmarks, as well, as a new benchmark for 6D Object Pose estimation. We also provide an ablation study with respect to graph density.

39. A Cross-Stitch Architecture for Joint Registration and Segmentation in Adaptive Radiotherapy [PDF] Back to contents
  Laurens Beljaards, Mohamed S. Elmahdy, Fons Verbeek, Marius Staring
Abstract: Recently, joint registration and segmentation has been formulated in a deep learning setting, by the definition of joint loss functions. In this work, we investigate joining these tasks at the architectural level. We propose a registration network that integrates segmentation propagation between images, and a segmentation network to predict the segmentation directly. These networks are connected into a single joint architecture via so-called cross-stitch units, allowing information to be exchanged between the tasks in a learnable manner. The proposed method is evaluated in the context of adaptive image-guided radiotherapy, using daily prostate CT imaging. Two datasets from different institutes and manufacturers were involved in the study. The first dataset was used for training (12 patients) and validation (6 patients), while the second dataset was used as an independent test set (14 patients). In terms of mean surface distance, our approach achieved $1.06 \pm 0.3$ mm, $0.91 \pm 0.4$ mm, $1.27 \pm 0.4$ mm, and $1.76 \pm 0.8$ mm on the validation set and $1.82 \pm 2.4$ mm, $2.45 \pm 2.4$ mm, $2.45 \pm 5.0$ mm, and $2.57 \pm 2.3$ mm on the test set for the prostate, bladder, seminal vesicles, and rectum, respectively. The proposed multi-task network outperformed single-task networks, as well as a network only joined through the loss function, thus demonstrating the capability to leverage the individual strengths of the segmentation and registration tasks. The obtained performance as well as the inference speed make this a promising candidate for daily re-contouring in adaptive radiotherapy, potentially reducing treatment-related side effects and improving quality-of-life after treatment.
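A cross-stitch unit is a small learnable mixing of activations between the two task streams. A minimal PyTorch sketch of one unit; the near-identity initialization is an illustrative choice.

    import torch
    import torch.nn as nn

    class CrossStitchUnit(nn.Module):
        """Learnable linear exchange of activations between two task streams."""

        def __init__(self):
            super().__init__()
            # Near-identity init: each task initially keeps mostly its own features.
            self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

        def forward(self, x_reg, x_seg):
            out_reg = self.alpha[0, 0] * x_reg + self.alpha[0, 1] * x_seg
            out_seg = self.alpha[1, 0] * x_reg + self.alpha[1, 1] * x_seg
            return out_reg, out_seg

Stacking such units at several depths lets the optimizer decide, per layer, how much the registration and segmentation streams should share.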

40. Triplet Loss for Knowledge Distillation [PDF] Back to contents
  Hideki Oki, Motoshi Abe, Junichi Miyao, Takio Kurita
Abstract: In recent years, deep learning has spread rapidly, and deeper, larger models have been proposed. However, the calculation cost becomes enormous as the size of the models grows. Various techniques for compressing model size have been proposed to improve performance while reducing computational costs. One such method is knowledge distillation (KD), a technique for transferring the knowledge of deep or ensemble models with many parameters (teacher model) to smaller, shallower models (student model). Since the purpose of knowledge distillation is to increase the similarity between the teacher model and the student model, we propose to introduce the concept of metric learning into knowledge distillation to make the student model closer to the teacher model using pairs or triplets of the training samples. In metric learning, researchers develop methods for building models that increase the similarity of outputs for similar samples. Metric learning aims at reducing the distance between similar samples and increasing the distance between dissimilar ones. This ability to reduce the differences between similar outputs can be used in knowledge distillation to reduce the differences between the outputs of the teacher model and the student model. Since the outputs of the teacher model for different objects are usually different, the student model needs to distinguish them. We think that metric learning can clarify the differences between the different outputs, and that the performance of the student model can thereby be improved. We have performed experiments to compare the proposed method with state-of-the-art knowledge distillation methods.
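One way to combine the two ideas is a triplet loss that pulls the student's embedding of a sample toward the teacher's embedding of the same sample (positive) and away from the teacher's embedding of a different sample (negative). A PyTorch sketch with an illustrative margin; the paper's exact formulation may differ.

    import torch.nn.functional as F

    def triplet_kd_loss(student_emb, teacher_pos, teacher_neg, margin=1.0):
        """Triplet loss for knowledge distillation (sketch); inputs are (N, D)."""
        d_pos = F.pairwise_distance(student_emb, teacher_pos)  # same input
        d_neg = F.pairwise_distance(student_emb, teacher_neg)  # different input
        return F.relu(d_pos - d_neg + margin).mean()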

41. LiteDenseNet: A Lightweight Network for Hyperspectral Image Classification [PDF] Back to contents
  Rui Li, Chenxi Duan
Abstract: Hyperspectral Image (HSI) classification based on deep learning has been an attractive area in recent years. However, as a kind of data-driven algorithm, deep learning methods usually require numerous computational resources and a high-quality labelled dataset, while high-performance computing and data annotation are expensive. In this paper, to reduce dependence on massive calculation and labelled samples, we propose a lightweight network architecture (LiteDenseNet) based on DenseNet for Hyperspectral Image Classification. Inspired by GoogLeNet and PeleeNet, we design a 3D two-way dense layer to capture the local and global features of the input. As convolution is a computationally intensive operation, we introduce group convolution to further decrease calculation cost and parameter size. Thus, the number of parameters and the computational consumption are observably lower than those of comparable deep learning methods, which means LiteDenseNet has a simpler architecture and higher efficiency. A series of quantitative experiments on 6 widely used hyperspectral datasets show that the proposed LiteDenseNet obtains state-of-the-art performance, even when labelled samples are severely scarce.
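Group convolution's parameter saving is easy to verify: splitting the channels into g groups divides the weight count by g. A quick PyTorch check with illustrative channel sizes:

    import torch.nn as nn

    def n_params(m):
        return sum(p.numel() for p in m.parameters())

    dense = nn.Conv2d(64, 128, kernel_size=3, bias=False)
    grouped = nn.Conv2d(64, 128, kernel_size=3, groups=4, bias=False)
    print(n_params(dense), n_params(grouped))  # 73728 vs 18432: a 4x reduction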

42. Cascaded Context Enhancement for Automated Skin Lesion Segmentation [PDF] Back to contents
  Ruxin Wang, Shuyuan Chen, Jianping Fan, Ye Li
Abstract: Skin lesion segmentation is an important step for automated melanoma diagnosis. Due to the non-negligible diversity of lesions from different patients, extracting powerful context for fine-grained semantic segmentation is still challenging today. In this paper, we formulate a cascaded context enhancement neural network for skin lesion segmentation. The proposed method adopts an encoder-decoder architecture, and a new cascaded context aggregation (CCA) module with a gate-based information integration approach is proposed to sequentially and selectively aggregate original-image and encoder network features from low level to high level. The generated context is further utilized to guide discriminative feature extraction through the designed context-guided local affinity module. Furthermore, an auxiliary loss is added to the CCA module for refining the prediction. In our work, we evaluate our approach on three public datasets. We achieve a Jaccard Index (JA) of 87.1%, 80.3% and 86.6% on the ISIC-2016, ISIC-2017 and PH2 datasets respectively, higher than other state-of-the-art methods.

43. Meta-Meta-Classification for One-Shot Learning [PDF] Back to contents
  Arkabandhu Chowdhury, Dipak Chaudhari, Swarat Chaudhuri, Chris Jermaine
Abstract: We present a new approach, called meta-meta-classification, to learning in small-data settings. In this approach, one uses a large set of learning problems to design an ensemble of learners, where each learner has high bias and low variance and is skilled at solving a specific type of learning problem. The meta-meta classifier learns how to examine a given learning problem and combine the various learners to solve the problem. The meta-meta-learning approach is especially suited to solving few-shot learning tasks, as it is easier to learn to classify a new learning problem with little data than it is to apply a learning algorithm to a small data set. We evaluate the approach on a one-shot, one-class-versus-all classification task and show that it is able to outperform traditional meta-learning as well as ensembling approaches.

44. A New Modified Deep Convolutional Neural Network for Detecting COVID-19 from X-ray Images [PDF] Back to contents
  Mohammad Rahimzadeh, Abolfazl Attar
Abstract: COVID-19 has become a serious health problem all around the world. At the time of writing, the virus was confirmed to have taken over 126,607 lives. Since the beginning of its spread, many Artificial Intelligence researchers have developed systems and methods for predicting the virus's behavior or detecting the infection. One possible way of determining patient infection with COVID-19 is by analyzing chest X-ray images. As there are a large number of patients in hospitals, it would be time-consuming and difficult to examine many X-ray images manually, so it can be very useful to develop an AI network that does this job automatically. In this paper, we have trained several deep convolutional networks with the introduced training techniques for classifying X-ray images into three classes: normal, pneumonia, and COVID-19, based on two open-source datasets. Unfortunately, most of the previous works on this subject have not shared their datasets, and we had to work with few data on COVID-19 cases. Our data contain 180 X-ray images that belong to persons infected with COVID-19, so we tried to apply methods to achieve the best possible results. In this research, we introduce training techniques that help the network learn better when only few COVID-19 cases are available, and we also propose a neural network that is a concatenation of the Xception and ResNet50V2 networks. This network achieved the best accuracy by utilizing multiple features extracted by two robust networks. Unlike some other studies, we have tested our network on 11,302 images to report the actual accuracy it can achieve in real circumstances. The average accuracy of the proposed network for detecting COVID-19 cases is 99.56%, and the overall average accuracy for all classes is 91.4%.
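The backbone fusion can be sketched in Keras: run both pretrained networks on the same input and concatenate their pooled features before a classifier head. The input size and the single dense head below are illustrative assumptions; the paper's exact fusion may differ.

    from tensorflow.keras import Input, Model, layers
    from tensorflow.keras.applications import ResNet50V2, Xception

    inp = Input(shape=(299, 299, 3))  # illustrative input resolution
    f1 = Xception(include_top=False, pooling="avg")(inp)
    f2 = ResNet50V2(include_top=False, pooling="avg")(inp)
    fused = layers.Concatenate()([f1, f2])  # features from both backbones
    out = layers.Dense(3, activation="softmax")(fused)  # normal/pneumonia/COVID-19
    model = Model(inp, out)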

45. Smartphone camera based pointer [PDF] Back to contents
  Predrag Lazic
Abstract: Large screen displays are omnipresent today as a part of infrastructure for presentations and entertainment. Also powerful smartphones with integrated camera(s) are ubiquitous. However, there are not many ways in which smartphones and screens can interact besides casting the video from a smartphone. In this paper, we present a novel idea that turns a smartphone into a direct virtual pointer on the screen using the phone's camera. The idea and its implementation are simple, robust, efficient and fun to use. Besides the mathematical concepts of the idea we accompany the paper with a small javascript project (this http URL) which demonstrates the possibility of the new interaction technique presented as a massive multiplayer game in the HTML5 framework.

46. Developing and Deploying Machine Learning Pipelines against Real-Time Image Streams from the PACS [PDF] Back to contents
  Pradeeban Kathiravelu, Ashish Sharma, Saptarshi Purkayastha, Priyanshu Sinha, Alexandre Cadrin-Chenevert, Imon Banerjee, Judy Wawira Gichoya
Abstract: Executing machine learning (ML) pipelines on radiology images is hard due to limited computing resources in clinical environments, whereas running them in research clusters in real-time requires efficient data transfer capabilities. We propose Niffler, an integrated ML framework that runs in research clusters that receives radiology images in real-time from hospitals' Picture Archiving and Communication Systems (PACS). Niffler consists of an inter-domain data streaming approach that exploits the Digital Imaging and Communications in Medicine (DICOM) protocol to fetch data from the PACS to the data processing servers for executing the ML pipelines. It provides metadata extraction capabilities and Application programming interfaces (APIs) to apply filters on the DICOM images and run the ML pipelines. The outcomes of the ML pipelines can then be shared back with the end-users in a de-identified manner. Evaluations on the Niffler prototype highlight the feasibility and efficiency in running the ML pipelines in real-time from a research cluster on the images received in real-time from hospital PACS.

47. Targeted Attack for Deep Hashing based Retrieval [PDF] Back to contents
  Jiawang Bai, Bin Chen, Yiming Li, Dongxian Wu, Weiwei Guo, Shu-tao Xia, En-hui Yang
Abstract: The deep hashing based retrieval method is widely adopted in large-scale image and video retrieval. However, there is little investigation on its security. In this paper, we propose a novel method, dubbed deep hashing targeted attack (DHTA), to study the targeted attack on such retrieval. Specifically, we first formulate the targeted attack as a point-to-set optimization, which minimizes the average distance between the hash code of an adversarial example and those of a set of objects with the target label. Then we design a novel component-voting scheme to obtain an anchor code as the representative of the set of hash codes of objects with the target label, whose optimality guarantee is also theoretically derived. To balance the performance and perceptibility, we propose to minimize the Hamming distance between the hash code of the adversarial example and the anchor code under the $\ell^\infty$ restriction on the perturbation. Extensive experiments verify that DHTA is effective in attacking both deep hashing based image retrieval and video retrieval.
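The component-voting scheme amounts to a per-bit majority vote over the hash codes of objects carrying the target label. A NumPy sketch with ±1 codes; the attack then perturbs the input to pull its hash code toward this anchor under the ℓ∞ constraint.

    import numpy as np

    def anchor_code(codes):
        """Per-bit majority vote over {-1, +1} hash codes (sketch).

        codes: (n_objects, n_bits) array of codes sharing the target label.
        """
        votes = np.sign(np.asarray(codes).sum(axis=0))
        votes[votes == 0] = 1  # break ties arbitrarily toward +1
        return votes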

48. Learning visual policies for building 3D shape categories [PDF] Back to contents
  Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Cordelia Schmid
Abstract: Manipulation and assembly tasks require non-trivial planning of actions depending on the environment and the final goal. Previous work in this domain often assembles particular instances of objects from known sets of primitives. In contrast, we here aim to handle varying sets of primitives and to construct different objects of the same shape category. Given a single object instance of a category, e.g. an arch, and a binary shape classifier, we learn a visual policy to assemble other instances of the same category. In particular, we propose a disassembly procedure and learn a state policy that discovers new object instances and their assembly plans in state space. We then render simulated states in the observation space and learn a heatmap representation to predict alternative actions from a given input image. To validate our approach, we first demonstrate its efficiency for building object categories in state space. We then show the success of our visual policies for building arches from different primitives. Moreover, we demonstrate (i) the reactive ability of our method to re-assemble objects using additional primitives and (ii) the robust performance of our policy for unseen primitives resembling building blocks used during training. Our visual assembly policies are trained with no real images and reach up to 95% success rate when evaluated on a real robot.

49. Deep Neural Network (DNN) for Water/Fat Separation: Supervised Training, Unsupervised Training, and No Training [PDF] Back to contents
  R. Jafari, P. Spincemaille, J. Zhang, T. D. Nguyen, M. R. Prince, X. Luo, J. Cho, D. Margolis, Y. Wang
Abstract: Purpose: To use a deep neural network (DNN) for solving the optimization problem of water/fat separation and to compare supervised and unsupervised training. Methods: The current T2*-IDEAL algorithm for solving fat/water separation is dependent on initialization. Recently, deep neural networks (DNN) have been proposed to solve fat/water separation without the need for suitable initialization. However, this approach requires supervised training of DNN (STD) using the reference fat/water separation images. Here we propose two novel DNN water/fat separation methods 1) unsupervised training of DNN (UTD) using the physical forward problem as the cost function during training, and 2) no-training of DNN (NTD) using physical cost and backpropagation to directly reconstruct a single dataset. The STD, UTD and NTD methods were compared with the reference T2*-IDEAL. Results: All DNN methods generated consistent water/fat separation results that agreed well with T2*-IDEAL under proper initialization. Conclusion: The water/fat separation problem can be solved using unsupervised deep neural networks.
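The unsupervised-training idea is generic: use the physical forward model itself as the training loss, comparing the re-synthesized measurement against the acquired signal, so no reference separation is needed. A schematic PyTorch loss in which `forward_model` is a placeholder for the actual differentiable water/fat signal equation (an assumption, not the paper's code):

    import torch

    def physics_loss(net, signal, forward_model):
        """Self-supervised cost for water/fat separation (sketch).

        net:           maps the measured signal to water/fat/field parameters.
        signal:        measured multi-echo data (tensor).
        forward_model: differentiable signal equation, assumed supplied.
        """
        params = net(signal)
        resynth = forward_model(params)  # project parameters back to data space
        return torch.mean((resynth - signal) ** 2)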

50. Divergent Search for Few-Shot Image Classification [PDF] Back to contents
  Jeremy Tan, Bernhard Kainz
Abstract: When data is unlabelled and the target task is not known a priori, divergent search offers a strategy for learning a wide range of skills. Having such a repertoire allows a system to adapt to new, unforeseen tasks. Unlabelled image data is plentiful, but it is not always known which features will be required for downstream tasks. We propose a method for divergent search in the few-shot image classification setting and evaluate with Omniglot and Mini-ImageNet. This high-dimensional behavior space includes all possible ways of partitioning the data. To manage divergent search in this space, we rely on a meta-learning framework to integrate useful features from diverse tasks into a single model. The final layer of this model is used as an index into the `archive' of all past behaviors. We search for regions in the behavior space that the current archive cannot reach. As expected, divergent search is outperformed by models with a strong bias toward the evaluation tasks. But it is able to match and sometimes exceed the performance of models that have a weak bias toward the target task or none at all. This demonstrates that divergent search is a viable approach, even in high-dimensional behavior spaces.

51. Symmetry as an Organizing Principle for Geometric Intelligence [PDF] Back to contents
  Snejana Sheghava, Ashok Goel
Abstract: The exploration of geometrical patterns stimulates imagination and encourages abstract reasoning which is a distinctive feature of human intelligence. In cognitive science, Gestalt principles such as symmetry have often explained significant aspects of human perception. We present a computational technique for building artificial intelligence (AI) agents that use symmetry as the organizing principle for addressing Dehaene's test of geometric intelligence \cite{dehaene2006core}. The performance of our model is on par with extant AI models of problem solving on the Dehaene's test and seems correlated with some elements of human behavior on the same test.
