Table of Contents
2. A Study of Domain Generalization on Ultrasound-based Multi-Class Segmentation of Arteries, Veins, Ligaments, and Nerves Using Transfer Learning [PDF] Abstract
5. Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection [PDF] Abstract
9. Discriminative Feature Representation with Spatio-temporal Cues for Vehicle Re-identification [PDF] Abstract
13. Deep Template Matching for Pedestrian Attribute Recognition with the Auxiliary Supervision of Attribute-wise Keypoints [PDF] Abstract
15. Fast and Scalable Earth Texture Synthesis using Spatially Assembled Generative Adversarial Neural Networks [PDF] Abstract
16. Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [PDF] Abstract
20. Adversarial Robustness Against Image Color Transformation within Parametric Filter Space [PDF] Abstract
22. Empirical Performance Analysis of Conventional Deep Learning Models for Recognition of Objects in 2-D Images [PDF] Abstract
25. Relative Drone -- Ground Vehicle Localization using LiDAR and Fisheye Cameras through Direct and Indirect Observations [PDF] Abstract
28. LEAN: graph-based pruning for convolutional neural networks by extracting longest chains [PDF] Abstract
30. Unified Multi-Modal Landmark Tracking for Tightly Coupled Lidar-Visual-Inertial Odometry [PDF] Abstract
34. Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation [PDF] Abstract
Abstracts
1. Using Graph Neural Networks to Reconstruct Ancient Documents [PDF] Back to Table of Contents
Cecilia Ostertag, Marie Beurton-Aimar
Abstract: In recent years, machine learning and deep learning approaches such as artificial neural networks have gained popularity for solving automatic puzzle resolution problems. Indeed, these methods are able to extract high-level representations from images, and can then be trained to separate matching image pieces from non-matching ones. These applications have many similarities to the problem of ancient document reconstruction from partially recovered fragments. In this work we present a solution based on a Graph Neural Network, using pairwise patch information to assign labels to edges representing the spatial relationships between pairs. This network classifies the relationship between a source and a target patch as being one of Up, Down, Left, Right or None. By doing so for all edges, our model outputs a new graph representing a reconstruction proposal. Finally, we show that our model is not only able to provide correct classifications at the edge level, but also to generate partial or full reconstruction graphs from a set of patches.
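A minimal sketch of the edge-labelling step described above, assuming patch features have already been extracted (the paper's exact GNN architecture is not given in the abstract; the MLP head, dimensions, and names here are illustrative):

```python
import torch
import torch.nn as nn

RELATIONS = ["Up", "Down", "Left", "Right", "None"]

class EdgeClassifier(nn.Module):
    def __init__(self, patch_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * patch_dim, 256), nn.ReLU(),
            nn.Linear(256, len(RELATIONS)))

    def forward(self, src, dst):
        # src, dst: (num_edges, patch_dim) embeddings of patch pairs
        return self.mlp(torch.cat([src, dst], dim=-1))

# Labelling every directed edge yields a graph that serves as a
# reconstruction proposal.
logits = EdgeClassifier()(torch.randn(10, 128), torch.randn(10, 128))
proposal = [RELATIONS[i] for i in logits.argmax(dim=-1).tolist()]
```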
2. A Study of Domain Generalization on Ultrasound-based Multi-Class Segmentation of Arteries, Veins, Ligaments, and Nerves Using Transfer Learning [PDF] Back to Table of Contents
Edward Chen, Tejas Sudharshan Mathai, Vinit Sarode, Howie Choset, John Galeotti
Abstract: Identifying landmarks in the femoral area is crucial for ultrasound (US) -based robot-guided catheter insertion, and their presentation varies when imaged with different scanners. As such, the performance of past deep learning-based approaches is also narrowly limited to the training data distribution; this can be circumvented by fine-tuning all or part of the model, yet the effects of fine-tuning are seldom discussed. In this work, we study the US-based segmentation of multiple classes through transfer learning by fine-tuning different contiguous blocks within the model, and evaluating on a gamut of US data from different scanners and settings. We propose a simple method for predicting generalization on unseen datasets and observe statistically significant differences between the fine-tuning methods while working towards domain generalization.
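A hedged sketch of the fine-tuning protocol under study: freeze the whole model, then unfreeze one contiguous block of layers. The block boundaries and the toy segmentation network below are assumptions, not the paper's architecture:

```python
import torch.nn as nn

def finetune_contiguous_block(model: nn.Sequential, start: int, end: int):
    # Freeze everything, then re-enable gradients for layers [start, end).
    for p in model.parameters():
        p.requires_grad = False
    for layer in list(model.children())[start:end]:
        for p in layer.parameters():
            p.requires_grad = True

seg_net = nn.Sequential(nn.Conv2d(1, 16, 3), nn.ReLU(),
                        nn.Conv2d(16, 16, 3), nn.ReLU(),
                        nn.Conv2d(16, 4, 1))   # 4 classes: artery, vein, ligament, nerve
finetune_contiguous_block(seg_net, 2, 4)       # adapt only the middle block
```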
3. NightVision: Generating Nighttime Satellite Imagery from Infra-Red Observations [PDF] Back to Table of Contents
Paula Harder, William Jones, Redouane Lguensat, Shahine Bouabid, James Fulton, Dánell Quesada-Chacón, Aris Marcolongo, Sofija Stefanović, Yuhan Rao, Peter Manshausen, Duncan Watson-Parris
Abstract: Recent applications of machine learning to satellite imagery often rely on visible images and therefore suffer from a lack of data during the night. The gap can be filled by employing available infra-red observations to generate visible images. This work presents how deep learning can be applied successfully to create those images by using U-Net based architectures. The proposed methods show promising results, achieving a structural similarity index (SSIM) of up to 86\% on an independent test set and providing visually convincing output images generated from infra-red observations.
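The headline metric is SSIM; a small sketch of how such an evaluation could be computed with scikit-image (the arrays below are stand-ins, not the paper's data):

```python
import numpy as np
from skimage.metrics import structural_similarity

generated = np.random.rand(256, 256).astype(np.float32)  # stand-in for U-Net output
target = np.random.rand(256, 256).astype(np.float32)     # stand-in visible image
score = structural_similarity(generated, target, data_range=1.0)
print(f"SSIM: {score:.2%}")  # the paper reports up to 86% on a held-out set
```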
4. Multi-layered tensor networks for image classification [PDF] Back to Table of Contents
Raghavendra Selvan, Silas Ørting, Erik B Dam
Abstract: The recently introduced locally orderless tensor network (LoTeNet) for supervised image classification uses matrix product state (MPS) operations on grids of transformed image patches. The resulting patch representations are combined back together into the image space and aggregated hierarchically using multiple MPS blocks per layer to obtain the final decision rules. In this work, we propose a non-patch based modification to LoTeNet that performs one MPS operation per layer, instead of several patch-level operations. The spatial information in the input images to MPS blocks at each layer is squeezed into the feature dimension, similar to LoTeNet, to maximise retained spatial correlation between pixels when images are flattened into 1D vectors. The proposed multi-layered tensor network (MLTN) is capable of learning linear decision boundaries in high dimensional spaces in a multi-layered setting, which results in a reduction in the computation cost compared to LoTeNet without any degradation in performance.
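An illustrative sketch of the described "squeeze" of spatial information into the feature dimension before flattening images into 1D vectors for an MPS layer; the block size and shapes are assumptions:

```python
import torch

x = torch.randn(8, 1, 28, 28)                # batch of images
k = 2                                        # side of each spatial block
b, c, h, w = x.shape
x = x.reshape(b, c, h // k, k, w // k, k)    # split H and W into k x k blocks
x = x.permute(0, 2, 4, 1, 3, 5)              # move block pixels next to channels
x = x.reshape(b, (h // k) * (w // k), c * k * k)  # (batch, sites, feature_dim)
# Each of the (h/k)*(w/k) "sites" now carries a local feature vector that
# an MPS block can contract sequentially, preserving local pixel correlations.
```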
5. Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection [PDF] Back to Table of Contents
Faisal Alamri, Sinan Kalkan, Nicolas Pugeault
Abstract: Deep neural network approaches have demonstrated high performance in object recognition (CNN) and detection (Faster-RCNN) tasks, but experiments have shown that such architectures are vulnerable to adversarial attacks (FFF, UAP): low amplitude perturbations, barely perceptible by the human eye, can lead to a drastic reduction in labeling performance. This article proposes a new context module, called \textit{Transformer-Encoder Detector Module}, that can be applied to an object detector to (i) improve the labeling of object instances; and (ii) improve the detector's robustness to adversarial attacks. The proposed model achieves higher mAP, F1 scores and AUC average score of up to 13\% compared to the baseline Faster-RCNN detector, and an mAP score 8 points higher on images subjected to FFF or UAP attacks, due to the inclusion of both contextual and visual features extracted from the scene and encoded into the model. These results demonstrate that a simple ad-hoc context module can improve the reliability of object detectors significantly.
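A hedged sketch of a context module in the spirit described above: a transformer encoder mixes per-region features so that each detection is conditioned on scene context. The dimensions and the fusion point are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

d_model = 256
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2)

region_feats = torch.randn(1, 100, d_model)     # e.g., 100 RoI features from Faster-RCNN
context_feats = encoder(region_feats)           # each region now attends to all others
logits = nn.Linear(d_model, 81)(context_feats)  # illustrative class head
```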
6. Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis [PDF] Back to Table of Contents
Daniel Seichter, Mona Köhler, Benjamin Lewandowski, Tim Wengefeld, Horst-Michael Gross
Abstract: Analyzing scenes thoroughly is crucial for mobile robots acting in different environments. Semantic segmentation can enhance various subsequent tasks, such as (semantically assisted) person perception, (semantic) free space detection, (semantic) mapping, and (semantic) navigation. In this paper, we propose an efficient and robust RGB-D segmentation approach that can be optimized to a high degree using NVIDIA TensorRT and, thus, is well suited as a common initial processing step in a complex system for scene analysis on mobile robots. We show that RGB-D segmentation is superior to processing RGB images solely and that it can still be performed in real time if the network architecture is carefully designed. We evaluate our proposed Efficient Scene Analysis Network (ESANet) on the common indoor datasets NYUv2 and SUNRGB-D and show that it reaches state-of-the-art performance when considering both segmentation performance and runtime. Furthermore, our evaluation on the outdoor dataset Cityscapes shows that our approach is suitable for other areas of application as well. Finally, instead of presenting benchmark results only, we show qualitative results in one of our indoor application scenarios.
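A minimal two-branch sketch of RGB-D fusion for segmentation; the real ESANet uses ResNet-based encoders and a learned fusion module, so this only illustrates the idea of a depth branch complementing RGB:

```python
import torch
import torch.nn as nn

rgb_enc = nn.Conv2d(3, 32, 3, padding=1)     # toy RGB encoder
depth_enc = nn.Conv2d(1, 32, 3, padding=1)   # toy depth encoder
head = nn.Conv2d(32, 40, 1)                  # e.g., 40 NYUv2 classes

rgb = torch.randn(1, 3, 480, 640)
depth = torch.randn(1, 1, 480, 640)
fused = rgb_enc(rgb) + depth_enc(depth)      # element-wise fusion of modalities
logits = head(fused)                         # per-pixel class scores
```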
7. A Study of Image Pre-processing for Faster Object Recognition [PDF] Back to Table of Contents
Md Tanzil Shahriar, Huyue Li
Abstract: Quality of image always plays a vital role in increasing the object recognition or classification rate. A good quality image gives a better recognition or classification rate than any unprocessed noisy image. It is more difficult to extract features from such unprocessed images, which in turn reduces the object recognition or classification rate. To overcome problems that occur due to low image quality, pre-processing is typically done before extracting features from the image. Our project proposes an image pre-processing method, so that the performance of selected Machine Learning algorithms or Deep Learning algorithms increases in terms of increased accuracy or a reduced number of training images. In the latter part, we compare the performance results of using our method with previously used approaches.
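The abstract does not spell out the pipeline, so the following is a generic pre-processing sketch (denoising plus contrast normalization) of the kind such a study would compare, using standard OpenCV calls:

```python
import cv2
import numpy as np

img = (np.random.rand(224, 224) * 255).astype(np.uint8)  # stand-in grayscale image
denoised = cv2.GaussianBlur(img, (5, 5), 0)        # suppress sensor noise
equalized = cv2.equalizeHist(denoised)             # spread the intensity histogram
normalized = equalized.astype(np.float32) / 255.0  # scale for the classifier
```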
8. Image Animation with Perturbed Masks [PDF] Back to Table of Contents
Yoav Shalev, Lior Wolf
Abstract: We present a novel approach for image-animation of a source image by a driving video, both depicting the same type of object. We do not assume the existence of pose models and our method is able to animate arbitrary objects without knowledge of the object's structure. Furthermore, both the driving video and the source image are only seen during test-time. Our method is based on a shared mask generator, which separates the foreground object from its background, and captures the object's general pose and shape. A mask-refinement module then replaces, in the mask extracted from the driver image, the identity of the driver with the identity of the source. Conditioned on the source image, the transformed mask is then decoded by a multi-scale generator that renders a realistic image, in which the content of the source frame is animated by the pose in the driving video. Due to lack of fully supervised data, we train on the task of reconstructing frames from the same video the source image is taken from. In order to control the source of the identity of the output frame, we employ during training perturbations that remove the unwanted identity information. Our method is shown to greatly outperform the state-of-the-art methods on multiple benchmarks. Our code and samples are available at this https URL.
9. Discriminative Feature Representation with Spatio-temporal Cues for Vehicle Re-identification [PDF] Back to Table of Contents
J. Tu, C. Chen, X. Huang, J. He, X. Guan
Abstract: Vehicle re-identification (re-ID) aims to discover and match the target vehicles from a gallery image set taken by different cameras on a wide range of road networks. It is crucial for lots of applications such as security surveillance and traffic management. The remarkably similar appearances of distinct vehicles and the significant changes of viewpoints and illumination conditions pose grand challenges to vehicle re-ID. Conventional solutions focus on designing global visual appearances without sufficient consideration of vehicles' spatiotemporal relationships in different images. In this paper, we propose a novel discriminative feature representation with spatiotemporal clues (DFR-ST) for vehicle re-ID. It is capable of building robust features in the embedding space by involving appearance and spatio-temporal information. Based on this multi-modal information, the proposed DFR-ST constructs an appearance model for a multi-grained visual representation by a two-stream architecture and a spatio-temporal metric to provide complementary information. Experimental results on two public datasets demonstrate DFR-ST outperforms the state-of-the-art methods, which validates the effectiveness of the proposed method.
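A hedged sketch of fusing the two cues named above: an appearance distance from the learned features combined with a spatio-temporal score. The fusion rule and weight are assumptions, not the paper's metric:

```python
import torch

def reid_distance(feat_a, feat_b, st_score, alpha=0.7):
    # Appearance term from embedding similarity; spatio-temporal term from
    # how plausible the camera/time transition is (st_score in [0, 1]).
    app = 1 - torch.nn.functional.cosine_similarity(feat_a, feat_b, dim=0)
    return alpha * app + (1 - alpha) * (1 - st_score)

d = reid_distance(torch.randn(256), torch.randn(256), st_score=torch.tensor(0.9))
```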
10. Transductive Zero-Shot Learning using Cross-Modal CycleGAN [PDF] Back to Table of Contents
Patrick Bordes, Eloi Zablocki, Benjamin Piwowarski, Patrick Gallinari
Abstract: In Computer Vision, Zero-Shot Learning (ZSL) aims at classifying unseen classes -- classes for which no matching training image exists. Most ZSL works learn a cross-modal mapping between images and class labels for seen classes. However, the data distribution of seen and unseen classes might differ, causing a domain shift problem. Following this observation, transductive ZSL (T-ZSL) assumes that unseen classes and their associated images are known during training, but not their correspondence. As current T-ZSL approaches do not scale efficiently when the number of seen classes is high, we tackle this problem with a new model for T-ZSL based upon CycleGAN. Our model jointly (i) projects images on their seen class labels with a supervised objective and (ii) aligns unseen class labels and visual exemplars with adversarial and cycle-consistency objectives. We show the efficiency of our Cross-Modal CycleGAN model (CM-GAN) on the ImageNet T-ZSL task where we obtain state-of-the-art results. We further validate CM-GAN on a language grounding task, and on a new task that we propose: zero-shot sentence-to-image matching on MS COCO.
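A sketch of the cycle-consistency idea in the cross-modal setting: mapping image features to label embeddings and back should reconstruct the input. All networks and dimensions below are assumed for illustration:

```python
import torch
import torch.nn as nn

img2lab = nn.Linear(512, 300)   # image feature -> class-label embedding
lab2img = nn.Linear(300, 512)   # class-label embedding -> image feature

img_feat = torch.randn(16, 512)
# Round-tripping through both mappings should return the original feature.
cycle_loss = nn.functional.l1_loss(lab2img(img2lab(img_feat)), img_feat)
```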
11. LULC classification by semantic segmentation of satellite images using FastFCN [PDF] Back to Table of Contents
Md. Saif Hassan Onim, Aiman Rafeed Ehtesham, Amreen Anbar, A. K. M. Nazrul Islam, A. K. M. Mahbubur Rahman
Abstract: This paper analyses how well a Fast Fully Convolutional Network (FastFCN) semantically segments satellite images and thus classifies Land Use/Land Cover (LULC) classes. FastFCN was used on the Gaofen-2 Image Dataset (GID-2) to segment images into five different classes: BuiltUp, Meadow, Farmland, Water and Forest. The results showed better accuracy (0.93), precision (0.99), recall (0.98) and mean Intersection over Union (mIoU) (0.97) than other approaches such as FCN-8 or eCognition, a readily available software package. We present a comparison between the results. We propose FastFCN as a faster and more accurate automated method than other existing methods for LULC classification.
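A short sketch of the headline metric, mean Intersection over Union across the five LULC classes (BuiltUp, Meadow, Farmland, Water, Forest):

```python
import numpy as np

def mean_iou(pred, gt, num_classes=5):
    # Average per-class IoU over classes present in prediction or ground truth.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 5, (512, 512))  # stand-in segmentation maps
gt = np.random.randint(0, 5, (512, 512))
print(mean_iou(pred, gt))
```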
12. SHAD3S: A model to Sketch, Shade and Shadow [PDF] Back to Table of Contents
Raghav Brahmadesam Venkataramaiyer, Abhishek Joshi, Saisha Narang, Vinay P. Namboodiri
Abstract: Hatching is a common method used by artists to accentuate the third dimension of a sketch, and to illuminate the scene. Our system SHAD3S attempts to compete with a human at hatching generic three-dimensional (3D) shapes, and also tries to assist her in a form exploration exercise. The novelty of our approach lies in the fact that we make no assumptions about the input other than that it represents a 3D shape, and yet, given contextual information about illumination and texture, we synthesise an accurate hatch pattern over the sketch, without access to 3D or pseudo 3D. In the process, we contribute towards a) a cheap yet effective method to synthesise a sufficiently large high fidelity dataset, pertinent to the task; b) creating a pipeline with a conditional generative adversarial network (CGAN); and c) creating an interactive utility with GIMP, that is, a tool for artists to engage with automated hatching or a form-exploration exercise. User evaluation of the tool suggests that the model performance does generalise satisfactorily over diverse input, both in terms of style as well as shape. A simple comparison of inception scores suggests that the generated distribution is as diverse as the ground truth.
13. Deep Template Matching for Pedestrian Attribute Recognition with the Auxiliary Supervision of Attribute-wise Keypoints [PDF] Back to Table of Contents
Jiajun Zhang, Pengyuan Ren, Jianmin Li
Abstract: Pedestrian Attribute Recognition (PAR) has aroused extensive attention due to its important role in video surveillance scenarios. In most cases, the existence of a particular attribute is strongly related to a partial region. Recent works design complicated modules, e.g., attention mechanisms and body-part proposals, to localize the region corresponding to an attribute. These works further prove that precise localization of attribute-specific regions helps improve performance. However, these part-information-based methods are still not accurate and also increase model complexity, which makes them hard to deploy in realistic applications. In this paper, we propose a Deep Template Matching based method to capture body-part features with less computation. Further, we also propose an auxiliary supervision method that uses human pose keypoints to guide the learning toward discriminative local cues. Extensive experiments show that the proposed method outperforms the state-of-the-art approaches on large-scale pedestrian attribute datasets, including PETA, PA-100K, RAP, and RAPv2 zs, while having lower computational complexity.
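A hedged sketch of auxiliary supervision in the spirit described above: an attribute head trained jointly with a keypoint head whose loss nudges features toward the relevant local cues. All layers, class counts, and loss weights are assumptions:

```python
import torch
import torch.nn as nn

backbone = nn.Conv2d(3, 64, 3, padding=1)   # stand-in feature extractor
attr_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 26))
kpt_head = nn.Conv2d(64, 17, 1)             # per-keypoint heatmaps

x = torch.randn(2, 3, 256, 128)             # pedestrian crops
feat = backbone(x)
attr_loss = nn.functional.binary_cross_entropy_with_logits(
    attr_head(feat), torch.rand(2, 26).round())          # multi-label attributes
kpt_loss = nn.functional.mse_loss(kpt_head(feat), torch.rand(2, 17, 256, 128))
loss = attr_loss + 0.1 * kpt_loss           # auxiliary term; weight assumed
```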
14. Adaptive Future Frame Prediction with Ensemble Network [PDF] Back to Table of Contents
Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi, Yoko Sasaki
Abstract: Future frame prediction in videos is a challenging problem because videos include complicated movements and large appearance changes. Learning-based future frame prediction approaches have been proposed in the literature. A common limitation of the existing learning-based approaches is a mismatch of training data and test data. In the future frame prediction task, we can obtain the ground truth data by just waiting for a few frames. It means we can update the prediction model online in the test phase. Then, we propose an adaptive update framework for the future frame prediction task. The proposed adaptive updating framework consists of a pre-trained prediction network, a continuous-updating prediction network, and a weight estimation network. We also show that our pre-trained prediction model achieves comparable performance to the existing state-of-the-art approaches. We demonstrate that our approach outperforms existing methods, especially for dynamically changing scenes.
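A hedged sketch of the online-update idea: once a frame arrives, it becomes ground truth for the prediction made a few frames earlier, so the model can keep training during the test phase. The predictor and loss below are placeholders, not the paper's ensemble:

```python
import torch
import torch.nn as nn

predictor = nn.Conv2d(3, 3, 3, padding=1)        # stand-in prediction network
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def online_step(past_frame, arrived_frame):
    pred = predictor(past_frame)                 # prediction made earlier
    loss = nn.functional.mse_loss(pred, arrived_frame)  # now-available target
    opt.zero_grad(); loss.backward(); opt.step() # adapt during the test phase
    return loss.item()

online_step(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```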
15. Fast and Scalable Earth Texture Synthesis using Spatially Assembled Generative Adversarial Neural Networks [PDF] Back to Table of Contents
Sung Eun Kim, Hongkyu Yoon, Jonghyun Lee
Abstract: The earth texture, with complex morphological geometry and compositions such as shale and carbonate rocks, is typically characterized with sparse field samples because of an expensive and time-consuming characterization process. Accordingly, generating arbitrarily large geological textures with similar topological structures at a low computation cost has become one of the key tasks for realistic geomaterial reconstruction. Recently, generative adversarial neural networks (GANs) have demonstrated a potential for synthesizing input textural images and creating equiprobable geomaterial images. However, texture synthesis with the GANs framework is often limited by the computational cost and scalability of the output texture size. In this study, we propose spatially assembled GANs (SAGANs) that can generate output images of an arbitrarily large size, regardless of the size of the training images, with high computational efficiency. The performance of the SAGANs was evaluated with two- and three-dimensional (2D and 3D) rock image samples widely used in geostatistical reconstruction of the earth texture. We demonstrate that SAGANs can generate arbitrarily large statistical realizations with connectivity and structural properties similar to the training images, and can also generate a variety of realizations even from a single training image. In addition, the computational time was significantly improved compared to standard GANs frameworks.
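Illustrative only: the key property claimed above is that fixed-size generator outputs can be assembled spatially into an arbitrarily large texture. A naive tiling sketch follows (the real SAGANs assemble inside the network so that seams stay consistent):

```python
import torch
import torch.nn as nn

gen = nn.Sequential(nn.ConvTranspose2d(64, 1, 4, stride=4))  # toy texture generator
tiles = [[gen(torch.randn(1, 64, 16, 16)) for _ in range(3)] for _ in range(3)]
large = torch.cat([torch.cat(row, dim=3) for row in tiles], dim=2)
print(large.shape)  # 3x3 assembly of 64x64 tiles -> (1, 1, 192, 192)
```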
16. Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [PDF] Back to Table of Contents
Xuehui Wang, Qing Wang, Yuzhi Zhao, Junchi Yan, Lei Fan, Long Chen
Abstract: Although convolutional network-based methods have boosted the performance of single image super-resolution (SISR), the huge computation costs restrict their practical applicability. In this paper, we develop a computationally efficient yet accurate network based on the proposed attentive auxiliary features (A$^2$F) for SISR. Firstly, to explore the features from the bottom layers, the auxiliary features from all the previous layers are projected into a common space. Then, to better utilize these projected auxiliary features and filter the redundant information, channel attention is employed to select the most important common features based on the current layer feature. We incorporate these two modules into a block and implement it with a lightweight network. Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against the state-of-the-art (SOTA) SR methods. Notably, when the parameter count is below 320k, A$^2$F outperforms SOTA methods for all scales, which proves its ability to better utilize the auxiliary features. Codes are available at this https URL.
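A sketch of the two ingredients named above, with assumed shapes: auxiliary features from earlier layers are projected into a common space, and a channel attention driven by the current layer weights them:

```python
import torch
import torch.nn as nn

c = 32
project = nn.Conv2d(c, c, 1)                     # map auxiliary feature to common space
attn = nn.Sequential(nn.AdaptiveAvgPool2d(1),    # squeeze spatial dims
                     nn.Conv2d(c, c, 1), nn.Sigmoid())

current = torch.randn(1, c, 48, 48)              # current layer feature
auxiliary = torch.randn(1, c, 48, 48)            # feature from a previous layer
weighted_aux = project(auxiliary) * attn(current)  # channel-wise selection
fused = current + weighted_aux
```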
17. Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks [PDF] Back to Table of Contents
Jun Nishikawa, Ryoji Ikegaya
Abstract: Deep Neural Networks (DNNs) have many parameters and activation data, and both are expensive to implement. One method to reduce the size of a DNN is to quantize the pre-trained model by using a low-bit expression for weights and activations, using fine-tuning to recover the drop in accuracy. However, it is generally difficult to train neural networks which use low-bit expressions. One reason is that the weights in the middle layers of the DNN have a wide dynamic range, and so when quantizing the wide dynamic range into a few bits, the step size becomes large, which leads to a large quantization error and finally a large degradation in accuracy. To solve this problem, this paper makes the following three contributions without using any additional learning parameters or hyper-parameters. First, we analyze how batch normalization, which causes the aforementioned problem, disturbs the fine-tuning of the quantized DNN. Second, based on these results, we propose a new pruning method called Pruning for Quantization (PfQ), which removes the filters that disturb the fine-tuning of the DNN while not affecting the inferred result as far as possible. Third, we propose a workflow of fine-tuning for quantized DNNs using the proposed pruning method (PfQ). Experiments using well-known models and datasets confirmed that the proposed method achieves higher performance with a similar model size than conventional quantization methods including fine-tuning.
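A worked illustration of the dynamic-range argument above: with b bits, a uniform quantizer over [-r, r] has step size s = 2r / (2^b - 1), so one wide-range filter inflates the step, and hence the quantization error, for the whole layer:

```python
import torch

def quantize(w, bits=4):
    r = w.abs().max()                      # dynamic range of the layer
    s = 2 * r / (2 ** bits - 1)            # step size grows with the range
    return torch.round(w / s) * s, s

w = torch.randn(1000)
w[0] = 20.0                                # one outlier filter widens the range
wq, step = quantize(w)
print(step, (w - wq).abs().mean())         # large step -> large quantization error
```

This is consistent with the intuition that removing range-inflating filters before fine-tuning, as PfQ does, can shrink the step size for the remaining weights.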
18. Structured Attention Graphs for Understanding Deep Image Classifications [PDF] Back to Table of Contents
Vivswan Shitole, Fuxin Li, Minsuk Kahng, Prasad Tadepalli, Alan Fern
Abstract: Attention maps are a popular way of explaining the decisions of convolutional networks for image classification. Typically, for each image of interest, a single attention map is produced, which assigns weights to pixels based on their importance to the classification. A single attention map, however, provides an incomplete understanding since there are often many other maps that explain a classification equally well. In this paper, we introduce structured attention graphs (SAGs), which compactly represent sets of attention maps for an image by capturing how different combinations of image regions impact a classifier's confidence. We propose an approach to compute SAGs and a visualization for SAGs so that deeper insight can be gained into a classifier's decisions. We conduct a user study comparing the use of SAGs to traditional attention maps for answering counterfactual questions about image classifications. Our results show that the users are more correct when answering comparative counterfactual questions based on SAGs compared to the baselines.
19. Local Anomaly Detection in Videos using Object-Centric Adversarial Learning [PDF] Back to Table of Contents
Pankaj Raj Roy, Guillaume-Alexandre Bilodeau, Lama Seoud
Abstract: We propose a novel unsupervised approach based on a two-stage object-centric adversarial framework that only needs object regions for detecting frame-level local anomalies in videos. The first stage consists in learning the correspondence between the current appearance and past gradient images of objects in scenes deemed normal, allowing us to either generate the past gradient from current appearance or the reverse. The second stage extracts the partial reconstruction errors between real and generated images (appearance and past gradient) with normal object behaviour, and trains a discriminator in an adversarial fashion. In inference mode, we employ the trained image generators with the adversarially learned binary classifier for outputting region-level anomaly detection scores. We tested our method on four public benchmarks, UMN, UCSD, Avenue and ShanghaiTech and our proposed object-centric adversarial approach yields competitive or even superior results compared to state-of-the-art methods.
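A hedged sketch of the scoring step: a stand-in generator reconstructs an object region, and a binary discriminator maps the partial reconstruction error to a region-level anomaly score. All modules and shapes are assumptions, not the paper's two-stage design:

```python
import torch
import torch.nn as nn

generator = nn.Conv2d(3, 3, 3, padding=1)        # stand-in appearance/gradient model
discriminator = nn.Sequential(nn.Flatten(),
                              nn.Linear(3 * 32 * 32, 1), nn.Sigmoid())

region = torch.randn(1, 3, 32, 32)               # object region from a test frame
reconstruction = generator(region)
residual = region - reconstruction               # partial reconstruction error
anomaly_score = discriminator(residual)          # high for abnormal behaviour
```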
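As a rough illustration of the inference step described above, the score below combines the two partial reconstruction errors with the discriminator's output; the generator and discriminator networks are assumed to be the trained stage-one and stage-two models, and all shapes are illustrative.

```python
import torch

def anomaly_score(appearance, past_gradient, gen_fwd, gen_bwd, disc):
    """appearance, past_gradient: (B, 3, H, W) object crops.
    gen_fwd/gen_bwd: stage-one generators (appearance -> gradient and back);
    disc: stage-two discriminator trained on normal behaviour."""
    rec_fwd = torch.mean((gen_fwd(appearance) - past_gradient) ** 2, dim=(1, 2, 3))
    rec_bwd = torch.mean((gen_bwd(past_gradient) - appearance) ** 2, dim=(1, 2, 3))
    p_normal = torch.sigmoid(disc(torch.cat([appearance, past_gradient], 1))).view(-1)
    return rec_fwd + rec_bwd + (1.0 - p_normal)   # higher = more anomalous
```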
20. Adversarial Robustness Against Image Color Transformation within Parametric Filter Space [PDF] 返回目录
Zhengyu Zhao, Zhuoran Liu, Martha Larson
Abstract: We propose Adversarial Color Enhancement (ACE), a novel approach to generating non-suspicious adversarial images by optimizing a color transformation within a parametric filter space. The filter we use approximates human-understandable color curve adjustment, constraining ACE with a single, continuous function. This property gives rise to a principled adversarial action space explicitly controlled by filter parameters. Existing color transformation attacks are not guided by a parametric space, and, consequently, additional pixel-related constraints such as regularization and sampling are necessary. These constraints make methodical analysis difficult. In this paper, we carry out a systematic robustness analysis of ACE from both the attack and defense perspectives by varying the bound of the color filter parameters. We investigate a general formulation of ACE and also a variant targeting particularly appealing color styles, as achieved with popular image filters. From the attack perspective, we provide extensive experiments on the vulnerability of image classifiers, but also explore the vulnerability of segmentation and aesthetic quality assessment algorithms, in both the white-box and black-box scenarios. From the defense perspective, more experiments provide insight into the stability of ACE against input transformation-based defenses and show the potential of adversarial training for improving model robustness against ACE.
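A minimal sketch of the core mechanism, assuming a monotonic piecewise-linear intensity curve as the parametric filter (the paper's actual filter family may differ): the curve is differentiable in its parameters, so an attack is plain gradient ascent on the classifier's loss.

```python
import torch
import torch.nn.functional as F

def apply_color_curve(img, params):
    """Map intensities in [0, 1] through a monotonic piecewise-linear curve
    whose K segment increments are given by softplus(params)."""
    inc = F.softplus(params)                                   # positive increments
    knots = torch.cat([torch.zeros(1), torch.cumsum(inc, 0) / inc.sum()])
    k = knots.numel() - 1
    x = img.clamp(0, 1) * k
    idx = x.long().clamp(max=k - 1)
    frac = x - idx
    return knots[idx] + frac * (knots[idx + 1] - knots[idx])

# Hypothetical attack loop against a classifier `model` on a batch (x, y);
# bounding `params` (e.g. clamping after each step) controls how far the
# filter may deviate from the identity curve.
# params = torch.zeros(8, requires_grad=True)
# opt = torch.optim.Adam([params], lr=0.05)
# for _ in range(100):
#     loss = -F.cross_entropy(model(apply_color_curve(x, params)), y)
#     opt.zero_grad(); loss.backward(); opt.step()
```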
21. Trajectory Prediction in Autonomous Driving with a Lane Heading Auxiliary Loss [PDF] 返回目录
Ross Greer, Nachiket Deo, Mohan Trivedi
Abstract: Predicting a vehicle's trajectory is an essential ability for autonomous vehicles navigating through complex urban traffic scenes. Bird's-eye-view roadmap information provides valuable information for making trajectory predictions, and while state-of-the-art models extract this information via image convolution, auxiliary loss functions can augment patterns inferred from deep learning by further encoding common knowledge of social and legal driving behaviors. Since human driving behavior is inherently multimodal, models which allow for multimodal output tend to outperform single-prediction models on standard metrics; the proposed loss function benefits such models, as all predicted modes must follow the same expected driving rules. Our contribution to trajectory prediction is twofold; we propose a new metric which addresses failure cases of the off-road rate metric by penalizing trajectories that contain driving behavior that opposes the ascribed heading (flow direction) of a driving lane, and we show this metric to be differentiable and therefore suitable as an auxiliary loss function. We then use this auxiliary loss to extend the standard multiple trajectory prediction (MTP) and MultiPath models, achieving improved results on the nuScenes prediction benchmark by predicting trajectories which better conform to the lane-following rules of the road.
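One plausible reading of the proposed auxiliary loss, written as a differentiable penalty: each predicted step direction is compared against the lane's ascribed flow direction. `lane_heading` is a hypothetical lookup into the rasterized map, and the exact functional form below is an assumption.

```python
import torch

def lane_heading_loss(traj, lane_heading):
    """traj: (B, T, 2) predicted positions. lane_heading: hypothetical map
    lookup returning the lane's flow direction (radians) at given points."""
    steps = traj[:, 1:] - traj[:, :-1]                      # (B, T-1, 2)
    step_angle = torch.atan2(steps[..., 1], steps[..., 0])  # heading of each step
    target = lane_heading(traj[:, :-1])                     # ascribed flow direction
    return (1.0 - torch.cos(step_angle - target)).mean()    # 0 when aligned
```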
22. Empirical Performance Analysis of Conventional Deep Learning Models for Recognition of Objects in 2-D Images [PDF] 返回目录
Sangeeta Satish Rao, Nikunj Phutela, V R Badri Prasad
Abstract: Artificial Neural Networks, an essential part of Deep Learning, are derived from the structure and functionality of the human brain. They have a broad range of applications, ranging from medical analysis to automated driving. Over the past few years, deep learning techniques have improved drastically - models can now be customized to a much greater extent by varying the network architecture, network parameters, among others. We varied parameters such as the learning rate, filter size, number of hidden layers, stride size, and activation function to analyze the performance of the model and thus produce a model with the highest performance. The model classifies images into 3 categories, namely, cars, faces and aeroplanes.
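The described sweep amounts to a grid search; a skeleton, under the assumption of a `train_and_eval` function that trains one configuration and returns its validation accuracy:

```python
from itertools import product

grid = {
    "lr": [1e-2, 1e-3, 1e-4],
    "filter_size": [3, 5],
    "hidden_layers": [1, 2, 3],
    "stride": [1, 2],
    "activation": ["relu", "tanh"],
}

def sweep(train_and_eval):
    """Return the configuration with the highest validation accuracy."""
    best_cfg, best_acc = None, -1.0
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        acc = train_and_eval(**cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```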
23. Automatic segmentation with detection of local segmentation failures in cardiac MRI [PDF] 返回目录
Jörg Sander, Bob D. de Vos, Ivana Išgum
Abstract: Segmentation of cardiac anatomical structures in cardiac magnetic resonance images (CMRI) is a prerequisite for automatic diagnosis and prognosis of cardiovascular diseases. To increase the robustness and performance of segmentation methods, this study combines automatic segmentation and assessment of segmentation uncertainty in CMRI to detect image regions containing local segmentation failures. Three state-of-the-art convolutional neural networks (CNN) were trained to automatically segment cardiac anatomical structures and obtain two measures of predictive uncertainty: entropy and a measure derived by MC-dropout. Thereafter, using these uncertainties, another CNN was trained to detect local segmentation failures that potentially need correction by an expert. Finally, manual correction of the detected regions was simulated. Using publicly available CMR scans from the MICCAI 2017 ACDC challenge, the impact of the CNN architecture, the loss function for segmentation, and the uncertainty measure was investigated. Performance was evaluated using the Dice coefficient and 3D Hausdorff distance between manual and automatic segmentation. The experiments reveal that combining automatic segmentation with simulated manual correction of detected segmentation failures leads to a statistically significant performance increase.
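The two uncertainty measures named in the abstract have standard implementations; the sketch below computes a per-pixel predictive entropy map from a softmax output and an MC-dropout disagreement map from repeated stochastic forward passes (the failure-detection CNN that consumes these maps is not shown).

```python
import torch

def entropy_map(probs, eps=1e-8):
    """probs: (B, C, H, W) softmax output -> per-pixel predictive entropy."""
    return -(probs * (probs + eps).log()).sum(dim=1)

def mc_dropout_maps(model, x, n_samples=20):
    """Keep dropout active at test time; disagreement across passes is the
    second uncertainty measure."""
    model.train()                              # enables dropout layers
    with torch.no_grad():
        stack = torch.stack([model(x).softmax(dim=1) for _ in range(n_samples)])
    return stack.mean(0), stack.var(0).sum(dim=1)   # mean prediction, variance map
```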
24. Monitoring and Diagnosability of Perception Systems [PDF] 返回目录
Pasquale Antonante, David I. Spivak, Luca Carlone
Abstract: Perception is a critical component of high-integrity applications of robotics and autonomous systems, such as self-driving vehicles. In these applications, failure of perception systems may put human life at risk, and a broad adoption of these technologies requires the development of methodologies to guarantee and monitor safe operation. Despite the paramount importance of perception systems, currently there is no formal approach for system-level monitoring. In this work, we propose a mathematical model for runtime monitoring and fault detection and identification in perception systems. Towards this goal, we draw connections with the literature on diagnosability in multiprocessor systems, and generalize it to account for modules with heterogeneous outputs that interact over time. The resulting temporal diagnostic graphs (i) provide a framework to reason over the consistency of perception outputs -- across modules and over time -- thus enabling fault detection, (ii) allow us to establish formal guarantees on the maximum number of faults that can be uniquely identified in a given perception system, and (iii) enable the design of efficient algorithms for fault identification. We demonstrate our monitoring system, dubbed PerSyS, in realistic simulations using the LGSVL self-driving simulator and the Apollo Auto autonomy software stack, and show that PerSyS is able to detect failures in challenging scenarios (including scenarios that have caused self-driving car accidents in recent years), and is able to correctly identify faults while entailing a minimal computation overhead (< 5ms on a single-core CPU).
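A toy version of consistency-based fault detection conveys the flavor of the approach, although the paper's temporal diagnostic graphs are far more formal: pairwise consistency tests between module outputs implicate the modules involved in any failed test.

```python
def inconsistent_modules(outputs, checks):
    """outputs: {module_name: value}. checks: list of (mod_a, mod_b, test_fn)
    pairwise consistency tests. Returns modules in at least one failed test."""
    suspects = set()
    for a, b, test in checks:
        if not test(outputs[a], outputs[b]):
            suspects.update((a, b))
    return suspects

# e.g. flag disagreement between two detectors counting the same obstacles:
# checks = [("lidar_det", "camera_det", lambda m, n: abs(m - n) <= 1)]
```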
25. Relative Drone -- Ground Vehicle Localization using LiDAR and Fisheye Cameras through Direct and Indirect Observations [PDF] 返回目录
Jan Hausberg, Ryoichi Ishikawa, Menandro Roxas, Takeshi Oishi
Abstract: Estimating the pose of an unmanned aerial vehicle (UAV) or drone is a challenging task. It is useful for many applications such as navigation, surveillance, tracking objects on the ground, and 3D reconstruction. In this work, we present a LiDAR-camera-based relative pose estimation method between a drone and a ground vehicle, using a LiDAR sensor and a fisheye camera on the vehicle's roof and another fisheye camera mounted under the drone. The LiDAR sensor directly observes the drone and measures its position, and the two cameras estimate the relative orientation using indirect observation of the surrounding objects. We propose a dynamically adaptive kernel-based method for drone detection and tracking using the LiDAR. We detect vanishing points in both cameras and find their correspondences to estimate the relative orientation. Additionally, we propose a rotation correction technique by relying on the observed motion of the drone through the LiDAR. In our experiments, we were able to achieve very fast initial detection and real-time tracking of the drone. Our method is fully automatic.
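A kernel-based position update of the kind the abstract mentions can be sketched as a Gaussian-weighted centroid over the point cloud; the bandwidth-adaptation rule below is a plain heuristic standing in for the paper's dynamically adaptive kernel.

```python
import numpy as np

def kernel_track(points, prev_pos, bandwidth):
    """points: (N, 3) LiDAR scan. Returns the Gaussian-weighted centroid of
    returns near prev_pos, plus an updated bandwidth."""
    d2 = np.sum((points - prev_pos) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    if w.sum() < 1e-9:
        return prev_pos, bandwidth * 1.5              # target lost: widen search
    pos = (points * w[:, None]).sum(axis=0) / w.sum()
    spread = np.sqrt((d2 * w).sum() / w.sum())        # adapt kernel to spread
    return pos, float(np.clip(2.0 * spread, 0.5 * bandwidth, 2.0 * bandwidth))
```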
26. Metastatic Cancer Image Classification Based On Deep Learning Method [PDF] 返回目录
Guanwen Qiu, Xiaobing Yu, Baolin Sun, Yunpeng Wang, Lipei Zhang
Abstract: Using histopathological images to automatically classify cancer is a difficult task, especially when it comes to identifying metastatic cancer in small image patches obtained from larger digital pathology scans. Computer-aided diagnosis technology has attracted wide attention from researchers. In this paper, we propose a novel method which combines a deep learning algorithm for image classification, the DenseNet169 framework, and the Rectified Adam (RAdam) optimization algorithm. The connectivity pattern of DenseNet consists of direct connections from any layer to all subsequent layers, which can effectively improve the information flow between layers. In addition, RAdam is less prone to falling into a poor local optimum and converges quickly during model training. The experimental results show that our model achieves superior performance over classical convolutional neural network approaches such as Vgg19, Resnet34, and Resnet50. In particular, the AUC-ROC score of our DenseNet169 model is 1.77% higher than that of the Vgg19 model, and the accuracy is 1.50% higher. Moreover, we also study the relationship between the loss value and the batches processed during the training and validation stages, and obtain some important and interesting findings.
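The model/optimizer pairing named in the abstract is directly available in PyTorch; a minimal setup, assuming a recent torchvision (older releases use `pretrained=` instead of `weights=`) and PyTorch >= 1.10 for `torch.optim.RAdam`:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.densenet169(weights=None)   # or load pretrained weights
model.classifier = nn.Linear(model.classifier.in_features, 2)  # tumor / normal
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```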
27. SALAD: Self-Assessment Learning for Action Detection [PDF] 返回目录
Guillaume Vaudaux-Ruth, Adrien Chan-Hon-Tong, Catherine Achard
Abstract: Literature on self-assessment in machine learning mainly focuses on the production of well-calibrated algorithms through consensus frameworks, i.e., calibration is seen as a problem. Yet, we observe that learning to be properly confident could behave like a powerful regularization and thus could be an opportunity to improve performance. Precisely, we show that, used within a framework of action detection, the learning of a self-assessment score is able to improve the whole action localization process. Experimental results show that our approach outperforms the state-of-the-art on two action detection benchmarks. On the THUMOS14 dataset, the mAP at tIoU@0.5 is improved from 42.8% to 44.6%, and from 50.4% to 51.7% on the ActivityNet1.3 dataset. For lower tIoU values, we achieve even more significant improvements on both datasets.
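One way to read "learning a self-assessment score" is a small head that predicts the quality of the model's own outputs and is trained against the achieved overlap; the sketch below is purely illustrative of that idea, with a hypothetical backbone returning both features and detections, and is not the SALAD architecture.

```python
import torch
import torch.nn as nn

class WithSelfAssessment(nn.Module):
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone      # assumed to return (features, detections)
        self.assess = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                    nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, x):
        feats, detections = self.backbone(x)
        return detections, self.assess(feats).squeeze(-1)

# training sketch: loss = detection_loss + mse(self_score, achieved_tIoU)
```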
28. LEAN: graph-based pruning for convolutional neural networks by extracting longest chains [PDF] 返回目录
Richard Schoonhoven, Allard A. Hendriksen, Daniël M. Pelt, K. Joost Batenburg
Abstract: Convolutional neural networks (CNNs) have proven to be highly successful at a range of image-to-image tasks. CNNs can be computationally expensive, which can limit their applicability in practice. Model pruning can improve computational efficiency by sparsifying trained networks. Common methods for pruning CNNs determine what convolutional filters to remove by ranking filters on an individual basis. However, filters are not independent, as CNNs consist of chains of convolutions, which can result in sub-optimal filter selection. We propose a novel pruning method, LongEst-chAiN (LEAN) pruning, which takes the interdependency between the convolution operations into account. We propose to prune CNNs by using graph-based algorithms to select relevant chains of convolutions. A CNN is interpreted as a graph, with the operator norm of each convolution as distance metric for the edges. LEAN pruning iteratively extracts the highest value path from the graph to keep. In our experiments, we test LEAN pruning for several image-to-image tasks, including the well-known CamVid dataset. LEAN pruning enables us to keep just 0.5%-2% of the convolutions without significant loss of accuracy. When pruning CNNs with LEAN, we achieve a higher accuracy than pruning filters individually, and different pruned substructures emerge.
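The longest-chain extraction reduces to a longest-path computation in a DAG once edges carry log operator norms (so a path's value is the product of norms along it); a compact sketch, assuming the node list is already topologically ordered:

```python
import math

def longest_chain(nodes, edges):
    """nodes: topologically ordered ids. edges: {(u, v): operator_norm}.
    Returns the path maximizing the product of norms (sum of logs)."""
    best = {n: (0.0, None) for n in nodes}       # (log-value, predecessor)
    for u in nodes:
        for (a, b), norm in edges.items():
            if a == u:
                cand = best[u][0] + math.log(norm)
                if cand > best[b][0]:
                    best[b] = (cand, u)
    end = max(nodes, key=lambda n: best[n][0])
    path = [end]
    while best[path[-1]][1] is not None:
        path.append(best[path[-1]][1])
    return path[::-1]
```

In the pruning loop, the filters along the extracted chain are kept, the corresponding edges are removed from the graph, and extraction repeats until the target budget is reached.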
29. REPAC: Reliable estimation of phase-amplitude coupling in brain networks [PDF] 返回目录
Giulia Cisotto
Abstract: Recent evidence has revealed cross-frequency coupling and, particularly, phase-amplitude coupling (PAC) as an important strategy for the brain to accomplish a variety of high-level cognitive and sensory functions. However, decoding PAC is still challenging. This contribution presents REPAC, a reliable and robust algorithm for modeling and detecting PAC events in EEG signals. First, we explain the synthesis of PAC-like EEG signals, with special attention to the most critical parameters that characterize PAC, i.e., SNR, modulation index, duration of coupling. Second, REPAC is introduced in detail. We use computer simulations to generate a set of random PAC-like EEG signals and test the performance of REPAC with regard to a baseline method. REPAC is shown to outperform the baseline method even with realistic values of SNR, e.g., -10 dB. They both reach accuracy levels around 99%, but REPAC leads to a significant improvement of sensitivity, from 20.11% to 65.21%, with comparable specificity (around 99%). REPAC is also applied to a real EEG signal showing preliminary encouraging results.
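For context on what is being detected, the standard (Tort-style) modulation index can be computed in a few lines from bandpass-filtered phase and amplitude envelopes; this is the textbook measure the abstract refers to, not the REPAC algorithm itself.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def modulation_index(x, fs, phase_band=(4, 8), amp_band=(30, 80), n_bins=18):
    """Tort-style MI: KL divergence of the phase-binned amplitude
    distribution from uniform, normalized to [0, 1]."""
    phase = np.angle(hilbert(bandpass(x, *phase_band, fs)))
    amp = np.abs(hilbert(bandpass(x, *amp_band, fs)))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    bins = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    mean_amp = np.array([amp[bins == k].mean() for k in range(n_bins)])
    p = mean_amp / mean_amp.sum()
    return (np.log(n_bins) + np.sum(p * np.log(p + 1e-12))) / np.log(n_bins)
```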
30. Unified Multi-Modal Landmark Tracking for Tightly Coupled Lidar-Visual-Inertial Odometry [PDF] 返回目录
David Wisth, Marco Camurri, Sandipan Das, Maurice Fallon
Abstract: We present an efficient multi-sensor odometry system for mobile platforms that jointly optimizes visual, lidar, and inertial information within a single integrated factor graph. This runs in real-time at full framerate using fixed lag smoothing. To perform such tight integration, a new method to extract 3D line and planar primitives from lidar point clouds is presented. This approach overcomes the suboptimality of typical frame-to-frame tracking methods by treating the primitives as landmarks and tracking them over multiple scans. True integration of lidar features with standard visual features and IMU is made possible using a subtle passive synchronization of lidar and camera frames. The lightweight formulation of the 3D features allows for real-time execution on a single CPU. Our proposed system has been tested on a variety of platforms and scenarios, including underground exploration with a legged robot and outdoor scanning with a dynamically moving handheld device, for a total duration of 96 min and 2.4 km traveled distance. In these test sequences, using only one exteroceptive sensor leads to failure due to either underconstrained geometry (affecting lidar) and textureless areas caused by aggressive lighting changes (affecting vision). In these conditions, our factor graph naturally uses the best information available from each sensor modality without any hard switches.
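A drastically simplified picture of the joint optimization: relative-motion measurements from several modalities enter one least-squares problem over a window of poses, playing the role of factors. Real systems optimize SE(3) poses with robust losses and IMU preintegration; 1-D poses keep the sketch readable.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(poses, factors):
    res = [poses[0]]    # prior factor anchoring the first pose at zero
    res += [w * (poses[j] - poses[i] - d) for i, j, d, w in factors]
    return np.array(res)

factors = [
    (0, 1, 1.00, 1.0),  # visual odometry: pose1 - pose0 ~ 1.00
    (0, 1, 1.10, 2.0),  # lidar odometry for the same interval, weighted higher
    (1, 2, 0.90, 1.5),  # preintegrated inertial delta
]
sol = least_squares(residuals, x0=np.zeros(3), args=(factors,))
print(sol.x)            # fused 1-D "trajectory" over the window
```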
31. FastTrack: an open-source software for tracking varying numbers of deformable objects [PDF] 返回目录
Benjamin Gallois, Raphaël Candelier
Abstract: Analyzing the dynamical properties of mobile objects requires extracting trajectories from recordings, which is often done by tracking objects in movies. We compiled a database of two-dimensional movies for very different biological and physical systems spanning a wide range of length scales, and developed a general-purpose, optimized, open-source, cross-platform, easy to install and use, self-updating software called FastTrack. It can handle a changing number of deformable objects in a region of interest, and is particularly suitable for animal and cell tracking in two dimensions. Furthermore, we introduce the probability of incursions as a new measure of a movie's trackability that does not require knowledge of ground-truth trajectories, since it is resilient to small amounts of errors and can be computed on the basis of an ad hoc tracking. We also leveraged the versatility and speed of FastTrack to implement an iterative algorithm determining a set of nearly-optimized tracking parameters -- yet further reducing the amount of human intervention -- and demonstrate that FastTrack can be used to explore the space of tracking parameters to optimize the number of swaps for a batch of similar movies. A benchmark shows that FastTrack is orders of magnitude faster than state-of-the-art tracking algorithms, with comparable tracking accuracy. The source code is available under the GNU GPLv3 at this https URL and pre-compiled binaries for Windows, Mac and Linux are available at this http URL.
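At the heart of this kind of 2D tracker is a frame-to-frame assignment step; the sketch below uses the Hungarian algorithm on a plain Euclidean cost as a stand-in for FastTrack's actual matching cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(prev_xy, curr_xy, max_dist=30.0):
    """Assign detections across consecutive frames by minimal total cost;
    pairs farther apart than max_dist are treated as unmatched."""
    cost = np.linalg.norm(prev_xy[:, None] - curr_xy[None, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    keep = cost[rows, cols] <= max_dist
    return list(zip(rows[keep], cols[keep]))
```

Identity swaps happen when this assignment picks the wrong pairing, which is why the number of swaps is a natural quantity to minimize when tuning tracking parameters over a batch of similar movies.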
32. Learning Object Manipulation Skills via Approximate State Estimation from Real Videos [PDF] 返回目录
Vladimír Petrík, Makarand Tapaswi, Ivan Laptev, Josef Sivic
Abstract: Humans are adept at learning new tasks by watching a few instructional videos. On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain. In this paper, we explore a method that facilitates learning object manipulation skills directly from videos. Leveraging recent advances in 2D visual recognition and differentiable rendering, we develop an optimization based method to estimate a coarse 3D state representation for the hand and the manipulated object(s) without requiring any supervision. We use these trajectories as dense rewards for an agent that learns to mimic them through reinforcement learning. We evaluate our method on simple single- and two-object actions from the Something-Something dataset. Our approach allows an agent to learn actions from single videos, while watching multiple demonstrations makes the policy more robust. We show that policies learned in a simulated environment can be easily transferred to a real robot.
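The dense reward described above can be as simple as a Gaussian closeness term between the agent's state and the reference trajectory recovered from video; the exact form below is an assumption for illustration.

```python
import numpy as np

def trajectory_reward(state, ref_traj, t, sigma=0.05):
    """state and ref_traj[t]: 3D positions; Gaussian closeness reward in (0, 1]."""
    d2 = np.sum((state - ref_traj[min(t, len(ref_traj) - 1)]) ** 2)
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))
```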
33. Diffusion models for Handwriting Generation [PDF] 返回目录
Troy Luhman, Eric Luhman
Abstract: In this paper, we propose a diffusion probabilistic model for handwriting generation. Diffusion models are a class of generative models where samples start from Gaussian noise and are gradually denoised to produce output. Our method of handwriting generation does not require using any text-recognition based, writer-style based, or adversarial loss functions, nor does it require training of auxiliary networks. Our model is able to incorporate writer stylistic features directly from image data, eliminating the need for user interaction during sampling. Experiments reveal that our model is able to generate realistic, high-quality images of handwritten text in a similar style to a given writer. Our implementation can be found at this https URL
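The generic machinery the paper builds on is the denoising-diffusion training objective: noise a clean sample to a random timestep and train the network to predict the added noise. A minimal sketch with a toy linear schedule; `model(x_t, t)` is an assumed signature.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # toy linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, x0):
    """Epsilon-prediction objective on a clean batch x0."""
    t = torch.randint(0, T, (x0.shape[0],))
    a = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise
    return torch.mean((model(x_t, t) - noise) ** 2)
```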
34. Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation [PDF] 返回目录
Bryan Chen, Alexander Sax, Gene Lewis, Iro Armeni, Silvio Savarese, Amir Zamir, Jitendra Malik, Lerrel Pinto
Abstract: Vision-based robotics often separates the control loop into one module for perception and a separate module for control. It is possible to train the whole system end-to-end (e.g. with deep RL), but doing it "from scratch" comes with a high sample complexity cost and the final result is often brittle, failing unexpectedly if the test environment differs from that of training. We study the effects of using mid-level visual representations (features learned asynchronously for traditional computer vision objectives), as a generic and easy-to-decode perceptual state in an end-to-end RL framework. Mid-level representations encode invariances about the world, and we show that they aid generalization, improve sample complexity, and lead to a higher final performance. Compared to other approaches for incorporating invariances, such as domain randomization, asynchronously trained mid-level representations scale better: both to harder problems and to larger domain shifts. In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed. We report results on both manipulation and navigation tasks, and for navigation include zero-shot sim-to-real experiments on real robots.
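The architectural recipe in brief: a frozen, asynchronously trained encoder supplies the perceptual state, and RL only optimizes a small policy head. The encoder below is a placeholder for any mid-level vision network.

```python
import torch
import torch.nn as nn

class MidLevelPolicy(nn.Module):
    def __init__(self, encoder, feat_dim, n_actions):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)               # perception stays fixed
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, n_actions))

    def forward(self, obs):
        with torch.no_grad():
            z = self.encoder(obs).flatten(1)      # mid-level perceptual state
        return self.head(z)
```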
35. Disassemblable Fieldwork CT Scanner Using a 3D-printed Calibration Phantom [PDF] 返回目录
Florian Schiffers, Thomas Bochynek, Andre Aichert, Tobias Würfl, Michael Rubenstein, Oliver Cossairt
Abstract: The use of computed tomography (CT) imaging has become of increasing interest to academic areas outside of the field of medical imaging and industrial inspection, e.g., to biology and cultural heritage research. The peculiarities of these fields, however, sometimes require that objects need to be imaged on-site, e.g., in field-work conditions or in museum collections. Under these circumstances, it is often not possible to use a commercial device and a custom solution is the only viable option. In order to achieve high image quality under adverse conditions, reliable calibration and trajectory reproduction are usually key requirements for any custom CT scanning system. Here, we introduce the construction of a low-cost disassemblable CT scanner that allows calibration even when trajectory reproduction is not possible due to the limitations imposed by the project conditions. Using 3D-printed in-image calibration phantoms, we compute a projection matrix directly from each captured X-ray projection. We describe our method in detail and show successful tomographic reconstructions of several specimens as proof of concept.
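Recovering a projection matrix from the phantom's known 3D points and their detected 2D image positions is the classic direct linear transform (DLT) problem; the textbook solution is shown below as a reference point, not the authors' calibration pipeline.

```python
import numpy as np

def dlt_projection_matrix(X, x):
    """X: (N, 3) known phantom points; x: (N, 2) detected projections; N >= 6.
    Returns the 3x4 projection matrix up to scale."""
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        P = [Xw, Yw, Zw, 1.0]
        A.append(P + [0.0] * 4 + [-u * c for c in P])
        A.append([0.0] * 4 + P + [-v * c for c in P])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)   # right singular vector of smallest singular value
```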