Abstracts

1. Purifying Real Images with an Attention-guided Style Transfer Network for Gaze Estimation
Yuxiao Yan, Yang Yan, Jinjia Peng, Huibing Wang, Xianping Fu
Abstract: Recently, progress in learning-by-synthesis has produced training models based on synthetic images, which can effectively reduce the cost of human and material resources. However, because synthetic images follow a different distribution from real images, the desired performance cannot be achieved. Real images consist of multiple forms of light orientation, while synthetic images consist of a uniform light orientation. These features are considered to be characteristic of outdoor and indoor scenes, respectively. To solve this problem, previous methods learned a model to improve the realism of synthetic images. Unlike previous methods, this paper tries to purify real images by extracting discriminative and robust features to convert outdoor real images into indoor synthetic images. We first introduce segmentation masks to construct RGB-mask pairs as inputs, then design an attention-guided style transfer network that learns style features separately from the attention and background regions and learns content features from the full and attention regions. Moreover, we propose a novel region-level task-guided loss to constrain the features learnt from style and content. Experiments were performed using mixed (qualitative and quantitative) methods to demonstrate the possibility of purifying real images in complex directions. We evaluate the proposed method on three public datasets: LPW, COCO and MPIIGaze. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results.

2. Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan
Abstract: The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with, among others, the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.

3. Layer-wise Pruning and Auto-tuning of Layer-wise Learning Rates in Fine-tuning of Deep Networks
Youngmin Ro, Jin Young Choi
Abstract: Existing fine-tuning methods use a single learning rate over all layers. In this paper, we first discuss how the trends of layer-wise weight variations under fine-tuning with a single learning rate do not match the well-known notion that lower-level layers extract general features and higher-level layers extract specific features. Based on this discussion, we propose an algorithm that improves fine-tuning performance and reduces network complexity through layer-wise pruning and auto-tuning of layer-wise learning rates. In-depth experiments on image retrieval (CUB-200-2011, Stanford Online Products, and Inshop) and fine-grained classification (Stanford Cars, Aircraft) datasets verify the effectiveness of the proposed algorithm.
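To make the idea of layer-wise learning rates in entry 3 concrete, here is a minimal sketch of one SGD step in which each layer has its own rate. The layer names, weights, gradients, and rates are all made up for illustration, and the rates are hand-picked; the abstract does not describe how the paper auto-tunes them, so that part is not reproduced here.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def sgd_step(params: Params, grads: Params, lrs: Dict[str, float]) -> Params:
    """Apply one SGD step, using a separate learning rate for each layer."""
    return {
        layer: [w - lrs[layer] * g for w, g in zip(weights, grads[layer])]
        for layer, weights in params.items()
    }


# Toy network with three "layers"; all values are hypothetical.
params = {"conv1": [1.0, 2.0], "conv2": [0.5, -0.5], "fc": [0.1, 0.2]}
grads = {"conv1": [1.0, 1.0], "conv2": [1.0, 1.0], "fc": [1.0, 1.0]}

# Hand-picked rates (an assumption): small for the low-level layer,
# larger for the layers closer to the output.
layer_lrs = {"conv1": 0.001, "conv2": 0.01, "fc": 0.1}

updated = sgd_step(params, grads, layer_lrs)
```

With a single learning rate, every entry of `layer_lrs` would hold the same value; the per-layer dictionary is the only change needed to let low-level and high-level layers move at different speeds.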

4. Constrained Dominant sets and Its applications in computer vision
Alemu Leulseged Tesfaye
Abstract: In this thesis, we present new schemes which leverage a constrained clustering method to solve several computer vision tasks ranging from image retrieval, image segmentation and co-segmentation, to person re-identification. In the last decades clustering methods have played a vital role in computer vision applications; herein, we focus on the extension, reformulation, and integration of a well-known graph and game theoretic clustering method known as Dominant Sets. Thus, we have demonstrated the validity of the proposed methods with extensive experiments which are conducted on several benchmark datasets.

5. Building Networks for Image Segmentation using Particle Competition and Cooperation
Fabricio Breve
Abstract: Particle competition and cooperation (PCC) is a graph-based semi-supervised learning approach. When PCC is applied to interactive image segmentation tasks, pixels are converted into network nodes, and each node is connected to its k-nearest neighbors, according to the distance between a set of features extracted from the image. Building a proper network to feed PCC is crucial to achieving good segmentation results. However, some features may be more important than others for identifying the segments, depending on the characteristics of the image to be segmented. In this paper, an index to evaluate candidate networks is proposed. Thus, building the network becomes a problem of optimizing some feature weights based on the proposed index. Computer simulations are performed on some real-world images from the Microsoft GrabCut database, and the segmentation results reported in this paper show the effectiveness of the proposed method.
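The network-building step in entry 5 (each pixel becomes a node linked to its k nearest neighbors under a feature distance) can be sketched as follows. The weighted Euclidean distance, the three features per pixel, and the weight values are assumptions chosen for illustration; the abstract does not specify the features or the evaluation index used to optimize the weights.

```python
import math
from typing import Dict, List, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_graph(features: List[Sequence[float]], weights: Sequence[float],
              k: int) -> Dict[int, List[int]]:
    """Return {node: indices of its k nearest nodes} under the weighted distance."""
    graph = {}
    for i, fi in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: weighted_distance(fi, features[j], weights))
        graph[i] = others[:k]
    return graph


# Four hypothetical pixels, each described by (row, col, intensity).
pixels = [(0, 0, 0.1), (0, 1, 0.9), (1, 0, 0.1), (5, 5, 0.2)]

# Same pixels, two different feature weightings.
position_only = knn_graph(pixels, weights=(1.0, 1.0, 0.0), k=1)
intensity_only = knn_graph(pixels, weights=(0.0, 0.0, 1.0), k=1)
```

Pixel 1 links to its spatial neighbor (pixel 0) when only position counts, and to the pixel with the closest intensity (pixel 3) when only intensity counts, which is why the choice of feature weights changes the network that PCC receives.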

6. A Hybrid 3DCNN and 3DC-LSTM based model for 4D Spatio-temporal fMRI data: An ABIDE Autism Classification study
Ahmed El-Gazzar, Mirjam Quaak, Leonardo Cerliani, Peter Bloem, Guido van Wingen, Rajat Mani Thomas
Abstract: Functional Magnetic Resonance Imaging (fMRI) captures the temporal dynamics of neural activity as a function of spatial location in the brain. Thus, fMRI scans are represented as 4-dimensional (3-space + 1-time) tensors. It is widely believed that the spatio-temporal patterns in fMRI manifest as behaviour and clinical symptoms. Because of the high dimensionality (~1 million) of fMRI, and the added constraint of the limited cardinality of data sets, extracting such patterns is challenging. A standard approach to overcome these hurdles is to reduce the dimensionality of the data by summarizing activation over either time or space, at the expense of a possible loss of useful information. Here, we introduce an end-to-end algorithm capable of extracting spatiotemporal features from the full 4-D data using 3-D CNNs and 3-D Convolutional LSTMs. We evaluate our proposed model on the publicly available ABIDE dataset to demonstrate the capability of our model to classify Autism Spectrum Disorder (ASD) from resting-state fMRI data. Our results show that the proposed model achieves state-of-the-art results on single sites, with F1-scores of 0.78 and 0.7 on the NYU and UM sites, respectively.

7. Context Conditional Variational Autoencoder for Predicting Multi-Path Trajectories in Mixed Traffic
Hao Cheng, Wentong Liao, Michael Ying Yang, Monica Sester, Bodo Rosenhahn
Abstract: Trajectory prediction in urban mixed-traffic zones is critical for many AI systems, such as traffic management, social robots and autonomous driving. However, there are many challenges in predicting the trajectories of heterogeneous road agents (pedestrians, cyclists and vehicles) at a microscopic level. For example, an agent might be able to choose multiple plausible paths in complex interactions with other agents in varying environments. To this end, we propose an approach named Context Conditional Variational Autoencoder (CCVAE) that encodes both past and future scene context, interaction context and motion information to capture the variations of the future trajectories using a set of stochastic latent variables. We predict multi-path trajectories conditioned on past information of the target agent by sampling the latent variable multiple times. In experiments on several datasets of varying scenes, our method outperforms recent state-of-the-art methods for mixed traffic trajectory prediction by a large margin and is more robust in a very challenging environment.

8. Multi-Level Feature Fusion Mechanism for Single Image Super-Resolution
Jiawen Lyn
Abstract: Convolutional neural networks (CNNs) have been widely used in Single Image Super-Resolution (SISR), which has seen great success recently. As networks deepen, their learning ability becomes more and more powerful. However, most CNN-based SISR methods do not make full use of hierarchical features and the learning ability of the network. These features cannot be extracted directly by subsequent layers, so the hierarchical information from previous layers has little impact on the output, and the performance of subsequent layers is relatively poor. To solve this problem, a novel Multi-Level Feature Fusion network (MLRN) is proposed, which makes full use of global intermediate features. We also introduce the Feature Skip Fusion Block (FSFblock) as the basic module. Each block can directly extract raw multi-scale features and fuse multi-level features, then learn the spatial correlation of the features. The correlation among features in this holistic approach leads to a continuous global memory mechanism for information. Extensive experiments on public datasets show that MLRN can be implemented and performs favorably against the most advanced methods.

9. Counting dense objects in remote sensing images
Guangshuai Gao, Qingjie Liu, Yunhong Wang
Abstract: Estimating the accurate number of objects of interest in a given image is a challenging yet important task. Significant efforts have been made to address this problem and great progress has been achieved, yet counting ground objects in remote sensing images is barely studied. In this paper, we are interested in counting dense objects in remote sensing images. Compared with object counting in natural scenes, this task is challenging because of the following factors: large scale variation, complex cluttered backgrounds and orientation arbitrariness. More importantly, the scarcity of data severely limits the development of research in this field. To address these issues, we first construct a large-scale object counting dataset based on remote sensing images, which contains four kinds of objects: buildings, crowded ships in harbors, and large vehicles and small vehicles in parking lots. We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image. The proposed network consists of three parts, namely a convolution block attention module (CBAM), a scale pyramid module (SPM) and a deformable convolution module (DCM). Experiments on the proposed dataset and comparisons with state-of-the-art methods demonstrate the challenging nature of the proposed dataset and the superiority and effectiveness of our method.

10. End-to-end Learning of Object Motion Estimation from Retinal Events for Event-based Object Tracking
Haosheng Chen, David Suter, Qiangqiang Wu, Hanzi Wang
Abstract: Event cameras, which are asynchronous bio-inspired vision sensors, have shown great potential in computer vision and artificial intelligence. However, the application of event cameras to object-level motion estimation or tracking is still in its infancy. The main idea behind this work is to propose a novel deep neural network to learn and regress a parametric object-level motion/transform model for event-based object tracking. To achieve this goal, we propose a synchronous Time-Surface with Linear Time Decay (TSLTD) representation, which effectively encodes the spatio-temporal information of asynchronous retinal events into TSLTD frames with clear motion patterns. We feed the sequence of TSLTD frames to a novel Retinal Motion Regression Network (RMRNet) to perform an end-to-end 5-DoF object motion regression. Our method is compared with state-of-the-art object tracking methods that are based on conventional cameras or event cameras. The experimental results show the superiority of our method in handling various challenging environments such as fast motion and low illumination conditions.

11. A Survey on 3D Skeleton-Based Action Recognition Using Learning Method
Bin Ren, Mengyuan Liu, Runwei Ding, Hong Liu
Abstract: 3D skeleton-based action recognition, owing to the latent advantages of skeleton data, has been an active topic in computer vision. As a consequence, many impressive works, based on both conventional handcrafted features and learned features, have been produced over the years. However, previous surveys of action recognition mostly focus on methods dominated by video or RGB data, and the few existing reviews related to skeleton data mainly cover the representation of skeleton data or the performance of some classic techniques on a certain dataset. Besides, though deep learning methods have been applied to this field for years, no related research offers an introduction or review from the perspective of deep learning architectures. To overcome those limitations, this survey first highlights the necessity of action recognition and the significance of 3D skeleton data. Then a comprehensive introduction to the mainstream action recognition techniques based on Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs) is given in a data-driven manner. Finally, we briefly discuss the biggest 3D skeleton dataset, NTU-RGB+D, and its new edition, NTU-RGB+D 120, along with several existing top-ranked algorithms on those two datasets. To the best of our knowledge, this is the first work to give an overall discussion of deep learning-based action recognition using 3D skeleton data.

12. Liver Segmentation in Abdominal CT Images via Auto-Context Neural Network and Self-Supervised Contour Attention
Minyoung Chung, Jingyu Lee, Jeongjin Lee, Yeong-Gil Shin
Abstract: Accurate image segmentation of the liver is a challenging problem owing to its large shape variability and unclear boundaries. Although the applications of fully convolutional neural networks (CNNs) have shown groundbreaking results, limited studies have focused on the performance of generalization. In this study, we introduce a CNN for liver segmentation on abdominal computed tomography (CT) images that shows high generalization performance and accuracy. To improve the generalization performance, we initially propose an auto-context algorithm in a single CNN. The proposed auto-context neural network exploits an effective high-level residual estimation to obtain the shape prior. Identical dual paths are effectively trained to represent mutual complementary features for an accurate posterior analysis of a liver. Further, we extend our network by employing a self-supervised contour scheme. We trained sparse contour features by penalizing the ground-truth contour to focus more contour attentions on the failures. The experimental results show that the proposed network results in better accuracy when compared to the state-of-the-art networks by reducing 10.31% of the Hausdorff distance. We used 180 abdominal CT images for training and validation. Two-fold cross-validation is presented for a comparison with the state-of-the-art neural networks. Novel multiple N-fold cross-validations are conducted to verify the performance of generalization. The proposed network showed the best generalization performance among the networks. Additionally, we present a series of ablation experiments that comprehensively support the importance of the underlying concepts.
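Entry 12 reports its improvement in terms of the Hausdorff distance. As a reminder of what that metric measures, here is a minimal sketch of the standard symmetric Hausdorff distance between two point sets; the toy contours are made up, and the abstract does not say which variant or implementation the paper uses.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance: the worse of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical contour points of a predicted and a reference segmentation.
predicted = [(0, 0), (0, 1), (1, 1)]
reference = [(0, 0), (0, 1), (4, 1)]

distance = hausdorff(predicted, reference)
```

Because the metric takes a maximum over points, a single badly placed boundary point dominates the score, which makes it a natural measure for contour-level failures.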

13. An LSTM-Based Autonomous Driving Model Using Waymo Open Dataset
Zhicheng Li, Zhihao Gu, Xuan Di, Rongye Shi
Abstract: The Waymo Open Dataset has been released recently, providing a platform to crowdsource some fundamental challenges for automated vehicles (AVs), such as 3D detection and tracking. While the dataset provides a large amount of high-quality and multi-source driving information, people in academia are more interested in the underlying driving policy programmed in Waymo self-driving cars, which is inaccessible due to AV manufacturers' proprietary protection. Accordingly, academic researchers have to make various assumptions to implement AV components in their models or simulations, which may not represent the realistic interactions in real-world traffic. Thus, this paper introduces an approach to learn a long short-term memory (LSTM)-based model for imitating the behavior of Waymo's self-driving model. The proposed model has been evaluated based on Mean Absolute Error (MAE). The experimental results show that our model outperforms several baseline models in driving action prediction. Also, a visualization tool is presented for verifying the performance of the model.

14. Skip Connections Matter: On the Transferability of Adversarial Examples Generated with ResNets
Dongxian Wu, Yisen Wang, Shu-Tao Xia, James Bailey, Xingjun Ma
Abstract: Skip connections are an essential component of current state-of-the-art deep neural networks (DNNs) such as ResNet, WideResNet, DenseNet, and ResNeXt. Despite their huge success in building deeper and more powerful DNNs, we identify a surprising security weakness of skip connections in this paper. Use of skip connections allows easier generation of highly transferable adversarial examples. Specifically, in ResNet-like neural networks (with skip connections), gradients can backpropagate through either skip connections or residual modules. We find that using more gradients from the skip connections rather than the residual modules, according to a decay factor, allows one to craft adversarial examples with high transferability. Our method is termed Skip Gradient Method (SGM). We conduct comprehensive transfer attacks against state-of-the-art DNNs including ResNets, DenseNets, Inceptions, Inception-ResNet, Squeeze-and-Excitation Network (SENet) and robustly trained DNNs. We show that employing SGM on the gradient flow can greatly improve the transferability of crafted attacks in almost all cases. Furthermore, SGM can be easily combined with existing black-box attack techniques, and obtains large improvements over state-of-the-art transferability methods. Our findings not only motivate new research into the architectural vulnerability of DNNs, but also open up further challenges for the design of secure DNN architectures.
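The gradient decomposition behind entry 14 can be illustrated on a scalar toy network. For a residual block z_{i+1} = z_i + f_i(z_i), the chain rule gives a factor 1 + f_i'(z_i), where the 1 comes from the skip connection. The sketch below assumes the decay factor simply scales the residual term, giving 1 + gamma * f_i'(z_i); the tanh blocks, weights, and function names are made up for illustration and are not taken from the paper.

```python
import math
from typing import List


def residual_forward(z0: float, weights: List[float]) -> List[float]:
    """Run z_{i+1} = z_i + w_i * tanh(z_i) and return every intermediate z."""
    zs = [z0]
    for w in weights:
        zs.append(zs[-1] + w * math.tanh(zs[-1]))
    return zs


def input_gradient(z0: float, weights: List[float], gamma: float = 1.0) -> float:
    """Gradient of the final output w.r.t. z0, residual branch scaled by gamma.

    gamma = 1.0 is the ordinary chain rule; gamma < 1.0 down-weights the
    gradient flowing through the residual modules relative to the skips.
    """
    zs = residual_forward(z0, weights)
    grad = 1.0
    for z, w in zip(zs[:-1], weights):
        residual_grad = w * (1.0 - math.tanh(z) ** 2)  # derivative of w * tanh(z)
        grad *= 1.0 + gamma * residual_grad            # the 1.0 is the skip path
    return grad


# Hypothetical three-block network.
toy_weights = [0.5, -0.3, 0.8]
true_grad = input_gradient(0.2, toy_weights, gamma=1.0)
decayed_grad = input_gradient(0.2, toy_weights, gamma=0.5)
```

Setting `gamma` to 0 leaves only the skip path, so the gradient collapses to exactly 1; values between 0 and 1 interpolate between that and the true gradient.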

15. SemI2I: Semantically Consistent Image-to-Image Translation for Domain Adaptation of Remote Sensing Data
Onur Tasar, S L Happy, Yuliya Tarabalka, Pierre Alliez
Abstract: Although convolutional neural networks have been proven to be an effective tool to generate high quality maps from remote sensing images, their performance significantly deteriorates when there exists a large domain shift between training and test data. To address this issue, we propose a new data augmentation approach that transfers the style of test data to training data using generative adversarial networks. Our semantic segmentation framework consists in first training a U-net from the real training data and then fine-tuning it on the test stylized fake training data generated by the proposed approach. Our experimental results prove that our framework outperforms the existing domain adaptation methods.

16. Remove Appearance Shift for Ultrasound Image Segmentation via Fast and Universal Style Transfer
Zhendong Liu, Xin Yang, Rui Gao, Shengfeng Liu, Haoran Dou, Shuangchi He, Yuhao Huang, Yankai Huang, Huanjia Luo, Yuanji Zhang, Yi Xiong, Dong Ni
Abstract: Deep Neural Networks (DNNs) suffer from performance degradation when image appearance shift occurs, especially in ultrasound (US) image segmentation. In this paper, we propose a novel and intuitive framework to remove the appearance shift, and hence improve the generalization ability of DNNs. Our work has three highlights. First, we follow the spirit of universal style transfer to remove appearance shifts, which has not been explored before for US images. Without sacrificing image structure details, it enables arbitrary style-content transfer. Second, accelerated with an Adaptive Instance Normalization block, our framework achieves the real-time speed required in clinical US scanning. Third, an efficient and effective style image selection strategy is proposed to ensure that the target-style US image and the testing content US image properly match each other. Experiments on two large US datasets demonstrate that our methods are superior to state-of-the-art methods in making DNNs robust against various appearance shifts.
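Entry 16 mentions an Adaptive Instance Normalization (AdaIN) block. The core operation, as commonly defined, normalizes content features by their own mean and standard deviation and then rescales them to the style features' statistics. The sketch below applies it to a single flattened channel with made-up values; how the paper integrates the block into its framework is not described in the abstract.

```python
import statistics
from typing import List


def adain(content: List[float], style: List[float], eps: float = 1e-5) -> List[float]:
    """Shift and scale one channel of content features to match style statistics."""
    c_mean, c_std = statistics.fmean(content), statistics.pstdev(content)
    s_mean, s_std = statistics.fmean(style), statistics.pstdev(style)
    # eps guards against division by zero for a constant content channel.
    return [s_std * (x - c_mean) / (c_std + eps) + s_mean for x in content]


# Hypothetical feature values for one channel.
content_channel = [1.0, 2.0, 3.0, 4.0]
style_channel = [10.0, 20.0, 30.0, 40.0]

stylized = adain(content_channel, style_channel)
```

The output keeps the relative ordering of the content values but carries the style channel's mean and (up to the small `eps` term) its standard deviation. In a real network the same operation is applied independently to each channel of each feature map.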

17. Variational Conditional-Dependence Hidden Markov Models for Human Action Recognition
Konstantinos P. Panousis, Sotirios Chatzis, Sergios Theodoridis
Abstract: Hidden Markov Models (HMMs) are a powerful generative approach for modeling sequential data and time-series in general. However, the commonly employed assumption of the dependence of the current time frame to a single or multiple immediately preceding frames is unrealistic; more complicated dynamics potentially exist in real world scenarios. Human Action Recognition constitutes such a scenario, and has attracted increased attention with the advent of low-cost 3D sensors. The naturally arising variations and complex temporal dependencies have established this task as a challenging problem in the community. This paper revisits conventional sequential modeling approaches, aiming to address the problem of capturing time-varying temporal dependency patterns. To this end, we propose a different formulation of HMMs, whereby the dependence on past frames is dynamically inferred from the data. Specifically, we introduce a hierarchical extension by postulating an additional latent variable layer; therein, the (time-varying) temporal dependence patterns are treated as latent variables over which inference is performed. We leverage solid arguments from the Variational Bayes framework and derive a tractable inference algorithm based on the forward-backward algorithm. As we experimentally show using benchmark datasets, our approach yields competitive recognition accuracy and can effectively handle data with missing values.

18. ACEnet: Anatomical Context-Encoding Network for Neuroanatomy Segmentation
Yuemeng Li, Hongming Li, Yong Fan
Abstract: Segmentation of brain structures from magnetic resonance (MR) scans plays an important role in the quantification of brain morphology. Since 3D deep learning models suffer from high computational cost, 2D deep learning methods are favored for their computational efficiency. However, existing 2D deep learning methods are not equipped to effectively capture 3D spatial contextual information that is needed to achieve accurate brain structure segmentation. In order to overcome this limitation, we develop an Anatomical Context-Encoding Network (ACEnet) to incorporate 3D spatial and anatomical contexts in 2D convolutional neural networks (CNNs) for efficient and accurate segmentation of brain structures from MR scans, consisting of 1) an anatomical context encoding module to incorporate anatomical information in 2D CNNs, 2) a spatial context encoding module to integrate 3D image information in 2D CNNs, and 3) a skull stripping module to guide 2D CNNs to attend to the brain. Extensive experiments on three benchmark datasets have demonstrated that our method outperforms state-of-the-art alternative methods for brain structure segmentation in terms of both computational efficiency and segmentation accuracy.



[arXiv Papers] Computer Vision and Pattern Recognition 2020-02-17
