Abstracts

1. Rembrandts and Robots: Using Neural Networks to Explore Authorship in Painting
Steven J. Frank, Andrea M. Frank
Abstract: We use convolutional neural networks to analyze authorship questions surrounding works of representational art. Trained on the works of an artist under study and visually comparable works of other artists, our system can identify forgeries and provide attributions. Our system can also assign classification probabilities within a painting, revealing mixed authorship and identifying regions painted by different hands.

2. Component Analysis for Visual Question Answering Architectures
Camila Kolling, Jônatas Wehrmann, Rodrigo C. Barros
Abstract: Recent research advances in Computer Vision and Natural Language Processing have introduced novel tasks that are paving the way for solving AI-complete problems. One of those tasks is called Visual Question Answering (VQA). A VQA system must take an image and a free-form, open-ended natural language question about the image, and produce a natural language answer as the output. Such a task has drawn great attention from the scientific community, which has generated a plethora of approaches that aim to improve VQA predictive accuracy. Most of them comprise three major components: (i) independent representation learning of images and questions; (ii) feature fusion so the model can use information from both sources to answer visual questions; and (iii) the generation of the correct answer in natural language. With so many approaches recently introduced, the real contribution of each component to the ultimate performance of the model has become unclear. The main goal of this paper is to provide a comprehensive analysis of the impact of each component in VQA models. Our extensive set of experiments covers both visual and textual elements, as well as the combination of these representations in the form of fusion and attention mechanisms. Our major contribution is to identify core components for training VQA models so as to maximize their predictive performance.

3. AlignNet: A Unifying Approach to Audio-Visual Alignment
Jianren Wang, Zhaoyuan Fang, Hang Zhao
Abstract: We present AlignNet, a model that synchronizes videos with reference audios under non-uniform and irregular misalignments. AlignNet learns the end-to-end dense correspondence between each frame of a video and an audio. Our method is designed according to simple and well-established principles: attention, pyramidal processing, warping, and affinity function. Together with the model, we release a dancing dataset Dance50 for training and evaluation. Qualitative, quantitative and subjective evaluation results on dance-music alignment and speech-lip alignment demonstrate that our method far outperforms the state-of-the-art methods. Project video and code are available at this https URL.

4. Detect and Correct Bias in Multi-Site Neuroimaging Datasets
Christian Wachinger, Anna Rieckmann, Sebastian Pölsterl
Abstract: The desire to train complex machine learning algorithms and to increase the statistical power in association studies drives neuroimaging research to use ever-larger datasets. The most obvious way to increase sample size is by pooling scans from independent studies. However, simple pooling is often ill-advised as selection, measurement, and confounding biases may creep in and yield spurious correlations. In this work, we combine 35,320 magnetic resonance images of the brain from 17 studies to examine bias in neuroimaging. In the first experiment, Name That Dataset, we provide empirical evidence for the presence of bias by showing that scans can be correctly assigned to their respective dataset with 71.5% accuracy. Given such evidence, we take a closer look at confounding bias, which is often viewed as the main shortcoming in observational studies. In practice, we neither know all potential confounders nor do we have data on them. Hence, we model confounders as unknown, latent variables. Kolmogorov complexity is then used to decide whether the confounded or the causal model provides the simplest factorization of the graphical model. Finally, we present methods for dataset harmonization and study their ability to remove bias in imaging features. In particular, we propose an extension of the recently introduced ComBat algorithm to control for global variation across image features, inspired by adjusting for population stratification in genetics. Our results demonstrate that harmonization can reduce dataset-specific information in image features. Further, confounding bias can be reduced and even turned into a causal relationship. However, harmonization also requires caution as it can easily remove relevant subject-specific information.

5. Intra-Camera Supervised Person Re-Identification
Xiangping Zhu, Xiatian Zhu, Minxian Li, Pietro Morerio, Vittorio Murino, Shaogang Gong
Abstract: Existing person re-identification (re-id) methods mostly exploit a large set of cross-camera identity-labelled training data. This requires a tedious data collection and annotation process, leading to poor scalability in practical re-id applications. On the other hand, unsupervised re-id methods do not need identity label information, but they usually suffer from much inferior model performance. To overcome these fundamental limitations, we propose a novel person re-identification paradigm based on the idea of independent per-camera identity annotation. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human annotation effort. Consequently, it gives rise to a more scalable and more feasible setting, which we call Intra-Camera Supervised (ICS) person re-id, for which we formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method. Specifically, MATE is designed for self-discovering the cross-camera identity correspondence in a per-camera multi-task inference framework. Extensive experiments demonstrate the superior cost-effectiveness of our method over alternative approaches on three large person re-id datasets. For example, MATE yields an 88.7% rank-1 score on Market-1501 in the proposed ICS person re-id setting, significantly outperforming unsupervised learning models and closely approaching conventional fully supervised learning competitors.
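
The rank-1 score quoted above is a standard retrieval metric: the fraction of query images whose single nearest gallery image carries the same identity. The sketch below is a simplified illustration, not the paper's evaluation code. The feature vectors and identity labels are made-up toy values, and real re-id protocols usually also exclude gallery images taken by the same camera as the query, which this sketch does not do.

```python
import math


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery feature has the same identity."""
    hits = 0
    for q_feat, q_id in zip(query_feats, query_ids):
        # Index of the gallery entry with the smallest Euclidean distance.
        nearest = min(
            range(len(gallery_feats)),
            key=lambda i: math.dist(q_feat, gallery_feats[i]),
        )
        hits += gallery_ids[nearest] == q_id
    return hits / len(query_feats)


# Toy 2-D features (hypothetical values, not taken from Market-1501).
gallery_feats = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
gallery_ids = ["a", "b", "c"]
query_feats = [(0.1, 0.0), (4.8, 5.1), (0.4, 0.3)]
query_ids = ["a", "c", "b"]

score = rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids)
print(f"rank-1 = {score:.3f}")
```

The third query is deliberately placed closer to identity "a" than to its true identity "b", so two of the three queries count as hits.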

6. Learning light field synthesis with Multi-Plane Images: scene encoding as a recurrent segmentation task
Tomás Völker, Guillaume Boisson, Bertrand Chupeau
Abstract: In this paper we address the problem of view synthesis from large-baseline light fields by turning a sparse set of input views into a Multi-Plane Image (MPI). Because available datasets are scarce, we propose a lightweight network that does not require extensive training. Unlike the latest approaches, our model does not learn to estimate RGB layers but only encodes the scene geometry within MPI alpha layers, which comes down to a segmentation task. A Learned Gradient Descent (LGD) framework is used to cascade the same convolutional network in a recurrent fashion in order to refine the volumetric representation obtained. Thanks to its low number of parameters, our model trains successfully on a small light field video dataset and provides visually appealing results. It also exhibits convenient generalization properties with respect to the number of input views, the number of depth planes in the MPI, and the number of refinement iterations.

7. Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis
Tao Zhou, Huazhu Fu, Geng Chen, Jianbing Shen, Ling Shao
Abstract: Magnetic resonance imaging (MRI) is a widely used neuroimaging technique that can provide images of different contrasts (i.e., modalities). Fusing this multi-modal data has proven particularly effective for boosting model performance in many tasks. However, due to poor data quality and frequent patient dropout, collecting all modalities for every patient remains a challenge. Medical image synthesis has been proposed as an effective solution to this, where any missing modalities are synthesized from the existing ones. In this paper, we propose a novel Hybrid-fusion Network (Hi-Net) for multi-modal MR image synthesis, which learns a mapping from multi-modal source images (i.e., existing modalities) to target images (i.e., missing modalities). In our Hi-Net, a modality-specific network is utilized to learn representations for each individual modality, and a fusion network is employed to learn the common latent representation of multi-modal data. Then, a multi-modal synthesis network is designed to densely combine the latent representation with hierarchical features from each modality, acting as a generator to synthesize the target images. Moreover, a layer-wise multi-modal fusion strategy is presented to effectively exploit the correlations among multiple modalities, in which a Mixed Fusion Block (MFB) is proposed to adaptively weight different fusion strategies (i.e., element-wise summation, product, and maximization). Extensive experiments demonstrate that the proposed model outperforms other state-of-the-art medical image synthesis methods.
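
The three fusion strategies named in the abstract (element-wise summation, product, and maximization) can be illustrated with a small sketch. The abstract does not say how the Mixed Fusion Block weights them, so the softmax-weighted blend below is an assumption made purely for illustration; `mixed_fusion`, `strategy_scores`, and the toy feature values are hypothetical names and data, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_fusion(feat_a, feat_b, strategy_scores):
    """Blend element-wise sum, product, and maximum of two feature vectors.

    strategy_scores holds one score per strategy; in a trained network these
    would be learned, here they are plain numbers (an assumption).
    """
    w_sum, w_prod, w_max = softmax(strategy_scores)
    return [
        w_sum * (a + b) + w_prod * (a * b) + w_max * max(a, b)
        for a, b in zip(feat_a, feat_b)
    ]


# Toy features standing in for two MR modalities (hypothetical values).
feat_a = [1.0, 2.0, 0.5]
feat_b = [3.0, 0.0, 0.5]

# Equal scores give each strategy a weight of 1/3.
fused = mixed_fusion(feat_a, feat_b, strategy_scores=[0.0, 0.0, 0.0])
print(fused)
```

Raising one entry of `strategy_scores` shifts the blend toward that strategy, which is one simple way to read "adaptively weight different fusion strategies".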

8. Real-Time Semantic Background Subtraction
Anthony Cioppa, Marc Van Droogenbroeck, Marc Braham
Abstract: Semantic background subtraction (SBS) has been shown to improve the performance of most background subtraction algorithms by combining them with semantic information derived from a semantic segmentation network. However, SBS requires high-quality semantic segmentation masks for all frames, which are slow to compute. In addition, most state-of-the-art background subtraction algorithms are not real-time, which makes them unsuitable for real-world applications. In this paper, we present a novel background subtraction algorithm called Real-Time Semantic Background Subtraction (denoted RT-SBS), which extends SBS to real-time constrained applications while keeping similar performance. RT-SBS effectively combines a real-time background subtraction algorithm with high-quality semantic information that can be provided at a slower pace, independently for each pixel. We show that RT-SBS coupled with ViBe sets a new state of the art for real-time background subtraction algorithms and even competes with non-real-time state-of-the-art ones. Note that Python CPU and GPU implementations of RT-SBS will be released soon.

9. Hierarchical Auto-Regressive Model for Image Compression Incorporating Object Saliency and a Deep Perceptual Loss
Yash Patel, Srikar Appalaraju, R. Manmatha
Abstract: We propose a new end-to-end trainable model for lossy image compression which includes a number of novel components. This approach incorporates 1) a hierarchical auto-regressive model; 2) saliency in the images, focusing on reconstructing the salient regions better; 3) in addition, we empirically demonstrate that popularly used evaluation metrics such as MS-SSIM and PSNR are inadequate for judging the performance of deep learned image compression techniques, as they do not align well with human perceptual similarity. We therefore propose an alternative metric, which is learned on perceptual similarity data specific to image compression. Our experiments show that this new metric aligns significantly better with human judgments when compared to other hand-crafted or learned metrics. The proposed compression model not only generates images that are visually better but also gives superior performance for subsequent computer vision tasks such as object detection and segmentation when compared to other engineered or learned codecs.
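
For readers unfamiliar with PSNR, the sketch below shows the standard definition, 10 * log10(MAX^2 / MSE). It is a generic illustration rather than code from the paper, and the pixel values are made up. Because the score depends only on the mean squared pixel error, two distortions with the same MSE receive the same PSNR however different they look, which gives some intuition for the abstract's claim that such metrics may not track human perception.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Four 8-bit pixels (hypothetical values).
reference = [52, 55, 61, 59]
distorted = [54, 55, 60, 57]

value = psnr(reference, distorted)
print(f"PSNR = {value:.2f} dB")
```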

10. Towards Precise Intra-camera Supervised Person Re-identification
Menglin Wang, Baisheng Lai, Haokun Chen, Jianqiang Huang, Xiaojin Gong, Xian-Sheng Hua
Abstract: Intra-camera supervision (ICS) for person re-identification (Re-ID) assumes that identity labels are independently annotated within each camera view and no inter-camera identity association is labeled. It is a new setting proposed recently to reduce the burden of annotation while still aiming to maintain desirable Re-ID performance. However, the lack of inter-camera labels makes the ICS Re-ID problem much more challenging than the fully supervised counterpart. By investigating the characteristics of ICS, this paper proposes camera-specific non-parametric classifiers, together with a hybrid mining quintuplet loss, to perform intra-camera learning. Then, an inter-camera learning module consisting of a graph-based ID association step and a Re-ID model updating step is applied. Extensive experiments on three large-scale Re-ID datasets show that our approach outperforms all existing ICS works by a large margin. Our approach even performs comparably to state-of-the-art fully supervised methods on two of the datasets.

11. A Zero-Shot based Fingerprint Presentation Attack Detection System
Haozhe Liu, Wentian Zhang, Guojie Liu, Feng Liu
Abstract: With the development of presentation attacks, Automated Fingerprint Recognition Systems (AFRSs) have become vulnerable. Thus, numerous presentation attack detection (PAD) methods have been proposed to ensure the normal use of AFRSs. However, the need for large-scale presentation attack images and limited generalization ability restrict the practical performance of existing PAD methods. Therefore, we propose a novel Zero-Shot Presentation Attack Detection Model to guarantee the generalization of the PAD model. The proposed ZSPAD-Model, based on a generative model, does not use any negative samples during its construction, which ensures robustness to presentation attacks of various types or materials. Unlike other auto-encoder-based models, the Fine-grained Map architecture is proposed to refine the reconstruction error of the auto-encoder networks, and a task-specific Gaussian model is used to improve the quality of clustering. Meanwhile, in order to improve the performance of the proposed model, 9 confidence scores are discussed in this article. Experimental results show that the ZSPAD-Model is the state of the art for ZSPAD, and that the MS-Score is the best confidence score. Compared with existing methods, the proposed ZSPAD-Model performs better than the feature-based method; under the multi-shot setting, it outperforms the learning-based method when little training data is available. When large training data is available, their results are similar.

12. Bi-Directional Generation for Unsupervised Domain Adaptation
Guanglei Yang, Haifeng Xia, Mingli Ding, Zhengming Ding
Abstract: Unsupervised domain adaptation facilitates learning on an unlabeled target domain by relying on well-established source domain information. Conventional methods that forcefully reduce the domain discrepancy in the latent space result in the destruction of the intrinsic data structure. To balance the mitigation of the domain gap and the preservation of the inherent structure, we propose a Bi-Directional Generation domain adaptation model with consistent classifiers, interpolating two intermediate domains to bridge the source and target domains. Specifically, two cross-domain generators are employed to synthesize one domain conditioned on the other. The performance of our proposed method can be further enhanced by the consistent classifiers and the cross-domain alignment constraints. We also design two classifiers which are jointly optimized to maximize the consistency of target sample predictions. Extensive experiments verify that our proposed model outperforms the state of the art on standard cross-domain visual benchmarks.

13. Analysis of Multi Field of View CNN and Attention CNN on H&E Stained Whole-slide Images on Hepatocellular Carcinoma
Mehmet Burak Sayıcı, Rikiya Yamashita, Jeanne Shen
Abstract: Hepatocellular carcinoma (HCC) is a leading cause of cancer-related death worldwide. Whole-slide imaging, a method of scanning glass slides, has been employed for the diagnosis of HCC. Feeding high-resolution whole-slide images directly to Convolutional Neural Networks is infeasible. Hence, tiling the whole-slide images is a common way of applying Convolutional Neural Networks to classification and segmentation. The choice of tile size affects the performance of the algorithms, since a small field of view cannot capture information on a larger scale and a large field of view cannot capture information on a cellular scale. In this work, the effect of tile size on classification performance is analysed. In addition, a Multi Field of View CNN is used to take advantage of the information provided by different tile sizes, and an Attention CNN is used to provide the capability of voting for the most contributing tile size. It is found that employing more than one tile size significantly increases classification performance, by 3.97%, and both algorithms are found to be more successful than the algorithm which uses only one tile size.
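
The tiling step described above can be sketched as a grid of tile origins computed at several tile sizes, each size giving a different field of view over the same slide. This is a minimal illustration under stated assumptions: the slide dimensions, the tile sizes, and the function names `tile_origins` and `multi_fov_tiles` are hypothetical, and partial tiles at the right and bottom edges are simply dropped, which the abstract does not specify.

```python
def tile_origins(width, height, tile_size, stride=None):
    """Top-left corners of all tiles that fit fully inside the slide."""
    stride = stride or tile_size  # non-overlapping tiles by default
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, stride)
        for x in range(0, width - tile_size + 1, stride)
    ]


def multi_fov_tiles(width, height, tile_sizes):
    """Tile the same slide at several tile sizes (fields of view)."""
    return {size: tile_origins(width, height, size) for size in tile_sizes}


# A small 1024 x 768 region standing in for a whole-slide image (hypothetical).
tiles = multi_fov_tiles(1024, 768, tile_sizes=(256, 512))
for size, origins in tiles.items():
    print(size, len(origins))
```

Smaller tiles give many patches with cellular detail, while larger tiles give few patches with more tissue context, which is the trade-off the abstract analyses.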

14. End-to-End Face Parsing via Interlinked Convolutional Neural Networks
Zi Yin, Valentin Yiu, Xiaolin Hu, Liang Tang
Abstract: Face parsing is an important computer vision task that requires accurate pixel segmentation of facial parts (such as eyes, nose, and mouth), providing a basis for further face analysis, modification, and other applications. In this paper, we introduce a simple, end-to-end face parsing framework: STN-aided iCNN (STN-iCNN), which extends the interlinked Convolutional Neural Network (iCNN) by adding a Spatial Transformer Network (STN) between the two isolated stages. The STN-iCNN uses the STN to provide a trainable connection in the original two-stage iCNN pipeline, making end-to-end joint training possible. Moreover, as a by-product, the STN also provides more precisely cropped parts than the original cropper. Thanks to these two advantages, our approach significantly improves the accuracy of the original model.

15. Uniform Interpolation Constrained Geodesic Learning on Data Manifold
Cong Geng, Jia Wang, Li Chen, Wenbo Bao, Chu Chu, Zhiyong Gao
Abstract: In this paper, we propose a method to learn a minimizing geodesic within a data manifold. Along the learned geodesic, our method can generate high-quality interpolations between two given data samples. Specifically, we use an autoencoder network to map data samples into latent space and perform interpolation via an interpolation network. We add prior geometric information to regularize our autoencoder for the convexity of representations so that for any given interpolation approach, the generated interpolations remain within the distribution of the data manifold. Before learning a geodesic, a proper Riemannian metric should be defined. Therefore, we induce a Riemannian metric from the canonical metric of the Euclidean space in which the data manifold is isometrically immersed. Based on this Riemannian metric, we introduce a constant speed loss and a minimizing geodesic loss to regularize the interpolation network to generate uniform interpolation along the learned geodesic on the manifold. We provide a theoretical analysis of our model and use image translation as an example to demonstrate the effectiveness of our method.
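
To give some intuition for "constant speed" and "minimizing" along an interpolation path, the sketch below measures two discrete quantities on a sequence of points in Euclidean space: the total path length, and the variance of the segment lengths, which is zero exactly when consecutive samples are equally spaced. These are simple proxies chosen for illustration, assuming the Euclidean distance of the ambient space. They are not the loss functions defined in the paper, and the function names and sample paths are hypothetical.

```python
import math


def segment_lengths(path):
    """Euclidean distance between each pair of consecutive points."""
    return [math.dist(p, q) for p, q in zip(path, path[1:])]


def path_length(path):
    """Total length of the piecewise-linear path (shorter is closer to a geodesic)."""
    return sum(segment_lengths(path))


def speed_variance(path):
    """Variance of segment lengths; zero means uniformly spaced interpolation."""
    lengths = segment_lengths(path)
    mean = sum(lengths) / len(lengths)
    return sum((length - mean) ** 2 for length in lengths) / len(lengths)


# Two paths between the same endpoints (hypothetical sample points).
uniform = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
uneven = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (3.0, 0.0)]

print(path_length(uniform), speed_variance(uniform))
print(path_length(uneven), speed_variance(uneven))
```

Both paths have the same total length, but only the first is uniformly spaced, which is why a length term alone would not enforce uniform interpolation.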

16. Deep-HR: Fast Heart Rate Estimation from Face Video Under Realistic Conditions
Mohammad Sabokrou, Masoud Pourreza, Xiaobai Li, Mahmood Fathy, Guoying Zhao
Abstract: This paper presents a novel method for remote heart rate (HR) estimation. Recent studies have shown that blood pumping by the heart is highly correlated with the color intensity of face pixels and, surprisingly, can be used for remote HR estimation. Researchers have proposed several methods for this task, but making it work in realistic situations is still a challenging problem in the computer vision community. Furthermore, learning to solve such a complex task on a dataset with very limited annotated samples is not reasonable. Consequently, researchers tend not to use deep learning approaches for this problem. In this paper, we propose a simple yet efficient approach to benefit from the advantages of Deep Neural Networks (DNNs) by simplifying HR estimation from a complex task to learning a mapping from a highly correlated representation to HR. Inspired by previous work, we learn a component called the Front-End (FE) to provide a discriminative representation of face videos; afterward, a light deep regression auto-encoder, the Back-End (BE), is learned to map the FE representation to HR. The regression task on the informative representation is simple and can be learned efficiently from limited training samples. In addition, to be more accurate and to work well on low-quality videos, two deep encoder-decoder networks are trained to refine the output of the FE. We also introduce a challenging dataset (HR-D) to show that our method can work efficiently in realistic conditions. Experimental results on the HR-D and MAHNOB datasets confirm that our method can run in real time and estimates the average HR better than state-of-the-art methods.

17. A Visual-inertial Navigation Method for High-Speed Unmanned Aerial Vehicles
Xin-long Luo, Jia-hui Lv, Geng Sun
Abstract: This paper investigates the localization problem of a high-speed, high-altitude unmanned aerial vehicle (UAV) equipped with a monocular camera and an inertial navigation system. It proposes a navigation method that uses the complementarity of vision and inertial devices to overcome the singularity that arises from the horizontal flight of the UAV. Furthermore, it modifies the mathematical model of the localization problem by separating linear parts from nonlinear parts, and replaces a nonlinear least-squares problem with a linearly equality-constrained optimization problem. To avoid the ill-conditioning near the optimal point of sequential unconstrained minimization techniques (penalty methods), it constructs a semi-implicit continuous method with a trust-region technique, based on a differential-algebraic dynamical system, to solve the linearly equality-constrained optimization problem. It also analyzes the global convergence of the semi-implicit continuous method on an infinite integration interval, rather than the traditional convergence analysis of numerical methods for ordinary differential equations on a finite integration interval. Finally, promising numerical results are presented.

18. MFFW: A new dataset for multi-focus image fusion
Shuang Xu, Xiaoli Wei, Chunxia Zhang, Junmin Liu, Jiangshe Zhang
Abstract: Multi-focus image fusion (MFF) is a fundamental task in the field of computational photography. Current methods have achieved significant performance improvements. However, current methods are evaluated on simulated image sets or the Lytro dataset. Recently, a growing number of researchers have paid attention to the defocus spread effect, a phenomenon of real-world multi-focus images. Nonetheless, the defocus spread effect is not obvious in simulated or Lytro datasets, where popular methods perform very similarly. To compare their performance on images with the defocus spread effect, this paper constructs a new dataset called MFF in the wild (MFFW). It contains 19 pairs of multi-focus images collected on the Internet. We register all pairs of source images, and provide focus maps and reference images for some of the pairs. Compared with the Lytro dataset, images in MFFW suffer significantly from the defocus spread effect. In addition, the scenes of MFFW are more complex. The experiments demonstrate that most state-of-the-art methods cannot robustly generate satisfactory fusion images on the MFFW dataset. MFFW can be a new baseline dataset to test whether an MFF algorithm is able to deal with the defocus spread effect.

19. Efficient Training of Deep Convolutional Neural Networks by Augmentation in Embedding Space
Mohammad Saeed Abrishami, Amir Erfan Eshratifar, David Eigen, Yanzhi Wang, Shahin Nazarian, Massoud Pedram
Abstract: Recent advances in the field of artificial intelligence have been made possible by deep neural networks. In applications where data are scarce, transfer learning and data augmentation techniques are commonly used to improve the generalization of deep learning models. However, fine-tuning a transfer model with data augmentation in the raw input space has a high computational cost, because the full network must be run for every augmented input. This is particularly critical when large models are implemented on embedded devices with limited computational and energy resources. In this work, we propose a method that replaces the augmentation in the raw input space with an approximate one that acts purely in the embedding space. Our experimental results show that the proposed method drastically reduces the computation, while the accuracy of models is negligibly compromised.
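
The computational argument can be illustrated with a toy sketch: the expensive backbone runs once per input, its embeddings are cached, and augmentation is then applied to the cached embeddings instead of to raw inputs. Everything here is an assumption for illustration only. `embed` is a hypothetical stand-in for a frozen pretrained network, and the Gaussian jitter in `augment_embedding` is a placeholder perturbation, not the approximate augmentation proposed in the paper.

```python
import random


def embed(x):
    """Hypothetical stand-in for an expensive frozen backbone."""
    return [sum(x) / len(x), max(x) - min(x)]


def augment_embedding(z, rng, sigma=0.05):
    """Cheap perturbation applied directly in embedding space (assumed Gaussian jitter)."""
    return [v + rng.gauss(0.0, sigma) for v in z]


rng = random.Random(0)
inputs = [[0.1, 0.4, 0.7], [0.9, 0.2, 0.3]]

# The backbone runs once per input, not once per augmented copy.
cached = [embed(x) for x in inputs]

# Four augmented embeddings per input, produced without touching the backbone.
augmented = [augment_embedding(z, rng) for z in cached for _ in range(4)]
print(len(cached), len(augmented))
```

With augmentation in the raw input space, the same eight training samples would require eight backbone passes instead of two.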

20. Progressive Object Transfer Detection
Hao Chen, Yali Wang, Guoyou Wang, Xiang Bai, Yu Qiao
Abstract: Recent progress in object detection mainly depends on deep learning with large-scale benchmarks. However, collecting such fully-annotated data is often difficult or expensive for real-world applications, which restricts the power of deep neural networks in practice. In contrast, humans can detect new objects with little annotation burden, since humans often use prior knowledge to identify new objects from a few elaborately-annotated examples, and subsequently generalize this capacity by exploiting objects from wild images. Inspired by this procedure of learning to detect, we propose a novel Progressive Object Transfer Detection (POTD) framework. Specifically, we make three main contributions in this paper. First, POTD can effectively integrate object supervision from different domains into a progressive detection procedure. Via such human-like learning, one can boost a target detection task with few annotations. Second, POTD consists of two delicate transfer stages, i.e., Low-Shot Transfer Detection (LSTD) and Weakly-Supervised Transfer Detection (WSTD). In LSTD, we distill the implicit object knowledge of the source detector to enhance the target detector with few annotations. It can effectively warm up WSTD later on. In WSTD, we design a recurrent object labelling mechanism for learning to annotate weakly-labeled images. More importantly, we exploit the reliable object supervision from LSTD, which can further enhance the robustness of the target detector in the WSTD stage. Finally, we perform extensive experiments on a number of challenging detection benchmarks with different settings. The results demonstrate that our POTD outperforms recent state-of-the-art approaches.

21. Improving Place Recognition Using Dynamic Object Detection
Juan Pablo Munoz, Scott Dexter
Abstract: Traditional appearance-based place recognition algorithms based on handcrafted features have proven inadequate in environments with a significant presence of dynamic objects -- objects that may or may not be present in an agent's subsequent visits. Place representations from features extracted using Deep Learning approaches have gained popularity for their robustness and because the algorithms that used them yield better accuracy. Nevertheless, handcrafted features are still popular in devices that have limited resources. This article presents a novel approach that improves place recognition in environments populated by dynamic objects by incorporating the very knowledge of these objects to improve the overall quality of the representations of places used for matching. The proposed approach fuses object detection and place description, Deep Learning and handcrafted features, with the significance of reducing memory and storage requirements. This article demonstrates that the proposed approach yields improved place recognition accuracy, and was evaluated using both synthetic and real-world datasets. The adoption of the proposed approach will significantly improve place recognition results in environments populated by dynamic objects, and explored by devices with limited resources, with particular utility in both indoor and outdoor environments.

22. Learning spatio-temporal representations with temporal squeeze pooling
Guoxi Huang, Adrian G. Bors
Abstract: In this paper, we propose a new video representation learning method, named Temporal Squeeze (TS) pooling, which can extract the essential movement information from a long sequence of video frames and map it into a small set of images, named Squeezed Images. By embedding Temporal Squeeze pooling as a layer into off-the-shelf Convolutional Neural Networks (CNNs), we design a new video classification model, named Temporal Squeeze Network (TeSNet). The resulting Squeezed Images contain the essential movement information from the video frames, corresponding to the optimization of the video classification task. We evaluate our architecture on two video classification benchmarks and compare the results with the state of the art.
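The mapping from a long frame sequence to a few Squeezed Images can be pictured with a simple stand-in. The sketch below is not the TS pooling layer from the paper, whose exact form the abstract does not give. It assumes, purely for illustration, that each squeezed image is a softmax-weighted combination of the input frames; the function name `temporal_squeeze` and the weight matrix `weights` are hypothetical.

```python
import numpy as np

def temporal_squeeze(frames, weights):
    """Hypothetical stand-in for temporal squeezing (an assumption, not the paper's layer).

    frames:  array of shape (T, H, W, C), a clip of T video frames.
    weights: array of shape (K, T), one row of temporal logits per squeezed image.
    Returns an array of shape (K, H, W, C).
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Softmax over the time axis so each output is a convex combination of frames.
    w = np.exp(weights - weights.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Contract the time axis: K weighted sums of the T frames.
    return np.einsum("kt,thwc->khwc", w, frames)

# Usage: squeeze a 16-frame clip into 2 images with randomly initialised logits.
rng = np.random.default_rng(0)
clip = rng.random((16, 8, 8, 3))
squeezed = temporal_squeeze(clip, rng.normal(size=(2, 16)))
print(squeezed.shape)
```

In a trained network the logits would be learned jointly with the classifier; here they are random and only demonstrate the shapes involved.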

23. Object Detection as a Positive-Unlabeled Problem
Yuewei Yang, Kevin J Liang, Lawrence Carin
Abstract: As with other deep learning methods, label quality is important for learning modern convolutional object detectors. However, the potentially large number and wide diversity of object instances that can be found in complex image scenes make complete annotation a challenging task; objects missing annotations can be observed in a variety of popular object detection datasets. These missing annotations can be problematic, as the standard cross-entropy loss employed to train object detection models treats classification as a positive-negative (PN) problem: unlabeled regions are implicitly assumed to be background. As such, any object missing a bounding box results in a confusing learning signal, the effects of which we observe empirically. To remedy this, we propose treating object detection as a positive-unlabeled (PU) problem, which removes the assumption that unlabeled regions must be negative. We demonstrate that our proposed PU classification loss outperforms the standard PN loss on PASCAL VOC and MS COCO across a range of label missingness, as well as on Visual Genome and DeepLesion with full labels.
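The difference between treating unlabeled regions as negatives and treating them as unlabeled can be illustrated with a generic positive-unlabeled risk estimate. The sketch below is not the paper's detection loss. It is the commonly used non-negative PU risk estimator applied to raw classifier scores, and the class prior `prior`, the sigmoid surrogate loss, and all function names are assumptions made for illustration.

```python
import numpy as np

def sigmoid_loss(scores, target):
    """Sigmoid surrogate loss; small when the sign of `scores` agrees with `target` (+1 or -1)."""
    return 1.0 / (1.0 + np.exp(target * np.asarray(scores, dtype=float)))

def nn_pu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk estimate (a generic sketch, not the paper's loss).

    scores_pos: classifier scores for labeled positive examples.
    scores_unl: classifier scores for unlabeled examples (a mix of positives and negatives).
    prior:      assumed fraction of positives among the unlabeled examples.
    """
    risk_pos_as_pos = sigmoid_loss(scores_pos, +1).mean()
    risk_pos_as_neg = sigmoid_loss(scores_pos, -1).mean()
    risk_unl_as_neg = sigmoid_loss(scores_unl, -1).mean()
    # Estimated risk on true negatives: subtract the share of the unlabeled
    # risk that is attributable to hidden positives, then clip at zero.
    negative_part = max(0.0, risk_unl_as_neg - prior * risk_pos_as_neg)
    return prior * risk_pos_as_pos + negative_part

# Usage: one unlabeled example scores like a positive (2.5). With prior > 0
# part of its "treated as negative" penalty is discounted.
labeled_pos = np.array([3.0, 2.0])
unlabeled = np.array([2.5, -3.0])
print(nn_pu_risk(labeled_pos, unlabeled, prior=0.5))
```

Setting `prior=0` recovers the PN-style assumption that every unlabeled example is a negative.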

24. Validating uncertainty in medical image translation
Jacob C. Reinhold, Yufan He, Shizhong Han, Yunqiang Chen, Dashan Gao, Junghoon Lee, Jerry L. Prince, Aaron Carass
Abstract: Medical images are increasingly used as input to deep neural networks to produce quantitative values that aid researchers and clinicians. However, standard deep neural networks do not provide a reliable measure of uncertainty in those quantitative values. Recent work has shown that using dropout during training and testing can provide estimates of uncertainty. In this work, we investigate using dropout to estimate epistemic and aleatoric uncertainty in a CT-to-MR image translation task. We show that both types of uncertainty are captured, as defined, providing confidence in the output uncertainty estimates.
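Dropout-based uncertainty estimation can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the network is run several times with dropout active at test time and that each pass returns a predicted mean and a predicted noise variance per pixel. Under that assumption, a common decomposition takes the spread of the means as epistemic uncertainty and the average predicted variance as aleatoric uncertainty. The function name and array layout are hypothetical.

```python
import numpy as np

def mc_dropout_uncertainty(means, variances):
    """Summarise T stochastic forward passes (a sketch under stated assumptions).

    means:     array of shape (T, ...), predicted mean from each dropout pass.
    variances: array of shape (T, ...), predicted noise variance from each pass.
    Returns (prediction, epistemic, aleatoric), each of shape (...).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    prediction = means.mean(axis=0)      # averaged output image
    epistemic = means.var(axis=0)        # disagreement between dropout passes
    aleatoric = variances.mean(axis=0)   # average predicted data noise
    return prediction, epistemic, aleatoric

# Usage with simulated passes over a 4x4 image (random numbers stand in for a network).
rng = np.random.default_rng(0)
sim_means = 1.0 + 0.1 * rng.normal(size=(20, 4, 4))
sim_vars = np.full((20, 4, 4), 0.05)
pred, epi, ale = mc_dropout_uncertainty(sim_means, sim_vars)
print(pred.shape, epi.shape, ale.shape)
```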

25. Finding novelty with uncertainty
Jacob C. Reinhold, Yufan He, Shizhong Han, Yunqiang Chen, Dashan Gao, Junghoon Lee, Jerry L. Prince, Aaron Carass
Abstract: Medical images are often used to detect and characterize pathology and disease; however, automatically identifying and segmenting pathology in medical images is challenging because the appearance of pathology across diseases varies widely. To address this challenge, we propose a Bayesian deep learning method that learns to translate healthy computed tomography images to magnetic resonance images and simultaneously calculates voxel-wise uncertainty. Since high uncertainty occurs in pathological regions of the image, this uncertainty can be used for unsupervised anomaly segmentation. We show encouraging experimental results on an unsupervised anomaly segmentation task by combining two types of uncertainty into a novel quantity we call scibilic uncertainty.

26. Patternless Adversarial Attacks on Video Recognition Networks
Itay Naeh, Roi Pony, Shie Mannor
Abstract: Deep neural networks for classification of videos, just like image classification networks, may be subjected to adversarial manipulation. The main difference between image classifiers and video classifiers is that the latter usually use temporal information contained within the video in the form of optical flow or implicitly by various differences between adjacent frames. In this work we present a manipulation scheme for fooling video classifiers by introducing a spatial patternless temporal perturbation that is practically unnoticed by human observers and undetectable by leading image adversarial pattern detection algorithms. After demonstrating the manipulation of action classification of single videos, we generalize the procedure to make adversarial patterns with temporal invariance that generalizes across different classes for both targeted and untargeted attacks.

27. From IC Layout to Die Photo: A CNN-Based Data-Driven Approach
Hao-Chiang Shao, Chao-Yi Peng, Jun-Rei Wu, Chia-Wen Lin, Shao-Yun Fang, Pin-Yen Tsai, Yan-Hsiu Liu
Abstract: Since IC fabrication is costly and time-consuming, it is highly desirable to develop virtual metrology tools that can predict the properties of a wafer based on fabrication configurations without performing physical measurements on a fabricated IC. We propose a deep learning-based data-driven framework consisting of two convolutional neural networks: i) LithoNet, which predicts the shape deformations on a circuit due to IC fabrication, and ii) OPCNet, which suggests IC layout corrections to compensate for such shape deformations. By learning the shape correspondence between pairs of layout design patterns and the SEM images of the corresponding product wafer, LithoNet can, given an IC layout pattern, mimic the fabrication procedure to predict the fabricated circuit shape for virtual metrology. Furthermore, LithoNet can take the wafer fabrication parameters as a latent vector to model the parametric product variations that can be inspected on SEM images. In addition, traditional lithography simulation methods used to suggest a correction on a lithographic photomask are computationally expensive. Our proposed OPCNet mimics the optical proximity correction (OPC) procedure and efficiently generates a corrected photomask by collaborating with LithoNet to examine whether the shape of a fabricated IC circuit best matches its original layout design. As a result, the proposed LithoNet-OPCNet framework can not only predict the shape of a fabricated IC from its layout pattern, but also suggest a layout correction according to the consistency between the predicted shape and the given layout. Experimental results with several benchmark layout patterns demonstrate the effectiveness of the proposed method.

28. Synaptic Integration of Spatiotemporal Features with a Dynamic Neuromorphic Processor
Mattias Nilsson, Foteini Liwicki, Fredrik Sandin
Abstract: Spiking neurons can perform spatiotemporal feature detection by nonlinear synaptic and dendritic integration of presynaptic spike patterns. Multicompartment models of nonlinear dendrites and related neuromorphic circuit designs enable faithful imitation of such dynamic integration processes, but these approaches are also associated with a relatively high computing cost or circuit size. Here, we investigate synaptic integration of spatiotemporal spike patterns with multiple dynamic synapses on point-neurons in the DYNAP-SE neuromorphic processor, which can offer a complementary resource-efficient, albeit less flexible, approach to feature detection. We investigate how previously proposed excitatory--inhibitory pairs of dynamic synapses can be combined to integrate multiple inputs, and we generalize that concept to a case in which one inhibitory synapse is combined with multiple excitatory synapses. We characterize the resulting delayed excitatory postsynaptic potentials (EPSPs) by measuring and analyzing the membrane potentials of the neuromorphic neuronal circuits. We find that biologically relevant EPSP delays, with variability of order 10 milliseconds per neuron, can be realized in the proposed manner by selecting different synapse combinations, thanks to device mismatch. Based on these results, we demonstrate that a single point-neuron with dynamic synapses in the DYNAP-SE can respond selectively to presynaptic spikes with a particular spatiotemporal structure, which enables, for instance, visual feature tuning of single neurons.

29. Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes
Rachel Lea Draelos, David Dov, Maciej A. Mazurowski, Joseph Y. Lo, Ricardo Henao, Geoffrey D. Rubin, Lawrence Carin
Abstract: Developing machine learning models for radiology requires large-scale imaging data sets with labels for abnormalities, but the process is challenging due to the size and complexity of the data as well as the cost of labeling. We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 20,201 unique patients. This is the largest multiply-annotated chest CT data set reported. To annotate this data set, we developed a rule-based method for automatically extracting abnormality labels from radiologist free-text reports with an average F-score of 0.976 (min 0.941, max 1.0). We also developed a model for multilabel abnormality classification of chest CT volumes that uses a deep convolutional neural network (CNN). This model reached a classification performance of AUROC greater than 0.90 for 18 abnormalities, with an average AUROC of 0.773 for all 83 abnormalities, demonstrating the feasibility of learning from unfiltered whole volume CT data. We show that training on more labels improves performance significantly: for a subset of 9 labels - nodule, opacity, atelectasis, pleural effusion, consolidation, mass, pericardial effusion, cardiomegaly, and pneumothorax - the model's average AUROC increased by 10 percent when the number of training labels was increased from 9 to all 83. All code for volume preprocessing, automated label extraction, and the volume abnormality prediction model will be made publicly available. The 36,316 CT volumes and labels will also be made publicly available pending institutional approval.
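The per-abnormality and average AUROC figures quoted above are the kind of numbers a multilabel evaluation produces. The sketch below shows one way to compute them and is not the authors' evaluation code. It uses the pairwise definition of AUROC (the probability that a positive case scores higher than a negative one, with ties counted as one half) and averages over label columns. The function names and the toy data are hypothetical.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC for one binary label via pairwise comparison of positives and negatives."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # undefined when only one class is present
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def mean_auroc(label_matrix, score_matrix):
    """Average AUROC over label columns; rows are volumes, columns are abnormalities."""
    label_matrix = np.asarray(label_matrix)
    score_matrix = np.asarray(score_matrix, dtype=float)
    per_label = [auroc(label_matrix[:, j], score_matrix[:, j])
                 for j in range(label_matrix.shape[1])]
    return float(np.nanmean(per_label)), per_label

# Usage: 4 volumes, 2 hypothetical abnormality labels.
y_true = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y_score = np.array([[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.1]])
average, per_label = mean_auroc(y_true, y_score)
print(average, per_label)
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; a rank-based formulation would be preferable on tens of thousands of volumes.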

30. A Single RGB Camera Based Gait Analysis with a Mobile Tele-Robot for Healthcare
Ziyang Wang, Fani Deligianni, Qi Liu, Irina Voiculescu, Guang-Zhong Yang
Abstract: With the increasing awareness of high-quality life, there is a growing need for health monitoring devices running robust algorithms in the home environment. Health monitoring technologies enable real-time analysis of users' health status, offering long-term healthcare support and reducing hospitalization time. The purpose of this work is twofold. On the software side, we focus on the analysis of gait, which is widely adopted for joint correction and for assessing lower limb or spinal problems. On the hardware side, we design a novel marker-less gait analysis device using a low-cost RGB camera mounted on a mobile tele-robot. As gait analysis with a single camera is much more challenging than in previous works utilizing multiple cameras, an RGB-D camera, or wearable sensors, we propose using vision-based human pose estimation approaches. More specifically, based on the output of two state-of-the-art human pose estimation models (Openpose and VNect), we devise measurements for four bespoke gait parameters: inversion/eversion, dorsiflexion/plantarflexion, ankle and foot progression angles. We thereby classify walking patterns into normal, supination, pronation and limp. We also illustrate how to run the proposed machine learning models in low-resource environments such as a single entry-level CPU. Experiments show that our single RGB camera method achieves competitive performance compared to state-of-the-art methods based on depth cameras or multi-camera motion capture systems, at lower hardware cost.

31. fastai: A Layered API for Deep Learning
Jeremy Howard, Sylvain Gugger
Abstract: fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes: a new type dispatch system for Python along with a semantic type hierarchy for tensors; a GPU-optimized computer vision library which can be extended in pure Python; an optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4-5 lines of code; a novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training; a new data block API; and much more. We have used this library to successfully create a complete deep learning course, which we were able to write more quickly than using previous approaches, and the code was more clear. The library is already in wide use in research, industry, and teaching.

32. A Non-Intrusive Correction Algorithm for Classification Problems with Corrupted Data
Jun Hou, Tong Qin, Kailiang Wu, Dongbin Xiu
Abstract: A novel correction algorithm is proposed for multi-class classification problems with corrupted training data. The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by adding a correction procedure to the model prediction. The correction procedure can be coupled with any approximator, such as logistic regression or neural networks of various architectures. When the training dataset is sufficiently large, we prove that the corrected models deliver correct classification results, as if there were no corruption in the training data. For datasets of finite size, the corrected models produce significantly better recovery results than the models without the correction algorithm. All of the theoretical findings in the paper are verified by our numerical examples.

33. Neuroevolution of Neural Network Architectures Using CoDeepNEAT and Keras
Jonas da Silveira Bohrer, Bruno Iochins Grisci, Marcio Dorn
Abstract: Machine learning is a huge field of study in computer science and statistics dedicated to the execution of computational tasks through algorithms that do not require explicit instructions but instead rely on learning patterns from data samples to automate inferences. A large portion of the work involved in a machine learning project is to define the best type of algorithm to solve a given problem. Neural networks - especially deep neural networks - are the predominant type of solution in the field. However, the networks themselves can produce very different results according to the architectural choices made for them. Finding the optimal network topology and configurations for a given problem is a challenge that requires domain knowledge and testing efforts due to a large number of parameters that need to be considered. The purpose of this work is to propose an adapted implementation of a well-established evolutionary technique from the neuroevolution field that manages to automate the tasks of topology and hyperparameter selection. It uses a popular and accessible machine learning framework - Keras - as the back-end, presenting results and proposed changes concerning the original algorithm. The implementation is available at GitHub (this https URL) with documentation and examples to reproduce the experiments performed for this work.

Note: the Chinese text is machine-translated.


[arXiv Papers] Computer Vision and Pattern Recognition 2020-02-13

Contents

Abstracts