Abstracts

1. Multiple Attentional Pyramid Networks for Chinese Herbal Recognition
Yingxue Xu, Guihua Wen, Yang Hu, Mingnan Luo, Dan Dai, Yishan Zhuang, Wendy Hall
Abstract: Chinese herbs play a critical role in Traditional Chinese Medicine. Due to different recognition granularity, they can be recognized accurately only by professionals with much experience. It is expected that they can be recognized automatically using new techniques like machine learning. However, there is no Chinese herbal image dataset available. Simultaneously, there is no machine learning method which can deal with Chinese herbal image recognition well. Therefore, this paper begins with building a new standard Chinese-Herbs dataset. Subsequently, a new Attentional Pyramid Networks (APN) for Chinese herbal recognition is proposed, where both novel competitive attention and spatial collaborative attention are proposed and then applied. APN can adaptively model Chinese herbal images with different feature scales. Finally, a new framework for Chinese herbal recognition is proposed as a new application of APN. Experiments are conducted on our constructed dataset and validate the effectiveness of our methods.

2. Designing a Color Filter via Optimization of Vora-Value for Making a Camera more Colorimetric
Yuteng Zhu, Graham D. Finlayson
Abstract: The Luther condition states that if the spectral sensitivity responses of a camera are a linear transform from the color matching functions of the human visual system, the camera is colorimetric. Previous work proposed to solve for a filter which, when placed in front of a camera, results in sensitivities that best satisfy the Luther condition. By construction, the prior art solves for a filter for a given set of human visual sensitivities, e.g. the XYZ color matching functions or the cone response functions. However, depending on the target spectral sensitivity set, a different optimal filter is found. This paper begins with the observation that the cone fundamentals, XYZ color matching functions or any linear combination thereof span the same 3-dimensional subspace. Thus, we set out to solve for a filter that makes the vector space spanned by the filtered camera sensitivities as similar as possible to the space spanned by human vision sensors. We argue that the Vora-Value is a suitable way to measure subspace similarity and we develop an optimization method for finding a filter that maximizes the Vora-Value measure. Experiments demonstrate that our new optimization leads to filtered camera sensitivities which have a significantly higher Vora-Value compared with antecedent methods.
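As an illustration of the subspace-similarity idea, here is a minimal sketch of one common formulation of the Vora-Value, assuming it is defined as trace(P_A P_B) / 3, where P_A and P_B are the orthogonal projectors onto the spans of the two sensitivity sets. The helper names (`orthonormal_basis`, `apply_filter`, `vora_value`) and the toy data are hypothetical and not taken from the paper; the paper's optimization over the filter is not shown.

```python
import math


def orthonormal_basis(columns):
    """Gram-Schmidt orthonormalization of a list of column vectors."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            dot = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - dot * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def apply_filter(columns, transmittance):
    """Scale every sensitivity curve wavelength-by-wavelength by the filter."""
    return [[c * t for c, t in zip(col, transmittance)] for col in columns]


def vora_value(camera, reference):
    """trace(P_camera P_reference) / dim, computed from orthonormal bases.

    The trace equals the squared Frobenius norm of Q_camera^T Q_reference.
    """
    qa = orthonormal_basis(camera)
    qb = orthonormal_basis(reference)
    total = sum(
        sum(x * y for x, y in zip(a, b)) ** 2 for a in qa for b in qb
    )
    return total / len(qb)


# Toy 4-sample "spectra": three reference curves and a camera whose curves
# are linear combinations of them, so both sets span the same subspace.
reference = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
camera = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0]]
same_span_score = vora_value(camera, reference)
```

Under this definition the score is 1 when the two sets span the same subspace and 0 when the subspaces are orthogonal, independent of which basis (cone fundamentals, XYZ, or any linear combination) represents the human sensitivities.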

3. FaR-GAN for One-Shot Face Reenactment
Hanxiang Hao, Sriram Baireddy, Amy R. Reibman, Edward J. Delp
Abstract: Animating a static face image with target facial expressions and movements is important in the area of image editing and movie production. This face reenactment process is challenging due to the complex geometry and movement of human faces. Previous work usually requires a large set of images from the same person to model the appearance. In this paper, we present a one-shot face reenactment model, FaR-GAN, that takes only one face image of any given source identity and a target expression as input, and then produces a face image of the same source identity but with the target expression. The proposed method makes no assumptions about the source identity, facial expression, head pose, or even image background. We evaluate our method on the VoxCeleb1 dataset and show that our method is able to generate a higher quality face image than the compared methods.

4. SRDA-Net: Super-Resolution Domain Adaptation Networks for Semantic Segmentation
Enhai Liu, Zhenjie Tang, Bin Pan, Xia Xu, Tianyang Shi, Zhenwei Shi
Abstract: Recently, Unsupervised Domain Adaptation (UDA) was proposed to address the domain shift problem in semantic segmentation, but it may perform poorly when the source and target domains have different resolutions. In this work, we design a novel end-to-end semantic segmentation network, the Super-Resolution Domain Adaptation Network (SRDA-Net), which can simultaneously perform super-resolution and domain adaptation. These characteristics meet the requirements of semantic segmentation for remote sensing images, which usually involve various resolutions. SRDA-Net includes three deep neural networks: a Super-Resolution and Segmentation (RS) model that focuses on recovering high-resolution images and predicting segmentation maps; a pixel-level domain classifier (PDC) that tries to distinguish which domain an image comes from; and an output-space domain classifier (ODC) that discriminates which domain a pixel label distribution comes from. PDC and ODC act as the discriminators, and RS is treated as the generator. Through adversarial learning, RS tries to align the source and target domains in pixel-level visual appearance and in output space. Experiments are conducted on two remote sensing datasets with different resolutions. SRDA-Net performs favorably against state-of-the-art methods in terms of the mIoU metric.
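For readers unfamiliar with the mIoU metric mentioned above, here is a minimal sketch of how mean intersection-over-union is commonly computed from flattened label maps. The function name `mean_iou` and the convention of skipping classes absent from both prediction and ground truth are assumptions for illustration; benchmark implementations may differ in such details.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class intersection-over-union over flattened label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```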

5. Binarizing MobileNet via Evolution-based Searching
Hai Phan, Zechun Liu, Dang Huynh, Marios Savvides, Kwang-Ting Cheng, Zhiqiang Shen
Abstract: Binary Neural Networks (BNNs), known to be one among the effectively compact network architectures, have achieved great outcomes in the visual tasks. Designing efficient binary architectures is not trivial due to the binary nature of the network. In this paper, we propose a use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet, a compact network with separable depth-wise convolution. Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs), assuming an approximately optimal trade-off between computational cost and model accuracy. Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution while optimizing the model performance in terms of complexity and latency. The approach is threefold. First, we train strong baseline binary networks with a wide range of random group combinations at each convolutional layer. This set-up gives the binary neural networks a capability of preserving essential information through layers. Second, to find a good set of hyperparameters for group convolutions we make use of the evolutionary search which leverages the exploration of efficient 1-bit models. Lastly, these binary models are trained from scratch in a usual manner to achieve the final binary model. Various experiments on ImageNet are conducted to show that following our construction guideline, the final model achieves 60.09% Top-1 accuracy and outperforms the state-of-the-art CI-BCNN with the same computational cost.

6. Pose Proposal Critic: Robust Pose Refinement by Learning Reprojection Errors
Lucas Brynte, Fredrik Kahl
Abstract: In recent years, considerable progress has been made for the task of rigid object pose estimation from a single RGB-image, but achieving robustness to partial occlusions remains a challenging problem. Pose refinement via rendering has shown promise in order to achieve improved results, in particular, when data is scarce. In this paper we focus our attention on pose refinement, and show how to push the state-of-the-art further in the case of partial occlusions. The proposed pose refinement method leverages on a simplified learning task, where a CNN is trained to estimate the reprojection error between an observed and a rendered image. We experiment by training on purely synthetic data as well as a mixture of synthetic and real data. Current state-of-the-art results are outperformed for two out of three metrics on the Occlusion LINEMOD benchmark, while performing on-par for the final metric.
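To make the notion of a reprojection error between two poses concrete, here is a minimal geometric sketch. It assumes a pinhole camera and computes the mean pixel distance between the projections of the same model points under two rigid poses. In the paper a CNN estimates such an error from an observed and a rendered image; this sketch only computes the quantity directly from known poses, and all names and numbers are hypothetical.

```python
import math


def project(point, rotation, translation, fx, fy, cx, cy):
    """Pinhole projection of a 3D model point under pose (rotation, translation)."""
    x, y, z = (
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )
    return fx * x / z + cx, fy * y / z + cy


def mean_reprojection_error(points, pose_a, pose_b, intrinsics):
    """Mean pixel distance between projections of the points under two poses."""
    total = 0.0
    for p in points:
        ua, va = project(p, *pose_a, *intrinsics)
        ub, vb = project(p, *pose_b, *intrinsics)
        total += math.hypot(ua - ub, va - vb)
    return total / len(points)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pose_observed = (identity, (0.0, 0.0, 2.0))
pose_proposal = (identity, (0.1, 0.0, 2.0))   # shifted 0.1 units along x
intrinsics = (100.0, 100.0, 0.0, 0.0)         # fx, fy, cx, cy
model_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
error = mean_reprojection_error(model_points, pose_observed, pose_proposal, intrinsics)
```

A refinement loop could then prefer pose proposals with a lower estimated error, which is the role the learned critic plays in the abstract's description.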

7. DAugNet: Unsupervised, Multi-source, Multi-target, and Life-long Domain Adaptation for Semantic Segmentation of Satellite Images
Onur Tasar, Alain Giros, Yuliya Tarabalka, Pierre Alliez, Sébastien Clerc
Abstract: The domain adaptation of satellite images has recently gained an increasing attention to overcome the limited generalization abilities of machine learning models when segmenting large-scale satellite images. Most of the existing approaches seek for adapting the model from one domain to another. However, such single-source and single-target setting prevents the methods from being scalable solutions, since nowadays multiple source and target domains having different data distributions are usually available. Besides, the continuous proliferation of satellite images necessitates the classifiers to adapt to continuously increasing data. We propose a novel approach, coined DAugNet, for unsupervised, multi-source, multi-target, and life-long domain adaptation of satellite images. It consists of a classifier and a data augmentor. The data augmentor, which is a shallow network, is able to perform style transfer between multiple satellite images in an unsupervised manner, even when new data are added over the time. In each training iteration, it provides the classifier with diversified data, which makes the classifier robust to large data distribution difference between the domains. Our extensive experiments prove that DAugNet significantly better generalizes to new geographic locations than the existing approaches.

8. On the uncertainty of self-supervised monocular depth estimation
Matteo Poggi, Filippo Aleotti, Fabio Tosi, Stefano Mattoccia
Abstract: Self-supervised paradigms for monocular depth estimation are very appealing since they do not require ground truth annotations at all. Despite the astonishing results yielded by such methodologies, learning to reason about the uncertainty of the estimated depth maps is of paramount importance for practical applications, yet uncharted in the literature. Purposely, we explore for the first time how to estimate the uncertainty for this task and how this affects depth accuracy, proposing a novel peculiar technique specifically designed for self-supervised approaches. On the standard KITTI dataset, we exhaustively assess the performance of each method with different self-supervised paradigms. Such evaluation highlights that our proposal i) always improves depth accuracy significantly and ii) yields state-of-the-art results concerning uncertainty estimation when training on sequences and competitive results uniquely deploying stereo pairs.

9. Mean Oriented Riesz Features for Micro Expression Classification
Carlos Arango Duque, Olivier Alata, Rémi Emonet, Hubert Konik, Anne-Claire Legrand
Abstract: Micro-expressions are brief and subtle facial expressions that go on and off the face in a fraction of a second. This kind of facial expression usually occurs in high-stakes situations and is considered to reflect a human's real intent. There has been some interest in micro-expression analysis; however, the great majority of methods are based on classically established computer vision methods such as local binary patterns, histograms of gradients and optical flow. A novel methodology for micro-expression recognition using the Riesz pyramid, a multi-scale steerable Hilbert transform, is presented. An image sequence is transformed with this tool, then the image phase variations are extracted and filtered as proxies for motion. Furthermore, the dominant orientation constancy from the Riesz transform is exploited to average the micro-expression sequence into an image pair. Based on that, the Mean Oriented Riesz Feature description is introduced. Finally, the performance of our methods is tested on two spontaneous micro-expression databases and compared to state-of-the-art methods.

10. Attribute-guided Feature Extraction and Augmentation Robust Learning for Vehicle Re-identification
Chaoran Zhuge, Yujie Peng, Yadong Li, Jiangbo Ai, Junru Chen
Abstract: Vehicle re-identification is one of the core technologies of intelligent transportation systems and smart cities, but large intra-class diversity and inter-class similarity pose great challenges for existing methods. In this paper, we propose a multi-guided learning approach that utilizes attribute information and introduces two novel random augmentations to improve robustness during training. In addition, we propose an attribute constraint method and a group re-ranking strategy to refine matching results. Our method achieves an mAP of 66.83% and a rank-1 accuracy of 76.05% in the CVPR 2020 AI City Challenge.

11. 3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning
Mi Tian, Qiong Nie, Hao Shen
Abstract: Camera localization is a fundamental and key component of autonomous driving vehicles and mobile robots to localize themselves globally for further environment perception, path planning and motion control. Recently end-to-end approaches based on convolutional neural network have been much studied to achieve or even exceed 3D-geometry based traditional methods. In this work, we propose a compact network for absolute camera pose regression. Inspired from those traditional methods, a 3D scene geometry-aware constraint is also introduced by exploiting all available information including motion, depth and image contents. We add this constraint as a regularization term to our proposed network by defining a pixel-level photometric loss and an image-level structural similarity loss. To benchmark our method, different challenging scenes including indoor and outdoor environment are tested with our proposed approach and state-of-the-arts. And the experimental results demonstrate significant performance improvement of our method on both prediction accuracy and convergence efficiency.

12. Self-Supervised Deep Visual Odometry with Online Adaptation
Shunkai Li, Xin Wang, Yingdian Cao, Fei Xue, Zike Yan, Hongbin Zha
Abstract: Self-supervised VO methods have shown great success in jointly estimating camera pose and depth from videos. However, like most data-driven methods, existing VO networks suffer from a notable decrease in performance when confronted with scenes different from the training data, which makes them unsuitable for practical applications. In this paper, we propose an online meta-learning algorithm to enable VO networks to continuously adapt to new environments in a self-supervised manner. The proposed method utilizes convolutional long short-term memory (convLSTM) to aggregate rich spatial-temporal information in the past. The network is able to memorize and learn from its past experience for better estimation and fast adaptation to the current frame. When running VO in the open world, in order to deal with the changing environment, we propose an online feature alignment method by aligning feature distributions at different time. Our VO network is able to seamlessly adapt to different environments. Extensive experiments on unseen outdoor scenes, virtual to real world and outdoor to indoor environments demonstrate that our method consistently outperforms state-of-the-art self-supervised VO baselines considerably.

13. RISE Video Dataset: Recognizing Industrial Smoke Emissions
Yen-Chia Hsu, Ting-Hao Huang, Ting-Yao Hu, Paul Dille, Sean Prendi, Ryan Hoffman, Anastasia Tsuhlares, Randy Sargent, Illah Nourbakhsh
Abstract: Industrial smoke emissions pose a significant concern to human health. Prior works have shown that using Computer Vision (CV) techniques to identify smoke as visual evidence can influence the attitude of regulators and empower citizens in pursuing environmental justice. However, existing datasets do not have sufficient quality nor quantity for training robust CV models to support air quality advocacy. We introduce RISE, the first large-scale video dataset for Recognizing Industrial Smoke Emissions. We adopt the citizen science approach to collaborate with local community members in annotating whether a video clip has smoke emissions. Our dataset contains 12,567 clips with 19 distinct views from cameras on three sites that monitored three different industrial facilities. The clips are from 30 days that spans four seasons in two years in the daytime. We run experiments using deep neural networks developed for video action recognition to establish a performance baseline and reveal the challenges for smoke recognition. Our data analysis also shows opportunities for integrating citizen scientists and crowd workers into the application of Artificial Intelligence for social good.

14. Adversarial examples are useful too!
Ali Borji
Abstract: Deep learning has come a long way and has enjoyed unprecedented success. Despite high accuracy, however, deep models are brittle and are easily fooled by imperceptible adversarial perturbations. In contrast to common inference-time attacks, backdoor (a.k.a. Trojan) attacks target the training phase of model construction, and are extremely difficult to combat since a) the model behaves normally on a pristine testing set and b) the augmented perturbations can be minute and may affect only a few training samples. Here, I propose a new method to tell whether a model has been subject to a backdoor attack. The idea is to generate adversarial examples, targeted or untargeted, using conventional attacks such as FGSM and then feed them back to the classifier. By computing the statistics (here simply mean maps) of the images in different categories and comparing them with the statistics of a reference model, it is possible to visually locate the perturbed regions and unveil the attack.
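The two ingredients named in the abstract, FGSM and per-category mean maps, can be sketched on a toy model. The example below assumes a logistic-regression classifier with cross-entropy loss, for which the input gradient is (p - y) * w; the paper works with deep image classifiers, so this is only an illustration, and the names `fgsm` and `mean_map` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm(x, y, weights, bias, eps):
    """Untargeted FGSM step: x + eps * sign(dLoss/dx) for logistic regression."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    grad = [(p - y) * w for w in weights]  # gradient of cross-entropy w.r.t. x
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def mean_map(images):
    """Pixel-wise mean of a list of equally sized flattened images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


weights, bias = [1.0, -2.0, 0.0], 0.0
clean = [0.5, 0.5, 0.5]
adversarial = fgsm(clean, y=1, weights=weights, bias=bias, eps=0.1)
category_mean = mean_map([clean, adversarial])
```

Comparing such mean maps between a suspect model and a reference model is the visual check the abstract describes; how the comparison is scored is not specified here.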

15. Apple Defect Detection Using Deep Learning Based Object Detection For Better Post Harvest Handling
Paolo Valdez
Abstract: The inclusion of Computer Vision and Deep Learning technologies in Agriculture aims to increase the harvest quality, and productivity of farmers. During postharvest, the export market and quality evaluation are affected by assorting of fruits and vegetables. In particular, apples are susceptible to a wide range of defects that can occur during harvesting or/and during the post-harvesting period. This paper aims to help farmers with post-harvest handling by exploring if recent computer vision and deep learning methods such as the YOLOv3 (Redmon & Farhadi (2018)) can help in detecting healthy apples from apples with defects.

16. Class-Incremental Learning for Semantic Segmentation Re-Using Neither Old Data Nor Old Labels
Marvin Klingner, Andreas Bär, Philipp Donn, Tim Fingscheidt
Abstract: While neural networks trained for semantic segmentation are essential for perception in autonomous driving, most current algorithms assume a fixed number of classes, presenting a major limitation when developing new autonomous driving systems with the need of additional classes. In this paper we present a technique implementing class-incremental learning for semantic segmentation without using the labeled data the model was initially trained on. Previous approaches still either rely on labels for both old and new classes, or fail to properly distinguish between them. We show how to overcome these problems with a novel class-incremental learning technique, which nonetheless requires labels only for the new classes. Specifically, (i) we introduce a new loss function that neither relies on old data nor on old labels, (ii) we show how new classes can be integrated in a modular fashion into pretrained semantic segmentation models, and finally (iii) we re-implement previous approaches in a unified setting to compare them to ours. We evaluate our method on the Cityscapes dataset, where we exceed the mIoU performance of all baselines by 3.5% absolute reaching a result, which is only 2.2% absolute below the upper performance limit of single-stage training, relying on all data and labels simultaneously.

17. Compositional Few-Shot Recognition with Primitive Discovery and Enhancing
Yixiong Zou, Shanghang Zhang, Ke Chen, José M. F. Moura, Yaowei Wang, Yonghong Tian
Abstract: Few-shot learning (FSL) aims at recognizing novel classes given only few training samples, which still remains a great challenge for deep learning. However, humans can easily recognize novel classes with only few samples. A key component of such ability is the compositional recognition that human can perform, which has been well studied in cognitive science but is not well explored in FSL. Inspired by such capability of humans, we first provide a compositional view of the widely adopted FSL baseline model. Based on this view, to imitate humans' ability of learning visual primitives and composing primitives to recognize novel classes, we propose an approach to FSL to learn a feature representation composed of important primitives, which is jointly trained with two parts, i.e. primitive discovery and primitive enhancing. In primitive discovery, we focus on learning primitives related to object parts by self-supervision from the order of split input, avoiding extra laborious annotations and alleviating the effect of semantic gaps. In primitive enhancing, inspired by both mathematical deduction and biological studies (the Hebbian Learning rule and the Winner-Take-All mechanism), we propose a soft composition mechanism by enlarging the activation of important primitives while reducing that of others, so as to enhance the influence of important primitives and better utilize these primitives to compose novel classes. Extensive experiments on public benchmarks are conducted on both the few-shot image classification and video recognition tasks. Our method achieves the state-of-the-art performance on all these datasets and shows better interpretability.

18. Occlusion-Adaptive Deep Network for Robust Facial Expression Recognition
Hui Ding, Peng Zhou, Rama Chellappa
Abstract: Recognizing the expressions of partially occluded faces is a challenging computer vision problem. Previous expression recognition methods either overlooked this issue or resolved it using extreme assumptions. Motivated by the fact that the human visual system is adept at ignoring occlusions and focusing on non-occluded facial areas, we propose a landmark-guided attention branch to find and discard corrupted features from occluded regions so that they are not used for recognition. An attention map is first generated to indicate whether a specific facial part is occluded and to guide our model to attend to non-occluded regions. To further improve robustness, we propose a facial region branch that partitions the feature maps into non-overlapping facial blocks and tasks each block with predicting the expression independently. This results in more diverse and discriminative features, enabling the expression recognition system to recover even though the face is partially occluded. Owing to the synergistic effects of the two branches, our occlusion-adaptive deep network significantly outperforms state-of-the-art methods on two challenging in-the-wild benchmark datasets and three real-world occluded expression datasets.

19. Computer Vision Toolkit for Non-invasive Monitoring of Factory Floor Artifacts
Aditya M. Deshpande, Anil Kumar Telikicherla, Vinay Jakkali, David A. Wickelhaus, Manish Kumar, Sam Anand
Abstract: Digitization has led smart, connected technologies to become an integral part of businesses, governments and communities. For manufacturing digitization, there has been active research and development with a focus on Cloud Manufacturing (CM) and the Industrial Internet of Things (IIoT). This work presents a computer vision toolkit (CV Toolkit) for non-invasive digitization of the factory floor in line with Industry 4.0 requirements for factory data collection. Currently, technical challenges persist in the digitization of legacy systems because their design and sensors can be changed only to a limited extent. This novel toolkit is developed to facilitate easy integration of legacy production machinery and factory floor artifacts with the digital and smart manufacturing environment, without requiring any physical changes to the machines. The system developed is modular and allows real-time monitoring of production machinery. Its modularity allows new software applications to be incorporated into the current framework of the CV Toolkit. To allow connectivity of this toolkit with manufacturing floors in a simple, deployable and cost-effective manner, the toolkit is integrated with a known manufacturing data standard, MTConnect, to "translate" the digital inputs into data streams that can be read by commercial status tracking and reporting software solutions. The proposed toolkit is demonstrated using a mock-panel environment developed in house at the University of Cincinnati to highlight its usability.

20. Increased-confidence adversarial examples for improved transferability of Counter-Forensic attacks
Wenjie Li, Benedetta Tondi, Rongrong Ni, Mauro Barni
Abstract: Transferability of adversarial examples is a key issue to study the security of multimedia forensics (MMF) techniques relying on Deep Learning (DL). The transferability of the attacks, in fact, would open the way to the deployment of successful counter forensics attacks also in cases where the attacker does not have a full knowledge of the to-be-attacked system. Some preliminary works have shown that adversarial examples against CNN-based image forensics detectors are in general non-transferrable, at least when the basic versions of the attacks implemented in the most popular attack packages are adopted. In this paper, we introduce a general strategy to increase the strength of the attacks and evaluate the transferability of the adversarial examples when such a strength varies. We experimentally show that, in this way, attack transferability can be improved to a large extent, at the expense of a larger distortion. Our research confirms the security threats posed by the existence of adversarial examples even in multimedia forensics scenarios, thus calling for new defense strategies to improve the security of DL-based MMF techniques.

21. Fast Deep Multi-patch Hierarchical Network for Nonhomogeneous Image Dehazing
Sourya Dipta Das, Saikat Dutta
Abstract: Recently, CNN-based end-to-end deep learning methods have achieved superior results in image dehazing, but they tend to fail drastically on non-homogeneous dehazing. In addition, existing popular multi-scale approaches are runtime intensive and memory inefficient. In this context, we propose a fast Deep Multi-patch Hierarchical Network that restores non-homogeneous hazed images by aggregating features from multiple image patches taken from different spatial sections of the hazed image, using fewer network parameters. Our proposed method is quite robust across environments with varying haze or fog density in the scene, and it is very lightweight, with a total model size of around 21.7 MB. It also runs faster than current multi-scale methods, with an average runtime of 0.0145s to process a 1200x1600 HD-quality image. Finally, we show the superiority of this network over other state-of-the-art models on Dense Haze Removal.

22. Local Fiber Orientation from X-ray Region-of-Interest Computed Tomography of large Fiber Reinforced Composite Components
Thomas Baranowski, Dascha Dobrovolskij, Kilian Dremel, Astrid Hölzing, Günter Lohfink, Katja Schladitz, Simon Zabler
Abstract: The local fiber orientation is a micro-structural feature crucial for the mechanical properties of parts made from fiber reinforced polymers. It can be determined from micro-computed tomography data and subsequent quantitative analysis of the resulting 3D images. However, although non-destructive by nature, this method has so far required cutting samples with an edge length of a few millimeters in order to achieve the high lateral resolution needed for the analysis. Here, we report on the successful combination of region-of-interest scanning with structure texture orientation analysis, which renders the approach described above truly non-destructive. Several regions of interest in a large bearing part from the automotive industry made of fiber reinforced polymer are scanned and analyzed. Differences between these regions with respect to local fiber orientation are quantified. Moreover, the analysis is shown to be consistent across scans at varying lateral resolutions. Finally, measured and numerically simulated orientation tensors are compared for one of the regions.

23. Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Hyounghun Kim, Zineng Tang, Mohit Bansal
Abstract: Videos convey rich information. Dynamic spatio-temporal relationships between people/objects, and diverse multimodal events are present in a video clip. Hence, it is important to develop automated models that can accurately extract such information from videos. Answering questions on videos is one of the tasks which can evaluate such AI abilities. In this paper, we propose a video question answering model which effectively integrates multi-modal input sources and finds the temporally relevant information to answer questions. Specifically, we first employ dense image captions to help identify objects and their detailed salient regions and actions, and hence give the model useful extra information (in explicit textual format to allow easier matching) for answering questions. Moreover, our model also comprises dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates which pass more relevant information to the classifier. Finally, we also cast the frame selection problem as a multi-label classification task and introduce two loss functions, In-and-Out Frame Score Margin (IOFSM) and Balanced Binary Cross-Entropy (BBCE), to better supervise the model with human importance annotations. We evaluate our model on the challenging TVQA dataset, where each of our model components provides significant gains, and our overall model outperforms the state-of-the-art by a large margin (74.09% versus 70.52%). We also present several word, object, and frame level visualization studies. Our code is publicly available at: this https URL

24. Towards segmentation and spatial alignment of the human embryonic brain using deep learning for atlas-based registration
Wietske A.P. Bastiaansen, Melek Rousian, Régine P.M. Steegers-Theunissen, Wiro J. Niessen, Anton Koning, Stefan Klein
Abstract: We propose an unsupervised deep learning method for atlas-based registration to achieve segmentation and spatial alignment of the embryonic brain in a single framework. Our approach consists of two sequential networks with a specifically designed loss function to address the challenges in 3D first trimester ultrasound. The first network learns the affine transformation and the second learns the voxelwise nonrigid deformation between the target image and the atlas. We trained this network end-to-end and validated it against a ground truth on synthetic datasets designed to resemble the challenges present in 3D first trimester ultrasound. The method was tested on a dataset of human embryonic ultrasound volumes acquired at 9 weeks gestational age, which showed alignment of the brain in some cases and gave insight into open challenges for the proposed method. We conclude that our method is a promising approach towards fully automated spatial alignment and segmentation of embryonic brains in 3D ultrasound.

25. Neural Architecture Search for Gliomas Segmentation on Multimodal Magnetic Resonance Imaging
Feifan Wang, Bharat Biswal
Abstract: The past few years have witnessed an evolution inspired by artificial intelligence in various medical fields. The diagnosis and treatment of gliomas, among the most commonly seen brain tumors and one with a low survival rate, relies heavily on computer-assisted segmentation of magnetic resonance imaging (MRI) scans. Although encoder-decoder shaped deep learning networks have been the de facto standard for semantic segmentation tasks in medical imaging analysis, considerable effort is still required to design the detailed architecture of the down-sampling and up-sampling blocks. In this work, we propose a neural architecture search (NAS) based solution to brain tumor segmentation tasks on multimodal volumetric MRI scans. Three sets of candidate operations are composed for three kinds of basic building blocks, and each operation is assigned a specific probabilistic parameter to be learned. By alternately updating the weights of the operations and the other parameters in the network, the search mechanism ends up with two optimal structures for the upward and downward blocks. Moreover, the developed solution also integrates normalization and patching strategies tailored for brain MRI processing. Extensive comparative experiments on the BraTS 2019 dataset demonstrate that the proposed algorithm not only relieves the burden of designing block architectures by hand but also achieves competitive performance.

26. Multi-modal Embedding Fusion-based Recommender
Anna Wroblewska, Jacek Dabrowski, Michal Pastuszak, Andrzej Michalowski, Michal Daniluk, Barbara Rychalska, Mikolaj Wieczorek, Sylwia Sysko-Romanczuk
Abstract: Recommendation systems have lately been popularized globally, with primary use cases in online interaction systems and a significant focus on e-commerce platforms. We have developed a machine learning-based recommendation platform, which can be easily applied to almost any domain of items and/or actions. In contrast to existing recommendation systems, our platform natively supports multiple types of interaction data with multiple modalities of metadata. This is achieved through multi-modal fusion of various data representations. We deployed the platform in multiple e-commerce stores of different kinds, e.g. food and beverages, shoes, fashion items, telecom operators. Here, we present our system, its flexibility and performance. We also show benchmark results on open datasets that significantly outperform state-of-the-art prior work.

27. A Survey on Patch-based Synthesis: GPU Implementation and Optimization
Hadi Abdi Khojasteh
Abstract: This thesis surveys research on patch-based synthesis and on algorithms for finding correspondences between small local regions of images. We additionally explore a wide variety of applications of this new fast randomized matching technique. One algorithm we study in particular, PatchMatch, can find similar regions or "patches" of an image one to two orders of magnitude faster than previous techniques. The algorithm is driven by mathematical properties of nearest neighbors in natural images. We observe that neighboring correspondences tend to be similar or "coherent" and use this observation in the algorithm to converge quickly to an approximate solution. In its most general form, the algorithm can find k-nearest neighbor matches, using patches that translate, rotate, or scale, using arbitrary descriptors, and between two or more images. Speed-ups over various existing techniques are obtained across a range of those areas. We have explored many applications of the PatchMatch matching algorithm. In computer graphics, we have explored removing unwanted objects from images, seamlessly moving objects in images, changing image aspect ratios, and video summarization. In computer vision, we have explored denoising images, object detection, detecting image forgeries, and detecting symmetries. We conclude by discussing the limitations of our algorithm, its GPU implementation, and areas for future work.

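The three ingredients the abstract above attributes to PatchMatch (random initialization, propagation of "coherent" neighboring matches, and random search) can be sketched on a toy problem. The following is a minimal illustration on 1D signals in pure Python, not the thesis's GPU implementation; the function names, the sum-of-squared-differences cost, and the parameters `p`, `iters` and `seed` are assumptions made for this sketch.

```python
import random


def patch_cost(a, b, i, j, p):
    """Sum of squared differences between a[i:i+p] and b[j:j+p]."""
    return sum((a[i + k] - b[j + k]) ** 2 for k in range(p))


def patchmatch_1d(a, b, p=3, iters=4, seed=0):
    """Approximate nearest-neighbour field from patches of a to patches of b."""
    rng = random.Random(seed)
    na, nb = len(a) - p + 1, len(b) - p + 1  # number of valid patch starts
    # Random initialisation: every patch in a points at a random patch in b.
    nnf = [rng.randrange(nb) for _ in range(na)]
    cost = [patch_cost(a, b, i, nnf[i], p) for i in range(na)]
    history = [sum(cost)]  # total matching cost after each pass

    def try_candidate(i, j):
        # Keep candidate j for patch i only if it is valid and strictly better.
        if 0 <= j < nb:
            c = patch_cost(a, b, i, j, p)
            if c < cost[i]:
                nnf[i], cost[i] = j, c

    for it in range(iters):
        # Alternate the scan direction so good matches spread both ways.
        forward = it % 2 == 0
        step = 1 if forward else -1
        order = range(na) if forward else range(na - 1, -1, -1)
        for i in order:
            # Propagation: the already-visited neighbour's match, shifted by one.
            prev = i - step
            if 0 <= prev < na:
                try_candidate(i, nnf[prev] + step)
            # Random search: sample around the current match, halving the radius.
            radius = nb
            while radius >= 1:
                try_candidate(i, nnf[i] + rng.randint(-radius, radius))
                radius //= 2
    	# (history is updated once per pass, below)
        history.append(sum(cost))
    return nnf, history


signal = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
nnf, history = patchmatch_1d(signal, signal)
```

Because a candidate is accepted only when it lowers the cost, the total cost in `history` can never increase from one pass to the next; how quickly it approaches the exact nearest neighbors depends on the data and the random seed.
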
28. Mitigating Gender Bias Amplification in Distribution by Posterior Regularization
Shengyu Jia, Tao Meng, Jieyu Zhao, Kai-Wei Chang
Abstract: Advanced machine learning techniques have boosted the performance of natural language processing. Nevertheless, recent studies, e.g., Zhao et al. (2017), show that these techniques inadvertently capture the societal bias hidden in the corpus and further amplify it. However, their analysis is conducted only on models' top predictions. In this paper, we investigate the gender bias amplification issue from the distribution perspective and demonstrate that the bias is also amplified in the predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization. With little performance loss, our method can almost remove the bias amplification in the distribution. Our study sheds light on understanding bias amplification.

29. Context Learning for Bone Shadow Exclusion in CheXNet Accuracy Improvement
Minh-Chuong Huynh, Trung-Hieu Nguyen, Minh-Triet Tran
Abstract: Chest X-ray examination plays an important role in lung disease detection. The more accurate the diagnosis needs to be, the more experienced the radiologists must be. After the ChestX-ray14 dataset, containing over 100,000 frontal-view X-ray images covering 14 diseases, was released, several high-accuracy models were proposed. In this paper, we develop a workflow for lung disease diagnosis in chest X-ray images that improves the average AUROC of the state-of-the-art model from 0.8414 to 0.8445. We apply image preprocessing steps before feeding images to the 14-disease detection model. Our project includes three models: a DenseNet-121 that predicts whether a processed image gives a better result, a convolutional auto-encoder for bone shadow exclusion, and the original CheXNet.

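For readers unfamiliar with the metric quoted in the abstract above, the average AUROC of a multi-label classifier is usually the per-disease AUROC averaged over all labels. The sketch below computes it with the pairwise-ranking definition (the probability that a positive case scores higher than a negative one, ties counting one half). It is an illustration of the metric only, not the authors' evaluation code; the function names and the toy inputs are assumptions.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auroc(label_columns, score_columns):
    """Average the per-label AUROC over all labels (one column per disease)."""
    values = [auroc(y, s) for y, s in zip(label_columns, score_columns)]
    return sum(values) / len(values)


# Two hypothetical disease labels scored on four images.
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
scores = [[0.1, 0.4, 0.35, 0.8], [0.2, 0.9, 0.1, 0.7]]
print(mean_auroc(labels, scores))
```

The pairwise form is quadratic in the number of images, which is fine for a small illustration; on a dataset the size of ChestX-ray14 a rank-based implementation would be the practical choice.
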
30. Generalized Multi-view Shared Subspace Learning using View Bootstrapping
Krishna Somandepalli, Shrikanth Narayanan
Abstract: A key objective in multi-view learning is to model the information common to multiple parallel views of a class of objects/events to improve downstream learning tasks. In this context, two open research questions remain: How can we model hundreds of views per event? Can we learn robust multi-view embeddings without any knowledge of how these views are acquired? We present a neural method based on multi-view correlation to capture the information shared across a large number of views by subsampling them in a view-agnostic manner during training. To provide an upper bound on the number of views to subsample for a given embedding dimension, we analyze the error of the bootstrapped multi-view correlation objective using matrix concentration theory. Our experiments on spoken word recognition, 3D object classification and pose-invariant face recognition demonstrate the robustness of view bootstrapping to model a large number of views. Results underscore the applicability of our method for a view-agnostic learning setting.

31. Monotone Boolean Functions, Feasibility/Infeasibility, LP-type problems and MaxCon
David Suter, Ruwan Tennakoon, Erchuan Zhang, Tat-Jun Chin, Alireza Bab-Hadiashar
Abstract: This paper outlines connections between Monotone Boolean Functions, LP-Type problems and the Maximum Consensus Problem. The latter refers to a particular type of robust fitting characterisation, popular in Computer Vision (MaxCon). Indeed, this is our main motivation but we believe the results of the study of these connections are more widely applicable to LP-type problems (at least 'thresholded versions', as we describe), and perhaps even more widely. We illustrate, with examples from Computer Vision, how the resulting perspectives suggest new algorithms. Indeed, we focus, in the experimental part, on how the Influence (a property of Boolean Functions that takes on a special form if the function is Monotone) can guide a search for the MaxCon solution.

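The connection the abstract above draws between feasibility, monotone Boolean functions and MaxCon can be made concrete on a toy problem. In the sketch below, a subset of 1D points is "feasible" if a single value lies within a tolerance `eps` of all of them, the infeasibility indicator over subsets is a monotone Boolean function, MaxCon is the largest feasible subset, and the influence of a point is the fraction of subsets for which toggling that point changes the indicator. The 1D fitting problem, the brute-force enumeration and all names are assumptions chosen for illustration; they are not the algorithms proposed in the paper.

```python
from itertools import product


def infeasible(points, mask, eps):
    """1 if no single value lies within eps of every selected point."""
    chosen = [p for p, bit in zip(points, mask) if bit]
    return bool(chosen) and max(chosen) - min(chosen) > 2 * eps


def maxcon(points, eps):
    """Indices of a largest feasible subset, found by brute force."""
    best = ()
    for mask in product((0, 1), repeat=len(points)):
        if not infeasible(points, mask, eps) and sum(mask) > sum(best):
            best = mask
    return [i for i, bit in enumerate(best) if bit]


def influence(points, i, eps):
    """Fraction of subsets where toggling point i flips the indicator."""
    flips = 0
    for mask in product((0, 1), repeat=len(points)):
        toggled = mask[:i] + (1 - mask[i],) + mask[i + 1:]
        if infeasible(points, mask, eps) != infeasible(points, toggled, eps):
            flips += 1
    return flips / 2 ** len(points)


points = [0.0, 0.1, 0.2, 5.0]  # three inliers and one obvious outlier
eps = 0.15
print(maxcon(points, eps))
print([influence(points, i, eps) for i in range(len(points))])
```

In this toy example the outlier at 5.0 has a much larger influence than any inlier, which illustrates why influence might serve as a guide for deciding which points to discard. The enumeration is exponential in the number of points, so it only serves to make the definitions explicit.
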
Note: the Chinese text is machine-translated.

[arXiv papers] Computer Vision and Pattern Recognition 2020-05-14
