Abstracts

1. From ImageNet to Image Classification: Contextualizing Progress on Benchmarks
Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, Aleksander Madry
Abstract: Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline. In this work, we use human studies to investigate the consequences of employing such a pipeline, focusing on the popular ImageNet dataset. We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset---including the introduction of biases that state-of-the-art models exploit. Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for. Finally, our findings emphasize the need to augment our current model training and evaluation toolkit to take such misalignments into account. To facilitate further research, we release our refined ImageNet annotations at this https URL.

2. Convolutional Neural Networks applied to sky images for short-term solar irradiance forecasting
Quentin Paletta, Joan Lasenby
Abstract: Despite the advances in the field of solar energy, improvements of solar forecasting techniques, addressing the intermittent electricity production, remain essential for securing its future integration into a wider energy supply. A promising approach to anticipate irradiance changes consists of modeling the cloud cover dynamics from ground taken or satellite images. This work presents preliminary results on the application of deep Convolutional Neural Networks for 2 to 20 min irradiance forecasting using hemispherical sky images and exogenous variables. We evaluate the models on a set of irradiance measurements and corresponding sky images collected in Palaiseau (France) over 8 months with a temporal resolution of 2 min. To outline the learning of neural networks in the context of short-term irradiance forecasting, we implemented visualisation techniques revealing the types of patterns recognised by trained algorithms in sky images. In addition, we show that training models with past samples of the same day improves their forecast skill, relative to the smart persistence model based on the Mean Square Error, by around 10% on a 10 min ahead prediction. These results emphasise the benefit of integrating previous same-day data in short-term forecasting. This, in turn, can be achieved through model fine tuning or using recurrent units to facilitate the extraction of relevant temporal features from past data.
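The forecast skill mentioned in this abstract can be illustrated with a small sketch. It assumes the common definition, skill = 1 - MSE(model) / MSE(reference), with the smart persistence forecast as the reference; the abstract does not spell out the formula, and the irradiance values below are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def forecast_skill(model_pred, reference_pred, target):
    """Skill = 1 - MSE(model) / MSE(reference).

    Positive values mean the model beats the reference forecast.
    Assumed definition; the abstract only says the skill is MSE-based.
    """
    return 1.0 - mse(model_pred, target) / mse(reference_pred, target)


# Hypothetical irradiance values (W/m^2), for illustration only.
observed = [500.0, 520.0, 480.0, 450.0]
persistence = [510.0, 500.0, 520.0, 480.0]  # stand-in for a smart persistence forecast
model = [505.0, 510.0, 495.0, 460.0]

skill = forecast_skill(model, persistence, observed)
print(f"forecast skill: {skill:.2f}")
```

With these made-up numbers the model's MSE is 112.5 against 750 for the reference, so the skill is 0.85. The paper's reported gain of around 10% refers to its own data, not to this toy example.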

3. Predicting Video features from EEG and Vice versa
Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik
Abstract: In this paper we explore predicting facial or lip video features from electroencephalography (EEG) features and predicting EEG features from recorded facial or lip video frames using deep learning models. The subjects were asked to read out loud English sentences shown to them on a computer screen and their simultaneous EEG signals and facial video frames were recorded. Our model was able to generate very broad characteristics of the facial or lip video frame from input EEG features. Our results demonstrate the first step towards synthesizing high quality facial or lip video from recorded EEG features. We demonstrate results for a data set consisting of seven subjects.

4. KL-Divergence-Based Region Proposal Network for Object Detection
Geonseok Seo, Jaeyoung Yoo, Jaeseok Choi, Nojun Kwak
Abstract: The learning of the region proposal in object detection using deep neural networks (DNNs) is divided into two tasks: binary classification and bounding box regression. However, the traditional RPN (Region Proposal Network) defines these two tasks as different problems, and they are trained independently. In this paper, we propose a new region proposal learning method that considers the bounding box offset's uncertainty in the objectness score. Our method redefines RPN as a problem of minimizing the KL-divergence, which measures the difference between two probability distributions. We applied KL-RPN, which performs region proposal using KL-divergence, to the existing two-stage object detection framework and showed that it can improve the performance of the existing method. Experiments show that it achieves 2.6% and 2.0% AP improvements on MS COCO test-dev in Faster R-CNN with VGG-16 and R-FCN with ResNet-101 backbone, respectively.
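As a reminder of what minimizing a KL-divergence involves, here is a minimal sketch of the closed-form divergence between two univariate Gaussians. The choice of Gaussians and the function name `kl_gaussian` are assumptions for illustration; the abstract does not state which distributions KL-RPN compares.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2).

    Illustrative only: not the loss used in the KL-RPN paper.
    """
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


# Identical distributions have zero divergence.
print(kl_gaussian(0.0, 1.0, 0.0, 1.0))
# Shifting the mean by one standard deviation costs 0.5 nats.
print(kl_gaussian(0.0, 1.0, 1.0, 1.0))
```

The divergence is asymmetric: swapping P and Q generally gives a different value, which matters when deciding which distribution plays the role of the target.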

5. Symbolic Pregression: Discovering Physical Laws from Raw Distorted Video
Silviu-Marian Udrescu, Max Tegmark
Abstract: We present a method for unsupervised learning of equations of motion for objects in raw and optionally distorted unlabeled video. We first train an autoencoder that maps each video frame into a low-dimensional latent space where the laws of motion are as simple as possible, by minimizing a combination of non-linearity, acceleration and prediction error. Differential equations describing the motion are then discovered using Pareto-optimal symbolic regression. We find that our pre-regression ("pregression") step is able to rediscover Cartesian coordinates of unlabeled moving objects even when the video is distorted by a generalized lens. Using intuition from multidimensional knot-theory, we find that the pregression step is facilitated by first adding extra latent space dimensions to avoid topological problems during training and then removing these extra dimensions via principal component analysis.

6. Deep covariate-learning: optimising information extraction from terrain texture for geostatistical modelling applications
Charlie Kirkwood
Abstract: Where data is available, it is desirable in geostatistical modelling to make use of additional covariates, for example terrain data, in order to improve prediction accuracy in the modelling task. While elevation itself may be important, additional explanatory power for any given problem can be sought (but not necessarily found) by filtering digital elevation models to extract higher-order derivatives such as slope angles, curvatures, and roughness. In essence, it would be beneficial to extract as much task-relevant information as possible from the elevation grid. However, given the complexities of the natural world, chance dictates that the use of 'off-the-shelf' filters is unlikely to derive covariates that provide strong explanatory power to the target variable at hand, and any attempt to manually design informative covariates is likely to be a trial-and-error process -- not optimal. In this paper we present a solution to this problem in the form of a deep learning approach to automatically deriving optimal task-specific terrain texture covariates from a standard SRTM 90m gridded digital elevation model (DEM). For our target variables we use point-sampled geochemical data from the British Geological Survey: concentrations of potassium, calcium and arsenic in stream sediments. We find that our deep learning approach produces covariates for geostatistical modelling that have surprisingly strong explanatory power on their own, with R-squared values around 0.6 for all three elements (with arsenic on the log scale). These results are achieved without the neural network being provided with easting, northing, or absolute elevation as inputs, and purely reflect the capacity of our deep neural network to extract task-specific information from terrain texture. We hope that these results will inspire further investigation into the capabilities of deep learning within geostatistical applications.

7. A Comparative Evaluation of Heart Rate Estimation Methods using Face Videos
Javier Hernandez-Ortega, Julian Fierrez, Aythami Morales, David Diaz
Abstract: This paper presents a comparative evaluation of methods for remote heart rate estimation using face videos, i.e., given a video sequence of the face as input, methods that process it to obtain a robust estimation of the subject's heart rate at each moment. Four alternatives from the literature are tested, three based on hand-crafted approaches and one based on deep learning. The methods are compared using RGB videos from the COHFACE database. Experiments show that the learning-based method achieves much better accuracy than the hand-crafted ones. The low error rate achieved by the learning-based model makes possible its application in real scenarios, e.g. in medical or sports environments.

8. Driver Identification through Stochastic Multi-State Car-Following Modeling
Donghao Xu, Zhezhang Ding, Chenfeng Tu, Huijing Zhao, Mathieu Moze, François Aioun, Franck Guillemard
Abstract: Intra-driver and inter-driver heterogeneity has been confirmed to exist in human driving behaviors by many studies. In this study, a joint model of the two types of heterogeneity in car-following behavior is proposed as an approach to driver profiling and identification. It is assumed that all drivers share a pool of driver states; under each state a car-following data sequence obeys a specific probability distribution in feature space; each driver has his/her own probability distribution over the states, called the driver profile, which characterizes the intra-driver heterogeneity, while the differences between the driver profiles of different drivers depict the inter-driver heterogeneity. Thus, the driver profile can be used to distinguish a driver from others. Based on this assumption, a stochastic car-following model is proposed to take both intra-driver and inter-driver heterogeneity into consideration, and a method is proposed to jointly learn the parameters of the behavioral feature extractor, driver states and driver profiles. Experiments demonstrate the performance of the proposed method in driver identification on naturalistic car-following data: an accuracy of 82.3% is achieved in an 8-driver experiment using 10 car-following sequences of duration 15 seconds for online inference. The potential for fast registration of new drivers is demonstrated and discussed.

9. Vulnerability of deep neural networks for detecting COVID-19 cases from chest X-ray images to universal adversarial attacks
Hokuto Hirano, Kazuki Koga, Kazuhiro Takemoto
Abstract: Under the epidemic of the novel coronavirus disease 2019 (COVID-19), chest X-ray computed tomography imaging is being used for effectively screening COVID-19 patients. The development of computer-aided systems based on deep neural networks (DNNs) has been advanced, to rapidly and accurately detect COVID-19 cases, because the need for expert radiologists, who are limited in number, forms a bottleneck for the screening. However, so far, the vulnerability of DNN-based systems has been poorly evaluated, although DNNs are vulnerable to a single perturbation, called universal adversarial perturbation (UAP), which can induce DNN failure in most classification tasks. Thus, we focus on representative DNN models for detecting COVID-19 cases from chest X-ray images and evaluate their vulnerability to UAPs generated using simple iterative algorithms. We consider nontargeted UAPs, which cause a task failure resulting in an input being assigned an incorrect label, and targeted UAPs, which cause the DNN to classify an input into a specific class. The results demonstrate that the models are vulnerable to nontargeted and targeted UAPs, even in the case of small UAPs. In particular, UAPs whose norm is 2% of the average norm of an image in the image dataset achieve >85% and >90% success rates for the nontargeted and targeted attacks, respectively. Due to the nontargeted UAPs, the DNN models judge most chest X-ray images as COVID-19 cases. The targeted UAPs make the DNN models classify most chest X-ray images into a given target class. The results indicate that careful consideration is required in practical applications of DNNs to COVID-19 diagnosis; in particular, they emphasize the need for strategies to address security concerns. As an example, we show that iterative fine-tuning of the DNN models using UAPs improves the robustness of the DNN models against UAPs.
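The size constraint on a universal perturbation, stated here as a percentage of the average image norm, can be sketched as a simple rescaling step. The use of the L2 norm and the helper names are assumptions; the abstract does not say which norm is used, and this sketch does not generate a UAP, it only rescales a given vector.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in vec))


def scale_perturbation(perturbation, images, ratio=0.02):
    """Rescale a perturbation so its L2 norm is `ratio` times the mean image norm.

    `images` is a list of flattened images; the L2 norm is an assumption.
    """
    mean_norm = sum(l2_norm(img) for img in images) / len(images)
    current = l2_norm(perturbation)
    if current == 0:
        raise ValueError("perturbation must be non-zero")
    factor = ratio * mean_norm / current
    return [factor * v for v in perturbation]


# Hypothetical two-pixel "images" with norms 5 and 10 (mean 7.5).
images = [[3.0, 4.0], [6.0, 8.0]]
scaled = scale_perturbation([1.0, 0.0], images, ratio=0.02)
print(scaled, l2_norm(scaled))
```

Here the rescaled perturbation has norm 0.02 x 7.5 = 0.15, and the same vector would then be added to every input, which is what makes the perturbation universal.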

10. Polarimetric image augmentation
Marc Blanchon, Olivier Morel, Fabrice Meriaudeau, Ralph Seulin, Désiré Sidibé
Abstract: Robotics applications in urban environments are subject to obstacles that exhibit specular reflections hampering autonomous navigation. On the other hand, these reflections are highly polarized and this extra information can successfully be used to segment the specular areas. In nature, polarized light is obtained by reflection or scattering. Deep Convolutional Neural Networks (DCNNs) have shown excellent segmentation results, but require a significant amount of data to achieve best performance. The lack of data is usually overcome by using augmentation methods. However, unlike RGB images, polarization images are not only scalar (intensity) images and standard augmentation techniques cannot be applied straightforwardly. We propose to enhance deep learning models through a regularized augmentation procedure applied to polarimetric data in order to characterize scenes more effectively under challenging conditions. We subsequently observe an average improvement of 18.1% in IoU between non-augmented and regularized training procedures on real-world data.

11. Arbitrary-sized Image Training and Residual Kernel Learning: Towards Image Fraud Identification
Hongyu Li, Xiaogang Huang, Zhihui Fu, Xiaolin Li
Abstract: Preserving original noise residuals in images is critical to image fraud identification. Since the resizing operation during deep learning will damage the microstructures of image noise residuals, we propose a framework for directly training on images at their original input scales without resizing. Our arbitrary-sized image training method mainly depends on the pseudo-batch gradient descent (PBGD), which bridges the gap between the input batch and the update batch to ensure that model updates can run normally for arbitrary-sized images. In addition, a 3-phase alternate training strategy is designed to learn optimal residual kernels for image fraud identification. With the learnt residual kernels and PBGD, the proposed framework achieved state-of-the-art results in image fraud identification, especially for images with small tampered regions or unseen images with different tampering distributions.

12. Style Normalization and Restitution for Generalizable Person Re-identification
Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen, Li Zhang
Abstract: Existing fully-supervised person re-identification (ReID) methods usually suffer from poor generalization capability caused by domain gaps. The key to solving this problem lies in filtering out identity-irrelevant interference and learning domain-invariant person representations. In this paper, we aim to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains. To achieve this goal, we propose a simple yet effective Style Normalization and Restitution (SNR) module. Specifically, we filter out style variations (e.g., illumination, color contrast) by Instance Normalization (IN). However, such a process inevitably removes discriminative information. We propose to distill identity-relevant feature from the removed information and restitute it to the network to ensure high discrimination. For better disentanglement, we enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features. Extensive experiments demonstrate the strong generalization capability of our framework. Our models empowered by the SNR modules significantly outperform the state-of-the-art domain generalization approaches on multiple widely-used person ReID benchmarks, and also show superiority on unsupervised domain adaptation.
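Instance Normalization, which this abstract uses to filter out style variations, normalizes each channel of a single sample to zero mean and unit variance over its spatial positions. The sketch below shows that step on flattened channels. Treating the "removed information" as the difference between the original and the normalized feature is an assumption made for illustration, not something the abstract states.

```python
import math


def instance_norm(channels, eps=1e-5):
    """Normalize each channel (a flat list of spatial values) to zero mean, unit variance."""
    normalized = []
    for values in channels:
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        normalized.append([(v - mean) / std for v in values])
    return normalized


def removed_information(channels, normalized):
    """Element-wise difference between original and normalized features (assumed residual)."""
    return [
        [v - n for v, n in zip(orig, norm)]
        for orig, norm in zip(channels, normalized)
    ]


# Hypothetical feature map of one sample: 2 channels, 4 spatial positions each.
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 12.0, 12.0]]
normed = instance_norm(features)
residual = removed_information(features, normed)
print(normed[0])
```

Adding the residual back to the normalized feature recovers the original exactly, which is why a restitution step has to select which part of the residual to return rather than returning all of it.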

13. Position-based Scaled Gradient for Model Quantization and Sparse Training
Jangho Kim, KiYoon Yoo, Nojun Kwak
Abstract: We propose the position-based scaled gradient (PSG) that scales the gradient depending on the position of a weight vector to make it more compression-friendly. First, we theoretically show that applying PSG to the standard gradient descent (GD), which is called PSGD, is equivalent to the GD in the warped weight space, a space made by warping the original weight space via an appropriately designed invertible function. Second, we empirically show that PSG acting as a regularizer to a weight vector is very useful in model compression domains such as quantization and sparse training. PSG reduces the gap between the weight distributions of a full-precision model and its compressed counterpart. This enables the versatile deployment of a model either as an uncompressed mode or as a compressed mode depending on the availability of resources. The experimental results on CIFAR-10/100 and Imagenet datasets show the effectiveness of the proposed PSG in both domains of sparse training and quantization even for extremely low bits.

14. Bi-direction Context Propagation Network for Real-time Semantic Segmentation
Shijie Hao, Yuan Zhou, Yanrong Guo
Abstract: Spatial details and context correlations are two types of critical information for semantic segmentation. Generally, spatial details most likely exist in shallow layers, while context correlations most likely exist in deep layers. Aiming to use both of them, most current methods choose to forward-transmit the spatial details to deep layers. We find that transmitting spatial details is computationally expensive and substantially lowers the model's execution speed. To address this problem, we propose a new Bi-direction Context Propagation Network (BCPNet), which performs semantic segmentation in real time. Different from the previous methods, our BCPNet effectively back-propagates the context information to the shallow layers, which is computationally more modest. Extensive experiments validate that our BCPNet has achieved a good balance between accuracy and speed. For accuracy, our BCPNet has achieved 68.4% IoU on the Cityscapes test set and 67.8% mIoU on the CamVid test set. For speed, our BCPNet can achieve 585.9 FPS and a 1.7 ms runtime per image.
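The two figures of merit quoted here, mean intersection-over-union (mIoU) and frames per second, can be made concrete with a short sketch. The label lists are hypothetical, and averaging only over classes that appear in the prediction or the ground truth is one common convention, assumed here rather than taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Hypothetical per-pixel labels for a six-pixel image with three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, num_classes=3)
print(f"mIoU: {miou:.2f}")

# Converting throughput to latency: 585.9 frames per second is about 1.7 ms per image.
ms_per_image = 1000.0 / 585.9
print(f"{ms_per_image:.1f} ms per image")
```

In this toy case the per-class IoUs are 1/3, 2/3 and 1/2, giving an mIoU of 0.5. The latency conversion confirms that the two speed figures in the abstract describe the same measurement.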

15. Feature selection for gesture recognition in Internet-of-Things for healthcare
Giulia Cisotto, Martina Capuzzo, Anna V. Guglielmi, Andrea Zanella
Abstract: Internet of Things is rapidly spreading across several fields, including healthcare, posing relevant questions related to communication capabilities, energy efficiency and sensors unobtrusiveness. Particularly, in the context of recognition of gestures, e.g., grasping of different objects, brain and muscular activity could be simultaneously recorded via EEG and EMG, respectively, and analyzed to identify the gesture that is being accomplished, and the quality of its performance. This paper proposes a new algorithm that aims (i) to robustly extract the most relevant features to classify different grasping tasks, and (ii) to retain the natural meaning of the selected features. This, in turn, gives the opportunity to simplify the recording setup to minimize the data traffic over the communication network, including Internet, and provide physiologically significant features for medical interpretation. The algorithm robustness is ensured both by consensus clustering as a feature selection strategy, and by nested cross-validation scheme to evaluate its classification performance.

16. Spoof Face Detection Via Semi-Supervised Adversarial Training
Chengwei Chen, Wang Yuan, Xuequan Lu, Lizhuang Ma
Abstract: Face spoofing causes severe security threats in face recognition systems. Previous anti-spoofing works focused on supervised techniques, typically with either binary or auxiliary supervision. Most of them suffer from limited robustness and generalization, especially in the cross-dataset setting. In this paper, we propose a semi-supervised adversarial learning framework for spoof face detection, which largely relaxes the supervision condition. To capture the underlying structure of live face data in latent representation space, we propose to train on the live face data only, with a convolutional Encoder-Decoder network acting as a Generator. Meanwhile, we add a second convolutional network serving as a Discriminator. The generator and discriminator are trained by competing with each other while collaborating to understand the underlying concept in the normal class (live faces). Since the spoof face detection is video based (i.e., it has temporal information), we intuitively take the optical flow maps converted from consecutive video frames as input. Our approach is free of spoof faces during training, and is thus robust and general to different types of spoof, even unknown ones. Extensive experiments on intra- and cross-dataset tests show that our semi-supervised method achieves better or comparable results to state-of-the-art supervised techniques.

17. A CNN-LSTM Architecture for Detection of Intracranial Hemorrhage on CT scans
Nhan T. Nguyen, Dat Q. Tran, Nghia T. Nguyen, Ha Q. Nguyen
Abstract: We propose a novel method that combines a convolutional neural network (CNN) with a long short-term memory (LSTM) mechanism for accurate prediction of intracranial hemorrhage on computed tomography (CT) scans. The CNN plays the role of a slice-wise feature extractor while the LSTM is responsible for linking the features across slices. The whole architecture is trained end-to-end with input being an RGB-like image formed by stacking 3 different viewing windows of a single slice. We validate the method on the recent RSNA Intracranial Hemorrhage Detection challenge and on the CQ500 dataset. For the RSNA challenge, our best single model achieves a weighted log loss of 0.0522 on the leaderboard, which is comparable to the top 3% performances, almost all of which make use of ensemble learning. Importantly, our method generalizes very well: the model trained on the RSNA dataset significantly outperforms the 2D model, which does not take into account the relationship between slices, on CQ500. Our code and models are publicly available at this https URL.
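The RGB-like input described in this abstract can be sketched as follows: each viewing window clips the slice's Hounsfield units (HU) to a range and rescales them to [0, 1], and three such windows become the three channels. The window settings in `WINDOWS` are hypothetical values chosen for illustration; the abstract does not say which windows the authors use.

```python
def apply_window(hu_values, center, width):
    """Clip HU values to [center - width/2, center + width/2] and rescale to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    return [min(max((v - low) / (high - low), 0.0), 1.0) for v in hu_values]


def stack_windows(hu_values, windows):
    """Return one (c1, c2, c3) tuple per pixel, one channel per viewing window."""
    channels = [apply_window(hu_values, center, width) for center, width in windows]
    return [tuple(pixel) for pixel in zip(*channels)]


# Hypothetical (center, width) pairs in HU; assumptions, not taken from the paper.
WINDOWS = [(40.0, 80.0), (80.0, 200.0), (600.0, 2800.0)]

# A flattened toy "slice" of three pixels: soft tissue, air, dense bone.
slice_hu = [40.0, -1000.0, 3000.0]
rgb_like = stack_windows(slice_hu, WINDOWS)
print(rgb_like)
```

Each pixel ends up with three values that emphasize different tissue ranges, so a CNN pretrained on three-channel images can consume a single CT slice without architectural changes.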

18. Investigating Vulnerability to Adversarial Examples on Multimodal Data Fusion in Deep Learning
Youngjoon Yu, Hong Joo Lee, Byeong Cheon Kim, Jung Uk Kim, Yong Man Ro
Abstract: The success of multimodal data fusion in deep learning appears to be attributed to the use of complementary information between multiple input data. Compared to their predictive performance, relatively less attention has been devoted to the robustness of multimodal fusion models. In this paper, we investigated whether the current multimodal fusion model utilizes the complementary intelligence to defend against adversarial attacks. We applied gradient-based white-box attacks such as FGSM and PGD on MFNet, which is a major multispectral (RGB, Thermal) fusion deep learning model for semantic segmentation. We verified that the multimodal fusion model optimized for better prediction is still vulnerable to adversarial attack, even if only one of the sensors is attacked. Thus, it is hard to say that existing multimodal data fusion models are fully utilizing complementary relationships between multiple modalities in terms of adversarial robustness. We believe that our observations open a new horizon for adversarial attack research on multimodal data fusion.
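FGSM, one of the attacks named in this abstract, perturbs an input by a fixed step in the direction of the sign of the loss gradient: x_adv = x + epsilon * sign(dL/dx). The sketch below applies it to a toy logistic-regression classifier, where the gradient has a closed form. The model, weights and inputs are hypothetical and unrelated to MFNet.

```python
import math


def predict_proba(x, weights, bias):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_logistic(x, y, weights, bias, epsilon):
    """One FGSM step against the cross-entropy loss of a logistic model.

    For this model the input gradient is (p - y) * w, where p is the predicted probability.
    """
    p = predict_proba(x, weights, bias)
    grad = [(p - y) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and a correctly classified input with true label 1.
weights, bias = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm_logistic(x, y, weights, bias, epsilon=0.3)
print(predict_proba(x, weights, bias), predict_proba(x_adv, weights, bias))
```

The clean input is scored above 0.5 and the perturbed one below it, so a bounded change to each feature is enough to flip the decision. PGD repeats a step of this kind several times with projection back into the allowed perturbation range.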

19. A Convolutional Neural Network with Parallel Multi-Scale Spatial Pooling to Detect Temporal Changes in SAR Images
Jia-Wei Chen, Rongfang Wang, Fan Ding, Bo Liu, Licheng Jiao, Jie Zhang
Abstract: In synthetic aperture radar (SAR) image change detection, it is quite challenging to exploit the change information from the noisy difference image, which is subject to speckle. In this paper, we propose a multi-scale spatial pooling (MSSP) network to exploit the change information from the noisy difference image. Unlike the traditional convolutional network with only mono-scale pooling kernels, the proposed method equips a convolutional network with multi-scale pooling kernels to exploit the spatial context information on changed regions from the difference image. Furthermore, to verify the generalization of the proposed method, we apply it to cross-dataset bitemporal SAR image change detection, where the MSSP network (MSSP-Net) is trained on one dataset and then applied to an unknown testing dataset. We compare the proposed method with other state-of-the-art methods on four challenging datasets of bitemporal SAR images. Experimental results demonstrate that our proposed method obtains results comparable to S-PCA-Net on the YR-A and YR-B datasets and outperforms other state-of-the-art methods, especially on the Sendai-A and Sendai-B datasets with more complex scenes. More importantly, MSSP-Net is more efficient than S-PCA-Net and convolutional neural networks (CNNs), with less execution time in both training and testing phases.

20. RankPose: Learning Generalised Feature with Rank Supervision for Head Pose Estimation
Donggen Dai, Wangkit Wong, Zhuojun Chen
Abstract: We address the challenging problem of RGB image-based head pose estimation. We first reformulate head pose representation learning to constrain it to a bounded space. Representing head pose as vector projections or vector angles proves helpful for improving performance. Further, a ranking loss combined with MSE regression loss is proposed. The ranking loss supervises a neural network with paired samples of the same person and penalises incorrect ordering of pose prediction. Analysis of this new loss function suggests it contributes to a better local feature extractor, where features are generalised to Abstract Landmarks which are pose-related features instead of pose-irrelevant information such as identity, age, and lighting. Extensive experiments show that our method significantly outperforms the current state-of-the-art schemes on public datasets: AFLW2000 and BIWI. Our model achieves significant improvements over the previous SOTA MAE on AFLW2000 and BIWI, from 4.50 to 3.66 and from 4.0 to 3.71 respectively. Source code will be made available at: this https URL.

21. Focus Longer to See Better: Recursively Refined Attention for Fine-Grained Image Classification
Prateek Shroff, Tianlong Chen, Yunchao Wei, Zhangyang Wang
Abstract: Deep neural networks have made great strides in the coarse-grained image classification task, in part due to their strong ability to extract discriminative feature representations from images. However, the marginal visual difference between classes in fine-grained images makes this task much harder. In this paper, we focus on these marginal differences to extract more representative features. Similar to human vision, our network repeatedly focuses on parts of images to spot small discriminative parts among the classes. Moreover, we show through interpretability techniques how our network's focus changes from coarse to fine details. Through our experiments, we also show that a simple attention model can aggregate (by weighting) these finer details to focus on the most dominant discriminative part of the image. Our network uses only image-level labels and does not need bounding box or part annotations. Further, the simplicity of our network makes it an easy plug-n-play module. Apart from providing interpretability, our network boosts performance (by up to 2%) compared to its baseline counterparts. Our codebase is available at this https URL.

22. SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition
Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, Weiping Wang
Abstract: Scene text recognition is a hot research topic in computer vision. Recently, many recognition methods based on the encoder-decoder framework have been proposed, and they can handle scene text with perspective distortion and curved shapes. Nevertheless, they still face many challenges, such as image blur, uneven illumination, and incomplete characters. We argue that most encoder-decoder methods are based on local visual features without explicit global semantic information. In this work, we propose a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene text. The semantic information is used both in the encoder module for supervision and in the decoder module for initialization. In particular, the state-of-the-art ASTER method is integrated into the proposed framework as an exemplar. Extensive experiments demonstrate that the proposed framework is more robust for low-quality text images and achieves state-of-the-art results on several benchmark datasets.

23. Head2Head: Video-based Neural Head Synthesis
Mohammad Rami Koujan, Michail Christos Doukas, Anastasios Roussos, Stefanos Zafeiriou
Abstract: In this paper, we propose a novel machine learning architecture for facial reenactment. In particular, contrary to the model-based approaches or recent frame-based methods that use Deep Convolutional Neural Networks (DCNNs) to generate individual frames, we propose a novel method that (a) exploits the special structure of facial motion (paying particular attention to mouth motion) and (b) enforces temporal consistency. We demonstrate that the proposed method can transfer facial expressions, pose and gaze of a source actor to a target video in a photo-realistic fashion more accurately than state-of-the-art methods.

24. When Dictionary Learning Meets Deep Learning: Deep Dictionary Learning and Coding Network for Image Recognition with Limited Data
Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe
Abstract: We present a new Deep Dictionary Learning and Coding Network (DDLCN) for image recognition tasks with limited data. The proposed DDLCN has most of the standard deep learning layers (e.g., input/output, pooling, fully connected, etc.), but the fundamental convolutional layers are replaced by our proposed compound dictionary learning and coding layers. The dictionary learning learns an over-complete dictionary for input training data. At the deep coding layer, a locality constraint is added to guarantee that the activated dictionary bases are close to each other. Then the activated dictionary atoms are assembled and passed to the compound dictionary learning and coding layers. In this way, the activated atoms in the first layer can be represented by the deeper atoms in the second dictionary. Intuitively, the second dictionary is designed to learn the fine-grained components shared among the input dictionary atoms, thus a more informative and discriminative low-level representation of the dictionary atoms can be obtained. We empirically compare DDLCN with several leading dictionary learning methods and deep learning models. Experimental results on five popular datasets show that DDLCN achieves competitive results compared with state-of-the-art methods when the training data is limited. Code is available at this https URL.

25. Team Neuro at SemEval-2020 Task 8: Multi-Modal Fine Grain Emotion Classification of Memes using Multitask Learning
Sourya Dipta Das, Soumil Mandal
Abstract: In this article, we describe the system that we used for the memotion analysis challenge, which is Task 8 of SemEval-2020. This challenge had three subtasks in which affect-based sentiment classification of memes was required along with intensities. The system we propose combines the three tasks into a single one by representing it as a multi-label hierarchical classification problem. Multi-task learning, or joint learning, is used to train our model. We use dual channels to extract text-based and image-based features from separate deep neural network backbones and aggregate them to create task-specific features. These task-specific aggregated feature vectors are then passed on to smaller networks with dense layers, each one assigned to predict one type of fine-grained sentiment label. Our proposed method shows the superiority of this system over the other best models from the challenge in a few tasks.

26. Joint Detection and Tracking in Videos with Identification Features
Bharti Munjal, Abdul Rafey Aftab, Sikandar Amin, Meltem D. Brandlmaier, Federico Tombari, Fabio Galasso
Abstract: Recent works have shown that combining object detection and tracking tasks, in the case of video data, results in higher performance for both tasks, but they require a high frame rate as a strict requirement for performance. This assumption is often violated in real-world applications, where models run on embedded devices, often at only a few frames per second. Videos at low frame rates suffer from large object displacements. Here, re-identification features may help to match detections of largely displaced objects, but current joint detection and re-identification formulations degrade the detector performance, as these two are contrasting tasks. In real-world applications, having separate detector and re-id models is often not feasible, as both the memory and runtime effectively double. Towards robust long-term tracking applicable to devices with reduced computational power, we propose the first joint optimization of detection, tracking and re-identification features for videos. Notably, our joint optimization maintains the detector performance, a typical multi-task challenge. At inference time, we leverage detections for tracking (tracking-by-detection) when the objects are visible, detectable and slowly moving in the image. We instead leverage re-identification features to match objects that disappeared (e.g. due to occlusion) for several frames or were not tracked due to fast motion (or low-frame-rate videos). Our proposed method reaches the state of the art on MOT; it ranks 1st in the UA-DETRAC'18 tracking challenge among online trackers, and 3rd overall.

27. SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
Peratham Wiriyathammabhum
Abstract: This paper presents a novel deep learning architecture for word-level lipreading. Previous works suggest a potential for incorporating a pretrained deep 3D Convolutional Neural Network as a front-end feature extractor. We introduce SpotFast networks, a variant of the state-of-the-art SlowFast networks for action recognition, which utilize a temporal window as a spot pathway and all frames as a fast pathway. We further incorporate memory augmented lateral transformers to learn sequential features for classification. We evaluate the proposed model on the LRW dataset. The experiments show that our proposed model outperforms various state-of-the-art models, and that incorporating the memory augmented lateral transformers yields a 3.7% improvement for the SpotFast networks.

28. PatchGuard: Provable Defense against Adversarial Patches Using Masks on Small Receptive Fields
Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, Prateek Mittal
Abstract: Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image. Such attacks can be realized in the physical world by attaching the adversarial patch to the object to be misclassified. In this paper, we propose a general defense framework that can achieve both high clean accuracy and provable robustness against localized adversarial patches. The cornerstone of our defense framework is to use a convolutional network with small receptive fields that impose a bound on the number of features corrupted by an adversarial patch. We further present the robust masking defense that robustly detects and masks corrupted features for a secure feature aggregation. We evaluate our defense against the most powerful white-box untargeted adaptive attacker and achieve a 92.3% clean accuracy and an 85.2% provable robust accuracy on a 10-class subset of ImageNet against a 31x31 adversarial patch (2% pixels), a 57.4% clean accuracy and a 14.4% provable robust accuracy on 1000-class ImageNet against a 31x31 patch (2% pixels), and an 80.3% clean accuracy and a 61.3% provable accuracy on CIFAR-10 against a 5x5 patch (2.4% pixels). Notably, our provable defenses achieve state-of-the-art provable robust accuracy on ImageNet and CIFAR-10.
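
The claim that small receptive fields bound the number of corrupted features can be checked with a short calculation. Along one axis, a feature at index i with receptive field r and stride s covers pixels [i*s, i*s + r - 1], so a patch of width p can touch at most ceil((p + r - 1) / s) features. The sketch below assumes a 1D layout without padding; the receptive field (17), stride (8), and image size (224) are illustrative values, not figures from the abstract.

```python
import math


def corrupted_feature_bound(patch_size, receptive_field, stride):
    """Upper bound on features (per axis) whose receptive field overlaps the patch."""
    return math.ceil((patch_size + receptive_field - 1) / stride)


def count_corrupted(patch_start, patch_size, receptive_field, stride, image_size):
    """Brute-force count of overlapping features for one patch position (1D, no padding)."""
    num_features = (image_size - receptive_field) // stride + 1
    patch_end = patch_start + patch_size - 1
    count = 0
    for i in range(num_features):
        field_start = i * stride
        field_end = field_start + receptive_field - 1
        if field_start <= patch_end and field_end >= patch_start:
            count += 1
    return count
```

For a square patch in 2D the per-axis bound is squared, which is why shrinking the receptive field shrinks the region a defense has to mask.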

29. Unsupervised Domain Adaptation in Semantic Segmentation: a Review
Marco Toldo, Andrea Maracani, Umberto Michieli, Pietro Zanuttigh
Abstract: The aim of this paper is to give an overview of the recent advancements in the Unsupervised Domain Adaptation (UDA) of deep networks for semantic segmentation. This task is attracting a wide interest, since semantic segmentation models require a huge amount of labeled data and the lack of data fitting specific requirements is the main limitation in the deployment of these techniques. This problem has been recently explored and has rapidly grown with a large number of ad-hoc approaches. This motivates us to build a comprehensive overview of the proposed methodologies and to provide a clear categorization. In this paper, we start by introducing the problem, its formulation and the various scenarios that can be considered. Then, we introduce the different levels at which adaptation strategies may be applied: namely, at the input (image) level, at the internal features representation and at the output level. Furthermore, we present a detailed overview of the literature in the field, dividing previous methods based on the following (non mutually exclusive) categories: adversarial learning, generative-based, analysis of the classifier discrepancies, self-teaching, entropy minimization, curriculum learning and multi-task learning. Novel research directions are also briefly introduced to give a hint of interesting open problems in the field. Finally, a comparison of the performance of the various methods in the widely used autonomous driving scenario is presented.

30. RV-FuseNet: Range View based Fusion of Time-Series LiDAR Data for Joint 3D Object Detection and Motion Forecasting
Ankit Laddha, Shivam Gautam, Gregory P. Meyer, Carlos Vallespi-Gonzalez
Abstract: Autonomous vehicles rely on robust real-time detection and future motion prediction of traffic participants to safely navigate urban environments. We present a novel end-to-end approach that uses raw time-series LiDAR data to jointly solve both detection and prediction. We use the range view representation of LiDAR instead of voxelization since it does not discard information and is more efficient due to its compactness. However, for time-series fusion the data needs to be projected to a common viewpoint, and often this viewpoint is different from where it was captured, leading to distortions. These distortions have an adverse impact on performance. Thus, we propose a novel architecture which reduces the impact of distortions by sequentially projecting each sweep into the viewpoint of the next sweep in time. We demonstrate that our sequential fusion approach is superior to methods that directly project all the data into the most recent viewpoint. Furthermore, we compare our approach to existing state-of-the-art methods on multiple autonomous driving datasets and show competitive results.

31. PruneNet: Channel Pruning via Global Importance
Ashish Khetan, Zohar Karnin
Abstract: Channel pruning is one of the predominant approaches for accelerating deep neural networks. Most existing pruning methods either train from scratch with a sparsity-inducing term such as group lasso, or prune redundant channels in a pretrained network and then fine-tune the network. Both strategies suffer from some limitations: the use of group lasso is computationally expensive, difficult to converge and often suffers from worse behavior due to the regularization bias. The methods that start with a pretrained network either prune channels uniformly across the layers or prune channels based on the basic statistics of the network parameters. These approaches either ignore the fact that some CNN layers are more redundant than others or fail to adequately identify the level of redundancy in different layers. In this work, we investigate a simple yet effective method for pruning channels based on a computationally lightweight, data-driven optimization step that discovers the necessary width per layer. Experiments conducted on ILSVRC-12 confirm the effectiveness of our approach. With non-uniform pruning across the layers of ResNet-50, we are able to match the FLOP reduction of state-of-the-art channel pruning results while achieving 0.98% higher accuracy. Further, we show that our pruned ResNet-50 network outperforms the ResNet-34 and ResNet-18 networks, and that our pruned ResNet-101 outperforms ResNet-50.
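
To make the contrast between uniform and global (non-uniform) pruning concrete, the sketch below ranks all channels of all layers together and keeps the top fraction, so redundant layers lose more channels than others. It assumes per-channel importance scores are already available; the paper's data-driven optimization step that would produce them is not reproduced here, and the function name, layer names, and fallback rule are hypothetical.

```python
def global_channel_pruning(importance, keep_fraction):
    """Keep the globally highest-scoring channels across all layers.

    importance: dict mapping layer name -> list of per-channel scores
                (hypothetical inputs; how they are computed is out of scope).
    Returns a dict mapping layer name -> indices of the channels kept.
    Ties at the threshold are all kept, so slightly more channels may survive.
    """
    all_scores = sorted(
        (score for scores in importance.values() for score in scores),
        reverse=True,
    )
    num_keep = max(1, int(len(all_scores) * keep_fraction))
    threshold = all_scores[num_keep - 1]

    kept = {}
    for layer, scores in importance.items():
        indices = [i for i, score in enumerate(scores) if score >= threshold]
        # Never remove a layer entirely: fall back to its single best channel.
        kept[layer] = indices or [max(range(len(scores)), key=scores.__getitem__)]
    return kept
```

With scores `{"conv1": [0.9, 0.8, 0.7, 0.1], "conv2": [0.6, 0.05, 0.02, 0.01]}` and `keep_fraction=0.5`, three channels survive in `conv1` and one in `conv2`, whereas uniform pruning would keep two in each.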

32. Semi-supervised Medical Image Classification with Global Latent Mixing
Prashnna Kumar Gyawali, Sandesh Ghimire, Pradeep Bajracharya, Zhiyuan Li, Linwei Wang
Abstract: Computer-aided diagnosis via deep learning relies on large-scale annotated data sets, which can be costly when involving expert knowledge. Semi-supervised learning (SSL) mitigates this challenge by leveraging unlabeled data. One effective SSL approach is to regularize the local smoothness of neural functions via perturbations around single data points. In this work, we argue that regularizing the global smoothness of neural functions by filling the void in between data points can further improve SSL. We present a novel SSL approach that trains the neural network on linear mixing of labeled and unlabeled data, at both the input and latent space in order to regularize different portions of the network. We evaluated the presented model on two distinct medical image data sets for semi-supervised classification of thoracic disease and skin lesion, demonstrating its improved performance over SSL with local perturbations and SSL with global mixing but at the input space only. Our code is available at this https URL.
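
The linear mixing described above can be sketched in a few lines. This is a minimal sketch in the style of mixup, assuming the mixing coefficient is drawn from a Beta(alpha, alpha) distribution; the abstract does not state how the coefficient is chosen, so `alpha` and the function names are assumptions. The same `mix` function applies to raw inputs or to hidden activations, and for unlabeled samples the label vector would be a model-predicted distribution (also an assumption here).

```python
import random


def mix(a, b, lam):
    """Convex combination of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mixed_pair(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Mix two samples and their (soft) labels with one shared coefficient.

    alpha is a hypothetical hyperparameter of the Beta distribution.
    rng may be the random module or a random.Random instance.
    """
    lam = rng.betavariate(alpha, alpha)
    return mix(x1, x2, lam), mix(y1, y2, lam), lam
```

Because inputs and labels share the same coefficient, the network is trained to behave linearly in the space between data points rather than only near each point.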

33. Deep Learning Based Detection and Localization of Cerebal Aneurysms in Computed Tomography Angiography
Ziheng Duan, Daniel Montes, Yangsibo Huang, Dufan Wu, Javier M. Romero, Ramon Gilberto Gonzalez, Quanzheng Li
Abstract: Detecting cerebral aneurysms is an important clinical task of brain computed tomography angiography (CTA). However, human interpretation could be time consuming due to the small size of some aneurysms. In this work, we proposed DeepBrain, a deep learning based cerebral aneurysm detection and localization algorithm. The algorithm consisted of a 3D faster region-proposal convolution neural network for aneurysm detection and localization, and a 3D multi-scale fully convolutional neural network for false positive reduction. Furthermore, a novel hierarchical non-maximum suppression algorithm was proposed to process the detection results in 3D, which greatly reduced the time complexity by eliminating unnecessary comparisons. DeepBrain was trained and tested on 550 brain CTA scans and achieved sensitivity of 93.3% with 0.3 false positives per patient on average.
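
For context on the non-maximum suppression step, the sketch below shows the standard greedy NMS applied to axis-aligned 3D boxes. It is the plain baseline, not the hierarchical variant proposed in the paper, whose details the abstract does not give. Boxes are assumed to be `(x1, y1, z1, x2, y2, z2)` tuples, and the IoU threshold of 0.5 is an illustrative default.

```python
def volume(box):
    """Volume of an axis-aligned box (x1, y1, z1, x2, y2, z2); zero if degenerate."""
    x1, y1, z1, x2, y2, z2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1) * max(0.0, z2 - z1)


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter_box = (
        max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
        min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]),
    )
    inter = volume(inter_box)
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def nms_3d(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_3d(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Each candidate is compared against every box already kept, which is quadratic in the worst case; reducing such comparisons is the motivation the abstract gives for a hierarchical scheme.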

34. Misregistration Measurement and Improvement for Sentinel-1 SAR and Sentinel-2 Optical images
Yuanxin Ye, Chao Yang, Bai Zhu, Youquan He
Abstract: Co-registering the Sentinel-1 SAR and Sentinel-2 optical data of the European Space Agency (ESA) is of great importance for many remote sensing applications. The Sentinel-1 and Sentinel-2 product specifications from ESA state that the Sentinel-1 SAR L1 and the Sentinel-2 optical L1C images have a co-registration accuracy of within 2 pixels. However, we find that the actual misregistration errors between such images are much larger than that. This paper measures the misregistration errors by applying a block-based multimodal image matching strategy to six pairs of Sentinel-1 SAR and Sentinel-2 optical images, which are located in China and Europe and cover three types of terrain: flat areas, hilly areas and mountainous areas. Our experimental results show that the misregistration errors are 20-30 pixels in the flat areas and 20-40 pixels in the hilly areas, while in the mountainous areas the errors increase to 50-60 pixels. To eliminate the misregistration, we use representative geometric transformation models such as polynomial models, projective models, and rational function models for the co-registration of the two types of images, and compare and analyze their registration accuracy under different numbers of control points and different terrains. The results of our analysis show that the 3rd-order polynomial achieves the most satisfactory registration results. Its registration accuracy is less than 1.0 pixel (at a 10 m pixel size) in the flat areas, about 1.5 pixels in the hilly areas, and between 1.8 and 2.3 pixels in the mountainous areas. In summary, this paper discloses and measures the misregistration between the Sentinel-1 SAR L1 and Sentinel-2 optical L1C images for the first time. Moreover, we also determine a relatively optimal geometric transformation model for the co-registration of the two types of images.

35. Point2Mesh: A Self-Prior for Deformable Meshes
Rana Hanocka, Gal Metzer, Raja Giryes, Daniel Cohen-Or
Abstract: In this paper, we introduce Point2Mesh, a technique for reconstructing a surface mesh from an input point cloud. Instead of explicitly specifying a prior that encodes the expected shape properties, the prior is defined automatically using the input point cloud, which we refer to as a self-prior. The self-prior encapsulates recurring geometric repetitions from a single shape within the weights of a deep neural network. We optimize the network weights to deform an initial mesh to shrink-wrap a single input point cloud. This explicitly considers the entire reconstructed shape, since shared local kernels are calculated to fit the overall object. The convolutional kernels are optimized globally across the entire shape, which inherently encourages local-scale geometric self-similarity across the shape surface. We show that shrink-wrapping a point cloud with a self-prior converges to a desirable solution, whereas a prescribed smoothness prior often becomes trapped in undesirable local minima. While the performance of traditional reconstruction approaches degrades in non-ideal conditions that are often present in real-world scanning, i.e., unoriented normals, noise and missing (low density) parts, Point2Mesh is robust to non-ideal conditions. We demonstrate the performance of Point2Mesh on a large variety of shapes with varying complexity.

36. SODA: Detecting Covid-19 in Chest X-rays with Semi-supervised Open Set Domain Adaptation
Jieli Zhou, Baoyu Jing, Zeya Wang
Abstract: The global pandemic of COVID-19 has infected millions of people since its first outbreak last December. A key challenge for preventing and controlling COVID-19 is how to quickly, widely, and effectively implement testing for the disease, because testing is the first step to break the chains of transmission. To assist the diagnosis of the disease, radiology imaging is used to complement the screening process and triage patients into different risk levels. Deep learning methods have taken a more active role in automatically detecting COVID-19 disease in chest x-ray images, as witnessed in many recent works. Most of these works first train a CNN on an existing large-scale chest x-ray image dataset and then fine-tune it with a COVID-19 dataset at a much smaller scale. However, direct transfer across datasets from different domains may lead to poor performance due to visual domain shift. Also, the small scale of the COVID-19 dataset on the target domain can make the training fall into the overfitting trap. To solve all these crucial problems and fully exploit the available large-scale chest x-ray image dataset, we formulate the problem of COVID-19 chest x-ray image classification in a semi-supervised open set domain adaptation setting, through which we are motivated to reduce the domain shift and avoid overfitting when training on a very small dataset of COVID-19. In addressing this formulated problem, we propose a novel Semi-supervised Open set Domain Adversarial network (SODA), which is able to align the data distributions across different domains in a general domain space and also in a common subspace of source and target data. In our experiments, SODA achieves a leading classification performance compared with recent state-of-the-art models, and effectively separates COVID-19 from common pneumonia.

37. Deep learning application of vibration data for predictive maintenance of gravity acceleration equipment
SeonWoo Lee, YuHyeon Tak, HoJun Yang, JaeHeung Yang, GangMin Lim, KyuSung Kim, ByeongKeun Choi, JangWoo Kwon
Abstract: Hypergravity accelerators are used for gravity training or medical research. They are a kind of large machinery, and a failure of large equipment can be a serious problem in terms of safety or costs. In this paper, we propose a predictive maintenance model that can proactively prevent failures that may occur in a hypergravity accelerator. The proposed method converts vibration signals into spectrograms and performs classification training using a deep learning model. We conducted an experiment to evaluate the performance of the proposed method. We attached a 4-channel accelerometer to the bearing housing which is a rotor, and obtained time-amplitude data from measured values by sampling. Then, the data was converted into a two-dimensional spectrogram, and classification training was performed using a deep learning model for four conditions of the equipment: Unbalance, Misalignment, Shaft Rubbing, and Normal. Experimental results showed that the proposed method has an accuracy of 99.5%, an increase of up to 23% compared to existing feature-based learning models.
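
Converting a time-amplitude signal into a two-dimensional spectrogram is typically done with a short-time Fourier transform. The sketch below is a minimal pure-Python version, assuming a Hann window and a direct DFT per frame; the frame length, hop size, and window choice are illustrative assumptions, since the abstract does not state the parameters used.

```python
import cmath
import math


def spectrogram(signal, frame_len=32, hop=16):
    """Magnitude STFT: rows are time frames, columns are bins 0..frame_len // 2.

    frame_len and hop are hypothetical defaults for illustration.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(coeff))
        frames.append(row)
    return frames
```

The resulting 2D array can be treated as an image, which is what allows an image classifier to be trained on vibration data. In practice a library FFT would replace the direct DFT loop, which is kept here only for self-containment.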

38. Classification of Epithelial Ovarian Carcinoma Whole-Slide Pathology Images Using Deep Transfer Learning
Yiping Wang, David Farnell, Hossein Farahani, Mitchell Nursey, Basile Tessier-Cloutier, Steven J.M. Jones, David G. Huntsman, C. Blake Gilks, Ali Bashashati
Abstract: Ovarian cancer is the most lethal cancer of the female reproductive organs. There are 5 major histological subtypes of epithelial ovarian cancer, each with distinct morphological, genetic, and clinical features. Currently, these histotypes are determined by a pathologist's microscopic examination of tumor whole-slide images (WSI). This process has been hampered by poor inter-observer agreement (Cohen's kappa 0.54-0.67). We utilized a two-stage deep transfer learning algorithm based on convolutional neural networks (CNN) and progressive resizing for automatic classification of epithelial ovarian carcinoma WSIs. The proposed algorithm achieved a mean accuracy of 87.54% and a Cohen's kappa of 0.8106 in the slide-level classification of 305 WSIs, performing better than a standard CNN and pathologists without gynecology-specific training.
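
Cohen's kappa, the agreement measure quoted above, corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the chance agreement computed from each rater's label frequencies. The sketch below is a straightforward implementation of that standard formula; the function name and the handling of the degenerate case are illustrative choices.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same class independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: both raters always use one identical label.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, raters who agree on 3 of 4 items with labels `[0, 0, 1, 1]` and `[0, 0, 1, 0]` have p_o = 0.75 and p_e = 0.5, giving kappa = 0.5, noticeably lower than the raw 75% agreement.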

39. A Concise Review of Recent Few-shot Meta-learning Methods
Xiaoxu Li, Zhuo Sun, Jing-Hao Xue, Zhanyu Ma
Abstract: Few-shot meta-learning has recently been revived with the expectation of mimicking humanity's fast adaptation to new concepts based on prior knowledge. In this short communication, we give a concise review of recent representative methods in few-shot meta-learning, which are categorized into four branches according to their technical characteristics. We conclude this review with some vital current challenges and future prospects in few-shot meta-learning.

40. Large scale evaluation of importance maps in automatic speech recognition
Viet Anh Trinh, Michael I Mandel
Abstract: In this paper, we propose a metric that we call the structured saliency benchmark (SSBM) to evaluate importance maps computed for automatic speech recognizers on individual utterances. These maps indicate time-frequency points of the utterance that are most important for correct recognition of a target word. Our evaluation technique is not only suitable for standard classification tasks, but is also appropriate for structured prediction tasks like sequence-to-sequence models. Additionally, we use this approach to perform a large scale comparison of the importance maps created by our previously introduced technique using "bubble noise" to identify important points through correlation with a baseline approach based on smoothed speech energy and forced alignment. Our results show that the bubble analysis approach is better at identifying important speech regions than this baseline on 100 sentences from the AMI corpus.

41. Evaluation of deep convolutional neural networks in classifying human embryo images based on their morphological quality
Prudhvi Thirumalaraju, Manoj Kumar Kanakasabapathy, Charles L Bormann, Raghav Gupta, Rohan Pooniwala, Hemanth Kandula, Irene Souter, Irene Dimitriadis, Hadi Shafiee
Abstract: A critical factor that influences the success of an in-vitro fertilization (IVF) procedure is the quality of the transferred embryo. Embryo morphology assessments, conventionally performed through manual microscopic analysis, suffer from disparities in practice, selection criteria, and subjectivity due to the experience of the embryologist. Convolutional neural networks (CNNs) are powerful, promising algorithms with significant potential for accurate classifications across many object categories. Network architectures and hyper-parameters affect the efficiency of CNNs for any given task. Here, we evaluate multi-layered CNNs developed from scratch and popular deep-learning architectures such as Inception v3, ResNET, Inception-ResNET-v2, and Xception in differentiating between embryos based on their morphological quality at 113 hours post insemination (hpi). Xception performed the best in differentiating between the embryos based on their morphological quality.

42. Conditionally Deep Hybrid Neural Networks Across Edge and Cloud
Yinghan Long, Indranil Chakraborty, Kaushik Roy
Abstract: The pervasiveness of "Internet-of-Things" in our daily life has led to a recent surge in fog computing, encompassing a collaboration of cloud computing and edge intelligence. To that effect, deep learning has been a major driving force towards enabling such intelligent systems. However, growing model sizes in deep learning pose a significant challenge towards deployment in resource-constrained edge devices. Moreover, in a distributed intelligence environment, efficient workload distribution is necessary between edge and cloud systems. To address these challenges, we propose a conditionally deep hybrid neural network for enabling AI-based fog computing. The proposed network can be deployed in a distributed manner, consisting of quantized layers and early exits at the edge and full-precision layers on the cloud. During inference, if an early exit has high confidence in the classification results, it would allow samples to exit at the edge, and the deeper layers on the cloud are activated conditionally, which can lead to improved energy efficiency and inference latency. We perform an extensive design space exploration with the goal of minimizing energy consumption at the edge while achieving state-of-the-art classification accuracies on image classification tasks. We show that with binarized layers at the edge, the proposed conditional hybrid network can process 65% of inferences at the edge, leading to 5.5x computational energy reduction with minimal accuracy degradation on the CIFAR-10 dataset. For the more complex dataset CIFAR-100, we observe that the proposed network with 4-bit quantization at the edge achieves 52% early classification at the edge with 4.8x energy reduction. The analysis gives us insights on designing efficient hybrid networks which achieve significantly higher energy efficiency than full-precision networks for edge-cloud based distributed intelligence systems.
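The conditional early-exit inference described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how confidence is measured, so the maximum softmax probability and the 0.9 threshold are assumptions, and `edge_layers` and `cloud_layers` are hypothetical callables standing in for the quantized edge network and the full-precision cloud network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_inference(sample, edge_layers, cloud_layers, threshold=0.9):
    """Return (predicted_class, where_it_exited) for one sample.

    edge_layers(sample) -> (features, exit_logits)   # hypothetical interface
    cloud_layers(features) -> logits                 # hypothetical interface
    """
    features, exit_logits = edge_layers(sample)
    probs = softmax(exit_logits)
    confidence = max(probs)  # assumed confidence measure
    if confidence >= threshold:
        # Confident enough: classify at the edge, never call the cloud.
        return probs.index(confidence), "edge"
    # Otherwise send the intermediate features to the deeper cloud layers.
    cloud_probs = softmax(cloud_layers(features))
    return cloud_probs.index(max(cloud_probs)), "cloud"


# Toy stand-ins: the "edge" passes the input through as features and logits,
# and the "cloud" simply reverses the feature vector.
def toy_edge(sample):
    return sample, sample


def toy_cloud(features):
    return list(reversed(features))


easy = conditional_inference([5.0, 0.0, 0.0], toy_edge, toy_cloud)
hard = conditional_inference([0.2, 0.1, 0.0], toy_edge, toy_cloud)
```

With these toy models, the first sample has a sharply peaked exit distribution and is classified at the edge, while the second is nearly uniform and falls through to the cloud. In a real deployment the threshold trades edge energy savings against accuracy.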

Note: the Chinese text is machine-translated.


[arXiv papers] Computer Vision and Pattern Recognition 2020-05-25

Contents

Abstracts