Abstracts

1. A modified Bayesian Convolutional Neural Network for Breast Histopathology Image Classification and Uncertainty Quantification
Pushkar Khairnar, Ponkrshnan Thiagarajan, Susanta Ghosh
Abstract: Convolutional neural network (CNN) based classification models have been successfully used on histopathological images for the detection of diseases. Despite this success, CNNs may yield erroneous or overfitted results when the data is not sufficiently large or is biased. Bayesian CNN was recently proposed to overcome these limitations of CNN and to provide uncertainty quantification. However, we show that Bayesian-CNN still suffers from inaccuracies, especially in negative predictions. In the present work, we extend the Bayesian-CNN to improve accuracy and the rate of convergence. The proposed model is called modified Bayesian-CNN. The novelty of the proposed model lies in an adaptive activation function that contains a learnable parameter for each of the neurons. This adaptive activation function dynamically changes the loss function, thereby providing faster convergence and better accuracy. Because the model learns a probability distribution over the network parameters, it also yields the uncertainties associated with its predictions. It reduces overfitting through ensemble averaging over networks, which in turn improves accuracy on unseen data. The proposed model demonstrates significant improvement by nearly eliminating overfitting and reducing the number of false-negative predictions by about 38%. We found that the proposed model predicts higher uncertainty for images having features of both classes. The uncertainty in the predictions for individual images can be used to decide when further human-expert intervention is needed. These findings have the potential to advance the state of the art in machine-learning-based automatic classification of histopathological images.
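The abstract does not give the exact form of the adaptive activation or the uncertainty measure. Below is a minimal sketch that assumes a tanh activation with a learnable per-neuron slope `a`, scaled by a fixed factor `n`, and uses predictive entropy as the uncertainty measure. It shows how averaging predictions over networks sampled from the learned weight distribution gives higher uncertainty when the sampled networks disagree. The helper names are hypothetical.

```python
import math

# Hypothetical helpers for illustration; not the authors' implementation.

def adaptive_tanh(x, a, n=10.0):
    """tanh with a learnable per-neuron slope `a`; `n` is an assumed fixed scale factor."""
    return math.tanh(n * a * x)

def predictive_summary(prob_samples):
    """Average class probabilities over sampled networks; return (mean, entropy).

    prob_samples: one probability vector per network drawn from the
    learned weight distribution.
    """
    k = len(prob_samples)
    n_classes = len(prob_samples[0])
    mean = [sum(p[c] for p in prob_samples) / k for c in range(n_classes)]
    # Predictive entropy: large when the averaged prediction is spread across classes.
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    return mean, entropy

# Two sampled networks that disagree vs. two that agree.
_, h_disagree = predictive_summary([[0.9, 0.1], [0.1, 0.9]])
_, h_agree = predictive_summary([[0.9, 0.1], [0.85, 0.15]])
print(h_disagree > h_agree)  # True
```

Images with features of both classes would tend to produce disagreeing samples, and so a higher entropy that could flag them for expert review.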

2. Object-aware Feature Aggregation for Video Object Detection
Qichuan Geng, Hong Zhang, Na Jiang, Xiaojuan Qi, Liangjun Zhang, Zhong Zhou
Abstract: We present an Object-aware Feature Aggregation (OFA) module for video object detection (VID). Our approach is motivated by the intriguing property that video-level object-aware knowledge can be employed as a powerful semantic prior to help object recognition. As a consequence, augmenting features with such prior knowledge can effectively improve classification and localization performance. To give features access to more content from the whole video, we first capture the object-aware knowledge of proposals and then combine this knowledge with well-established pair-wise contexts. Extensive experiments on the ImageNet VID dataset demonstrate the effectiveness of object-aware knowledge, with superior performance of 83.93% and 86.09% mAP using ResNet-101 and ResNeXt-101, respectively. When further equipped with Sequence DIoU NMS, we obtain mAPs of 85.07% and 86.88%, the best reported at the time of submission. The code to reproduce our results will be released after acceptance.
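The abstract does not describe how Sequence DIoU NMS works. As background only, here is a minimal sketch of plain single-frame NMS using the standard Distance-IoU criterion, DIoU = IoU − ρ²/c². Here ρ is the distance between box centers and c is the diagonal of the smallest enclosing box. This is not the paper's sequence-level variant, and the function names are hypothetical.

```python
# Hypothetical helpers; plain per-frame DIoU-NMS, not the paper's Sequence DIoU NMS.

def diou(a, b):
    """Distance-IoU between boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centers.
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return iou - rho2 / (cw ** 2 + ch ** 2)

def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS: drop a box whose DIoU with an already kept box exceeds `threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep

print(diou_nms([(0, 0, 2, 2), (0.1, 0, 2.1, 2), (5, 5, 7, 7)], [0.9, 0.8, 0.7]))  # [0, 2]
```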

3. High-Throughput Image-Based Plant Stand Count Estimation Using Convolutional Neural Networks
Saeed Khaki, Hieu Pham, Ye Han, Wade Kent, Lizhi Wang
Abstract: The future landscape of modern farming and plant breeding is rapidly changing due to the complex needs of our society. The explosion of collectable data has started a revolution in agriculture, to the point where innovation must occur. For a commercial organization, accurate and efficient collection of information is necessary to ensure that optimal decisions are made at key points of the breeding cycle. However, due to the sheer size of a breeding program and current resource limitations, collecting precise data on individual plants is not possible. In particular, efficient phenotyping of crops to record their color, shape, chemical properties, disease susceptibility, etc. is severely limited by labor requirements and, oftentimes, by the need for expert domain knowledge. In this paper, we propose a deep learning based approach, named DeepStand, for image-based corn stand counting at early phenological stages. The proposed method adopts a truncated VGG-16 network as a backbone feature extractor and merges multiple feature maps of different scales to make the network robust against scale variation. Our extensive computational experiments suggest that the proposed method can successfully count corn stands and outperforms other state-of-the-art methods. Our goal is for this work to be used by the broader agricultural community as a way to enable high-throughput phenotyping without extensive time and labor requirements.

4. ResNet or DenseNet? Introducing Dense Shortcuts to ResNet
Chaoning Zhang, Philipp Benz, Dawit Mureja Argaw, Seokju Lee, Junsik Kim, Francois Rameau, Jean-Charles Bazin, In So Kweon
Abstract: ResNet or DenseNet? Nowadays, most deep learning based approaches are implemented with seminal backbone networks, of which the two arguably most famous are ResNet and DenseNet. Despite their competitive performance and overwhelming popularity, both have inherent drawbacks. For ResNet, the identity shortcut that stabilizes training also limits its representation capacity, while DenseNet has a higher capacity thanks to multi-layer feature concatenation. However, dense concatenation introduces a new problem: it requires more GPU memory and more training time. Partially due to this, the choice between ResNet and DenseNet is not trivial. This paper provides a unified perspective of dense summation to analyze them, which facilitates a better understanding of their core difference. We further propose dense weighted normalized shortcuts as a solution to the dilemma between them. Our proposed dense shortcut inherits the simple design philosophy of ResNet and DenseNet. On several benchmark datasets, experimental results show that the proposed DSNet achieves significantly better results than ResNet and performance comparable to DenseNet while requiring fewer computational resources.
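In the dense-summation view, an unrolled ResNet feeds each layer the sum of all earlier outputs with unit weights. A dense weighted normalized shortcut instead normalizes each earlier output and weights it before summing. The minimal sketch below assumes L2 normalization of flat feature vectors and one scalar weight per earlier layer. The paper's actual normalization and weighting scheme may differ, and the function names are hypothetical.

```python
import math

# Hypothetical helpers; the L2 normalization and scalar weights are assumptions.

def l2_normalize(v):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dense_weighted_shortcut(prev_outputs, weights):
    """Weighted sum of the normalized outputs of all preceding layers.

    With unit weights and no normalization this is the ResNet-style
    running sum of earlier outputs.
    """
    dim = len(prev_outputs[0])
    out = [0.0] * dim
    for w, v in zip(weights, prev_outputs):
        for k, x in enumerate(l2_normalize(v)):
            out[k] += w * x
    return out

print(dense_weighted_shortcut([[3.0, 4.0], [0.0, 2.0]], [1.0, 1.0]))  # approx. [0.6, 1.8]
```

Unlike DenseNet's concatenation, the summed output keeps a fixed width, which is where the memory savings would come from.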

5. Primal-Dual Mesh Convolutional Neural Networks
Francesco Milano, Antonio Loquercio, Antoni Rosinol, Davide Scaramuzza, Luca Carlone
Abstract: Recent works in geometric deep learning have introduced neural networks that perform inference tasks on three-dimensional geometric data by defining convolution, and sometimes pooling, operations on triangle meshes. These methods, however, either treat the input mesh as a graph and do not exploit specific geometric properties of meshes for feature aggregation and downsampling, or are specialized for meshes but rely on a rigid definition of convolution that does not properly capture the local topology of the mesh. We propose a method that combines the advantages of both types of approaches while addressing their limitations: we extend a primal-dual framework drawn from the graph-neural-network literature to triangle meshes, and define convolutions on two types of graphs constructed from an input mesh. Our method takes features for both the edges and the faces of a 3D mesh as input and dynamically aggregates them using an attention mechanism. At the same time, we introduce a pooling operation with a precise geometric interpretation, which allows handling variations in mesh connectivity by clustering mesh faces in a task-driven fashion. We provide theoretical insights into our approach using tools from the mesh-simplification literature. In addition, we experimentally validate our method on shape classification and shape segmentation, where we obtain performance comparable or superior to the state of the art.
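One graph that can be built from a triangle mesh treats faces as nodes and connects two faces when they share an edge. The abstract does not spell out the exact primal and dual graph construction. The minimal sketch below builds only this face-adjacency graph, as one plausible ingredient rather than the paper's construction. The function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical helper; builds only a face-adjacency graph, not the full primal-dual pair.

def face_adjacency(faces):
    """Connect two triangles when they share an edge.

    faces: list of (i, j, k) vertex-index triples.
    Returns a dict mapping each face index to a sorted list of neighboring faces.
    """
    edge_to_faces = defaultdict(list)
    for f, (i, j, k) in enumerate(faces):
        for a, b in ((i, j), (j, k), (k, i)):
            edge_to_faces[(min(a, b), max(a, b))].append(f)  # undirected edge key
    adj = {f: set() for f in range(len(faces))}
    for shared in edge_to_faces.values():
        for f in shared:
            adj[f].update(g for g in shared if g != f)
    return {f: sorted(n) for f, n in adj.items()}

print(face_adjacency([(0, 1, 2), (1, 2, 3), (3, 4, 5)]))
# {0: [1], 1: [0], 2: []}
```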

6. LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration
Bharat Lal Bhatnagar, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll
Abstract: We address the problem of fitting 3D human models to 3D scans of dressed humans. Classical methods optimize both the data-to-model correspondences and the human model parameters (pose and shape), but are reliable only when initialized close to the solution. Some methods initialize the optimization based on fully supervised correspondence predictors, which are not differentiable end-to-end and can only process a single scan at a time. Our main contribution is LoopReg, an end-to-end learning framework to register a corpus of scans to a common 3D human model. The key idea is to create a self-supervised loop. A backward map, parameterized by a neural network (NN), predicts the correspondence from every scan point to the surface of the human model. A forward map, parameterized by a human model, transforms the corresponding points back to the scan based on the model parameters (pose and shape), thus closing the loop. Formulating this closed loop is not straightforward because it is not trivial to force the output of the NN to lie on the surface of the human model; outside this surface the human model is not even defined. To this end, we propose two key innovations. First, we define the canonical surface implicitly as the zero level set of a distance field in $\mathbb{R}^3$, which, in contrast to more common UV parameterizations, does not require cutting the surface, does not have discontinuities, and does not induce distortion. Second, we diffuse the human model to the 3D domain $\mathbb{R}^3$. This allows us to map the NN predictions forward, even when they slightly deviate from the zero level set. Results demonstrate that we can train LoopReg in a mainly self-supervised manner: following a supervised warm start, the model becomes increasingly accurate as additional unlabelled raw scans are processed. Our code and pre-trained models can be downloaded for research.
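An implicit surface is the zero level set of a distance field. A point slightly off that surface can be moved onto it by stepping against the field's gradient by the signed distance. The minimal sketch below uses a sphere as a hypothetical stand-in for the canonical human surface, because a sphere's signed distance and gradient have closed forms. The function names are illustrative, not from the paper.

```python
import math

# Hypothetical stand-in: a sphere replaces the canonical human body surface.

def sphere_distance(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere (negative inside); its zero level set is the surface."""
    return math.dist(p, center) - radius

def project_to_surface(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Move p along the distance-field gradient onto the zero level set.

    Undefined at the sphere center, where the gradient does not exist.
    """
    d = sphere_distance(p, center, radius)
    r = math.dist(p, center)
    grad = [(pi - ci) / r for pi, ci in zip(p, center)]  # unit gradient of the field
    return [pi - d * gi for pi, gi in zip(p, grad)]

print(project_to_surface((2.0, 0.0, 0.0)))  # [1.0, 0.0, 0.0]
```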

7. Importance-Aware Semantic Segmentation in Self-Driving with Discrete Wasserstein Training
Xiaofeng Liu, Yuzhuo Han, Song Bai, Yi Ge, Tianxing Wang, Xu Han, Site Li, Jane You, Ju Lu
Abstract: Semantic segmentation (SS) is an important means of perception for self-driving cars and robotics; it classifies each pixel into one of a set of predetermined classes. Deep networks trained with the widely used cross-entropy (CE) loss have achieved significant progress in terms of mean Intersection-over-Union (mIoU). However, the cross-entropy loss cannot take into account the different importance of each class in a self-driving system. For example, pedestrians in the image should be much more important than the surrounding buildings when making driving decisions, so their segmentation results are expected to be as accurate as possible. In this paper, we propose to incorporate importance-aware inter-class correlation into a Wasserstein training framework by configuring its ground distance matrix. The ground distance matrix can be predefined from prior knowledge of a specific task, and previous importance-ignoring methods are special cases. From an optimization perspective, we also extend our ground metric to a linear, convex, or concave increasing function of the predefined ground distance. We evaluate our method on the CamVid and Cityscapes datasets with different backbones (SegNet, ENet, FCN and Deeplab) in a plug-and-play fashion. In our extensive experiments, the Wasserstein loss demonstrates superior segmentation performance on the predefined classes that are critical for safe driving.
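For a single pixel with a one-hot label, the discrete Wasserstein distance has a closed form. All predicted mass must move to the true class, so the cost is Σ_j p_j · f(D[j][gt]). Here D is the ground distance matrix and f is an increasing function (linear, convex, or concave). The minimal sketch below assumes a per-pixel loss. The distance values and the function name are illustrative, not taken from the paper.

```python
# Hypothetical helper; the ground distance values used below are illustrative only.

def wasserstein_one_hot(probs, target, ground_dist, f=lambda d: d):
    """Discrete Wasserstein distance between `probs` and a one-hot target.

    With a one-hot target, every predicted class j sends its mass probs[j]
    to class `target`, so the optimal transport cost is
    sum_j probs[j] * f(ground_dist[j][target]).
    `f` is an increasing function of the ground distance
    (identity = linear, d**2 = convex, d**0.5 = concave).
    """
    return sum(p * f(ground_dist[j][target]) for j, p in enumerate(probs))

# Larger distances make confusions with an important class (index 0 here) cost more.
D = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]
print(wasserstein_one_hot([0.7, 0.2, 0.1], 0, D))                    # approx. 0.6
print(wasserstein_one_hot([0.7, 0.2, 0.1], 0, D, f=lambda d: d**2))  # approx. 1.8
```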

8. BP-MVSNet: Belief-Propagation-Layers for Multi-View-Stereo
Christian Sormann, Patrick Knöbelreiter, Andreas Kuhn, Mattia Rossi, Thomas Pock, Friedrich Fraundorfer
Abstract: In this work, we propose BP-MVSNet, a convolutional neural network (CNN)-based Multi-View-Stereo (MVS) method that uses a differentiable Conditional Random Field (CRF) layer for regularization. To this end, we propose to extend the belief-propagation (BP) layer and add what is necessary to use it successfully in the MVS setting. We show how to calculate a normalization based on the expected 3D error, which we then use to normalize the label jumps in the CRF. This is required to make the BP layer invariant to different scales in the MVS setting. To also enable fractional label jumps, we propose a differentiable interpolation step, which we embed into the computation of the pairwise term. These extensions allow us to integrate the BP layer into a multi-scale MVS network, where we continuously refine a rough initial estimate until we obtain high-quality depth maps. We evaluate the proposed BP-MVSNet in an ablation study and conduct extensive experiments on the DTU, Tanks and Temples, and ETH3D data sets. The experiments show that we significantly outperform the baseline and achieve state-of-the-art results.
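A BP layer runs belief propagation on a CRF with unary costs and a pairwise label-jump penalty. On a 1D chain, min-sum belief propagation is exact and reduces to dynamic programming. The minimal sketch below makes three simplifying assumptions: a chain instead of the image grid, integer labels, and a plain linear jump penalty `lam * |l_i - l_j|`. The paper's normalization by expected 3D error and its fractional label jumps are not modeled. The function name is hypothetical.

```python
# Hypothetical helper: exact min-sum message passing on a chain (Viterbi-style).

def chain_map(unary, lam):
    """Return the minimum-energy labeling of a 1D chain.

    unary[i][l]: cost of assigning label l to node i.
    Pairwise cost between neighbors: lam * |l_i - l_j| (linear label-jump penalty).
    """
    n, L = len(unary), len(unary[0])
    msg = [list(unary[0])]  # msg[i][l]: best cost of nodes 0..i with node i labeled l
    back = []               # back-pointers to the best previous label
    for i in range(1, n):
        cur, arg = [], []
        for l in range(L):
            best = min(range(L), key=lambda k: msg[-1][k] + lam * abs(k - l))
            cur.append(unary[i][l] + msg[-1][best] + lam * abs(best - l))
            arg.append(best)
        msg.append(cur)
        back.append(arg)
    # Backtrack from the cheapest final label.
    labels = [min(range(L), key=lambda l: msg[-1][l])]
    for arg in reversed(back):
        labels.append(arg[labels[-1]])
    return labels[::-1]

print(chain_map([[0, 5], [5, 0], [0, 5]], lam=1))   # [0, 1, 0]
print(chain_map([[0, 5], [5, 0], [0, 5]], lam=10))  # [0, 0, 0]
```

A stronger jump penalty smooths the labeling. This trade-off is why the scale of the label jumps needs normalizing in the MVS setting.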

9. Pathological Visual Question Answering
Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie
Abstract: Is it possible to develop an "AI Pathologist" that can pass the board certification examination of the American Board of Pathology (ABP)? To build such a system, three challenges need to be addressed. First, we need to create a visual question answering (VQA) dataset in which the AI agent is presented with a pathology image together with a question and is asked to give the correct answer. Due to privacy concerns, pathology images are usually not publicly available. Besides, only well-trained pathologists can understand pathology images, but they barely have time to help create datasets for AI research. The second challenge is that, since it is difficult to hire highly experienced pathologists to create pathology visual questions and answers, the resulting pathology VQA dataset may contain errors. Training pathology VQA models on such noisy or even erroneous data will lead to problematic models that cannot generalize well to unseen images. The third challenge is that the medical concepts and knowledge covered in pathology question-answer (QA) pairs are very diverse, while the number of QA pairs available for model training is limited. Learning effective representations of diverse medical concepts from limited data is technically demanding. In this paper, we aim to address these three challenges. To the best of our knowledge, our work is the first to address the pathology VQA problem. To address the lack of a publicly available pathology VQA dataset, we create the PathVQA dataset. To address the second challenge, we propose a learning-by-ignoring approach. To address the third challenge, we propose to use cross-modal self-supervised learning. We perform experiments on our PathVQA dataset, and the results demonstrate the effectiveness of our proposed learning-by-ignoring method and cross-modal self-supervised learning methods.

10. Unsupervised Domain Adaptation without Source Data by Casting a BAIT
Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz
Abstract: Unsupervised domain adaptation (UDA) aims to transfer the knowledge learned from a labeled source domain to an unlabeled target domain. Existing UDA methods require access to data from the source domain during adaptation to the target domain, which may not be feasible in some real-world situations. In this paper, we address Source-free Unsupervised Domain Adaptation (SFUDA), where the model has no access to any source data during the adaptation period. We propose a novel framework named BAIT to tackle SFUDA. Specifically, we first train the model on the source domain. Keeping the source-specific classifier head (referred to as the anchor classifier) fixed, we further introduce a new learnable classifier head (referred to as the bait classifier), which is initialized from the anchor classifier. When adapting the source model to the target domain, the source data are no longer accessible, and the bait classifier aims to push the target features towards the correct side of the decision boundary of the anchor classifier, thus achieving feature alignment. Experimental results show that the proposed BAIT achieves state-of-the-art performance compared with existing standard UDA methods and several SFUDA methods.

11. DLDL: Dynamic Label Dictionary Learning via Hypergraph Regularization
Shuai Shao, Mengke Wang, Rui Xu, Yan-Jiang Wang, Bao-Di Liu
Abstract: For classification tasks, dictionary learning based methods have attracted much attention in recent years. One popular way to use them for classification is to introduce label information to generate a discriminative dictionary that represents samples. However, compared with traditional dictionary learning, this category of methods achieves significant improvements only in supervised learning and has little positive influence on semi-supervised or unsupervised learning. To tackle this issue, we propose a Dynamic Label Dictionary Learning (DLDL) algorithm that generates a soft label matrix for unlabeled data. Specifically, we employ hypergraph manifold regularization to keep the relations among original data, transformed data, and soft labels consistent. We demonstrate the efficiency of the proposed DLDL approach on two remote sensing datasets.
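Hypergraph manifold regularization is commonly built on the normalized hypergraph Laplacian L = I − Dv^(−1/2) H W De^(−1) Hᵀ Dv^(−1/2). It uses the penalty tr(Fᵀ L F), which is small when rows of F (for example, soft labels) vary smoothly across hyperedges. The minimal sketch below assumes this standard construction and assumes every vertex belongs to at least one hyperedge. The paper's exact regularizer may differ, and the function names are hypothetical.

```python
# Hypothetical helpers; standard normalized hypergraph Laplacian, assumed for illustration.

def hypergraph_laplacian(H, w):
    """L = I - Dv^-1/2 H W De^-1 H^T Dv^-1/2.

    H: incidence matrix, H[v][e] = 1 if vertex v belongs to hyperedge e.
    w: list of hyperedge weights. Every vertex must lie in some hyperedge.
    """
    nv, ne = len(H), len(H[0])
    dv = [sum(w[e] * H[v][e] for e in range(ne)) for v in range(nv)]  # vertex degrees
    de = [sum(H[v][e] for v in range(nv)) for e in range(ne)]        # hyperedge degrees
    L = [[0.0] * nv for _ in range(nv)]
    for i in range(nv):
        for j in range(nv):
            theta = sum(w[e] * H[i][e] * H[j][e] / de[e] for e in range(ne))
            L[i][j] = (1.0 if i == j else 0.0) - theta / (dv[i] * dv[j]) ** 0.5
    return L

def manifold_penalty(L, F):
    """tr(F^T L F): small when rows of F vary smoothly over the hypergraph."""
    n, c = len(F), len(F[0])
    return sum(F[i][k] * L[i][j] * F[j][k]
               for i in range(n) for j in range(n) for k in range(c))

L = hypergraph_laplacian([[1], [1], [1]], [1.0])  # one hyperedge over three vertices
print(round(manifold_penalty(L, [[1.0], [1.0], [1.0]]), 12))  # 0.0 for consistent labels
```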

12. SAHDL: Sparse Attention Hypergraph Regularized Dictionary Learning
Shuai Shao, Rui Xu, Yan-Jiang Wang, Weifeng Liu, Bao-Di Liu
Abstract: In recent years, the attention mechanism has contributed significantly to hypergraph-based neural networks. However, these methods update the attention weights as the network propagates. That is to say, this type of attention mechanism is only suitable for deep learning-based methods and is not applicable to traditional machine learning approaches. In this paper, we propose a hypergraph-based sparse attention mechanism to tackle this issue and embed it into dictionary learning. More specifically, we first construct a sparse attention hypergraph, assigning attention weights to samples by employing $\ell_1$-norm sparse regularization to mine the high-order relationships among sample features. Then, we introduce the hypergraph Laplacian operator to preserve the local structure for subspace transformation in dictionary learning. Besides, we incorporate discriminative information into the hypergraph as guidance for aggregating samples. Unlike previous works, our method updates attention weights independently and does not rely on a deep network. We demonstrate the efficacy of our approach on four benchmark datasets.
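One common way to obtain sparse weights with $\ell_1$ regularization is to express a sample as a sparse combination of the other samples. This means solving min_a ½‖y − A a‖² + λ‖a‖₁, for example with ISTA (iterative shrinkage-thresholding). The nonzero coefficients would then serve as attention weights. The minimal sketch below assumes this formulation. It is not confirmed to be the paper's exact objective, and the function names are hypothetical.

```python
# Hypothetical helpers; ISTA for l1-regularized least squares, assumed formulation.

def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink toward zero, exactly zero inside [-t, t]."""
    if abs(x) <= t:
        return 0.0
    return (abs(x) - t) * (1.0 if x > 0 else -1.0)

def sparse_code(A, y, lam, step, iters=200):
    """ISTA for min_a 0.5*||y - A a||^2 + lam*||a||_1.

    A: list of rows (d x n); its columns are the other samples' features.
    step should be at most 1 / (largest eigenvalue of A^T A).
    """
    d, n = len(A), len(A[0])
    a = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][k] * a[k] for k in range(n)) - y[i] for i in range(d)]  # A a - y
        grad = [sum(A[i][k] * r[i] for i in range(d)) for k in range(n)]     # A^T (A a - y)
        a = [soft_threshold(a[k] - step * grad[k], step * lam) for k in range(n)]
    return a

print(sparse_code([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, step=1.0))  # [2.0, 0.0]
```

The weakly related second sample receives exactly zero weight, which is the sparsity the $\ell_1$ term is meant to induce.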

13. RSKDD-Net: Random Sample-based Keypoint Detector and Descriptor
Fan Lu, Guang Chen, Yinlong Liu, Zhongnan Qu, Alois Knoll
Abstract: Keypoint detectors and descriptors are the two main components of point cloud registration. Previous learning-based keypoint detectors rely on saliency estimation for each point or on farthest point sampling (FPS) for candidate point selection, both of which are inefficient and not applicable to large-scale scenes. This paper proposes the Random Sample-based Keypoint Detector and Descriptor Network (RSKDD-Net) for large-scale point cloud registration. The key idea is to use random sampling to efficiently select candidate points and a learning-based method to jointly generate keypoints and descriptors. To tackle the information loss of random sampling, we exploit a novel random dilation cluster strategy to enlarge the receptive field of each sampled point, and an attention mechanism to aggregate the positions and features of neighboring points. Furthermore, we propose a matching loss to train the descriptor in a weakly supervised manner. Extensive experiments on two large-scale outdoor LiDAR datasets show that the proposed RSKDD-Net achieves state-of-the-art performance while running more than 15 times faster than existing methods. Our code is available at this https URL.
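Based only on the abstract, random dilation clustering can be read as follows. Take the k × dilation nearest neighbors of a sampled point, then randomly keep k of them. This enlarges the receptive field without increasing the cluster size. The minimal sketch below encodes that reading, which is an assumption rather than the paper's confirmed algorithm. The function names are hypothetical.

```python
import math
import random

# Hypothetical helpers; our reading of random sampling plus random dilation clustering.

def random_candidates(points, m, rng=random):
    """Random sampling of m candidate keypoints (no iterative distance updates as in FPS)."""
    return rng.sample(points, m)

def random_dilation_cluster(points, center, k, dilation, rng=random):
    """Pick k neighbors at random from the k*dilation nearest points to `center`.

    Sampling sparsely from a larger neighborhood enlarges the receptive field
    while keeping k points per cluster.
    """
    nearest = sorted(points, key=lambda p: math.dist(p, center))[: k * dilation]
    return rng.sample(nearest, k)

rng = random.Random(0)
pts = [(float(i), 0.0, 0.0) for i in range(20)]
centers = random_candidates(pts, 3, rng)
cluster = random_dilation_cluster(pts, centers[0], k=4, dilation=2, rng=rng)
print(len(cluster))  # 4
```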

14. Efficient grouping for keypoint detection
Alexey Sidnev, Ekaterina Krasikova, Maxim Kazakov
Abstract: The success of deep neural networks in the traditional keypoint detection task encourages researchers to solve new problems and collect more complex datasets. The size of the DeepFashion2 dataset poses a new challenge for the keypoint detection task, as it comprises 13 clothing categories that span a wide range of keypoints (294 in total). Directly predicting all keypoints leads to huge memory consumption, slow training, and slow inference. This paper studies the keypoint grouping approach and how it affects the performance of the CenterNet architecture. We propose a simple and efficient automatic grouping technique with a powerful post-processing method and apply it to the DeepFashion2 fashion landmark task and the MS COCO pose estimation task. This reduces memory consumption and processing time by up to 19% and 30%, respectively, during inference, and by 28% and 26%, respectively, during training, without compromising accuracy.

15. Spherical Harmonics for Shape-Constrained 3D Cell Segmentation
Dennis Eschweiler, Malte Rethwisch, Simon Koppers, Johannes Stegmaier
Abstract: Recent microscopy imaging techniques allow precise analysis of cell morphology in 3D image data. To process the vast amount of image data generated by current digitized imaging techniques, automated approaches are needed more than ever. Segmentation approaches used for morphological analyses, however, are often prone to produce unnaturally shaped predictions, which in turn could lead to inaccurate experimental outcomes. To minimize further manual interaction, shape priors help to constrain the predictions to the set of natural variations. In this paper, we show how spherical harmonics can be used as an alternative way to inherently constrain the predictions of neural networks for the segmentation of cells in 3D microscopy image data. The benefits and limitations of the spherical harmonic representation are analyzed, and final results are compared to other state-of-the-art approaches on two different data sets.
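A star-convex cell can be described by a radius function r(θ, φ) = Σ c_lm Y_lm(θ, φ) over real spherical harmonics. If a network regresses the coefficients c_lm instead of a voxel mask, every prediction is a smooth closed surface by construction. The minimal sketch below assumes this star-convex radius parameterization and truncates at degree 1 for brevity. The function names are hypothetical.

```python
import math

# Hypothetical helpers; degree-1 real spherical harmonics for a star-convex radius function.

def real_sh_degree1(theta, phi):
    """Real spherical harmonics up to degree 1: [Y_00, Y_1-1, Y_10, Y_11]."""
    c0 = 0.5 / math.sqrt(math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [c0,
            c1 * math.sin(theta) * math.sin(phi),
            c1 * math.cos(theta),
            c1 * math.sin(theta) * math.cos(phi)]

def surface_point(coeffs, theta, phi):
    """Surface point at direction (theta, phi) with radius sum_i coeffs[i] * Y_i."""
    r = sum(c * y for c, y in zip(coeffs, real_sh_degree1(theta, phi)))
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

# Only Y_00 active with coefficient 2*sqrt(pi): a unit sphere.
unit_sphere = [2.0 * math.sqrt(math.pi), 0.0, 0.0, 0.0]
print(surface_point(unit_sphere, 0.0, 0.0))  # approx. (0.0, 0.0, 1.0)
```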

16. Fusion of Dual Spatial Information for Hyperspectral Image Classification
Puhong Duan, Pedram Ghamisi, Xudong Kang, Behnood Rasti, Shutao Li, Richard Gloaguen
Abstract: The inclusion of spatial information into spectral classifiers for fine-resolution hyperspectral imagery has led to significant improvements in classification performance. Spectral-spatial hyperspectral image classification nevertheless remains challenging because of high intraclass spectral variability and low interclass spectral variability. This has made the extraction of spatial information a highly active research area. In this work, a novel hyperspectral image classification framework using the fusion of dual spatial information is proposed, in which the dual spatial information is built by exploiting both pre-processing feature extraction and post-processing spatial optimization. In the feature extraction stage, an adaptive texture smoothing method is proposed to construct the structural profile (SP), which makes it possible to precisely extract discriminative features from hyperspectral images. This is the first time the SP extraction method has been used in the remote sensing community. Then, the extracted SP is fed into a spectral classifier. In the spatial optimization stage, a pixel-level classifier is used to obtain class probabilities, followed by an extended random walker-based spatial optimization technique. Finally, a decision fusion rule is used to fuse the class probabilities obtained in the two stages. Experiments performed on three data sets from different scenes illustrate that the proposed method can outperform other state-of-the-art classification techniques. In addition, the proposed feature extraction method, i.e., SP, can effectively improve the discrimination between different land covers.
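The abstract does not state which decision fusion rule combines the two stages' class probabilities. Purely as an illustration, the sketch below assumes a per-pixel weighted average with a hypothetical weight `mu`, followed by an argmax. The function name is also hypothetical.

```python
# Hypothetical helper; weighted-average fusion is an assumption, not the paper's rule.

def fuse_decisions(p_feature, p_spatial, mu=0.5):
    """Fuse two per-pixel class-probability vectors and return (label, fused)."""
    fused = [mu * a + (1 - mu) * b for a, b in zip(p_feature, p_spatial)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused

print(fuse_decisions([0.6, 0.4], [0.2, 0.8]))  # label 1, fused approx. [0.4, 0.6]
```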

17. Learning Implicit Functions for Topology-Varying Dense 3D Shape Correspondence
Feng Liu, Xiaoming Liu
Abstract: The goal of this paper is to learn dense 3D shape correspondence for topology-varying objects in an unsupervised manner. Conventional implicit functions estimate the occupancy of a 3D point given a shape latent code. Instead, our novel implicit function produces a part embedding vector for each 3D point, which is assumed to be similar to that of its densely corresponding point in another 3D shape of the same object category. Furthermore, we implement dense correspondence through an inverse function that maps from the part embedding to a corresponding 3D point. Both functions are learned jointly, together with the encoder that generates the shape latent code, using several effective loss functions that realize our assumption. During inference, if a user selects an arbitrary point on the source shape, our algorithm can automatically generate a confidence score indicating whether there is a correspondence on the target shape, as well as the corresponding semantic point if there is one. Such a mechanism inherently benefits man-made objects with different part constitutions. The effectiveness of our approach is demonstrated through unsupervised 3D semantic correspondence and shape segmentation.

18. Error Bounds of Projection Models in Weakly Supervised 3D Human Pose Estimation
Nikolas Klug, Moritz Einfalt, Stephan Brehm, Rainer Lienhart
Abstract: The current state of the art in monocular 3D human pose estimation is heavily influenced by weakly supervised methods. These allow 2D labels to be used to learn effective 3D human pose recovery, either directly from images or via 2D-to-3D pose uplifting. In this paper, we present a detailed analysis of the most commonly used simplified projection models that relate the estimated 3D pose representation to 2D labels: normalized perspective and weak perspective projections. Specifically, we derive theoretical lower bounds on the error of those projection models under the commonly used mean per-joint position error (MPJPE). Additionally, we show how the normalized perspective projection can be replaced to avoid this guaranteed minimal error. We evaluate the derived lower bounds on the most commonly used 3D human pose estimation benchmark datasets. Our results show that both projection models lead to an inherent minimal error between 19.3mm and 54.7mm, even after alignment in position and scale. This is a considerable share compared with recent state-of-the-art results. Our paper thus establishes a theoretical baseline that shows the importance of suitable projection models in weakly supervised 3D human pose estimation.

19. Matching the Clinical Reality: Accurate OCT-Based Diagnosis From Few Labels
Valentyn Melnychuk, Evgeniy Faerman, Ilja Manakov, Thomas Seidl
Abstract: Unlabeled data is often abundant in the clinic, making machine learning methods based on semi-supervised learning a good match for this setting. Despite this, such methods currently receive relatively little attention in the medical image analysis literature. Instead, most practitioners and researchers focus on supervised or transfer learning approaches. The recently proposed MixMatch and FixMatch algorithms have demonstrated promising results in extracting useful representations while requiring very few labels. Motivated by these recent successes, we apply MixMatch and FixMatch in an ophthalmological diagnostic setting and investigate how they fare against standard transfer learning. We find that both algorithms outperform the transfer learning baseline on all fractions of labelled data. Furthermore, our experiments show that the exponential moving average (EMA) of model parameters, a component of both algorithms, is not needed for our classification problem, as disabling it leaves the outcome unchanged. Our code is available online: this https URL
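The EMA component mentioned in the abstract keeps a slowly moving copy of the weights for evaluation. A minimal sketch follows. Scalar parameters stand in for tensors. The decay of 0.999 is assumed as a typical default and is not taken from the paper.

```python
from typing import Dict


def ema_update(ema: Dict[str, float], params: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """One EMA step per parameter: ema <- decay * ema + (1 - decay) * params."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if ema.keys() != params.keys():
        raise ValueError("ema and params must track the same parameter names")
    return {name: decay * ema[name] + (1.0 - decay) * value
            for name, value in params.items()}


# decay=0.0 makes the EMA copy equal to the live parameters (EMA "disabled").
print(ema_update({"w": 5.0}, {"w": 1.0}, decay=0.0))
```

Setting `decay=0.0` makes the evaluated weights identical to the trained ones. This corresponds to the configuration the authors report as leaving results unchanged.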

20. Self-Learning Transformations for Improving Gaze and Head Redirection
Yufeng Zheng, Seonwook Park, Xucong Zhang, Shalini De Mello, Otmar Hilliges
Abstract: Many computer vision tasks rely on labeled data. Rapid progress in generative modeling has made it possible to synthesize photorealistic images. However, controlling specific aspects of the generation process such that the data can be used to supervise downstream tasks remains challenging. In this paper, we propose a novel generative model for face images that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles. This requires disentangling many appearance-related factors, including not only gaze and head orientation but also lighting, hue, etc. We propose a novel architecture that learns to discover, disentangle, and encode these extraneous variations in a self-learned manner. We further show that explicitly disentangling task-irrelevant factors results in more accurate modelling of gaze and head orientation. A novel evaluation scheme shows that our method improves upon the state of the art in redirection accuracy and in disentanglement between gaze direction and head orientation changes. Furthermore, we show that when only limited amounts of real-world training data are available, our method allows for improvements in the downstream task of semi-supervised cross-dataset gaze estimation. Please check our project page at: this https URL

21. Show and Speak: Directly Synthesize Spoken Description of Images
Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg
Abstract: This paper proposes a new model, referred to as the show and speak (SAS) model, that for the first time is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes. The basic structure of SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech describing this image. The final speech audio is obtained from the predicted spectrogram via WaveNet. Extensive experiments on the public benchmark database Flickr8k demonstrate that the proposed SAS is able to synthesize natural spoken descriptions of images, indicating that synthesizing spoken descriptions of images while bypassing text and phonemes is feasible.

22. IPU-Net: Multi Scale Identity-Preserved U-Net for Low Resolution Face Recognition
Vahid Reza Khazaie, Nicky Bayat, Yalda Mohsenzadeh
Abstract: State-of-the-art deep neural network models have reached near-perfect face recognition accuracy on controlled high-resolution face images. However, their performance degrades drastically when they are tested on very low-resolution face images. This is particularly critical in surveillance systems, where a low-resolution probe image must be matched against high-resolution gallery images. Super-resolution techniques aim to produce high-resolution face images from their low-resolution counterparts. While they are capable of reconstructing visually appealing images, they do not preserve identity-related information. Here, we propose an identity-preserved U-Net that is capable of super-resolving very low-resolution faces to their high-resolution counterparts while preserving identity-related information. We achieve this by training a U-Net with a combination of a reconstruction loss and an identity-preserving loss under multi-scale low-resolution conditions. Extensive quantitative evaluations demonstrate that our proposed model outperforms competing super-resolution and low-resolution face recognition methods on natural and artificial low-resolution face datasets, and even on unseen identities.
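The abstract names a reconstruction loss and an identity-preserving loss but does not give their exact forms. This sketch assumes a common pairing: pixel-wise MSE plus a cosine distance between embeddings from a fixed face recognition network. Images and embeddings are passed in as plain vectors. `id_weight` is a hypothetical balancing weight.

```python
import math
from typing import Sequence


def identity_preserving_loss(sr: Sequence[float], hr: Sequence[float],
                             emb_sr: Sequence[float], emb_hr: Sequence[float],
                             id_weight: float = 0.1) -> float:
    """Pixel MSE plus id_weight * (1 - cosine similarity) of face embeddings."""
    if not sr or len(sr) != len(hr):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(sr, hr)) / len(sr)
    dot = sum(a * b for a, b in zip(emb_sr, emb_hr))
    norm = math.sqrt(sum(a * a for a in emb_sr)) * math.sqrt(sum(b * b for b in emb_hr))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return mse + id_weight * (1.0 - dot / norm)


# Perfect pixels but orthogonal identity embeddings: only the identity term is non-zero.
print(identity_preserving_loss([0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]))
```

The identity term penalizes a super-resolved face whose embedding drifts from the ground truth's, even when the pixels look close. For the multi-scale setting in the abstract, one would presumably sum this loss over inputs downsampled to several resolutions.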

23. Domain Adaptation in LiDAR Semantic Segmentation
Inigo Alonso, Luis Riazuelo, Luis Montesano, Ana C. Murillo
Abstract: LiDAR semantic segmentation provides 3D semantic information about the environment, an essential cue for intelligent systems during their decision-making processes. Deep neural networks achieve state-of-the-art results on large public benchmarks for this task. Unfortunately, finding models that generalize well or adapt to additional domains, where the data distribution is different, remains a major challenge. This work addresses the problem of unsupervised domain adaptation for LiDAR semantic segmentation models. Our approach builds novel ideas on top of current state-of-the-art approaches and yields new state-of-the-art results. We propose simple but effective strategies to reduce the domain shift by aligning the data distribution in the input space. In addition, we propose a learning-based approach that aligns the distribution of the semantic classes of the target domain with that of the source domain. The presented ablation study shows how each part contributes to the final performance. Comparisons on three different domains show that our strategy outperforms previous domain adaptation approaches.

24. Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning
Huan Fu, Shunming Li, Rongfei Jia, Mingming Gong, Binqiang Zhao, Dacheng Tao
Abstract: Image-based 3D shape retrieval (IBSR) aims to find the 3D shape corresponding to a given 2D image in a large 3D shape database. The common approach is to map 2D images and 3D shapes into an embedding space and define (or learn) a shape similarity measure. While metric learning combined with some adaptation techniques seems to be a natural solution for shape similarity learning, its performance is often unsatisfactory for fine-grained shape retrieval. In this paper, we identify the source of this poor performance and propose a practical solution. We find that the shape difference within a negative pair is entangled with the texture gap, which makes metric learning ineffective at pushing negative pairs apart. To tackle this issue, we develop a geometry-focused multi-view metric learning framework empowered by texture synthesis. Synthesizing textures for 3D shape models creates hard triplets, which suppress the adverse effects of rich texture in 2D images and thereby push the network to focus more on discovering geometric characteristics. Our approach shows state-of-the-art performance on the recently released large-scale 3D-FUTURE [1] repository, as well as on three widely studied benchmarks: Pix3D [2], Stanford Cars [3], and Comp Cars [4]. Code will be made publicly available at: this https URL
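The abstract's argument rests on "hard" triplets. A standard margin-based triplet loss shows why they matter. An easy triplet, whose negative is already far from the anchor, contributes zero loss and no gradient. A hard triplet, whose negative is nearly as close as the positive, still produces a training signal. This is a generic sketch with 2D vectors standing in for image and shape embeddings and a hypothetical margin. It is not the paper's geometry-focused multi-view loss.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Hinge on d(a, p) - d(a, n) + margin, using Euclidean distances."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor, positive = (0.0, 0.0), (1.0, 0.0)
easy_negative, hard_negative = (3.0, 0.0), (1.1, 0.0)
print(triplet_loss(anchor, positive, easy_negative))  # easy triplet: zero loss
print(triplet_loss(anchor, positive, hard_negative))  # hard triplet: positive loss
```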

25. Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition
Negar Heidari, Alexandros Iosifidis
Abstract: Graph convolutional networks (GCNs) have been very successful in modeling non-Euclidean data structures, such as sequences of body skeletons forming actions, modeled as spatio-temporal graphs. Most GCN-based action recognition methods use deep feed-forward networks with high computational complexity to process all skeletons in an action. This leads to a high number of floating point operations (ranging from 16G to 100G FLOPs) to process a single sample, making their adoption infeasible in computationally restricted application scenarios. In this paper, we propose a temporal attention module (TAM) that increases the efficiency of skeleton-based action recognition by selecting the most informative skeletons of an action at the early layers of the network. We incorporate the TAM in a lightweight GCN topology to further reduce the overall number of computations. Experimental results on two benchmark datasets show that the proposed method outperforms the baseline GCN-based method by a large margin while requiring 2.9 times fewer computations. Moreover, it performs on par with the state of the art while requiring up to 9.6 times fewer computations.
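The efficiency gain described above comes from keeping only the most informative skeletons early in the network. A minimal sketch of hard top-k frame selection follows, assuming per-frame scores are already produced by some attention layer. The abstract does not say how the TAM computes them, so the scores below are hypothetical.

```python
from typing import List, Sequence


def select_informative_frames(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring frames, returned in temporal order."""
    if not 0 < k <= len(scores):
        raise ValueError("k must satisfy 0 < k <= number of frames")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


scores = [0.1, 2.0, -1.0, 1.5, 0.3]  # hypothetical per-frame attention scores
kept = select_informative_frames(scores, k=2)
# Layers after the selection see len(kept) of len(scores) frames, so their cost
# shrinks by roughly that ratio when it is linear in the number of frames.
print(kept, len(kept) / len(scores))
```

Hard top-k selection is not differentiable with respect to the scores, and the abstract does not describe how the TAM is trained. The sketch only illustrates the inference-time effect on the amount of computation.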

26. A Teacher-Student Framework for Semi-supervised Medical Image Segmentation From Mixed Supervision
Liyan Sun, Jianxiong Wu, Xinghao Ding, Yue Huang, Guisheng Wang, Yizhou Yu
Abstract: Standard segmentation of medical images based on fully supervised convolutional networks demands accurate dense annotations. Such learning frameworks are built on laborious manual annotation with strict demands for expertise, which leads to a shortage of high-quality labels. To overcome this limitation and exploit massive weakly labeled data, we relax the rigid labeling requirement and develop a semi-supervised learning framework in a teacher-student fashion for organ and lesion segmentation. It uses partial dense-label supervision together with supplementary loose bounding-box supervision, which is easier to acquire. Exploiting the geometric relation between an organ and its inner lesions, which holds in most cases, we propose a hierarchical organ-to-lesion (O2L) attention module in a teacher segmentor to produce pseudo-labels. A student segmentor is then trained with a combination of manually labeled and pseudo-labeled annotations. We further propose a localization branch, realized by aggregating high-level features in a deep decoder, that predicts the locations of the organ and lesions and thereby enriches the student segmentor with precise localization information. We validate each design in our model through an ablation study on the LiTS challenge datasets and show state-of-the-art performance compared with recent methods. We also show that our model is robust to bounding-box quality and achieves performance comparable to fully supervised learning methods.

27. Feature matching in Ultrasound images
Hang Zhu, Zihao Wang
Abstract: Feature matching is an important technique for identifying a single object in different images. It helps machines recognize a specific object from multiple perspectives. For years, feature matching has been commonly used in various computer vision applications, such as traffic surveillance, self-driving, and other systems. With the rise of computer-aided diagnosis (CAD), the need for feature matching techniques has also emerged in the medical imaging field. In this paper, we present a deep learning-based method specifically for ultrasound images. It will be examined against existing methods that have achieved outstanding results on regular images. Because ultrasound images differ from regular images in many respects, such as texture, noise type, and dimension, traditional methods will be evaluated and optimized for application to ultrasound images.

28. The Analysis of Facial Feature Deformation using Optical Flow Algorithm
Dayang Nur Zulhijah Awang Jesemi, Hamimah Ujir, Irwandi Hipiny, Sarah Flora Samson Juan
Abstract: Facial features deform according to the intended facial expression. Specific facial features are associated with specific facial expressions; for example, a happy expression involves deformation of the mouth. This paper presents a study of facial feature deformation for each facial expression, using an optical flow algorithm with the face segmented into three different regions of interest. The deformation of facial features shows the relation between facial features and facial expression. Based on the experiments, the deformations of the eyes and mouth are significant in all expressions except happy. For the happy expression, the cheeks and mouth are the significant regions. This work also suggests that the intensity of different facial features varies in how they contribute to recognizing different facial expression intensities. The maximum magnitude across all expressions, $9 \times 10^{-4}$, is shown by the mouth for the surprise expression, while the minimum magnitude, $0.4 \times 10^{-4}$, is shown by the mouth for the angry expression.
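The magnitudes reported above are optical-flow magnitudes. The sketch below assumes a dense flow field has already been computed and split into regions, and averages $\sqrt{u^2 + v^2}$ within each region. The abstract does not give the paper's region definitions. The region names follow the abstract, but the flow vectors are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

Flow = Tuple[float, float]  # (u, v) displacement of one pixel between frames


def mean_flow_magnitude(regions: Dict[str, List[Flow]]) -> Dict[str, float]:
    """Average optical-flow magnitude sqrt(u^2 + v^2) inside each region."""
    result: Dict[str, float] = {}
    for name, vectors in regions.items():
        if not vectors:
            raise ValueError(f"region {name!r} has no flow vectors")
        result[name] = sum(math.hypot(u, v) for u, v in vectors) / len(vectors)
    return result


# Hypothetical flow vectors; real ones would come from an optical flow algorithm.
regions = {"eyes": [(1.0, 0.0)], "cheeks": [(0.0, 0.5)], "mouth": [(3.0, 4.0), (0.0, 0.0)]}
magnitudes = mean_flow_magnitude(regions)
print(max(magnitudes, key=magnitudes.get))  # region with the largest deformation
```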

29. Learn Robust Features via Orthogonal Multi-Path
Kun Fang, Yingwen Wu, Tao Li, Xiaolin Huang, Jie Yang
Abstract: It is now widely known that adversarial attacks, which add invisible perturbations to clean images, can fool deep neural networks. To defend against adversarial attacks, we design a block containing multiple paths to learn robust features, where the parameters of these paths are required to be orthogonal to each other. The so-called Orthogonal Multi-Path (OMP) block can be placed in any layer of a neural network. Via forward learning and backward correction, a single OMP block makes the neural network learn features that are appropriate for all the paths and are hence expected to be robust. With careful design and thorough experiments on, e.g., the positions at which the orthogonality constraint is imposed and the trade-off between variety and accuracy, the robustness of the neural networks is significantly improved. For example, under a white-box PGD attack with $\ell_\infty$ bound $8/255$ (a fierce attack that can make the accuracy of many vanilla neural networks drop to nearly $10\%$ on CIFAR10), VGG16 with the proposed OMP block keeps over $50\%$ accuracy. Under black-box attacks, neural networks equipped with an OMP block have accuracy over $80\%$. The performance under both white-box and black-box attacks is much better than that of existing state-of-the-art adversarial defenses.
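The abstract requires the parameters of the paths to be orthogonal but does not say how this is enforced. One common soft formulation, assumed here for illustration, penalizes the squared pairwise inner products of the flattened path weights. The penalty would be added to the training loss, scaled by a hypothetical weight.

```python
from itertools import combinations
from typing import Sequence


def orthogonality_penalty(paths: Sequence[Sequence[float]]) -> float:
    """Sum of squared inner products over every pair of flattened path weights."""
    penalty = 0.0
    for u, v in combinations(paths, 2):
        if len(u) != len(v):
            raise ValueError("all paths must have the same number of weights")
        dot = sum(a * b for a, b in zip(u, v))
        penalty += dot * dot
    return penalty


paths = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical flattened weights
print(orthogonality_penalty(paths))  # only the third path overlaps the other two
```

The penalty is zero exactly when every pair of paths is orthogonal, so minimizing it alongside the task loss pushes the paths toward the constraint without hard-projecting the weights.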

30. Towards Fair Knowledge Transfer for Imbalanced Domain Adaptation
Taotao Jing, Bingrong Xu, Jingjing Li, Zhengming Ding
Abstract: Domain adaptation (DA) has become a promising technique for addressing insufficient or missing annotations by exploiting knowledge from an external source. Existing DA algorithms mainly focus on practical knowledge transfer through domain alignment. Unfortunately, they ignore the fairness issue that arises when the auxiliary source is extremely imbalanced across categories, which results in severely under-represented knowledge adaptation for the minority source set. To this end, we propose a Towards Fair Knowledge Transfer (TFKT) framework to handle the fairness challenge in imbalanced cross-domain learning. Specifically, a novel cross-domain mixup generation is exploited to augment the minority source set with target information to enhance fairness. Moreover, dual distinct classifiers and cross-domain prototype alignment are developed to seek a more robust classifier boundary and mitigate the domain shift. These three strategies are formulated into a unified framework that addresses both the fairness issue and the domain shift challenge. Extensive experiments on two popular benchmarks verify the effectiveness of our proposed model in comparison with existing state-of-the-art DA models; in particular, our model improves overall accuracy by over 20% on both benchmarks.
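The abstract does not specify how the cross-domain mixup chooses its mixing coefficient or target labels. The sketch below assumes standard mixup: a Beta(alpha, alpha) coefficient applied to both inputs and labels. The target side uses a soft pseudo-label, since target annotations are unavailable. `alpha` and the pseudo-label input are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def cross_domain_mixup(x_src: Sequence[float], y_src: Sequence[float],
                       x_tgt: Sequence[float], y_tgt_pseudo: Sequence[float],
                       alpha: float = 0.2,
                       rng: Optional[random.Random] = None
                       ) -> Tuple[List[float], List[float], float]:
    """Convex combination of a minority-class source sample and a target sample.

    Labels are mixed with the same coefficient; the target label is an assumed
    soft pseudo-label because target annotations are unavailable.
    """
    if len(x_src) != len(x_tgt) or len(y_src) != len(y_tgt_pseudo):
        raise ValueError("source and target inputs/labels must have matching sizes")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_src, x_tgt)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_src, y_tgt_pseudo)]
    return x, y, lam


x, y, lam = cross_domain_mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.2, 0.8],
                               rng=random.Random(0))
print(lam, x, y)
```

Drawing the source sample from a minority class is what would make this augmentation target the imbalance. That sampling step is left to the caller here.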

31. Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation
Yuxi Li, Ning Xu, Jinlong Peng, John See, Weiyao Lin
Abstract: In this paper, we address several inadequacies of current video object segmentation pipelines. First, a cyclic mechanism is incorporated into the standard semi-supervised process to produce more robust representations. By relying on the accurate reference mask in the starting frame, we show that the error propagation problem can be mitigated. Next, we introduce a simple gradient correction module, which extends the offline pipeline to an online method while maintaining the efficiency of the former. Finally, we develop the cycle effective receptive field (cycle-ERF) based on gradient correction, which provides a new perspective for analyzing object-specific regions of interest. We conduct comprehensive experiments on the challenging DAVIS17 and Youtube-VOS benchmarks, demonstrating that the cyclic mechanism is beneficial to segmentation quality.

32. AdaCrowd: Unlabeled Scene Adaptation for Crowd Counting
Mahesh Kumar Krishna Reddy, Mrigank Rochan, Yiwei Lu, Yang Wang
Abstract: We address the problem of image-based crowd counting. In particular, we propose a new problem called unlabeled scene-adaptive crowd counting. Given a new target scene, we would like to have a crowd counting model specifically adapted to this particular scene, based on target data that capture some information about the new scene. In this paper, we propose to use one or more unlabeled images from the target scene to perform the adaptation. Compared with existing problem setups (e.g., fully supervised), our proposed setup is closer to real-world applications of crowd counting systems. We introduce a novel AdaCrowd framework to solve this problem. Our framework consists of a crowd counting network and a guiding network. The guiding network predicts some parameters of the crowd counting network based on the unlabeled images from a particular scene. This allows our model to adapt to different target scenes. Experimental results on several challenging benchmark datasets demonstrate the effectiveness of our proposed approach compared with alternative methods.

33. Rethinking the competition between detection and ReID in Multi-Object Tracking
Chao Liang, Zhipeng Zhang, Yi Lu, Xue Zhou, Bing Li, Xiyong Ye, Jianxiao Zou
Abstract: Owing to their balance of accuracy and speed, one-shot models that jointly learn detection and ReID have drawn great attention in multi-object tracking (MOT). However, the differences between these two tasks in the one-shot tracking paradigm have been overlooked, leading to inferior performance compared with two-stage methods. In this paper, we dissect the reasoning process of these two tasks. Our analysis reveals that the competition between them inevitably hurts the learning of task-dependent representations, which further impedes tracking performance. To remedy this issue, we propose a novel cross-correlation network that can effectively impel the separate branches to learn task-dependent representations. Furthermore, we introduce a scale-aware attention network that learns discriminative embeddings to improve the ReID capability. We integrate the delicately designed networks into a one-shot online MOT system, dubbed CSTrack. Without bells and whistles, our model achieves new state-of-the-art performance on MOT16 and MOT17. We will release our code to facilitate further work.

34. Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz
Abstract: We propose a novel lightweight generative adversarial network for efficient image manipulation using natural language descriptions. To achieve this, we propose a new word-level discriminator that provides the generator with fine-grained, word-level training feedback. This facilitates training a lightweight generator that has a small number of parameters but can still correctly focus on specific visual attributes of an image and edit them without affecting other contents that are not described in the text. Furthermore, thanks to the explicit training signal related to each word, the discriminator can also be simplified to a lightweight structure. Compared with the state of the art, our method has a much smaller number of parameters but still achieves competitive manipulation performance. Extensive experimental results demonstrate that our method can better disentangle different visual attributes, correctly map them to the corresponding semantic words, and thus achieve more accurate image modification using natural language descriptions.

35. Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization
Li Ren, Kai Li, LiQiang Wang, Kien Hua
Abstract: Matching information across image and text modalities is a fundamental challenge for many applications that involve both vision and natural language processing. The objective is to find efficient similarity metrics for comparing visual and textual information. Existing approaches mainly match local visual objects and sentence words in a shared space with attention mechanisms. Their matching performance is still limited because the similarity computation is based on simple comparisons of the matching features, ignoring the characteristics of their distribution in the data. In this paper, we address this limitation with an efficient learning objective that considers the discriminative feature distributions between the visual objects and sentence words. Specifically, we propose a novel Adversarial Discriminative Domain Regularization (ADDR) learning framework, going beyond the standard metric learning objective, to construct a set of discriminative data domains within each image-text pair. Our approach can generally improve the learning efficiency and performance of existing metric learning frameworks by regulating the distribution of the hidden space between matching pairs. Experimental results show that this new approach significantly improves the overall performance of several popular cross-modal matching techniques (SCAN, VSRN, BFAN) on the MS-COCO and Flickr30K benchmarks.

36. GPS-Denied Navigation Using SAR Images and Neural Networks
Teresa White, Jesse Wheeler, Colton Lindstrom, Randall Christensen, Kevin R. Moon
Abstract: Unmanned aerial vehicles (UAVs) often rely on GPS for navigation. GPS signals, however, are very low in power and are easily jammed or otherwise disrupted. This paper presents a method for determining the navigation errors present at the beginning of a GPS-denied period using data from a synthetic aperture radar (SAR) system. This is accomplished by comparing an online-generated SAR image with a reference image obtained a priori. The distortions relative to the reference image are learned and exploited by a convolutional neural network to recover the initial navigation errors, which can then be used to recover the true flight trajectory throughout the synthetic aperture. The proposed neural network approach is able to learn to predict the initial errors on both simulated and real SAR image data.

37. Few-shot Image Recognition with Manifolds
Debasmit Das, J.H. Moon, C. S. George Lee
Abstract: In this paper, we extend the traditional few-shot learning (FSL) problem to the situation where the source-domain data is not accessible and only high-level information in the form of class prototypes is available. This limited-information setup for the FSL problem deserves attention because it implies privacy-preserving inaccessibility of the source-domain data, yet it has rarely been addressed before. Because of the limited training data, we propose a non-parametric approach to this FSL problem by assuming that all the class prototypes are structurally arranged on a manifold. Accordingly, we estimate the novel-class prototype locations by projecting the few-shot samples onto the average of the subspaces on which the surrounding classes lie. During classification, we again exploit the structural arrangement of the categories by inducing a Markov chain on the graph constructed with the class prototypes. This manifold distance obtained using the Markov chain is expected to produce better results than a traditional nearest-neighbor-based Euclidean distance. To evaluate our proposed framework, we have tested it on two image datasets: the large-scale ImageNet and the small-scale but fine-grained CUB-200. We have also studied parameter sensitivity to better understand our framework.

38. Contrastive Learning with Adversarial Examples
Chih-Hui Ho, Nuno Vasconcelos
Abstract: Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations. It uses pairs of augmentations of unlabeled training examples to define a classification task for pretext learning of a deep embedding. Despite extensive work on augmentation procedures, prior works do not address the selection of challenging negative pairs, as images within a sampled batch are treated independently. This paper addresses the problem by introducing a new family of adversarial examples for contrastive learning and using these examples to define a new adversarial training algorithm for SSL, denoted as CLAE. When compared to standard CL, the use of adversarial examples creates more challenging positive pairs, and adversarial training produces harder negative pairs by accounting for all images in a batch during the optimization. CLAE is compatible with many CL methods in the literature. Experiments show that it improves the performance of several existing CL baselines on multiple datasets.
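The abstract claims that adversarial examples yield harder positive and negative pairs. The standard InfoNCE loss for a single anchor, as used in SimCLR-style methods, shows why hardness matters. A negative that lies closer to the anchor receives a larger logit and therefore raises the loss. The temperature value is an assumption, and the sketch does not implement CLAE's adversarial perturbation step.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], temperature: float = 0.5) -> float:
    """Negative log-probability of the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]


anchor, positive = (1.0, 0.0), (1.0, 0.1)
easy = info_nce(anchor, positive, [(-1.0, 0.0)])  # negative points away from anchor
hard = info_nce(anchor, positive, [(1.0, 0.2)])   # negative almost aligned with anchor
print(easy < hard)  # a harder negative yields a larger loss
```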
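CLAE plugs into standard contrastive learning, whose usual objective is an InfoNCE (NT-Xent style) loss over a batch. The following is a minimal sketch of that generic loss, not of CLAE itself: the function names, the use of cosine similarity, and the temperature of 0.5 are assumptions chosen for illustration. It shows why negatives matter: the more similar a negative is to the anchor, the larger the loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchors, positives, temperature=0.5):
    """Average InfoNCE loss over a batch (illustrative sketch).

    anchors[i] and positives[i] embed two augmentations of the same image
    (a positive pair); every positives[j] with j != i acts as a negative
    for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / n
```

Because every item in the batch appears in each denominator, perturbing inputs to increase a loss of this kind couples all images in the batch, which matches, at a high level, the abstract's description of adversarial training that accounts for the whole batch.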

39. Keep your Eyes on the Lane: Attention-guided Lane Detection
Lucas Tabelini, Rodrigo Berriel, Thiago M. Paixão, Claudine Badue, Alberto F. De Souza, Thiago Olivera-Santos
Abstract: Modern lane detection methods have achieved remarkable performance in complex real-world scenarios, but many struggle to maintain real-time efficiency, which is important for autonomous vehicles. In this work, we propose LaneATT: an anchor-based deep lane detection model, which, akin to other generic deep object detectors, uses the anchors for the feature pooling step. Since lanes follow a regular pattern and are highly correlated, we hypothesize that in some cases global information may be crucial to infer their positions, especially in conditions such as occlusion and missing lane markers, among others. Thus, we propose a novel anchor-based attention mechanism that aggregates global information. The model was evaluated extensively on two of the most widely used datasets in the literature. The results show that our method outperforms the current state-of-the-art methods, showing both higher efficacy and higher efficiency. Moreover, we perform an ablation study and discuss efficiency trade-off options that are useful in practice. To reproduce our findings, source code and pretrained models are available at this https URL.

40. Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection
Zeyi Huang, Yang Zou, Vijayakumar Bhagavatula, Dong Huang
Abstract: Weakly Supervised Object Detection (WSOD) has emerged as an effective tool to train object detectors using only the image-level category labels. However, without object-level labels, WSOD detectors are prone to detect bounding boxes on salient objects, clustered objects and discriminative object parts. Moreover, the image-level category labels do not enforce consistent object detection across different transformations of the same images. To address the above issues, we propose a Comprehensive Attention Self-Distillation (CASD) training approach for WSOD. To balance feature learning among all object instances, CASD computes the comprehensive attention aggregated from multiple transformations and feature layers of the same images. To enforce consistent spatial supervision on objects, CASD conducts self-distillation on the WSOD networks, such that the comprehensive attention is approximated simultaneously by multiple transformations and feature layers of the same images. CASD produces new state-of-the-art WSOD results on standard benchmarks such as PASCAL VOC 2007/2012 and MS-COCO.

41. AutoPruning for Deep Neural Network with Dynamic Channel Masking
Baopu Li, Yanwen Fan, Zhihong Pan, Teng Xi, Gang Zhang
Abstract: Modern deep neural network models are large and computationally intensive. One typical solution to this issue is model pruning. However, most current pruning algorithms depend on hand-crafted rules or domain expertise. To overcome this problem, we propose a learning-based auto-pruning algorithm for deep neural networks, inspired by recent automatic machine learning (AutoML). A two-objective problem that aims to find the weights and the best channels for each layer is first formulated. An alternative optimization approach is then proposed to derive the optimal channel numbers and weights simultaneously. In the process of pruning, we utilize a searchable hyperparameter, the remaining ratio, to denote the number of channels in each convolution layer, and then a dynamic masking process is proposed to describe the corresponding channel evolution. To control the trade-off between the accuracy of a model and the pruning ratio of floating point operations, a novel loss function is further introduced. Preliminary experimental results on benchmark datasets demonstrate that our scheme achieves competitive results for neural network pruning.
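The remaining-ratio idea can be illustrated with a small sketch: given a ratio for one layer, keep that fraction of output channels and zero the rest with a binary mask. Ranking channels by the L1 norm of their weights is our assumption, not necessarily the paper's criterion, and the helper names are hypothetical. In the paper the ratio is searched and the mask evolves dynamically; in this sketch the mask would simply be recomputed whenever the ratio changes.

```python
import math

def channel_mask(filters, remaining_ratio):
    """Binary keep-mask for the output channels of one conv layer (sketch).

    filters: one flat list of weights per output channel.
    remaining_ratio: fraction of channels to keep, in (0, 1].
    Channels are ranked by the L1 norm of their weights (an assumed
    importance criterion); the top ceil(ratio * C) channels are kept.
    """
    if not 0 < remaining_ratio <= 1:
        raise ValueError("remaining_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(remaining_ratio * len(filters)))
    scores = [sum(abs(w) for w in f) for f in filters]
    # sorted() is stable, so ties keep their original channel order
    ranked = sorted(range(len(filters)), key=lambda c: scores[c], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if c in keep else 0 for c in range(len(filters))]

def apply_mask(feature_maps, mask):
    """Zero the feature maps of masked channels; tensor shape is unchanged."""
    return [[v * m for v in fmap] for fmap, m in zip(feature_maps, mask)]
```

Since a convolution's cost scales roughly linearly with its number of output channels, the kept fraction also approximates the layer's remaining share of floating point operations.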

42. Unsupervised deep learning for grading of age-related macular degeneration using retinal fundus images
Baladitya Yellapragada, Sascha Hornhauer, Kiersten Snyder, Stella Yu, Glenn Yiu
Abstract: Many diseases are classified based on human-defined rubrics that are prone to bias. Supervised neural networks can automate the grading of retinal fundus images, but require labor-intensive annotations and are restricted to the specific trained task. Here, we employed an unsupervised network with Non-Parametric Instance Discrimination (NPID) to grade age-related macular degeneration (AMD) severity using fundus photographs from the Age-Related Eye Disease Study (AREDS). Our unsupervised algorithm demonstrated versatility across different AMD classification schemes without retraining, and achieved unbalanced accuracies comparable to supervised networks and human ophthalmologists in classifying advanced or referable AMD, or on the 4-step AMD severity scale. Exploring the network's behavior revealed disease-related fundus features that drove predictions and unveiled the susceptibility of more granular human-defined AMD severity schemes to misclassification by both ophthalmologists and neural networks. Importantly, unsupervised learning enabled unbiased, data-driven discovery of AMD features such as geographic atrophy, as well as other ocular phenotypes of the choroid, vitreous, and lens, such as visually-impairing cataracts, that were not pre-defined by human labels.

43. Noise2Same: Optimizing A Self-Supervised Bound for Image Denoising
Yaochen Xie, Zhengyang Wang, Shuiwang Ji
Abstract: Self-supervised frameworks that learn denoising models with merely individual noisy images have shown strong capability and promising performance in various image denoising tasks. Existing self-supervised denoising frameworks are mostly built upon the same theoretical foundation, where the denoising models are required to be J-invariant. However, our analyses indicate that the current theory and the J-invariance may lead to denoising models with reduced performance. In this work, we introduce Noise2Same, a novel self-supervised denoising framework. In Noise2Same, a new self-supervised loss is proposed by deriving a self-supervised upper bound of the typical supervised loss. In particular, Noise2Same requires neither J-invariance nor extra information about the noise model and can be used in a wider range of denoising applications. We analyze our proposed Noise2Same both theoretically and experimentally. The experimental results show that our Noise2Same remarkably outperforms previous self-supervised denoising methods in terms of denoising performance and training efficiency. Our code is available at this https URL.
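The J-invariance foundation mentioned in the abstract can be stated briefly. Assume a clean image y, a noisy observation x with E[x | y] = y, noise independent across the blocks J of a partition 𝒥 given y, and a denoiser f that is J-invariant (f(x)_J does not depend on x_J). Under these assumptions, which follow the Noise2Self-style argument as we understand it rather than Noise2Same's new bound, the self-supervised loss differs from the supervised loss only by a term that does not depend on f:

```latex
\begin{aligned}
\mathbb{E}\|f(x)-x\|^2 &= \mathbb{E}\|f(x)-y\|^2 + \mathbb{E}\|x-y\|^2
  - 2\sum_{J\in\mathcal{J}} \mathbb{E}\,\langle f(x)_J - y_J,\; x_J - y_J\rangle,\\
\mathbb{E}\,\langle f(x)_J - y_J,\; x_J - y_J\rangle
  &= \mathbb{E}\big\langle f(x)_J - y_J,\; \mathbb{E}[\,x_J - y_J \mid y,\, x_{J^c}\,]\big\rangle = 0,\\
\Rightarrow\quad \mathbb{E}\|f(x)-x\|^2 &= \mathbb{E}\|f(x)-y\|^2 + \mathbb{E}\|x-y\|^2 .
\end{aligned}
```

The second line uses J-invariance: f(x)_J is a function of x_{J^c} only, so the tower property applies and the conditional mean of the noise on J vanishes. According to the abstract, it is this J-invariance requirement that Noise2Same drops.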

44. Non-convex Super-resolution of OCT images via sparse representation
Gabriele Scrivanti, Luca Calatroni, Serena Morigi, Lindsay Nicholson, Alin Achim
Abstract: We propose a non-convex variational model for the super-resolution of Optical Coherence Tomography (OCT) images of the murine eye, enforcing sparsity with respect to suitable dictionaries learnt from high-resolution OCT data. The statistical characteristics of OCT images motivate the use of α-stable distributions for learning dictionaries, considering the non-Gaussian case α = 1. The sparsity-promoting cost function relies on a non-convex penalty (Cauchy-based or the Minimax Concave Penalty, MCP), which makes the problem particularly challenging. We propose an efficient algorithm for minimizing the function based on the forward-backward splitting strategy, which guarantees at each iteration the existence and uniqueness of the proximal point. Comparisons with standard convex L1-based reconstructions show the better performance of non-convex models, especially in view of further OCT image analysis.
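Forward-backward splitting alternates a gradient step on the smooth data term with the proximal map of the penalty. For the MCP with parameters λ and γ, the proximal map of τ·MCP is the firm-thresholding rule below, and it is single-valued only when γ > τ, which is presumably the kind of existence-and-uniqueness condition the abstract refers to. This is a generic sparse-coding sketch, not the paper's model: the matrix A stands in for the dictionary and degradation operator, the Cauchy penalty is not shown, and the function names are hypothetical.

```python
import math

def mcp_prox(v, lam, gamma, tau):
    """Proximal map of tau * MCP(.; lam, gamma) at scalar v (firm thresholding).

    gamma > tau makes 0.5 * (x - v)**2 + tau * MCP(x) strongly convex,
    so the proximal point exists and is unique.
    """
    if gamma <= tau:
        raise ValueError("need gamma > tau for a unique proximal point")
    a = abs(v)
    if a <= tau * lam:
        return 0.0                      # small values are set to zero
    if a <= gamma * lam:
        # shrink, then rescale so the map is continuous at |v| = gamma * lam
        return math.copysign((a - tau * lam) / (1.0 - tau / gamma), v)
    return v                            # large values are left unbiased

def forward_backward(A, b, lam, gamma, tau, iters=200):
    """Minimise 0.5 * ||A x - b||^2 + sum_i MCP(x_i) by forward-backward splitting.

    tau should not exceed 1 / ||A||_2^2 for the gradient step to be stable.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # A x - b
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]       # A^T (A x - b)
        # forward (gradient) step, then backward (proximal) step per coordinate
        x = [mcp_prox(x[j] - tau * grad[j], lam, gamma, tau) for j in range(n)]
    return x
```

Unlike soft thresholding (the L1 proximal map), firm thresholding leaves coefficients above γλ untouched, which is one reason non-convex penalties can reduce the bias of L1-based reconstructions.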

45. CLOUD: Contrastive Learning of Unsupervised Dynamics
Jianren Wang, Yujie Lu, Hang Zhao
Abstract: Developing agents that can perform complex control tasks from high dimensional observations such as pixels is challenging due to difficulties in learning dynamics efficiently. In this work, we propose to learn forward and inverse dynamics in a fully unsupervised manner via contrastive estimation. Specifically, we train a forward dynamics model and an inverse dynamics model in the feature space of states and actions with data collected from random exploration. Unlike most existing deterministic models, our energy-based model takes into account the stochastic nature of agent-environment interactions. We demonstrate the efficacy of our approach across a variety of tasks including goal-directed planning and imitation from observations. Project videos and code are at this https URL.

46. Estimation of Cardiac Valve Annuli Motion with Deep Learning
Eric Kerfoot, Carlos Escudero King, Tefvik Ismail, David Nordsletten, Renee Miller
Abstract: Valve annuli motion and morphology, measured from non-invasive imaging, can be used to gain a better understanding of healthy and pathological heart function. Measurements such as long-axis strain as well as peak strain rates provide markers of systolic function. Likewise, early and late-diastolic filling velocities are used as indicators of diastolic function. Quantifying global strains, however, requires a fast and precise method of tracking long-axis motion throughout the cardiac cycle. Valve landmarks such as the insertion of leaflets into the myocardial wall provide features that can be tracked to measure global long-axis motion. Feature tracking methods require initialisation, which can be time-consuming in studies with large cohorts. Therefore, this study developed and trained a neural network to identify ten features from unlabeled long-axis MR images: six mitral valve points from three long-axis views, two aortic valve points and two tricuspid valve points. This study used manual annotations of valve landmarks in standard 2-, 3- and 4-chamber long-axis images collected in clinical scans to train the network. The accuracy in the identification of these ten features, in pixel distance, was compared with the accuracy of two commonly used feature tracking methods as well as the inter-observer variability of manual annotations. Clinical measures, such as valve landmark strain and motion between end-diastole and end-systole, are also presented to illustrate the utility and robustness of the method.

47. Progressive Training of Multi-level Wavelet Residual Networks for Image Denoising
Yali Peng, Yue Cao, Shigang Liu, Jian Yang, Wangmeng Zuo
Abstract: Recent years have witnessed the great success of deep convolutional neural networks (CNNs) in image denoising. Although deeper networks and larger model capacity generally benefit performance, training a very deep image denoising network remains a challenging practical issue. Using the multi-level wavelet CNN (MWCNN) as an example, we empirically find that the denoising performance cannot be significantly improved by either increasing the number of wavelet decomposition levels or increasing the number of convolution layers within each level. To cope with this issue, this paper presents a multi-level wavelet residual network (MWRN) architecture as well as a progressive training (PT-MWRN) scheme to improve image denoising performance. In contrast to MWCNN, our MWRN introduces several residual blocks after each level of discrete wavelet transform (DWT) and before the inverse discrete wavelet transform (IDWT). To ease the training difficulty, a scale-specific loss is applied to each level of MWRN by requiring the intermediate output to approximate the corresponding wavelet subbands of the ground-truth clean image. To ensure the effectiveness of the scale-specific loss, we also take the wavelet subbands of the noisy image as the input to each scale of the encoder. Furthermore, a progressive training scheme is adopted for better learning of MWRN, beginning with training the lowest level of MWRN and progressively training the upper levels to bring more fine details to the denoising results. Experiments on both synthetic and real-world noisy images show that our PT-MWRN performs favorably against state-of-the-art denoising methods in terms of both quantitative metrics and visual quality.
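MWRN places residual blocks between DWT and IDWT levels. A minimal sketch of one level of the 2D Haar transform (assuming the Haar wavelet) and its exact inverse is below; the function names are hypothetical. The subbands of the clean image at each level are the kind of target a scale-specific loss could compare an intermediate output against, and each subband has half the height and width of its input.

```python
def haar_dwt2(x):
    """One level of the 2D Haar DWT for an image with even height and width.

    Returns the four half-resolution subbands (LL, LH, HL, HH).
    """
    h, w = len(x), len(x[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(h // 2):
        for j in range(w // 2):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2   # local average (coarse content)
            lh[i][j] = (-a - b + c + d) / 2  # bottom row minus top row
            hl[i][j] = (-a + b - c + d) / 2  # right column minus left column
            hh[i][j] = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution image exactly."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, v, g, t = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s - v - g + t) / 2
            x[2 * i][2 * j + 1] = (s - v + g - t) / 2
            x[2 * i + 1][2 * j] = (s + v - g - t) / 2
            x[2 * i + 1][2 * j + 1] = (s + v + g + t) / 2
    return x
```

Applying haar_dwt2 again to the LL band gives the next level; because the transform is invertible, stacking levels loses no information, which is what lets wavelet-based networks downsample without pooling.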

48. Segmentation of the cortical plate in fetal brain MRI with a topological loss
Priscille de Dumast, Hamza Kebiri, Chirine Atat, Vincent Dunet, Mériam Koob, Meritxell Bach Cuadra
Abstract: The fetal cortical plate undergoes drastic morphological changes during early in utero development, which can be observed using magnetic resonance (MR) imaging. An accurate MR image segmentation, and more importantly a topologically correct delineation of the cortical gray matter, is a key baseline for further quantitative analysis of brain development. In this paper, we propose for the first time the integration of a topological constraint, as an additional loss function, to enhance the morphological consistency of a deep learning-based segmentation of the fetal cortical plate. We quantitatively evaluate our method on 18 fetal brain atlases ranging from 21 to 38 weeks of gestation, showing significant benefits of our method across all gestational ages compared to a baseline method. Furthermore, qualitative evaluation by three different experts on 130 randomly selected slices from 26 clinical MRIs demonstrates the superior performance of our method, independently of the MR reconstruction quality.

49. Checkerboard-Artifact-Free Image-Enhancement Network Considering Local and Global Features
Yuma Kinoshita, Hitoshi Kiya
Abstract: In this paper, we propose a novel convolutional neural network (CNN) for image enhancement that never causes checkerboard artifacts. In image-to-image translation problems, it is well known that images generated by conventional CNNs are distorted by checkerboard artifacts, which are mainly caused by the forward propagation of upsampling layers. However, checkerboard artifacts in image enhancement have never been discussed. In this paper, we point out that applying U-Net-based CNNs to image enhancement causes checkerboard artifacts. In contrast, the proposed network, which contains fixed convolutional layers, can perfectly prevent the artifacts. In addition, the proposed network architecture, which can handle both local and global features, enables us to improve the performance of image enhancement. Experimental results show that the use of fixed convolutional layers can prevent checkerboard artifacts and that the proposed network outperforms state-of-the-art CNN-based image-enhancement methods in terms of various objective quality metrics: PSNR, SSIM, and NIQE.
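The mechanism behind checkerboard artifacts can be reproduced in a few lines: 2x upsampling by zero insertion (the first stage of a stride-2 transposed convolution) turns a flat image into a periodic pattern, and a fixed 2x2 all-ones filter (a zero-order hold) applied afterwards makes the output flat again. Using the zero-order-hold kernel as the fixed layer is an assumption for illustration, and the function names are hypothetical.

```python
def zero_insert_upsample(img):
    """2x upsampling by zero insertion, as in a stride-2 transposed conv before filtering."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            out[2 * i][2 * j] = img[i][j]  # original samples land on even positions
    return out

def zero_order_hold(img):
    """Fixed (non-learnable) 2x2 all-ones convolution, zero-padded at the top/left.

    out[i][j] is the sum of img over the 2x2 window ending at (i, j).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (0, 1):
                for dj in (0, 1):
                    if i - di >= 0 and j - dj >= 0:
                        s += img[i - di][j - dj]
            out[i][j] = s
    return out
```

Every 2x2 window of the zero-inserted image contains exactly one original sample, so the pair is equivalent to nearest-neighbour upsampling and a flat input stays flat; a learnable kernel in the same position carries no such guarantee.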

50. Tele-operative Robotic Lung Ultrasound Scanning Platform for Triage of COVID-19 Patients
Ryosuke Tsumura, John W. Hardin, Keshav Bimbraw, Olushola S. Odusanya, Yihao Zheng, Jeffrey C. Hill, Beatrice Hoffmann, Winston Soboyejo, Haichong K. Zhang
Abstract: Novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has become a pandemic of epic proportions, and a global response to prepare health systems worldwide is of utmost importance. In addition to its cost-effectiveness in resource-limited settings, lung ultrasound (LUS) has emerged as a rapid noninvasive imaging tool for the diagnosis of COVID-19 infected patients. Concerns surrounding LUS include the disparity between infected patients and healthcare providers, the relatively small number of physicians and sonographers capable of performing LUS, and, most importantly, the requirement for substantial physical contact between the patient and operator, which increases the risk of transmission. Mitigating the spread of the virus is of paramount importance. A 2-dimensional (2D) tele-operative robotic platform capable of performing LUS for COVID-19 infected patients may be of significant benefit. The authors address the aforementioned issues surrounding the use of LUS for COVID-19 infected patients. In addition, first-time application, feasibility, and safety were validated in three healthy subjects, along with 2D image optimization and comparison for overall accuracy. Preliminary results demonstrate that the proposed platform allows for successful acquisition and application of LUS in humans.

51. Population Gradients improve performance across data-sets and architectures in object classification
Yurika Sakai, Andrey Kormilitzin, Qiang Liu, Alejo Nevado-Holgado
Abstract: The most successful methods, such as ReLU transfer functions, batch normalization, Xavier initialization, dropout, learning rate decay, or dynamic optimizers, have become standards in the field, particularly due to their ability to increase the performance of neural networks (NNs) significantly and in almost all situations. Here we present a new method to calculate the gradients while training NNs, and show that it significantly improves final performance across architectures, data-sets, hyper-parameter values, training length, and model sizes, including when it is combined with other common performance-improving methods (such as the ones mentioned above). Besides being effective in the wide array of situations that we have tested, the increase in performance (e.g. F1) it provides is as high as or higher than that of all the other widespread performance-improving methods we have compared against. We call our method Population Gradients (PG); it consists of using a population of NNs to calculate a non-local estimate of the gradient, which is closer to the theoretical exact gradient of the error function (i.e. the one obtainable only with an infinitely big data-set) than the empirical gradient (i.e. the one obtained with the real finite data-set).

52. Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra
Abstract: The label shift problem refers to the supervised learning setting where the train and test label distributions do not match. Existing work addressing label shift usually assumes access to an unlabelled test sample. This sample may be used to estimate the test label distribution, and then to train a suitably re-weighted classifier. While approaches using this idea have proven effective, their scope is limited, as it is not always feasible to access the target domain; further, they require repeated retraining if the model is to be deployed in multiple test environments. Can one instead learn a single classifier that is robust to arbitrary label shifts from a broad family? In this paper, we answer this question by proposing a model that minimises an objective based on distributionally robust optimisation (DRO). We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective. Finally, through experiments on CIFAR-100 and ImageNet, we show that our technique can significantly improve performance over a number of baselines in settings where label shift is present.
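A gradient descent-proximal mirror ascent scheme alternates a descent step on the classifier with an ascent step on an adversarial label distribution w over the probability simplex. The sketch below shows only the ascent half, assuming a KL penalty toward the training label prior instead of a hard uncertainty set; the parameters eta and tau and the helper names are hypothetical. With KL as the Bregman proximity term, the proximal step has a closed form as a normalised exponential.

```python
import math

def mirror_ascent_step(w, class_losses, prior, eta, tau):
    """One proximal mirror-ascent step on the class-weight simplex (sketch).

    Maximises sum_k w_k * L_k - tau * KL(w || prior), using KL(w || w_t) as
    the proximity term. Classes with larger loss gain weight; tau keeps the
    adversarial label distribution close to `prior`.
    """
    logits = [(eta * L + eta * tau * math.log(p) + math.log(wk)) / (1 + eta * tau)
              for wk, L, p in zip(w, class_losses, prior)]
    m = max(logits)  # stabilise the exponentials
    unnorm = [math.exp(l - m) for l in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def robust_loss(class_losses, w):
    """Classifier objective under the adversarial label distribution w."""
    return sum(wk * L for wk, L in zip(w, class_losses))
```

With fixed class losses, repeated steps converge to w_k proportional to prior_k * exp(L_k / tau); in training, the classifier would take a gradient step on robust_loss between ascent steps.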

53. Tensor Reordering for CNN Compression
Matej Ulicny, Vladimir A. Krylov, Rozenn Dahyot
Abstract: We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in the spectral domain. Specifically, the representation extracted via the Discrete Cosine Transform (DCT) is more conducive to pruning than the original space. By relying on a combination of weight tensor reshaping and reordering, we achieve high levels of layer compression with only minor accuracy loss. Our approach is applied to compress pretrained CNNs, and we show that minor additional fine-tuning allows our method to recover the original model performance after a significant parameter reduction. We validate our approach on the ResNet-50 and MobileNet-V2 architectures for the ImageNet classification task.
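Pruning in the spectral domain can be sketched for a single square filter: take its orthonormal 2D DCT-II, keep only the largest-magnitude coefficients, and invert. The paper additionally reshapes and reorders whole weight tensors before the transform; that step is omitted here, and the helper names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that X = C x and x = C^T X."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(f):
    """2D DCT-II of a square filter: C F C^T."""
    c = dct_matrix(len(f))
    return matmul(matmul(c, f), transpose(c))

def idct2(F):
    """Inverse 2D DCT (orthonormal): C^T F C."""
    c = dct_matrix(len(F))
    return matmul(matmul(transpose(c), F), c)

def prune_spectral(f, keep):
    """Keep the `keep` largest-magnitude DCT coefficients (1 <= keep <= size); zero the rest."""
    F = dct2(f)
    flat = sorted((abs(v) for row in F for v in row), reverse=True)
    thresh = flat[keep - 1]
    return [[v if abs(v) >= thresh else 0.0 for v in row] for row in F]
```

A smooth filter concentrates its energy in few coefficients, so storing only those (plus their positions) compresses the layer; for instance, a constant 3x3 filter is reconstructed exactly from its DC coefficient alone.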

54. Language-Conditioned Imitation Learning for Robot Manipulation Tasks
Simon Stepputtis, Joseph Campbell, Mariano Phielipp, Stefan Lee, Chitta Baral, Heni Ben Amor
Abstract: Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended shape of the motion. Motivated by insights into the human teaching process, we introduce a method for incorporating unstructured natural language into imitation learning. At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent (e.g., "go to the large green bowl"). The training process then interrelates these two modalities to encode the correlations between language, perception, and motion. The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions, which allows for more fine-grained control over the trained policies while also reducing situational ambiguity. We demonstrate in a set of simulation experiments how our approach can learn language-conditioned manipulation policies for a seven-degree-of-freedom robot arm and compare the results to a variety of alternative methods.

55. Zoom on the Keystrokes: Exploiting Video Calls for Keystroke Inference Attacks
Mohd Sabra, Anindya Maiti, Murtuza Jadliwala
Abstract: Due to recent world events, video calls have become the new norm for both personal and professional remote communication. However, if a participant in a video call is not careful, he/she can reveal his/her private information to others in the call. In this paper, we design and evaluate an attack framework to infer one type of such private information from the video stream of a call -- keystrokes, i.e., text typed during the call. We evaluate our video-based keystroke inference framework using different experimental settings and parameters, including different webcams, video resolutions, keyboards, clothing, and backgrounds. Our relatively high keystroke inference accuracies under commonly occurring and realistic settings highlight the need for awareness and countermeasures against such attacks. Consequently, we also propose and evaluate effective mitigation techniques that can automatically protect users when they type during a video call.

56. A generalized deep learning model for multi-disease Chest X-Ray diagnostics
Nabit Bajwa, Kedar Bajwa, Atif Rana, M. Faique Shakeel, Kashif Haqqi, Suleiman Ali Khan
Abstract: We investigate the generalizability of a deep convolutional neural network (CNN) on the task of disease classification from chest x-rays collected over multiple sites. We systematically train the model using datasets from three independent sites with different patient populations: National Institute of Health (NIH), Stanford University Medical Centre (CheXpert), and Shifa International Hospital (SIH). We formulate a sequential training approach and demonstrate that the model produces generalized prediction performance using held-out test sets from the three sites. Our model generalizes better when trained on multiple datasets, with the CheXpert-Shifa-NET model performing significantly better (p-values < 0.05) than the models trained on individual datasets for 3 out of the 4 distinct disease classes. The code for training the model will be made available as open source at this http URL at the time of publication.

57. Simple Neighborhood Representative Pre-processing Boosts Outlier Detectors
Jiawei Yang, Yu Chen
Abstract: Outlier detectors rely heavily on the data distribution. All outlier detectors will become ineffective, for example, when the data contain collective outliers or a large proportion of outliers. To better handle this issue, we propose a pre-processing technique called neighborhood representative. The neighborhood representative first selects a subset of representative objects from the data, then employs outlier detectors to score the representatives. Non-representative data objects share the same score as the nearby representative object. The proposed technique is essentially an add-on to most existing outlier detectors, as it improves accuracy by 16% on average (from 0.64 AUC to 0.74 AUC), evaluated on six datasets with nine state-of-the-art outlier detectors. In datasets with fewer outliers, the proposed technique can still improve most of the tested outlier detectors.
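The neighborhood-representative pipeline can be sketched with a greedy "leader" rule for choosing representatives and a k-nearest-neighbour distance as the base detector. Both choices, the radius parameter, and the function names are assumptions for illustration; the abstract does not say how representatives are selected.

```python
import math

def select_representatives(points, radius):
    """Greedy leader selection (assumed rule): a point becomes a representative
    unless an existing representative lies within `radius` of it."""
    reps = []
    for idx, p in enumerate(points):
        if all(math.dist(p, points[r]) > radius for r in reps):
            reps.append(idx)
    return reps

def knn_score(points, idx, candidates, k):
    """Base outlier detector: mean distance from points[idx] to its k nearest candidates."""
    d = sorted(math.dist(points[idx], points[c]) for c in candidates if c != idx)
    k = min(k, len(d))
    return sum(d[:k]) / k if k else 0.0

def neighborhood_representative_scores(points, radius, k=2):
    """Score only the representatives; every point inherits the score of its
    nearest representative."""
    reps = select_representatives(points, radius)
    rep_score = {r: knn_score(points, r, reps, k) for r in reps}
    return [rep_score[min(reps, key=lambda r: math.dist(p, points[r]))] for p in points]
```

With a tight cluster of collective outliers, plain kNN scoring on the raw data gives each cluster member a small score because its nearest neighbours are the other outliers. After the cluster collapses into one representative, that representative is isolated among the representatives and scores high, and the other cluster members inherit its score.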

58. Using Deep Image Priors to Generate Counterfactual Explanations
Vivek Narayanaswamy, Jayaraman J. Thiagarajan, Andreas Spanias
Abstract: Through the use of carefully tailored convolutional neural network architectures, a deep image prior (DIP) can be used to obtain pre-images from latent representation encodings. Though DIP inversion has been known to be superior to conventional regularized inversion strategies such as total variation, such an over-parameterized generator is able to effectively reconstruct even images that are not in the original data distribution. This limitation makes it challenging to utilize such priors for tasks such as counterfactual reasoning, wherein the goal is to generate small, interpretable changes to an image that systematically leads to changes in the model prediction. To this end, we propose a novel regularization strategy based on an auxiliary loss estimator jointly trained with the predictor, which efficiently guides the prior to recover natural pre-images. Our empirical studies with a real-world ISIC skin lesion detection problem clearly evidence the effectiveness of the proposed approach in synthesizing meaningful counterfactuals. In comparison, we find that the standard DIP inversion often proposes visually imperceptible perturbations to irrelevant parts of the image, thus providing no additional insights into the model behavior.
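A DIP-based counterfactual search can be written generically as an optimization over the generator weights. The form below is an assumed illustration, not the paper's exact loss. The symbols are defined as follows:

- $f_\theta$ is the DIP generator with a fixed random input $z$.
- $F$ is the trained classifier.
- $x$ is the query image.
- $y'$ is the desired target class.
- $g$ stands for the auxiliary loss estimator.
- $\lambda_1, \lambda_2$ are hypothetical weights.

```latex
\theta^\star = \arg\min_{\theta}\;
  \mathcal{L}_{\mathrm{CE}}\!\bigl(F(f_\theta(z)),\, y'\bigr)
  + \lambda_1 \,\lVert f_\theta(z) - x \rVert_2^2
  + \lambda_2 \, g\bigl(f_\theta(z)\bigr),
\qquad x_{\mathrm{cf}} = f_{\theta^\star}(z)
```

Each term plays a distinct role:

- The first term pushes the prediction toward $y'$.
- The second keeps the counterfactual close to the original image.
- The third is a placeholder for the proposed regularizer that steers the prior toward natural pre-images.

Without the third term, the over-parameterized generator is free to produce the imperceptible, off-manifold perturbations the abstract describes.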

59. Deep Image Prior for Sparse-sampling Photoacoustic Microscopy [PDF]
Tri Vu, Anthony DiSpirito III, Daiwei Li, Zixuan Zhang, Xiaoyi Zhu, Maomao Chen, Dong Zhang, Jianwen Luo, Yu Shrike Zhang, Roarke Horstmeyer, Junjie Yao
Abstract: Photoacoustic microscopy (PAM) is an emerging method for imaging both structural and functional information without the need for exogenous contrast agents. However, state-of-the-art PAM faces a tradeoff between imaging speed and spatial sampling density within the same field of view (FOV). Because it is limited by the pulsed laser's repetition rate, the imaging speed is inversely proportional to the total number of effective pixels. To cover the same FOV in less time with the same PAM hardware, the only current option is to decrease the spatial sampling density (i.e., sparse sampling). Deep learning methods have recently been used to improve sparsely sampled PAM images. However, these methods often require time-consuming pre-training and a large training dataset with fully sampled, co-registered ground truth. In this paper, we propose using a method known as "deep image prior" to improve the image quality of sparsely sampled PAM images. The network needs neither prior training nor fully sampled ground truth, which makes its implementation more flexible and much quicker. Our results show promising improvement in PA vasculature images with as few as 2% of the effective pixels. Our deep image prior approach outperforms interpolation methods and can be readily translated to other high-speed, sparse-sampling imaging modalities.
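The speed/density tradeoff follows from simple arithmetic if we assume one laser pulse per effective pixel and ignore scanning overhead. The sketch below uses a hypothetical 500 kHz repetition rate and a hypothetical 700 x 700 pixel FOV. It also uses a hypothetical regular sparse pattern that keeps every 7th pixel along each axis, which is about 2% of the pixels. None of these values come from the paper.

```python
def acquisition_time_s(n_pixels, rep_rate_hz):
    """Assumes one laser pulse per effective pixel and no scanning overhead."""
    return n_pixels / rep_rate_hz

def regular_sparse_mask(height, width, step):
    """Hypothetical sampling pattern: keep every `step`-th pixel along both axes."""
    return [[r % step == 0 and c % step == 0 for c in range(width)] for r in range(height)]

REP_RATE_HZ = 500_000  # hypothetical pulse repetition rate
H = W = 700            # hypothetical field of view, in pixels

mask = regular_sparse_mask(H, W, step=7)
kept = sum(sum(row) for row in mask)
full_time = acquisition_time_s(H * W, REP_RATE_HZ)
sparse_time = acquisition_time_s(kept, REP_RATE_HZ)

print(kept, H * W)                                        # 10000 490000
print(f"{kept / (H * W):.2%}")                            # 2.04%
print(f"{full_time:.2f} s full vs {sparse_time:.2f} s sparse")  # 0.98 s full vs 0.02 s sparse
```

The speed-up equals the sampling ratio (49x here). The reconstruction method, here deep image prior, then has to recover the missing 98% of pixels.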

60. Automating Abnormality Detection in Musculoskeletal Radiographs through Deep Learning [PDF]
Goodarz Mehr
Abstract: This paper introduces MuRAD (Musculoskeletal Radiograph Abnormality Detection tool), a tool that can help radiologists automate the detection of abnormalities in musculoskeletal radiographs (bone X-rays). MuRAD uses a convolutional neural network (CNN) that can accurately predict whether a bone X-ray is abnormal, and it leverages class activation maps (CAM) to localize the abnormality in the image. MuRAD achieves an F1 score of 0.822 and a Cohen's kappa of 0.699, which is comparable to the performance of expert radiologists.
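The two metrics reported for MuRAD can be computed from a binary confusion matrix. In the example below, the counts are hypothetical and only illustrate the formulas; they are not MuRAD's results. Here, kappa measures agreement between predictions and reference labels after correcting for agreement expected by chance.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cohens_kappa(tp, fp, fn, tn):
    """Chance-corrected agreement for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # chance agreement: predicted-class rate times true-class rate, summed over both classes
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# hypothetical confusion matrix: 40 TP, 10 FP, 10 FN, 40 TN
print(round(f1_score(40, 10, 10), 3))          # 0.8
print(round(cohens_kappa(40, 10, 10, 40), 3))  # 0.6
```

With balanced classes, chance agreement is 0.5. In this example, 80% raw agreement therefore corresponds to a kappa of only 0.6.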

61. Towards falsifiable interpretability research [PDF]
Matthew L. Leavitt, Ari Morcos
Abstract: Methods for understanding the decisions of, and mechanisms underlying, deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, some methods aim to visualize the components of an input that are "important" to a network's decision, while others aim to measure the semantic properties of single neurons. Here, we argue that interpretability research suffers from an over-reliance on intuition-based approaches that risk (and in some cases have caused) illusory progress and misleading conclusions. We identify a set of limitations that we argue impede meaningful progress in interpretability research. We then examine two popular classes of interpretability methods, saliency and single-neuron-based approaches, as case studies of how over-reliance on intuition and a lack of falsifiability can undermine interpretability research. To address these impediments, we propose a framework for strongly falsifiable interpretability research. We encourage researchers to use their intuitions as a starting point to develop and test clear, falsifiable hypotheses. We hope that our framework yields robust, evidence-based interpretability methods that generate meaningful advances in our understanding of DNNs.

62. CellCycleGAN: Spatiotemporal Microscopy Image Synthesis of Cell Populations using Statistical Shape Models and Conditional GANs [PDF]
Dennis Bähr, Dennis Eschweiler, Anuk Bhattacharyya, Daniel Moreno-Andrés, Wolfram Antonin, Johannes Stegmaier
Abstract: Automatic analysis of spatiotemporal microscopy images is essential for state-of-the-art research in the life sciences. Recent developments in deep learning provide powerful tools for the automatic analysis of such image data. However, to perform well, these tools depend heavily on the amount and quality of the available training data. To this end, we developed a new method for realistically generating synthetic 2D+t microscopy image data of fluorescently labeled cell nuclei. The method combines spatiotemporal statistical shape models of different cell cycle stages with a conditional GAN to generate time series of cell populations. It provides instance-level control over the cell cycle stage and the fluorescence intensity of the generated cells. We show the effect of the GAN conditioning and create a set of synthetic images that can be readily used for training and benchmarking cell segmentation and tracking approaches.
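Statistical shape models are commonly built as point distribution models, in four steps:

1. Flatten the aligned contours.
2. Apply PCA to obtain the main modes of variation.
3. Generate a new shape as the mean plus a weighted sum of modes.
4. Limit the weights to about three standard deviations.

The NumPy sketch below illustrates that generic construction only. It is static, with no time dimension or cell-cycle stages. It assumes that the contours are already aligned with corresponding landmarks, and it is not the CellCycleGAN code.

```python
import numpy as np

def fit_shape_model(shapes):
    """shapes: (n_samples, n_points, 2) array of aligned, corresponding contours."""
    X = np.asarray(shapes, dtype=float).reshape(len(shapes), -1)
    mean = X.mean(axis=0)
    # PCA via SVD of the centred data; rows of vt are the modes of variation
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = s**2 / (len(X) - 1)
    return mean, vt, variances

def sample_shape(mean, modes, variances, b):
    """Mean shape plus weighted modes, weights clipped to +/- 3 standard deviations."""
    b = np.asarray(b, dtype=float)
    k = len(b)
    limit = 3 * np.sqrt(variances[:k])
    b = np.clip(b, -limit, limit)
    return (mean + b @ modes[:k]).reshape(-1, 2)

# toy training set: circular "nuclei" with radii 0.8, 1.0 and 1.2 (32 landmarks each)
angles = np.linspace(0, 2 * np.pi, 32, endpoint=False)
unit = np.stack([np.cos(angles), np.sin(angles)], axis=1)
training = [r * unit for r in (0.8, 1.0, 1.2)]

mean, modes, variances = fit_shape_model(training)
average = sample_shape(mean, modes, variances, [0.0])    # the mean contour
extreme = sample_shape(mean, modes, variances, [100.0])  # weight clipped to 3 std devs
print(np.linalg.norm(extreme, axis=1)[:3])
```

Clipping the mode weights is what keeps sampled shapes plausible. In this toy set, the only mode is the radius, so even a huge requested weight yields a circle of bounded size.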

63. Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset [PDF]
Zhanwen Chen, Shiyao Li, Roxanne Rashedi, Xiaoman Zi, Morgan Elrod-Erickson, Bryan Hollis, Angela Maliakal, Xinyu Shen, Simeng Zhao, Maithilee Kunda
Abstract: Modern social intelligence includes the ability to watch videos and answer questions about social and theory-of-mind-related content. For example, for a scene in Harry Potter: "Is the father really upset about the boys flying the car?" Social visual question answering (social VQA) is emerging as a valuable methodology for studying social reasoning in both humans (e.g., children with autism) and AI agents. However, this problem space spans enormous variation in both videos and questions. We discuss methods for creating and characterizing social VQA datasets, including:

1) crowdsourcing versus in-house authoring, with sample comparisons of two new datasets that we created (TinySocial-Crowd and TinySocial-InHouse) and the existing Social-IQ dataset;
2) a new rubric for characterizing the difficulty and content of a given video; and
3) a new rubric for characterizing question types.

We close by describing how well-characterized social VQA datasets will enhance the explainability of AI agents and can also inform assessments and educational interventions for people.

64. Investigating Cultural Aspects in the Fundamental Diagram using Convolutional Neural Networks and Simulation [PDF]
Rodolfo M. Favaretto, Roberto R. Santos, Marcio Ballotin, Paulo Knob, Soraia R. Musse, Felipe Vilanova, Angelo B. Costa
Abstract: This paper presents a study of group behavior in a controlled experiment in two countries, Brazil and Germany. The study focuses on differences in an important attribute that varies across cultures, namely personal space. To coherently compare Germany and Brazil with the same population performing the same task, we performed the pedestrian Fundamental Diagram (FD) experiment in Brazil as it had been performed in Germany. We use CNNs to detect and track people in video sequences. With these data, we use Voronoi diagrams to determine the neighbor relations among people and then compute the walking distances to determine their personal spaces. Based on the personal-space analyses, we found that people behave more similarly in high-density populations and vary more at low and medium densities. We therefore focused our study on cultural differences between the two countries at low and medium densities. The results indicate that personal-space analysis can be a relevant feature for understanding cultural aspects in video sequences. In addition to cultural differences, we also investigate the personality model in crowds using OCEAN. We also propose a way to simulate the FD experiment for other countries, using the OCEAN psychological traits model as input. The simulated countries were consistent with the literature.
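Voronoi neighbor relations can be obtained with SciPy. `scipy.spatial.Voronoi` exposes `ridge_points`, which lists the pairs of input points whose cells share a boundary. The sketch below uses the mean distance to those neighbors as a per-person personal-space measure for a single frame. This is a simplification assumed for illustration; the paper computes walking distances from tracked trajectories. The frame coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial import Voronoi

def voronoi_personal_space(positions):
    """Return Voronoi neighbours and mean neighbour distance for each person.

    positions: (n, 2) array of pedestrian coordinates in one video frame.
    """
    pts = np.asarray(positions, dtype=float)
    vor = Voronoi(pts)
    neighbors = {i: set() for i in range(len(pts))}
    for a, b in vor.ridge_points:  # each ridge separates two input points
        neighbors[int(a)].add(int(b))
        neighbors[int(b)].add(int(a))
    space = {
        i: float(np.mean([np.linalg.norm(pts[i] - pts[j]) for j in nbrs]))
        for i, nbrs in neighbors.items()
    }
    return neighbors, space

# hypothetical frame: one person in the centre, four around them (metres)
frame = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
neighbors, space = voronoi_personal_space(frame)
print(sorted(neighbors[0]))  # [1, 2, 3, 4]
print(space[0])              # 1.0
```

Using Voronoi neighbors rather than a fixed radius adapts automatically to density. The same code therefore applies to the low-, medium- and high-density runs of the experiment.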

65. MTGAT: Multimodal Temporal Graph Attention Networks for Unaligned Human Multimodal Language Sequences [PDF]
Jianing Yang, Yongxin Wang, Ruitao Yi, Yuying Zhu, Azaan Rehman, Amir Zadeh, Soujanya Poria, Louis-Philippe Morency
Abstract: Human communication is multimodal in nature: opinions and emotions are expressed through multiple modalities, i.e., language, voice, and facial expressions. Data in this domain exhibit complex multi-relational and temporal interactions, and learning from such data is a fundamentally challenging research problem. In this paper, we propose Multimodal Temporal Graph Attention Networks (MTGAT). MTGAT is an interpretable graph-based neural model that provides a suitable framework for analyzing this type of multimodal sequential data. We first introduce a procedure to convert unaligned multimodal sequence data into a graph with heterogeneous nodes and edges that captures the rich interactions between different modalities over time. We then design a novel graph operation, called Multimodal Temporal Graph Attention, together with a dynamic pruning and read-out technique, to efficiently process this multimodal temporal graph. By learning to focus only on the important interactions within the graph, MTGAT achieves state-of-the-art performance on multimodal sentiment analysis and emotion recognition benchmarks, including IEMOCAP and CMU-MOSI, while using significantly fewer computations.
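One way to picture the conversion of unaligned sequences into a heterogeneous graph is sketched below. Every feature vector becomes a node tagged with its modality and timestamp. Each directed edge is typed by the triple (source modality, target modality, temporal relation). The typing scheme and helper names are assumptions for illustration, not MTGAT's code. The sketch also connects all node pairs, whereas MTGAT additionally applies attention and dynamic pruning.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Node:
    modality: str  # e.g. "text", "audio", "vision"
    time: float    # timestamp of the feature; modalities need not be aligned

def temporal_relation(src_time, dst_time):
    """Relation of the source node's time to the target node's time (assumed labels)."""
    if src_time < dst_time:
        return "earlier"
    if src_time > dst_time:
        return "later"
    return "same"

def build_multimodal_graph(sequences):
    """sequences: mapping modality -> list of timestamps (one per feature vector)."""
    nodes = [Node(m, t) for m, times in sequences.items() for t in times]
    edges = {}
    for i, j in permutations(range(len(nodes)), 2):  # all ordered pairs, no self-loops
        src, dst = nodes[i], nodes[j]
        edges[(i, j)] = (src.modality, dst.modality, temporal_relation(src.time, dst.time))
    return nodes, edges

# unaligned toy input: two text tokens and one audio frame between them
nodes, edges = build_multimodal_graph({"text": [0.0, 1.0], "audio": [0.5]})
print(len(nodes), len(edges))  # 3 6
print(edges[(0, 2)])           # ('text', 'audio', 'earlier')
```

Because timestamps are kept per node, no alignment step is needed. However, a fully connected graph grows quadratically with sequence length, which is why pruning matters in practice.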

66. Deep Convolutional Neural Networks Model-based Brain Tumor Detection in Brain MRI Images [PDF]
Md. Abu Bakr Siddique, Shadman Sakib, Mohammad Mahmudur Rahman Khan, Abyaz Kader Tanzeem, Madiha Chowdhury, Nowrin Yasmin
Abstract: Diagnosing brain tumors with the aid of magnetic resonance imaging (MRI) has gained enormous prominence over the years, primarily in the field of medical science. Detecting and/or segmenting brain tumors solely with the aid of MR imaging costs immense time and effort and demands considerable expertise from the personnel involved. This underlines the need for an autonomous brain tumor diagnosis model. Our work implements a deep convolutional neural network (DCNN) for diagnosing brain tumors from MR images. The dataset used in this paper consists of 253 brain MR images, of which 155 are reported to contain tumors. Our model can single out the MR images with tumors with an overall accuracy of 96%. On the test dataset, the model outperformed existing conventional methods for brain tumor diagnosis (precision = 0.93, sensitivity = 1.00, F1-score = 0.97). Moreover, the proposed model achieves an average precision-recall score of 0.93, a Cohen's kappa of 0.91, and an AUC of 0.95. Therefore, the proposed model can help clinical experts verify whether a patient has a brain tumor and, consequently, accelerate the treatment procedure.

Note: The Chinese text is machine-translated. The cover image is a word cloud of the paper titles.


【arXiv Papers】 Computer Vision and Pattern Recognition 2020-10-26

Contents

Abstracts