
[arXiv Papers] Computer Vision and Pattern Recognition 2020-03-06

Contents

1. Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation [PDF] Abstract
2. Feature Extraction for Hyperspectral Imagery: The Evolution from Shallow to Deep [PDF] Abstract
3. Multi-object Tracking via End-to-end Tracklet Searching and Ranking [PDF] Abstract
4. Combating noisy labels by agreement: A joint training method with co-regularization [PDF] Abstract
5. Search Space of Adversarial Perturbations against Image Filters [PDF] Abstract
6. Self-Supervised Spatio-Temporal Representation Learning Using Variable Playback Speed Prediction [PDF] Abstract
7. Image Generation from Freehand Scene Sketches [PDF] Abstract
8. AI outperformed every dermatologist: Improved dermoscopic melanoma diagnosis through customizing batch logic and loss function in an optimized Deep CNN architecture [PDF] Abstract
9. MarginDistillation: distillation for margin-based softmax [PDF] Abstract
10. GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images [PDF] Abstract
11. Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning [PDF] Abstract
12. A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation [PDF] Abstract
13. Detecting Attended Visual Targets in Video [PDF] Abstract
14. Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization [PDF] Abstract
15. Fake Generated Painting Detection via Frequency Analysis [PDF] Abstract
16. Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications [PDF] Abstract
17. End-to-End Trainable One-Stage Parking Slot Detection Integrating Global and Local Information [PDF] Abstract
18. Drone Based RGBT Vehicle Detection and Counting: A Challenge [PDF] Abstract
19. Who Make Drivers Stop? Towards Driver-centric Risk Assessment: Risk Object Identification via Causal Inference [PDF] Abstract
20. A Benchmark for LiDAR-based Panoptic Segmentation based on KITTI [PDF] Abstract
21. Towards Fair Cross-Domain Adaptation via Generative Learning [PDF] Abstract
22. Creating High Resolution Images with a Latent Adversarial Generator [PDF] Abstract
23. Learning View and Target Invariant Visual Servoing for Navigation [PDF] Abstract
24. The Impact of Hole Geometry on Relative Robustness of In-Painting Networks: An Empirical Study [PDF] Abstract
25. Exploring Partial Intrinsic and Extrinsic Symmetry in 3D Medical Imaging [PDF] Abstract
26. Event-Based Angular Velocity Regression with Spiking Networks [PDF] Abstract
27. Dimensionality Reduction and Motion Clustering during Activities of Daily Living: 3, 4, and 7 Degree-of-Freedom Arm Movements [PDF] Abstract
28. Learning the sense of touch in simulation: a sim-to-real strategy for vision-based tactile sensing [PDF] Abstract
29. Demographic Bias in Biometrics: A Survey on an Emerging Challenge [PDF] Abstract
30. Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family [PDF] Abstract
31. Harnessing Multi-View Perspective of Light Fields for Low-Light Imaging [PDF] Abstract
32. Team O2AS at the World Robot Summit 2018: An Approach to Robotic Kitting and Assembly Tasks using General Purpose Grippers and Tools [PDF] Abstract

Abstracts

1. Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation [PDF] Back to Contents
  Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira
Abstract: Despite the recent progress of fully-supervised action segmentation techniques, the performance is still not fully satisfactory. One main challenge is the problem of spatiotemporal variations (e.g. different people may perform the same activity in various ways). Therefore, we exploit unlabeled videos to address this problem by reformulating the action segmentation task as a cross-domain problem with domain discrepancy caused by spatio-temporal variations. To reduce the discrepancy, we propose Self-Supervised Temporal Domain Adaptation (SSTDA), which contains two self-supervised auxiliary tasks (binary and sequential domain prediction) to jointly align cross-domain feature spaces embedded with local and global temporal dynamics, achieving better performance than other Domain Adaptation (DA) approaches. On three challenging benchmark datasets (GTEA, 50Salads, and Breakfast), SSTDA outperforms the current state-of-the-art method by large margins (e.g. for the F1@25 score, from 59.6% to 69.1% on Breakfast, from 73.4% to 81.5% on 50Salads, and from 83.6% to 89.1% on GTEA), and requires only 65% of the labeled training data for comparable performance, demonstrating the usefulness of adapting to unlabeled target videos across variations. The source code is available at this https URL.
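
The binary domain prediction task above follows the usual adversarial domain adaptation recipe: a small domain classifier learns to tell source frames from target frames through a gradient-reversal layer, so the backbone is pushed toward domain-invariant features (the sequential task extends this from single clips to shuffled sequences of clips). Below is a minimal PyTorch sketch of the binary building block; the head sizes and feature dimensions are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class BinaryDomainClassifier(nn.Module):
    """Predicts source (0) vs. target (1) from frame-level features."""
    def __init__(self, feat_dim, hidden=128, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, feats):
        # the reversed gradient trains the feature extractor to fool this head
        return self.net(GradReverse.apply(feats, self.lambd))
```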

2. Feature Extraction for Hyperspectral Imagery: The Evolution from Shallow to Deep [PDF] Back to Contents
  Behnood Rasti, Danfeng Hong, Renlong Hang, Pedram Ghamisi, Xudong Kang, Jocelyn Chanussot, Jon Atli Benediktsson
Abstract: Hyperspectral images provide detailed spectral information through hundreds of (narrow) spectral channels (also known as dimensionality or bands) with continuous spectral information that can accurately classify diverse materials of interest. The increased dimensionality of such data makes it possible to significantly improve data information content but provides a challenge to the conventional techniques (the so-called curse of dimensionality) for accurate analysis of hyperspectral images. Feature extraction, as a vibrant field of research in the hyperspectral community, evolved through decades of research to address this issue and extract informative features suitable for data representation and classification. The advances in feature extraction have been inspired by two fields of research, including the popularization of image and signal processing as well as machine (deep) learning, leading to two types of feature extraction approaches named shallow and deep techniques. This article outlines the advances in feature extraction approaches for hyperspectral imagery by providing a technical overview of the state-of-the-art techniques, providing useful entry points for researchers at different levels, including students, researchers, and senior researchers, willing to explore novel investigations on this challenging topic. In more detail, this paper provides a bird's eye view over shallow (both supervised and unsupervised) and deep feature extraction approaches specifically dedicated to the topic of hyperspectral feature extraction and its application on hyperspectral image classification. Additionally, this paper compares 15 advanced techniques with an emphasis on their methodological foundations in terms of classification accuracies.

3. Multi-object Tracking via End-to-end Tracklet Searching and Ranking [PDF] Back to Contents
  Tao Hu, Lichao Huang, Han Shen
Abstract: Recent works in multiple object tracking use sequence model to calculate the similarity score between the detections and the previous tracklets. However, the forced exposure to ground-truth in the training stage leads to the training-inference discrepancy problem, i.e., exposure bias, where association error could accumulate in the inference and make the trajectories drift. In this paper, we propose a novel method for optimizing tracklet consistency, which directly takes the prediction errors into account by introducing an online, end-to-end tracklet search training process. Notably, our methods directly optimize the whole tracklet score instead of pairwise affinity. With sequence model as appearance encoders of tracklet, our tracker achieves remarkable performance gain from conventional tracklet association baseline. Our methods have also achieved state-of-the-art in MOT15~17 challenge benchmarks using public detection and online settings.

4. Combating noisy labels by agreement: A joint training method with co-regularization [PDF] Back to Contents
  Hongxin Wei, Lei Feng, Xiangyu Chen, Bo An
Abstract: Deep Learning with noisy labels is a practically challenging problem in weakly-supervised learning. The state-of-the-art approaches "Decoupling" and "Co-teaching+" claim that the "disagreement" strategy is crucial for alleviating the problem of learning with noisy labels. In this paper, we start from a different perspective and propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of two networks during training. Specifically, we first use two networks to make predictions on the same mini-batch data and calculate a joint loss with Co-Regularization for each training example. Then we select small-loss examples to update the parameters of both two networks simultaneously. Trained by the joint loss, these two networks would be more and more similar due to the effect of Co-Regularization. Extensive experimental results on corrupted data from benchmark datasets including MNIST, CIFAR-10, CIFAR-100 and Clothing1M demonstrate that JoCoR is superior to many state-of-the-art approaches for learning with noisy labels.
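
The abstract pins the objective down well enough to sketch: each network's per-example cross-entropy is combined with a symmetric KL co-regularizer, and only the small-loss fraction of the mini-batch is used to update both networks. A hedged PyTorch sketch follows; co_lambda and keep_ratio are illustrative values, not the paper's hyperparameters.

```python
import torch
import torch.nn.functional as F

def jocor_loss(logits1, logits2, targets, co_lambda=0.1, keep_ratio=0.8):
    # per-example supervised losses of the two networks
    ce = (F.cross_entropy(logits1, targets, reduction="none")
          + F.cross_entropy(logits2, targets, reduction="none"))
    # symmetric KL term pushes the two predictions to agree
    logp1 = F.log_softmax(logits1, dim=1)
    logp2 = F.log_softmax(logits2, dim=1)
    kl = (F.kl_div(logp1, logp2.exp(), reduction="none").sum(dim=1)
          + F.kl_div(logp2, logp1.exp(), reduction="none").sum(dim=1))
    joint = (1.0 - co_lambda) * ce + co_lambda * kl
    # small-loss selection: treat the highest-loss examples as likely noisy
    n_keep = max(1, int(keep_ratio * joint.numel()))
    idx = torch.argsort(joint)[:n_keep]
    return joint[idx].mean()
```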

5. Search Space of Adversarial Perturbations against Image Filters [PDF] Back to Contents
  Dang Duy Thang, Toshihiro Matsui
Abstract: The superiority of deep learning performance is threatened by safety issues for itself. Recent findings have shown that deep learning systems are very weak to adversarial examples, an attack form that was altered by the attacker's intent to deceive the deep learning system. There are many proposed defensive methods to protect deep learning systems against adversarial examples. However, there is still a lack of principal strategies to deceive those defensive methods. Any time a particular countermeasure is proposed, a new powerful adversarial attack will be invented to deceive that countermeasure. In this study, we focus on investigating the ability to create adversarial patterns in search space against defensive methods that use image filters. Experimental results conducted on the ImageNet dataset with image classification tasks showed the correlation between the search space of adversarial perturbation and filters. These findings open a new direction for building stronger offensive methods towards deep learning systems.

6. Self-Supervised Spatio-Temporal Representation Learning Using Variable Playback Speed Prediction [PDF] Back to Contents
  Hyeon Cho, Taehoon Kim, Hyung Jin Chang, Wonjun Hwang
Abstract: We propose a self-supervised learning method by predicting the variable playback speeds of a video. Without semantic labels, we learn the spatio-temporal representation of the video by leveraging the variations in the visual appearance according to different playback speeds under the assumption of temporal coherence. To learn the spatio-temporal variations in the entire video, we have not only predicted a single playback speed but also generated clips of various playback speeds with randomized starting points. We then train a 3D convolutional network by solving the formulation that sorts the shuffled clips by their playback speed. In this case, the playback speed includes both forward and reverse directions; hence the visual representation can be successfully learned from the directional dynamics of the video. We also propose a novel layer-dependable temporal group normalization method that can be applied to 3D convolutional networks to improve the representation learning performance where we divide the temporal features into several groups and normalize each one using the different corresponding parameters. We validate the effectiveness of the proposed method by fine-tuning it to the action recognition task. The experimental results show that the proposed method outperforms state-of-the-art self-supervised learning methods in action recognition.
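
The pretext task needs clips of one video sampled at several playback speeds from randomized starting points, which the 3D network then has to sort. A minimal sketch of such a clip sampler; the tensor layout and integer speeds are assumptions for illustration.

```python
import torch

def sample_speed_clip(video, speed, clip_len=16):
    """video: (T, C, H, W); returns clip_len frames at integer playback speed."""
    span = clip_len * speed
    max_start = max(video.size(0) - span, 1)
    start = torch.randint(0, max_start, (1,)).item()   # randomized starting point
    idx = (start + torch.arange(clip_len) * speed).clamp(max=video.size(0) - 1)
    return video[idx]

# e.g. candidate clips for the speed-sorting task:
# clips = [sample_speed_clip(video, s) for s in (1, 2, 4, 8)]
```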

7. Image Generation from Freehand Scene Sketches [PDF] Back to Contents
  Chengying Gao, Qi Liu, Qi Xu, Jianzhuang Liu, Limin Wang, Changqing Zou
Abstract: We introduce the first method for automatic image generation from scene-level freehand sketches. Our model allows for controllable image generation by specifying the synthesis goal via freehand sketches. The key contribution is an attribute vector bridged generative adversarial network called edgeGAN which supports high visual-quality image content generation without using freehand sketches as training data. We build a large-scale composite dataset called SketchyCOCO to comprehensively evaluate our solution. We validate our approach on the task of both object-level and scene-level image generation on SketchyCOCO. We demonstrate the method's capacity to generate realistic complex scene-level images from a variety of freehand sketches by quantitative, qualitative results, and ablation studies.

8. AI outperformed every dermatologist: Improved dermoscopic melanoma diagnosis through customizing batch logic and loss function in an optimized Deep CNN architecture [PDF] Back to Contents
  Cong Tri Pham, Mai Chi Luong, Dung Van Hoang, Antoine Doucet
Abstract: Melanoma, one of the most dangerous types of skin cancer, results in a very high mortality rate. Early detection and resection are two key points for a successful cure. Recent research has used artificial intelligence to classify melanoma and nevus and to compare the assessment of these algorithms to that of dermatologists. However, an imbalance of sensitivity and specificity measures affected the performance of existing models. This study proposes a method using deep convolutional neural networks aiming to detect melanoma as a binary classification problem. It involves 3 key features, namely customized batch logic, customized loss function and reformed fully connected layers. The training dataset is kept up to date including 17,302 images of melanoma and nevus; this is the largest dataset by far. The model performance is compared to that of 157 dermatologists from 12 university hospitals in Germany based on the MClass-D dataset. The model outperformed all 157 dermatologists and achieved state-of-the-art performance with an AUC of 94.4%, with sensitivity of 85.0% and specificity of 95.0%, using a prediction threshold of 0.5 on the MClass-D dataset of 100 dermoscopic images. Moreover, a threshold of 0.40858 showed the most balanced measure compared to other research, and is promising for application to medical diagnosis, with sensitivity of 90.0% and specificity of 93.8%.

9. MarginDistillation: distillation for margin-based softmax [PDF] Back to Contents
  David Svitov, Sergey Alyamkin
Abstract: The usage of convolutional neural networks (CNNs) in conjunction with a margin-based softmax approach demonstrates a state-of-the-art performance for the face recognition problem. Recently, lightweight neural network models trained with the margin-based softmax have been introduced for the face identification task for edge devices. In this paper, we propose a novel distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on LFW, AgeDB-30 and Megaface datasets. The idea of the proposed method is to use class centers from the teacher network for the student network. Then the student network is trained to get the same angles between the class centers and the face embeddings, predicted by the teacher network.
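
The idea translates into a compact loss: take the class centers (the weight rows of the teacher's margin-based softmax layer) and train the student so that the cosines between its embeddings and those centers match the teacher's. A sketch under that reading, assuming matching embedding dimensions (otherwise a linear projection is needed); the names are illustrative.

```python
import torch
import torch.nn.functional as F

def center_angle_distillation(student_emb, teacher_emb, teacher_centers):
    """teacher_centers: (num_classes, d) rows of the teacher's softmax weight."""
    c = F.normalize(teacher_centers, dim=1)
    cos_teacher = F.normalize(teacher_emb, dim=1) @ c.t()   # teacher's angles
    cos_student = F.normalize(student_emb, dim=1) @ c.t()   # student's angles
    # matching the angular layout w.r.t. the shared centers transfers the
    # teacher's decision geometry to the lightweight student
    return F.mse_loss(cos_student, cos_teacher.detach())
```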

10. GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images [PDF] Back to Contents
  Lei Kang, Pau Riba, Yaxing Wang, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas
Abstract: Although current image generation methods have reached impressive quality levels, they are still unable to produce plausible yet diverse images of handwritten words. On the contrary, when writing by hand, a great variability is observed across different writers, and even when analyzing words scribbled by the same individual, involuntary variations are conspicuous. In this work, we take a step closer to producing realistic and varied artificially rendered handwritten words. We propose a novel method that is able to produce credible handwritten word images by conditioning the generative process with both calligraphic style features and textual content. Our generator is guided by three complementary learning objectives: to produce realistic images, to imitate a certain handwriting style and to convey a specific textual content. Our model is unconstrained to any predefined vocabulary, being able to render whatever input word. Given a sample writer, it is also able to mimic its calligraphic features in a few-shot setup. We significantly advance over prior art and demonstrate with qualitative, quantitative and human-based evaluations the realistic aspect of our synthetically produced images.

11. Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning [PDF] Back to Contents
  Byungsoo Ko, Geonmo Gu
Abstract: Learning the distance metric between pairs of samples has been studied for image retrieval and clustering. With the remarkable success of pair-based metric learning losses, recent works have proposed the use of generated synthetic points on metric learning losses for augmentation and generalization. However, these methods require additional generative networks along with the main network, which can lead to a larger model size, slower training speed, and harder optimization. Meanwhile, post-processing techniques, such as query expansion and database augmentation, have proposed the combination of feature points to obtain additional semantic information. In this paper, inspired by query expansion and database augmentation, we propose an augmentation method in an embedding space for pair-based metric learning losses, called embedding expansion. The proposed method generates synthetic points containing augmented information by a combination of feature points and performs hard negative pair mining to learn with the most informative feature representations. Because of its simplicity and flexibility, it can be used for existing metric learning losses without affecting model size, training speed, or optimization difficulty. Finally, the combination of embedding expansion and representative metric learning losses outperforms the state-of-the-art losses and previous sample generation methods in both image retrieval and clustering tasks. The implementation will be publicly available.
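
The augmentation itself is easy to sketch: interpolate internal points between the embeddings of a positive pair, project them back onto the unit hypersphere, and let them take part in hard negative pair mining. An illustrative sketch; the number of points and the normalization follow common metric-learning practice rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def embedding_expansion(anchor, positive, n_points=2):
    """anchor, positive: (d,) L2-normalized embeddings of a positive pair."""
    # internal points on the segment between the pair (endpoints excluded)
    alphas = torch.linspace(0.0, 1.0, n_points + 2)[1:-1]
    synthetic = torch.stack([(1 - a) * anchor + a * positive for a in alphas])
    # project back onto the unit hypersphere used by the pair-based loss
    return F.normalize(synthetic, dim=1)
```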

12. A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation [PDF] Back to Contents
  Jian Liang, Yunbo Wang, Dapeng Hu, Ran He, Jiashi Feng
Abstract: This work addresses the unsupervised domain adaptation problem, especially for the partial scenario where the class labels in the target domain are only a subset of those in the source domain. Such a partial transfer setting sounds realistic but challenging while existing methods always suffer from two key problems, i.e., negative transfer and uncertainty propagation. In this paper, we build on domain adversarial learning and propose a novel domain adaptation method BA$^3$US with two new techniques termed Balanced Adversarial Alignment (BAA) and Adaptive Uncertainty Suppression (AUS), respectively. On one hand, negative transfer results in that target samples are misclassified to the classes only present in the source domain. To address this issue, BAA aims to pursue the balance between label distributions across domains in a quite simple manner. Specifically, it randomly leverages a few source samples to augment the smaller target domain during domain alignment so that classes in different domains are symmetric. On the other hand, a source sample is denoted as uncertain if there is an incorrect class that has a relatively high prediction score. Such uncertainty is easily propagated to the unlabeled target data around it during alignment, which severely deteriorates the adaptation performance. Thus, AUS emphasizes uncertain samples and exploits an adaptive weighted complement entropy objective to expect that incorrect classes have the uniform and low prediction scores. Experimental results on multiple benchmarks demonstrate that BA$^3$US surpasses state-of-the-arts for partial domain adaptation tasks.

13. Detecting Attended Visual Targets in Video [PDF] Back to Contents
  Eunji Chong, Yongxin Wang, Nataniel Ruiz, James M. Rehg
Abstract: We address the problem of detecting attention targets in video. Specifically, our goal is to identify where each person in each frame of a video is looking, and correctly handle the out-of-frame case. Our novel architecture effectively models the dynamic interaction between the scene and head features in order to infer time-varying attention targets. We introduce a new dataset, VideoAttentionTarget, consisting of fully-annotated video clips containing complex and dynamic patterns of real-world gaze behavior. Experiments on this dataset show that our model can effectively infer attention in videos. To further demonstrate the utility of our approach, we apply our predicted attention maps to two social gaze behavior recognition tasks, and show that the resulting classifiers significantly outperform existing methods. We achieve state-of-the-art performance on three datasets: GazeFollow (static images), VideoAttentionTarget (videos), and VideoCoAtt (videos), and obtain the first results for automatically classifying clinically-relevant gaze behavior without wearable cameras or eye trackers.

14. Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization [PDF] Back to Contents
  Saehyung Lee, Hyungyu Lee, Sungroh Yoon
Abstract: Adversarial examples cause neural networks to produce incorrect outputs with high confidence. Although adversarial training is one of the most effective forms of defense against adversarial examples, unfortunately, a large gap exists between test accuracy and training accuracy in adversarial training. In this paper, we identify Adversarial Feature Overfitting (AFO), which may cause poor adversarially robust generalization, and we show that adversarial training can overshoot the optimal point in terms of robust generalization, leading to AFO in our simple Gaussian model. Considering these theoretical results, we present soft labeling as a solution to the AFO problem. Furthermore, we propose Adversarial Vertex mixup (AVmixup), a soft-labeled data augmentation approach for improving adversarially robust generalization. We complement our theoretical analysis with experiments on CIFAR10, CIFAR100, SVHN, and Tiny ImageNet, and show that AVmixup significantly improves the robust generalization performance and that it reduces the trade-off between standard accuracy and adversarial robustness.
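
Reading off the abstract, AVmixup replaces plain adversarial training with interpolation between a clean sample and an "adversarial vertex" (the adversarial perturbation scaled by a factor gamma), with correspondingly softened labels. A sketch of that sampling step; the smoothing strengths are illustrative, and delta is assumed to come from an attack such as PGD.

```python
import torch

def smooth(y_onehot, eps):
    """Standard label smoothing over K classes."""
    return (1 - eps) * y_onehot + eps / y_onehot.size(1)

def avmixup(x, delta, y_onehot, gamma=2.0, eps_clean=0.1, eps_adv=0.5):
    """x: (B, C, H, W) clean batch; delta: adversarial perturbations for x."""
    x_vertex = x + gamma * delta                       # adversarial vertex
    t = torch.rand(x.size(0), device=x.device)
    x_mix = (1 - t.view(-1, 1, 1, 1)) * x + t.view(-1, 1, 1, 1) * x_vertex
    y_mix = ((1 - t.view(-1, 1)) * smooth(y_onehot, eps_clean)
             + t.view(-1, 1) * smooth(y_onehot, eps_adv))
    return x_mix, y_mix                                # train with soft-label CE
```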

15. Fake Generated Painting Detection via Frequency Analysis [PDF] Back to Contents
  Yong Bai, Yuanfang Guo, Jinjie Wei, Lin Lu, Rui Wang, Yunhong Wang
Abstract: With the development of deep neural networks, digital fake paintings can be generated by various style transfer methods. To detect the fake generated paintings, we analyze the fake generated and real paintings in the Fourier frequency domain and observe statistical differences and artifacts. Based on our observations, we propose Fake Generated Painting Detection via Frequency Analysis (FGPD-FA) by extracting three types of features in the frequency domain. Besides, we also propose a digital fake painting detection database for assessing the proposed method. Experimental results demonstrate the excellence of the proposed method in different testing conditions.
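
The abstract does not spell out the three frequency-domain features, but the general recipe behind such detectors (inspect the Fourier spectrum for generator artifacts) can be illustrated with a radially averaged log-magnitude spectrum, a common hand-crafted frequency descriptor. This specific descriptor is an assumption for illustration, not necessarily one of FGPD-FA's features.

```python
import numpy as np

def radial_log_spectrum(img_gray, n_bins=64):
    """img_gray: 2-D float array; returns a 1-D radial frequency profile."""
    spec = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img_gray))))
    h, w = spec.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2.0, xx - w / 2.0)           # distance from DC
    bins = np.digitize(r.ravel(), np.linspace(0.0, r.max(), n_bins + 1)) - 1
    bins = bins.clip(0, n_bins - 1)
    total = np.bincount(bins, weights=spec.ravel(), minlength=n_bins)
    count = np.bincount(bins, minlength=n_bins)
    return total / np.maximum(count, 1)                # mean log-magnitude per radius
```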

16. Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications [PDF] Back to Contents
  Chinthaka Gamanayake, Lahiru Jayasinghe, Benny Ng, Chau Yuen
Abstract: Even though Convolutional Neural Networks (CNNs) have shown superior results in the field of computer vision, it is still a challenging task to implement computer vision algorithms in real-time at the edge, especially using a low-cost IoT device, due to high memory consumption and computation complexities in a CNN. Network compression methodologies such as weight pruning, filter pruning, and quantization are used to overcome the above mentioned problem. Even though filter pruning methodology has shown better performance compared to other techniques, irregularity of the number of filters pruned across different layers of a CNN might not comply with the majority of neural computing hardware architectures. In this paper, a novel greedy approach called cluster pruning has been proposed, which provides a structured way of removing filters in a CNN by considering the importance of filters and the underlying hardware architecture. The proposed methodology is compared with the conventional filter pruning algorithm on the Pascal-VOC open dataset and the Head-Counting dataset, which is our own dataset developed to detect and count people entering a room. We benchmark our proposed method on three hardware architectures, namely CPU, GPU, and Intel Movidius Neural Compute Stick (NCS), using the popular SSD-MobileNet and SSD-SqueezeNet neural network architectures used for edge-AI vision applications. Results demonstrate that our method outperforms the conventional filter pruning methodology, using both datasets on the above mentioned hardware architectures. Furthermore, a low cost IoT hardware setup consisting of an Intel Movidius-NCS is proposed to deploy an edge-AI application using our proposed pruning methodology.

17. End-to-End Trainable One-Stage Parking Slot Detection Integrating Global and Local Information [PDF] Back to Contents
  Jae Kyu Suhr, Ho Gi Jung
Abstract: This paper proposes an end-to-end trainable one-stage parking slot detection method for around view monitor (AVM) images. The proposed method simultaneously acquires global information (entrance, type, and occupancy of parking slot) and local information (location and orientation of junction) by using a convolutional neural network (CNN), and integrates them to detect parking slots with their properties. This method divides an AVM image into a grid and performs a CNN-based feature extraction. For each cell of the grid, the global and local information of the parking slot is obtained by applying convolution filters to the extracted feature map. Final detection results are produced by integrating the global and local information of the parking slot through non-maximum suppression (NMS). Since the proposed method obtains most of the information of the parking slot using a fully convolutional network without a region proposal stage, it is an end-to-end trainable one-stage detector. In experiments, this method was quantitatively evaluated using the public dataset and outperforms previous methods by showing both recall and precision of 99.77%, type classification accuracy of 100%, and occupancy classification accuracy of 99.31% while processing 60 frames per second.

18. Drone Based RGBT Vehicle Detection and Counting: A Challenge [PDF] Back to Contents
  Pengfei Zhu, Yiming Sun, Longyin Wen, Yu Feng, Qinghua Hu
Abstract: Camera-equipped drones can capture targets on the ground from a wider field of view than static cameras or moving sensors over the ground. In this paper we present a large-scale vehicle detection and counting benchmark, named DroneVehicle, aiming at advancing visual analysis tasks on the drone platform. The images in the benchmark were captured over various urban areas, which include different types of urban roads, residential areas, parking lots, highways, etc., from day to night. Specifically, DroneVehicle consists of 15,532 pairs of images, i.e., RGB images and infrared images with rich annotations, including oriented object bounding boxes, object categories, etc. With intensive amount of effort, our benchmark has 441,642 annotated instances in 31,064 images. As a large-scale dataset with both RGB and thermal infrared (RGBT) images, the benchmark enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. In particular, we design two popular tasks with the benchmark, including object detection and object counting. All these tasks are extremely challenging in the proposed dataset due to factors such as illumination, occlusion, and scale variations. We hope the benchmark largely boost the research and development in visual analysis on drone platforms. The DroneVehicle dataset can be download from this https URL.

19. Who Make Drivers Stop? Towards Driver-centric Risk Assessment: Risk Object Identification via Causal Inference [PDF] Back to Contents
  Chengxi Li, Stanley H. Chan, Yi-Ting Chen
Abstract: We propose a framework based on causal inference for risk object identification, an essential task towards driver-centric risk assessment. In this work, risk objects are defined as objects influencing driver's goal-oriented behavior. There are two limitations of the existing approaches. First, they require strong supervisions such as risk object location or human gaze location. Second, there is no explicit reasoning stage for identifying risk object. To address these issues, the task of identifying causes of driver behavioral change is formalized in the language of functional causal models and interventions. Specifically, we iteratively simulate causal effect by removing an object using the proposed driving model. The risk object is determined as the one causing the most substantial causal effect. We evaluate the proposed framework on the Honda Research Institute Driving Dataset (HDD). The dataset provides the annotation for risk object localization to enable systematic benchmarking with existing approaches. Our framework demonstrates a substantial average performance boost over a strong baseline by 7.5%.

20. A Benchmark for LiDAR-based Panoptic Segmentation based on KITTI [PDF] Back to Contents
  Jens Behley, Andres Milioto, Cyrill Stachniss
Abstract: Panoptic segmentation is the recently introduced task that tackles semantic segmentation and instance segmentation jointly. In this paper, we present an extension of SemanticKITTI, which is a large-scale dataset providing dense point-wise semantic labels for all sequences of the KITTI Odometry Benchmark, for training and evaluation of laser-based panoptic segmentation. We provide the data and discuss the processing steps needed to enrich a given semantic annotation with temporally consistent instance information, i.e., instance information that supplements the semantic labels and identifies the same instance over sequences of LiDAR point clouds. Additionally, we present two strong baselines that combine state-of-the-art LiDAR-based semantic segmentation approaches with a state-of-the-art detector enriching the segmentation with instance information and that allow other researchers to compare their approaches against. We hope that our extension of SemanticKITTI with strong baselines enables the creation of novel algorithms for LiDAR-based panoptic segmentation as much as it has for the original semantic segmentation and semantic scene completion tasks. Data, code, and an online evaluation using a hidden test set will be published on this http URL.
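
For context, panoptic segmentation benchmarks of this kind are typically scored with the panoptic quality (PQ) metric of Kirillov et al., which jointly measures segmentation and recognition quality over matched segment pairs; a prediction p and a ground-truth segment g count as a true positive when IoU(p, g) > 0.5. The abstract does not name its metric, so this is standard context rather than the paper's definition:

$$\mathrm{PQ} = \frac{\sum_{(p,g)\in TP}\mathrm{IoU}(p,g)}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|}$$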

21. Towards Fair Cross-Domain Adaptation via Generative Learning [PDF] Back to Contents
  Tongxin Wang, Zhengming Ding, Wei Shao, Haixu Tang, Kun Huang
Abstract: Domain Adaptation (DA) targets at adapting a model trained over the well-labeled source domain to the unlabeled target domain lying in different distributions. Existing DA normally assumes the well-labeled source domain is class-wise balanced, which means the size per source class is relatively similar. However, in real-world applications, labeled samples for some categories in the source domain could be extremely few due to the difficulty of data collection and annotation, which leads to decreasing performance over target domain on those few-shot categories. To perform fair cross-domain adaptation and boost the performance on these minority categories, we develop a novel Generative Few-shot Cross-domain Adaptation (GFCA) algorithm for fair cross-domain classification. Specifically, generative feature augmentation is explored to synthesize effective training data for few-shot source classes, while effective cross-domain alignment aims to adapt knowledge from source to facilitate the target learning. Experimental results on two large cross-domain visual datasets demonstrate the effectiveness of our proposed method on improving both few-shot and overall classification accuracy comparing with the state-of-the-art DA approaches.

22. Creating High Resolution Images with a Latent Adversarial Generator [PDF] Back to Contents
  David Berthelot, Peyman Milanfar, Ian Goodfellow
Abstract: Generating realistic images is difficult, and many formulations for this task have been proposed recently. If we restrict the task to that of generating a particular class of images, however, the task becomes more tractable. That is to say, instead of generating an arbitrary image as a sample from the manifold of natural images, we propose to sample images from a particular "subspace" of natural images, directed by a low-resolution image from the same subspace. The problem we address, while close to the formulation of the single-image super-resolution problem, is in fact rather different. Single image super-resolution is the task of predicting the image closest to the ground truth from a relatively low resolution image. We propose to produce samples of high resolution images given extremely small inputs with a new method called Latent Adversarial Generator (LAG). In our generative sampling framework, we only use the input (possibly of very low-resolution) to direct what class of samples the network should produce. As such, the output of our algorithm is not a unique image that relates to the input, but rather a possible set of related images sampled from the manifold of natural images. Our method learns exclusively in the latent space of the adversary using perceptual loss -- it does not have a pixel loss.

23. Learning View and Target Invariant Visual Servoing for Navigation [PDF] Back to Contents
  Yimeng Li, Jana Kosecka
Abstract: The advances in deep reinforcement learning recently revived interest in data-driven learning based approaches to navigation. In this paper we propose to learn viewpoint invariant and target invariant visual servoing for local mobile robot navigation; given an initial view and the goal view or an image of a target, we train deep convolutional network controller to reach the desired goal. We present a new architecture for this task which rests on the ability of establishing correspondences between the initial and goal view and novel reward structure motivated by the traditional feedback control error. The advantage of the proposed model is that it does not require calibration and depth information and achieves robust visual servoing in a variety of environments and targets without any parameter fine tuning. We present comprehensive evaluation of the approach and comparison with other deep learning architectures as well as classical visual servoing methods in visually realistic simulation environment. The presented model overcomes the brittleness of classical visual servoing based methods and achieves significantly higher generalization capability compared to the previous learning approaches.

24. The Impact of Hole Geometry on Relative Robustness of In-Painting Networks: An Empirical Study [PDF] Back to Contents
  Masood S. Mortazavi, Ning Yan
Abstract: In-painting networks use existing pixels to generate appropriate pixels to fill "holes" placed on parts of an image. A 2-D in-painting network's input usually consists of (1) a three-channel 2-D image, and (2) an additional channel for the "holes" to be in-painted in that image. In this paper, we study the robustness of a given in-painting neural network against variations in hole geometry distributions. We observe that the robustness of an in-painting network is dependent on the probability distribution function (PDF) of the hole geometry presented to it during its training even if the underlying image dataset used (in training and testing) does not alter. We develop an experimental methodology for testing and evaluating relative robustness of in-painting networks against four different kinds of hole geometry PDFs. We examine a number of hypotheses regarding (1) the natural bias of in-painting networks to the hole distribution used for their training, (2) the underlying dataset's ability to differentiate relative robustness as hole distributions vary in a train-test (cross-comparison) grid, and (3) the impact of the directional distribution of edges in the holes and in the image dataset. We present results for L1, PSNR and SSIM quality metrics and develop a specific measure of relative in-painting robustness to be used in cross-comparison grids based on these quality metrics. (One can incorporate other quality metrics in this relative measure.) The empirical work reported here is an initial step in a broader and deeper investigation of "filling the blank" neural networks' sensitivity, robustness and regularization with respect to hole "geometry" PDFs, and it suggests further research in this domain.

25. Exploring Partial Intrinsic and Extrinsic Symmetry in 3D Medical Imaging [PDF] Back to Contents
  Javad Fotouhi, Giacomo Taylor, Mathias Unberath, Alex Johnson, Sing Chun Lee, Greg Osgood, Mehran Armand, Nassir Navab
Abstract: We present a novel methodology to detect imperfect bilateral symmetry in CT of human anatomy. In this paper, the structurally symmetric nature of the pelvic bone is explored and is used to provide interventional image augmentation for treatment of unilateral fractures in patients with traumatic injuries. The mathematical basis of our solution is on the incorporation of attributes and characteristics that satisfy the properties of intrinsic and extrinsic symmetry and are robust to outliers. In the first step, feature points that satisfy intrinsic symmetry are automatically detected in the Möbius space defined on the CT data. These features are then pruned via a two-stage RANSAC to attain correspondences that satisfy also the extrinsic symmetry. Then, a disparity function based on Tukey's biweight robust estimator is introduced and minimized to identify a symmetry plane parametrization that yields maximum contralateral similarity. Finally, a novel regularization term is introduced to enhance similarity between bone density histograms across the partial symmetry plane, relying on the important biological observation that, even if injured, the dislocated bone segments remain within the body. Our extensive evaluations on various cases of common fracture types demonstrate the validity of the novel concepts and the robustness and accuracy of the proposed method.

26. Event-Based Angular Velocity Regression with Spiking Networks [PDF] Back to Contents
  Mathias Gehrig, Sumit Bam Shrestha, Daniel Mouritzen, Davide Scaramuzza
Abstract: Spiking Neural Networks (SNNs) are bio-inspired networks that process information conveyed as temporal spikes rather than numeric values. A spiking neuron of an SNN only produces a spike whenever a significant number of spikes occur within a short period of time. Due to their spike-based computational model, SNNs can process output from event-based, asynchronous sensors without any pre-processing at extremely lower power unlike standard artificial neural networks. This is possible due to specialized neuromorphic hardware that implements the highly-parallelizable concept of SNNs in silicon. Yet, SNNs have not enjoyed the same rise of popularity as artificial neural networks. This not only stems from the fact that their input format is rather unconventional but also due to the challenges in training spiking networks. Despite their temporal nature and recent algorithmic advances, they have been mostly evaluated on classification problems. We propose, for the first time, a temporal regression problem of numerical values given events from an event camera. We specifically investigate the prediction of the 3-DOF angular velocity of a rotating event camera with an SNN. The difficulty of this problem arises from the prediction of angular velocities continuously in time directly from irregular, asynchronous event-based input. Directly utilising the output of event cameras without any pre-processing ensures that we inherit all the benefits that they provide over conventional cameras. That is high-temporal resolution, high-dynamic range and no motion blur. To assess the performance of SNNs on this task, we introduce a synthetic event camera dataset generated from real-world panoramic images and show that we can successfully train an SNN to perform angular velocity regression.

27. Dimensionality Reduction and Motion Clustering during Activities of Daily Living: 3, 4, and 7 Degree-of-Freedom Arm Movements [PDF] Back to Contents
  Yuri Gloumakov, Adam J. Spiers, Aaron M. Dollar
Abstract: The wide variety of motions performed by the human arm during daily tasks makes it desirable to find representative subsets to reduce the dimensionality of these movements for a variety of applications, including the design and control of robotic and prosthetic devices. This paper presents a novel method and the results of an extensive human subjects study to obtain representative arm joint angle trajectories that span naturalistic motions during Activities of Daily Living (ADLs). In particular, we seek to identify sets of useful motion trajectories of the upper limb that are functions of a single variable, allowing, for instance, an entire prosthetic or robotic arm to be controlled with a single input from a user, along with a means to select between motions for different tasks. Data driven approaches are used to obtain clusters as well as representative motion averages for the full-arm 7 degree of freedom (DOF), elbow-wrist 4 DOF, and wrist-only 3 DOF motions. The proposed method makes use of well-known techniques such as dynamic time warping (DTW) to obtain a divergence measure between motion segments, DTW barycenter averaging (DBA) to obtain averages, Ward's distance criterion to build hierarchical trees, batch-DTW to simultaneously align multiple motion data, and functional principal component analysis (fPCA) to evaluate cluster variability. The clusters that emerge associate various recorded motions into primarily hand start and end location for the full-arm system, motion direction for the wrist-only system, and an intermediate between the two qualities for the elbow-wrist system. The proposed clustering methodology is justified by comparing results against alternative approaches.
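
The divergence at the heart of this pipeline, dynamic time warping, is a short dynamic program; barycenter averaging (DBA) and Ward's clustering are then built on top of this pairwise measure. A minimal reference implementation of the distance itself:

```python
import numpy as np

def dtw_distance(a, b):
    """a: (n, d) and b: (m, d) joint-angle trajectories; returns the DTW cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```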

28. Learning the sense of touch in simulation: a sim-to-real strategy for vision-based tactile sensing [PDF] Back to Contents
  Carmelo Sferrazza, Thomas Bi, Raffaello D'Andrea
Abstract: Data-driven approaches to tactile sensing aim to overcome the complexity of accurately modeling contact with soft materials. However, their widespread adoption is impaired by concerns about data efficiency and the capability to generalize when applied to various tasks. This paper focuses on both these aspects with regard to a vision-based tactile sensor, which aims to reconstruct the distribution of the three-dimensional contact forces applied on its soft surface. Accurate models for the soft materials and the camera projection, derived via state-of-the-art techniques in the respective domains, are employed to generate a dataset in simulation. A strategy is proposed to train a tailored deep neural network entirely from the simulation data. The resulting learning architecture is directly transferable across multiple tactile sensors without further training and yields accurate predictions on real data, while showing promising generalization capabilities to unseen contact conditions.

29. Demographic Bias in Biometrics: A Survey on an Emerging Challenge [PDF] Back to Contents
  P. Drozdowski, C. Rathgeb, A. Dantcheva, N. Damer, C. Busch
Abstract: Systems incorporating biometric technologies have become ubiquitous in personal, commercial, and governmental identity management applications. Both cooperative (e.g. access control) and non-cooperative (e.g. surveillance and forensics) systems have benefited from biometrics. Such systems rely on the uniqueness of certain biological or behavioural characteristics of human beings, which enable for individuals to be reliably recognised using automated algorithms. Recently, however, there has been a wave of public and academic concerns regarding the existence of systemic bias in automated decision systems (including biometrics). Most prominently, face recognition algorithms have often been labelled as "racist" or "biased" by the media, non-governmental organisations, and researchers alike. The main contributions of this article are: (1) an overview of the topic of algorithmic bias in the context of biometrics, (2) a comprehensive survey of the existing literature on biometric bias estimation and mitigation, (3) a discussion of the pertinent technical and social matters, and (4) an outline of the remaining challenges and future work items, both from technological and social points of view.

30. Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family [PDF] Back to Contents
  Frank Nielsen, Richard Nock
Abstract: It is well-known that the Bhattacharyya, Hellinger, Kullback-Leibler, $\alpha$-divergences, and Jeffreys' divergences between densities belonging to a same exponential family have generic closed-form formulas relying on the strictly convex and real-analytic cumulant function characterizing the exponential family. In this work, we report (dis)similarity formulas which bypass the explicit use of the cumulant function and highlight the role of quasi-arithmetic means and their multivariate mean operator extensions. In practice, these cumulant-free formulas are handy when implementing these (dis)similarities using legacy Application Programming Interfaces (APIs) since our method requires only to partially factorize the densities canonically of the considered exponential family.
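
As a concrete instance of the kind of closed form the paper targets: for two univariate Gaussians (a member of the exponential family), the Bhattacharyya distance has the classical expression

$$D_B\left(\mathcal{N}(\mu_1,\sigma_1^2),\,\mathcal{N}(\mu_2,\sigma_2^2)\right) = \frac{(\mu_1-\mu_2)^2}{4(\sigma_1^2+\sigma_2^2)} + \frac{1}{2}\ln\frac{\sigma_1^2+\sigma_2^2}{2\sigma_1\sigma_2},$$

written directly in the source parameters with no reference to the cumulant function; the paper derives formulas of this flavor generically across exponential families.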

31. Harnessing Multi-View Perspective of Light Fields for Low-Light Imaging [PDF] Back to Contents
  Mohit Lamba, Kranthi Kumar, Kaushik Mitra
Abstract: Light Field (LF) offers unique advantages such as post-capture refocusing and depth estimation, but low-light conditions limit these capabilities. To restore low-light LFs we should harness the geometric cues present in different LF views, which is not possible using single-frame low-light enhancement techniques. We, therefore, propose a deep neural network for Low-Light Light Field (L3F) restoration, which we refer to as L3Fnet. The proposed L3Fnet not only performs the necessary visual enhancement of each LF view but also preserves the epipolar geometry across views. We achieve this by adopting a two-stage architecture for L3Fnet. Stage-I looks at all the LF views to encode the LF geometry. This encoded information is then used in Stage-II to reconstruct each LF view. To facilitate learning-based techniques for low-light LF imaging, we collected a comprehensive LF dataset of various scenes. For each scene, we captured four LFs, one with near-optimal exposure and ISO settings and the others at different levels of low-light conditions varying from low to extreme low-light settings. The effectiveness of the proposed L3Fnet is supported by both visual and numerical comparisons on this dataset. To further analyze the performance of low-light reconstruction methods, we also propose an L3F-wild dataset that contains LF captured late at night with almost zero lux values. No ground truth is available in this dataset. To perform well on the L3F-wild dataset, any method must adapt to the light level of the captured scene. To do this we propose a novel pre-processing block that makes L3Fnet robust to various degrees of low-light conditions. Lastly, we show that L3Fnet can also be used for low-light enhancement of single-frame images, despite it being engineered for LF data. We do so by converting the single-frame DSLR image into a form suitable to L3Fnet, which we call as pseudo-LF.

32. Team O2AS at the World Robot Summit 2018: An Approach to Robotic Kitting and Assembly Tasks using General Purpose Grippers and Tools [PDF] Back to Contents
  Felix von Drigalski, Chisato Nakashima, Yoshiya Shibata, Yoshinori Konishi, Joshua C. Triyonoputro, Kaidi Nie, Damien Petit, Toshio Ueshiba, Ryuichi Takase, Yukiyasu Domae, Taku Yoshioka, Yoshihisa Ijiri, Ixchel G. Ramirez-Alpizar, Weiwei Wan, Kensuke Harada
Abstract: We propose a versatile robotic system for kitting and assembly tasks which uses no jigs or commercial tool changers. Instead of specialized end effectors, it uses its two-finger grippers to grasp and hold tools to perform subtasks such as screwing and suctioning. A third gripper is used as a precision picking and centering tool, and uses in-built passive compliance to compensate for small position errors and uncertainty. A novel grasp point detection for bin picking is described for the kitting task, using a single depth map. Using the proposed system we competed in the Assembly Challenge of the Industrial Robotics Category of the World Robot Challenge at the World Robot Summit 2018, obtaining 4th place and the SICE award for lean design and versatile tool use. We show the effectiveness of our approach through experiments performed during the competition.
