Contents
3. Comprehensive Semantic Segmentation on High Resolution Aerial Imagery for Natural Disaster Assessment
4. Evaluation of Deep Convolutional Generative Adversarial Networks for data augmentation of chest X-ray images
6. Semantically Adaptive Image-to-image Translation for Domain Adaptation of Semantic Segmentation
14. Exploiting Latent Codes: Interactive Fashion Product Generation, Similar Image Retrieval, and Cross-Category Recommendation using Variational Autoencoders
17. Deep Generative Model for Image Inpainting with Local Binary Pattern Learning and Spatial Attention
22. Unsupervised Feature Learning by Autoencoder and Prototypical Contrastive Learning for Hyperspectral Classification
24. 3D Facial Geometry Recovery from a Depth View with Attention Guided Generative Adversarial Network
27. Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances
38. On the Structures of Representation for the Robustness of Semantic Segmentation to Input Corruption
43. Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training
54. The Effect of Various Strengths of Noises and Data Augmentations on Classification of Short Single-Lead ECG Signals Using Deep Neural Networks
60. Efficient, high-performance pancreatic segmentation using multi-scale feature extraction
62. Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance
63. On Open and Strong-Scaling Tools for Atom Probe Crystallography: High-Throughput Methods for Indexing Crystal Structure and Orientation
64. Applying a random projection algorithm to optimize machine learning model for predicting peritoneal metastasis in gastric cancer patients using CT images
Abstracts
1. Lunar Crater Identification in Digital Images
John A. Christian, Harm Derksen, Ryan Watkins
Abstract: It is often necessary to identify a pattern of observed craters in a single image of the lunar surface and without any prior knowledge of the camera's location. This so-called "lost-in-space" crater identification problem is common in both crater-based terrain relative navigation (TRN) and in automatic registration of scientific imagery. Past work on crater identification has largely been based on heuristic schemes, with poor performance outside of a narrowly defined operating regime (e.g., nadir pointing images, small search areas). This work provides the first mathematically rigorous treatment of the general crater identification problem. It is shown when it is (and when it is not) possible to recognize a pattern of elliptical crater rims in an image formed by perspective projection. For the cases when it is possible to recognize a pattern, descriptors are developed using invariant theory that provably capture all of the viewpoint invariant information. These descriptors may be pre-computed for known crater patterns and placed in a searchable index for fast recognition. New techniques are also developed for computing pose from crater rim observations and for evaluating crater rim correspondences. These techniques are demonstrated on both synthetic and real images.
2. Seeing wake words: Audio-visual Keyword Spotting
Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman
Abstract: The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for in the wild videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a similarity map intermediate representation to separate the task into (i) sequence matching, and (ii) pattern detection, to decide whether the word is there and when; (2) we demonstrate that if audio is available, visual keyword spotting improves the performance both for a clean and noisy audio signal. Finally, (3) we show that our method generalises to other languages, specifically French and German, and achieves a comparable performance to English with less language specific data, by fine-tuning the network pre-trained on English. The method exceeds the performance of the previous state-of-the-art visual keyword spotting architecture when trained and tested on the same benchmark, and also that of a state-of-the-art lip reading method.
3. Comprehensive Semantic Segmentation on High Resolution Aerial Imagery for Natural Disaster Assessment
Maryam Rahnemoonfar, Tashnim Chowdhury, Robin Murphy, Odair Fernandes
Abstract: In this paper, we present a large-scale hurricane Michael dataset for visual perception in disaster scenarios, and analyze state-of-the-art deep neural network models for semantic segmentation. The dataset consists of around 2000 high-resolution aerial images, with annotated ground-truth data for semantic segmentation. We discuss the challenges of the dataset and train the state-of-the-art methods on this dataset to evaluate how well these methods can recognize the disaster situations. Finally, we discuss challenges for future research.
4. Evaluation of Deep Convolutional Generative Adversarial Networks for data augmentation of chest X-ray images
Sagar Kora Venu
Abstract: Medical image datasets are usually imbalanced, due to the high costs of obtaining the data and time-consuming annotations. Training deep neural network models on such datasets to accurately classify the medical condition does not yield desired results and often over-fits the data on majority class samples. In order to address this issue, data augmentation is often performed on training data by position augmentation techniques such as scaling, cropping, flipping, padding, rotation, translation, affine transformation, and color augmentation techniques such as brightness, contrast, saturation, and hue to increase the dataset sizes. These augmentation techniques are not guaranteed to be advantageous in domains with limited data, especially medical image data, and could lead to further overfitting. In this work, we performed data augmentation on the Chest X-rays dataset through generative modeling (deep convolutional generative adversarial network), which creates artificial instances retaining similar characteristics to the original data; evaluation of the model resulted in a Fréchet Inception Distance (FID) score of 1.289.
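For readers unfamiliar with the FID score quoted above, the following is a minimal sketch of the underlying Fréchet distance between two Gaussians fitted to feature vectors. It is not the authors' evaluation code. It assumes the feature vectors have already been extracted (the real metric uses Inception network activations), and it simplifies the covariance to a diagonal one so that only the standard library is needed; the full metric uses full covariance matrices and a matrix square root. The function name `frechet_distance_diag` is hypothetical.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_diag(feats_a, feats_b):
    """Squared Frechet distance between two diagonal Gaussians.

    Each argument is a list of equal-length feature vectors. A Gaussian is
    fitted per dimension (diagonal covariance, a simplifying assumption), so
    the distance reduces to a sum of (mu_a - mu_b)^2 + (sd_a - sd_b)^2.
    """
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [row[d] for row in feats_a]
        col_b = [row[d] for row in feats_b]
        mu_a, mu_b = fmean(col_a), fmean(col_b)
        sd_a = math.sqrt(pvariance(col_a))
        sd_b = math.sqrt(pvariance(col_b))
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total


# Toy "real" and "generated" features (made-up numbers for illustration).
real = [[0.0, 1.0], [2.0, 3.0]]
fake = [[1.0, 1.0], [3.0, 3.0]]
score = frechet_distance_diag(real, fake)
```

Lower values mean the two feature distributions are closer, and identical feature sets give a distance of zero.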
5. Transform Quantization for CNN Compression
Sean I. Young, Wang Zhe, David Taubman, Bernd Girod
Abstract: In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
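As a rough illustration of what bit-depth allocation after decorrelation can look like, the sketch below applies the textbook high-resolution rate-distortion rule for independent Gaussian components: each component receives the average bit budget plus half the base-2 log of its variance relative to the geometric mean of all variances. This is the classical formula, not the allocation or the ELT derived in the paper, and the function name `allocate_bits` and the example variances are assumptions made for illustration.

```python
import math


def allocate_bits(variances, avg_bits):
    """Classical bit allocation for decorrelated components.

    b_i = avg_bits + 0.5 * log2(var_i / geometric_mean(variances))

    Variances must be positive. The result can be fractional or negative for
    very low-variance components; practical schemes clamp and redistribute,
    which this sketch deliberately leaves out.
    """
    log_gm = sum(math.log2(v) for v in variances) / len(variances)
    return [avg_bits + 0.5 * (math.log2(v) - log_gm) for v in variances]


# Two decorrelated components with made-up variances and a 2-bit average budget.
bits = allocate_bits([16.0, 1.0], avg_bits=2.0)
```

High-variance components get more bits than low-variance ones, while the allocations still sum to the total budget (number of components times `avg_bits`).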
6. Semantically Adaptive Image-to-image Translation for Domain Adaptation of Semantic Segmentation
Luigi Musto, Andrea Zinelli
Abstract: Domain shift is a very challenging problem for semantic segmentation. Any model can be easily trained on synthetic data, where images and labels are artificially generated, but it will perform poorly when deployed on real environments. In this paper, we address the problem of domain adaptation for semantic segmentation of street scenes. Many state-of-the-art approaches focus on translating the source image while imposing that the result should be semantically consistent with the input. However, we advocate that the image semantics can also be exploited to guide the translation algorithm. To this end, we rethink the generative model to enforce this assumption and strengthen the connection between pixel-level and feature-level domain alignment. We conduct extensive experiments by training common semantic segmentation models with our method and show that the results we obtain on the synthetic-to-real benchmarks surpass the state-of-the-art.
7. Local-HDP: Interactive Open-Ended 3D Object Categorization
H. Ayoobi, H. Kasaei, M. Cao, R. Verbrugge, B. Verheij
Abstract: We introduce a non-parametric hierarchical Bayesian approach for open-ended 3D object categorization, named the Local Hierarchical Dirichlet Process (Local-HDP). This method allows an agent to learn independent topics for each category incrementally and to adapt to the environment in time. Hierarchical Bayesian approaches like Latent Dirichlet Allocation (LDA) can transform low-level features to high-level conceptual topics for 3D object categorization. However, the efficiency and accuracy of LDA-based approaches depend on the number of topics that is chosen manually. Moreover, fixing the number of topics for all categories can lead to overfitting or underfitting of the model. In contrast, the proposed Local-HDP can autonomously determine the number of topics for each category. Furthermore, an inference method is proposed that results in a fast posterior approximation. Experiments show that Local-HDP outperforms other state-of-the-art approaches in terms of accuracy, scalability, and memory efficiency with a large margin.
8. Long-Term Anticipation of Activities with Cycle Consistency
Yazan Abu Farha, Qiuhong Ke, Bernt Schiele, Juergen Gall
Abstract: With the success of deep learning methods in analyzing activities in videos, more attention has recently been focused towards anticipating future activities. However, most of the work on anticipation either analyzes a partially observed activity or predicts the next action class. Recently, new approaches have been proposed to extend the prediction horizon up to several minutes in the future and that anticipate a sequence of future activities including their durations. While these works decouple the semantic interpretation of the observed sequence from the anticipation task, we propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion. Furthermore, we introduce a cycle consistency loss over time by predicting the past activities given the predicted future. Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.
9. Lifelong Object Detection
Wang Zhou, Shiyu Chang, Norma Sosa, Hendrik Hamann, David Cox
Abstract: Recent advances in object detection have benefited significantly from rapid developments in deep neural networks. However, neural networks suffer from the well-known issue of catastrophic forgetting, which makes continual or lifelong learning problematic. In this paper, we leverage the fact that new training classes arrive in a sequential manner and incrementally refine the model so that it additionally detects new object classes in the absence of previous training data. Specifically, we consider the representative object detector, Faster R-CNN, for both accurate and efficient prediction. To prevent abrupt performance degradation due to catastrophic forgetting, we propose to apply knowledge distillation on both the region proposal network and the region classification network, to retain the detection of previously trained classes. A pseudo-positive-aware sampling strategy is also introduced for distillation sample selection. We evaluate the proposed method on PASCAL VOC 2007 and MS COCO benchmarks and show competitive mAP and 6x inference speed improvement, which makes the approach more suitable for real-time applications. Our implementation will be publicly available.
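The abstract does not spell out the form of the distillation term, so the following is only a generic sketch of soft-target knowledge distillation: the previous model (teacher) and the updated model (student) each produce class logits for the same region, both are softened with a temperature, and the student is penalised by the KL divergence from the teacher's distribution. The function names, the temperature value, and the example logits are assumptions made for illustration, not details from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Made-up logits for one region over three previously learned classes.
old_model = [2.0, 0.5, -1.0]
new_model = [0.2, 1.5, -0.3]
loss = distillation_loss(old_model, new_model)
```

The loss is zero when the student reproduces the teacher's logits and grows as the student's predictions on old classes drift, which is the behaviour a forgetting penalty needs.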
10. Perceptual Deep Neural Networks: Adversarial Robustness through Input Recreation
Danilo Vasconcellos Vargas, Bingli Liao, Takahiro Kanzaki
Abstract: Adversarial examples have shown that albeit highly accurate, models learned by machines, differently from humans, have many weaknesses. However, humans' perception is also fundamentally different from machines, because we do not see the signals which arrive at the retina but a rather complex recreation of them. In this paper, we explore how machines could recreate the input as well as investigate the benefits of such an augmented perception. In this regard, we propose Perceptual Deep Neural Networks ($\varphi$DNN) which also recreate their own input before further processing. The concept is formalized mathematically and two variations of it are developed (one based on inpainting the whole image and the other based on a noisy resized super resolution recreation). Experiments reveal that $\varphi$DNNs can reduce attacks' accuracy substantially, surpassing even state-of-the-art defenses. Moreover, the recreation process intentionally corrupts the input image. Interestingly, we show by ablation tests that corrupting the input is, although counter-intuitive, beneficial. This suggests that the blind-spot in vertebrates might also be, analogously, the precursor of visual robustness. Thus, $\varphi$DNNs reveal that input recreation has strong benefits for artificial neural networks similar to biological ones, shedding light into the importance of the blind-spot and starting an area of perception models for robust recognition in artificial intelligence.
11. Face Image Quality Assessment: A Literature Survey
Torsten Schlett, Christian Rathgeb, Olaf Henniger, Javier Galbally, Julian Fierrez, Christoph Busch
Abstract: The performance of face analysis and recognition systems depends on the quality of the acquired face data, which is influenced by numerous factors. Automatically assessing the quality of face data in terms of biometric utility can thus be useful to filter out low quality data. This survey provides an overview of the face quality assessment literature in the framework of face biometrics, with a focus on face recognition based on visible wavelength face images as opposed to e.g. depth or infrared quality assessment. A trend towards deep learning based methods is observed, including notable conceptual differences among the recent approaches. Besides image selection, face image quality assessment can also be used in a variety of other application scenarios, which are discussed herein. Open issues and challenges are pointed out, i.a. highlighting the importance of comparability for algorithm evaluations, and the challenge for future work to create deep learning approaches that are interpretable in addition to providing accurate utility predictions.
12. Unsupervised Domain Adaptation For Plant Organ Counting
Tewodros Ayalew, Jordan Ubbens, Ian Stavness
Abstract: Supervised learning is often used to count objects in images, but for counting small, densely located objects, the required image annotations are burdensome to collect. Counting plant organs for image-based plant phenotyping falls within this category. Object counting in plant images is further challenged by having plant image datasets with significant domain shift due to different experimental conditions, e.g. applying an annotated dataset of indoor plant images for use on outdoor images, or on a different plant species. In this paper, we propose a domain-adversarial learning approach for domain adaptation of density map estimation for the purposes of object counting. The approach does not assume perfectly aligned distributions between the source and target datasets, which makes it more broadly applicable within general object counting and plant organ counting tasks. Evaluation on two diverse object counting tasks (wheat spikelets, leaves) demonstrates consistent performance on the target datasets across different classes of domain shift: from indoor-to-outdoor images and from species-to-species adaptation.
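The abstract above relies on density map estimation for counting. As a rough illustration of that general idea only (not the authors' domain-adversarial model), the sketch below builds a density map from point annotations and recovers the count by summing it. The function names, the Gaussian width `sigma`, and the sample coordinates are hypothetical, and the points are assumed to lie inside the image.

```python
import numpy as np


def density_map(points, shape, sigma=2.0):
    """Build a density map by placing one unit-mass Gaussian per annotated point.

    `points` is a list of (row, col) annotations assumed to lie inside the image;
    `sigma` is an illustrative choice, not a value taken from the paper.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape, dtype=np.float64)
    for py, px in points:
        kernel = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        # Normalise so each object contributes exactly 1 to the total mass.
        dmap += kernel / kernel.sum()
    return dmap


def count_from_density(dmap):
    """The predicted object count is the integral (sum) of the density map."""
    return float(dmap.sum())


# Hypothetical annotations for three objects in a 64x64 image.
annotations = [(10, 12), (30, 40), (50, 20)]
dmap = density_map(annotations, (64, 64))
estimated_count = count_from_density(dmap)
```

In a learned setting, a network would regress `dmap` from the image, and the same summation would give the count without localising each object individually.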
13. Video Captioning Using Weak Annotation
Jingyi Hou, Yunde Jia, Xinxiao Wu, Yayun Qi
Abstract: Video captioning has shown impressive progress in recent years. One key reason for the performance improvements of existing methods lies in massive paired video-sentence data, but collecting such strong annotation, i.e., high-quality sentences, is time-consuming and laborious. In fact, there now exists a vast number of videos with weak annotation that contains only semantic concepts such as actions and objects. In this paper, we investigate using weak annotation instead of strong annotation to train a video captioning model. To this end, we propose a progressive visual reasoning method that progressively generates fine sentences from weak annotations by inferring more semantic concepts and their dependency relationships for video captioning. To model concept relationships, we use dependency trees that are spanned by exploiting external knowledge from large sentence corpora. Through traversing the dependency trees, the sentences are generated to train the captioning model. Accordingly, we develop an iterative refinement algorithm that refines sentences via spanning dependency trees and fine-tunes the captioning model using the refined sentences in an alternating training manner. Experimental results demonstrate that our method using weak annotation is very competitive with the state-of-the-art methods using strong annotation.
14. Exploiting Latent Codes: Interactive Fashion Product Generation, Similar Image Retrieval, and Cross-Category Recommendation using Variational Autoencoders
James-Andrew Sarmiento
Abstract: The rise of deep learning applications in the fashion industry has fueled advances in curating large-scale datasets to build applications for product design, image retrieval, and recommender systems. In this paper, the author proposes using a Variational Autoencoder (VAE) to build an interactive fashion product application framework that allows users to generate products with attributes according to their liking, retrieve similar styles for the same product category, and receive content-based recommendations from other categories. A fashion product image dataset containing eyewear, footwear, and bags is appropriate to illustrate that this pipeline is applicable in the booming e-commerce industry, enabling direct user interaction in specifying desired products, paired with new methods for data matching and recommendation systems that use the VAE and exploit its generated latent codes.
15. Zero-Shot Human-Object Interaction Recognition via Affordance Graphs
Alessio Sarullo, Tingting Mu
Abstract: We propose a new approach for Zero-Shot Human-Object Interaction Recognition in the challenging setting that involves interactions with unseen actions (as opposed to just unseen combinations of seen actions and objects). Our approach makes use of knowledge external to the image content in the form of a graph that models affordance relations between actions and objects, i.e., whether an action can be performed on the given object or not. We propose a loss function with the aim of distilling the knowledge contained in the graph into the model, while also using the graph to regularise learnt representations by imposing a local structure on the latent space. We evaluate our approach on several datasets (including the popular HICO and HICO-DET) and show that it outperforms the current state of the art.
16. IAUnet: Global Context-Aware Feature Learning for Person Re-Identification
Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen
Abstract: Person re-identification (reID) by CNN-based networks has achieved favorable performance in recent years. However, most existing CNN-based methods do not take full advantage of spatial-temporal context modeling. In fact, the global spatial-temporal context can greatly clarify local distractions to enhance the target feature representation. To comprehensively leverage the spatial-temporal context information, in this work, we present a novel block, Interaction-Aggregation-Update (IAU), for high-performance person reID. First, a Spatial-Temporal IAU (STIAU) module is introduced. STIAU jointly incorporates two types of contextual interactions into a CNN framework for target feature learning. Here the spatial interactions learn to compute the contextual dependencies between different body parts of a single frame, while the temporal interactions are used to capture the contextual dependencies between the same body parts across all frames. Furthermore, a Channel IAU (CIAU) module is designed to model the semantic contextual interactions between channel features to enhance the feature representation, especially for small-scale visual cues and body parts. Therefore, the IAU block enables the feature to incorporate the global spatial, temporal, and channel context. It is lightweight, end-to-end trainable, and can be easily plugged into existing CNNs to form IAUnet. The experiments show that IAUnet performs favorably against the state of the art on both image and video reID tasks and achieves compelling results on a general object categorization task. The source code is available at this https URL.
17. Deep Generative Model for Image Inpainting with Local Binary Pattern Learning and Spatial Attention
Haiwei Wu, Jiantao Zhou, Yuanman Li
Abstract: Deep learning (DL) has demonstrated its powerful capabilities in the field of image inpainting. DL-based image inpainting approaches can produce visually plausible results, but often generate various unpleasant artifacts, especially in the boundary and highly textured regions. To tackle this challenge, in this work, we propose a new end-to-end, two-stage (coarse-to-fine) generative model that combines a local binary pattern (LBP) learning network with an actual inpainting network. Specifically, the first LBP learning network, which uses a U-Net architecture, is designed to accurately predict the structural information of the missing region, which subsequently guides the second image inpainting network to better fill the missing pixels. Furthermore, an improved spatial attention mechanism is integrated in the image inpainting network, by considering the consistency not only between the known region and the generated one, but also within the generated region itself. Extensive experiments on public datasets including CelebA-HQ, Places and Paris StreetView demonstrate that our model generates better inpainting results than the state-of-the-art competing algorithms, both quantitatively and qualitatively. The source code and trained models will be made available at this https URL.
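For readers unfamiliar with the local binary pattern (LBP) that the first-stage network is said to predict, the sketch below computes the classic 8-neighbour LBP code of a grayscale image with NumPy. It illustrates only the textbook descriptor, not the authors' learning network; the function name, the neighbour ordering, and the tiny test patch are assumptions made for illustration.

```python
import numpy as np


def lbp_8neighbour(gray):
    """Basic 3x3 LBP code for every interior pixel of a 2-D grayscale array.

    Each of the 8 neighbours contributes one bit: 1 if the neighbour is at
    least as bright as the centre pixel, 0 otherwise. The bit order
    (clockwise from top-left) is an illustrative convention.
    """
    g = np.asarray(gray).astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)


# Hypothetical 3x3 patch: only the top-left neighbour is brighter than the centre.
patch = np.array([[9, 0, 0],
                  [0, 5, 0],
                  [0, 0, 0]])
patch_code = lbp_8neighbour(patch)
```

Because each code depends only on the sign of local intensity differences, an LBP map captures edges and texture structure while discarding absolute brightness, which is consistent with its use as structural guidance in the abstract.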
18. Privacy Leakage of SIFT Features via Deep Generative Model based Image Reconstruction
Haiwei Wu, Jiantao Zhou
Abstract: Many practical applications, e.g., content-based image retrieval and object recognition, rely heavily on the local features extracted from the query image. As these local features are usually exposed to untrustworthy parties, the privacy leakage problem of image local features has received increasing attention in recent years. In this work, we thoroughly evaluate the privacy leakage of the Scale Invariant Feature Transform (SIFT), which is one of the most widely used image local features. We first consider the case in which the adversary can fully access the SIFT features, i.e., both the SIFT descriptors and the coordinates are available. We propose a novel end-to-end, coarse-to-fine deep generative model for reconstructing the latent image from its SIFT features. The designed deep generative model consists of two networks, where the first one attempts to learn the structural information of the latent image by transforming from SIFT features to Local Binary Pattern (LBP) features, while the second one aims to reconstruct the pixel values guided by the learned LBP. Compared with the state-of-the-art algorithms, the proposed deep generative model produces much improved reconstructed results over three public datasets. Furthermore, we address more challenging cases in which only partial SIFT features (either SIFT descriptors or coordinates) are accessible to the adversary. It is shown that, if the adversary has access only to the SIFT descriptors but not their coordinates, then only modest success in reconstructing the latent image can be achieved for highly structured images (e.g., faces), and reconstruction would fail in general settings. In addition, the latent image can be reconstructed with reasonably good quality solely from the SIFT coordinates. Our results suggest that the privacy leakage problem can be largely avoided if the SIFT coordinates can be well protected.
19. ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation
Akash Gupta, Abhishek Aich, Amit K. Roy-Chowdhury
Abstract: Existing works address the problem of generating high frame-rate sharp videos by separately learning the frame deblurring and frame interpolation modules. Most of these approaches have a strong prior assumption that all the input frames are blurry, whereas in a real-world setting the quality of frames varies. Moreover, such approaches are trained to perform either of the two tasks - deblurring or interpolation - in isolation, while many practical situations call for both. Different from these works, we address a more realistic problem of high frame-rate sharp video synthesis with no prior assumption that input is always blurry. We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos with no prior knowledge of input frames being blurry or not, thereby performing the task of both deblurring and interpolation. We hypothesize that information from the latent representation of the consecutive frames can be utilized to generate optimized representations for both frame deblurring and frame interpolation. Specifically, we employ a combination of self-attention and cross-attention modules between consecutive frames in the latent space to generate an optimized representation for each frame. The optimized representation learnt using these attention modules helps the model to generate and interpolate sharp frames. Extensive experiments on standard datasets demonstrate that our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem.
20. Perceiving Humans: from Monocular 3D Localization to Social Distancing
Lorenzo Bertoni, Sven Kreiss, Alexandre Alahi
Abstract: Perceiving humans in the context of Intelligent Transportation Systems (ITS) often relies on multiple cameras or expensive LiDAR sensors. In this work, we present a new cost-effective vision-based method that perceives humans' locations in 3D and their body orientation from a single image. We address the challenges related to the ill-posed monocular 3D tasks by proposing a deep learning method that predicts confidence intervals in contrast to point estimates. Our neural network architecture estimates humans' 3D body locations and their orientation with a measure of uncertainty. Our vision-based system (i) is privacy-safe, (ii) works with any fixed or moving cameras, and (iii) does not rely on ground plane estimation. We demonstrate the performance of our method with respect to three applications: locating humans in 3D, detecting social interactions, and verifying compliance with recent safety measures due to the COVID-19 outbreak. Indeed, we show that we can rethink the concept of "social distancing" as a form of social interaction in contrast to a simple location-based rule. We publicly share the source code towards an open science mission.
21. MetaSimulator: Simulating Unknown Target Models for Query-Efficient Black-box Attacks
Chen Ma, Li Chen, Junhai Yong
Abstract: Many adversarial attacks have been proposed to investigate the security issues of deep neural networks. For the black-box setting, current model stealing attacks train a substitute model to counterfeit the functionality of the target model. However, the training requires querying the target model. Consequently, the query complexity remains high and such attacks can be defended against easily by deploying a defense mechanism. In this study, we aim to learn a generalized substitute model called MetaSimulator that can mimic the functionality of unknown target models. To this end, we build the training data in the form of multiple tasks by collecting query sequences generated while attacking various existing networks. The learning uses a double-network framework, comprising the task-specific network and the MetaSimulator network, to learn the general simulation capability. Specifically, the task-specific network computes each task's meta-gradient, which is accumulated over multiple tasks to update MetaSimulator and improve generalization. When attacking a target model that is unseen in training, the trained MetaSimulator can simulate its functionality accurately using its limited feedback. As a result, a large fraction of queries can be transferred to MetaSimulator in the attack, thereby reducing the high query complexity. Comprehensive experiments conducted on the CIFAR-10, CIFAR-100, and TinyImageNet datasets demonstrate that the proposed approach saves twice the number of queries on average compared with the baseline method. The source code has been released.
22. Unsupervised Feature Learning by Autoencoder and Prototypical Contrastive Learning for Hyperspectral Classification
Zeyu Cao, Xiaorun Li, Liaoying Zhao
Abstract: Unsupervised learning methods for feature extraction are becoming more and more popular. We combine a popular contrastive learning method (prototypical contrastive learning) and a classic representation learning method (autoencoder) to design an unsupervised feature learning network for hyperspectral classification. Experiments show that our two proposed autoencoder networks have good feature learning capabilities by themselves, and that the contrastive learning network we designed can combine the features of the two to learn more representative features. As a result, our method surpasses other comparison methods in the hyperspectral classification experiments, including some supervised methods. Moreover, our method maintains a faster feature extraction speed than baseline methods. In addition, our method reduces the requirements for huge computing resources, separates feature extraction from contrastive learning, and allows more researchers to conduct research and experiments on unsupervised contrastive learning.
23. Structure-Aware Generation Network for Recipe Generation from Images
Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao
Abstract: Sharing food has become very popular with the development of social media. For many real-world applications, people are keen to know the underlying recipes of a food item. In this paper, we are interested in automatically generating cooking instructions for food. We investigate the open research task of generating cooking instructions based only on food images and ingredients, which is similar to the image captioning task. However, compared with the targets in image captioning datasets, the target recipes are long paragraphs and do not have annotations on structure information. To address the above limitations, we propose a novel framework, the Structure-aware Generation Network (SGN), to tackle the food recipe generation task. Our approach brings together several novel ideas in a systematic framework: (1) exploiting an unsupervised learning approach to obtain the sentence-level tree structure labels before training; (2) generating trees of target recipes from images with the supervision of tree structure labels learned from (1); and (3) integrating the inferred tree structures with the recipe generation procedure. Our proposed model can produce high-quality and coherent recipes, and achieves state-of-the-art performance on the benchmark Recipe1M dataset.
24. 3D Facial Geometry Recovery from a Depth View with Attention Guided Generative Adversarial Network
Xiaoxu Cai, Hui Yu, Jianwen Lou, Xuguang Zhang, Gongfa Li, Junyu Dong
Abstract: We propose to recover the complete 3D facial geometry from a single depth view by introducing an Attention Guided Generative Adversarial Network (AGGAN). In contrast to existing work, which normally requires two or more depth views to recover a full 3D facial geometry, the proposed AGGAN is able to generate a dense 3D voxel grid of the face from a single unconstrained depth view. Specifically, AGGAN encodes the 3D facial geometry within a voxel space and utilizes an attention-guided GAN to model the ill-posed 2.5D depth-to-3D mapping. Multiple loss functions, which enforce 3D facial geometry consistency, together with a prior distribution of facial surface points in voxel space, are incorporated to guide the training process. Both qualitative and quantitative comparisons show that AGGAN recovers a more complete and smoother 3D facial shape than conventional methods, and that it can handle a much wider range of view angles and is more resistant to noise in the depth view.
25. Real-time 3D Facial Tracking via Cascaded Compositional Learning
Jianwen Lou, Xiaoxu Cai, Junyu Dong, Hui Yu
Abstract: We propose to learn a cascade of globally-optimized modular boosted ferns (GoMBF) to solve multi-modal facial motion regression for real-time 3D facial tracking from a monocular RGB camera. GoMBF is a deep composition of multiple regression models, each of which is a boosted ferns model initially trained to predict partial motion parameters of the same modality; these models are then concatenated via a global optimization step to form a single strong boosted ferns model that can effectively handle the whole regression target. It can explicitly cope with the variety of modalities in the output variables, while showing increased fitting power and a faster learning speed compared with conventional boosted ferns. By further cascading a sequence of GoMBFs (GoMBF-Cascade) to regress facial motion parameters, we achieve competitive tracking performance on a variety of in-the-wild videos compared with state-of-the-art methods, which require much more training data or have higher computational complexity. It provides a robust and elegant solution to real-time 3D facial tracking using a small set of training data, which makes it more practical in real-world applications.
26. Deep Learning to Detect Bacterial Colonies for the Production of Vaccines
Thomas Beznik, Paul Smyth, Gaël de Lannoy, John A. Lee
Abstract: During the development of vaccines, bacterial colony forming units (CFUs) are counted in order to quantify the yield in the fermentation process. This manual task is time-consuming and error-prone. In this work we test multiple segmentation algorithms based on the U-Net CNN architecture and show that these offer robust, automated CFU counting. We show that the multiclass generalisation with a bespoke loss function allows distinguishing virulent and avirulent colonies with acceptable accuracy. While many possibilities are left to explore, our results show the potential of deep learning for separating and classifying bacterial colonies.
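The abstract does not say how a segmentation output is turned into a CFU count. One common post-processing step, shown here purely as an assumed illustration, is to count connected foreground regions in the predicted binary mask. The function name `count_colonies` and the toy mask are hypothetical.

```python
from collections import deque


def count_colonies(mask):
    """Count 4-connected foreground regions in a binary mask.

    `mask` is a list of rows containing 0 (background) or 1 (colony).
    Assumption: each connected region corresponds to one colony, so
    touching colonies would be merged into a single count.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                # Breadth-first flood fill over the current region.
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Toy segmentation mask with three separate regions.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(count_colonies(mask))  # prints 3
```

Because touching colonies merge under this scheme, a real pipeline would likely need an additional separation step, which is consistent with the abstract's emphasis on separating colonies.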
27. Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances
Anna Hilsmann, Philipp Fechteler, Wieland Morgenstern, Wolfgang Paier, Ingo Feldmann, Oliver Schreer, Peter Eisert
Abstract: In this paper, we present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances. Going beyond the application of free-viewpoint volumetric video, we allow re-animation and alteration of an actor's performance through (i) the enrichment of the captured data with semantics and animation properties and (ii) applying hybrid geometry- and video-based animation methods that allow a direct animation of the high-quality data itself instead of creating an animatable model that resembles the captured data. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data, followed by an automatic rigging of each frame using a parametric shape-adaptive full human body model. Our hybrid geometry- and video-based animation approaches combine the flexibility of classical CG animation with the realism of real captured data. For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose. Further, we treat the face differently from the body in a hybrid geometry- and video-based animation approach where coarse movements and poses are modeled in the geometry only, while very fine and subtle details in the face, often lacking in purely geometric methods, are captured in video-based textures. These are processed so that they can be interactively combined to form new facial expressions. On top of that, we learn the appearance of regions that are challenging to synthesize, such as the teeth or the eyes, and fill in missing regions realistically in an autoencoder-based approach. This paper covers the full pipeline, from capturing and producing high-quality video content, through enrichment with semantics and deformation properties for re-animation, to processing of the data for the final hybrid animation.
28. Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams
Matthias De Lange, Tinne Tuytelaars
Abstract: As learning from non-stationary streams of data has been proven a challenging endeavour, current continual learners often strongly relax the problem, assuming balanced datasets, unlimited processing of data stream subsets, and additional availability of task information, sometimes even during inference. In contrast, our continual learner processes the data streams in an online fashion, without additional task-information, and shows solid robustness to imbalanced data streams resembling a real-world setting. Defying such challenging settings is achieved by aggregating prototypes and nearest-neighbour based classification in a shared latent space, where a Continual Prototype Evolution (CoPE) enables learning and prediction at any point in time. As the embedding network continually changes, prototypes inevitably become obsolete, which we prevent by replay of exemplars from memory. We obtain state-of-the-art performance by a significant margin on five benchmarks, including two highly unbalanced data streams.
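The combination of evolving prototypes and nearest-neighbour classification in a latent space can be sketched as follows. This is not the CoPE algorithm itself: the momentum-based update rule, the Euclidean distance, the function names, and the toy embeddings are all assumptions chosen to keep the example small, and the replay memory mentioned in the abstract is omitted.

```python
import math


def update_prototype(prototypes, label, embedding, momentum=0.9):
    """Move the class prototype towards a new embedding.

    Assumption: an exponential moving average; the first embedding of a
    class initialises its prototype.
    """
    old = prototypes.get(label)
    if old is None:
        prototypes[label] = list(embedding)
    else:
        prototypes[label] = [momentum * o + (1.0 - momentum) * e
                             for o, e in zip(old, embedding)]


def nearest_prototype(embedding, prototypes):
    """Predict the label whose prototype is closest in Euclidean distance."""
    return min(prototypes,
               key=lambda label: math.dist(embedding, prototypes[label]))


# Hypothetical stream of (label, embedding) pairs arriving one at a time.
stream = [("cat", [1.0, 0.0]), ("dog", [0.0, 1.0]), ("cat", [0.8, 0.2])]

prototypes = {}
for label, embedding in stream:
    update_prototype(prototypes, label, embedding)

# Prediction is available at any point in the stream.
prediction = nearest_prototype([0.9, 0.1], prototypes)
print(prediction)  # prints cat
```

Keeping one prototype per class means prediction needs no task identifier, which matches the online, task-free setting the abstract describes.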
29. Neural Crossbreed: Neural Based Image Metamorphosis
Sanghun Park, Kwanggyoon Seo, Junyong Noh
Abstract: We propose Neural Crossbreed, a feed-forward neural network that can learn a semantic change of input images in a latent space to create the morphing effect. Because the network learns a semantic change, a sequence of meaningful intermediate images can be generated without requiring the user to specify explicit correspondences. In addition, the semantic change learning makes it possible to perform the morphing between the images that contain objects with significantly different poses or camera views. Furthermore, just as in conventional morphing techniques, our morphing network can handle shape and appearance transitions separately by disentangling the content and the style transfer for rich usability. We prepare a training dataset for morphing using a pre-trained BigGAN, which generates an intermediate image by interpolating two latent vectors at an intended morphing value. This is the first attempt to address image morphing using a pre-trained generative model in order to learn semantic transformation. The experiments show that Neural Crossbreed produces high quality morphed images, overcoming various limitations associated with conventional approaches. In addition, Neural Crossbreed can be further extended for diverse applications such as multi-image morphing, appearance transfer, and video frame interpolation.
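The training-data step described in the abstract, interpolating two latent vectors at an intended morphing value, can be sketched as below. Linear interpolation is an assumption (the abstract does not name the interpolation scheme), the function name `interpolate_latents` is hypothetical, and the call to the pre-trained generator that would turn the latent into an image is omitted.

```python
def interpolate_latents(z_a, z_b, alpha):
    """Linearly blend two latent vectors.

    alpha = 0 returns z_a, alpha = 1 returns z_b; values in between give
    the latent for an intermediate morphing step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


# Hypothetical two-dimensional latents for the two input images.
z_a = [0.0, 2.0]
z_b = [4.0, 6.0]

quarter_way = interpolate_latents(z_a, z_b, 0.25)
print(quarter_way)  # prints [1.0, 3.0]
```

Sweeping `alpha` from 0 to 1 yields the sequence of latents whose generated images would form the supervised morphing sequence.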
30. Adversarially Robust Neural Architectures
Minjing Dong, Yanxi Li, Yunhe Wang, Chang Xu
Abstract: Deep neural networks (DNNs) are vulnerable to adversarial attacks. Existing methods are devoted to developing various robust training strategies or regularizations to update the weights of the neural network. But beyond the weights, the overall structure and information flow in the network are explicitly determined by the neural architecture, which remains unexplored. This paper thus aims to improve the adversarial robustness of the network from the architecture perspective within a NAS framework. We explore the relationship among adversarial robustness, the Lipschitz constant, and architecture parameters, and show that an appropriate constraint on architecture parameters could reduce the Lipschitz constant and thereby further improve robustness. In a NAS framework, all architecture parameters are treated equally when the discrete architecture is sampled from the supernet. However, the importance of architecture parameters could vary from operation to operation or connection to connection, which has not been explored and might reduce the confidence of robust architecture sampling. Thus, we propose to sample architecture parameters from trainable multivariate log-normal distributions, with which the Lipschitz constant of the entire network can be approximated using a univariate log-normal distribution with mean and variance related to the architecture parameters. Compared with adversarially trained neural architectures searched by various NAS algorithms as well as efficient human-designed models, our algorithm empirically achieves the best performance among all the models under various attacks on different datasets.
31. PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation
Shaotian Yan, Chen Shen, Zhongming Jin, Jianqiang Huang, Rongxin Jiang, Yaowu Chen, Xian-Sheng Hua
Abstract: Today, the scene graph generation (SGG) task is largely limited in realistic scenarios, mainly due to the extremely long-tailed bias of the predicate annotation distribution. Thus, tackling the class imbalance problem of SGG is critical and challenging. In this paper, we first discover that when predicate labels have strong correlation with each other, prevalent re-balancing strategies (e.g., re-sampling and re-weighting) will give rise to either over-fitting the tail data (e.g., "bench sitting on sidewalk" rather than "on"), or still suffering the adverse effect of the original uneven distribution (e.g., aggregating the varied "parked on"/"standing on"/"sitting on" into "on"). We argue the principal reason is that re-balancing strategies are sensitive to the frequencies of predicates yet blind to their relatedness, which may play a more important role in promoting the learning of predicate features. Therefore, we propose a novel Predicate-Correlation Perception Learning (PCPL for short) scheme to adaptively seek out appropriate loss weights by directly perceiving and utilizing the correlation among predicate classes. Moreover, our PCPL framework is further equipped with a graph encoder module to better extract context features. Extensive experiments on the benchmark VG150 dataset show that the proposed PCPL performs markedly better on tail classes while preserving the performance on head ones, and significantly outperforms previous state-of-the-art methods.
32. GAIT: Gradient Adjusted Unsupervised Image-to-Image Translation
Ibrahim Batuhan Akkaya, Ugur Halici
Abstract: Image-to-image translation (IIT) has made much progress recently with the development of adversarial learning. In most of the recent work, an adversarial loss is utilized to match the distributions of the translated and target image sets. However, this may create artifacts if two domains have different marginal distributions, for example, in uniform areas. In this work, we propose an unsupervised IIT method that preserves the uniform regions after the translation. The method uses a gradient adjustment loss, which is the L2 norm between the Sobel response of the target image and the adjusted Sobel response of the source images. The proposed method is validated on the jellyfish-to-Haeckel dataset, which was prepared to demonstrate this problem and contains images with different background distributions. We demonstrate that our method obtains a performance gain over the baseline method both qualitatively and quantitatively, showing the effectiveness of the proposed method.
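To make the gradient adjustment loss in this abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It computes a Sobel gradient magnitude for two grayscale images and takes the L2 norm of their difference. The abstract does not say how the source response is "adjusted", so the `scale` parameter is a purely hypothetical stand-in, and the function names are illustrative.

```python
# 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude of a 2D grayscale image (list of rows), interior pixels only."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            row.append((gx * gx + gy * gy) ** 0.5)
        out.append(row)
    return out


def gradient_adjustment_loss(source, target, scale=1.0):
    """L2 norm between the target Sobel response and the scaled source response.

    `scale` is a hypothetical placeholder for the paper's adjustment step.
    """
    gs = sobel_magnitude(source)
    gt = sobel_magnitude(target)
    squared = sum((t - scale * s) ** 2
                  for row_s, row_t in zip(gs, gt)
                  for s, t in zip(row_s, row_t))
    return squared ** 0.5


# Usage: a uniform image has zero Sobel response, a vertical edge does not.
flat = [[5, 5, 5, 5] for _ in range(4)]
edge = [[0, 0, 1, 1] for _ in range(4)]
loss_same = gradient_adjustment_loss(flat, flat)
loss_edge = gradient_adjustment_loss(edge, flat)
```

In this toy case the loss is zero when both images are uniform and positive when the source contains an edge that the target lacks, which is the kind of mismatch such a loss would penalize.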
33. ALEX: Active Learning based Enhancement of a Model's Explainability
Ishani Mondal, Debasis Ganguly
Abstract: An active learning (AL) algorithm seeks to construct an effective classifier with a minimal number of labeled examples in a bootstrapping manner. While standard AL heuristics exist, such as selecting for annotation those points for which a classification model yields the least confident predictions, there has been no empirical investigation to see if these heuristics lead to models that are more interpretable to humans. In the era of data-driven learning, this is an important research direction to pursue. This paper describes our work-in-progress towards developing an AL selection function that, in addition to model effectiveness, also seeks to improve the interpretability of a model during the bootstrapping steps. Concretely speaking, our proposed selection function trains an 'explainer' model in addition to the classifier model, and favours those instances where a different part of the data is used, on average, to explain the predicted class. Initial experiments exhibited encouraging trends, showing that such a heuristic can lead to developing more effective and more explainable end-to-end data-driven classifiers.
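The standard least-confidence heuristic mentioned in this abstract can be sketched as follows. This is only the baseline heuristic, not the explainer-based selection function the paper proposes. The function name and the toy probability pool are illustrative assumptions.

```python
def least_confident(probabilities, k):
    """Return indices of the k unlabeled points whose top class probability is lowest."""
    # Confidence of a prediction = probability assigned to its most likely class.
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


# Usage: each row holds the classifier's class probabilities for one unlabeled point.
pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.51, 0.49]]
picked = least_confident(pool, 2)
```

Here the points with top probabilities 0.51 and 0.55 are the ones sent for annotation, since the classifier is least sure about them.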
34. e-TLD: Event-based Framework for Dynamic Object Tracking
Bharath Ramesh, Shihao Zhang, Hong Yang, Andres Ussa, Matthew Ong, Garrick Orchard, Cheng Xiang
Abstract: This paper presents a long-term object tracking framework with a moving event camera under general tracking conditions. A first of its kind for these revolutionary cameras, the tracking framework uses a discriminative representation for the object with online learning, and detects and re-tracks the object when it comes back into the field-of-view. One of the key novelties is the use of an event-based local sliding window technique that tracks reliably in scenes with cluttered and textured background. In addition, Bayesian bootstrapping is used to assist real-time processing and boost the discriminative power of the object representation. On the other hand, when the object re-enters the field-of-view of the camera, a data-driven, global sliding window detector locates the object for subsequent tracking. Extensive experiments demonstrate the ability of the proposed framework to track and detect arbitrary objects of various shapes and sizes, including dynamic objects such as a human. This is a significant improvement compared to earlier works that simply track objects as long as they are visible under simpler background settings. Using the ground truth locations for five different objects under three motion settings, namely translation, rotation and 6-DOF, quantitative measurement is reported for the event-based tracking framework with critical insights on various performance issues. Finally, real-time implementation in C++ highlights tracking ability under scale, rotation, view-point and occlusion scenarios in a lab setting.
35. Retaining Image Feature Matching Performance Under Low Light Conditions
Pranjay Shyam, Antyanta Bangunharcana, Kyung-Soo Kim
Abstract: Poor image quality in low light images may result in a reduced number of feature matches between images. In this paper, we investigate the performance of feature extraction algorithms in low light environments. To find an optimal setting that retains feature matching performance in low light images, we look into the effect of changing the feature acceptance threshold of the feature detector and of adding pre-processing in the form of Low Light Image Enhancement (LLIE) prior to feature detection. We observe that even in low light images, feature matching using traditional hand-crafted feature detectors still performs reasonably well when the threshold parameter is lowered. We also show that applying LLIE algorithms can improve feature matching even more when paired with the right feature extraction algorithm.
36. Intrinsic Relationship Reasoning for Small Object Detection
Kui Fu, Jia Li, Lin Ma, Kai Mu, Yonghong Tian
Abstract: The small objects in images and videos are usually not independent individuals. Instead, they more or less present some semantic and spatial layout relationships with each other. Modeling and inferring such intrinsic relationships can thereby be beneficial for small object detection. In this paper, we propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects. Specifically, we first construct a semantic module to model the sparse semantic relationships based on the initial regional features, and a spatial layout module to model the sparse spatial layout relationships based on their position and shape information, respectively. Both of them are then fed into a context reasoning module for integrating the contextual information with respect to the objects and their relationships, which is further fused with the original regional visual features for classification and regression. Experimental results reveal that the proposed approach can effectively boost the small object detection performance.
37. Convolutional Nonlinear Dictionary with Cascaded Structure Filter Banks
Ruiki Kobayashi, Shogo Muramatsu
Abstract: This study proposes a convolutional nonlinear dictionary (CNLD) for image restoration using cascaded filter banks. Generally, convolutional neural networks (CNN) demonstrate their practicality in image restoration applications; however, existing CNNs are constructed without considering the relationship among atomic images (convolution kernels). As a result, there remains room for discussing the role of design spaces. To provide a framework for constructing an effective and structured convolutional network, this study proposes the CNLD. The backpropagation learning procedure is derived from certain image restoration experiments, and thereby the significance of CNLD is verified. It is demonstrated that the number of parameters is reduced while preserving the restoration performance.
38. On the Structures of Representation for the Robustness of Semantic Segmentation to Input Corruption
Charles Lehman, Dogancan Temel, Ghassan AlRegib
Abstract: Semantic segmentation is a scene understanding task at the heart of safety-critical applications where robustness to corrupted inputs is essential. Implicit Background Estimation (IBE) has been shown to be a promising technique for improving the robustness of semantic segmentation models to out-of-distribution inputs at little to no cost. In this paper, we provide an analysis comparing the structures learned as a result of optimization objectives that use Softmax, IBE, and Sigmoid in order to better understand their relationship to robustness. As a result of this analysis, we propose combining Sigmoid with IBE (SCrIBE) to improve robustness. Finally, we demonstrate that SCrIBE exhibits superior segmentation performance aggregated across all corruptions and severity levels, with an mIoU of 42.1 compared to both IBE (40.3) and the Softmax baseline (37.5).
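The mIoU figures quoted in this abstract are mean intersection-over-union scores. As a rough illustration of the metric only, the sketch below computes mean IoU over flat label lists. It assumes classes absent from both prediction and ground truth are skipped; the paper's exact aggregation across corruptions and severity levels is not specified here, and the function name is illustrative.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target (flat label lists)."""
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Usage: four pixels, two classes, one pixel mislabeled as class 0.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.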
39. Open-set Adversarial Defense
Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel
Abstract: Open-set recognition and adversarial defense study two key aspects of deep learning that are vital for real-world deployment. The objective of open-set recognition is to identify samples from open-set classes during testing, while adversarial defense aims to defend the network against images with imperceptible adversarial perturbations. In this paper, we show that open-set recognition systems are vulnerable to adversarial attacks. Furthermore, we show that adversarial defense mechanisms trained on known classes do not generalize well to open-set samples. Motivated by this observation, we emphasize the need for an Open-Set Adversarial Defense (OSAD) mechanism. This paper proposes an Open-Set Defense Network (OSDN) as a solution to the OSAD problem. The proposed network uses an encoder with feature-denoising layers coupled with a classifier to learn a noise-free latent feature representation. Two techniques are employed to obtain an informative latent feature space with the objective of improving open-set performance. First, a decoder is used to ensure that clean images can be reconstructed from the obtained latent features. Then, self-supervision is used to ensure that the latent features are informative enough to carry out an auxiliary task. We introduce a testing protocol to evaluate OSAD performance and show the effectiveness of the proposed method on multiple object classification datasets. The implementation code of the proposed method is available at: this https URL.
40. CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
Su Pang, Daniel Morris, Hayder Radha
Abstract: There have been significant advances in neural networks for both 3D object detection using LiDAR and 2D object detection using video. However, it has been surprisingly difficult to train networks to effectively use both modalities in a way that demonstrates gain over single-modality networks. In this paper, we propose a novel Camera-LiDAR Object Candidates (CLOCs) fusion network. CLOCs fusion provides a low-complexity multi-modal fusion framework that significantly improves the performance of single-modality detectors. CLOCs operates on the combined output candidates before Non-Maximum Suppression (NMS) of any 2D and any 3D detector, and is trained to leverage their geometric and semantic consistencies to produce more accurate final 3D and 2D detection results. Our experimental evaluation on the challenging KITTI object detection benchmark, including 3D and bird's eye view metrics, shows significant improvements, especially at long distance, over the state-of-the-art fusion based methods. At time of submission, CLOCs ranks the highest among all the fusion-based methods in the official KITTI leaderboard. We will release our code upon acceptance.
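Summary: Camera-LiDAR Object Candidates (CLOCs) is a low-complexity fusion network that takes the pre-NMS output candidates of any 2D detector and any 3D detector and learns from their geometric and semantic consistency to produce more accurate 2D and 3D detections. On the KITTI benchmark (3D and bird's-eye-view metrics) it improves on state-of-the-art fusion-based methods, especially at long range, and at submission time it ranked highest among fusion-based methods on the official leaderboard. The authors plan to release code upon acceptance.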
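The abstract does not say how geometric consistency between 2D and 3D candidates is measured. As a rough illustration only, the sketch below pairs every 2D box with every 3D candidate using image-plane IoU, assuming each 3D candidate has already been projected to an axis-aligned image box. The helper name `pair_candidates` and the tuple layout are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned image boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pair_candidates(boxes_2d: List[Box], scores_2d: List[float],
                    boxes_3d_in_image: List[Box], scores_3d: List[float]):
    """Hypothetical helper: one (i, j, IoU, 2D score, 3D score) tuple per
    overlapping pair of pre-NMS candidates; non-overlapping pairs are skipped."""
    pairs = []
    for i, (box_2d, s2) in enumerate(zip(boxes_2d, scores_2d)):
        for j, (box_3d, s3) in enumerate(zip(boxes_3d_in_image, scores_3d)):
            overlap = iou(box_2d, box_3d)
            if overlap > 0.0:  # keep only geometrically consistent pairs
                pairs.append((i, j, overlap, s2, s3))
    return pairs
```

A learned fusion layer could then consume these per-pair features to rescore the 3D candidates; that part is left out because the abstract gives no detail about it.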
41. A perception centred self-driving system without HD Maps
Alan Sun
Abstract: This paper proposes a new self-driving system that solves the localization and line detection problems with scalability in mind. The proposed system does not rely on HD maps. All path planning is based on a scene rebuilt from a topological map and on the traffic-line detection results from our detection subsystem. The proposed line detection subsystem achieves state-of-the-art performance without using deep learning. The proposed localization subsystem relies on neither GPS nor IMU and provides human-level localization results by counting stop lines and intersections. The system was tested on diverse datasets covering complicated urban situations. It proved to be robust and easy to implement on a large scale.
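Summary: The paper proposes a self-driving system that does not depend on HD maps. Path planning uses a scene rebuilt from a topological map together with the output of a traffic-line detection subsystem, which reaches state-of-the-art performance without deep learning. Localization uses neither GPS nor IMU; it counts stop lines and intersections to reach human-level accuracy. Tests on diverse datasets covering complicated urban situations indicate that the system is robust and easy to deploy at scale.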
42. LSMVOS: Long-Short-Term Similarity Matching for Video Object
Zhang Xuerui, Yuan Xia
Abstract: Objective: Semi-supervised video object segmentation refers to segmenting the object in subsequent frames given the object label in the first frame. Existing algorithms are mostly based on matching and propagation strategies, which often make use of the previous frame's mask or optical flow. This paper explores a new propagation method that uses a short-term matching module to extract the information of the previous frame and apply it in propagation, and proposes a network of Long-Short-Term similarity matching for video object segmentation (LSMVOS). Method: Pixel-level matching and correlation are performed by a long-term matching module against the first frame and by a short-term matching module against the previous frame, yielding a global similarity map and a local similarity map, alongside the feature map of the current frame and the mask of the previous frame. After two refinement networks, the final results are obtained through a segmentation network. Results: In experiments on the DAVIS 2016 and DAVIS 2017 datasets, the method achieves a favorable average of region similarity and contour accuracy without online fine-tuning, reaching 86.5% for single targets and 77.4% for multiple targets. In addition, it segments 21 frames per second. Conclusion: The short-term matching module proposed in this paper is more conducive to extracting the information of the previous frame than the mask alone. By combining the long-term matching module with the short-term matching module, the whole network achieves efficient video object segmentation without online fine-tuning.
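Summary: Objective: Semi-supervised video object segmentation segments an object in later frames given its label in the first frame; existing methods mostly rely on matching and propagation using the previous frame's mask or optical flow. This paper adds a short-term matching module that extracts information from the previous frame for propagation and proposes a Long-Short-Term similarity matching network (LSMVOS). Method: The long-term module matches pixels against the first frame and the short-term module against the previous frame, giving global and local similarity maps that are refined by two refinement networks and passed to a segmentation network. Results: On DAVIS 2016 and DAVIS 2017, without online fine-tuning, the mean of region similarity and contour accuracy reaches 86.5% (single object) and 77.4% (multiple objects) at 21 frames per second. Conclusion: Short-term matching extracts more from the previous frame than its mask alone, and combining it with long-term matching gives efficient segmentation without online fine-tuning.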
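The reported scores average region similarity and contour accuracy. As a minimal sketch, region similarity on DAVIS is the Jaccard index (intersection over union) between the predicted and ground-truth masks; the version below works on plain nested lists of 0/1 values. The function name is hypothetical, and treating two empty masks as a perfect match is an assumption about the convention. Contour accuracy, a boundary F-measure, is not covered here.

```python
from typing import Sequence


def region_similarity(pred: Sequence[Sequence[int]],
                      gt: Sequence[Sequence[int]]) -> float:
    """Jaccard index between two binary masks of identical shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0   # pixel is foreground in both masks
            union += 1 if (p or g) else 0    # pixel is foreground in either mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0
```

For a video, the per-frame values would be averaged over frames and objects before being combined with the contour score.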
43. Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training
Peixuan Li
Abstract: In this work, we propose a novel single-shot, keypoint-based framework for monocular 3D object detection using only RGB images, called KM3D-Net. We design a fully convolutional model to predict object keypoints, dimensions, and orientation, and then combine these estimates with perspective geometry constraints to compute the position attribute. Further, we reformulate the geometric constraints as a differentiable version and embed it into the network to reduce running time while maintaining the consistency of model outputs in an end-to-end fashion. Benefiting from this simple structure, we then propose an effective semi-supervised training strategy for the setting where labeled training data is scarce. In this strategy, we enforce a consensus prediction of two shared-weight KM3D-Nets for the same unlabeled image under different input augmentation conditions and network regularization. In particular, we unify the coordinate-dependent augmentations as an affine transformation for the differentiable recovery of object positions and propose a keypoint-dropout module for network regularization. Our model requires only RGB images, without synthetic data, instance segmentation, CAD models, or a depth generator. Nevertheless, extensive experiments on the popular KITTI 3D detection dataset indicate that KM3D-Net surpasses all previous state-of-the-art methods in both efficiency and accuracy by a large margin. Moreover, to the best of our knowledge, this is the first time that semi-supervised learning has been applied to monocular 3D object detection. We even surpass most of the previous fully supervised methods with only 13% labeled data on KITTI.
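Summary: KM3D-Net is a single-shot, keypoint-based framework for monocular 3D object detection from RGB images alone. A fully convolutional model predicts object keypoints, dimensions, and orientation, and perspective-geometry constraints, reformulated as a differentiable module embedded in the network, compute the position end to end. For settings with scarce labels, a semi-supervised strategy enforces consistent predictions from two weight-sharing KM3D-Nets on the same unlabeled image under different augmentations and regularization, using an affine-transformation view of coordinate-dependent augmentations and a keypoint-dropout module. Without synthetic data, instance segmentation, CAD models, or a depth generator, KM3D-Net is reported to surpass previous state-of-the-art methods on KITTI in both efficiency and accuracy, and with only 13% labeled data it surpasses most earlier fully supervised methods.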
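To make the perspective-geometry constraint more concrete, the sketch below shows only the forward relation: given a box's dimensions, yaw, and position, a pinhole camera maps its eight corners to image keypoints. The paper solves the inverse problem (position from predicted keypoints), which is not reproduced here. The function name, the box-centered origin, the axis assignment (length along x, height along y, width along z), and the intrinsics `fx, fy, cx, cy` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def project_box_corners(dims: Tuple[float, float, float], yaw: float,
                        center: Tuple[float, float, float],
                        fx: float, fy: float, cx: float, cy: float
                        ) -> List[Tuple[float, float]]:
    """Project the 8 corners of a 3D box into the image with a pinhole model.

    dims is (height, width, length); center is the box center in camera
    coordinates (assumed convention); yaw is the rotation about the camera Y axis.
    """
    h, w, l = dims
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    keypoints = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            for sz in (-1, 1):
                # Corner in the object frame.
                x, y, z = sx * l / 2, sy * h / 2, sz * w / 2
                # Rotate about Y, then translate into the camera frame.
                xc = cos_y * x + sin_y * z + center[0]
                yc = y + center[1]
                zc = -sin_y * x + cos_y * z + center[2]
                # Perspective division and intrinsics.
                keypoints.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return keypoints
```

Each projected corner gives two equations that are linear in the unknown position once dimensions and yaw are fixed, which is why a least-squares solve over the keypoints is a natural way to recover position.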
44. Iris Liveness Detection Competition (LivDet-Iris) -- The 2020 Edition
Priyanka Das, Joseph McGrath, Zhaoyuan Fang, Aidan Boyd, Ganghee Jang, Amir Mohammadi, Sandip Purnapatra, David Yambay, Sébastien Marcel, Mateusz Trokielewicz, Piotr Maciejewicz, Kevin Bowyer, Adam Czajka, Stephanie Schuckers, Juan Tapia, Sebastian Gonzalez, Meiling Fang, Naser Damer, Fadi Boutros, Arjan Kuijper, Renu Sharma, Cunjian Chen, Arun Ross
Abstract: Launched in 2013, LivDet-Iris is an international competition series open to academia and industry that aims to assess and report advances in iris Presentation Attack Detection (PAD). This paper presents results from the fourth competition of the series: LivDet-Iris 2020. This year's competition introduced several novel elements: (a) it incorporated new types of attacks (samples displayed on a screen, cadaver eyes, and prosthetic eyes); (b) it initiated LivDet-Iris as an ongoing effort, with a testing protocol now available to everyone via the Biometrics Evaluation and Testing (BEAT) (this https URL) open-source platform to facilitate reproducibility and continuous benchmarking of new algorithms; and (c) it compared the performance of the submitted entries with three baseline methods (offered by the University of Notre Dame and Michigan State University) and three open-source iris PAD methods available in the public domain. The best-performing entry to the competition reported a weighted average APCER of 59.10% and a BPCER of 0.46% over all five attack types. This paper serves as the latest evaluation of iris PAD on a large spectrum of presentation attack instruments.
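Summary: LivDet-Iris, launched in 2013, is an international competition series on iris Presentation Attack Detection (PAD); this paper reports the fourth edition, LivDet-Iris 2020. New this year: (a) additional attack types (samples displayed on a screen, cadaver eyes, and prosthetic eyes); (b) an ongoing evaluation, with the testing protocol available to everyone through the open-source Biometrics Evaluation and Testing (BEAT) platform (this https URL); and (c) a comparison of submissions against three baseline methods (from the University of Notre Dame and Michigan State University) and three open-source iris PAD methods. The best entry reported a weighted average APCER of 59.10% and a BPCER of 0.46% over all five attack types.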
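The two reported error rates can be computed as follows. APCER is the fraction of attack presentations that a detector accepts as bona fide, and BPCER is the fraction of bona fide presentations that it rejects as attacks. In this sketch the weighted average APCER weights each attack type by its sample count, which is an assumption about the competition's weighting; the function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def pad_error_rates(samples: Iterable[Tuple[Optional[str], bool]]
                    ) -> Tuple[Dict[str, float], float, float]:
    """Return (per-type APCER, count-weighted APCER, BPCER).

    Each sample is (attack_type, flagged_as_attack); attack_type is None
    for a bona fide presentation.
    """
    attack_total = defaultdict(int)
    attack_missed = defaultdict(int)
    bona_total = 0
    bona_rejected = 0
    for attack_type, flagged in samples:
        if attack_type is None:
            bona_total += 1
            bona_rejected += 1 if flagged else 0             # bona fide called an attack
        else:
            attack_total[attack_type] += 1
            attack_missed[attack_type] += 0 if flagged else 1  # attack accepted as bona fide
    apcer = {t: attack_missed[t] / n for t, n in attack_total.items()}
    n_attacks = sum(attack_total.values())
    # Assumed weighting: each attack type counts in proportion to its samples.
    weighted = (sum(apcer[t] * n for t, n in attack_total.items()) / n_attacks
                if n_attacks else 0.0)
    bpcer = bona_rejected / bona_total if bona_total else 0.0
    return apcer, weighted, bpcer
```

A high APCER paired with a low BPCER, as in the best entry above, describes a detector that rarely rejects genuine irises but lets many attack samples through.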
45. Bidirectional Attention Network for Monocular Depth Estimation
Shubhra Aich, Jean Marie Uwabeza Vianney, Md Amirul Islam, Mannat Kaur, Bingbing Liu
Abstract: In this paper, we propose a Bidirectional Attention Network (BANet), an end-to-end framework for monocular depth estimation that addresses the limitation of effectively integrating local and global information in convolutional neural networks. The structure of this mechanism derives from a strong conceptual foundation of neural machine translation, and presents a light-weight mechanism for adaptive control of computation similar to the dynamic nature of recurrent neural networks. We introduce bidirectional attention modules that utilize the feed-forward feature maps and incorporate the global context to filter out ambiguity. Extensive experiments reveal the high degree of capability that this bidirectional attention model presents over feed-forward baselines and other state-of-the-art methods for monocular depth estimation on two challenging datasets, KITTI and DIODE. We show that our proposed approach either outperforms or performs at least on a par with the state-of-the-art monocular depth estimation methods with less memory and computational complexity.
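Summary: The Bidirectional Attention Network (BANet) is an end-to-end framework for monocular depth estimation that addresses the difficulty of integrating local and global information in convolutional neural networks. Its mechanism draws on neural machine translation and provides lightweight adaptive control of computation, similar to the dynamic behavior of recurrent networks. Bidirectional attention modules use the feed-forward feature maps and global context to filter out ambiguity. On KITTI and DIODE, BANet outperforms or at least matches state-of-the-art monocular depth estimation methods with less memory and computation.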
46. SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization
Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia
Abstract: We present a novel framework, Spatial Pyramid Attention Network (SPAN), for detection and localization of multiple types of image manipulations. The proposed architecture efficiently and effectively models the relationship between image patches at multiple scales by constructing a pyramid of local self-attention blocks. The design includes a novel position projection to encode the spatial positions of the patches. SPAN is trained on a generic, synthetic dataset but can also be fine-tuned for specific datasets. The proposed method shows significant gains in performance on standard datasets over previous state-of-the-art methods.
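Summary: Spatial Pyramid Attention Network (SPAN) detects and localizes multiple types of image manipulation. It models relationships between image patches at multiple scales with a pyramid of local self-attention blocks and uses a novel position projection to encode patch positions. SPAN is trained on a generic synthetic dataset and can be fine-tuned for specific datasets; it shows significant gains over previous state-of-the-art methods on standard datasets.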
47. Unsupervised Single-Image Reflection Separation Using Perceptual Deep Image Priors
Suhong Kim, Hamed RahmaniKhezri, Seyed Mohammad Nourbakhsh, Mohamed Hefeeda
Abstract: Reflections often degrade the quality of the image by obstructing the background scene. This is not desirable for everyday users, and it negatively impacts the performance of multimedia applications that process images with reflections. Most current methods for removing reflections utilize supervised-learning models. However, these models require an extensive number of image pairs to perform well, especially on natural images with reflection, which is difficult to achieve in practice. In this paper, we propose a novel unsupervised framework for single-image reflection separation. Instead of learning from a large dataset, we optimize the parameters of two cross-coupled deep convolutional networks on a target image to generate two exclusive background and reflection layers. In particular, we design a new architecture of the network to embed semantic features extracted from a pre-trained deep classification network, which gives more meaningful separation similar to human perception. Quantitative and qualitative results on commonly used datasets in the literature show that our method's performance is at least on par with the state-of-the-art supervised methods and, occasionally, better without requiring large training datasets. Our results also show that our method significantly outperforms the closest unsupervised method in the literature for removing reflections from single images.
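Summary: Reflections obstruct the background scene, degrading image quality and hurting multimedia applications that process such images. Most reflection-removal methods are supervised and need large numbers of image pairs, which are hard to obtain for natural images. This paper proposes an unsupervised framework for single-image reflection separation: rather than learning from a large dataset, it optimizes two cross-coupled deep convolutional networks on the target image to generate exclusive background and reflection layers, and embeds semantic features from a pre-trained deep classification network so that the separation better matches human perception. On commonly used datasets the method is at least on par with state-of-the-art supervised methods, occasionally better, and significantly outperforms the closest unsupervised method.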
48. Aggregating Long-Term Context for Learning Surgical Workflows
Yutong Ban, Guy Rosman, Thomas Ward, Daniel Hashimoto, Taisei Kondo, Ozanan Meireles, Daniela Rus
Abstract: Analyzing surgical workflow is crucial for computers to understand surgeries. Deep learning techniques have recently been widely applied to recognize surgical workflows. Many of the existing temporal neural network models are limited in their capability to handle long-term dependencies in the data, instead relying upon strong performance of the underlying per-frame visual models. We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics that are propagated by a sufficient statistics model (SSM). We leverage our approach within an LSTM backbone for the task of surgical phase recognition and explore several choices for propagated statistics. We demonstrate superior results over existing state-of-the-art segmentation and novel segmentation techniques on two laparoscopic cholecystectomy datasets: the already published Cholec80 dataset and MGH100, a novel dataset with more challenging, yet clinically meaningful, segment labels.
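Summary: Surgical workflow analysis is crucial for computers to understand surgeries, yet many existing temporal models handle long-term dependencies poorly and rely instead on strong per-frame visual models. The paper proposes a temporal network structure that uses a task-specific representation to collect long-term sufficient statistics, propagated by a sufficient statistics model (SSM). The approach is applied within an LSTM backbone for surgical phase recognition, with several choices of propagated statistics explored. It outperforms existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets: the published Cholec80 dataset and MGH100, a new dataset with more challenging yet clinically meaningful segment labels.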
49. Text and Style Conditioned GAN for Generation of Offline Handwriting Lines
Brian Davis, Chris Tensmeyer, Brian Price, Curtis Wigington, Bryan Morse, Rajiv Jain
Abstract: This paper presents a GAN for generating images of handwritten lines conditioned on arbitrary text and latent style vectors. Unlike prior work, which produces stroke points or single-word images, this model generates entire lines of offline handwriting. The model produces variable-sized images by using style vectors to determine character widths. A generator network is trained with GAN and autoencoder techniques to learn style, and uses a pre-trained handwriting recognition network to induce legibility. A study using human evaluators demonstrates that the model produces images that appear to be written by a human. After training, the encoder network can extract a style vector from an image, allowing images in a similar style to be generated, but with arbitrary text.
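Summary: The paper presents a GAN that generates images of handwritten lines conditioned on arbitrary text and latent style vectors. Unlike prior work that produces stroke points or single-word images, it generates entire lines of offline handwriting, with variable image width determined by using style vectors to set character widths. The generator is trained with GAN and autoencoder techniques to learn style, and a pre-trained handwriting recognition network encourages legibility. A human evaluation indicates that the generated images appear human-written. After training, the encoder can extract a style vector from an image so that arbitrary text can be generated in a similar style.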
50. Fed-Sim: Federated Simulation for Medical Imaging
Daiqing Li, Amlan Kar, Nishant Ravikumar, Alejandro F Frangi, Sanja Fidler
Abstract: Labelling data is expensive and time-consuming, especially for domains such as medical imaging that contain volumetric imaging data and require expert knowledge. Exploiting a larger pool of labeled data available across multiple centers, such as in federated learning, has also seen limited success since current deep learning approaches do not generalize well to images acquired with scanners from different manufacturers. We aim to address these problems in a common, learning-based image simulation framework which we refer to as Federated Simulation. We introduce a physics-driven generative approach that consists of two learnable neural modules: 1) a module that synthesizes 3D cardiac shapes along with their materials, and 2) a CT simulator that renders these into realistic 3D CT volumes, with annotations. Since the model of geometry and material is disentangled from the imaging sensor, it can effectively be trained across multiple medical centers. We show that our data synthesis framework improves the downstream segmentation performance on several datasets. Project Page: this https URL.
51. View-invariant action recognition
Yogesh S Rawat, Shruti Vyas
Abstract: Human action recognition is an important problem in computer vision. It has a wide range of applications in surveillance, human-computer interaction, augmented reality, video indexing, and retrieval. The varying pattern of spatio-temporal appearance generated by human action is key for identifying the performed action. We have seen a lot of research exploring these dynamics of spatio-temporal appearance for learning a visual representation of human actions. However, most of the research in action recognition is focused on some common viewpoints, and these approaches do not perform well when there is a change in viewpoint. Human actions are performed in a 3-dimensional environment and are projected to a 2-dimensional space when captured as a video from a given viewpoint. Therefore, an action will have a different spatio-temporal appearance from different viewpoints. The research in view-invariant action recognition addresses this problem and focuses on recognizing human actions from unseen viewpoints.
52. NPRportrait 1.0: A Three-Level Benchmark for Non-Photorealistic Rendering of Portraits
Paul L. Rosin, Yu-Kun Lai, David Mould, Ran Yi, Itamar Berger, Lars Doyle, Seungyong Lee, Chuan Li, Yong-Jin Liu, Amir Semmo, Ariel Shamir, Minjung Son, Holger Winnemoller
Abstract: Despite the recent upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer, the state of performance evaluation in this field is limited, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and neural style transfer) using the new benchmark dataset.
53. Excavating "Excavating AI": The Elephant in the Gallery
Michael J. Lyons
Abstract: Contains critical commentary on the exhibitions "Training Humans" and "Making Faces" by Kate Crawford and Trevor Paglen, and on the accompanying essay "Excavating AI: The politics of images in machine learning training sets."
54. The Effect of Various Strengths of Noises and Data Augmentations on Classification of Short Single-Lead ECG Signals Using Deep Neural Networks
Faezeh Nejati Hatamian, AmirAbbas Davari, Andreas Maier
Abstract: Due to the multiple imperfections during the signal acquisition, Electrocardiogram (ECG) datasets are typically contaminated with numerous types of noise, like salt and pepper and baseline drift. These datasets may contain different recordings with various types of noise [1] and thus, denoising may not be the easiest task. Furthermore, usually, the number of labeled bio-signals is very limited for a proper classification task.
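The two noise types named in the abstract can be illustrated with a small sketch. This is not the authors' procedure: the function names, the sinusoidal drift model, and all parameter values (`drift_hz`, `amplitude`, `prob`, `low`, `high`) are assumptions chosen only to show what baseline drift and salt-and-pepper noise might look like on a 1-D signal.

```python
import math
import random


def add_baseline_drift(signal, fs, drift_hz=0.3, amplitude=0.1):
    """Add a slow sinusoidal wander to a 1-D signal sampled at fs Hz (assumed drift model)."""
    return [
        x + amplitude * math.sin(2 * math.pi * drift_hz * i / fs)
        for i, x in enumerate(signal)
    ]


def add_salt_and_pepper(signal, prob=0.01, low=-1.0, high=1.0, rng=None):
    """Replace each sample by low or high with total probability prob."""
    rng = rng or random.Random()
    noisy = []
    for x in signal:
        r = rng.random()
        if r < prob / 2:
            noisy.append(low)    # "pepper": sample forced to the low extreme
        elif r < prob:
            noisy.append(high)   # "salt": sample forced to the high extreme
        else:
            noisy.append(x)      # sample left untouched
    return noisy


# Toy stand-in for a recording: one second of a 5 Hz sine sampled at 250 Hz.
fs = 250
clean = [math.sin(2 * math.pi * 5 * i / fs) for i in range(fs)]
augmented = add_salt_and_pepper(
    add_baseline_drift(clean, fs), prob=0.02, rng=random.Random(0)
)
```

Varying `amplitude` and `prob` is one simple way to produce the different noise strengths the title refers to.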
55. Decentralized Source Localization Using Wireless Sensor Networks from Noisy Data
Akram Hussain
Abstract: In this paper, the source (event) localization problem is studied in decentralized wireless sensor networks under a fault model, where the sensor nodes observe the source and report their decisions to the Fusion Center (FC), which estimates the source location. Due to the fault model, sensor nodes may provide false positive or false negative decisions to the FC. Event localization has many applications, such as localizing intruders, pollutant sources like biological and chemical weapons, enemy positions in combat monitoring, and faults in power systems. We propose two methods to estimate the source location under the fault model: a hitting set approach and a feature selection method, both of which utilize the noisy data set at the FC to estimate the source location. We show that these methods are more fault tolerant in estimating the source location and are also not complex. We also study the lower bound on the sample complexity requirement for the hitting set method. These methods have also been extended to multiple-source localization. Finally, extensive simulations are carried out for different parameters (i.e., the number of sensor nodes and sample complexity) to validate the proposed methods, showing that they achieve better performance under the fault model.
56. DARTS-: Robustly Stepping out of Performance Collapse Without Indicators
Xiangxiang Chu, Xiaoxing Wang, Bo Zhang, Shun Lu, Xiaolin Wei, Junchi Yan
Abstract: Despite the fast development of differentiable architecture search (DARTS), it suffers from a long-standing instability issue in search performance, which severely limits its application. Existing robustifying methods draw clues from the outcome instead of finding out the causal factor. Various indicators such as Hessian eigenvalues are proposed as a signal of performance collapse, and the search should be stopped once an indicator reaches a preset threshold. However, these methods tend to easily reject good architectures if thresholds are inappropriately set, let alone that the search is intrinsically noisy. In this paper, we undertake a more subtle and direct approach to resolve the collapse. We first demonstrate that skip connections with a learnable architectural coefficient can easily recover from a disadvantageous state and become dominant. We conjecture that skip connections profit too much from this privilege, hence causing the collapse of the derived model. Therefore, we propose to factor out this benefit with an auxiliary skip connection, ensuring a fairer competition for all operations. Extensive experiments on various datasets verify that our approach can substantially improve the robustness of DARTS.
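One plausible reading of the auxiliary skip connection is sketched below on scalar inputs: the candidate operations on an edge are mixed with softmax-normalized architecture coefficients, and a separate skip term is added outside that competition. The names `mixed_edge`, `alphas` and `beta`, and the toy operations, are assumptions for illustration. The abstract does not state how the auxiliary coefficient is set or scheduled.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_edge(x, ops, alphas, beta):
    """Softmax-weighted sum of candidate ops plus an auxiliary skip term beta * x."""
    weights = softmax(alphas)
    mixed = sum(w * op(x) for w, op in zip(weights, ops))
    return mixed + beta * x  # auxiliary skip is not part of the softmax competition


candidate_ops = [
    lambda x: x,                  # learnable skip connection (competes via its alpha)
    lambda x: 0.0,                # "zero" operation
    lambda x: 2.0 * max(x, 0.0),  # stand-in for a parametric op such as a convolution
]
alphas = [0.0, 0.0, 0.0]          # equal architecture coefficients
out = mixed_edge(1.5, candidate_ops, alphas, beta=1.0)
```

With `beta=0.0` the function reduces to a plain softmax-weighted mixture, so the auxiliary term can be switched off when only the competing operations should remain.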
57. Classification of Diabetic Retinopathy Using Unlabeled Data and Knowledge Distillation
Sajjad Abbasi, Mohsen Hajabdollahi, Pejman Khadivi, Nader Karimi, Roshanak Roshandel, Shahram Shirani, Shadrokh Samavi
Abstract: Knowledge distillation allows transferring knowledge from a pre-trained model to another. However, it suffers from limitations, including the constraint that the two models need to be architecturally similar. Knowledge distillation addresses some of the shortcomings associated with transfer learning by generalizing a complex model to a lighter model. However, some parts of the knowledge may not be distilled sufficiently by knowledge distillation. In this paper, a novel knowledge distillation approach using transfer learning is proposed. The proposed method transfers the entire knowledge of a model to a new smaller one. To accomplish this, unlabeled data are used in an unsupervised manner to transfer the maximum amount of knowledge to the new slimmer model. The proposed method can be beneficial in medical image analysis, where labeled data are typically scarce. The proposed approach is evaluated in the context of classification of images for diagnosing Diabetic Retinopathy on two publicly available datasets, Messidor and EyePACS. Simulation results demonstrate that the approach is effective in transferring knowledge from a complex model to a lighter one. Furthermore, experimental results illustrate that the performance of different small models is improved significantly using unlabeled data and knowledge distillation.
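Distillation on unlabeled data is possible because the training signal comes from the teacher's outputs rather than from ground-truth labels. A minimal sketch of the widely used temperature-softened formulation follows. It is a generic illustration, not necessarily the loss used in this paper, and `distillation_loss`, `temperature` and the example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened class probabilities."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# No label is needed: the teacher's logits on an unlabeled image act as the target.
teacher = [2.0, 0.5, -1.0]
loss_same = distillation_loss(teacher, teacher)          # student matches teacher
loss_diff = distillation_loss(teacher, [0.0, 0.0, 0.0])  # uninformative student
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what lets unlabeled images contribute to training the smaller model.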
58. Overcoming Negative Transfer: A Survey
Wen Zhang, Lingfei Deng, Dongrui Wu
Abstract: Transfer learning aims to help a target task that has little or no training data by leveraging knowledge from one or more related auxiliary tasks. In practice, the success of transfer learning is not always guaranteed: negative transfer is a long-standing problem in the transfer learning literature and is well recognized within the transfer learning community. How to overcome negative transfer has been studied for a long time and has attracted increasing attention in recent years. Thus, it is both necessary and challenging to comprehensively review the relevant research. This survey analyzes the factors related to negative transfer and summarizes the theories and advances in overcoming negative transfer from four crucial aspects: source data quality, target data quality, domain divergence and generic algorithms, which may give readers insight into the current research status and ideas. Additionally, we provide some general guidelines on how to detect and overcome negative transfer on real data, including negative transfer detection, datasets, baselines, and general routines. The survey provides researchers with a framework for better understanding and identifying the research status, fundamental questions, open challenges and future directions of the field.
59. DARWIN: A Highly Flexible Platform for Imaging Research in Radiology
Lufan Chang, Wenjing Zhuang, Richeng Wu, Sai Feng, Hao Liu, Jing Yu, Jia Ding, Ziteng Wang, Jiaqi Zhang
Abstract: To conduct a radiomics or deep learning research experiment, radiologists or physicians need to acquire the necessary programming skills, which can be frustrating and costly when they have limited coding experience. In this paper, we present DARWIN, a flexible research platform with a graphical user interface for medical imaging research. Our platform consists of a radiomics module and a deep learning module. The radiomics module can extract features of more than 1000 dimensions (first-, second-, and higher-order) and provides many draggable supervised and unsupervised machine learning models. Our deep learning module integrates state-of-the-art architectures for classification, detection, and segmentation tasks. It allows users to manually select hyperparameters, or choose an algorithm to automatically search for the best ones. DARWIN also offers the possibility for users to define a custom pipeline for their experiment. This flexibility enables radiologists to carry out various experiments easily.
60. Efficient, high-performance pancreatic segmentation using multi-scale feature extraction
Moritz Knolle, Georgios Kaissis, Friederike Jungmann, Sebastian Ziegelmayer, Daniel Sasse, Marcus Makowski, Daniel Rueckert, Rickmer Braren
Abstract: For artificial intelligence-based image analysis methods to reach clinical applicability, the development of high-performance algorithms is crucial. For example, existing segmentation algorithms based on natural images are neither efficient in their parameter use nor optimized for medical imaging. Here we present MoNet, a highly optimized neural-network-based pancreatic segmentation algorithm focused on achieving high performance through efficient multi-scale image feature utilization.
61. Breast mass detection in digital mammography based on anchor-free architecture
Haichao Cao
Abstract: Background and Objective: Accurate detection of breast masses in mammography images is critical for diagnosing early breast cancer, which can greatly improve patients' survival rate. However, it is still a big challenge due to the heterogeneity of breast masses and the complexity of their surrounding environment. Methods: To address these problems, we propose a one-stage object detection architecture, called Breast Mass Detection Network (BMassDNet), based on an anchor-free design and a feature pyramid, which makes it well adapted to detecting breast masses of different sizes. We introduce a truncation normalization method and combine it with adaptive histogram equalization to enhance the contrast between the breast mass and the surrounding environment. Meanwhile, to solve the overfitting problem caused by small data size, we propose a natural deformation data augmentation method and improve the training data dynamic updating method based on data complexity to effectively utilize the limited data. Finally, we use transfer learning to assist the training process and to further improve the robustness of the model. Results: On the INbreast dataset, each image has an average of 0.495 false positives while the recall rate is 0.930; on the DDSM dataset, when each image has 0.599 false positives, the recall rate reaches 0.943. Conclusions: The experimental results on the INbreast and DDSM datasets show that the proposed BMassDNet can obtain competitive detection performance compared with the current top-ranked methods.
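The two figures reported in the results, recall and average false positives per image, can be computed from per-image detection counts as sketched below. The function name, the tuple layout, and the toy counts are assumptions for illustration. How a detection is matched to a ground-truth mass (for example, by an overlap threshold) is not specified in the abstract and is left out here.

```python
def detection_summary(per_image):
    """Return (recall, false positives per image).

    per_image is a list of (true_positives, false_positives, ground_truth_masses)
    tuples, one tuple per image.
    """
    total_tp = sum(tp for tp, _, _ in per_image)
    total_fp = sum(fp for _, fp, _ in per_image)
    total_gt = sum(gt for _, _, gt in per_image)
    recall = total_tp / total_gt if total_gt else 0.0
    fp_per_image = total_fp / len(per_image) if per_image else 0.0
    return recall, fp_per_image


# Toy counts for four images (invented, not taken from INbreast or DDSM).
counts = [(1, 0, 1), (1, 1, 2), (0, 1, 0), (2, 0, 2)]
recall, fppi = detection_summary(counts)
```

Recall is pooled over all ground-truth masses, while false positives are averaged over images, so images with no mass still count toward the second figure.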
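The BMassDNet results are stated as a recall rate at a given average number of false positives per image. A minimal sketch of how those two figures are typically computed from per-image detection counts follows; the `ImageResult` container and the toy counts are hypothetical and are not taken from INbreast or DDSM.

```python
from dataclasses import dataclass


@dataclass
class ImageResult:
    """Detection outcome for one mammogram (hypothetical container)."""
    true_masses: int       # annotated masses in the image
    detected_masses: int   # annotated masses matched by a detection
    false_positives: int   # detections that match no annotated mass


def recall_and_fppi(results):
    """Return (recall, false positives per image) over a list of ImageResult."""
    total_true = sum(r.true_masses for r in results)
    total_hit = sum(r.detected_masses for r in results)
    total_fp = sum(r.false_positives for r in results)
    recall = total_hit / total_true if total_true else 0.0
    fppi = total_fp / len(results) if results else 0.0
    return recall, fppi


# Toy numbers for illustration only; they are not dataset results.
toy = [
    ImageResult(true_masses=1, detected_masses=1, false_positives=0),
    ImageResult(true_masses=2, detected_masses=1, false_positives=1),
    ImageResult(true_masses=1, detected_masses=1, false_positives=1),
    ImageResult(true_masses=0, detected_masses=0, false_positives=0),
]
recall, fppi = recall_and_fppi(toy)
print(f"recall={recall:.3f}, false positives per image={fppi:.3f}")
```

Reporting the pair together matters because recall can usually be raised by accepting more false positives per image, so neither number is meaningful alone.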
62. Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance
Andrew J. Lohn
Abstract: Test, Evaluation, Verification, and Validation (TEVV) for Artificial Intelligence (AI) is a challenge that threatens to limit the economic and societal rewards that AI researchers have devoted themselves to producing. A central task of TEVV for AI is estimating brittleness, where brittleness implies that the system functions well within some bounds and poorly outside of those bounds. This paper argues that neither of those criteria is certain for Deep Neural Networks. First, highly touted AI successes (e.g., image classification and speech recognition) are orders of magnitude more failure-prone than is typically certified in critical systems, even within design bounds (perfectly in-distribution sampling). Second, performance falls off only gradually as inputs become further Out-Of-Distribution (OOD). Enhanced emphasis is needed on designing systems that are resilient despite failure-prone AI components, as well as on evaluating and improving OOD performance, in order to get AI to where it can clear the challenging hurdles of TEVV and certification.
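To make the "orders of magnitude" comparison concrete, the sketch below maps a failure probability onto Safety Integrity Level (SIL) bands. The band limits are the low-demand average probability-of-failure-on-demand figures commonly tabulated for IEC 61508; using them here is an assumption, since the abstract quotes no numbers, and the 5% classifier error rate is a hypothetical input.

```python
import math

# Average probability of failure on demand (low-demand mode) per SIL,
# as commonly tabulated for IEC 61508: (inclusive lower, exclusive upper).
SIL_BANDS = {
    4: (1e-5, 1e-4),
    3: (1e-4, 1e-3),
    2: (1e-3, 1e-2),
    1: (1e-2, 1e-1),
}


def sil_for_failure_probability(p_fail):
    """Return the SIL whose band contains p_fail, or None if no band does."""
    for sil, (low, high) in SIL_BANDS.items():
        if low <= p_fail < high:
            return sil
    return None


def orders_of_magnitude_gap(p_fail, target_sil):
    """Powers of ten by which p_fail exceeds the upper limit of target_sil."""
    upper = SIL_BANDS[target_sil][1]
    return max(0.0, math.log10(p_fail / upper))


# Hypothetical example: a classifier that errs on 5% of in-distribution inputs.
error_rate = 0.05
sil = sil_for_failure_probability(error_rate)
gap = orders_of_magnitude_gap(error_rate, target_sil=4)
print(f"error rate {error_rate} falls in SIL {sil}; "
      f"{gap:.2f} orders of magnitude above the SIL 4 limit")
```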
63. On Open and Strong-Scaling Tools for Atom Probe Crystallography: High-Throughput Methods for Indexing Crystal Structure and Orientation
Markus Kühbach, Matthew Kasemer, Baptiste Gault, Andrew Breen
Abstract: Volumetric crystal structure indexing and orientation mapping are key data processing steps for virtually any quantitative study of spatial correlations between the local chemistry and the microstructure of a material. For electron and X-ray diffraction methods it is possible to develop indexing tools which compare measured and analytically computed patterns to decode the structure and relative orientation within local regions of interest. Consequently, a number of numerically efficient and automated software tools exist to solve the above characterisation tasks. For atom probe tomography (APT) experiments, however, the strategy of making comparisons between measured and analytically computed patterns is less robust because many APT datasets may contain substantial noise. Given that sufficiently general predictive models for such noise remain elusive, crystallography tools for APT face several limitations: their robustness to noise, and therefore their capability to identify and distinguish different crystal structures and orientations, is limited. In addition, the tools are sequential and demand substantial manual interaction. In combination, this makes robust uncertainty quantification in automated high-throughput studies of the latent crystallographic information a difficult task with APT data. To improve the situation, we review the existing methods and discuss how they link to those in the diffraction communities. With this we modify some of the APT methods to yield more robust descriptors of the atomic arrangement. We report how this enables the development of an open-source software tool for strong-scaling and automated identification of crystal structure and mapping of crystal orientation in nanocrystalline APT datasets with multiple phases.
64. Applying a random projection algorithm to optimize machine learning model for predicting peritoneal metastasis in gastric cancer patients using CT images
Seyedehnafiseh Mirniaharikandehei, Morteza Heidari, Gopichandh Danala, Sivaramakrishnan Lakshmivarahan, Bin Zheng
Abstract: Background and Objective: Non-invasively predicting the risk of cancer metastasis before surgery plays an essential role in determining optimal treatment methods for cancer patients (including those who can benefit from neoadjuvant chemotherapy). Although developing radiomics-based machine learning (ML) models has attracted broad research interest for this purpose, it often faces the challenge of how to build a high-performing and robust ML model using small and imbalanced image datasets. Methods: In this study, we explore a new approach to build an optimal ML model. A retrospective dataset involving abdominal computed tomography (CT) images acquired from 159 patients diagnosed with gastric cancer is assembled. Among them, 121 cases have peritoneal metastasis (PM), while 38 cases do not have PM. A computer-aided detection (CAD) scheme is first applied to segment primary gastric tumor volumes and initially computes 315 image features. Then, two Gradient Boosting Machine (GBM) models embedded with two different feature dimensionality reduction methods, namely principal component analysis (PCA) and a random projection algorithm (RPA), together with a synthetic minority oversampling technique, are built to predict the risk of the patients having PM. All GBM models are trained and tested using a leave-one-case-out cross-validation method. Results: Results show that the GBM embedded with RPA yielded a significantly higher prediction accuracy (71.2%) than using PCA (65.2%) (p<0.05). Conclusions: The study demonstrated that CT images of primary gastric tumors contain discriminatory information to predict the risk of PM, and RPA is a promising method to generate an optimal feature vector, improving the performance of ML models of medical images.
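Two steps of the pipeline described in this abstract, random projection of the 315 features and leave-one-case-out splitting, can be sketched with the standard library alone. This is a sketch under stated assumptions rather than the authors' implementation: the Gaussian projection matrix, the target dimension `k = 20`, and the synthetic feature values are all hypothetical, and the oversampling and GBM stages are left out.

```python
import random


def gaussian_random_projection(features, k, seed=0):
    """Project each d-dimensional row onto k random Gaussian directions."""
    rng = random.Random(seed)
    d = len(features[0])
    scale = 1.0 / (k ** 0.5)
    # k x d matrix with i.i.d. N(0, 1/k) entries (Johnson-Lindenstrauss style).
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(w * x for w, x in zip(row, sample)) for row in matrix]
            for sample in features]


def leave_one_case_out(n_cases):
    """Yield (train_indices, test_index) pairs, holding out one case each time."""
    for test in range(n_cases):
        train = [i for i in range(n_cases) if i != test]
        yield train, test


# Synthetic stand-in for the 159 cases x 315 features; values are random.
data_rng = random.Random(42)
n_cases, n_features, k = 159, 315, 20  # k is a hypothetical choice
X = [[data_rng.random() for _ in range(n_features)] for _ in range(n_cases)]

Z = gaussian_random_projection(X, k)
splits = list(leave_one_case_out(n_cases))
print(len(Z), len(Z[0]), len(splits))
```

Unlike PCA, the projection matrix here does not depend on the data, so it can be drawn once without leaking information from the held-out case. Any oversampling step would still need to be fitted on the training indices of each split only.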
65. Operational vs Convolutional Neural Networks for Image Denoising
Junaid Malik, Serkan Kiranyaz, Moncef Gabbouj
Abstract: Convolutional Neural Networks (CNNs) have recently become a favored technique for image denoising due to their adaptive learning ability, especially with a deep configuration. However, their efficacy is inherently limited owing to their homogeneous network formation with the sole use of linear convolution. In this study, we propose a heterogeneous network model which allows greater flexibility for embedding additional non-linearity at the core of the data transformation. To this end, we propose the idea of an operational neuron or Operational Neural Networks (ONN), which enables a flexible non-linear and heterogeneous configuration employing both inter- and intra-layer neuronal diversity. Furthermore, we propose a robust operator search strategy inspired by the Hebbian theory, called Synaptic Plasticity Monitoring (SPM), which can make data-driven choices for non-linearities in any architecture. An extensive set of comparative evaluations of ONNs and CNNs over two severe image denoising problems yields conclusive evidence that ONNs enriched by non-linear operators can achieve a superior denoising performance against CNNs with both equivalent and well-known deep configurations.
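The contrast between linear convolution and an operational neuron can be illustrated in one dimension. In the sketch below each window is combined by a per-element "nodal" function followed by a "pool" function. This split follows the usual description of operational neurons and is an assumption here, since the abstract does not spell it out; the sinusoidal operator is only an example, and the SPM operator search is not shown.

```python
import math


def operational_filter_1d(signal, weights, nodal, pool):
    """Slide a kernel over `signal`; apply `nodal` per element, then `pool`."""
    k = len(weights)
    out = []
    for start in range(len(signal) - k + 1):
        window = signal[start:start + k]
        out.append(pool([nodal(w, x) for w, x in zip(weights, window)]))
    return out


def multiply(w, x):
    """Nodal operator of an ordinary convolutional neuron."""
    return w * x


def sinusoid(w, x):
    """Example non-linear nodal operator (hypothetical choice)."""
    return math.sin(w * x)


signal = [0.0, 1.0, 2.0, 3.0, 4.0]
weights = [1.0, 0.0, -1.0]

# multiply + sum reproduces the sliding-window product used by CNN layers.
linear = operational_filter_1d(signal, weights, multiply, sum)
# Swapping the nodal operator makes the same kernel non-linear in its input.
nonlinear = operational_filter_1d(signal, weights, sinusoid, sum)
print(linear)
print(nonlinear)
```

With `multiply` and `sum` the function is the correlation-style sliding window used in CNN layers, so the convolutional neuron appears as one special case of the more general operator pair.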
66. Adversarial Shapley Value Experience Replay for Task-Free Continual Learning
Zheda Mai, Dongsub Shim, Jihwan Jeong, Scott Sanner, Hyunwoo Kim, Jongseong Jang
Abstract: Continual learning is a branch of deep learning that seeks to strike a balance between learning stability and plasticity. In this paper, we specifically focus on the task-free setting where data are streamed online without task metadata or clear task boundaries. A simple and highly effective algorithm class for this setting is Experience Replay (ER), which selectively stores data samples from previous experience and leverages them to interleave memory-based and online batch learning updates. Recent advances in ER have proposed novel methods for scoring which samples to store in memory and which memory samples to interleave with online data during learning updates. In this paper, we contribute a novel Adversarial Shapley value ER (ASER) method that scores memory data samples according to their ability to preserve latent decision boundaries for previously observed classes (to maintain learning stability and avoid forgetting) while interfering with latent decision boundaries of current classes being learned (to encourage plasticity and optimal learning of new class boundaries). Overall, we observe that ASER provides competitive or improved performance on a variety of datasets compared to state-of-the-art ER-based continual learning methods.
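The baseline Experience Replay loop that ASER builds on can be sketched as a fixed-size memory plus interleaved batches. The sketch below uses reservoir sampling for storage and uniform sampling for retrieval, which is a common plain-ER configuration and an assumption here; it does not implement ASER's Shapley-value scoring, and the class and function names are hypothetical.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-size memory that keeps a uniform sample of everything seen."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Reservoir sampling: the n-th item is kept with probability capacity/n."""
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = sample

    def sample(self, batch_size):
        """Draw up to batch_size stored samples uniformly without replacement."""
        k = min(batch_size, len(self.memory))
        return self.rng.sample(self.memory, k)


def replay_stream(stream_batches, buffer, replay_size):
    """Yield each incoming batch interleaved with samples replayed from memory."""
    for batch in stream_batches:
        replayed = buffer.sample(replay_size)  # retrieve before storing new data
        for item in batch:
            buffer.add(item)
        yield batch + replayed


# Toy stream: 10 batches of 4 (input, label) pairs with no task identifiers.
stream = [[(10 * b + i, b % 3) for i in range(4)] for b in range(10)]
buffer = ReservoirReplayBuffer(capacity=8)
batches = list(replay_stream(stream, buffer, replay_size=4))
print([len(b) for b in batches], len(buffer.memory), buffer.seen)
```

ASER, as the abstract describes it, would replace the uniform `sample` step with a score-based choice of which memory samples to replay.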
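Note: The Chinese abstracts in the original digest are machine translations; the cover image is a word cloud of the paper titles.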