Contents
1. Delving Deeper into Anti-aliasing in ConvNets [PDF] Abstract
2. TAnoGAN: Time Series Anomaly Detection with Generative Adversarial Networks [PDF] Abstract
3. Behavioural pattern discovery from collections of egocentric photo-streams [PDF] Abstract
4. Deterministic PointNetLK for Generalized Registration [PDF] Abstract
5. Graph Neural Networks for 3D Multi-Object Tracking [PDF] Abstract
6. Single-Image Depth Prediction Makes Feature Matching Easier [PDF] Abstract
7. Deep Phase Correlation for End-to-End Heterogeneous Sensor Measurements Matching [PDF] Abstract
8. DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild [PDF] Abstract
9. An Improved Person Re-identification Method by light-weight convolutional neural network [PDF] Abstract
10. Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification [PDF] Abstract
11. Method to Classify Skin Lesions using Dermoscopic images [PDF] Abstract
12. Action-Based Representation Learning for Autonomous Driving [PDF] Abstract
13. Automatic sleep stage classification with deep residual networks in a mixed-cohort setting [PDF] Abstract
14. Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [PDF] Abstract
15. Exploiting Scene-specific Features for Object Goal Navigation [PDF] Abstract
16. Align Deep Features for Oriented Object Detection [PDF] Abstract
17. CDE-GAN: Cooperative Dual Evolution Based Generative Adversarial Network [PDF] Abstract
18. Learning Camera-Aware Noise Models [PDF] Abstract
19. Learning Domain-invariant Graph for Adaptive Semi-supervised Domain Adaptation with Few Labeled Source Samples [PDF] Abstract
23. InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image [PDF] Abstract
26. Graph Neural Networks for Unsupervised Domain Adaptation of Histopathological Image Analytics [PDF] Abstract
27. Automating the assessment of biofouling in images using expert agreement as a gold standard [PDF] Abstract
38. A persistent homology-based topological loss function for multi-class CNN segmentation of cardiac MRI [PDF] Abstract
40. A Survey on Assessing the Generalization Envelope of Deep Neural Networks at Inference Time for Image Classification [PDF] Abstract
41. Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019 [PDF] Abstract

Abstracts
1. Delving Deeper into Anti-aliasing in ConvNets [PDF] Back to Contents
Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yong Jae Lee
Abstract: Aliasing refers to the phenomenon that high frequency signals degenerate into completely different ones after sampling. It arises as a problem in the context of deep learning as downsampling layers are widely adopted in deep architectures to reduce parameters and computation. The standard solution is to apply a low-pass filter (e.g., Gaussian blur) before downsampling. However, it can be suboptimal to apply the same filter across the entire content, as the frequency of feature maps can vary across both spatial locations and feature channels. To tackle this, we propose an adaptive content-aware low-pass filtering layer, which predicts separate filter weights for each spatial location and channel group of the input feature maps. We investigate the effectiveness and generalization of the proposed method across multiple tasks including ImageNet classification, COCO instance segmentation, and Cityscapes semantic segmentation. Qualitative and quantitative results demonstrate that our approach effectively adapts to the different feature frequencies to avoid aliasing while preserving useful information for recognition. Code is available at this https URL.
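As a rough illustration of the idea (not the authors' released code), a content-aware low-pass layer can be sketched in PyTorch as follows; the kernel size, group count, and the predictor architecture are illustrative assumptions:

```python
# Minimal sketch of an adaptive content-aware low-pass filter: a small
# conv head predicts one k*k blur kernel per spatial location and channel
# group, and the kernels are applied to the feature map via unfold.
import torch
import torch.nn.functional as F
from torch import nn

class ContentAwareBlur(nn.Module):
    def __init__(self, channels, groups=8, k=3):
        super().__init__()
        assert channels % groups == 0, "channels must divide into groups"
        self.groups, self.k = groups, k
        # Predict g * k * k kernel taps at every spatial position.
        self.predictor = nn.Conv2d(channels, groups * k * k, kernel_size=3, padding=1)

    def forward(self, x):
        n, c, h, w = x.shape
        g, k = self.groups, self.k
        # Softmax over the k*k taps keeps each predicted kernel low-pass-like
        # (non-negative weights that sum to one).
        kernels = self.predictor(x).view(n, g, k * k, h, w).softmax(dim=2)
        # Gather the k*k neighbours of every pixel: (n, c, k*k, h, w).
        patches = F.unfold(x, k, padding=k // 2).view(n, c, k * k, h, w)
        patches = patches.view(n, g, c // g, k * k, h, w)
        out = (patches * kernels.unsqueeze(2)).sum(dim=3)
        return out.view(n, c, h, w)
```

Downsampling would then follow the blur, e.g. by taking a strided slice `out[..., ::2, ::2]`.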
2. TAnoGAN: Time Series Anomaly Detection with Generative Adversarial Networks [PDF] Back to Contents
Md Abul Bashar, Richi Nayak
Abstract: Anomaly detection in time series data is a significant problem faced in many application areas. Recently, Generative Adversarial Networks (GANs) have gained attention for generation and anomaly detection in the image domain. In this paper, we propose a novel GAN-based unsupervised method called TAnoGan for detecting anomalies in time series when a small number of data points are available. We evaluate TAnoGan with 46 real-world time series datasets that cover a variety of domains. Extensive experimental results show that TAnoGan performs better than traditional and neural network models.
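The abstract does not spell out TAnoGan's scoring function; a common scheme in this family (in the spirit of AnoGAN) inverts the generator by optimizing a latent code and scores a window by how poorly it can be reproduced. A minimal sketch, assuming a pre-trained generator `G`, a discriminator `D` exposing an intermediate `D.features` hook, and illustrative hyperparameters:

```python
# Hedged sketch of GAN-based anomaly scoring for one time-series window.
import torch

def anomaly_score(G, D, window, z_dim=32, steps=100, lam=0.1):
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        fake = G(z)
        residual = (window - fake).abs().sum()                    # reconstruction error
        feat = (D.features(window) - D.features(fake)).abs().sum() # discriminator-feature error
        loss = (1 - lam) * residual + lam * feat
        loss.backward()
        opt.step()
    # A window the generator cannot reproduce is scored as anomalous.
    return loss.item()
```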
3. Behavioural pattern discovery from collections of egocentric photo-streams [PDF] Back to Contents
Martin Menchon, Estefania Talavera, Jose M Massa, Petia Radeva
Abstract: The automatic discovery of behaviour is of high importance when aiming to assess and improve the quality of life of people. Egocentric images offer a rich and objective description of the daily life of the camera wearer. This work proposes a new method to identify a person's patterns of behaviour from collected egocentric photo-streams. Our model characterizes time-frames based on the context (place, activities and environment objects) that defines the composition of the images. Based on the similarity among the time-frames that describe the collected days for a user, we propose a new unsupervised greedy method to discover the behavioural pattern set, based on a novel semantic clustering approach. Moreover, we present a new score metric to evaluate the performance of the proposed algorithm. We validate our method on 104 days and more than 100k images extracted from 7 users. Results show that behavioural patterns can be discovered to characterize the routine of individuals and consequently their lifestyle.
4. Deterministic PointNetLK for Generalized Registration [PDF] Back to Contents
Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey
Abstract: There has been remarkable progress in the application of deep learning to 3D point cloud registration in recent years. Despite their success, these approaches tend to have poor generalization properties when attempting to align unseen point clouds at test time. PointNetLK has proven the exception to this rule by leveraging the intrinsic generalization properties of the Lucas & Kanade (LK) image alignment algorithm to point cloud registration. The approach relies heavily upon the estimation of a gradient through finite differentiation -- a strategy that is inherently ill-conditioned and highly sensitive to the step-size choice. To avoid these problems, we propose a deterministic PointNetLK method that uses analytical gradients. We also develop several strategies to improve large-volume point cloud processing. We compare our approach to canonical PointNetLK and other state-of-the-art methods and demonstrate how our approach provides accurate, reliable registration with high fidelity. Extended experiments on noisy, sparse, and partial point clouds depict the utility of our approach for many real-world scenarios. Further, the decomposition of the Jacobian matrix affords the reuse of feature embeddings for alternate warp functions.
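A schematic of the Lucas-Kanade-style loop the paper builds on, assuming a feature extractor `phi` mapping a point cloud to a global descriptor and a `jacobian` function that returns d phi / d twist analytically (the paper's key change over finite differences); the names and SE(3) helpers are placeholders, not the released code:

```python
# Sketch of an LK registration iteration with an analytical Jacobian.
import torch

def lk_register(phi, jacobian, apply_twist, source, target, iters=10):
    xi = torch.zeros(6)                     # se(3) twist parameters
    f_tgt = phi(target)
    for _ in range(iters):
        warped = apply_twist(source, xi)
        r = f_tgt - phi(warped)             # feature residual
        J = jacobian(warped)                # (feat_dim, 6), analytical, not finite differences
        dxi = torch.linalg.pinv(J) @ r      # Gauss-Newton style update
        xi = xi + dxi
        if dxi.norm() < 1e-6:
            break
    return xi
```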
5. Graph Neural Networks for 3D Multi-Object Tracking [PDF] Back to Contents
Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
Abstract: 3D Multi-object tracking (MOT) is crucial to autonomous systems. Recent work often uses a tracking-by-detection pipeline, where the feature of each object is extracted independently to compute an affinity matrix. Then, the affinity matrix is passed to the Hungarian algorithm for data association. A key process of this pipeline is to learn discriminative features for different objects in order to reduce confusion during data association. To that end, we propose two innovative techniques: (1) instead of obtaining the features for each object independently, we propose a novel feature interaction mechanism by introducing Graph Neural Networks; (2) instead of obtaining the features from either 2D or 3D space as in prior work, we propose a novel joint feature extractor to learn appearance and motion features from 2D and 3D space. Through experiments on the KITTI dataset, our proposed method achieves state-of-the-art 3D MOT performance. Our project website is at this http URL.
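The Hungarian data-association step of this tracking-by-detection pipeline is standard and easy to show; the affinity computation itself (the GNN feature interaction) is the paper's contribution and is not reproduced here. `affinity` is an (n_tracks, n_detections) matrix, e.g. cosine similarity of the refined features, and the threshold is an illustrative choice:

```python
# Hungarian assignment over an affinity matrix, as used for data association.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(affinity, min_affinity=0.3):
    # linear_sum_assignment minimizes cost, so negate the affinity.
    rows, cols = linear_sum_assignment(-affinity)
    # Keep only confident track/detection matches.
    return [(r, c) for r, c in zip(rows, cols) if affinity[r, c] >= min_affinity]

affinity = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]])
print(associate(affinity))  # [(0, 0), (1, 1)]; track 2 stays unmatched
```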
6. Single-Image Depth Prediction Makes Feature Matching Easier [PDF] Back to Contents
Carl Toft, Daniyar Turmukhambetov, Torsten Sattler, Fredrik Kahl, Gabriel Brostow
Abstract: Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance invariance by choosing better local feature points or by leveraging outside information have come with pre-requisites that made some of them impractical. In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching. We show that CNN-based depths inferred from single RGB images are quite helpful, despite their flaws. They allow us to pre-warp images and rectify perspective distortions, to significantly enhance SIFT and BRISK features, enabling more good matches, even when cameras are looking at the same scene but in opposite directions.
7. Deep Phase Correlation for End-to-End Heterogeneous Sensor Measurements Matching [PDF] Back to Contents
Zexi Chen, Xuecheng Xu, Yue Wang, Rong Xiong
Abstract: The crucial step for localization is to match the current observation to the map. When the two sensor modalities are significantly different, matching becomes challenging. In this paper, we present an end-to-end deep phase correlation network (DPCN) to match heterogeneous sensor measurements. In DPCN, the primary component is a differentiable correlation-based estimator that back-propagates the pose error to learnable feature extractors, which addresses the problem that there are no direct common features for supervision. Also, it eliminates the exhaustive evaluation in some previous methods, improving efficiency. With the interpretable modeling, the network is light-weighted and promising for better generalization. We evaluate the system on both the simulation data and Aero-Ground Dataset which consists of heterogeneous sensor images and aerial images acquired by satellites or aerial robots. The results show that our method is able to match the heterogeneous sensor measurements, outperforming the comparative traditional phase correlation and other learning-based methods.
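For reference, classical (non-learned) phase correlation, which DPCN makes differentiable and feeds with learned features, recovers a translation from the peak of the normalized cross-power spectrum:

```python
# Classical phase correlation between two equally sized images.
import numpy as np

def phase_correlation(a, b):
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-8          # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks past the midpoint to negative shifts.
    h, w = a.shape
    return (dy if dy <= h // 2 else dy - h,
            dx if dx <= w // 2 else dx - w)

a = np.random.rand(64, 64)
b = np.roll(a, shift=(5, -3), axis=(0, 1))
print(phase_correlation(a, b))  # (-5, 3): the shift taking b back to a
```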
8. DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild [PDF] Back to Contents
Philippe Weinzaepfel, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, Grégory Rogez
Abstract: We introduce DOPE, the first method to detect and estimate whole-body 3D human poses, including bodies, hands and faces, in the wild. Achieving this level of detail is key for a number of applications that require understanding the interactions of the people with each other or with the environment. The main challenge is the lack of in-the-wild data with labeled whole-body 3D poses. In previous work, training data has been annotated or generated for simpler tasks focusing on bodies, hands or faces separately. In this work, we propose to take advantage of these datasets to train independent experts for each part, namely a body, a hand and a face expert, and distill their knowledge into a single deep network designed for whole-body 2D-3D pose detection. In practice, given a training image with partial or no annotation, each part expert detects its subset of keypoints in 2D and 3D and the resulting estimations are combined to obtain whole-body pseudo ground-truth poses. A distillation loss encourages the whole-body predictions to mimic the experts' outputs. Our results show that this approach significantly outperforms the same whole-body model trained without distillation while staying close to the performance of the experts. Importantly, DOPE is computationally less demanding than the ensemble of experts and can achieve real-time performance. Test code and models are available at this https URL.
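A minimal sketch of the distillation objective described above; the tensor layouts, the per-expert keypoint masks, and the use of a plain L2 loss are assumptions, since the abstract only states that the student mimics the experts' outputs:

```python
# The whole-body student is penalized against each frozen part expert
# on the keypoints that expert actually produced pseudo-labels for.
import torch
import torch.nn.functional as F

def distillation_loss(student_out, expert_outs, masks):
    """student_out / expert_outs[name]: (batch, n_kpts, 3) 3D keypoints;
    masks[name]: (batch, n_kpts), 1 where that expert gave a pseudo-label."""
    loss = 0.0
    for name, target in expert_outs.items():     # e.g. 'body', 'hand', 'face'
        m = masks[name].unsqueeze(-1)
        loss = loss + F.mse_loss(student_out * m, target.detach() * m)
    return loss
```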
9. An Improved Person Re-identification Method by light-weight convolutional neural network [PDF] Back to Contents
Sajad Amouei Sheshkal, Kazim Fouladi-Ghaleh, Hossein Aghababa
Abstract: Person Re-identification is defined as a recognizing process where the person is observed by non-overlapping cameras at different places. In the last decade, the rise in the applications and importance of Person Re-identification for surveillance systems popularized this subject in different areas of computer vision. Person Re-identification is faced with challenges such as low resolution, varying poses, illumination, background clutter, and occlusion, which could affect the result of the recognizing process. The present paper aims to improve Person Re-identification using transfer learning and the application of a verification loss function within the framework of a Siamese network. The Siamese network receives image pairs as inputs and extracts their features via a pre-trained model. EfficientNet was employed to obtain discriminative features and reduce the demands for data. The advantages of verification loss were used in the network learning. Experiments showed that the proposed model performs better than state-of-the-art methods on the CUHK01 dataset. For example, the Rank-5 accuracy is 95.2% (+5.7) on the CUHK01 dataset. It also achieves an acceptable Rank-1 accuracy. Because the pre-trained model has few parameters, learning speeds up and less hardware and data are needed.
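The abstract does not give the exact verification loss; the standard contrastive formulation for a Siamese pair looks like the following, with `embed` standing in for the pre-trained EfficientNet backbone and the margin chosen illustratively:

```python
# Contrastive verification loss on a Siamese pair of embeddings.
import torch
import torch.nn.functional as F

def verification_loss(embed, img_a, img_b, same, margin=1.0):
    """same: float tensor of shape (batch,), 1.0 if the pair shows the same person."""
    d = F.pairwise_distance(embed(img_a), embed(img_b))
    pos = same * d.pow(2)                               # pull matching pairs together
    neg = (1 - same) * F.relu(margin - d).pow(2)        # push others past the margin
    return (pos + neg).mean()
```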
10. Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification [PDF] Back to Contents
Haocong Rao, Siqi Wang, Xiping Hu, Mingkui Tan, Huang Da, Jun Cheng, Bin Hu
Abstract: Gait-based person re-identification (Re-ID) is valuable for safety-critical applications, and using only 3D skeleton data to extract discriminative gait features for person Re-ID is an emerging open topic. Existing methods either adopt hand-crafted features or learn gait features by traditional supervised learning paradigms. Unlike previous methods, we for the first time propose a generic gait encoding approach that can utilize unlabeled skeleton data to learn gait representations in a self-supervised manner. Specifically, we first propose to introduce self-supervision by learning to reconstruct input skeleton sequences in reverse order, which facilitates learning richer high-level semantics and better gait representations. Second, inspired by the fact that motion's continuity endows temporally adjacent skeletons with higher correlations ("locality"), we propose a locality-aware attention mechanism that encourages learning larger attention weights for temporally adjacent skeletons when reconstructing current skeleton, so as to learn locality when encoding gait. Finally, we propose Attention-based Gait Encodings (AGEs), which are built using context vectors learned by locality-aware attention, as final gait representations. AGEs are directly utilized to realize effective person Re-ID. Our approach typically improves existing skeleton-based methods by 10-20% Rank-1 accuracy, and it achieves comparable or even superior performance to multi-modal methods with extra RGB or depth information. Our codes are available at this https URL.
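One plausible reading of the locality-aware attention, sketched below: ordinary dot-product attention plus a bias that grows for temporally adjacent skeletons. The Gaussian form of the bias is an assumption; the abstract only states that nearby frames receive larger attention weights:

```python
# Dot-product attention over a skeleton sequence with a temporal
# locality bias; the attended outputs play the role of the context
# vectors from which the AGEs would be built.
import torch

def locality_aware_attention(q, k, v, sigma=2.0):
    """q, k, v: (seq_len, dim) encodings of one skeleton sequence."""
    t, dim = q.shape
    logits = q @ k.t() / dim ** 0.5
    idx = torch.arange(t, dtype=torch.float)
    # Larger (less negative) bias for temporally close frame pairs.
    locality = -(idx[:, None] - idx[None, :]) ** 2 / (2 * sigma ** 2)
    weights = torch.softmax(logits + locality, dim=-1)
    return weights @ v
```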
11. Method to Classify Skin Lesions using Dermoscopic images [PDF] Back to Contents
Hemanth Nadipineni
Abstract: Skin cancer is the most common cancer in the world, constituting one-third of all cancer cases. Benign skin cancers are not fatal and can be cured with proper medication, but the same is not true of malignant skin cancers. In the case of malignant melanoma at its peak stage, the maximum life expectancy is less than or equal to 5 years; it can, however, be cured if detected in its early stages. Though there are numerous clinical procedures, their diagnostic accuracy falls between 49% and 81% and they are time-consuming. So, dermoscopy has been brought into the picture. It helped increase the accuracy of diagnosis but could not eliminate the error-prone behaviour. A quick and less error-prone solution is needed to diagnose this rapidly growing skin cancer. This project deals with the use of deep learning for skin lesion classification. In this project, an automated model for skin lesion classification using dermoscopic images has been developed with a CNN (convolutional neural network) as the training model. Convolutional neural networks are known for capturing the features of an image, so they are preferred for analyzing medical images to find the characteristics that drive the model towards success. Techniques such as data augmentation for tackling class imbalance, segmentation for focusing on the region of interest, and 10-fold cross-validation to make the model robust have been brought into the picture. The project also uses certain preprocessing techniques, such as brightening the images using a piece-wise linear transformation function, grayscale conversion, and image resizing. The project offers a set of valuable insights into how the accuracy of the model rises with the introduction of new input strategies and preprocessing techniques. The best accuracy this model could achieve is 0.886.
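The piece-wise linear brightening mentioned above is a textbook intensity remapping; a NumPy sketch with illustrative control points (the project's actual values are not given in the abstract):

```python
# Piece-wise linear intensity transform implemented as a lookup table:
# intensities are remapped through straight-line segments between the
# chosen control points (xs -> ys).
import numpy as np

def piecewise_linear(img, xs=(0, 64, 192, 255), ys=(0, 96, 224, 255)):
    """img: uint8 grayscale image; xs/ys define the transfer curve."""
    lut = np.interp(np.arange(256), xs, ys).astype(np.uint8)
    return lut[img]

img = np.random.randint(0, 256, (224, 224), dtype=np.uint8)
bright = piecewise_linear(img)   # dark pixels are lifted, highlights kept
```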
12. Action-Based Representation Learning for Autonomous Driving [PDF] Back to Contents
Yi Xiao, Felipe Codevilla, Christopher Pal, Antonio M. Lopez
Abstract: Human drivers produce a vast amount of data which could, in principle, be used to improve autonomous driving systems. Unfortunately, seemingly straightforward approaches for creating end-to-end driving models that map sensor data directly into driving actions are problematic in terms of interpretability, and typically have significant difficulty dealing with spurious correlations. Alternatively, we propose to use this kind of action-based driving data for learning representations. Our experiments show that an affordance-based driving model pre-trained with this approach can leverage a relatively small amount of weakly annotated imagery and outperform pure end-to-end driving models, while being more interpretable. Further, we demonstrate how this strategy outperforms previous methods based on learning inverse dynamics models as well as other methods based on heavy human supervision (ImageNet).
13. Automatic sleep stage classification with deep residual networks in a mixed-cohort setting [PDF] Back to Contents
Alexander Neergaard Olesen, Poul Jennum, Emmanuel Mignot, Helge B D Sorensen
Abstract: Study Objectives: Sleep stage scoring is performed manually by sleep experts and is prone to subjective interpretation of scoring rules with low intra- and interscorer reliability. Many automatic systems rely on few small-scale databases for developing models, and generalizability to new datasets is thus unknown. We investigated a novel deep neural network to assess the generalizability of several large-scale cohorts. Methods: A deep neural network model was developed using 15684 polysomnography studies from five different cohorts. We applied four different scenarios: 1) impact of varying time-scales in the model; 2) performance of a single cohort on other cohorts of smaller, greater or equal size relative to the performance of other cohorts on a single cohort; 3) varying the fraction of mixed-cohort training data compared to using single-origin data; and 4) comparing models trained on combinations of data from 2, 3, and 4 cohorts. Results: Overall classification accuracy improved with increasing fractions of training data (0.25$\%$: 0.782 $\pm$ 0.097, 95$\%$ CI [0.777-0.787]; 100$\%$: 0.869 $\pm$ 0.064, 95$\%$ CI [0.864-0.872]), and with increasing number of data sources (2: 0.788 $\pm$ 0.102, 95$\%$ CI [0.787-0.790]; 3: 0.808 $\pm$ 0.092, 95$\%$ CI [0.807-0.810]; 4: 0.821 $\pm$ 0.085, 95$\%$ CI [0.819-0.823]). Different cohorts show varying levels of generalization to other cohorts. Conclusions: Automatic sleep stage scoring systems based on deep learning algorithms should consider as much data as possible from as many sources available to ensure proper generalization. Public datasets for benchmarking should be made available for future research.
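The abstract reports cohort-level results rather than architectural detail; for orientation only, a generic 1D residual block of the kind such sleep-staging backbones stack over raw polysomnography channels might look like this (purely illustrative, not the authors' model):

```python
# A plain 1D residual block over (batch, channels, time) signals.
import torch
from torch import nn

class ResBlock1d(nn.Module):
    def __init__(self, ch, k=7):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(ch, ch, k, padding=k // 2), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, k, padding=k // 2), nn.BatchNorm1d(ch),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        # Identity shortcut plus the convolutional residual branch.
        return self.act(x + self.body(x))
```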
14. Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [PDF] Back to Contents
Zitong Yu, Benjia Zhou, Jun Wan, Pichao Wang, Haoyu Chen, Xin Liu, Stan Z. Li, Guoying Zhao
Abstract: Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully explore the synergies among spatio-temporal modalities for gesture recognition. The problems are partially due to the fact that the existing manually designed network architectures have low efficiency in the joint learning of multi-modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which is able to capture rich temporal context via aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among varied modalities. The resultant multi-modal multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics. Comprehensive experiments are performed on three benchmark datasets (IsoGD, NvGesture, and EgoGesture), demonstrating state-of-the-art performance in both single- and multi-modality settings. The code is available at this https URL
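A 3D central difference convolution can plausibly be implemented like its published 2D counterpart: a vanilla convolution minus theta times a 1x1x1 convolution whose weights are each kernel's tap-sum. Whether the released 3D-CDC matches this exactly is an assumption on our part:

```python
# Sketch of a 3D-CDC layer via the standard CDC decomposition:
# sum_n w_n * (x(p + p_n) - x(p)) = conv(x) - (sum_n w_n) * x(p).
import torch
import torch.nn.functional as F
from torch import nn

class CDC3d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)                            # vanilla convolution term
        # Central-difference term: a 1x1x1 conv whose weights are the
        # per-filter sums of the original kernel taps.
        w_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        out_diff = F.conv3d(x, w_sum)
        return out - self.theta * out_diff
```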
15. Exploiting Scene-specific Features for Object Goal Navigation [PDF] Back to Contents
Tommaso Campari, Paolo Eccher, Luciano Serafini, Lamberto Ballan
Abstract: Can the intrinsic relation between an object and the room in which it is usually located help agents in the Visual Navigation Task? We study this question in the context of Object Navigation, a problem in which an agent has to reach an object of a specific class while moving in a complex domestic environment. In this paper, we introduce a new reduced dataset that speeds up the training of navigation models, a notoriously complex task. Our proposed dataset permits training models that do not exploit online-built maps in reasonable time, even without the use of huge computational resources. Therefore, this reduced dataset guarantees a significant benchmark, and it can be used to identify promising models that could then be tried on bigger and more challenging datasets. Subsequently, we propose the SMTSC model, an attention-based model capable of exploiting the correlation between scenes and the objects contained in them, quantitatively highlighting the validity of this idea.
16. Align Deep Features for Oriented Object Detection [PDF] Back to Contents
Jiaming Han, Jian Ding, Jie Li, Gui-Song Xia
Abstract: The past decade has witnessed significant progress on detecting objects in aerial images that are often distributed with large scale variations and arbitrary orientations. However, most existing methods rely on heuristically defined anchors with different scales, angles and aspect ratios and usually suffer from severe misalignment between anchor boxes and axis-aligned convolutional features, which leads to the common inconsistency between the classification score and localization accuracy. To address this issue, we propose a Single-shot Alignment Network (S$^2$A-Net) consisting of two modules: a Feature Alignment Module (FAM) and an Oriented Detection Module (ODM). The FAM can generate high-quality anchors with an Anchor Refinement Network and adaptively align the convolutional features according to the anchor boxes with a novel Alignment Convolution. The ODM first adopts active rotating filters to encode the orientation information and then produces orientation-sensitive and orientation-invariant features to alleviate the inconsistency between classification score and localization accuracy. Besides, we further explore the approach to detect objects in large-size images, which leads to a better trade-off between speed and accuracy. Extensive experiments demonstrate that our method can achieve state-of-the-art performance on two commonly used aerial objects datasets (i.e., DOTA and HRSC2016) while keeping high efficiency. The code is available at this https URL.
17. CDE-GAN: Cooperative Dual Evolution Based Generative Adversarial Network [PDF] Back to Contents
Shiming Chen, Wenjie Wang, Beihao Xia, Xinge You, Zehong Cao, Weiping Ding
Abstract: Generative adversarial networks (GANs) have been a popular deep generative model for real-world applications. Despite many recent efforts on GANs, mode collapse and instability are still open problems caused by their adversarial optimization difficulties. In this paper, motivated by the cooperative co-evolutionary algorithm, we propose a Cooperative Dual Evolution based Generative Adversarial Network (CDE-GAN) to circumvent these drawbacks. In essence, CDE-GAN incorporates dual evolution with respect to generator(s) and discriminators into a unified evolutionary adversarial framework; it thus exploits their complementary properties and injects dual mutation diversity into training to steadily diversify the estimated density in capturing multi-modes, and to improve generative performance. Specifically, CDE-GAN decomposes the complex adversarial optimization problem into two subproblems (generation and discrimination), and each subproblem is solved with a separated subpopulation (E-Generators and E-Discriminators), evolved by an individual evolutionary algorithm. Additionally, to keep the balance between E-Generators and E-Discriminators, we propose a Soft Mechanism that makes them cooperate in conducting effective adversarial training. Extensive experiments on one synthetic dataset and three real-world benchmark image datasets demonstrate that the proposed CDE-GAN achieves competitive and superior performance in generating good-quality and diverse samples over baselines. The code and more generated results are available at our project homepage this https URL.
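A heavily simplified skeleton of one evolutionary step of such a scheme, with Gaussian parameter mutation and truncation selection as placeholder operators; the abstract does not give CDE-GAN's exact mutation or fitness definitions:

```python
# One evolutionary step over a subpopulation of networks: mutate copies,
# score everything with a caller-supplied fitness, keep the best.
import copy
import torch

def evolve(population, fitness_fn, n_survivors, noise=0.01):
    offspring = []
    for net in population:
        child = copy.deepcopy(net)
        with torch.no_grad():
            for p in child.parameters():        # Gaussian parameter mutation
                p.add_(noise * torch.randn_like(p))
        offspring.append(child)
    scored = sorted(population + offspring, key=fitness_fn, reverse=True)
    return scored[:n_survivors]

# One outer iteration would evolve the E-Generators against the current
# discriminators and vice versa (fitness functions assumed given), e.g.:
# generators = evolve(generators, lambda g: gen_fitness(g, discriminators), 4)
# discriminators = evolve(discriminators, lambda d: disc_fitness(d, generators), 4)
```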
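As a rough intuition for the evolutionary half of CDE-GAN, here is a minimal sketch of one evolution step for the generator subpopulation. Fitness below is simply the mean discriminator score on generated samples, and the mutation is a Gaussian parameter perturbation; the paper's actual fitness criterion, mutation operators, and Soft Mechanism are more involved.

```python
# Minimal sketch, assuming toy generator/discriminator nn.Modules where
# g(z) maps (n, z_dim) latents to samples the discriminator can score.
import copy
import torch

def mutate(net, sigma=0.01):
    """Gaussian parameter perturbation: the 'mutation diversity' injector."""
    child = copy.deepcopy(net)
    with torch.no_grad():
        for p in child.parameters():
            p.add_(sigma * torch.randn_like(p))
    return child

def evolve_generators(generators, discriminator, z_dim=64, n=256):
    """Mutate each generator and keep whichever variant fools D better."""
    z = torch.randn(n, z_dim)
    survivors = []
    for g in generators:
        child = mutate(g)
        with torch.no_grad():
            if discriminator(child(z)).mean() > discriminator(g(z)).mean():
                g = child
        survivors.append(g)
    return survivors
```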
18. Learning Camera-Aware Noise Models [PDF]
Ke-Chi Chang, Ren Wang, Hung-Jin Lin, Yu-Lun Liu, Chia-Ping Chen, Yu-Lin Chang, Hwann-Tzong Chen
Abstract: Modeling imaging sensor noise is a fundamental problem for image processing and computer vision applications. While most previous works adopt statistical noise models, real-world noise is far more complicated, going beyond what these models can describe. To tackle this issue, we propose a data-driven approach, where a generative noise model is learned from real-world noise. The proposed noise model is camera-aware; that is, different noise characteristics of different camera sensors can be learned simultaneously, and a single learned noise model can generate different noise for different camera sensors. Experimental results show that our method quantitatively and qualitatively outperforms existing statistical noise models and learning-based methods.
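A hedged sketch of what a camera-aware generative noise model can look like: a learned per-camera embedding conditions a small CNN that maps a clean image plus a latent seed to sensor noise. The architecture details here are assumptions for illustration, not the paper's network.

```python
# Sketch only: per-camera embedding conditions the noise generator, so a
# single model can emit camera-specific noise (architecture is assumed).
import torch
import torch.nn as nn

class CameraAwareNoiseGenerator(nn.Module):
    def __init__(self, n_cameras, embed_dim=32):
        super().__init__()
        self.cam_embed = nn.Embedding(n_cameras, embed_dim)
        self.net = nn.Sequential(
            nn.Conv2d(3 + embed_dim + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, clean, cam_id):
        b, _, h, w = clean.shape
        z = torch.randn(b, 1, h, w, device=clean.device)  # latent noise seed
        e = self.cam_embed(cam_id)[:, :, None, None].expand(-1, -1, h, w)
        return self.net(torch.cat([clean, e, z], dim=1))  # predicted noise

# usage (hypothetical): noisy = clean + gen(clean, cam_id)
```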
19. Learning Domain-invariant Graph for Adaptive Semi-supervised Domain Adaptation with Few Labeled Source Samples [PDF]
Jinfeng Li, Weifeng Liu, Yicong Zhou, Jun Yu, Dapeng Tao
Abstract: Domain adaptation aims to generalize a model from a source domain to tackle tasks in a related but different target domain. Traditional domain adaptation algorithms assume that enough labeled data, which are treated as prior knowledge, are available in the source domain. However, these algorithms become infeasible when only a few labeled data exist in the source domain, and the performance decreases significantly. To address this challenge, we propose a Domain-invariant Graph Learning (DGL) approach for domain adaptation with only a few labeled source samples. Firstly, DGL introduces the Nystrom method to construct a plastic graph that shares similar geometric properties with the target domain. Then, DGL flexibly employs the Nystrom approximation error to measure the divergence between the plastic graph and the source graph, formalizing the distribution mismatch from a geometric perspective. By minimizing the approximation error, DGL learns a domain-invariant geometric graph to bridge the source and target domains. Finally, we integrate the learned domain-invariant graph with semi-supervised learning and further propose an adaptive semi-supervised model to handle cross-domain problems. The results of extensive experiments on popular datasets verify the superiority of DGL, especially when only a few labeled source samples are available.
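To make the Nystrom ingredient concrete, the sketch below approximates an RBF kernel matrix from a set of landmark points and returns the Frobenius approximation error, the kind of quantity DGL uses (per the abstract) to measure divergence between graphs. This is illustrative only; the paper's graph construction and error term may differ.

```python
# Nystrom approximation of an RBF kernel and its Frobenius error (sketch).
import numpy as np

def nystrom_error(X, landmarks, gamma=1.0):
    def rbf(A, B):
        d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    C = rbf(X, landmarks)            # n x m cross-kernel
    W = rbf(landmarks, landmarks)    # m x m landmark kernel
    K_approx = C @ np.linalg.pinv(W) @ C.T
    K = rbf(X, X)                    # full kernel, fine at sketch scale
    return np.linalg.norm(K - K_approx, "fro")
```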
20. SSGP: Sparse Spatial Guided Propagation for Robust and Generic Interpolation [PDF]
René Schuster, Oliver Wasenmüller, Christian Unger, Didier Stricker
Abstract: Interpolation of sparse pixel information towards a dense target resolution finds its application across multiple disciplines in computer vision. State-of-the-art interpolation of motion fields applies model-based interpolation that makes use of edge information extracted from the target image. For depth completion, data-driven learning approaches are widespread. Our work is inspired by the latest trends in depth completion that tackle the problem of dense guidance for sparse information. We extend these ideas and create a generic cross-domain architecture that can be applied to a multitude of interpolation problems such as optical flow, scene flow, or depth completion. In our experiments, we show that our proposed concept of Sparse Spatial Guided Propagation (SSGP) achieves improvements in robustness, accuracy, or speed compared to specialized algorithms.
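In the spirit of propagating sparse information toward a dense output, here is a generic normalized sparse-to-dense propagation sketch. It is a plain sparsity-invariant averaging scheme under stated assumptions, not the authors' SSGP layer, which additionally uses learned guidance.

```python
# Fill missing pixels by iterated, mask-normalized local averaging (sketch).
import torch
import torch.nn.functional as F

def sparse_propagate(values, mask, kernel_size=3, iters=5):
    # values: (B, 1, H, W) sparse map; mask: (B, 1, H, W), 1 where valid.
    w = torch.ones(1, 1, kernel_size, kernel_size)
    pad = kernel_size // 2
    for _ in range(iters):
        num = F.conv2d(values * mask, w, padding=pad)
        cnt = F.conv2d(mask, w, padding=pad)
        filled = num / cnt.clamp(min=1e-6)          # normalized average
        values = torch.where(mask.bool(), values, filled)
        mask = torch.maximum(mask, (cnt > 0).float())
    return values
```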
21. Kronecker CP Decomposition with Fast Multiplication for Compressing RNNs [PDF]
Dingheng Wang, Bijiao Wu, Guangshe Zhao, Hengnu Chen, Lei Deng, Tianyi Yan, Guoqi Li
Abstract: Recurrent neural networks (RNNs) are powerful for tasks involving sequential data, such as natural language processing and video recognition. However, since modern RNNs, including long short-term memory (LSTM) and gated recurrent unit (GRU) networks, have complex topologies and expensive space/computation complexity, compressing them has become a hot and promising topic in recent years. Among the many compression methods, tensor decomposition, e.g., tensor train (TT), block term (BT), tensor ring (TR) and hierarchical Tucker (HT), appears to be the most promising approach, since a very high compression ratio might be obtained. Nevertheless, none of these tensor decomposition formats can provide both space and computation efficiency. In this paper, we consider compressing RNNs based on a novel Kronecker CANDECOMP/PARAFAC (KCP) decomposition, derived from the Kronecker tensor (KT) decomposition, by proposing two fast algorithms for multiplication between the input and the tensor-decomposed weight. According to our experiments on the UCF11, Youtube Celebrities Face and UCF50 datasets, the proposed KCP-RNNs achieve accuracy comparable to those in other tensor-decomposed formats, and a compression ratio of up to 278,219x can be obtained with low-rank KCP. More importantly, KCP-RNNs are efficient in both space and computation complexity compared with other tensor-decomposed counterparts under similar ranks. Besides, we find KCP has the best potential for parallel computing to accelerate the calculations in neural networks.
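The "fast multiplication" that makes Kronecker-factorized weights practical rests on a standard identity: (A ⊗ B) vec(X) = vec(B X Aᵀ), so the full Kronecker-structured matrix is never materialized. The numpy check below illustrates the identity at toy sizes; it is not the paper's two algorithms, only the underlying trick.

```python
# Verify (A kron B) @ vec(X) == vec(B @ X @ A.T) with column-major vec.
import numpy as np

p, q, r, s = 4, 3, 5, 2
A, B = np.random.randn(p, r), np.random.randn(q, s)
X = np.random.randn(s, r)                          # input, reshaped to s x r

slow = np.kron(A, B) @ X.reshape(-1, order="F")    # costs O(pq * rs)
fast = (B @ X @ A.T).reshape(-1, order="F")        # costs O(qsr + qrp)
assert np.allclose(slow, fast)
```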
22. Domain Adaptation of Learned Features for Visual Localization [PDF]
Sungyong Baik, Hyo Jin Kim, Tianwei Shen, Eddy Ilg, Kyoung Mu Lee, Chris Sweeney
Abstract: We tackle the problem of visual localization under changing conditions, such as time of day, weather, and seasons. Recent learned local features based on deep neural networks have shown superior performance over classical hand-crafted local features. However, in a real-world scenario, there often exists a large domain gap between training and target images, which can significantly degrade the localization accuracy. While existing methods utilize a large amount of data to tackle the problem, we present a novel and practical approach where only a few examples are needed to reduce the domain gap. In particular, we propose a few-shot domain adaptation framework for learned local features that deals with varying conditions in visual localization. The experimental results demonstrate superior performance over baselines while using only a small number of training examples from the target domain.
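One common, lightweight way to shrink a domain gap with few target samples is to align feature statistics across domains during fine-tuning. The loss below is such a generic proxy, offered only as a sketch; it is not the authors' few-shot adaptation framework.

```python
# Match channel-wise feature statistics between domains (sketch).
import torch

def stat_alignment_loss(f_src, f_tgt):
    # f_src, f_tgt: (B, C, H, W) feature maps from source / target images.
    mu_s, mu_t = f_src.mean(dim=(0, 2, 3)), f_tgt.mean(dim=(0, 2, 3))
    sd_s, sd_t = f_src.std(dim=(0, 2, 3)), f_tgt.std(dim=(0, 2, 3))
    return ((mu_s - mu_t) ** 2).mean() + ((sd_s - sd_t) ** 2).mean()
```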
23. InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image [PDF]
Gyeongsik Moon, Shoou-i Yu, He Wen, Takaaki Shiratori, Kyoung Mu Lee
Abstract: Analysis of hand-hand interactions is a crucial step towards better understanding human behavior. However, most research on 3D hand pose estimation has focused on the isolated single-hand case. Therefore, we propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image. The proposed InterHand2.6M consists of 2.6M labeled single and interacting hand frames under various poses from multiple subjects. Our InterNet simultaneously performs 3D single and interacting hand pose estimation. In our experiments, we demonstrate big gains in 3D interacting hand pose estimation accuracy when leveraging the interacting hand data in InterHand2.6M. We also report the accuracy of InterNet on InterHand2.6M, which serves as a strong baseline for this new dataset. Finally, we show 3D interacting hand pose estimation results on general images. Our code and dataset are available at this https URL.
24. Robustness and Overfitting Behavior of Implicit Background Models [PDF]
Shirley Liu, Charles Lehman, Ghassan AlRegib
Abstract: In this paper, we examine the overfitting behavior of image classification models modified with Implicit Background Estimation (SCrIBE), which transforms them into weakly supervised segmentation models that provide spatial domain visualizations without affecting performance. Using the segmentation masks, we derive an overfit detection criterion that does not require testing labels. In addition, we assess the change in model performance, calibration, and segmentation masks after applying data augmentations as overfitting reduction measures and testing on various types of distorted images.
25. ATG-PVD: Ticketing Parking Violations on A Drone [PDF]
Hengli Wang, Yuxuan Liu, Huaiyang Huang, Yuheng Pan, Wenbin Yu, Jialin Jiang, Dianbin Lyu, Mohammud J. Bocus, Ming Liu, Ioannis Pitas, Rui Fan
Abstract: In this paper, we introduce a novel suspect-and-investigate framework, which can be easily embedded in a drone for automated parking violation detection (PVD). Our proposed framework consists of: 1) SwiftFlow, an efficient and accurate convolutional neural network (CNN) for unsupervised optical flow estimation; 2) Flow-RCNN, a flow-guided CNN for car detection and classification; and 3) an illegally parked car (IPC) candidate investigation module developed based on visual SLAM. The proposed framework was successfully embedded in a drone from ATG Robotics. The experimental results demonstrate that, firstly, our proposed SwiftFlow outperforms all other state-of-the-art unsupervised optical flow estimation approaches in terms of both speed and accuracy; secondly, IPC candidates can be effectively and efficiently detected by our proposed Flow-RCNN, with a better performance than our baseline network, Faster-RCNN; finally, the actual IPCs can be successfully verified by our investigation module after drone re-localization.
26. Graph Neural Networks for Unsupervised Domain Adaptation of Histopathological Image Analytics [PDF]
Dou Xu, Chang Cai, Chaowei Fang, Bin Kong, Jihua Zhu, Zhongyu Li
Abstract: Annotating histopathological images is a time-consuming and labor-intensive process, which requires broad-certificated pathologists carefully examining large-scale whole-slide images from cells to tissues. Recent frontiers of transfer learning techniques have been widely investigated for image understanding tasks with limited annotations. However, when applied for the analytics of histology images, few of them can effectively avoid the performance degradation caused by the domain discrepancy between the source training dataset and the target dataset, such as different tissues, staining appearances, and imaging devices. To this end, we present a novel method for the unsupervised domain adaptation in histopathological image analysis, based on a backbone for embedding input images into a feature space, and a graph neural layer for propagating the supervision signals of images with labels. The graph model is set up by connecting every image with its close neighbors in the embedded feature space. Then a graph neural network is employed to synthesize new feature representations from every image. During the training stage, target samples with confident inferences are dynamically allocated with pseudo labels. The cross-entropy loss function is used to constrain the predictions of source samples with manually marked labels and target samples with pseudo labels. Furthermore, the maximum mean diversity is adopted to facilitate the extraction of domain-invariant feature representations, and contrastive learning is exploited to enhance the category discrimination of learned features. In experiments of the unsupervised domain adaptation for histopathological image classification, our method achieves state-of-the-art performance on four public datasets.
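A sketch of two ingredients named in the abstract above: the neighbor graph in the embedded feature space, and confidence-based pseudo-labeling of target samples. The similarity measure, k, and confidence threshold here are assumptions for illustration.

```python
# kNN graph over embeddings plus confident pseudo-labels (sketch).
import torch
import torch.nn.functional as F

def build_knn_edges(features, k=8):
    """Connect every image to its k nearest neighbors in feature space."""
    f = F.normalize(features, dim=1)             # (N, D)
    sim = f @ f.t()
    sim.fill_diagonal_(-1.0)                     # exclude self-loops
    nbrs = sim.topk(k, dim=1).indices            # (N, k)
    src = torch.arange(f.size(0)).repeat_interleave(k)
    return torch.stack([src, nbrs.reshape(-1)])  # (2, N*k) edge index

def assign_pseudo_labels(logits, threshold=0.95):
    """Dynamically give confident target samples pseudo labels."""
    prob, label = logits.softmax(dim=1).max(dim=1)
    keep = prob > threshold
    return label[keep], keep
```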
27. Automating the assessment of biofouling in images using expert agreement as a gold standard [PDF]
Nathaniel J. Bloomfield, Susan Wei, Bartholomew Woodham, Peter Wilkinson, Andrew Robinson
Abstract: Biofouling is the accumulation of organisms on surfaces immersed in water. It is of particular concern to the international shipping industry because fouling increases the drag on vessels as they move through the water, resulting in higher fuel costs, and presents a biosecurity risk by providing a pathway for marine non-indigenous species (NIS) to establish in new areas. There is growing interest within jurisdictions to strengthen biofouling risk-management regulations, but it is expensive to conduct in-water inspections and assess the collected data to determine the biofouling state of vessel hulls. Machine learning is well suited to tackle the latter challenge, and here we apply so-called deep learning to automate the classification of images from in-water inspections for the presence and severity of biofouling. We combined images collected from in-water surveys conducted by the Australian Department of Agriculture, Water and the Environment, the New Zealand Ministry for Primary Industries and the California State Lands Commission, and annotated them using the Amazon Mechanical Turk (MTurk) crowdsourcing platform. We compared the annotations from three biofouling experts on a 120-sample subset of these images, and found that for two tasks, identifying images containing fouling and identifying images containing heavy fouling, they showed 89% agreement (95% CI: 87-92%). It was found that the MTurk labelling approach achieved similar agreement with experts, which we defined as performing at most 5% worse than experts (p=0.004-0.020). Our deep learning model trained with the MTurk annotations also showed reasonable performance in comparison to expert agreement, although at a lower significance level (p=0.071-0.093). We also demonstrate that significantly better performance than expert agreement can be achieved if a classifier with high recall or precision is required.
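As an illustration of the kind of statistic quoted above, the snippet below computes pairwise annotator agreement with a bootstrap confidence interval. The paper's exact statistical procedure may differ; this is a generic sketch.

```python
# Pairwise agreement between two annotators with a bootstrap 95% CI.
import numpy as np

def agreement_ci(a, b, n_boot=10000, seed=0):
    rng = np.random.default_rng(seed)
    agree = (np.asarray(a) == np.asarray(b)).astype(float)
    boots = [rng.choice(agree, size=agree.size, replace=True).mean()
             for _ in range(n_boot)]
    return agree.mean(), np.percentile(boots, [2.5, 97.5])
```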
28. Occupancy Anticipation for Efficient Exploration and Navigation [PDF]
Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman
Abstract: State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent. We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions. In doing so, the agent builds its spatial awareness more rapidly, which facilitates efficient exploration and navigation in 3D environments. By exploiting context in both the egocentric views and top-down maps our model successfully anticipates a broader map of the environment, with performance significantly better than strong baselines. Furthermore, when deployed for the sequential decision-making tasks of exploration and navigation, our model outperforms state-of-the-art methods on the Gibson and Matterport3D datasets. Our approach is the winning entry in the 2020 Habitat PointNav Challenge. Project page: this http URL
29. Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid [PDF]
Jun Gao, Zian Wang, Jinchen Xuan, Sanja Fidler
Abstract: In modern computer vision, images are typically represented as a fixed uniform grid with some stride and processed via a deep convolutional neural network. We argue that deforming the grid to better align with the high-frequency image content is a more effective strategy. We introduce the Deformable Grid (DefGrid), a learnable neural network module that predicts location offsets of vertices of a 2-dimensional triangular grid, such that the edges of the deformed grid align with image boundaries. We showcase our DefGrid in a variety of use cases, i.e., by inserting it as a module at various levels of processing. We utilize DefGrid as an end-to-end learnable geometric downsampling layer that replaces standard pooling methods for reducing feature resolution when feeding images into a deep CNN. We show significantly improved results at the same grid resolution compared to using CNNs on uniform grids for the task of semantic segmentation. We also utilize DefGrid at the output layers for the task of object mask annotation, and show that reasoning about object boundaries on our predicted polygonal grid leads to more accurate results over existing pixel-wise and curve-based approaches. We finally showcase DefGrid as a standalone module for unsupervised image partitioning, showing superior performance over existing approaches. Project website: this http URL
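A minimal sketch of the core prediction: per-vertex 2D offsets for a regular grid, regressed from vertex features. The feature extractor, triangulation, and boundary losses are omitted, and the 0.05 offset scale is an assumption, not the paper's value.

```python
# Predict bounded per-vertex offsets for a regular grid (sketch;
# torch.meshgrid's indexing kwarg requires torch >= 1.10).
import torch
import torch.nn as nn

class DeformableGridHead(nn.Module):
    def __init__(self, feat_dim, grid_h, grid_w):
        super().__init__()
        self.offset = nn.Linear(feat_dim, 2)     # (dx, dy) per vertex
        ys, xs = torch.meshgrid(torch.linspace(0, 1, grid_h),
                                torch.linspace(0, 1, grid_w), indexing="ij")
        self.register_buffer("base", torch.stack([xs, ys], dim=-1))  # (H,W,2)

    def forward(self, vertex_feats):             # (H, W, feat_dim)
        return self.base + 0.05 * torch.tanh(self.offset(vertex_feats))
```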
30. Learning Affordance Landscapes for Interaction Exploration in 3D Environments [PDF]
Tushar Nagarajan, Kristen Grauman
Abstract: Embodied agents operating in human spaces must be able to master how their environment works: what objects can the agent use, and how can it use them? We introduce a reinforcement learning approach for exploration for interaction, whereby an embodied agent autonomously discovers the affordance landscape of a new unmapped 3D environment (such as an unfamiliar kitchen). Given an egocentric RGB-D camera and a high-level action space, the agent is rewarded for maximizing successful interactions while simultaneously training an image-based affordance segmentation model. The former yields a policy for acting efficiently in new environments to prepare for downstream interaction tasks, while the latter yields a convolutional neural network that maps image regions to the likelihood they permit each action, densifying the rewards for exploration. We demonstrate our idea with AI2-iTHOR. The results show agents can learn how to use new home environments intelligently and that it prepares them to rapidly address various downstream tasks like "find a knife and put it in the drawer." Project page: this http URL
31. Learning to Abstract and Predict Human Actions [PDF]
Romero Morais, Vuong Le, Truyen Tran, Svetha Venkatesh
Abstract: Human activities are naturally structured as hierarchies unrolled over time. For action prediction, temporal relations in event sequences are widely exploited by current methods while their semantic coherence across different levels of abstraction has not been well explored. In this work we model the hierarchical structure of human activities in videos and demonstrate the power of such structure in action prediction. We propose Hierarchical Encoder-Refresher-Anticipator, a multi-level neural machine that can learn the structure of human activities by observing a partial hierarchy of events and roll-out such structure into a future prediction in multiple levels of abstraction. We also introduce a new coarse-to-fine action annotation on the Breakfast Actions videos to create a comprehensive, consistent, and cleanly structured video hierarchical activity dataset. Through our experiments, we examine and rethink the settings and metrics of activity prediction tasks toward unbiased evaluation of prediction systems, and demonstrate the role of hierarchical modeling toward reliable and detailed long-term action forecasting.
32. Image Stitching and Rectification for Hand-Held Cameras [PDF]
Bingbing Zhuang, Quoc-Huy Tran
Abstract: In this paper, we derive a new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. Despite the high complexity of RS geometry, we focus in this paper on a special yet common input -- two consecutive frames from a video stream, wherein the inter-frame motion is restricted from being arbitrarily large. This allows us to adopt a simpler differential motion model, leading to a straightforward and practical minimal solver. To deal with non-planar scenes and camera parallax in stitching, we further propose an RS-aware spatially-varying homography field following the principle of As-Projective-As-Possible (APAP). We show superior performance over state-of-the-art methods in both RS image stitching and rectification, especially for images captured by hand-held shaking cameras.
33. Detecting natural disasters, damage, and incidents in the wild [PDF]
Ethan Weber, Nuria Marzo, Dim P. Papadopoulos, Aritro Biswas, Agata Lapedriza, Ferda Ofli, Muhammad Imran, Antonio Torralba
Abstract: Responding to natural disasters, such as earthquakes, floods, and wildfires, is a laborious task performed by on-the-ground emergency responders and analysts. Social media has emerged as a low-latency data source to quickly understand disaster situations. While most studies on social media are limited to text, images offer more information for understanding disaster and incident scenes. However, no large-scale image datasets for incident detection exist. In this work, we present the Incidents Dataset, which contains 446,684 images annotated by humans that cover 43 incidents across a variety of scenes. We employ a baseline classification model that mitigates false-positive errors and we perform image filtering experiments on millions of social media images from Flickr and Twitter. Through these experiments, we show how the Incidents Dataset can be used to detect images with incidents in the wild. Code, data, and models are available online at this http URL.
34. PyTorch Metric Learning [PDF]
Kevin Musgrave, Serge Belongie, Ser-Nam Lim
Abstract: Deep metric learning algorithms have a wide variety of applications, but implementing these algorithms can be tedious and time consuming. PyTorch Metric Learning is an open source library that aims to remove this barrier for both researchers and practitioners. The modular and flexible design allows users to easily try out different combinations of algorithms in their existing code. It also comes with complete train/test workflows, for users who want results fast. Code and documentation is available at this https URL.
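To give a taste of the modular design described above, a typical training step from the library pairs any loss with an optional miner (standard usage per the library's documentation; install with `pip install pytorch-metric-learning`):

```python
# Typical pytorch-metric-learning training step: miner selects hard pairs,
# the loss consumes embeddings, labels, and the mined indices.
import torch
from pytorch_metric_learning import losses, miners

loss_func = losses.TripletMarginLoss(margin=0.2)
miner = miners.MultiSimilarityMiner()

embeddings = torch.randn(32, 128)           # stand-in for your network output
labels = torch.randint(0, 10, (32,))
hard_pairs = miner(embeddings, labels)
loss = loss_func(embeddings, labels, hard_pairs)
```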
35. Multi-scale Interaction for Real-time LiDAR Data Segmentation on an Embedded Platform [PDF]
Shijie Li, Xieyuanli Chen, Yun Liu, Dengxin Dai, Cyrill Stachniss, Juergen Gall
Abstract: Real-time semantic segmentation of LiDAR data is crucial for autonomously driving vehicles, which are usually equipped with an embedded platform and have limited computational resources. Approaches that operate directly on the point cloud use complex spatial aggregation operations, which are very expensive and difficult to optimize for embedded platforms. They are therefore not suitable for real-time applications with embedded systems. As an alternative, projection-based methods are more efficient and can run on embedded platforms. However, the current state-of-the-art projection-based methods do not achieve the same accuracy as point-based methods and use millions of parameters. In this paper, we therefore propose a projection-based method, called Multi-scale Interaction Network (MINet), which is very efficient and accurate. The network uses multiple paths with different scales and balances the computational resources between the scales. Additional dense interactions between the scales avoid redundant computations and make the network highly efficient. The proposed network outperforms point-based, image-based, and projection-based methods in terms of accuracy, number of parameters, and runtime. Moreover, the network processes more than 24 scans per second on an embedded platform, which is higher than the framerates of LiDAR sensors. The network is therefore suitable for autonomous vehicles.
36. Causal Future Prediction in a Minkowski Space-Time [PDF] 返回目录
Athanasios Vlontzos, Henrique Bergallo Rocha, Daniel Rueckert, Bernhard Kainz
Abstract: Estimating future events is a difficult task. Unlike humans, machine learning approaches are not regularized by a natural understanding of physics. In the wild, a plausible succession of events is governed by the rules of causality, which cannot easily be derived from a finite training set. In this paper we propose a novel theoretical framework to perform causal future prediction by embedding spatiotemporal information on a Minkowski space-time. We utilize the concept of a light cone from special relativity to restrict and traverse the latent space of an arbitrary model. We demonstrate successful applications in causal image synthesis and future video frame prediction on a dataset of images. Our framework is architecture- and task-independent and comes with strong theoretical guarantees of causal capabilities.
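For readers unfamiliar with the construction, the future light cone is defined by the standard Minkowski interval (notation mine, not necessarily the paper's): a latent point at temporal offset $\Delta t$ and spatial offset $\Delta\mathbf{x}$ from another is causally reachable iff the separation is time-like and forward in time,

$$\Delta s^2 = -c^2\,\Delta t^2 + \lVert \Delta\mathbf{x} \rVert^2 < 0 \quad \text{and} \quad \Delta t > 0.$$

Restricting latent-space traversal to this cone is what enforces a causally plausible succession of generated frames.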
37. ImagiFilter: A resource to enable the semi-automatic mining of images at scale [PDF] 返回目录
Houda Alberts, Iacer Calixto
Abstract: Datasets (semi-)automatically collected from the web can easily scale to millions of entries, but a dataset's usefulness is directly related to how clean and high-quality its examples are. In this paper, we describe and publicly release an image dataset along with pretrained models designed to (semi-)automatically filter out undesirable images from very large image collections, possibly obtained from the web. Our dataset focuses on photographic and/or natural images, a very common use-case in computer vision research. We provide annotations for coarse prediction, i.e. photographic vs. non-photographic, and smaller fine-grained prediction tasks where we further break down the non-photographic class into five classes: maps, drawings, graphs, icons, and sketches. Results on held-out validation data show that a model architecture with reduced memory footprint achieves over 96% accuracy on coarse prediction. Our best model achieves 88% accuracy on the hardest fine-grained classification task available. Dataset and pretrained models are available at: this https URL.
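A hypothetical usage sketch (function and model names assumed for illustration, not taken from the release): the coarse photographic-vs-non-photographic classifier applied as a filter over a batch of collected images.

```python
import torch

def keep_photographic(images, coarse_model, threshold=0.5):
    """Keep only images the coarse classifier scores as photographic.

    `coarse_model` stands in for a released pretrained checkpoint and is
    assumed to emit one logit per image (an assumption, not the real API)."""
    with torch.no_grad():
        probs = torch.sigmoid(coarse_model(images)).squeeze(-1)  # P(photographic)
    return images[probs > threshold]
```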
38. A persistent homology-based topological loss function for multi-class CNN segmentation of cardiac MRI [PDF] 返回目录
Nick Byrne, James R. Clough, Giovanni Montana, Andrew P. King
Abstract: With respect to spatial overlap, CNN-based segmentation of short axis cardiovascular magnetic resonance (CMR) images has achieved a level of performance consistent with inter-observer variation. However, conventional training procedures frequently depend on pixel-wise loss functions, limiting optimisation with respect to extended or global features. As a result, inferred segmentations can lack spatial coherence, including spurious connected components or holes. Such results are implausible, violating the anticipated topology of image segments, which is frequently known a priori. Addressing this challenge, published work has employed persistent homology, constructing topological loss functions for the evaluation of image segments against an explicit prior. Building a richer description of segmentation topology by considering all possible labels and label pairs, we extend these losses to the task of multi-class segmentation. These topological priors allow us to resolve all topological errors in a subset of 150 examples from the ACDC short axis CMR training data set, without sacrificing overlap performance.
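One common persistence-based construction consistent with this description (a sketch of the family of losses, not necessarily the paper's exact formulation): given the persistences $p_i^k \in [0,1]$ of the dimension-$k$ topological features of the predicted likelihood map, sorted in decreasing order, and the prior Betti number $\beta_k^*$, take

$$\mathcal{L}_{\text{topo}} = \sum_{k} \Big[ \sum_{i=1}^{\beta_k^*} \big(1 - p_i^k\big)^2 + \sum_{i > \beta_k^*} \big(p_i^k\big)^2 \Big],$$

which pushes the $\beta_k^*$ anticipated features toward maximal persistence and suppresses all spurious ones, while the usual pixel-wise term maintains overlap.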
39. GA-MSSR: Genetic Algorithm Maximizing Sharpe and Sterling Ratio Method for RoboTrading [PDF] 返回目录
Zezheng Zhang, Matloob Khushi
Abstract: Foreign exchange is the largest financial market in the world, and it is also one of the most volatile markets. Technical analysis plays an important role in the forex market, and trading algorithms are designed utilizing machine learning techniques. Most literature used historical price information and technical indicators for training. However, the noisy nature of the market affects the consistency and profitability of the algorithms. To address this problem, we designed trading rule features that are derived from technical indicators and trading rules. The parameters of technical indicators are optimized to maximize trading performance. We also proposed a novel cost function that computes the risk-adjusted return, Sharpe and Sterling Ratio (SSR), in an effort to reduce the variance and the magnitude of drawdowns. An automatic robotic trading (RoboTrading) strategy is designed with the proposed Genetic Algorithm Maximizing Sharpe and Sterling Ratio (GA-MSSR) model. The experiment was conducted on intraday data of 6 major currency pairs from 2018 to 2019. The results consistently showed significant positive returns, and the performance of the trading system is superior using the optimized rule-based features. The highest return obtained was 320% annually, using the 5-minute AUDUSD currency pair. Besides, the proposed model achieves the best performance on risk factors, including maximum drawdowns and variance in returns, compared to benchmark models. The code can be accessed at this https URL
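For reference, the two risk-adjusted ratios that the cost function builds on take their textbook forms (the paper's exact SSR combination is its own contribution and is not reproduced here):

$$\text{Sharpe} = \frac{\mathbb{E}[R_t] - R_f}{\sigma(R_t)}, \qquad \text{Sterling} = \frac{\text{annualised return}}{\text{average maximum drawdown}},$$

where $R_t$ are the period returns and $R_f$ is the risk-free rate. Both reward return per unit of risk; the Sterling ratio in particular targets the drawdown magnitude the authors aim to reduce.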
40. A Survey on Assessing the Generalization Envelope of Deep Neural Networks at Inference Time for Image Classification [PDF] 返回目录
Julia Lust, Alexandru Paul Condurache
Abstract: Deep Neural Networks (DNNs) achieve state-of-the-art performance on numerous problem set-ups. However, humans are not able to tell beforehand if a DNN receiving an input will deliver the desired output since their decision criteria are usually non-transparent. A DNN delivers the desired output if the input is within its generalization envelope. In this case, the information contained in the input sample is processed reasonably by the network. Since common DNNs fail to provide relevant information to assess the generalization envelope at inference time, additional methods or adaptations to the DNN have to be performed. Existing methods are evaluated using different set-ups, respectively connected to three literature fields: predictive uncertainty, out-of-distribution detection, and adversarial example detection. This survey connects those fields and gives an overview of the adaptations and methods that provide information at inference time about whether the current input is within the generalization envelope of a DNN.
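As one representative of the surveyed inference-time methods (the classic maximum-softmax-probability baseline from the out-of-distribution detection literature, shown here only to make the setting concrete; it is not the survey's contribution):

```python
import torch
import torch.nn.functional as F

def msp_score(logits):
    """Maximum softmax probability: a simple inference-time confidence score.
    Low values suggest the input may lie outside the generalization envelope."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

logits = torch.randn(4, 10)   # dummy logits for four inputs
print(msp_score(logits))      # one score per input, in (0, 1]
```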
41. Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019 [PDF] 返回目录
Zhang Li, Jiehua Zhang, Tao Tan, Xichao Teng, Xiaoliang Sun, Yang Li, Lihong Liu, Yang Xiao, Byungjae Lee, Yilong Li, Qianni Zhang, Shujiao Sun, Yushan Zheng, Junyu Yan, Ni Li, Yiyu Hong, Junsu Ko, Hyun Jung, Yanling Liu, Yu-cheng Chen, Ching-wei Wang, Vladimir Yurovskiy, Pavel Maevskikh, Vahid Khanagha, Yi Jiang, Xiangjun Feng, Zhihong Liu, Daiqiang Li, Peter J. Schüffler, Qifeng Yu, Hui Chen, Yuling Tang, Geert Litjens
Abstract: Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CADs) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and DICE coefficient (DC). The DC ranged from 0.7354$\pm$0.1149 to 0.8372$\pm$0.0858. The DC of the best method was close to the inter-observer agreement (0.8398$\pm$0.0890). All methods were based on deep learning and categorized into two groups: multi-model methods and single-model methods. In general, multi-model methods were significantly better ($p < 0.01$) than single-model methods, with mean DCs of 0.7966 and 0.7544, respectively. Deep learning based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.
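For reference, the DICE coefficient between a predicted mask $X$ and a ground-truth mask $Y$ is the standard overlap measure

$$\mathrm{DC}(X, Y) = \frac{2\,\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert},$$

equal to 1 for perfect agreement and 0 for disjoint masks; the best method's 0.8372 thus sits just below the 0.8398 inter-observer agreement.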
42. DTDN: Dual-task De-raining Network [PDF] 返回目录
Zheng Wang, Jianwu Li, Ge Song
Abstract: Removing rain streaks from rainy images is necessary for many tasks in computer vision, such as object detection and recognition. It needs to address two mutually exclusive objectives: removing rain streaks and preserving realistic details. Balancing them is critical for de-raining methods. We propose an end-to-end network, called dual-task de-raining network (DTDN), consisting of two sub-networks: a generative adversarial network (GAN) and a convolutional neural network (CNN), to remove rain streaks via coordinating the two mutually exclusive objectives self-adaptively. DTDN-GAN is mainly used to remove structural rain streaks, and DTDN-CNN is designed to recover details in original images. We also design a training algorithm to train these two sub-networks of DTDN alternately; they share the same weights but use different training sets. We further enrich two existing datasets to approximate the distribution of real rain streaks. Experimental results show that our method outperforms several recent state-of-the-art methods, based on both benchmark testing datasets and real rainy images.
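A minimal sketch of such an alternating schedule (dummy data and a placeholder objective; the authors' actual losses and architectures are not reproduced): two objectives update the same shared parameters in turn, each drawing on its own training set.

```python
import torch
import torch.nn as nn

shared = nn.Conv2d(3, 3, 3, padding=1)                 # stands in for the shared weights
opt = torch.optim.Adam(shared.parameters(), lr=1e-4)
set_a = [torch.randn(2, 3, 32, 32) for _ in range(4)]  # stands in for the GAN training set
set_b = [torch.randn(2, 3, 32, 32) for _ in range(4)]  # stands in for the CNN training set

for step, (xa, xb) in enumerate(zip(set_a, set_b)):
    x = xa if step % 2 == 0 else xb       # alternate between the two training sets
    loss = (shared(x) - x).abs().mean()   # placeholder objective, not the paper's losses
    opt.zero_grad()
    loss.backward()
    opt.step()
```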
43. Line-Circle-Square (LCS): A Multilayered Geometric Filter for Edge-Based Detection [PDF] 返回目录
Seyed Amir Tafrishi, Xiaotian Dai, Vahid Esmaeilzadeh Kandjani
Abstract: This paper presents a state-of-the-art filter that reduces the complexity in object detection, tracking and mapping applications. Existing edge detection and tracking methods have been proposed to create suitable autonomy for mobile robots; however, many of them suffer from overconfidence and heavy computation when entering scenarios with an immense number of landmarks. In particular, it is not practically efficient to rely solely on limited sensors such as a camera. The method in this work, the Line-Circle-Square (LCS) filter, claims that mobile robots without a large database for object recognition and highly advanced prediction methods can deal with incoming objects that the camera captures in real-time. The proposed filter applies detection, tracking and learning to each defined expert to extract more information for judging scenes without over-calculation. The interactive learning feed between the experts keeps the error minimal even against the overwhelming number of detected features in crowded scenes. Our experts depend on the covariance of trust factors under the geometric definitions to ignore, emerge and compare detected landmarks. The experiment validates the effectiveness of the proposed filter in terms of detection precision and resource usage.
44. AWNet: Attentive Wavelet Network for Image ISP [PDF] 返回目录
Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen
Abstract: With the revolutionary improvements made to smartphone performance over the last decade, mobile photography has become one of the most common practices among the majority of smartphone users. However, due to the limited size of camera sensors on phones, the photographed image is still visually distinct from one taken by a digital single-lens reflex (DSLR) camera. To narrow this performance gap, one option is to redesign the camera image signal processor (ISP) to improve the image quality. Owing to the rapid rise of deep learning, recent works resort to the deep convolutional neural network (CNN) to develop a sophisticated data-driven ISP that directly maps the phone-captured image to the DSLR-captured one. In this paper, we introduce a novel network that utilizes the attention mechanism and wavelet transform, dubbed AWNet, to tackle this learnable image ISP problem. By adding the wavelet transform, our proposed method can restore favorable image details from RAW information and achieve a larger receptive field while remaining highly efficient in terms of computational cost. The global context block is adopted in our method to learn the non-local color mapping for the generation of appealing RGB images. More importantly, this block alleviates the influence of image misalignment in the provided dataset. Experimental results indicate the advances of our design in both qualitative and quantitative measurements.
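To illustrate why a wavelet transform pairs naturally with downsampling, here is a generic single-level 2-D Haar transform (a sketch; the paper's wavelet choice and implementation are assumed, not confirmed). It halves the spatial resolution losslessly, keeping the detail in three extra bands instead of discarding it:

```python
import torch

def haar_dwt(x):
    """Single-level 2-D Haar wavelet transform of x with shape (N, C, H, W).
    Returns the low-pass band LL and detail bands LH, HL, HH at half resolution;
    the mapping is invertible, so no information is lost by downsampling."""
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

ll, lh, hl, hh = haar_dwt(torch.randn(1, 3, 64, 64))  # each band is (1, 3, 32, 32)
```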
45. Not My Deepfake: Towards Plausible Deniability for Machine-Generated Media [PDF] 返回目录
Baiwu Zhang, Jin Peng Zhou, Ilia Shumailov, Nicolas Papernot
Abstract: Progress in generative modelling, especially generative adversarial networks, has made it possible to efficiently synthesize and alter media at scale. Malicious individuals now rely on these machine-generated media, or deepfakes, to manipulate social discourse. In order to ensure media authenticity, existing research is focused on deepfake detection. Yet, the very nature of frameworks used for generative modeling suggests that progress towards detecting deepfakes will enable more realistic deepfake generation. Therefore, it comes as no surprise that developers of generative models are under the scrutiny of stakeholders dealing with misinformation campaigns. As such, there is a clear need to develop tools that ensure the transparent use of generative modeling, while minimizing the harm caused by malicious applications. We propose a framework to provide developers of generative models with plausible deniability. We introduce two techniques to provide evidence that a model developer did not produce media that they are being accused of. The first optimizes over the source of entropy of each generative model to probabilistically attribute a deepfake to one of the models. The second involves cryptography to maintain a tamper-proof and publicly-broadcasted record of all legitimate uses of the model. We evaluate our approaches on the seminal example of face synthesis, demonstrating that our first approach achieves 97.62% attribution accuracy, and is less sensitive to perturbations and adversarial examples. In cases where a machine learning approach is unable to provide plausible deniability, we find that involving cryptography, as done in our second approach, is required. We also discuss the ethical implications of our work, and highlight that a more meaningful legislative framework is required for a more transparent and ethical use of generative modeling.
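The second technique amounts to an append-only, tamper-evident log. A minimal hash-chain sketch of that general idea (my illustration; the paper's actual, publicly-broadcasted protocol may differ substantially):

```python
import hashlib
import json
import time

def append_record(log, payload):
    """Append a usage record whose hash commits to the whole prior history."""
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {"time": time.time(), "payload": payload, "prev": prev}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)
    return log

log = []
append_record(log, {"model": "face-gan-v1", "output_id": "sample-0001"})  # names hypothetical
append_record(log, {"model": "face-gan-v1", "output_id": "sample-0002"})
# Editing any past entry changes its hash and breaks every later "prev" link.
```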
46. Conditional Entropy Coding for Efficient Video Compression [PDF] 返回目录
Jerry Liu, Shenlong Wang, Wei-Chiu Ma, Meet Shah, Rui Hu, Pranaab Dhawan, Raquel Urtasun
Abstract: We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames. Unlike prior learning-based approaches, we reduce complexity by not performing any form of explicit transformations between frames and assume each frame is encoded with an independent state-of-the-art deep image compressor. We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs while being much faster and easier to implement. We then propose a novel internal learning extension on top of this architecture that brings an additional 10% bitrate savings without trading off decoding speed. Importantly, we show that our approach outperforms H.265 and other deep learning baselines in MS-SSIM on higher bitrate UVG video, and against all video codecs on lower framerates, while being thousands of times faster in decoding than deep models utilizing an autoregressive entropy model.
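The objective can be summarised by the usual information-theoretic bound (notation mine): the expected bits needed to entropy-code frame $x_t$ given its predecessor under a learned model $q_\theta$ is the conditional cross-entropy, lower-bounded by the true conditional entropy,

$$R_t \approx \mathbb{E}\big[-\log_2 q_\theta(x_t \mid x_{t-1})\big] \ \ge\ H(x_t \mid x_{t-1}),$$

with equality when $q_\theta$ matches the true conditional distribution; modelling the conditional entropy well therefore directly minimises bitrate.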
47. VisualSem: a high-quality knowledge graph for vision and language [PDF] 返回目录
Houda Alberts, Teresa Huang, Yash Deshpande, Yibo Liu, Kyunghyun Cho, Clara Vania, Iacer Calixto
Abstract: We argue that the next frontier in natural language understanding (NLU) and generation (NLG) will include models that can efficiently access external structured knowledge repositories. In order to support the development of such models, we release the VisualSem knowledge graph (KG), which includes nodes with multilingual glosses and multiple illustrative images and visually relevant relations. We also release a neural multi-modal retrieval model that can use images or sentences as inputs and retrieves entities in the KG. This multi-modal retrieval model can be integrated into any (neural network) model pipeline, and we encourage the research community to use VisualSem for data augmentation and/or as a source of grounding, among other possible uses. VisualSem as well as the multi-modal retrieval model are publicly available and can be downloaded at: this https URL.
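A generic nearest-neighbour sketch of how such a retrieval model is typically used (illustrative only; the released model's architecture and API are not specified here): embed the query image or sentence, then rank KG node embeddings by cosine similarity.

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb, node_embs, k=5):
    """Return indices of the k KG nodes closest to the query embedding."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), node_embs, dim=-1)
    return sims.topk(k).indices

node_embs = torch.randn(1000, 256)  # dummy node embeddings (dimensions assumed)
query_emb = torch.randn(256)        # dummy query embedding
print(retrieve(query_emb, node_embs))
```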