Contents
1. Improving task-specific representation via 1M unlabelled images without any extra knowledge [PDF] Abstract
2. 3DMotion-Net: Learning Continuous Flow Function for 3D Motion Prediction [PDF] Abstract
3. Feature-dependent Cross-Connections in Multi-Path Neural Networks [PDF] Abstract
4. Modelling the Statistics of Cyclic Activities by Trajectory Analysis on the Manifold of Positive-Semi-Definite Matrices [PDF] Abstract
5. Automatic Estimation of Self-Reported Pain by Interpretable Representations of Motion Dynamics [PDF] Abstract
6. Movement Tracking by Optical Flow Assisted Inertial Navigation [PDF] Abstract
7. DeepTracking-Net: 3D Tracking with Unsupervised Learning of Continuous Flow [PDF] Abstract
8. X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [PDF] Abstract
9. Post-DAE: Anatomically Plausible Segmentation via Post-Processing with Denoising Autoencoders [PDF] Abstract
10. Neural Splines: Fitting 3D Surfaces with Infinitely-Wide Neural Networks [PDF] Abstract
11. Insights from the Future for Continual Learning [PDF] Abstract
12. PhishGAN: Data Augmentation and Identification of Homoglyph Attacks [PDF] Abstract
13. Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness [PDF] Abstract
14. FBK-HUPBA Submission to the EPIC-Kitchens Action Recognition 2020 Challenge [PDF] Abstract
15. Artist-Guided Semiautomatic Animation Colorization [PDF] Abstract
16. Self-Convolution: A Highly-Efficient Operator for Non-Local Image Restoration [PDF] Abstract
17. Multi-view Drone-based Geo-localization via Style and Spatial Alignment [PDF] Abstract
18. Labelling unlabelled videos from scratch with multi-modal self-supervision [PDF] Abstract
19. Unifying Optimization Methods for Color Filter Design [PDF] Abstract
23. Large-scale detection and categorization of oil spills from SAR images with deep learning [PDF] Abstract
26. Defending against adversarial attacks on medical imaging AI system, classification or detection? [PDF] Abstract
31. Dynamic Functional Connectivity and Graph Convolution Network for Alzheimer's Disease Classification [PDF] Abstract
33. ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Medical Image Segmentation [PDF] Abstract
39. NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search [PDF] Abstract
47. Microstructure Generation via Generative Adversarial Network for Heterogeneous, Topologically Complex 3D Materials [PDF] Abstract
48. Does Non-COVID19 Lung Lesion Help? Investigating Transferability in COVID-19 CT Image Segmentation [PDF] Abstract
49. COVIDLite: A depth-wise separable deep neural network with white balance and CLAHE for detection of COVID-19 [PDF] Abstract
51. Stacked Convolutional Neural Network for Diagnosis of COVID-19 Disease from X-ray Images [PDF] Abstract
54. A Novel Approach for Correcting Multiple Discrete Rigid In-Plane Motions Artefacts in MRI Scans [PDF] Abstract
57. On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures [PDF] Abstract
58. Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model [PDF] Abstract
66. Was there COVID-19 back in 2012? Challenge for AI in Diagnosis with Similar Indications [PDF] Abstract
Abstracts
1. Improving task-specific representation via 1M unlabelled images without any extra knowledge [PDF] back to contents
Aayush Bansal
Abstract: We present a case study on improving a task-specific representation by leveraging a million unlabelled images without any extra knowledge. We propose an exceedingly simple method of conditioning an existing representation on a diverse data distribution and observe that a model trained on diverse examples acts as a better initialization. We extensively study our findings for the tasks of surface normal estimation and semantic segmentation from a single image. We improve surface normal estimation on the NYU-v2 depth dataset and semantic segmentation on PASCAL VOC by 4% over the base model. We did not use any task-specific knowledge or auxiliary tasks, and neither changed hyper-parameters nor made any modification to the underlying neural network architecture.
2. 3DMotion-Net: Learning Continuous Flow Function for 3D Motion Prediction [PDF] back to contents
Shuaihang Yuan, Xiang Li, Anthony Tzes, Yi Fang
Abstract: In this paper, we deal with the problem of predicting the future 3D motion of 3D object scans from the previous two consecutive frames. Previous methods mostly focus on sparse motion prediction in the form of skeletons, whereas we focus on predicting dense 3D motion in the form of 3D point clouds. To approach this problem, we propose a self-supervised approach that leverages the power of deep neural networks to learn a continuous flow function of 3D point clouds that can predict temporally consistent future motions and naturally bring out the correspondences among consecutive point clouds at the same time. More specifically, to eliminate the unsolved and challenging process of defining a discrete point convolution on 3D point cloud sequences to encode spatial and temporal information, we introduce a learnable latent code to represent a temporal-aware shape descriptor, which is optimized during model training. Moreover, a temporally consistent motion Morpher is proposed to learn a continuous flow field which deforms a 3D scan from the current frame to the next frame. We perform extensive experiments on the D-FAUST, SCAPE and TOSCA benchmark data sets, and the results demonstrate that our approach is capable of handling temporally inconsistent input and produces consistent future 3D motion while requiring no ground truth supervision.
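As a concrete illustration of the continuous-flow idea, here is a minimal PyTorch sketch: an MLP maps each point of the current frame, together with a learnable latent code, to a displacement, and is trained self-supervised with a Chamfer loss against the next frame. All names, sizes and the loss choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FlowMLP(nn.Module):
    """Continuous flow: (point, latent code) -> deformed point."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3))                       # per-point displacement

    def forward(self, pts, z):
        z = z.expand(pts.shape[0], -1)               # share the latent across points
        return pts + self.net(torch.cat([pts, z], dim=-1))

def chamfer(a, b):
    d = torch.cdist(a, b)                            # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

flow = FlowMLP()
z = nn.Parameter(torch.zeros(1, 128))                # temporal-aware shape code
opt = torch.optim.Adam(list(flow.parameters()) + [z], lr=1e-3)
frame_t, frame_t1 = torch.rand(1024, 3), torch.rand(1024, 3)  # toy consecutive frames
for _ in range(200):                                 # self-supervised: no GT flow needed
    loss = chamfer(flow(frame_t, z), frame_t1)
    opt.zero_grad(); loss.backward(); opt.step()
```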
3. Feature-dependent Cross-Connections in Multi-Path Neural Networks [PDF] back to contents
Dumindu Tissera, Kasun Vithanage, Rukshan Wijesinghe, Kumara Kahatapitiya, Subha Fernando, Ranga Rodrigo
Abstract: Learning a particular task from a dataset whose samples originate from diverse contexts is challenging, and usually addressed by deepening or widening standard neural networks. As opposed to conventional network widening, multi-path architectures restrict the quadratic increment of complexity to a linear scale. However, existing multi-column/path networks or model ensembling methods do not consider any feature-dependent allocation of parallel resources, and therefore tend to learn redundant features. Given a layer in a multi-path network, if we restrict each path to learn a context-specific set of features and introduce a mechanism to intelligently allocate incoming feature maps to such paths, each path can specialize in a certain context, reducing the redundancy and improving the quality of extracted features. This eventually leads to better-optimized usage of parallel resources. To do this, we propose inserting feature-dependent cross-connections between parallel sets of feature maps in successive layers. The weights of these cross-connections are learned based on the input features of the particular layer. Our multi-path networks show improved image recognition accuracy at a similar complexity compared to conventional and state-of-the-art methods for deepening, widening and adaptive feature extraction, on both small and large scale datasets.
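A hedged sketch of one way to realise such cross-connections between two parallel paths (the gating design and names are illustrative assumptions, not the paper's exact mechanism): each path's next input is a learned, input-dependent mixture of all current path outputs.

```python
import torch
import torch.nn as nn

class CrossConnect(nn.Module):
    def __init__(self, channels, paths=2):
        super().__init__()
        self.paths = paths
        # gate predicts a paths x paths routing matrix from pooled features
        self.gate = nn.Linear(paths * channels, paths * paths)

    def forward(self, xs):                      # xs: list of (B, C, H, W) tensors
        pooled = torch.cat([x.mean(dim=(2, 3)) for x in xs], dim=1)
        w = self.gate(pooled).view(-1, self.paths, self.paths).softmax(dim=2)
        stack = torch.stack(xs, dim=1)          # (B, paths, C, H, W)
        mixed = torch.einsum('bij,bjchw->bichw', w, stack)
        return [mixed[:, i] for i in range(self.paths)]

paths = [torch.rand(4, 64, 8, 8), torch.rand(4, 64, 8, 8)]
routed = CrossConnect(channels=64)(paths)       # two routed feature maps, same shapes
```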
4. Modelling the Statistics of Cyclic Activities by Trajectory Analysis on the Manifold of Positive-Semi-Definite Matrices [PDF] back to contents
Ettore Maria Celozzi, Luca Ciabini, Luca Cultrera, Pietro Pala, Stefano Berretti, Mohamed Daoudi, Alberto Del Bimbo
Abstract: In this paper, a model is presented to extract statistical summaries that characterize the repetition of a cyclic body action, for instance a gym exercise, for the purpose of checking the compliance of the observed action to a template one and highlighting the parts of the action that are not correctly executed (if any). The proposed system relies on a Riemannian metric to compute the distance between two poses in such a way that the geometry of the manifold where the pose descriptors lie is preserved; a model to detect the beginning and end of each cycle; and a model to temporally align the poses of different cycles so as to accurately estimate the cross-sectional mean and variance of poses across different cycles. The proposed model is demonstrated using gym videos taken from the Internet.
5. Automatic Estimation of Self-Reported Pain by Interpretable Representations of Motion Dynamics [PDF] back to contents
Benjamin Szczapa, Mohamed Daoudi, Stefano Berretti, Pietro Pala, Alberto Del Bimbo, Zakia Hammal
Abstract: We propose an automatic method for pain intensity measurement from video. For each video, pain intensity was measured using the dynamics of facial movement, captured by 66 facial points. A Gram matrix formulation was used to represent facial point trajectories on the Riemannian manifold of symmetric positive semi-definite matrices of fixed rank. Curve fitting and temporal alignment were then used to smooth the extracted trajectories. A Support Vector Regression model was then trained to encode the extracted trajectories into ten pain intensity levels consistent with the Visual Analogue Scale for pain intensity measurement. The proposed approach was evaluated using the UNBC McMaster Shoulder Pain Archive and was compared to the state-of-the-art on the same data. Using both 5-fold cross-validation and leave-one-subject-out cross-validation, our results are competitive with respect to state-of-the-art methods.
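To make the geometry concrete: for a centered landmark configuration P, the Gram matrix G = PPᵀ is a fixed-rank positive semi-definite matrix, and a standard rotation-invariant distance on that manifold has a closed form via the SVD. A minimal NumPy sketch under these assumptions (not the authors' code):

```python
import numpy as np

def gram(landmarks):
    p = landmarks - landmarks.mean(axis=0)    # center the configuration
    return p @ p.T                            # PSD, rank <= landmark dimension

def psd_distance(p1, p2):
    """d(G1, G2) = min over rotations Q of ||P2 Q - P1||_F."""
    p1 = p1 - p1.mean(axis=0)
    p2 = p2 - p2.mean(axis=0)
    sv = np.linalg.svd(p2.T @ p1, compute_uv=False)
    d2 = np.trace(p1 @ p1.T) + np.trace(p2 @ p2.T) - 2.0 * sv.sum()
    return np.sqrt(max(d2, 0.0))

face_a = np.random.rand(66, 2)                # 66 facial points, as in the paper
theta = np.pi / 7                             # a pure rotation of the same face...
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
print(psd_distance(face_a, face_a @ rot))     # ...has distance ~ 0
```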
6. Movement Tracking by Optical Flow Assisted Inertial Navigation [PDF] back to contents
Lassi Meronen, William J. Wilkinson, Arno Solin
Abstract: Robust and accurate six degree-of-freedom tracking on portable devices remains a challenging problem, especially on small hand-held devices such as smartphones. For improved robustness and accuracy, complementary movement information from an IMU and a camera is often fused. Conventional visual-inertial methods fuse information from IMUs with a sparse cloud of feature points tracked by the device camera. We consider a visually dense approach, where the IMU data is fused with the dense optical flow field estimated from the camera data. Learning-based methods applied to the full image frames can leverage visual cues and global consistency of the flow field to improve the flow estimates. We show how a learning-based optical flow model can be combined with conventional inertial navigation, and how ideas from probabilistic deep learning can aid the robustness of the measurement updates. The practical applicability is demonstrated on real-world data acquired by an iPad in a challenging low-texture environment.
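The fusion pattern itself is classical. A toy sketch of a Kalman-style velocity update (purely illustrative assumptions: identity measurement model, fixed noise covariances; the paper's model is a learned, probabilistic one):

```python
import numpy as np

v, P = np.zeros(3), np.eye(3)            # velocity state and covariance
Q, R = 1e-3 * np.eye(3), 1e-2 * np.eye(3)
dt = 0.01

def imu_predict(v, P, accel):
    return v + dt * accel, P + Q          # propagate with IMU acceleration

def flow_update(v, P, v_meas):
    K = P @ np.linalg.inv(P + R)          # Kalman gain, H = I
    return v + K @ (v_meas - v), (np.eye(3) - K) @ P

v, P = imu_predict(v, P, np.array([0.1, 0.0, 0.0]))
v, P = flow_update(v, P, np.array([0.001, 0.0, 0.0]))  # flow-derived velocity
```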
7. DeepTracking-Net: 3D Tracking with Unsupervised Learning of Continuous Flow [PDF] back to contents
Shuaihang Yuan, Xiang Li, Yi Fang
Abstract: This paper deals with the problem of 3D tracking, i.e., finding dense correspondences in a sequence of time-varying 3D shapes. Although deep learning approaches have achieved promising performance for pairwise dense 3D shape matching, it remains a great challenge to generalize those approaches to the tracking of 3D time-varying geometries. In this paper, we aim at handling the problem of 3D tracking, which provides the tracking of consecutive frames of 3D shapes. We propose a novel unsupervised 3D shape registration framework named DeepTracking-Net, which uses deep neural networks (DNNs) as auxiliary functions to produce spatially and temporally continuous displacement fields for 3D tracking of objects in temporal order. Our key novelty is that we present a novel temporal-aware correspondence descriptor (TCD) that captures spatio-temporal essence from consecutive 3D point cloud frames. Specifically, our DeepTracking-Net starts with optimizing a randomly initialized latent TCD. The TCD is then decoded to regress a continuous flow (i.e. a displacement vector field) which assigns a motion vector to every point of the time-varying 3D shapes. Our DeepTracking-Net jointly optimizes TCDs and DNNs' weights towards the minimization of an unsupervised alignment loss. Experiments on both simulated and real data sets demonstrate that our unsupervised DeepTracking-Net outperforms the current supervised state-of-the-art method. In addition, we contribute a new synthetic 3D dataset, named SynMotions, to the 3D tracking and recognition community.
8. X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [PDF] back to contents
Danfeng Hong, Naoto Yokoya, Gui-Song Xia, Jocelyn Chanussot, Xiao Xiang Zhu
Abstract: This paper addresses the problem of semi-supervised transfer learning with limited cross-modality data in remote sensing. A large amount of multi-modal earth observation images, such as multispectral imagery (MSI) or synthetic aperture radar (SAR) data, are openly available on a global scale, enabling the parsing of global urban scenes through remote sensing imagery. However, their ability to identify materials (pixel-wise classification) remains limited, due to the noisy collection environment and poor discriminative information, as well as the limited number of well-annotated training images. To this end, we propose a novel cross-modal deep-learning framework, called X-ModalNet, with three well-designed modules: a self-adversarial module, an interactive learning module, and a label propagation module, learning to transfer more discriminative information from a small-scale hyperspectral image (HSI) into the classification task using large-scale MSI or SAR data. Significantly, X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed from high-level features on the top of the network, yielding semi-supervised cross-modality learning. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
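For context, the label-propagation ingredient follows the general graph-based recipe (in the style of Zhou et al., 2004). A minimal NumPy sketch of that generic recipe (the actual module uses an updatable graph built from high-level network features, which this toy version does not model):

```python
import numpy as np

def propagate_labels(features, labels, mask, alpha=0.9, iters=50, sigma=1.0):
    """features: (N, D); labels: (N, C) one-hot, zeros where unknown;
    mask: (N,) True for labelled nodes."""
    d2 = ((features[:, None] - features[None]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))        # dense affinity graph
    np.fill_diagonal(w, 0.0)
    deg = w.sum(1)
    s = w / np.sqrt(np.outer(deg, deg))       # symmetric normalisation
    y0 = labels * mask[:, None]
    f = y0.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1 - alpha) * y0
    return f.argmax(1)                        # pseudo-labels for all nodes
```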
9. Post-DAE: Anatomically Plausible Segmentation via Post-Processing with Denoising Autoencoders [PDF] back to contents
Agostina J Larrazabal, César Martínez, Ben Glocker, Enzo Ferrante
Abstract: We introduce Post-DAE, a post-processing method based on denoising autoencoders (DAE) to improve the anatomical plausibility of arbitrary biomedical image segmentation algorithms. Some of the most popular segmentation methods (e.g. based on convolutional neural networks or random forest classifiers) incorporate additional post-processing steps to ensure that the resulting masks fulfill expected connectivity constraints. These methods operate under the hypothesis that contiguous pixels with similar aspect should belong to the same class. Even if valid in general, this assumption does not consider more complex priors like topological restrictions or convexity, which cannot be easily incorporated into these methods. Post-DAE leverages the latest developments in manifold learning via denoising autoencoders. First, we learn a compact and non-linear embedding that represents the space of anatomically plausible segmentations. Then, given a segmentation mask obtained with an arbitrary method, we reconstruct its anatomically plausible version by projecting it onto the learnt manifold. The proposed method is trained using unpaired segmentation masks, which makes it independent of intensity information and image modality. We performed experiments on binary and multi-label segmentation of chest X-ray and cardiac magnetic resonance images. We show how erroneous and noisy segmentation masks can be improved using Post-DAE. With almost no additional computation cost, our method brings erroneous segmentations back to a feasible space.
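The pipeline is easy to prototype: train a DAE to reconstruct clean masks from corrupted ones, then pass any method's output through it. A compact PyTorch sketch under illustrative assumptions (tiny convolutional DAE, binary 64x64 masks; not the paper's architecture):

```python
import torch
import torch.nn as nn

dae = nn.Sequential(                           # tiny conv DAE for 1-channel masks
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)

masks = (torch.rand(8, 1, 64, 64) > 0.5).float()   # stand-in ground-truth masks
for _ in range(100):                               # training: denoise corrupted masks
    noisy = (masks + 0.3 * torch.randn_like(masks)).clamp(0, 1)
    loss = nn.functional.binary_cross_entropy(dae(noisy), masks)
    opt.zero_grad(); loss.backward(); opt.step()

# post-processing: project the output of *any* segmentation method onto the manifold
implausible = (torch.rand(1, 1, 64, 64) > 0.5).float()
plausible = (dae(implausible) > 0.5).float()
```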
10. Neural Splines: Fitting 3D Surfaces with Infinitely-Wide Neural Networks [PDF] back to contents
Francis Williams, Matthew Trager, Joan Bruna, Denis Zorin
Abstract: We present Neural Splines, a technique for 3D surface reconstruction that is based on random feature kernels arising from infinitely-wide shallow ReLU networks. Our method achieves state-of-the-art results, outperforming Screened Poisson Surface Reconstruction and modern neural network based techniques. Because our approach is based on a simple kernel formulation, it is fast to run and easy to analyze. We provide explicit analytical expressions for our kernel and argue that our formulation can be seen as a generalization of cubic spline interpolation to higher dimensions. In particular, the RKHS norm associated with our kernel biases toward smooth interpolants. Finally, we formulate Screened Poisson Surface Reconstruction as a kernel method and derive an analytic expression for its norm in the corresponding RKHS.
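For intuition about kernels of infinitely wide shallow ReLU networks: with random features, the classical closed form is the order-1 arc-cosine kernel (Cho & Saul, 2009). The paper's Neural Spline kernel is related but not identical; the NumPy sketch below only illustrates fitting an implicit surface by kernel ridge regression with such a kernel (the data and regression setup are toy assumptions):

```python
import numpy as np

def arccos1_kernel(X, Y):
    """Order-1 arc-cosine kernel: (1/pi) ||x|| ||y|| (sin t + (pi - t) cos t)."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)
    ny = np.linalg.norm(Y, axis=1, keepdims=True)
    cos = np.clip((X @ Y.T) / (nx * ny.T + 1e-12), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * ny.T) / np.pi * (np.sin(theta) + (np.pi - theta) * cos)

# fit f ~ 0 on surface samples and f ~ +/- eps slightly off the surface
pts = np.random.randn(200, 3); pts /= np.linalg.norm(pts, axis=1, keepdims=True)
X = np.vstack([pts, 1.1 * pts, 0.9 * pts])             # unit sphere + offsets
y = np.concatenate([np.zeros(200), 0.1 * np.ones(200), -0.1 * np.ones(200)])
K = arccos1_kernel(X, X)
alpha = np.linalg.solve(K + 1e-4 * np.eye(len(X)), y)  # kernel ridge fit

def f(q):                                              # implicit field; zero level set ~ surface
    return arccos1_kernel(np.atleast_2d(q), X) @ alpha

print(f(np.array([1.0, 0.0, 0.0])), f(np.array([1.5, 0.0, 0.0])))
```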
11. Insights from the Future for Continual Learning [PDF] back to contents
Arthur Douillard, Eduardo Valle, Charles Ollion, Thomas Robert, Matthieu Cord
Abstract: Continual learning aims to learn tasks sequentially, with (often severe) constraints on the storage of old learning samples, without suffering from catastrophic forgetting. In this work, we propose prescient continual learning, a novel experimental setting, to incorporate existing information about the classes, prior to any training data. Usually, each task in a traditional continual learning setting evaluates the model on present and past classes, the latter with a limited number of training samples. Our setting adds future classes, with no training samples at all. We introduce Ghost Model, a representation-learning-based model for continual learning using ideas from zero-shot learning. A generative model of the representation space in concert with a careful adjustment of the losses allows us to exploit insights from future classes to constrain the spatial arrangement of the past and current classes. Quantitative results on the AwA2 and aP&Y datasets and detailed visualizations showcase the interest of this new setting and the method we propose to address it.
12. PhishGAN: Data Augmentation and Identification of Homoglyph Attacks [PDF] back to contents
Joon Sern Lee, Gui Peng David Yam, Jin Hao Chan
Abstract: Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack, making the victim more susceptible to phishing. For example, victims may mistake "|this http URL" for "this http URL" and, in the process, divulge personal details to the fake website. Current state-of-the-art (SOTA) methods typically make use of string comparison algorithms (e.g. Levenshtein Distance), which are computationally heavy. One reason for this is the lack of publicly available datasets, thus hindering the training of more advanced Machine Learning (ML) models. Furthermore, no one font is able to render all types of punycode correctly, posing a significant challenge to the creation of a dataset that is unbiased toward any particular font. This, coupled with the vast number of internet domains, poses a challenge in creating a dataset that can capture all possible variations. Here, we show how a conditional Generative Adversarial Network (GAN), PhishGAN, can be used to generate images of homoglyphs, conditioned on non-homoglyph input text images. Practical changes to the current SOTA were required to facilitate the generation of more varied homoglyph text-based images. We also demonstrate a workflow of how PhishGAN, together with a Homoglyph Identifier (HI) model, can be used to identify the domain the homoglyph was trying to imitate. Furthermore, we demonstrate how PhishGAN's ability to generate datasets on the fly facilitates the quick adaptation of cybersecurity systems to detect new threats as they emerge.
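To see why such domains fool victims, note how punycode hides the non-Latin characters. A quick illustration with Python's standard punycode codec, using the well-known Cyrillic "apple" homograph demo:

```python
cyrillic = "аррӏе"                            # Cyrillic а, р, р, ӏ, е -- renders like "apple"
ascii_form = cyrillic.encode("punycode")      # the ASCII-compatible encoding
print("xn--" + ascii_form.decode("ascii"))    # the form actually registered in DNS
print(ascii_form.decode("punycode"))          # round-trips back to the homoglyph string
```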
13. Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness [PDF] back to contents
Linxi Jiang, Xingjun Ma, Zejia Weng, James Bailey, Yu-Gang Jiang
Abstract: Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients, a type of gradient masking, have previously been found to exist in many defense methods and cause a false signal of robustness. In this paper, we identify a more subtle situation called Imbalanced Gradients that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack towards a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We examine 12 state-of-the-art defense models, and find that models exploiting label smoothing easily cause imbalanced gradients, on which our MD attacks can decrease their PGD robustness (evaluated by PGD attack) by over 23%. For 6 out of the 12 defenses, our attack can reduce their PGD robustness by at least 9%. The results suggest that imbalanced gradients need to be carefully addressed for more reliable adversarial robustness.
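To make the decomposition concrete, here is an illustrative PyTorch sketch (step sizes, iteration counts and the restart rule are guesses, not the paper's exact MD attack): stage one ascends each margin term separately; stage two continues from the stronger stage-one perturbation with the full margin loss.

```python
import torch

def margin_terms(logits, y):
    z_y = logits.gather(1, y[:, None]).squeeze(1)              # correct-class logit
    z_r = logits.scatter(1, y[:, None], -1e9).max(1).values    # best rival logit
    return z_y, z_r

def pgd(model, x, y, loss_fn, steps=20, eps=8/255, lr=2/255, delta0=None):
    delta = torch.zeros_like(x) if delta0 is None else delta0.clone()
    delta.requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y).sum()
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + lr * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()

def md_attack(model, x, y):
    full = lambda z, t: margin_terms(z, t)[1] - margin_terms(z, t)[0]
    d1 = pgd(model, x, y, lambda z, t: -margin_terms(z, t)[0])   # stage 1a: suppress z_y
    d2 = pgd(model, x, y, lambda z, t: margin_terms(z, t)[1])    # stage 1b: boost the rival
    best = d1 if full(model(x + d1), y).sum() >= full(model(x + d2), y).sum() else d2
    return x + pgd(model, x, y, full, delta0=best)               # stage 2: full margin loss
```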
14. FBK-HUPBA Submission to the EPIC-Kitchens Action Recognition 2020 Challenge [PDF] back to contents
Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
Abstract: In this report we describe the technical details of our submission to the EPIC-Kitchens Action Recognition 2020 Challenge. To participate in the challenge we deployed spatio-temporal feature extraction and aggregation models we have developed recently: Gate-Shift Module (GSM) [1] and EgoACO, an extension of Long Short-Term Attention (LSTA) [2]. We design an ensemble of GSM and EgoACO model families with different backbones and pre-training to generate the prediction scores. Our submission, visible on the public leaderboard with team name FBK-HUPBA, achieved a top-1 action recognition accuracy of 40.0% on S1 setting, and 25.71% on S2 setting, using only RGB.
15. Artist-Guided Semiautomatic Animation Colorization [PDF] back to contents
Harrish Thasarathan, Mehran Ebrahimi
Abstract: There is a delicate balance between automating repetitive work in creative domains while staying true to an artist's vision. The animation industry regularly outsources large animation workloads to foreign countries where labor is inexpensive and long hours are common. Automating part of this process can be incredibly useful for reducing costs and creating manageable workloads for major animation studios and outsourced artists. We present a method for automating line art colorization by keeping artists in the loop to successfully reduce this workload while staying true to an artist's vision. By incorporating color hints and temporal information into an adversarial image-to-image framework, we show that it is possible to strike a balance between automation and authenticity through the artist's input, generating colored frames with temporal consistency.
16. Self-Convolution: A Highly-Efficient Operator for Non-Local Image Restoration [PDF] back to contents
Lanqing Guo, Saiprasad Ravishankar, Bihan Wen
Abstract: Constructing effective image priors is critical to solving ill-posed inverse problems, such as image reconstruction. Recent works proposed to exploit image non-local similarity for inverse problems by grouping similar patches, and demonstrated state-of-the-art results in many applications. However, compared to classic local methods based on filtering or sparsity, most of the non-local algorithms are time-consuming, mainly due to the highly inefficient and redundant block matching step, where the distance between each pair of overlapping patches needs to be computed. In this work, we propose a novel self-convolution operator to exploit image non-local similarity in a self-supervised way. The proposed self-convolution can generalize the commonly-used block matching step and produce equivalent results with much cheaper computation. Furthermore, by applying self-convolution, we propose an effective multi-modality image restoration scheme, which is much more efficient than conventional block matching for non-local modeling. Experimental results demonstrate that (1) self-convolution can significantly speed up most of the popular non-local image restoration algorithms, with two-fold to nine-fold faster block matching; and (2) the proposed multi-modality restoration scheme achieves state-of-the-art denoising results on the RGB-NIR and Stereo image datasets. The code will be released on GitHub.
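The speed-up rests on a standard identity: ||p - q||^2 = ||p||^2 - 2<p, q> + ||q||^2, so distances from a query patch to all patches reduce to two convolutions. A PyTorch sketch of this general trick (illustrative; not the paper's exact operator):

```python
import torch
import torch.nn.functional as F

def patch_distances(img, i, j, k=8):
    """img: (1, 1, H, W); squared distance from the kxk patch at (i, j)
    to every kxk patch, computed with two convolutions instead of a loop."""
    q = img[:, :, i:i + k, j:j + k]                     # query patch as a kernel
    cross = F.conv2d(img, q)                            # <p, q> for every patch p
    norms = F.conv2d(img ** 2, torch.ones(1, 1, k, k))  # ||p||^2 via box filter
    return norms - 2 * cross + (q ** 2).sum()           # (1, 1, H-k+1, W-k+1)

img = torch.rand(1, 1, 64, 64)
d = patch_distances(img, 10, 20)
top = d.flatten().topk(16, largest=False).indices       # 16 most similar patches
```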
17. Multi-view Drone-based Geo-localization via Style and Spatial Alignment [PDF] back to contents
Siyi Hu, Xiaojun Chang
Abstract: In this paper, we focus on the task of multi-view multi-source geo-localization, which serves as an important auxiliary method of GPS positioning by matching drone-view and satellite-view images with pre-annotated GPS tags. To solve this problem, most existing methods adopt a metric loss with a weighted classification block to force the generation of a common feature space shared by different view points and view sources. However, these methods fail to pay sufficient attention to spatial information (especially viewpoint variances). To address this drawback, we propose an elegant orientation-based method to align the patterns and introduce a new branch to extract aligned partial features. Moreover, we provide a style alignment strategy to reduce the variance in image style and enhance the feature unification. To demonstrate the performance of the proposed approach, we conduct extensive experiments on the large-scale benchmark dataset. The experimental results confirm the superiority of the proposed approach compared to state-of-the-art alternatives.
18. Labelling unlabelled videos from scratch with multi-modal self-supervision [PDF] back to contents
Yuki M. Asano, Mandela Patrick, Christian Rupprecht, Andrea Vedaldi
Abstract: A large part of the current success of deep learning lies in the effectiveness of data -- more precisely: labelled data. Yet, labelling a dataset with human annotation continues to carry high costs, especially for videos. While in the image domain, recent methods have allowed to generate meaningful (pseudo-) labels for unlabelled datasets without supervision, this development is missing for the video domain where learning feature representations is the current focus. In this work, we a) show that unsupervised labelling of a video dataset does not come for free from strong feature encoders and b) propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations, by leveraging the natural correspondence between the audio and visual modalities. An extensive analysis shows that the resulting clusters have high semantic overlap to ground truth human labels. We further introduce the first benchmarking results on unsupervised labelling of common video datasets Kinetics, Kinetics-Sound, VGG-Sound and AVE.
19. Unifying Optimization Methods for Color Filter Design [PDF] 返回目录
Graham Finlayson, Yuteng Zhu
Abstract: Through optimization we can solve for a filter such that, when the camera views the world through this filter, it is more colorimetric. Previous work solved for the filter that best satisfies the Luther condition: the camera spectral sensitivities after filtering are approximately a linear transform of the CIE XYZ color matching functions. A more recent method optimized for the filter that maximizes the Vora-Value (a measure of the closeness of the vector spaces spanned by the camera sensors and the human visual sensors). The optimized Luther and Vora filters differ from one another. In this paper we begin by observing that the function defining the Vora-Value is equivalent to the Luther-condition optimization if we use an orthonormal basis of the XYZ color matching functions, i.e. we linearly transform the XYZ sensitivities to a set of orthonormal basis vectors. In this formulation, the Luther-optimization algorithm is shown to almost optimize the Vora-Value. Moreover, experiments demonstrate that the modified orthonormal Luther method finds the same color filter as the Vora-Value filter optimization. Significantly, our modified algorithm is simpler in formulation and also converges faster than the direct Vora-Value method.
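The Vora-Value itself is straightforward to compute from orthogonal projectors onto the two sensor subspaces; a sketch with random stand-ins for the real sensitivities (the normalization by the number of channels is assumed):

import numpy as np

def proj(A):
    # Orthogonal projector onto the column space of A (n x k, full column rank).
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

def vora_value(cam, cmf):
    # Trace of the product of the two projectors, normalized by the number
    # of channels; equals 1 when the two sensor spaces coincide.
    return np.trace(proj(cam) @ proj(cmf)) / cmf.shape[1]

n = 31                                 # e.g. 400..700 nm sampled at 10 nm
rng = np.random.default_rng(1)
cmf = rng.random((n, 3))               # stand-in for the CIE XYZ matching functions
cam = cmf @ rng.normal(size=(3, 3))    # a camera that is a linear transform of XYZ
print(round(vora_value(cam, cmf), 6))  # -> 1.0: identical vector spaces

Note that replacing cmf by its QR factor gives an orthonormal basis spanning the same space, so proj(cmf), and hence the Vora-Value, is unchanged; this is the equivalence the abstract exploits.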
20. Recurrent Relational Memory Network for Unsupervised Image Captioning [PDF] 返回目录
Dan Guo, Yang Wang, Peipei Song, Meng Wang
Abstract: Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing works usually adopt GAN (Generative Adversarial Network) models. In this paper, we propose a novel memory-based network rather than a GAN, named Recurrent Relational Memory Network ($R^2M$). Unlike complicated and sensitive adversarial learning, which performs non-ideally for long sentence generation, $R^2M$ implements a concepts-to-sentence memory translator through two-stage memory mechanisms: fusion and recurrent memories, which correlate the relational reasoning between common visual concepts and the generated words over long periods. $R^2M$ encodes visual context through unsupervised training on images, while enabling the memory to learn from an irrelevant textual corpus in a supervised fashion. Our solution enjoys fewer learnable parameters and higher computational efficiency than GAN-based methods, which suffer heavily from parameter sensitivity. We experimentally validate the superiority of $R^2M$ over the state of the art on all benchmark datasets.
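The architecture of $R^2M$ is beyond what the abstract specifies, but a very loose sketch of a concepts-to-sentence translator with a fused memory and a recurrent cell may help fix the idea (every design choice below, including the mean fusion, the single GRU cell, and the start token, is an assumption for illustration):

import torch

vocab, dim, n_concepts = 1000, 256, 5
concepts = torch.randn(n_concepts, dim)      # embeddings of detected visual concepts
fusion = concepts.mean(dim=0, keepdim=True)  # "fusion memory" as a naive mean (assumption)
cell = torch.nn.GRUCell(dim, dim)            # "recurrent memory" as one GRU cell (assumption)
emb = torch.nn.Embedding(vocab, dim)
out = torch.nn.Linear(dim, vocab)

h, word = fusion, torch.tensor([0])          # assumed start-token id 0
for _ in range(8):                           # greedily decode 8 words
    h = cell(emb(word), h)
    word = out(h).argmax(dim=1)
print(word.item())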
21. Comprehensive Information Integration Modeling Framework for Video Titling [PDF] 返回目录
Shengyu Zhang, Ziqi Tan, Jin Yu, Zhou Zhao, Kun Kuang, Tan Jiang, Jingren Zhou, Hongxia Yang, Fei Wu
Abstract: In e-commerce, consumer-generated videos, which in general convey consumers' individual preferences for the different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos are seldom accompanied by appropriate titles. To bridge this gap, we integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework. Although automatic video titling is very useful and demanding, it is much less addressed than video captioning. The latter focuses on generating sentences that describe videos as a whole, while our task requires product-aware multi-grained video analysis. To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization. Specifically, the granular-level interaction modeling first utilizes temporal-spatial landmark cues, descriptive words, and abstractive attributes to build three individual graphs and recognizes the intra-actions in each graph through Graph Neural Networks (GNNs). Then a global-local aggregation module is proposed to model inter-actions across graphs and aggregate the heterogeneous graphs into a holistic graph representation. The abstraction-level story-line summarization further considers both frame-level video features and the holistic graph to utilize the interactions between products and backgrounds, and generates the story-line topic of the video. We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform, and will make the desensitized version publicly available to nourish further development of the research community...
22. Retrospective Loss: Looking Back to Improve Training of Deep Neural Networks [PDF] 返回目录
Surgan Jandial, Ayush Chopra, Mausoom Sarkar, Piyush Gupta, Balaji Krishnamurthy, Vineeth Balasubramanian
Abstract: Deep neural networks (DNNs) are powerful learning machines that have enabled breakthroughs in several domains. In this work, we introduce a new retrospective loss to improve the training of deep neural network models by utilizing the prior experience available in past model states during training. Minimizing the retrospective loss, along with the task-specific loss, pushes the parameter state at the current training step towards the optimal parameter state while pulling it away from the parameter state at a previous training step. Although a simple idea, we analyze the method and conduct comprehensive sets of experiments across domains - images, speech, text, and graphs - to show that the proposed loss results in improved performance across input domains, tasks, and architectures.
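The abstract only pins down the qualitative behavior of the loss; one plausible triplet-style realization of that description (the coefficients and the MSE distance are assumptions, not the paper's exact formula) is:

import torch
import torch.nn.functional as F

def retrospective_loss(pred, target, past_pred, kappa=1.0, margin=0.5):
    # Pull the current prediction toward the ground truth and push it away
    # from the prediction of a past model state; coefficients are illustrative.
    pull = F.mse_loss(pred, target)
    push = F.mse_loss(pred, past_pred.detach())
    return kappa * ((1.0 + margin) * pull - margin * push)

pred = torch.randn(8, 10, requires_grad=True)   # current model outputs
target = torch.randn(8, 10)                     # ground truth
past_pred = torch.randn(8, 10)                  # outputs of an earlier checkpoint
loss = retrospective_loss(pred, target, past_pred)
loss.backward()
print(loss.item())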
23. Large-scale detection and categorization of oil spills from SAR images with deep learning [PDF] 返回目录
Filippo Maria Bianchi, Martine M. Espeseth, Njål Borch
Abstract: We propose a deep learning framework to detect and categorize oil spills in synthetic aperture radar (SAR) images at a large scale. By means of a carefully designed neural network model for image segmentation trained on an extensive dataset, we obtain state-of-the-art performance in oil spill detection, achieving results that are comparable to those produced by human operators. We also introduce a classification task, which is novel in the context of oil spill detection in SAR. Specifically, after being detected, each oil spill is also classified according to different categories pertaining to its shape and texture characteristics. The classification results provide valuable insights for improving the design of oil spill services by world-leading providers. As the last contribution, we present our operational pipeline and a visualization tool for large-scale data, which make it possible to detect and analyze the historical presence of oil spills worldwide.
24. DISK: Learning local features with policy gradient [PDF] 返回目录
Michał J. Tyszkiewicz, Pascal Fua, Eduard Trulls
Abstract: Local feature frameworks are difficult to learn in an end-to-end fashion, due to the discreteness inherent to the selection and matching of sparse keypoints. We introduce DISK (DIScrete Keypoints), a novel method that overcomes these obstacles by leveraging principles from Reinforcement Learning (RL), optimizing end-to-end for a high number of correct feature matches. Our simple yet expressive probabilistic model lets us keep the training and inference regimes close, while maintaining good enough convergence properties to reliably train from scratch. Our features can be extracted very densely while remaining discriminative, challenging commonly held assumptions about what constitutes a good keypoint, as showcased in Fig. 1, and deliver state-of-the-art results on three public benchmarks.
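A minimal policy-gradient sketch in the spirit described here (not DISK itself; the grid size and reward values are assumptions): treat a keypoint heatmap as a categorical distribution over locations, sample a detection, and reinforce it with a match/no-match reward.

import torch

heatmap = torch.randn(1, 32 * 32, requires_grad=True)   # logits over a 32x32 grid
dist = torch.distributions.Categorical(logits=heatmap)
loc = dist.sample()                            # sampled keypoint location
reward = torch.tensor(1.0)                     # e.g. +1 if this keypoint produced a correct match
loss = -(dist.log_prob(loc) * reward).mean()   # REINFORCE estimator
loss.backward()                                # gradients reach the heatmap despite the sampling
print(loc.item(), loss.item())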
25. NINEPINS: Nuclei Instance Segmentation with Point Annotations [PDF] 返回目录
Ting-An Yen, Hung-Chun Hsu, Pushpak Pati, Maria Gabrani, Antonio Foncubierta-Rodríguez, Pau-Choo Chung
Abstract: Deep learning-based methods are gaining traction in digital pathology, with an increasing number of publications and challenges that aim at easing the work of systematically and exhaustively analyzing tissue slides. These methods often achieve very high accuracies, at the cost of requiring large annotated datasets to train. This requirement is especially difficult to fulfill in the medical field, where expert knowledge is essential. In this paper we focus on nuclei segmentation, which generally requires experienced pathologists to annotate the nuclear areas in gigapixel histological images. We propose an instance segmentation algorithm that uses pseudo-label segmentations generated automatically from point annotations, as a method to reduce the burden on pathologists. With the generated segmentation masks, the proposed method trains a modified version of the HoVer-Net model to achieve instance segmentation. Experimental results show that the proposed method is robust to inaccuracies in point annotations, and a comparison with HoVer-Net trained on fully annotated instance masks shows that a degradation in segmentation performance does not always imply a degradation in higher-order tasks such as tissue classification.
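As a crude illustration of pseudo-labels derived from point annotations (NINEPINS generates its pseudo-segmentations differently; the radius and coordinates below are assumptions), one can grow each annotated nucleus centre into a disk:

import numpy as np
from scipy import ndimage

h, w, radius = 64, 64, 4
points = [(10, 12), (30, 40), (50, 20)]        # annotated nucleus centres (row, col)
seeds = np.zeros((h, w), dtype=bool)
for r, c in points:
    seeds[r, c] = True
dist = ndimage.distance_transform_edt(~seeds)  # distance to the nearest annotated point
pseudo_mask = dist <= radius                   # binary pseudo-segmentation
labels, n = ndimage.label(pseudo_mask)         # connected components as pseudo-instances
print(n, "pseudo instances")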
26. Defending against adversarial attacks on medical imaging AI system, classification or detection? [PDF] 返回目录
Xin Li, Deng Pan, Dongxiao Zhu
Abstract: Medical imaging AI systems, such as those for disease classification and segmentation, are increasingly inspired by and adapted from computer vision based AI systems. Although an array of adversarial training and/or loss-function-based defense techniques has been developed and proven effective in computer vision, defending against adversarial attacks on medical images remains largely uncharted territory due to the following unique challenges: 1) label scarcity in medical images significantly limits the adversarial generalizability of the AI system; 2) the vastly similar and dominant foreground and background in medical images make it hard to learn discriminating features between different disease classes; and 3) crafted adversarial noise added to the entire medical image, as opposed to a focused organ target, can make clean and adversarial examples more separable from each other than different disease classes are. In this paper, we propose a novel robust medical imaging AI framework based on Semi-Supervised Adversarial Training (SSAT) and Unsupervised Adversarial Detection (UAD), followed by a new measure for assessing a system's adversarial risk. We systematically demonstrate the advantages of our robust medical imaging AI system over existing adversarial defense techniques under diverse real-world settings of adversarial attacks, using a benchmark OCT imaging dataset.
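A hedged sketch of the semi-supervised adversarial-training ingredient on unlabeled data (the FGSM attack, the toy model, and all hyper-parameters are simplifying assumptions, not the paper's SSAT): use the model's own prediction as a pseudo-label and train the adversarially perturbed input to match it.

import torch
import torch.nn.functional as F

def ssat_unlabeled_loss(model, x_u, eps=2 / 255):
    with torch.no_grad():
        pseudo = model(x_u).argmax(dim=1)        # model's own predictions as pseudo-labels
    x_adv = x_u.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), pseudo)
    grad, = torch.autograd.grad(loss, x_adv)
    x_adv = (x_adv + eps * grad.sign()).clamp(0, 1).detach()   # one FGSM step
    return F.cross_entropy(model(x_adv), pseudo)  # consistency on the perturbed input

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 4))
x_u = torch.rand(8, 3, 32, 32)                    # toy unlabeled batch
print(ssat_unlabeled_loss(model, x_u).item())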
27. Affinity Fusion Graph-based Framework for Natural Image Segmentation [PDF] 返回目录
Yang Zhang, Moyun Liu, Jingwu He, Fei Pan, Yanwen Guo
Abstract: This paper proposes an affinity fusion graph framework to effectively connect different graphs with highly discriminating power and nonlinearity for natural image segmentation. The proposed framework combines adjacency-graphs and kernel spectral clustering based graphs (KSC-graphs) according to a new definition, named affinity nodes, over multi-scale superpixels. These affinity nodes are selected based on a better affiliation of superpixels, namely a subspace-preserving representation generated by sparse subspace clustering based on subspace pursuit. A KSC-graph is then built via a novel kernel spectral clustering to explore the nonlinear relationships among these affinity nodes. Moreover, an adjacency-graph at each scale is constructed, which is further used to update the proposed KSC-graph at the affinity nodes. The fusion graph is built across different scales and is partitioned to obtain the final segmentation result. Experimental results on the Berkeley segmentation dataset and the Microsoft Research Cambridge dataset show the superiority of our framework in comparison with state-of-the-art methods. The code is available at this https URL.
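With the affinities in hand, the final partitioning step can be illustrated with off-the-shelf spectral clustering on a fused affinity matrix (the 0.5/0.5 fusion rule and the random affinities below are assumptions, not the paper's construction):

import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n = 60                                  # toy superpixel nodes
adjacency = rng.random((n, n))
adjacency = (adjacency + adjacency.T) / 2          # symmetric adjacency affinities
ksc = rng.random((n, n))
ksc = (ksc + ksc.T) / 2                            # stand-in for KSC-graph affinities

fused = 0.5 * adjacency + 0.5 * ksc                # assumed fusion rule
np.fill_diagonal(fused, 0)
seg = SpectralClustering(n_clusters=3, affinity="precomputed",
                         random_state=0).fit_predict(fused)
print(np.bincount(seg))                            # segment sizes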
28. Towards Adversarial Planning for Indoor Scenes with Rotation [PDF] 返回目录
Xinhan Di, Pengqian Yu, Hong Zhu, Lei Cai, Qiuyan Sheng, Changyu Sun
Abstract: In this paper, we propose an adversarial model for producing furniture layouts for interior scene synthesis when the interior room is rotated. The proposed model combines a conditional adversarial network, a rotation module, a mode module, and a rotation discriminator module. Compared with prior work on scene synthesis, our three proposed modules enhance the ability of auto-layout generation and reduce mode collapse during the rotation of the interior room. We provide an interior layout dataset that contains $14400$ designs from professional designers, with rotation. In our experiments, we compare the quality of the layouts against two baselines. The numerical results demonstrate that the proposed model provides higher-quality layouts for four types of rooms: the bedroom, the bathroom, the study room, and the tatami room.
29. 3D Pose Detection in Videos: Focusing on Occlusion [PDF] 返回目录
Justin Wang, Edward Xu, Kangrui Xue, Lukasz Kidzinski
Abstract: In this work, we build upon existing methods for occlusion-aware 3D pose detection in videos. We implement a two-stage architecture that consists of a stacked hourglass network to produce 2D pose predictions, which are then fed into a temporal convolutional network to produce 3D pose predictions. To facilitate prediction on poses with occluded joints, we introduce an intuitive generalization of the cylinder man model used to generate occlusion labels. We find that the occlusion-aware network is able to achieve a mean per-joint position error 5 mm lower than our linear baseline model on the Human3.6M dataset. Compared to our temporal convolutional network baseline, we achieve a comparable mean per-joint position error, 0.1 mm lower, at reduced computational cost.
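The reported metric, mean per-joint position error (MPJPE), is simply the average Euclidean distance between predicted and ground-truth joints; a minimal sketch with toy data:

import numpy as np

def mpjpe(pred, gt):
    # Mean per-joint position error: average Euclidean distance between
    # predicted and ground-truth joints, in the units of the input (mm here).
    return np.linalg.norm(pred - gt, axis=-1).mean()

pred = np.random.rand(16, 17, 3) * 1000            # 16 frames, 17 joints, xyz in mm
gt = pred + np.random.normal(scale=20, size=pred.shape)
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")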
30. Disentangle Perceptual Learning through Online Contrastive Learning [PDF] 返回目录
Kangfu Mei, Yao Lu, Qiaosi Yi, Haoyu Wu, Juncheng Li, Rui Huang
Abstract: Pursuing realistic results according to human visual perception is the central concern in image transformation tasks. Perceptual learning approaches like perceptual loss are empirically powerful for such tasks, but they usually rely on a pre-trained classification network to provide features, which are not necessarily optimal in terms of the visual perception of image transformation. In this paper, we argue that, among the feature representations from the pre-trained classification network, only limited dimensions are related to human visual perception, while the others are irrelevant, although both affect the final image transformation results. Under such an assumption, we try to disentangle the perception-relevant dimensions from the representation through our proposed online contrastive learning. The resulting network includes the pre-trained part and a feature selection layer, followed by the contrastive learning module, which utilizes the transformed results, target images, and task-oriented distorted images as the positive, negative, and anchor samples, respectively. The contrastive learning aims at activating the perception-relevant dimensions and suppressing the irrelevant ones by using a triplet loss, so that the original representation can be disentangled for better perceptual quality. Experiments on various image transformation tasks demonstrate the superiority of our framework, in terms of human visual perception, over existing approaches that use pre-trained networks and empirically designed losses.
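The triplet term itself is standard; a sketch in feature space following the sample roles stated in the abstract (random tensors stand in for features produced by the pre-trained network and the feature selection layer):

import torch
import torch.nn.functional as F

anchor = torch.randn(4, 512, requires_grad=True)   # distorted-image features (anchor)
positive = torch.randn(4, 512)                     # transformed-result features (positive)
negative = torch.randn(4, 512)                     # target-image features (negative)
loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
loss.backward()
print(loss.item())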
31. Dynamic Functional Connectivity and Graph Convolution Network for Alzheimer's Disease Classification [PDF] 返回目录
Xingwei An, Yutao Zhou, Yang Di, Dong Ming
Abstract: Alzheimer's disease (AD) is the most prevalent form of dementia. Traditional methods cannot achieve efficient and accurate diagnosis of AD. In this paper, we introduce a novel method based on dynamic functional connectivity (dFC) that can effectively capture changes in the brain. We compare and combine four different types of features, including the amplitude of low-frequency fluctuation (ALFF), regional homogeneity (ReHo), dFC, and the adjacency matrix of different brain structures between subjects. We use a graph convolution network (GCN), which considers the similarity of brain structure between patients, to solve the classification problem on non-Euclidean domains. The proposed method achieves an accuracy of 91.3% and an area under the receiver operating characteristic curve of 98.4%. These results demonstrate that our proposed method can be used for detecting AD.
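The GCN ingredient can be illustrated with the standard propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); in this sketch nodes are subjects, the adjacency encodes inter-subject similarity, and all data are random stand-ins:

import numpy as np

def gcn_layer(A, H, W):
    # One propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0)

n_subjects, f_in, f_out = 50, 16, 2                # toy population graph
A = np.random.rand(n_subjects, n_subjects)
A = (A + A.T) / 2                                  # inter-subject similarity
H = np.random.rand(n_subjects, f_in)               # per-subject features (ALFF, ReHo, dFC, ...)
W = np.random.rand(f_in, f_out)                    # weights (learnable in practice)
print(gcn_layer(A, H, W).shape)                    # (50, 2): per-subject class scores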
32. Learning Interclass Relations for Image Classification [PDF] 返回目录
Muhamedrahimov Raouf, Bar Amir, Akselrod-Ballin Ayelet
Abstract: In standard classification, we typically treat class categories as independent of one another. In many problems, however, this neglects the natural relations that exist between categories, which are often dictated by an underlying biological or physical process. In this work, we propose novel formulations of the classification problem, based on the realization that the assumption of class independence is a limiting factor that leads to the requirement of more training data. First, we propose manual ways to reduce our data needs by reintroducing knowledge about problem-specific interclass relations into the training process. Second, we propose a general approach to jointly learn categorical label representations that can implicitly encode natural interclass relations, alleviating the need for strong prior assumptions, which are not always available. We demonstrate this in the domain of medical images, where access to large amounts of labelled data is not trivial. Specifically, our experiments show the advantages of this approach in the classification of intravenous contrast enhancement phases in CT images, which encapsulate multiple interesting inter-class relations.
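A hedged sketch of jointly learned label representations (the similarity-based classifier and the temperature are assumptions): classify by similarity between image features and a trainable label-embedding matrix, so that related classes can drift toward each other during training.

import torch
import torch.nn.functional as F

n_classes, dim = 4, 128
label_emb = torch.nn.Parameter(torch.randn(n_classes, dim))   # trainable label representations
features = torch.randn(8, dim)                                # backbone image features
logits = F.normalize(features, dim=1) @ F.normalize(label_emb, dim=1).t() / 0.1
loss = F.cross_entropy(logits, torch.randint(0, n_classes, (8,)))
loss.backward()
print(label_emb.grad.shape)   # gradients shape the label geometry jointly with training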
33. ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Medical Image Segmentation [PDF] 返回目录
Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Qi Tian
Abstract: In medical image analysis, semi-supervised learning is an effective method to extract knowledge from a small amount of labeled data and a large amount of unlabeled data. This paper focuses on a popular pipeline known as self learning, and points out a weakness named lazy learning, which refers to the difficulty for a model to learn from the pseudo labels generated by itself. To alleviate this issue, we propose ATSO, an asynchronous version of teacher-student optimization. ATSO partitions the unlabeled data into two subsets and alternately uses one subset to fine-tune the model and updates the labels on the other subset. We evaluate ATSO on two popular medical image segmentation datasets and show its superior performance in various semi-supervised settings. With slight modification, ATSO also transfers well to natural image segmentation for autonomous driving data.
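The alternation itself is easy to sketch (a linear model and random data stand in for the segmentation network; this is an illustration of the schedule, not ATSO's full training recipe): the model fine-tuned on pseudo-labels of one subset relabels the other, so no model learns from labels it has just generated itself.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_l, y_l = rng.normal(size=(100, 5)), rng.integers(0, 2, 100)   # labeled set
X_u = rng.normal(size=(200, 5))                                 # unlabeled pool
sub_a, sub_b = X_u[:100], X_u[100:]                             # the two subsets

model = LogisticRegression().fit(X_l, y_l)
for _ in range(3):
    pseudo_a = model.predict(sub_a)                             # label subset A
    model = LogisticRegression().fit(np.vstack([X_l, sub_a]),
                                     np.concatenate([y_l, pseudo_a]))
    pseudo_b = model.predict(sub_b)                             # this model labels subset B
    model = LogisticRegression().fit(np.vstack([X_l, sub_b]),
                                     np.concatenate([y_l, pseudo_b]))
print(model.score(X_l, y_l))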
34. IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency [PDF] 返回目录
Jiarui Cai, Yizhou Wang, Haotian Zhang, Hung-Min Hsu, Chengqian Ma, Jenq-Neng Hwang
Abstract: Multiple object tracking (MOT) is a crucial task in the computer vision community. However, most tracking-by-detection MOT methods, with available detected bounding boxes, cannot effectively handle static, slow-moving, and fast-moving camera scenarios simultaneously due to ego-motion and frequent occlusion. In this work, we propose a novel tracking framework, called "instance-aware MOT" (IA-MOT), that can track multiple objects in either static or moving cameras by jointly considering instance-level features and object motions. First, robust appearance features are extracted from a variant of the Mask R-CNN detector with an additional embedding head, by sending the given detections as the region proposals. Meanwhile, spatial attention, which focuses on the foreground within the bounding boxes, is generated from the given instance masks and applied to the extracted embedding features. In the tracking stage, object instance masks are aligned by feature similarity and motion consistency using the Hungarian association algorithm. Moreover, object re-identification (ReID) is incorporated to recover ID switches caused by long-term occlusion or missed detections. Overall, when evaluated on the MOTS20 and KITTI-MOTS datasets, our proposed method won first place in Track 3 of the BMTT Challenge at the CVPR2020 workshops.
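The association step can be illustrated with the Hungarian algorithm on a cost matrix mixing appearance distance and a motion term (the 0.5/0.5 weighting and the toy inputs are assumptions):

import numpy as np
from scipy.optimize import linear_sum_assignment

tracks = np.random.rand(3, 128)        # embeddings of existing tracks
dets = np.random.rand(4, 128)          # embeddings of new detections
app = 1 - (tracks @ dets.T) / (
    np.linalg.norm(tracks, axis=1, keepdims=True) * np.linalg.norm(dets, axis=1))
motion = np.random.rand(3, 4)          # e.g. 1 - IoU between predicted and detected boxes
cost = 0.5 * app + 0.5 * motion        # illustrative weighting
row, col = linear_sum_assignment(cost) # optimal track-to-detection assignment
print(list(zip(row.tolist(), col.tolist())))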
35. Learning Semantically Enhanced Feature for Fine-Grained Image Classification [PDF] 返回目录
Wei Luo, Hengmin Zhang, Jun Li, Xiu-Shen Wei
Abstract: In this paper, we target a computationally cheap yet effective approach for fine-grained image classification (FGIC). Compared to previous methods armed with a sophisticated part localization module for fine-grained feature learning, our approach attains this by improving the semantics of the sub-features of a global feature. To this end, we first achieve sub-feature semantics by rearranging the feature channels of a CNN into different groups through channel permutation, which is realized implicitly without modifying the backbone network structure. A weighted combination regularization, derived from matching prediction distributions between the global feature and its sub-features, is then employed to guide the learned groups to be activated on local parts with strong discriminability, thus increasing the discriminability of the global feature at fine-grained scales. Our approach brings negligible extra parameters to the backbone CNNs and can be implemented as a plug-and-play module, trained end-to-end with only image-level supervision. Experiments on four fine-grained benchmark datasets verify the effectiveness of our approach and validate its performance, comparable to state-of-the-art methods. Code is available at {\it \url{this https URL}}
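A sketch of the two ingredients described above, with the shapes, the group count, and the KL-based prediction matching all as assumptions rather than the paper's exact design:

import torch
import torch.nn.functional as F

B, C, G, n_cls = 4, 256, 4, 200
feat = torch.randn(B, C, 7, 7)                       # global feature map
perm = torch.randperm(C)                             # channel permutation
groups = feat[:, perm].reshape(B, G, C // G, 7, 7)   # G groups of sub-features
pooled = groups.mean(dim=(-2, -1))                   # (B, G, C//G)
head = torch.nn.Linear(C // G, n_cls)
group_logits = head(pooled)                          # per-group predictions
global_logits = torch.randn(B, n_cls)                # prediction from the global feature
reg = F.kl_div(F.log_softmax(group_logits, dim=-1),
               F.softmax(global_logits, dim=-1).unsqueeze(1).expand(-1, G, -1),
               reduction="batchmean")                # match group and global distributions
print(reg.item())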
36. Road surface detection and differentiation considering surface damages [PDF] 返回目录
Thiago Rateke, Aldo von Wangenheim
Abstract: A challenge still to be overcome in the field of visual perception for vehicle and robotic navigation on heavily damaged and unpaved roads is the task of reliable path and obstacle detection. The vast majority of research considers as its scenario roads in good condition in developed countries. These works cope with few situations of road-surface variation and even fewer presenting surface damage. In this paper we present an approach to road detection that considers variation in surface types, identifying paved and unpaved surfaces and also detecting damage and other road-surface information that may be relevant to driving safety. We also present a new ground truth with image segmentation, used in our approach, which allowed us to evaluate our results. Our results show that it is possible to use passive vision for these purposes, even using images captured with low-cost cameras.
37. Rethinking Distributional Matching Based Domain Adaptation [PDF]
Bo Li, Yezhen Wang, Tong Che, Shanghang Zhang, Sicheng Zhao, Pengfei Xu, Wei Zhou, Yoshua Bengio, Kurt Keutzer
Abstract: Domain adaptation (DA) is a technique that transfers predictive models trained on a labeled source domain to an unlabeled target domain, with the core difficulty of resolving distributional shift between domains. Currently, most popular DA algorithms are based on distributional matching (DM). In practice, however, realistic domain shifts (RDS) may violate their basic assumptions and, as a result, these methods fail. In this paper, in order to devise robust DA algorithms, we first systematically analyze the limitations of DM-based methods, and then build new benchmarks with more realistic domain shifts to evaluate the well-accepted DM methods. We further propose InstaPBM, a novel Instance-based Predictive Behavior Matching method for robust DA. Extensive experiments on both conventional and RDS benchmarks demonstrate both the limitations of DM methods and the efficacy of InstaPBM: compared with the best baselines, InstaPBM improves classification accuracy by $4.5\%$ and $3.9\%$ on Digits5 and VisDA2017, and by $2.2\%$, $2.9\%$, and $3.6\%$ on DomainNet-LDS, DomainNet-ILDS, and ID-TwO, respectively. We hope our intuitive yet effective method will serve as a useful new direction and increase the robustness of DA in real scenarios. Code will be available at anonymous link: this https URL.
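To make the term concrete: distributional matching typically means minimizing a statistical distance between source and target feature distributions. The sketch below shows one classic instance, a multi-bandwidth Gaussian-kernel Maximum Mean Discrepancy (MMD) in PyTorch; the feature dimensions and bandwidths are illustrative assumptions, and InstaPBM itself is not reproduced here.
```python
import torch

def gaussian_mmd(source: torch.Tensor, target: torch.Tensor,
                 bandwidths=(1.0, 2.0, 4.0)) -> torch.Tensor:
    """Squared MMD between feature batches of shape (n, d) and (m, d)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2                   # pairwise squared distances
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in bandwidths)
    return (kernel(source, source).mean()
            + kernel(target, target).mean()
            - 2 * kernel(source, target).mean())

src = torch.randn(64, 256)            # backbone features of the labeled source
tgt = torch.randn(64, 256) + 0.5      # unlabeled target under a distribution shift
print(float(gaussian_mmd(src, tgt)))  # a DM method would minimize this quantity
```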
38. Applying Lie Groups Approaches for Rigid Registration of Point Clouds [PDF]
Liliane Rodrigues de Almeida, Gilson A. Giraldi, Marcelo Bernardes Vieira
Abstract: In recent decades, a body of literature has appeared that uses Lie group theory to solve problems in computer vision. In this setting, Lie-algebraic representations of the transformations were introduced to overcome the difficulties of the group structure by mapping the transformation groups to linear spaces. In this paper we focus on applying Lie groups and Lie algebras to find the rigid transformation that best registers two surfaces represented by point clouds. The so-called pairwise rigid registration can be formulated by comparing intrinsic second-order orientation tensors that encode local geometry. These tensors can be (locally) represented by symmetric non-negative definite matrices. We interpret the obtained tensor field as a multivariate normal model, starting from the fact that the space of Gaussians can be equipped with a Lie group structure that is isomorphic to a subgroup of the upper triangular matrices. Consequently, the associated Lie algebra structure enables us to handle Gaussians, and thus to compare orientation tensors, with Euclidean operations. We apply this methodology to variants of the Iterative Closest Point (ICP) algorithm, a well-known technique for pairwise registration, and compare the obtained results with the original implementations that apply the comparative tensor shape factor (CTSF), a similarity notion based on the eigenvalues of the orientation tensors. We notice that the similarity measure in tensor spaces directly derived from Lie's approach is not invariant under rotations, which is a problem for rigid registration. Despite this, the computational experiments show promising results when embedding orientation tensor fields in Lie algebras.
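As a rough illustration of the general idea of comparing orientation tensors "with Euclidean operations" by first mapping them into a linear space, the sketch below computes a log-Euclidean distance between two symmetric positive-definite matrices; it is a generic example of the mapping-to-an-algebra technique, not the paper's specific Gaussian/upper-triangular construction.
```python
import numpy as np
from scipy.linalg import logm

def log_euclidean_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Frobenius distance between matrix logs of two SPD matrices."""
    return float(np.linalg.norm(logm(A) - logm(B), ord="fro"))

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A = X @ X.T + 3 * np.eye(3)   # toy SPD orientation tensors
B = Y @ Y.T + 3 * np.eye(3)
print(log_euclidean_distance(A, B))
```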
39. NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search [PDF]
Rameswar Panda, Michele Merler, Mayoore Jaiswal, Hui Wu, Kandan Ramakrishnan, Ulrich Finkler, Chun-Fu Chen, Minsik Cho, David Kung, Rogerio Feris, Bishwaranjan Bhattacharjee
Abstract: Neural Architecture Search (NAS) is an open and challenging problem in machine learning. While NAS offers great promise, the prohibitive computational demand of most of the existing NAS methods makes it difficult to directly search the architectures on large-scale tasks. The typical way of conducting large scale NAS is to search for an architectural building block on a small dataset (either using a proxy set from the large dataset or a completely different small scale dataset) and then transfer the block to a larger dataset. Despite a number of recent results that show the promise of transfer from proxy datasets, a comprehensive evaluation of different NAS methods studying the impact of different source datasets and training protocols has not yet been conducted. In this work, we propose to analyze the architecture transferability of different NAS methods by performing a series of experiments on large scale benchmarks such as ImageNet1K and ImageNet22K. We find that: (i) On average, architectures searched using completely different small datasets achieve transfer performance similar to that of architectures searched directly on proxy target datasets. However, the design of proxy sets has considerable impact on the rankings of different NAS methods. (ii) While the different NAS methods show similar performance on a source dataset (e.g., CIFAR10), they significantly differ on the transfer performance to a large dataset (e.g., ImageNet1K). (iii) Even on large datasets, the randomly sampled architecture baseline is very competitive and significantly outperforms many representative NAS methods. (iv) The training protocol has a larger impact on small datasets, but it fails to provide consistent improvements on large datasets. We believe that our NASTransfer benchmark will be key to designing future NAS strategies that consistently show superior transfer performance on large scale datasets.
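Finding (iii) is easy to state in code: a random baseline simply samples cells from the search space and keeps the best one by proxy validation score. The sketch below is a toy version with an assumed operation space and a stubbed scoring function, purely to illustrate the procedure.
```python
import random

OPS = ["skip", "conv3x3", "conv5x5", "maxpool3x3", "sepconv3x3"]  # assumed op space

def sample_cell(num_nodes=4):
    # Each node picks one operation and one earlier node (or the input) to read from.
    return [(random.choice(OPS), random.randrange(i + 1)) for i in range(num_nodes)]

def proxy_score(cell):
    # Stand-in for "train briefly on a proxy set, report validation accuracy".
    return random.Random(str(cell)).random()

best = max((sample_cell() for _ in range(100)), key=proxy_score)
print(best)
```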
40. Image-to-image Mapping with Many Domains by Sparse Attribute Transfer [PDF]
Matthew Amodio, Rim Assouel, Victor Schmidt, Tristan Sylvain, Smita Krishnaswamy, Yoshua Bengio
Abstract: Unsupervised image-to-image translation consists of learning a pair of mappings between two domains without known pairwise correspondences between points. The current convention is to approach this task with cycle-consistent GANs: using a discriminator to encourage the generator to change the image to match the target domain, while training the generator to be inverted with another mapping. While ending up with paired inverse functions may be a good end result, enforcing this restriction at all times during training can be a hindrance to effective modeling. We propose an alternate approach that directly restricts the generator to performing a simple sparse transformation in a latent layer, motivated by recent work from cognitive neuroscience suggesting an architectural prior on representations corresponding to consciousness. Our biologically motivated approach leads to representations more amenable to transformation by disentangling high-level abstract concepts in the latent space. We demonstrate that image-to-image domain translation with many different domains can be learned more effectively with our architecturally constrained, simple transformation than with previous unconstrained architectures that rely on a cycle-consistency loss.
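A minimal sketch of the sparse latent transformation idea, under toy assumptions: translation between domains is restricted to adding a learned offset to a few latent coordinates, with an L1 penalty encouraging sparsity. The linear encoder/decoder are stand-ins for the real networks.
```python
import torch
import torch.nn as nn

latent_dim = 16
encoder = nn.Linear(64, latent_dim)            # stand-in for the real encoder
decoder = nn.Linear(latent_dim, 64)            # stand-in for the real decoder
shift = nn.Parameter(torch.zeros(latent_dim))  # one attribute vector per target domain

x = torch.rand(4, 64)                          # a batch from the source domain
translated = decoder(encoder(x) + shift)       # translation = additive latent shift
sparsity_penalty = shift.abs().sum()           # L1 term keeps the transformation sparse
print(translated.shape, float(sparsity_penalty))
```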
41. Anomaly Detection with Deep Perceptual Autoencoders [PDF]
Nina Tuluptceva, Bart Bakker, Irina Fedulova, Heinrich Schulz, Dmitry V. Dylov
Abstract: Anomaly detection is the problem of recognizing abnormal inputs based on the seen examples of normal data. Despite recent advances of deep learning in recognizing image anomalies, these methods still prove incapable of handling complex medical images, such as barely visible abnormalities in chest X-rays and metastases in lymph nodes. To address this problem, we introduce a new powerful method of image anomaly detection. It relies on the classical autoencoder approach with a re-designed training pipeline to handle high-resolution, complex images and a robust way of computing an image abnormality score. We revisit the very problem statement of fully unsupervised anomaly detection, where no abnormal examples at all are provided during the model setup. We propose to relax this unrealistic assumption by using a very small number of anomalies of confined variability merely to initiate the search of hyperparameters of the model. We evaluate our solution on natural image datasets with a known benchmark, as well as on two medical datasets containing radiology and digital pathology images. The proposed approach suggests a new strong baseline for image anomaly detection and outperforms state-of-the-art approaches in complex medical image analysis tasks.
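One common way to implement a "perceptual" abnormality score, sketched below under simple assumptions (torchvision >= 0.13 API; the autoencoder is stubbed by adding noise), is to measure the reconstruction error in the feature space of a frozen convolutional network rather than in pixel space.
```python
import torch
import torchvision.models as models

# Frozen feature extractor; weights=None only keeps the sketch runnable offline —
# in practice a *pretrained* network would be used.
features = models.vgg16(weights=None).features[:16].eval()
for p in features.parameters():
    p.requires_grad_(False)

def abnormality_score(image: torch.Tensor, reconstruction: torch.Tensor) -> float:
    # Reconstruction error measured in deep-feature space, not pixel space.
    with torch.no_grad():
        return float(torch.mean((features(image) - features(reconstruction)) ** 2))

x = torch.rand(1, 3, 224, 224)            # input image
x_hat = x + 0.05 * torch.randn_like(x)    # stand-in for the autoencoder output
print(abnormality_score(x, x_hat))
```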
42. Rescaling Egocentric Vision [PDF]
Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray
Abstract: This paper introduces EPIC-KITCHENS-100, the largest annotated egocentric dataset - 100 hrs, 20M frames, 90K actions - of wearable videos capturing long-term unscripted activities in 45 environments. This extends our previous dataset (EPIC-KITCHENS-55), released in 2018, resulting in more action segments (+128%), environments (+41%) and hours (+84%), using a novel annotation pipeline that allows denser and more complete annotations of fine-grained actions (54% more actions per minute). We evaluate the "test of time" - i.e. whether models trained on data collected in 2018 can generalise to new footage collected under the same hypotheses albeit "two years on". The dataset is aligned with 6 challenges: action recognition (full and weak supervision), detection, anticipation, retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task, provide baselines and evaluation metrics. Our dataset and challenge leaderboards will be made publicly available.
43. Iris Presentation Attack Detection: Where Are We Now? [PDF]
Aidan Boyd, Zhaoyuan Fang, Adam Czajka, Kevin W. Bowyer
Abstract: As the popularity of iris recognition systems increases, the importance of effective security measures against presentation attacks becomes paramount. This work presents an overview of the most important advances in the area of iris presentation attack detection published in recent two years. Newly-released, publicly-available datasets for development and evaluation of iris presentation attack detection are discussed. Recent literature can be seen to be broken into three categories: traditional "hand-crafted" feature extraction and classification, deep learning-based solutions, and hybrid approaches fusing both methodologies. Conclusions of modern approaches underscore the difficulty of this task. Finally, commentary on possible directions for future research is provided.
44. Neural Non-Rigid Tracking [PDF]
Aljaž Božič, Pablo Palafox, Michael Zollhöfer, Angela Dai, Justus Thies, Matthias Nießner
Abstract: We introduce a novel, end-to-end learnable, differentiable non-rigid tracker that enables state-of-the-art non-rigid reconstruction. Given two input RGB-D frames of a non-rigidly moving object, we employ a convolutional neural network to predict dense correspondences. These correspondences are used as constraints in an as-rigid-as-possible (ARAP) optimization problem. By enabling gradient back-propagation through the non-rigid optimization solver, we are able to learn correspondences in an end-to-end manner such that they are optimal for the task of non-rigid tracking. Furthermore, this formulation allows for learning correspondence weights in a self-supervised manner. Thus, outliers and wrong correspondences are down-weighted to enable robust tracking. Compared to state-of-the-art approaches, our algorithm shows improved reconstruction performance, while simultaneously achieving 85 times faster correspondence prediction than comparable deep-learning based methods.
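For reference, the as-rigid-as-possible (ARAP) term in such optimization problems penalizes, for each graph edge, the deviation of the deformed edge from the rest-pose edge rotated by the node's rotation. The sketch below is a minimal PyTorch version with assumed shapes; a real solver would parameterize the rotations and also include the learned-correspondence data term.
```python
import torch

def arap_energy(rest, deformed, rotations, edges):
    """rest/deformed: (V, 3) node positions; rotations: (V, 3, 3); edges: (E, 2)."""
    i, j = edges[:, 0], edges[:, 1]
    rest_edge = rest[j] - rest[i]                              # (E, 3)
    pred_edge = torch.einsum("eab,eb->ea", rotations[i], rest_edge)
    return torch.sum((deformed[j] - deformed[i] - pred_edge) ** 2)

rest = torch.randn(5, 3)
deformed = rest + 0.01 * torch.randn(5, 3)
rotations = torch.eye(3).expand(5, 3, 3).clone()               # identity rotations
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 4]])
print(float(arap_energy(rest, deformed, rotations, edges)))
```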
45. Meta Transfer Learning for Emotion Recognition [PDF]
Dung Nguyen, Sridha Sridharan, Duc Thanh Nguyen, Simon Denman, David Dean, Clinton Fookes
Abstract: Deep learning has been widely adopted in automatic emotion recognition and has led to significant progress in the field. However, due to insufficient annotated emotion datasets, pre-trained models are limited in their generalization capability and thus perform poorly on novel test sets. To mitigate this challenge, transfer learning that fine-tunes pre-trained models has been applied. However, the fine-tuned knowledge may overwrite and/or discard important knowledge learned by the pre-trained models. In this paper, we address this issue by proposing a PathNet-based transfer learning method that is able to transfer emotional knowledge learned from one visual/audio emotion domain to another visual/audio emotion domain, and to transfer emotional knowledge learned from multiple audio emotion domains into one another, improving overall emotion recognition accuracy. To show the robustness of our proposed system, various sets of experiments on facial expression recognition and speech emotion recognition were carried out on three emotion datasets: SAVEE, EMODB, and eNTERFACE. The experimental results indicate that our proposed system improves the performance of emotion recognition, making its performance substantially superior to recently proposed transfer learning methods based on fine-tuning pre-trained models.
46. Malignancy-Aware Follow-Up Volume Prediction for Lung Nodules [PDF]
Yamin Li, Jiancheng Yang, Yi Xu, Jingwei Xu, Xiaodan Ye, Guangyu Tao, Xueqian Xie, Guixue Liu
Abstract: Follow-up plays an important role in the management of pulmonary nodules for lung cancer. Imaging diagnostic guidelines based on expert consensus have been created to help radiologists make clinical decisions for each patient. However, tumor growth is such a complicated process that it is difficult to stratify high-risk nodules from low-risk ones based on morphologic characteristics alone. On the other hand, recent deep learning studies that use convolutional neural networks (CNNs) to predict the malignancy score of nodules only provide clinicians with black-box predictions. To this end, we propose a unified framework, named Nodule Follow-Up Prediction Network (NoFoNet), which predicts the growth of pulmonary nodules with high-quality visual appearances and accurate quantitative malignancy scores, given any time interval from baseline observations. This is achieved by predicting the future displacement field of each voxel with a WarpNet; a TextureNet is further developed to refine the textural details of the WarpNet outputs. We also introduce techniques including a Temporal Encoding Module and a Warp Segmentation Loss to encourage time-aware and malignancy-aware representation learning. We build an in-house follow-up dataset from two medical centers to validate the effectiveness of the proposed method. NoFoNet significantly outperforms direct prediction by a U-Net in terms of visual quality; more importantly, it demonstrates accurate differentiation between high- and low-risk nodules. Our promising results suggest its potential for computer-aided intervention in lung nodule management.
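The core warping step can be sketched as follows, under simple assumptions (displacements given in normalized [-1, 1] coordinates; the WarpNet prediction stubbed by zeros): resample the baseline volume at identity-plus-displacement grid locations with trilinear interpolation.
```python
import torch
import torch.nn.functional as F

def warp_volume(volume: torch.Tensor, displacement: torch.Tensor) -> torch.Tensor:
    """volume: (N, C, D, H, W); displacement: (N, D, H, W, 3) in x, y, z order."""
    n, _, d, h, w = volume.shape
    zs, ys, xs = torch.meshgrid(torch.linspace(-1, 1, d), torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
    identity = torch.stack((xs, ys, zs), dim=-1).expand(n, d, h, w, 3)
    return F.grid_sample(volume, identity + displacement, align_corners=True)

vol = torch.rand(1, 1, 16, 16, 16)      # baseline nodule volume
disp = torch.zeros(1, 16, 16, 16, 3)    # stand-in for the WarpNet prediction
print(torch.allclose(warp_volume(vol, disp), vol, atol=1e-5))  # zero field = identity
```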
47. Microstructure Generation via Generative Adversarial Network for Heterogeneous, Topologically Complex 3D Materials [PDF]
Tim Hsu, William K. Epting, Hokon Kim, Harry W. Abernathy, Gregory A. Hackett, Anthony D. Rollett, Paul A. Salvador, Elizabeth A. Holm
Abstract: Using a large-scale, experimentally captured 3D microstructure dataset, we implement the generative adversarial network (GAN) framework to learn and generate 3D microstructures of solid oxide fuel cell electrodes. The generated microstructures are visually, statistically, and topologically realistic, with distributions of microstructural parameters, including volume fraction, particle size, surface area, tortuosity, and triple phase boundary density, being highly similar to those of the original microstructure. These results are compared and contrasted with those from an established, grain-based generation algorithm (DREAM.3D). Importantly, simulations of electrochemical performance, using a locally resolved finite element model, demonstrate that the GAN generated microstructures closely match the performance distribution of the original, while DREAM.3D leads to significant differences. The ability of the generative machine learning model to recreate microstructures with high fidelity suggests that the essence of complex microstructures may be captured and represented in a compact and manipulatable form.
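As a small illustration of one of the microstructural statistics being compared, the sketch below computes per-phase volume fractions from a labelled 3D volume; the three-phase labelling of the random stand-in volume is an assumption, and the GAN itself is not shown.
```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.integers(0, 3, size=(64, 64, 64))  # assumed 3-phase labelled volume
for phase in range(3):
    # Volume fraction = share of voxels belonging to this phase.
    print(f"phase {phase}: volume fraction = {(volume == phase).mean():.3f}")
```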
48. Does Non-COVID19 Lung Lesion Help? Investigating Transferability in COVID-19 CT Image Segmentation [PDF]
Yixin Wang, Yao Zhang, Yang Liu, Jiang Tian, Cheng Zhong, Zhongchao Shi, Yang Zhang, Zhiqiang He
Abstract: Coronavirus disease 2019 (COVID-19) is a highly contagious disease that is spreading all around the world. Deep learning has been adopted as an effective technique to aid COVID-19 detection and segmentation from computed tomography (CT) images. The major challenge lies in the inadequate public COVID-19 datasets. Recently, transfer learning has become a widely used technique that leverages the knowledge gained while solving one problem and applies it to a different but related problem. However, it remains unclear whether various non-COVID19 lung lesions could contribute to segmenting COVID-19 infection areas, and how to best conduct this transfer procedure. This paper provides a way to understand the transferability of non-COVID19 lung lesions. Based on a publicly available COVID-19 CT dataset and three public non-COVID19 datasets, we evaluate four transfer learning methods using 3D U-Net as a standard encoder-decoder method. The results reveal the benefits of transferring knowledge from non-COVID19 lung lesions: learning from multiple lung lesion datasets extracts more general features, leading to accurate and robust pre-trained models. We further show the capability of the encoder to learn feature representations of lung lesions, which improves segmentation accuracy and facilitates training convergence. In addition, our proposed Multi-encoder learning method incorporates transferred lung lesion features from non-COVID19 datasets effectively and achieves significant improvement. These findings promote new insights into transfer learning for COVID-19 CT image segmentation, which can also be generalized to other medical tasks.
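The basic encoder-transfer step can be sketched as below, with a toy stand-in architecture rather than the paper's 3D U-Net: copy the encoder weights of a model pretrained on non-COVID19 lesion data into a fresh model, then fine-tune on COVID-19 CT. The paper's Multi-encoder variant fuses several such transferred encoders.
```python
import torch
import torch.nn as nn

class TinyUNet3d(nn.Module):                 # toy stand-in, not the paper's network
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                                     nn.Conv3d(8, 16, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv3d(16, 2, 1)   # 2 classes: background / lesion

    def forward(self, x):
        return self.decoder(self.encoder(x))

pretrained = TinyUNet3d()                    # imagine: trained on non-COVID19 lesions
target = TinyUNet3d()
target.encoder.load_state_dict(pretrained.encoder.state_dict())  # the transfer step
print(target(torch.rand(1, 1, 8, 32, 32)).shape)  # torch.Size([1, 2, 8, 32, 32])
```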
49. COVIDLite: A depth-wise separable deep neural network with white balance and CLAHE for detection of COVID-19 [PDF]
Manu Siddhartha, Avik Santra
Abstract: Background and Objective: Currently, the whole world is facing the pandemic of the novel coronavirus, also known as COVID-19, which has spread to more than 200 countries, with around 3.3 million active cases and approximately 4.4 lakh (440,000) deaths. Due to the rapid increase in the number of cases and the limited supply of testing kits, an alternative diagnostic method is necessary for containing the spread of COVID-19 at an early stage and reducing the death count. To provide such an alternative, we propose a deep neural network based diagnostic method that can be easily integrated with mobile devices for the detection of COVID-19 and viral pneumonia from chest X-ray (CXR) images. Methods: In this study, we propose a method named COVIDLite, which combines white balance followed by Contrast Limited Adaptive Histogram Equalization (CLAHE) with a depth-wise separable convolutional neural network (DSCNN). In this method, white balance followed by CLAHE is used as an image preprocessing step to enhance the visibility of CXR images, and a DSCNN trained using sparse cross entropy is used for image classification with fewer parameters and a significantly smaller model size, i.e., 8.4 MB without quantization. Results: The proposed COVIDLite method improved performance in comparison to a vanilla DSCNN with no preprocessing. The proposed method achieved an accuracy of 99.58% for binary classification and 96.43% for multiclass classification, outperforming various state-of-the-art methods. Conclusion: Our proposed method, COVIDLite, achieved exceptional results on various performance metrics. With detailed model interpretations, COVIDLite can assist radiologists in detecting COVID-19 patients from CXR images and can reduce diagnosis time significantly.
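Two of the named ingredients are easy to illustrate: CLAHE preprocessing via OpenCV, and a depth-wise separable convolution block (a per-channel 3x3 convolution followed by a 1x1 pointwise convolution), which is what keeps such networks small. The hyperparameters below are guesses, and the white-balance step is omitted.
```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def preprocess(gray: np.ndarray) -> np.ndarray:
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)                 # contrast-limited histogram equalization

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # toy CXR image
print(preprocess(gray).shape)                                # (64, 64)
print(DepthwiseSeparableConv(1, 8)(torch.rand(1, 1, 64, 64)).shape)
```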
50. Feedback Graph Attention Convolutional Network for Medical Image Enhancement [PDF]
Xiaobin Hu, Yanyang Yan, Wenqi Ren, Hongwei Li, Yu Zhao, Amirhossein Bayat, Bjoern Menze
Abstract: Artifacts, blur, and noise are common distortions that degrade MRI images during the acquisition process, and deep neural networks have been demonstrated to help improve image quality. To fully exploit global structural information and texture details, we propose a novel biomedical image enhancement network, named Feedback Graph Attention Convolutional Network (FB-GACN). As a key innovation, we consider the global structure of an image by building a graph network from image sub-regions that we consider to be node features, linking them non-locally according to their similarity. The proposed model consists of three main parts: 1) a parallel graph similarity branch and content branch, where the graph similarity branch aims at exploiting the similarity and symmetry across different image sub-regions in low-resolution feature space and provides additional priors for the content branch to enhance texture details; 2) a feedback mechanism with a recurrent structure that refines low-level representations with high-level information and generates high-level texture details by handling the feedback connections; and 3) a reconstruction module that removes artifacts and recovers super-resolution images using the estimated sub-region correlation priors obtained from the graph similarity branch. We evaluate our method on two image enhancement tasks: i) cross-protocol super-resolution of diffusion MRI; ii) artifact removal from FLAIR MR images. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods.
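A minimal sketch of the graph-construction idea, under simple assumptions: treat non-overlapping feature-map patches as nodes and link each node non-locally to its top-k most similar peers by cosine similarity. The attention and feedback machinery of FB-GACN is not reproduced here.
```python
import torch
import torch.nn.functional as F

def build_patch_graph(feat: torch.Tensor, patch: int = 8, k: int = 4):
    """feat: (C, H, W) -> patch-node features and each node's top-k neighbours."""
    nodes = F.unfold(feat.unsqueeze(0), kernel_size=patch, stride=patch)[0].T
    sim = F.cosine_similarity(nodes.unsqueeze(1), nodes.unsqueeze(0), dim=-1)
    sim.fill_diagonal_(-1.0)                     # exclude self-loops
    return nodes, sim.topk(k, dim=-1).indices    # (N, C*patch*patch), (N, k)

nodes, neighbours = build_patch_graph(torch.rand(16, 32, 32))
print(nodes.shape, neighbours.shape)             # 16 nodes of dim 1024, 4 links each
```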
51. Stacked Convolutional Neural Network for Diagnosis of COVID-19 Disease from X-ray Images [PDF]
Mahesh Gour, Sweta Jain
Abstract: Automatic and rapid screening of COVID-19 from chest X-ray images has become an urgent need amid the worldwide SARS-CoV-2 pandemic of 2020. However, accurate and reliable screening of patients is a massive challenge due to the discrepancy between COVID-19 and other viral pneumonia in X-ray images. In this paper, we design a new stacked convolutional neural network model for the automatic diagnosis of COVID-19 disease from chest X-ray images. We obtain different sub-models from VGG19 and develop a 30-layer CNN model (named CovNet30) during training, and the obtained sub-models are stacked together using logistic regression. The proposed CNN model combines the discriminating power of the different CNN sub-models and classifies chest X-ray images into COVID-19, Normal, and Pneumonia classes. In addition, we generate an X-ray image dataset referred to as COVID19CXr, which includes 2764 chest X-ray images of 1768 patients from three publicly available data repositories. The proposed stacked CNN achieves an accuracy of 92.74%, a sensitivity of 93.33%, a PPV of 92.13%, and an F1-score of 0.93 for the classification of X-ray images. Our proposed approach shows its superiority over existing methods for the diagnosis of COVID-19 from X-ray images.
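The stacking step itself is straightforward to sketch: the class-probability outputs of the sub-models are concatenated into a feature vector and fed to a logistic-regression meta-classifier. The random sub-model outputs below are stand-ins for the VGG19-derived sub-models and CovNet30.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_images, n_submodels, n_classes = 200, 4, 3     # COVID-19 / Normal / Pneumonia
probs = rng.dirichlet(np.ones(n_classes), size=(n_images, n_submodels))
X = probs.reshape(n_images, n_submodels * n_classes)  # stacked probability features
y = rng.integers(0, n_classes, size=n_images)         # toy labels

meta = LogisticRegression(max_iter=1000).fit(X, y)    # the stacking meta-classifier
print(meta.predict(X[:5]))
```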
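The stacking step lends itself to a short sketch: each sub-model's predicted class probabilities are concatenated into meta-features, and a logistic-regression meta-learner is fitted on top. In the sketch below, random Dirichlet draws stand in for the softmax outputs of the VGG19-derived sub-models; only the three-class setup (COVID-19 / Normal / Pneumonia) follows the abstract, and all other shapes are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_models, n_classes = 200, 5, 3  # COVID-19 / Normal / Pneumonia

# (n_train, n_models * n_classes): concatenated per-model class probabilities
meta_features = rng.dirichlet(np.ones(n_classes), size=(n_train, n_models)).reshape(n_train, -1)
labels = rng.integers(0, n_classes, size=n_train)

meta_learner = LogisticRegression(max_iter=1000)
meta_learner.fit(meta_features, labels)

# At test time, concatenate the sub-model probabilities the same way.
test_features = rng.dirichlet(np.ones(n_classes), size=(1, n_models)).reshape(1, -1)
print(meta_learner.predict(test_features))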
52. Interpretable Deep Models for Cardiac Resynchronisation Therapy Response Prediction [PDF] 返回目录
Esther Puyol-Antón, Chen Chen, James R. Clough, Bram Ruijsink, Baldeep S. Sidhu, Justin Gould, Bradley Porter, Mark Elliott, Vishal Mehta, Daniel Rueckert, Christopher A. Rinaldi, Andrew P. King
Abstract: Advances in deep learning (DL) have resulted in impressive accuracy in some medical image classification tasks, but often deep models lack interpretability. The ability of these models to explain their decisions is important for fostering clinical trust and facilitating clinical translation. Furthermore, for many problems in medicine there is a wealth of existing clinical knowledge to draw upon, which may be useful in generating explanations, but it is not obvious how this knowledge can be encoded into DL models - most models are learnt either from scratch or using transfer learning from a different domain. In this paper we address both of these issues. We propose a novel DL framework for image-based classification based on a variational autoencoder (VAE). The framework allows prediction of the output of interest from the latent space of the autoencoder, as well as visualisation (in the image domain) of the effects of crossing the decision boundary, thus enhancing the interpretability of the classifier. Our key contribution is that the VAE disentangles the latent space based on 'explanations' drawn from existing clinical knowledge. The framework can predict outputs as well as explanations for these outputs, and also raises the possibility of discovering new biomarkers that are separate (or disentangled) from the existing knowledge. We demonstrate our framework on the problem of predicting response of patients with cardiomyopathy to cardiac resynchronization therapy (CRT) from cine cardiac magnetic resonance images. The sensitivity and specificity of the proposed model on the task of CRT response prediction are 88.43% and 84.39% respectively, and we showcase the potential of our model in enhancing understanding of the factors contributing to CRT response.
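The core design, a classifier reading from a VAE latent space, can be sketched in a few lines of PyTorch. The sketch below uses invented layer sizes and omits the paper's clinically guided disentanglement; it only illustrates how a reconstruction loss, a KL term, and a latent-space classifier are trained jointly.

import torch
import torch.nn as nn

class LatentClassifierVAE(nn.Module):
    def __init__(self, in_dim=256, z_dim=16, n_classes=2):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, in_dim)
        self.clf = nn.Linear(z_dim, n_classes)    # prediction from the latent space

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        return self.dec(z), self.clf(mu), mu, logvar

model = LatentClassifierVAE()
x = torch.randn(8, 256)        # stand-in for flattened image features
y = torch.randint(0, 2, (8,))  # stand-in CRT-response labels
recon, logits, mu, logvar = model(x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, x) + kl + nn.functional.cross_entropy(logits, y)
loss.backward()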
53. COVID-CXNet: Detecting COVID-19 in Frontal Chest X-ray Images using Deep Learning [PDF] 返回目录
Arman Haghanifar, Mahdiyar Molahasani Majdabadi, Seokbum Ko
Abstract: One of the primary clinical observations for screening patients infected by the novel coronavirus is the chest X-ray image. In most patients, a chest X-ray contains abnormalities, such as consolidation, which result from COVID-19 viral pneumonia. In this study, research is conducted on efficiently detecting imaging features of this type of pneumonia using deep convolutional neural networks on a large dataset. It is demonstrated that simple models, alongside the majority of pretrained networks in the literature, focus on irrelevant features for decision-making. In this paper, numerous chest X-ray images from various sources are collected, and the largest publicly accessible dataset is prepared. Finally, using the transfer learning paradigm, the well-known CheXNet model is utilized for developing COVID-CXNet. This powerful model is capable of detecting the novel coronavirus pneumonia based on relevant and meaningful features with precise localization. COVID-CXNet is a step towards a fully automated and robust COVID-19 detection system.
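CheXNet is built on a DenseNet-121 backbone, so one plausible reading of the transfer-learning step is to load that architecture and swap its classification head; torchvision does not ship the CheXNet checkpoint, so ImageNet weights stand in here.

import torch.nn as nn
from torchvision import models

backbone = models.densenet121(weights="IMAGENET1K_V1")  # downloads ImageNet weights
backbone.classifier = nn.Linear(backbone.classifier.in_features, 2)  # new COVID vs. non-COVID head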
54. A Novel Approach for Correcting Multiple Discrete Rigid In-Plane Motions Artefacts in MRI Scans [PDF] 返回目录
Michael Rotman, Rafi Brada, Israel Beniaminy, Sangtae Ahn, Christopher J. Hardy, Lior Wolf
Abstract: Motion artefacts created by patient motion during an MRI scan occur frequently in practice, often rendering the scans clinically unusable and requiring a re-scan. While many methods have been employed to ameliorate the effects of patient motion, these often fall short in practice. In this paper we propose a novel method for removing motion artefacts using a deep neural network with two input branches that discriminates between patient poses using the motion's timing. The first branch receives a subset of the $k$-space data collected during a single patient pose, and the second branch receives the remaining part of the collected $k$-space data. The proposed method can be applied to artefacts generated by multiple movements of the patient. Furthermore, it can be used to correct motion for the case where $k$-space has been under-sampled, to shorten the scan time, as is common when using methods such as parallel imaging or compressed sensing. Experimental results on both simulated and real MRI data show the efficacy of our approach.
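The two-branch idea can be sketched as follows: k-space lines acquired during a single pose feed one branch, the remaining lines feed the other, and the merged features predict a corrected output. Channel counts and depths below are invented for illustration.

import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch_a = nn.Conv2d(2, 16, 3, padding=1)  # 2 channels: real/imag k-space
        self.branch_b = nn.Conv2d(2, 16, 3, padding=1)
        self.head = nn.Conv2d(32, 2, 3, padding=1)      # artefact-free k-space estimate

    def forward(self, kspace_pose1, kspace_rest):
        fa = torch.relu(self.branch_a(kspace_pose1))
        fb = torch.relu(self.branch_b(kspace_rest))
        return self.head(torch.cat([fa, fb], dim=1))

net = TwoBranchNet()
k1 = torch.randn(1, 2, 64, 64)  # lines collected during a single pose (zero-filled elsewhere)
k2 = torch.randn(1, 2, 64, 64)  # remaining collected lines
print(net(k1, k2).shape)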
55. OvA-INN: Continual Learning with Invertible Neural Networks [PDF] 返回目录
G. Hocquet, O. Bichler, D. Querlioz
Abstract: In the field of Continual Learning, the objective is to learn several tasks one after the other without access to the data from previous tasks. Several solutions have been proposed to tackle this problem, but they usually assume that the user knows which of the tasks to perform at test time on a particular sample, or rely on small samples from previous data, and most of them suffer from a substantial drop in accuracy when updated with batches of only one class at a time. In this article, we propose a new method, OvA-INN, which is able to learn one class at a time without storing any of the previous data. To achieve this, for each class, we train a specific Invertible Neural Network to extract the relevant features to compute the likelihood on this class. At test time, we can predict the class of a sample by identifying the network which predicted the highest likelihood. With this method, we show that we can take advantage of pretrained models by stacking an Invertible Network on top of a feature extractor. This way, we are able to outperform state-of-the-art approaches that rely on feature learning for the Continual Learning of MNIST and CIFAR-100 datasets. In our experiments, we reach 72% accuracy on CIFAR-100 after training our model one class at a time.
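The one-vs-all prediction rule is easy to sketch: train one density model per class, on that class alone, and predict the class whose model assigns the highest likelihood. Below, a Gaussian fitted to random stand-in features plays the role of the per-class Invertible Neural Network.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
class_models = []
for c in range(3):                            # one model per class, learned one at a time
    feats = rng.normal(loc=c, size=(100, 4))  # stand-in extracted features for class c
    class_models.append(multivariate_normal(feats.mean(0), np.cov(feats.T)))

x = rng.normal(loc=1.0, size=4)               # a test sample
print("predicted class:", int(np.argmax([m.logpdf(x) for m in class_models])))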
56. GMMLoc: Structure Consistent Visual Localization with Gaussian Mixture Models [PDF] 返回目录
Huaiyang Huang, Haoyang Ye, Yuxiang Sun, Ming Liu
Abstract: Incorporating prior structure information into the visual state estimation could generally improve the localization performance. In this letter, we aim to address the paradox between accuracy and efficiency in coupling visual factors with structure constraints. To this end, we present a cross-modality method that tracks a camera in a prior map modelled by the Gaussian Mixture Model (GMM). With the pose estimated by the front-end initially, the local visual observations and map components are associated efficiently, and the visual structure from the triangulation is refined simultaneously. By introducing the hybrid structure factors into the joint optimization, the camera poses are bundle-adjusted with the local visual structure. By evaluating our complete system, namely GMMLoc, on the public dataset, we show how our system can provide a centimeter-level localization accuracy with only trivial computational overhead. In addition, the comparative studies with the state-of-the-art vision-dominant state estimators demonstrate the competitive performance of our method.
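The association step can be sketched as picking, for a triangulated landmark, the prior-map mixture component with the highest responsibility; the mixture parameters below are invented, and the pose optimization itself is omitted.

import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.5, 0.3, 0.2])  # invented mixture weights
means = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 0.0, 1.0]])
covs = [np.eye(3) * s for s in (0.05, 0.1, 0.2)]

landmark = np.array([0.9, 1.1, 0.05])  # a triangulated 3D point
resp = [w * multivariate_normal(m, c).pdf(landmark) for w, m, c in zip(weights, means, covs)]
print("associated component:", int(np.argmax(resp)))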
57. On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures [PDF] 返回目录
Maxim Samarin, Volker Roth, David Belius
Abstract: The Neural Tangent Kernel (NTK) is an important milestone in the ongoing effort to build a theory for deep learning. Its prediction that sufficiently wide neural networks behave as kernel methods, or equivalently as random feature models, has been confirmed empirically for certain wide architectures. It remains an open question how well NTK theory models standard neural network architectures of widths common in practice, trained on complex datasets such as ImageNet. We study this question empirically for two well-known convolutional neural network architectures, namely AlexNet and LeNet, and find that their behavior deviates significantly from their finite-width NTK counterparts. For wider versions of these networks, where the number of channels and widths of fully-connected layers are increased, the deviation decreases.
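For a finite-width network, an entry of the empirical NTK is the inner product of the parameter gradients of the network output at two inputs, Theta(x1, x2) = <df(x1)/dtheta, df(x2)/dtheta>. A tiny PyTorch sketch with an invented two-layer network:

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def grad_vector(x):
    net.zero_grad()
    net(x).sum().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
ntk_value = grad_vector(x1) @ grad_vector(x2)  # empirical Theta(x1, x2) at initialisation
print(ntk_value.item())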
58. Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model [PDF] 返回目录
Ren Yang, Fabian Mentzer, Luc Van Gool, Radu Timofte
Abstract: The past few years have witnessed increasing interests in applying deep learning to video compression. However, the existing approaches compress a video frame with only a small number of reference frames, which limits their ability to fully exploit the temporal correlation among video frames. To overcome this shortcoming, this paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM). Specifically, the RAE employs recurrent cells in both the encoder and decoder. As such, the temporal information in a large range of frames can be used for generating latent representations and reconstructing compressed outputs. Furthermore, the proposed RPM network recurrently estimates the Probability Mass Function (PMF) of the latent representation, conditioned on the distribution of previous latent representations. Due to the correlation among consecutive frames, the conditional cross entropy can be lower than the independent cross entropy, thus reducing the bit-rate. The experiments show that our approach achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM. Moreover, our approach outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also has better performance on MS-SSIM than the SSIM-tuned x265 and the slowest setting of x265.
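The bit-rate argument can be checked numerically: for correlated symbols, the conditional entropy H(curr | prev) = H(prev, curr) - H(prev) falls below the independent entropy H(curr). The toy symbols below are invented stand-ins for quantized latents of consecutive frames.

import numpy as np

rng = np.random.default_rng(0)
prev = rng.integers(0, 4, size=100_000)
curr = (prev + rng.integers(0, 2, size=prev.size)) % 4  # correlated with prev

def entropy(counts):
    p = counts / counts.sum()
    return -(p[p > 0] * np.log2(p[p > 0])).sum()

h_independent = entropy(np.bincount(curr, minlength=4))
h_joint = entropy(np.bincount(prev * 4 + curr, minlength=16))
h_conditional = h_joint - entropy(np.bincount(prev, minlength=4))
print(h_independent, ">", h_conditional)  # roughly 2.0 bits > 1.0 bit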
59. Normalized Loss Functions for Deep Learning with Noisy Labels [PDF] 返回目录
Xingjun Ma, Hanxun Huang, Yisen Wang, Simone Romano, Sarah Erfani, James Bailey
Abstract: Robust loss functions are essential for training accurate deep neural networks (DNNs) in the presence of noisy (incorrect) labels. It has been shown that the commonly used Cross Entropy (CE) loss is not robust to noisy labels. Whilst new loss functions have been designed, they are only partially robust. In this paper, we theoretically show, by applying a simple normalization, that any loss can be made robust to noisy labels. However, in practice, simply being robust is not sufficient for a loss function to train accurate DNNs. By investigating several robust loss functions, we find that they suffer from a problem of underfitting. To address this, we propose a framework to build robust loss functions called Active Passive Loss (APL). APL combines two robust loss functions that mutually boost each other. Experiments on benchmark datasets demonstrate that the family of new loss functions created by our APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels.
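One such normalized loss, a normalized cross entropy that divides the usual CE by the sum of CE values over every possible label, can be sketched as follows; the batch size and class count below are arbitrary.

import torch
import torch.nn.functional as F

def normalized_cross_entropy(logits, target):
    log_probs = F.log_softmax(logits, dim=-1)
    ce = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)  # usual per-sample CE
    denom = -log_probs.sum(dim=-1)                                # CE summed over all labels
    return (ce / denom).mean()

logits = torch.randn(8, 10, requires_grad=True)
target = torch.randint(0, 10, (8,))
loss = normalized_cross_entropy(logits, target)
loss.backward()
print(loss.item())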
60. Flexible Image Denoising with Multi-layer Conditional Feature Modulation [PDF] 返回目录
Jiazhi Du, Xin Qiao, Zifei Yan, Hongzhi Zhang, Wangmeng Zuo
Abstract: For flexible non-blind image denoising, existing deep networks usually take both noisy image and noise level map as the input to handle various noise levels with a single model. However, in this kind of solution, the noise variance (i.e., noise level) is only deployed to modulate the first layer of convolution feature with channel-wise shifting, which is limited in balancing noise removal and detail preservation. In this paper, we present a novel flexible image denoising network (CFMNet) by equipping a U-Net backbone with multi-layer conditional feature modulation (CFM) modules. In comparison to channel-wise shifting only in the first layer, CFMNet can make better use of noise level information by deploying multiple layers of CFM. Moreover, each CFM module takes convolutional features from both noisy image and noise level map as input for better trade-off between noise removal and detail preservation. Experimental results show that our CFMNet is effective in exploiting noise level information for flexible non-blind denoising, and performs favorably against the existing deep image denoising methods in terms of both quantitative metrics and visual quality.
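A single conditional feature modulation block can be sketched as mapping the noise level map to per-channel scale and shift that modulate the convolutional features. This is a simplified one-layer version with invented channel counts; the paper stacks such modules across multiple layers.

import torch
import torch.nn as nn

class CFMBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.cond = nn.Conv2d(1, 2 * channels, 3, padding=1)  # from the noise level map
        self.body = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat, noise_map):
        scale, shift = self.cond(noise_map).chunk(2, dim=1)
        return self.body(feat) * (1 + scale) + shift

block = CFMBlock()
feat = torch.randn(1, 32, 16, 16)
noise_map = torch.full((1, 1, 16, 16), 0.1)  # spatially uniform noise level
print(block(feat, noise_map).shape)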
61. GIFnets: Differentiable GIF Encoding Framework [PDF] 返回目录
Innfarn Yoo, Xiyang Luo, Yilin Wang, Feng Yang, Peyman Milanfar
Abstract: Graphics Interchange Format (GIF) is a widely used image file format. Due to the limited number of palette colors, GIF encoding often introduces color banding artifacts. Traditionally, dithering is applied to reduce color banding, but introducing dotted-pattern artifacts. To reduce artifacts and provide a better and more efficient GIF encoding, we introduce a differentiable GIF encoding pipeline, which includes three novel neural networks: PaletteNet, DitherNet, and BandingNet. Each of these three networks provides an important functionality within the GIF encoding pipeline. PaletteNet predicts a near-optimal color palette given an input image. DitherNet manipulates the input image to reduce color banding artifacts and provides an alternative to traditional dithering. Finally, BandingNet is designed to detect color banding, and provides a new perceptual loss specifically for GIF images. As far as we know, this is the first fully differentiable GIF encoding pipeline based on deep neural networks and compatible with existing GIF decoders. User study shows that our algorithm is better than Floyd-Steinberg based GIF encoding.
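One common way to make palette assignment differentiable, not necessarily the exact mechanism inside GIFnets, is to relax the hard nearest-colour choice into a softmax over negative colour distances so that gradients reach the palette:

import torch

pixels = torch.rand(100, 3)                       # RGB pixels of an image
palette = torch.rand(16, 3, requires_grad=True)   # learnable 16-colour palette
dist = torch.cdist(pixels, palette)               # (100, 16) colour distances
soft_assign = torch.softmax(-dist / 0.05, dim=1)  # soft nearest-colour assignment
quantized = soft_assign @ palette                 # differentiable quantised pixels
loss = ((quantized - pixels) ** 2).mean()
loss.backward()                                   # gradients flow to the palette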
62. Learning Disentangled Representations of Video with Missing Data [PDF] 返回目录
Armand Comas Massague, Chi Zhang, Zlatan Feric, Octavia Camps, Rose Yu
Abstract: Missing data poses significant challenges while learning representations of video sequences. We present Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable, disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object, while it imputes each object trajectory where data is missing. On a moving MNIST dataset with various missing scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting.
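The missingness idea can be illustrated with a toy sketch: a binary mask marks missing frames, and a prediction replaces only the masked entries. Plain linear interpolation stands in below for DIVE's learned imputation.

import numpy as np

frames = np.arange(10, dtype=float)              # stand-in per-frame feature values
mask = np.array([1, 1, 0, 1, 1, 0, 0, 1, 1, 1])  # 0 marks a missing frame
observed = np.flatnonzero(mask)
imputed = frames.copy()
imputed[mask == 0] = np.interp(np.flatnonzero(mask == 0), observed, frames[observed])
print(imputed)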
63. Realistic Adversarial Data Augmentation for MR Image Segmentation [PDF] 返回目录
Chen Chen, Chen Qin, Huaqi Qiu, Cheng Ouyang, Shuo Wang, Liang Chen, Giacomo Tarroni, Wenjia Bai, Daniel Rueckert
Abstract: Neural network-based approaches can achieve high accuracy in various medical image segmentation tasks. However, they generally require large labelled datasets for supervised learning. Acquiring and manually labelling a large medical dataset is expensive and sometimes impractical due to data sharing and privacy issues. In this work, we propose an adversarial data augmentation method for training neural networks for medical image segmentation. Instead of generating pixel-wise adversarial attacks, our model generates plausible and realistic signal corruptions, which models the intensity inhomogeneities caused by a common type of artefacts in MR imaging: bias field. The proposed method does not rely on generative networks, and can be used as a plug-in module for general segmentation networks in both supervised and semi-supervised learning. Using cardiac MR imaging we show that such an approach can improve the generalization ability and robustness of models as well as provide significant improvements in low-data scenarios.
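The corruption model itself is simple to sketch: a smooth, low-order multiplicative field modulates the image intensities. The polynomial form and coefficient range below are invented, and the adversarial optimization of the field against the segmenter is omitted.

import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in MR slice
yy, xx = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64), indexing="ij")
c = rng.uniform(-0.3, 0.3, size=5)  # random low-order coefficients
field = 1 + c[0] * xx + c[1] * yy + c[2] * xx * yy + c[3] * xx ** 2 + c[4] * yy ** 2
corrupted = img * field  # smooth intensity inhomogeneity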
64. 70 years of machine learning in geoscience in review [PDF] 返回目录
Jesper Sören Dramsch
Abstract: This review gives an overview of the development of machine learning in geoscience. A thorough analysis of the co-developments of machine learning applications throughout the last 70 years relates the recent enthusiasm for machine learning to developments in geoscience. I explore the shift of kriging towards a mainstream machine learning method and the historic application of neural networks in geoscience, following the general trend of machine learning enthusiasm through the decades. Furthermore, this chapter explores the shift from mathematical fundamentals and knowledge in software development towards skills in model validation, applied statistics, and integrated subject matter expertise. The review is interspersed with code examples to complement the theoretical foundations and illustrate model validation and machine learning explainability for science. The scope of this review includes various shallow machine learning methods, e.g. Decision Trees, Random Forests, Support-Vector Machines, and Gaussian Processes, as well as, deep neural networks, including feed-forward neural networks, convolutional neural networks, recurrent neural networks and generative adversarial networks. Regarding geoscience, the review has a bias towards geophysics but aims to strike a balance with geochemistry, geostatistics, and geology, however excludes remote sensing, as this would exceed the scope. In general, I aim to provide context for the recent enthusiasm surrounding deep learning with respect to research, hardware, and software developments that enable successful application of shallow and deep machine learning in all disciplines of Earth science.
65. Momentum Contrastive Learning for Few-Shot COVID-19 Diagnosis from Chest CT Images [PDF] 返回目录
Xiaocong Chen, Lina Yao, Tao Zhou, Jinming Dong, Yu Zhang
Abstract: The current pandemic, caused by the outbreak of a novel coronavirus (COVID-19) in December 2019, has led to a global emergency that has significantly impacted economies, healthcare systems and personal wellbeing all around the world. Controlling the rapidly evolving disease requires highly sensitive and specific diagnostics. While real-time RT-PCR is the most commonly used, these can take up to 8 hours, and require significant effort from healthcare professionals. As such, there is a critical need for a quick and automatic diagnostic system. Diagnosis from chest CT images is a promising direction. However, current studies are limited by the lack of sufficient training samples, as acquiring annotated CT images is time-consuming. To this end, we propose a new deep learning algorithm for the automated diagnosis of COVID-19, which only requires a few samples for training. Specifically, we use contrastive learning to train an encoder which can capture expressive feature representations on large and publicly available lung datasets and adopt the prototypical network for classification. We validate the efficacy of the proposed model in comparison with other competing methods on two publicly available and annotated COVID-19 CT datasets. Our results demonstrate the superior performance of our model for the accurate diagnosis of COVID-19 based on chest CT images.
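The prototypical-network step takes only a few lines: class prototypes are the mean embeddings of the few labelled examples, and a query is assigned to the nearest prototype. Random tensors stand in below for the contrastively pretrained encoder's outputs.

import torch

support = torch.randn(2, 5, 64)   # 2 classes x 5 shots x 64-d embeddings
query = torch.randn(1, 64)
prototypes = support.mean(dim=1)  # (2, 64) class prototypes
dists = torch.cdist(query, prototypes).squeeze(0)
print("predicted class:", int(dists.argmin()))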
66. Was there COVID-19 back in 2012? Challenge for AI in Diagnosis with Similar Indications [PDF] 返回目录
Imon Banerjee, Priyanshu Sinha, Saptarshi Purkayastha, Nazanin Mashhaditafreshi, Amara Tariq, Jiwoong Jeong, Hari Trivedi, Judy W. Gichoya
Abstract: Purpose: Since the recent COVID-19 outbreak, there has been an avalanche of research papers applying deep learning based image processing to chest radiographs for detection of the disease. We test the performance of the two top models for CXR COVID-19 diagnosis on external datasets to assess model generalizability. Methods: In this paper, we present our argument regarding the efficiency and applicability of existing deep learning models for COVID-19 diagnosis. We provide results from two popular models - COVID-Net and CoroNet - evaluated on three publicly available datasets and an additional institutional dataset collected from EMORY Hospital between January and May 2020, containing patients tested for COVID-19 infection using RT-PCR. Results: There is a large false positive rate (FPR) for COVID-Net on both the ChexPert (55.3%) and MIMIC-CXR (23.4%) datasets. On the EMORY dataset, COVID-Net has 61.4% sensitivity, 0.54 F1-score and 0.49 precision value. The FPR of the CoroNet model is significantly lower across all the datasets as compared to COVID-Net: EMORY (9.1%), ChexPert (1.3%), ChestX-ray14 (0.02%), MIMIC-CXR (0.06%). Conclusion: The models reported good to excellent performance on their internal datasets; however, we observed from our testing that their performance dramatically worsened on external data. This is likely from several causes including overfitting models due to lack of appropriate control patients and ground truth labels. The fourth institutional dataset was labeled using RT-PCR, which could be positive without radiographic findings and vice versa. Therefore, a fusion model of both clinical and radiographic data may have better performance and generalization.
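The external-validation logic behind the title is worth making explicit: on a pre-COVID dataset every image is a true negative, so the false positive rate is simply the fraction of images the model flags as positive. A toy sketch with random stand-in predictions:

import numpy as np

preds = np.random.default_rng(0).random(1000) > 0.5  # stand-in COVID-19 flags
fpr = preds.mean()  # every sample is a true negative, so the flagged fraction is the FPR
print(f"FPR: {fpr:.1%}")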
67. Robot Object Retrieval with Contextual Natural Language Queries [PDF] 返回目录
Thao Nguyen, Nakul Gopalan, Roma Patel, Matt Corsaro, Ellie Pavlick, Stefanie Tellex
Abstract: Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying the desired object's type such as "scissors" and/or visual attributes such as "red," thus limiting the robot to only known object classes. We develop a model to retrieve objects based on descriptions of their usage. The model takes in a language command containing a verb, for example "Hand me something to cut," and RGB images of candidate objects and selects the object that best satisfies the task specified by the verb. Our model directly predicts an object's appearance from the object's use specified by a verb phrase. We do not need to explicitly specify an object's class label. Our approach allows us to predict high level concepts like an object's utility based on the language query. Based on contextual information present in the language commands, our model can generalize to unseen object classes and unknown nouns in the commands. Our model correctly selects objects out of sets of five candidates to fulfill natural language commands, and achieves an average accuracy of 62.3% on a held-out test set of unseen ImageNet object classes and 53.0% on unseen object classes and unknown nouns. Our model also achieves an average accuracy of 54.7% on unseen YCB object classes, which have a different image distribution from ImageNet objects. We demonstrate our model on a KUKA LBR iiwa robot arm, enabling the robot to retrieve objects based on natural language descriptions of their usage. We also present a new dataset of 655 verb-object pairs denoting object usage over 50 verbs and 216 object classes.
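The retrieval rule can be sketched as scoring each candidate by the similarity between a language embedding of the command and an image embedding of the object; random vectors stand in below for the learned encoders.

import numpy as np

rng = np.random.default_rng(0)
command_emb = rng.normal(size=32)       # embedding of "Hand me something to cut"
object_embs = rng.normal(size=(5, 32))  # embeddings of five candidate objects
scores = object_embs @ command_emb / (
    np.linalg.norm(object_embs, axis=1) * np.linalg.norm(command_emb))  # cosine similarity
print("selected object:", int(scores.argmax()))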
68. Automated Detection of COVID-19 from CT Scans Using Convolutional Neural Networks [PDF] 返回目录
Rohit Lokwani, Ashrika Gaikwad, Viraj Kulkarni, Aniruddha Pant, Amit Kharat
Abstract: COVID-19 is an infectious disease that causes respiratory problems similar to those caused by SARS-CoV (2003). Currently, swab samples are being used for its diagnosis. The most common testing method used is the RT-PCR method, which has high specificity but variable sensitivity. AI-based detection has the capability to overcome this drawback. In this paper, we propose a prospective method wherein we use chest CT scans to diagnose the patients for COVID-19 pneumonia. We use a set of open-source images, available as individual CT slices, and full CT scans from a private Indian Hospital to train our model. We build a 2D segmentation model using the U-Net architecture, which gives the output by marking out the region of infection. Our model achieves a sensitivity of 96.428% (95% CI: 88%-100%) and a specificity of 88.39% (95% CI: 82%-94%). Additionally, we derive a logic for converting our slice-level predictions to scan-level, which helps us reduce the false positives.
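One plausible slice-to-scan rule, an invented example rather than the paper's exact logic, is to call a scan positive only when several consecutive slices are positive, which suppresses isolated slice-level false positives:

import numpy as np

slice_preds = np.array([0, 0, 1, 1, 1, 0, 1, 0])  # per-slice infection flags

def longest_run(flags):
    best = run = 0
    for f in flags:
        run = run + 1 if f else 0
        best = max(best, run)
    return best

scan_positive = longest_run(slice_preds) >= 3  # threshold of 3 is illustrative
print("scan-level prediction:", bool(scan_positive))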
Note: the Chinese summaries in this document are machine translations.