Contents
1. TAEN: Temporal Aware Embedding Network for Few-Shot Action Recognition [PDF] Abstract
2. Disaster Feature Classification on Aerial Photography to Explain Typhoon Damaged Region using Grad-CAM [PDF] Abstract
3. Have you forgotten? A method to assess if machine learning models have forgotten data [PDF] Abstract
4. Frequency-Weighted Robust Tensor Principal Component Analysis [PDF] Abstract
5. Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [PDF] Abstract
6. Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision [PDF] Abstract
7. Unsupervised Domain Adaptation through Inter-modal Rotation for RGB-D Object Recognition [PDF] Abstract
8. PAI-GCN: Permutable Anisotropic Graph Convolutional Networks for 3D Shape Representation Learning [PDF] Abstract
9. Towards Generalization of 3D Human Pose Estimation In The Wild [PDF] Abstract
10. Weakly Aligned Joint Cross-Modality Super Resolution [PDF] Abstract
11. TTNet: Real-time temporal and spatial video analysis of table tennis [PDF] Abstract
12. Rice grain disease identification using dual phase convolutional neural network-based system aimed at small dataset [PDF] Abstract
13. TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in Multi-Task Learning [PDF] Abstract
14. Multispectral Video Fusion for Non-contact Monitoring of Respiratory Rate and Apnea [PDF] Abstract
15. Robust Motion Averaging under Maximum Correntropy Criterion [PDF] Abstract
16. Instance Segmentation of Biomedical Images with an Object-aware Embedding Learned with Local Constraints [PDF] Abstract
17. Fast and Robust Registration of Aerial Images and LiDAR data Based on Structrual Features and 3D Phase Correlation [PDF] Abstract
18. AMC-Loss: Angular Margin Contrastive Loss for Improved Explainability in Image Classification [PDF] Abstract
19. Spatio-Temporal Dual Affine Differential Invariant for Skeleton-based Action Recognition [PDF] Abstract
20. A CNN Framework Based on Line Annotations for Detecting Nematodes in Microscopic Images [PDF] Abstract
21. Decoupling Video and Human Motion: Towards Practical Event Detection in Athlete Recordings [PDF] Abstract
29. Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution [PDF] Abstract
32. Intelligent Querying for Target Tracking in Camera Networks using Deep Q-Learning with n-Step Bootstrapping [PDF] Abstract
33. LSQ+: Improving low-bit quantization through learnable offsets and better initialization [PDF] Abstract
38. 4D Spatio-Temporal Deep Learning with 4D fMRI Data for Autism Spectrum Disorder Classification [PDF] Abstract
39. EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks [PDF] Abstract
40. Spatio-spectral deep learning methods for in-vivo hyperspectral laryngeal cancer detection [PDF] Abstract
46. Deep Cerebellar Nuclei Segmentation via Semi-Supervised Deep Context-Aware Learning from 7T Diffusion MRI [PDF] Abstract
47. Alleviating the Incompatibility between Cross Entropy Loss and Episode Training for Few-shot Skin Disease Classification [PDF] Abstract
Abstracts
1. TAEN: Temporal Aware Embedding Network for Few-Shot Action Recognition [PDF] Back to Contents
Rami Ben-Ari, Mor Shpigel, Ophir Azulai, Udi Barzelay, Daniel Rotman
Abstract: Classification of new class entities requires collecting and annotating hundreds or thousands of samples, which is often prohibitively time-consuming and costly. Few-shot learning (FSL) suggests learning to classify new classes using just a few examples. Only a small number of studies address the challenge of using just a few labeled samples to learn a new spatio-temporal pattern such as a video. In this paper, we present a Temporal Aware Embedding Network (TAEN) for few-shot action recognition that learns to represent actions in a metric space as trajectories, conveying both short-term semantics and longer-term connectivity between sub-actions. We demonstrate the effectiveness of TAEN on two few-shot tasks, video classification and temporal action detection. We achieve state-of-the-art results on the Kinetics few-shot benchmark and on the ActivityNet 1.2 few-shot temporal action detection task. Code will be released upon acceptance of the paper.
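The trajectory idea above can be made concrete with a toy distance between two actions, each represented as a sequence of sub-action embeddings. This is an illustrative sketch only; the exact TAEN formulation is not specified in the abstract, and `alpha` is a hypothetical weighting parameter.

```python
import numpy as np

def trajectory_distance(traj_a: np.ndarray, traj_b: np.ndarray, alpha: float = 0.5) -> float:
    """Toy distance between two actions given as T x D trajectories of
    sub-action embeddings. Combines point-wise sub-action similarity
    (short-term semantics) with similarity of transitions between
    consecutive sub-actions (longer-term connectivity). Purely
    illustrative; TAEN's actual metric may differ."""
    point = np.linalg.norm(traj_a - traj_b, axis=1).mean()
    trans = np.linalg.norm(np.diff(traj_a, axis=0) - np.diff(traj_b, axis=0), axis=1).mean()
    return (1 - alpha) * point + alpha * trans
```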
2. Disaster Feature Classification on Aerial Photography to Explain Typhoon Damaged Region using Grad-CAM [PDF] Back to Contents
Yasuno Takato
Abstract: In recent years, typhoon damage has become a social problem owing to climate change. In particular, on 9 September 2019, Typhoon Faxai passed over the southern Chiba prefecture in Japan; the damage included interrupted electricity and water supplies and broken house roofs caused by strong winds with a recorded maximum of 45 meters per second. A large number of trees fell down, and neighboring electric poles fell down at the same time. These disaster features caused the recovery to take eighteen days, longer than in past events. Initial responses are important for faster recovery. As far as possible, an aerial survey for global screening of the devastated region would be required as decision support on where to recover first. This paper proposes a practical method to visualize damaged areas, focusing on typhoon disaster features, using aerial photography. The method classifies eight classes, covering both undamaged land covers and disaster areas, where an aerial photograph is partitioned into 4,096 grid cells (64 by 64), each unit image covering a 48-meter square. Using the target feature class probabilities, we can visualize a disaster feature map scaled over a color range from blue to red or yellow. Furthermore, we can realize disaster feature mapping on each unit grid image by computing the convolutional activation map using Grad-CAM, based on the deep neural network layers used for classification. This paper demonstrates case studies applied to aerial photographs recorded in the southern Chiba prefecture in Japan after the typhoon disaster.
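The grid-wise classification and color mapping described above can be sketched as follows; `classify_unit` is a hypothetical stand-in for the paper's CNN classifier, returning eight class probabilities for one unit image.

```python
import numpy as np

def disaster_feature_map(photo: np.ndarray, classify_unit, target_class: int) -> np.ndarray:
    """Partition an aerial photo (H x W x 3) into a 64 x 64 grid of unit
    images and return the per-cell probability of the target disaster
    class. Visualize with e.g. plt.imshow(prob_map, cmap="jet") to get
    the blue-to-red/yellow color scaling described in the abstract."""
    h, w = photo.shape[0] // 64, photo.shape[1] // 64
    prob_map = np.zeros((64, 64))
    for i in range(64):
        for j in range(64):
            unit = photo[i * h:(i + 1) * h, j * w:(j + 1) * w]
            prob_map[i, j] = classify_unit(unit)[target_class]
    return prob_map
```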
3. Have you forgotten? A method to assess if machine learning models have forgotten data [PDF] Back to Contents
Xiao Liu, Sotirios A Tsaftaris
Abstract: In the era of deep learning, aggregation of data from several sources is considered a common approach to ensuring data diversity. Let us consider a scenario where several providers contribute data to a consortium for the joint development of a classification model (hereafter the target model), but now one of the providers decides to leave. The provider requests that their data (hereafter the query dataset) be removed from the databases, but also that the model `forgets' their data. In this paper, for the first time, we address the challenging question of whether data have been forgotten by a model. We assume knowledge of the query dataset and the distribution of a model's output activations. We establish statistical methods that compare the outputs of the target model with the outputs of models trained with different datasets. We evaluate our approach on several benchmark datasets (MNIST, CIFAR-10 and SVHN) and on a cardiac pathology diagnosis task using data from the Automated Cardiac Diagnosis Challenge (ACDC). We hope to encourage investigations into what information a model retains and to inspire extensions in more complex settings.
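The abstract does not name the specific statistics used; as one plausible instantiation of comparing output-activation distributions, a two-sample Kolmogorov-Smirnov test could be applied:

```python
import numpy as np
from scipy.stats import ks_2samp

def forgotten_score(target_acts: np.ndarray, reference_acts: np.ndarray) -> float:
    """Compare output-activation distributions of the target model and a
    reference model trained without the query dataset, both evaluated on
    the query data. A small KS statistic means the target's outputs are
    indistinguishable from a model that never saw the data. The KS test
    is an illustrative choice, not necessarily the paper's statistic."""
    stat, _ = ks_2samp(target_acts.ravel(), reference_acts.ravel())
    return stat
```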
4. Frequency-Weighted Robust Tensor Principal Component Analysis [PDF] Back to Contents
Shenghan Wang, Yipeng Liu, Lanlan Feng, Ce Zhu
Abstract: Robust tensor principal component analysis (RTPCA) can separate the low-rank component and sparse component from multidimensional data, and has been used successfully in several image applications. Its performance varies with different kinds of tensor decompositions, and the tensor singular value decomposition (t-SVD) is a popular choice. The standard t-SVD applies the discrete Fourier transform to exploit the residual in the 3rd mode of the decomposition. When minimizing the tensor nuclear norm related to t-SVD, all the frontal slices in the frequency domain are optimized equally. In this paper, we incorporate frequency component analysis into t-SVD to enhance RTPCA performance. Specifically, different frequency bands are unequally weighted with respect to their corresponding physical meanings, yielding a frequency-weighted tensor nuclear norm. Accordingly, we rigorously deduce the frequency-weighted tensor singular value thresholding operator and apply it to the low-rank approximation subproblem in RTPCA. The resulting frequency-weighted RTPCA can be solved by the alternating direction method of multipliers, and this is the first time that frequency analysis has been incorporated into tensor principal component analysis. Numerical experiments on synthetic 3D data, color image denoising and background modeling verify that the proposed method outperforms state-of-the-art algorithms in both accuracy and computational complexity.
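The frequency-weighted thresholding operator can be sketched in NumPy as per-slice singular value shrinkage in the Fourier domain along the 3rd mode; `tau` and the per-frequency `weights` are assumed inputs, and the paper's exact weighting scheme (derived from the physical meaning of each band) is not reproduced here.

```python
import numpy as np

def freq_weighted_tsvt(X: np.ndarray, tau: float, weights: np.ndarray) -> np.ndarray:
    """Frequency-weighted tensor singular value thresholding (sketch).
    X is an n1 x n2 x n3 real tensor; weights[k] scales the shrinkage
    applied to the k-th frontal slice in the Fourier domain."""
    Xf = np.fft.fft(X, axis=2)                   # transform along the 3rd mode
    for k in range(X.shape[2]):
        U, s, Vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        s = np.maximum(s - tau * weights[k], 0.0)  # weighted soft-threshold
        Xf[:, :, k] = (U * s) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))      # real part, since X is real
```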
5. Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [PDF] Back to Contents
Shurun Wang, Shiqi Wang, Wenhan Yang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao
Abstract: Compactly representing visual information plays a fundamental role in optimizing the ultimate utility of myriad visual-data-centered applications. While numerous approaches have been proposed to efficiently compress texture and visual features serving human visual perception and machine intelligence respectively, much less work has been dedicated to studying the interactions between them. Here we investigate the integration of feature and texture compression, and show that a universal and collaborative visual information representation can be achieved in a hierarchical way. In particular, we study feature and texture compression in a scalable coding framework, where the base layer serves as the deep learning feature and the enhancement layer targets perfect reconstruction of the texture. Based on the strong generative capability of deep neural networks, the gap between the base feature layer and the enhancement layer is further filled with a feature-level texture reconstruction, aiming to further construct the texture representation from the feature. As such, the residuals between the original and reconstructed texture can be further conveyed in the enhancement layer. To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner such that the learned features enjoy both high-quality reconstruction and high-accuracy analysis. We further demonstrate the framework and optimization strategies in face image compression, and promising coding performance is achieved in terms of both rate-fidelity and rate-accuracy.
6. Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision [PDF] Back to Contents
Haitian Zheng, Haofu Liao, Lele Chen, Wei Xiong, Tianlang Chen, Jiebo Luo
Abstract: Example-guided image synthesis has recently been attempted, aiming to synthesize an image from a semantic label map and an exemplary image. In this task, the additional exemplar image provides the style guidance that controls the appearance of the synthesized output. Despite the controllability advantage, the existing models are designed on datasets with specific and roughly aligned objects. In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map. To this end, we first propose a Masked Spatial-Channel Attention (MSCA) module which models the correspondence between two arbitrary scenes via efficient decoupled attention. Next, we propose an end-to-end network for joint global and local feature alignment and synthesis. Finally, we propose a novel self-supervision task to enable training. Experiments on the large-scale and more diverse COCO-stuff dataset show significant improvements over the existing methods. Moreover, our approach provides interpretability and can be readily extended to other content manipulation tasks including style and spatial interpolation or extrapolation.
7. Unsupervised Domain Adaptation through Inter-modal Rotation for RGB-D Object Recognition [PDF] Back to Contents
Mohammad Reza Loghmani, Luca Robbiano, Mirco Planamente, Kiru Park, Barbara Caputo, Markus Vincze
Abstract: Unsupervised Domain Adaptation (DA) exploits the supervision of a label-rich source dataset to make predictions on an unlabeled target dataset by aligning the two data distributions. In robotics, DA is used to take advantage of automatically generated synthetic data, that come with "free" annotation, to make effective predictions on real data. However, existing DA methods are not designed to cope with the multi-modal nature of RGB-D data, which are widely used in robotic vision. We propose a novel RGB-D DA method that reduces the synthetic-to-real domain shift by exploiting the inter-modal relation between the RGB and depth image. Our method consists of training a convolutional neural network to solve, in addition to the main recognition task, the pretext task of predicting the relative rotation between the RGB and depth image. To evaluate our method and encourage further research in this area, we define two benchmark datasets for object categorization and instance recognition. With extensive experiments, we show the benefits of leveraging the inter-modal relations for RGB-D DA.
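A minimal sketch of generating one pretext-task sample: each modality is rotated independently by a multiple of 90 degrees and the network must predict the relative rotation. The 4-class label convention is an assumption for illustration.

```python
import random
import numpy as np

def make_rotation_pretext_sample(rgb: np.ndarray, depth: np.ndarray):
    """Build one inter-modal rotation sample. rgb and depth are H x W x C
    arrays from the same scene; the label is the relative rotation
    between the two rotated modalities (4 classes)."""
    k_rgb, k_depth = random.randrange(4), random.randrange(4)
    rgb_rot = np.rot90(rgb, k_rgb, axes=(0, 1)).copy()
    depth_rot = np.rot90(depth, k_depth, axes=(0, 1)).copy()
    label = (k_depth - k_rgb) % 4  # relative rotation class (assumed convention)
    return rgb_rot, depth_rot, label
```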
8. PAI-GCN: Permutable Anisotropic Graph Convolutional Networks for 3D Shape Representation Learning [PDF] Back to Contents
Zhongpai Gao, Guangtao Zhai, Juyong Zhang, Yiyan Yang, Xiaokang Yang
Abstract: Demand for efficient 3D shape representation learning is increasing in many 3D computer vision applications. The recent success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insight from CNNs to 3D shapes. However, unlike images, which are Euclidean structured, 3D shape data are irregular since each node's neighbors are inconsistent. Various convolutional graph neural networks for 3D shapes have been developed using isotropic filters or using anisotropic filters with predefined local coordinate systems to overcome the node inconsistency on graphs. However, isotropic filters or predefined local coordinate systems limit the representation power. In this paper, we propose a permutable anisotropic convolutional operation (PAI-Conv) that learns adaptive soft-permutation matrices for each node according to the geometric shape of its neighbors and applies shared anisotropic filters as a CNN does. Comprehensive experiments demonstrate that our model produces significant improvement in 3D shape reconstruction compared to state-of-the-art methods.
9. Towards Generalization of 3D Human Pose Estimation In The Wild [PDF] Back to Contents
Renato Baptista, Alexandre Saint, Kassem Al Ismaeil, Djamila Aouada
Abstract: In this paper, we propose 3DBodyTex.Pose, a dataset that addresses the task of 3D human pose estimation in-the-wild. Generalization to in-the-wild images remains limited due to the lack of adequate datasets. Existing ones are usually collected in indoor controlled environments where motion capture systems are used to obtain the 3D ground-truth annotations of humans. 3DBodyTex.Pose offers high quality and rich data containing 405 different real subjects in various clothing and poses, and 81k image samples with ground-truth 2D and 3D pose annotations. These images are generated from 200 viewpoints, among which 70 are challenging extreme viewpoints. This data was created starting from high-resolution textured 3D body scans and by incorporating various realistic backgrounds. Retraining a state-of-the-art 3D pose estimation approach using data augmented with 3DBodyTex.Pose showed promising improvement in the overall performance, and a sensible decrease in the per-joint position error when testing on challenging viewpoints. 3DBodyTex.Pose is expected to offer the research community new possibilities for generalizing 3D pose estimation from monocular in-the-wild images.
10. Weakly Aligned Joint Cross-Modality Super Resolution [PDF] Back to Contents
Guy Shacht, Sharon Fogel, Dov Danon, Daniel Cohen-Or
Abstract: Non-visual imaging sensors are widely used in the industry for different purposes. Those sensors are more expensive than visual (RGB) sensors, and usually produce images with lower resolution. To this end, Cross-Modality Super-Resolution methods were introduced, where a high-resolution RGB image assists in increasing the resolution of the low-resolution modality. However, fusing images from different modalities is not a trivial task; the output must be artifact-free and remain faithful to the characteristics of the target modality. Moreover, the input images are never perfectly aligned, which results in further artifacts during the fusion process. We present CMSR, a deep network for Cross-Modality Super-Resolution, which, unlike previous methods, is designed to deal with weakly aligned images. The network is trained on the two input images only, learns their internal statistics and correlations, and applies them to up-sample the target modality. CMSR contains an internal transformer that is trained on-the-fly together with the up-sampling process itself, without explicit supervision. We show that CMSR succeeds in increasing the resolution of the input image, gaining valuable information from its RGB counterpart, yet in a conservative way, without introducing artifacts or irrelevant details.
11. TTNet: Real-time temporal and spatial video analysis of table tennis [PDF] Back to Contents
Roman Voeikov, Nikolay Falaleev, Ruslan Baikulov
Abstract: We present a neural network, TTNet, aimed at real-time processing of high-resolution table tennis videos, providing both temporal (event spotting) and spatial (ball detection and semantic segmentation) data. This approach gives core information for reasoning about score updates by an auto-referee system. We also publish a multi-task dataset, OpenTTGames, with videos of table tennis games at 120 fps labeled with events, semantic segmentation masks, and ball coordinates for evaluation of multi-task approaches, primarily oriented towards spotting of quick events and small object tracking. TTNet demonstrated 97.0% accuracy in game event spotting, along with a 2-pixel RMSE in ball detection with 97.5% accuracy, on the test part of the presented dataset. The proposed network allows the processing of downscaled full HD videos with inference time below 6 ms per input tensor on a machine with a single consumer-grade GPU. Thus, we are contributing to the development of real-time multi-task deep learning applications and presenting an approach that is potentially capable of substituting for manual data collection by sports scouts, providing support for referees' decision-making, and gathering extra information about the game process.
12. Rice grain disease identification using dual phase convolutional neural network-based system aimed at small dataset [PDF] Back to Contents
Tashin Ahmed, Chowdhury Rafeed Rahman, Md. Faysal Mahmud Abid
Abstract: Although convolutional neural networks (CNNs) are widely used for plant disease detection, they require a large number of training samples when dealing with a wide variety of heterogeneous backgrounds. In this work, a CNN-based dual-phase method is proposed which can work effectively on a small rice grain disease dataset with heterogeneity. In the first phase, the Faster R-CNN method is applied to crop out the significant portion (the rice grain) from the image. This initial phase results in a secondary dataset of rice grains devoid of heterogeneous backgrounds. Disease classification is then performed on these derived and simplified samples using a CNN architecture. Comparison of the dual-phase approach with a straightforward application of a CNN to the small grain dataset shows the effectiveness of the proposed method, which provides a 5-fold cross-validation accuracy of 88.07%.
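The two-phase inference can be summarized in a few lines; `detector` and `classifier` are hypothetical callables standing in for the Faster R-CNN localizer and the disease CNN, not the authors' released code.

```python
def dual_phase_predict(image, detector, classifier):
    """Dual-phase inference sketch: phase 1 localizes the rice grain,
    phase 2 classifies the disease on the background-free crop.
    'detector' returns a bounding box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = detector(image)     # phase 1: locate the grain
    grain = image[y1:y2, x1:x2]          # discard heterogeneous background
    return classifier(grain)             # phase 2: disease classification
```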
13. TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in Multi-Task Learning [PDF] Back to Contents
Pengcheng Wang, Zihao Wang, Zhilong Ji, Xiao Liu, Songfan Yang, Zhongqin Wu
Abstract: This paper introduces our approach to the EmotioNet Challenge 2020. We pose the AU recognition problem as a multi-task learning problem, where the non-rigid facial muscle motion (mainly the first 17 AUs) and the rigid head motion (the last 6 AUs) are modeled separately. The co-occurrence of the expression features and the head pose features is explored. We observe that different AUs converge at various speeds. By choosing the optimal checkpoint for each AU, the recognition results are improved. We obtain a final score of 0.746 on the validation set and 0.7306 on the test set of the challenge.
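The per-AU checkpoint selection amounts to a simple validation sweep; the dictionary layout below is an assumed interface, not the authors' code.

```python
def select_per_task_checkpoints(checkpoints, val_scores):
    """Pick, for each AU (task), the checkpoint with the best validation
    score -- one reading of the 'model chosen' strategy described above.
    val_scores[ckpt][au] -> validation score (assumed layout)."""
    aus = next(iter(val_scores.values())).keys()
    return {au: max(checkpoints, key=lambda c: val_scores[c][au]) for au in aus}
```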
14. Multispectral Video Fusion for Non-contact Monitoring of Respiratory Rate and Apnea [PDF] Back to Contents
Gaetano Scebba, Giulia Da Poian, Walter Karlen
Abstract: Continuous monitoring of respiratory activity is desirable in many clinical applications to detect respiratory events. Non-contact monitoring of respiration can be achieved with near- and far-infrared spectrum cameras. However, current technologies are not sufficiently robust to be used in clinical applications. For example, they fail to estimate an accurate respiratory rate (RR) during apnea. We present a novel algorithm based on multispectral data fusion that aims at estimating RR also during apnea. The algorithm independently addresses the RR estimation and apnea detection tasks. Respiratory information is extracted from multiple sources and fed into an RR estimator and an apnea detector whose results are fused into a final respiratory activity estimation. We evaluated the system retrospectively using data from 30 healthy adults who performed diverse controlled breathing tasks while lying supine in a dark room and reproduced central and obstructive apneic events. Combining multiple respiratory information from multispectral cameras improved the root mean square error (RMSE) accuracy of the RR estimation from up to 4.64 monospectral data down to 1.60 breaths/min. The median F1 scores for classifying obstructive (0.75 to 0.86) and central apnea (0.75 to 0.93) also improved. Furthermore, the independent consideration of apnea detection led to a more robust system (RMSE of 4.44 vs. 7.96 breaths/min). Our findings may represent a step towards the use of cameras for vital sign monitoring in medical applications.
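One way to read the described fusion of the RR estimator and the apnea detector is a gating rule like the sketch below; the median combination and the RR = 0 convention during apnea are illustrative assumptions, not the paper's exact fusion.

```python
import numpy as np

def fuse_respiratory_estimate(rr_estimates, apnea_detected: bool) -> float:
    """Fusion sketch for the two-branch design: an apnea detector gates
    the output of the multi-source RR estimator. rr_estimates is a
    sequence of RR values (breaths/min) from the different sources."""
    if apnea_detected:
        return 0.0                          # no breathing activity during apnea
    return float(np.median(rr_estimates))   # robust combination of estimates
```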
15. Robust Motion Averaging under Maximum Correntropy Criterion [PDF] Back to Contents
Jihua Zhu, Jie Hu, Zhongyu Li, Badong Chen
Abstract: Recently, the motion averaging method has been introduced as an effective means to solve the multi-view registration problem. This method aims to recover global motions from a set of relative motions, where the original method is sensitive to outliers due to its use of the Frobenius norm error in the optimization. Accordingly, this paper proposes a novel robust motion averaging method based on the maximum correntropy criterion (MCC). Specifically, the correntropy measure is used instead of the Frobenius norm error to improve the robustness of motion averaging against outliers. According to the half-quadratic technique, the correntropy-based optimization problem can be solved by an alternating minimization procedure, which includes operations of weight assignment and weighted motion averaging. Further, we design a selection strategy for the adaptive kernel width to take full advantage of correntropy. Experimental results on benchmark data sets illustrate that the new method has superior performance in accuracy and robustness for multi-view registration.
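Under the half-quadratic technique, the correntropy objective reduces to iteratively reweighted motion averaging with Gaussian-kernel weights. A generic sketch, with `residual_fn` and `solve_weighted` as hypothetical stand-ins for the relative-motion residuals and the weighted averaging step:

```python
import numpy as np

def correntropy_irls(residual_fn, solve_weighted, motions0, sigma: float, iters: int = 20):
    """Half-quadratic minimization sketch for correntropy-based motion
    averaging: alternate between (1) Gaussian-kernel weight assignment
    from the current residual norms and (2) weighted least-squares
    motion averaging. sigma is the correntropy kernel width."""
    motions = motions0
    for _ in range(iters):
        r = residual_fn(motions)                       # residual norm per relative motion
        w = np.exp(-np.square(r) / (2.0 * sigma ** 2))  # correntropy-induced weights
        motions = solve_weighted(w)                     # weighted motion averaging step
    return motions
```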
16. Instance Segmentation of Biomedical Images with an Object-aware Embedding Learned with Local Constraints [PDF] Back to Contents
Long Chen, Martin Strauch, Dorit Merhof
Abstract: Automatic instance segmentation is a problem that occurs in many biomedical applications. State-of-the-art approaches either perform semantic segmentation or refine object bounding boxes obtained from detection methods. Both suffer from crowded objects to varying degrees, merging adjacent objects or suppressing a valid object. In this work, we assign an embedding vector to each pixel through a deep neural network. The network is trained to output embedding vectors of similar directions for pixels from the same object, while adjacent objects are orthogonal in the embedding space, which effectively avoids the fusion of objects in a crowd. Our method yields state-of-the-art results even with a lightweight backbone network on a cell segmentation dataset (BBBC006 + DSB2018) and a leaf segmentation dataset (CVPPP2017). The code and model weights are publicly available.
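The pairwise constraint on pixel embeddings (parallel within an object, orthogonal between adjacent objects) can be expressed as a small cosine-similarity loss; this pairwise form is an illustrative simplification of the paper's local constraints.

```python
import torch
import torch.nn.functional as F

def local_embedding_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, same_object: bool) -> torch.Tensor:
    """Cosine-similarity loss sketch for object-aware embeddings.
    emb_a, emb_b: N x D batches of pixel embeddings. Embeddings from the
    same object are pushed towards cos ~ 1; embeddings from adjacent
    objects towards cos ~ 0 (orthogonality)."""
    cos = F.cosine_similarity(emb_a, emb_b, dim=-1)
    return (1.0 - cos).mean() if same_object else cos.abs().mean()
```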
17. Fast and Robust Registration of Aerial Images and LiDAR data Based on Structrual Features and 3D Phase Correlation [PDF] Back to Contents
Bai Zhu, Yuanxin Ye, Chao Yang, Liang Zhou, Huiyu Liu, Yungang Cao
Abstract: Co-registration of aerial imagery and Light Detection and Ranging (LiDAR) data is quite challenging because the different imaging mechanisms cause significant geometric and radiometric distortions between such data. To tackle the problem, this paper proposes an automatic registration method based on structural features and three-dimensional (3D) phase correlation. In the proposed method, the LiDAR point cloud data is first transformed into an intensity map, which is used as the reference image. Then, we employ the FAST operator to extract uniformly distributed interest points in the aerial image by a partition strategy and perform a local geometric correction using the collinearity equation to eliminate scale and rotation differences between the images. Subsequently, a robust structural feature descriptor is built based on dense gradient features, and 3D phase correlation is used to detect control points (CPs) between the aerial images and the LiDAR data in the frequency domain, where the image matching is accelerated by the 3D Fast Fourier Transform (FFT). Finally, the obtained CPs are employed to correct the exterior orientation elements, which is used to achieve co-registration of the aerial images and the LiDAR data. Experiments with two datasets of aerial images and LiDAR data show that the proposed method is much faster and more robust than state-of-the-art methods.
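Phase correlation itself is standard; a generic 3D version using the FFT, as referenced above, looks like the sketch below (the structural-feature descriptor construction is not shown).

```python
import numpy as np

def phase_correlation_3d(a: np.ndarray, b: np.ndarray):
    """Phase correlation between two same-shaped 3D arrays via the 3D
    FFT. Returns the integer shift at which the normalized cross-power
    spectrum peaks, and the peak value."""
    Fa, Fb = np.fft.fftn(a), np.fft.fftn(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.real(np.fft.ifftn(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return peak, corr[peak]
```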
18. AMC-Loss: Angular Margin Contrastive Loss for Improved Explainability in Image Classification [PDF] 返回目录
Hongjun Choi, Anirudh Som, Pavan Turaga
Abstract: Deep-learning architectures for classification problems involve the cross-entropy loss, sometimes assisted with auxiliary loss functions like center loss, contrastive loss and triplet loss. These auxiliary loss functions facilitate better discrimination between the different classes of interest. However, recent studies hint at the fact that these loss functions do not take into account the intrinsic angular distribution exhibited by the low-level and high-level feature representations. This results in less compactness between samples from the same class and unclear boundary separations between data clusters of different classes. In this paper, we address this issue by proposing the use of geometric constraints rooted in Riemannian geometry. Specifically, we propose the Angular Margin Contrastive Loss (AMC-Loss), a new loss function to be used along with the traditional cross-entropy loss. The AMC-Loss employs a discriminative angular distance metric that is equivalent to geodesic distance on a hypersphere manifold, so that it admits a clear geometric interpretation. We demonstrate the effectiveness of AMC-Loss by providing quantitative and qualitative results. We find that although the proposed geometrically constrained loss function improves quantitative results only modestly, it has a surprisingly beneficial qualitative effect on the interpretability of deep-net decisions, as seen in the visual explanations generated by techniques such as Grad-CAM. Our code is available at this https URL.
摘要:用于分类问题的深度学习架构通常以交叉熵损失为主,有时辅以中心损失、对比损失和三元组损失等辅助损失函数。这些辅助损失有助于更好地区分不同的目标类别。然而,近期研究表明,这些损失函数没有考虑低层与高层特征表示所固有的角度分布,导致同类样本不够紧凑、不同类别的数据簇之间边界模糊。本文提出利用植根于黎曼几何的几何约束来解决这一问题。具体而言,我们提出角度间隔对比损失(AMC-Loss),一种与传统交叉熵损失配合使用的新损失函数。AMC-Loss采用的判别性角度距离度量等价于超球面流形上的测地距离,因而具有清晰的几何解释。我们通过定量和定性结果验证了AMC-Loss的有效性,并发现所提出的几何约束损失虽然对定量结果的提升较为有限,但从Grad-CAM等技术生成的可视化解释来看,它在提高深度网络决策可解释性方面具有出乎意料的显著效果。代码见文中给出的https链接。
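For intuition, here is a minimal sketch of an angular-margin contrastive term of the kind described, using the geodesic (arc) distance between L2-normalized features on the unit hypersphere; the margin value, the weighting in the combined objective, and the function names are our assumptions:

```python
import torch
import torch.nn.functional as F

def amc_loss(z1, z2, same_class, margin=0.5):
    """Angular contrastive term on the unit hypersphere.

    z1, z2:     (B, d) feature pairs
    same_class: (B,) float tensor, 1 if the pair shares a label else 0
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    cos = (z1 * z2).sum(dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.acos(cos)                                  # geodesic (angular) distance
    pos = same_class * theta.pow(2)                          # pull same-class pairs together
    neg = (1 - same_class) * F.relu(margin - theta).pow(2)   # push others past the margin
    return (pos + neg).mean()

# combined objective, e.g.:
# loss = F.cross_entropy(logits, y) + 0.1 * amc_loss(z1, z2, same_class)
```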
19. Spatio-Temporal Dual Affine Differential Invariant for Skeleton-based Action Recognition [PDF] 返回目录
Qi Li, Hanlin Mo, Jinghan Zhao, Hongxiang Hao, Hua Li
Abstract: The dynamics of human skeletons carry significant information for the task of action recognition. The similarity between trajectories of corresponding joints is an indicative feature of the same action, while this similarity may be subject to distortions that can be modeled as combinations of spatial and temporal affine transformations. In this work, we propose a novel feature called the spatio-temporal dual affine differential invariant (STDADI). Furthermore, in order to improve the generalization ability of neural networks, a channel augmentation method is proposed. On the large-scale action recognition dataset NTU-RGB+D, and its extended version NTU-RGB+D 120, it achieves remarkable improvements over previous state-of-the-art methods.
摘要:人体骨架的动态变化蕴含着对动作识别任务十分重要的信息。对应关节轨迹之间的相似性是同一动作的指示性特征,而这种相似性可能受到若干畸变的影响,这些畸变可以建模为空间与时间仿射变换的组合。在这项工作中,我们提出一种称为时空对偶仿射微分不变量(STDADI)的新特征;此外,为提高神经网络的泛化能力,还提出一种通道增广方法。在大规模动作识别数据集NTU-RGB+D及其扩展版本NTU-RGB+D 120上,该方法相比此前最先进的方法取得了显著提升。
20. A CNN Framenwork Based on Line Annotations for Detecting Nematodes in Microscopic Images [PDF] 返回目录
Long Chen, Martin Strauch, Matthias Daub, Xiaochen Jiang, Marcus Jansen, Hans-Georg Luigs, Susanne Schultz-Kuhlmann, Stefan Krüssel, Dorit Merhof
Abstract: Plant parasitic nematodes cause damage to crop plants on a global scale. Robust detection on image data is a prerequisite for monitoring such nematodes, as well as for many biological studies involving the nematode C. elegans, a common model organism. Here, we propose a framework for detecting worm-shaped objects in microscopic images that is based on convolutional neural networks (CNNs). We annotate nematodes with curved lines along the body, which is more suitable for worm-shaped objects than bounding boxes. The trained model predicts worm skeletons and body endpoints. The endpoints serve to untangle the skeletons from which segmentation masks are reconstructed by estimating the body width at each location along the skeleton. With light-weight backbone networks, we achieve 75.85 % precision, 73.02 % recall on a potato cyst nematode data set and 84.20 % precision, 85.63 % recall on a public C. elegans data set.
摘要:植物寄生线虫在全球范围内对农作物造成危害。对图像数据的鲁棒检测是监测此类线虫的前提,也是许多以常用模式生物秀丽隐杆线虫(C. elegans)为对象的生物学研究的前提。本文提出一种基于卷积神经网络(CNN)的显微图像蠕虫状目标检测框架。我们用沿虫体的曲线来标注线虫,这种标注方式比包围框更适合蠕虫状目标。训练得到的模型预测虫体骨架和身体端点,端点用于解开相互缠绕的骨架,再通过估计骨架上每个位置的身体宽度重建分割掩码。在轻量级骨干网络下,我们在马铃薯胞囊线虫数据集上取得75.85%的精确率和73.02%的召回率,在公开的秀丽隐杆线虫数据集上取得84.20%的精确率和85.63%的召回率。
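The mask-reconstruction step described above (applying the estimated body width at each position along the skeleton) can be pictured as stamping filled disks along the predicted skeleton; the function name and the disk-stamping strategy below are our simplification:

```python
import numpy as np
import cv2

def mask_from_skeleton(skeleton_xy, widths, image_shape):
    """Rebuild a worm mask from predicted skeleton points and per-point
    body widths by stamping filled disks along the skeleton."""
    mask = np.zeros(image_shape, dtype=np.uint8)
    for (x, y), w in zip(skeleton_xy, widths):
        radius = max(int(round(w / 2.0)), 1)
        cv2.circle(mask, (int(round(x)), int(round(y))), radius, 255, thickness=-1)
    return mask
```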
21. Decoupling Video and Human Motion: Towards Practical Event Detection in Athlete Recordings [PDF] 返回目录
Moritz Einfalt, Rainer Lienhart
Abstract: In this paper we address the problem of motion event detection in athlete recordings from individual sports. In contrast to recent end-to-end approaches, we propose to use 2D human pose sequences as an intermediate representation that decouples human motion from the raw video information. Combined with domain-adapted athlete tracking, we describe two approaches to event detection on pose sequences and evaluate them in complementary domains: swimming and athletics. For swimming, we show how robust decision rules on pose statistics can detect different motion events during swim starts, with an F1 score of over 91% despite limited data. For athletics, we use a convolutional sequence model to infer stride-related events in long and triple jump recordings, leading to highly accurate detections with a 96% F1 score at only +/- 5ms temporal deviation. Our approach is not limited to these domains and shows the flexibility of pose-based motion event detection.
摘要:本文研究个人项目运动员录像中的运动事件检测问题。与近期的端到端方法不同,我们提出以二维人体姿态序列作为中间表示,将人体运动与原始视频信息解耦。结合领域自适应的运动员跟踪,我们描述了两种基于姿态序列的事件检测方法,并在游泳和田径这两个互补的领域中对其进行评估。对于游泳,我们展示了基于姿态统计量的鲁棒决策规则如何检测出发过程中的不同运动事件,在数据有限的情况下F1分数仍超过91%。对于田径,我们使用卷积序列模型推断跳远和三级跳远录像中与步幅相关的事件,检测高度准确,在仅±5毫秒的时间偏差下F1分数达到96%。我们的方法并不局限于这些领域,展示了基于姿态的运动事件检测的灵活性。
22. Fine-Grained Expression Manipulation via Structured Latent Space [PDF] 返回目录
Junshu Tang, Zhiwen Shao, Lizhuang Ma
Abstract: Fine-grained facial expression manipulation is a challenging problem, as fine-grained expression details are difficult to be captured. Most existing expression manipulation methods resort to discrete expression labels, which mainly edit global expressions and ignore the manipulation of fine details. To tackle this limitation, we propose an end-to-end expression-guided generative adversarial network (EGGAN), which utilizes structured latent codes and continuous expression labels as input to generate images with expected expressions. Specifically, we adopt an adversarial autoencoder to map a source image into a structured latent space. Then, given the source latent code and the target expression label, we employ a conditional GAN to generate a new image with the target expression. Moreover, we introduce a perceptual loss and a multi-scale structural similarity loss to preserve identity and global shape during generation. Extensive experiments show that our method can manipulate fine-grained expressions, and generate continuous intermediate expressions between source and target expressions.
摘要:细粒度面部表情操纵是一个具有挑战性的问题,因为细粒度的表情细节难以捕捉。现有的表情操纵方法大多依赖离散的表情标签,主要编辑全局表情而忽略了细节的操纵。为突破这一限制,我们提出一种端到端的表情引导生成对抗网络(EGGAN),以结构化隐编码和连续表情标签作为输入,生成具有期望表情的图像。具体而言,我们采用对抗自编码器将源图像映射到结构化隐空间;然后,在给定源隐编码和目标表情标签的条件下,利用条件GAN生成具有目标表情的新图像。此外,我们引入感知损失和多尺度结构相似性损失,以在生成过程中保持身份和全局形状。大量实验表明,我们的方法能够操纵细粒度表情,并在源表情和目标表情之间生成连续的中间表情。
23. Take a NAP: Non-Autoregressive Prediction for Pedestrian Trajectories [PDF] 返回目录
Hao Xue, Du Q. Huynh, Mark Reynolds
Abstract: Pedestrian trajectory prediction is a challenging task as there are three properties of human movement behaviors which need to be addressed, namely, the social influence from other pedestrians, the scene constraints, and the multimodal (multiroute) nature of predictions. Although existing methods have explored these key properties, the prediction process of these methods is autoregressive. This means they can only predict future locations sequentially. In this paper, we present NAP, a non-autoregressive method for trajectory prediction. Our method comprises specifically designed feature encoders and a latent variable generator to handle the three properties above. It also has a time-agnostic context generator and a time-specific context generator for non-autoregressive prediction. Through extensive experiments that compare NAP against several recent methods, we show that NAP has state-of-the-art trajectory prediction performance.
摘要:行人轨迹预测是一项具有挑战性的任务,因为需要处理人类运动行为的三个特性:来自其他行人的社会影响、场景约束,以及预测的多模态(多路径)特性。尽管现有方法已经探索了这些关键特性,但其预测过程是自回归的,这意味着它们只能顺序地预测未来位置。本文提出NAP,一种非自回归的轨迹预测方法。我们的方法包含专门设计的特征编码器和一个隐变量生成器来处理上述三个特性,还包含与时间无关的上下文生成器和特定时间步的上下文生成器以实现非自回归预测。通过与多种近期方法对比的大量实验,我们表明NAP具有最先进的轨迹预测性能。
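The key contrast with autoregressive decoders is that all future positions are emitted in one forward pass rather than step by step. A minimal sketch of such a decoder, with hypothetical dimensions and without the paper's latent-variable and context generators:

```python
import torch
import torch.nn as nn

class NonAutoregressiveDecoder(nn.Module):
    """Predict all future (x, y) offsets in a single forward pass,
    instead of feeding each predicted step back in as input."""
    def __init__(self, ctx_dim=128, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.mlp = nn.Sequential(
            nn.Linear(ctx_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),
        )

    def forward(self, ctx):                    # ctx: (B, ctx_dim) encoded context
        out = self.mlp(ctx)                    # (B, horizon * 2)
        return out.view(-1, self.horizon, 2)   # all future steps at once
```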
24. The 1st Agriculture-Vision Challenge: Methods and Results [PDF] 返回目录
Mang Tik Chiu, Xingqian Xu, Kai Wang, Jennifer Hobbs, Naira Hovakimyan, Thomas S. Huang, Honghui Shi, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Ivan Dozier, Wyatt Dozier, Karen Ghandilyan, David Wilson, Hyunseong Park, Junhee Kim, Sungho Kim, Qinghui Liu, Michael C. Kampffmeyer, Robert Jenssen, Arnt B. Salberg, Alexandre Barbosa, Rodrigo Trevisan, Bingchen Zhao, Shaozuo Yu, Siwei Yang, Yin Wang, Hao Sheng, Xiao Chen, Jingyi Su, Ram Rajagopal, Andrew Ng, Van Thong Huynh, Soo-Hyung Kim, In-Seop Na, Ujjwal Baid, Shubham Innani, Prasad Dutande, Bhakti Baheti, Jianyu Tang
Abstract: The first Agriculture-Vision Challenge aims to encourage research in developing novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset. Around 57 participating teams from various countries compete to achieve state-of-the-art in aerial agriculture semantic segmentation. The Agriculture-Vision Challenge Dataset was employed, which comprises 21,061 aerial and multi-spectral farmland images. This paper provides a summary of notable methods and results in the challenge. Our submission server and leaderboard will remain open for researchers who are interested in this challenge dataset and task; the link can be found here.
摘要:首届Agriculture-Vision挑战赛旨在鼓励研究者开发新颖有效的算法,从航空影像中进行农业模式识别,特别是针对与挑战赛数据集相关的语义分割任务。来自多个国家的约57支参赛队伍同台竞技,力争在航空农业语义分割上达到最先进水平。挑战赛采用Agriculture-Vision数据集,包含21,061幅航空多光谱农田影像。本文总结了挑战赛中值得关注的方法和结果。我们的提交服务器和排行榜将继续向对该数据集和任务感兴趣的研究者开放,链接见文中。
25. MiniSeg: An Extremely Minimum Network for Efficient COVID-19 Segmentation [PDF] 返回目录
Yu Qiu, Yun Liu, Jing Xu
Abstract: The rapid spread of the new pandemic, coronavirus disease 2019 (COVID-19), has seriously threatened global health. The gold standard for COVID-19 diagnosis is the tried-and-true polymerase chain reaction (PCR), but PCR is a laborious, time-consuming and complicated manual process that is in short supply. Deep learning based computer-aided screening, e.g., infection segmentation, is thus viewed as an alternative due to its great successes in medical imaging. However, the publicly available COVID-19 training data are limited, which would easily cause overfitting of traditional deep learning methods that are usually data-hungry with millions of parameters. On the other hand, fast training/testing and low computational cost are also important for quick deployment and development of computer-aided COVID-19 screening systems, but traditional deep learning methods, especially for image segmentation, are usually computationally intensive. To address the above problems, we propose MiniSeg, a lightweight deep learning model for efficient COVID-19 segmentation. Compared with traditional segmentation methods, MiniSeg has several significant strengths: i) it only has 472K parameters and is thus not easy to overfit; ii) it has high computational efficiency and is thus convenient for practical deployment; iii) it can be fast retrained by other users using their private COVID-19 data for further improving performance. In addition, we build a comprehensive COVID-19 segmentation benchmark for comparing MiniSeg with traditional methods. Code and models will be released to promote the research and practical deployment for computer-aided COVID-19 screening.
摘要:新型冠状病毒肺炎(COVID-19)疫情的快速蔓延严重威胁全球健康。COVID-19诊断的金标准是成熟可靠的聚合酶链反应(PCR),但PCR是一个费力、耗时且复杂的人工流程,且供应紧张。因此,基于深度学习的计算机辅助筛查(例如感染区域分割)凭借其在医学影像领域的巨大成功被视为一种替代方案。然而,公开可用的COVID-19训练数据有限,容易使通常拥有数百万参数、依赖大量数据的传统深度学习方法过拟合。另一方面,快速的训练/测试和低计算成本对于计算机辅助COVID-19筛查系统的快速部署和开发同样重要,而传统深度学习方法(尤其是图像分割方法)通常计算量很大。针对上述问题,我们提出MiniSeg,一种用于高效COVID-19分割的轻量级深度学习模型。与传统分割方法相比,MiniSeg具有几项显著优势:i) 仅有47.2万个参数,因而不易过拟合;ii) 计算效率高,便于实际部署;iii) 其他用户可以使用其私有COVID-19数据快速重新训练以进一步提升性能。此外,我们构建了一个全面的COVID-19分割基准,用于将MiniSeg与传统方法进行比较。代码和模型将公开发布,以促进计算机辅助COVID-19筛查的研究与实际部署。
26. TrueBranch: Metric Learning-based Verification of Forest Conservation Projects [PDF] 返回目录
Simona Santamaria, David Dao, Björn Lütjens, Ce Zhang
Abstract: International stakeholders increasingly invest in offsetting carbon emissions, for example, via issuing Payments for Ecosystem Services (PES) to forest conservation projects. Issuing trusted payments requires a transparent monitoring, reporting, and verification (MRV) process of the ecosystem services (e.g., carbon stored in forests). The current MRV process, however, is either too expensive (on-ground inspection of forest) or inaccurate (satellite). Recent works propose low-cost and accurate MRV via automatically determining forest carbon from drone imagery, collected by the landowners. The automation of MRV, however, opens up the possibility that landowners report untruthful drone imagery. To be robust against untruthful reporting, we propose TrueBranch, a metric learning-based algorithm that verifies the truthfulness of drone imagery from forest conservation projects. TrueBranch aims to detect untruthfully reported drone imagery by matching it with public satellite imagery. Preliminary results suggest that nominal distance metrics are not sufficient to reliably detect untruthfully reported imagery. TrueBranch leverages metric learning to create a feature embedding in which truthfully and untruthfully collected imagery is easily distinguishable by distance thresholding.
摘要:国际利益相关方越来越多地投资于碳排放抵消,例如向森林保护项目发放生态系统服务付费(PES)。发放可信的付款需要对生态系统服务(如森林碳储量)进行透明的监测、报告与核证(MRV)。然而,当前的MRV流程要么成本过高(实地核查森林),要么不够准确(卫星)。近期的工作提出通过从土地所有者采集的无人机影像自动测算森林碳储量,实现低成本且准确的MRV。然而,MRV的自动化也带来了土地所有者上报不实无人机影像的可能性。为了对不实上报保持鲁棒,我们提出TrueBranch,一种基于度量学习的算法,用于核验森林保护项目无人机影像的真实性。TrueBranch旨在通过与公开卫星影像进行匹配来检测不实上报的无人机影像。初步结果表明,普通的距离度量不足以可靠地检测不实上报的影像;TrueBranch利用度量学习构建特征嵌入,使得真实采集与不实采集的影像可以通过距离阈值轻松区分。
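Conceptually, verification reduces to a distance threshold in a learned embedding. A hedged sketch with hypothetical names, using a standard triplet-margin objective for the metric-learning step:

```python
import torch
import torch.nn.functional as F

def triplet_step(embed, anchor, positive, negative, margin=0.2):
    """One metric-learning step: drone/satellite views of the same plot act
    as anchor/positive, imagery from other plots as negative."""
    a, p, n = embed(anchor), embed(positive), embed(negative)
    return F.triplet_margin_loss(a, p, n, margin=margin)

def is_truthful(embed, drone_img, satellite_img, tau):
    """Verification by distance thresholding in the learned embedding."""
    d = torch.norm(embed(drone_img) - embed(satellite_img), dim=-1)
    return d < tau
```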
27. NPF-MVSNet: Normal and Pyramid Feature Aided Unsupervised MVS Network [PDF] 返回目录
Baichuan Huang, Can Huang, Yijia He, Jingbin Liu, Xiao Liu
Abstract: We propose an unsupervised learning-based network, named NPF-MVSNet, for multi-view stereo reconstruction without ground-truth 3D training data. Our network puts forward: (a) pyramid feature aggregation to capture more contextual information for cost volume construction; (b) normal-depth consistency to make the estimated depth maps more reasonable and precise in the real 3D world; and (c) a combination of pixel-wise and feature-wise loss functions to learn the inherent constraints from the perspective of perception beyond the pixel value. The experiments demonstrate the state-of-the-art performance of NPF-MVSNet, and each innovation contributes an effective improvement to the network. The excellent generalization ability of our network without any finetuning is shown on the leaderboard of the Tanks & Temples datasets. NPF-MVSNet is the best unsupervised MVS network with limited GPU memory consumption as of April 17, 2020. Our codebase is available at this https URL.
摘要:我们提出一种无监督学习网络NPF-MVSNet,用于在没有三维真值训练数据的情况下进行多视图立体重建。我们的网络提出:(a) 金字塔特征聚合,为代价体构建捕获更多上下文信息;(b) 法向-深度一致性,使估计的深度图在真实三维世界中更合理、更精确;(c) 逐像素与逐特征损失函数相结合,从超越像素值的感知角度学习固有约束。实验证明了NPF-MVSNet的先进水平,且每项创新都对网络带来有效提升。在Tanks & Temples数据集排行榜上,我们的网络无需任何微调即表现出优秀的泛化能力。截至2020年4月17日,NPF-MVSNet是GPU显存占用有限的无监督MVS网络中表现最好的。代码库见文中给出的https链接。
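The normal-depth consistency idea requires deriving surface normals from an estimated depth map so they can be compared for consistency. A minimal sketch, assuming pinhole intrinsics (fx, fy, cx, cy) and using finite differences of back-projected points; the paper's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def normals_from_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map with pinhole intrinsics and estimate
    per-pixel surface normals from neighbouring 3D points.

    depth: (B, H, W) estimated depth map
    """
    B, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                            torch.arange(W, dtype=depth.dtype), indexing="ij")
    X = (xs - cx) / fx * depth
    Y = (ys - cy) / fy * depth
    P = torch.stack([X, Y, depth], dim=-1)         # (B, H, W, 3) camera-space points
    dx = P[:, :, 1:, :] - P[:, :, :-1, :]          # horizontal tangent
    dy = P[:, 1:, :, :] - P[:, :-1, :, :]          # vertical tangent
    n = torch.cross(dx[:, :-1], dy[:, :, :-1], dim=-1)
    return F.normalize(n, dim=-1)                  # (B, H-1, W-1, 3) unit normals

# a consistency term could then penalize 1 - cosine similarity between
# these depth-derived normals and separately predicted normals.
```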
28. Image Retrieval using Multi-scale CNN Features Pooling [PDF] 返回目录
Federico Vaccaro, Marco Bertini, Tiberio Uricchio, Alberto Del Bimbo
Abstract: In this paper, we address the problem of image retrieval by learning an image representation based on the activations of a Convolutional Neural Network. We present an end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on NetVLAD and a triplet mining procedure based on sample difficulty to obtain an effective image representation. Extensive experiments show that our approach is able to reach state-of-the-art results on three standard datasets.
摘要:本文通过基于卷积神经网络激活学习图像表示来解决图像检索问题。我们提出一种端到端可训练的网络架构,利用基于NetVLAD的新型多尺度局部池化和基于样本难度的三元组挖掘流程来获得有效的图像表示。大量实验表明,我们的方法能够在三个标准数据集上达到最先进的结果。
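Triplet mining based on sample difficulty is commonly realized as batch-hard mining. A sketch of that standard scheme (not necessarily the paper's exact procedure):

```python
import torch

def batch_hard_triplets(embeddings, labels):
    """For each anchor, pick the hardest positive (farthest same-label sample)
    and the hardest negative (closest different-label sample) in the batch.
    Assumes every anchor has at least one positive besides itself."""
    dist = torch.cdist(embeddings, embeddings)            # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)     # (B, B) label mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    pos_dist = dist.masked_fill(~same | eye, float("-inf")).amax(dim=1)
    neg_dist = dist.masked_fill(same, float("inf")).amin(dim=1)
    return pos_dist, neg_dist   # feed into a margin loss, e.g. relu(pos - neg + m)
```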
29. Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution [PDF] 返回目录
Yingruo Fan, Jacqueline C.K. Lam, Victor O.K. Li
Abstract: The intensity estimation of facial action units (AUs) is challenging due to subtle changes in the person's facial appearance. Previous approaches mainly rely on probabilistic models or predefined rules for modeling co-occurrence relationships among AUs, leading to limited generalization. In contrast, we present a new learning framework that automatically learns the latent relationships of AUs via establishing semantic correspondences between feature maps. In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations. Moreover, the AU co-occurring pattern can be reflected by activating a set of feature channels, where each channel encodes a specific visual pattern of AU. This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels. Specifically, we introduce a semantic correspondence convolution (SCC) module to dynamically compute the correspondences from deep and low resolution feature maps, and thus enhancing the discriminability of features. The experimental results demonstrate the effectiveness and the superior performance of our method on two benchmark datasets.
摘要:由于人脸外观的变化十分细微,面部动作单元(AU)的强度估计颇具挑战性。以往的方法主要依靠概率模型或预定义规则来建模AU之间的共现关系,泛化能力有限。相比之下,我们提出一个新的学习框架,通过在特征图之间建立语义对应关系来自动学习AU之间的潜在关系。在基于热图回归的网络中,特征图保留了与AU强度和位置相关的丰富语义信息。此外,AU的共现模式可以通过激活一组特征通道来体现,其中每个通道编码AU的一种特定视觉模式。这促使我们对特征通道之间的相关性进行建模,以隐式表示AU强度水平的共现关系。具体而言,我们引入语义对应卷积(SCC)模块,从深层、低分辨率的特征图中动态计算对应关系,从而增强特征的判别性。实验结果在两个基准数据集上证明了我们方法的有效性和优越性能。
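As a loose stand-in for the SCC idea of dynamically computing correspondences between feature channels, the sketch below builds a k-nearest-neighbour graph over channels by cosine similarity of their spatial responses and mixes each channel with its correspondents; this is our simplification, not the authors' module:

```python
import torch
import torch.nn.functional as F

def channel_correspondence_aggregate(feat, k=4):
    """Dynamically build a k-NN graph over feature channels and mix each
    channel with its most similar channels (simplified SCC-style step).

    feat: (B, C, H, W) feature map
    """
    B, C, H, W = feat.shape
    desc = F.normalize(feat.flatten(2), dim=2)           # (B, C, HW) channel descriptors
    sim = desc @ desc.transpose(1, 2)                    # (B, C, C) cosine similarities
    sim.diagonal(dim1=1, dim2=2).fill_(float("-inf"))    # ignore self-matches
    idx = sim.topk(k, dim=2).indices                     # (B, C, k) correspondences

    flat = feat.flatten(2)                               # (B, C, HW)
    gathered = flat.unsqueeze(1).expand(B, C, C, H * W)
    neigh = torch.gather(gathered, 2, idx.unsqueeze(-1).expand(B, C, k, H * W))
    return (flat + neigh.mean(dim=2)).view(B, C, H, W)   # residual channel mixing
```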
30. LRCN-RetailNet: A recurrent neural network architecture for accurate people counting [PDF] 返回目录
Lucas Massa, Adriano Barbosa, Krerley Oliveira, Thales Vieira
Abstract: Measuring and analyzing the flow of customers in retail stores is essential for a retailer to better comprehend customers' behavior and support decision-making. Nevertheless, not much attention has been given to the development of novel technologies for automatic people counting. We introduce LRCN-RetailNet: a recurrent neural network architecture capable of learning a non-linear regression model and accurately predicting the people count from videos captured by low-cost surveillance cameras. The input video format follows the recently proposed RGBP image format, which is comprised of color and people (foreground) information. Our architecture is capable of considering two relevant aspects: spatial features extracted through convolutional layers from the RGBP images; and the temporal coherence of the problem, which is exploited by recurrent layers. We show that, through a supervised learning approach, the trained models are capable of predicting the people count with high accuracy. Additionally, we present and demonstrate that a straightforward modification of the methodology is effective to exclude salespeople from the people count. Comprehensive experiments were conducted to validate, evaluate and compare the proposed architecture. Results corroborated that LRCN-RetailNet remarkably outperforms both the previous RetailNet architecture, which was limited to evaluating a single image per iteration; and a state-of-the-art neural network for object detection. Finally, computational performance experiments confirmed that the entire methodology is effective to estimate people count in real-time.
摘要:测量和分析零售门店中的顾客流量,对于零售商更好地理解顾客行为并支持决策至关重要。然而,自动人数统计新技术的研发尚未得到足够重视。我们提出LRCN-RetailNet:一种循环神经网络架构,能够学习非线性回归模型,并从低成本监控摄像头拍摄的视频中准确预测人数。输入视频采用最近提出的RGBP图像格式,由颜色信息和人员(前景)信息组成。我们的架构能够兼顾两个相关方面:通过卷积层从RGBP图像中提取的空间特征,以及由循环层利用的问题的时间连贯性。我们表明,通过监督学习方法,训练得到的模型能够高精度地预测人数。此外,我们提出并证明,对该方法进行简单修改即可有效地将销售人员排除在人数统计之外。我们进行了全面的实验来验证、评估和比较所提出的架构。结果证实,LRCN-RetailNet显著优于此前每次迭代只能评估单幅图像的RetailNet架构,也优于最先进的目标检测神经网络。最后,计算性能实验证实,整套方法能够有效地实时估计人数。
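The LRCN pattern itself (per-frame CNN features fed to an LSTM that regresses a count) is easy to sketch; the layer sizes and the handling of the 4-channel RGBP input below are our assumptions:

```python
import torch
import torch.nn as nn

class LRCNCounter(nn.Module):
    """CNN per frame + LSTM over time + linear head regressing the count.
    Input: (B, T, 4, H, W) clips in RGBP format (RGB + foreground channel)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # (B*T, 32)
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, clips):
        B, T = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(B, T, -1)  # per-frame features
        out, _ = self.lstm(feats)                             # temporal coherence
        return self.head(out[:, -1]).squeeze(-1)              # count from last step
```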
31. Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images [PDF] 返回目录
Ming Y. Lu, Drew F. K. Williamson, Tiffany Y. Chen, Richard J. Chen, Matteo Barbieri, Faisal Mahmood
Abstract: The rapidly emerging field of computational pathology has the potential to enable objective diagnosis, therapeutic response prediction and identification of new morphological features of clinical relevance. However, deep learning-based computational pathology approaches either require manual annotation of gigapixel whole slide images (WSIs) in fully-supervised settings or thousands of WSIs with slide-level labels in a weakly-supervised setting. Moreover, whole slide level computational pathology methods also suffer from domain adaptation and interpretability issues. These challenges have prevented the broad adaptation of computational pathology for clinical and research purposes. Here we present CLAM - Clustering-constrained attention multiple instance learning, an easy-to-use, high-throughput, and interpretable WSI-level processing and learning method that only requires slide-level labels while being data efficient, adaptable and capable of handling multi-class subtyping problems. CLAM is a deep-learning-based weakly-supervised method that uses attention-based learning to automatically identify sub-regions of high diagnostic value in order to accurately classify the whole slide, while also utilizing instance-level clustering over the representative regions identified to constrain and refine the feature space. In three separate analyses, we demonstrate the data efficiency and adaptability of CLAM and its superior performance over standard weakly-supervised classification. We demonstrate that CLAM models are interpretable and can be used to identify well-known and new morphological features. We further show that models trained using CLAM are adaptable to independent test cohorts, cell phone microscopy images, and biopsies. CLAM is a general-purpose and adaptable method that can be used for a variety of different computational pathology tasks in both clinical and research settings.
摘要:快速兴起的计算病理学领域有望实现客观诊断、治疗反应预测以及具有临床意义的新形态学特征的识别。然而,基于深度学习的计算病理学方法要么需要在全监督设置下对十亿像素级全切片图像(WSI)进行人工标注,要么需要在弱监督设置下使用数千张带切片级标签的WSI。此外,全切片级的计算病理学方法还存在领域自适应和可解释性问题。这些挑战阻碍了计算病理学在临床和科研中的广泛应用。本文提出CLAM(聚类约束注意力多示例学习),一种易于使用、高通量且可解释的WSI级处理与学习方法,只需切片级标签,同时具备数据高效、适应性强的特点,并能处理多类别亚型分类问题。CLAM是一种基于深度学习的弱监督方法,利用基于注意力的学习自动识别具有高诊断价值的子区域,从而对整张切片进行准确分类,同时在识别出的代表性区域上利用示例级聚类来约束和细化特征空间。在三项独立分析中,我们展示了CLAM的数据效率和适应性,及其相对于标准弱监督分类的优越性能。我们证明CLAM模型具有可解释性,可用于识别已知和新的形态学特征。我们进一步表明,用CLAM训练的模型能够适应独立的测试队列、手机显微图像和活检样本。CLAM是一种通用且适应性强的方法,可用于临床和科研场景中的多种计算病理学任务。
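CLAM's attention pooling builds on gated attention-based multiple instance learning. A minimal sketch of that mechanism over a bag of patch embeddings (dimensions hypothetical, and without CLAM's clustering branch):

```python
import torch
import torch.nn as nn

class GatedAttentionMIL(nn.Module):
    """Slide-level prediction from a bag of patch embeddings via
    gated attention pooling (the mechanism CLAM builds on)."""
    def __init__(self, d=1024, h=256, n_classes=2):
        super().__init__()
        self.V = nn.Linear(d, h)          # content branch
        self.U = nn.Linear(d, h)          # gating branch
        self.w = nn.Linear(h, 1)          # attention score per patch
        self.classifier = nn.Linear(d, n_classes)

    def forward(self, patches):           # patches: (N, d), one whole slide image
        scores = self.w(torch.tanh(self.V(patches)) * torch.sigmoid(self.U(patches)))
        attn = torch.softmax(scores, dim=0)        # (N, 1), sums to 1 over patches
        slide_repr = (attn * patches).sum(dim=0)   # attention-weighted pooling
        return self.classifier(slide_repr), attn   # attn also yields a heatmap
```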
32. Intelligent Querying for Target Tracking in Camera Networks using Deep Q-Learning with n-Step Bootstrapping [PDF] 返回目录
Anil Sharma, Saket Anand, Sanjit K. Kaul
Abstract: Surveillance camera networks are a useful infrastructure for various visual analytics applications, where high-level inferences and predictions could be made based on target tracking across the network. Most multi-camera tracking works focus on target re-identification and trajectory association problems to track the target. However, since camera networks can generate enormous amount of video data, inefficient schemes for making re-identification or trajectory association queries can incur prohibitively large computational requirements. In this paper, we address the problem of intelligent scheduling of re-identification queries in a multi-camera tracking setting. To this end, we formulate the target tracking problem in a camera network as an MDP and learn a reinforcement learning based policy that selects a camera for making a re-identification query. The proposed approach to camera selection does not assume the knowledge of the camera network topology but the resulting policy implicitly learns it. We have also shown that such a policy can be learnt directly from data. Using the NLPR MCT and the Duke MTMC multi-camera multi-target tracking benchmarks, we empirically show that the proposed approach substantially reduces the number of frames queried.
摘要:监控摄像机网络是多种视觉分析应用的有用基础设施,可以基于跨网络的目标跟踪做出高层推断和预测。大多数多摄像机跟踪工作聚焦于目标重识别和轨迹关联问题来跟踪目标。然而,由于摄像机网络会产生海量视频数据,低效的重识别或轨迹关联查询方案可能带来难以承受的计算开销。本文研究多摄像机跟踪场景下重识别查询的智能调度问题。为此,我们将摄像机网络中的目标跟踪问题建模为马尔可夫决策过程(MDP),并学习一个基于强化学习的策略来选择发起重识别查询的摄像机。所提出的摄像机选择方法不假设已知摄像机网络拓扑,但学得的策略会隐式地掌握它;我们还表明这样的策略可以直接从数据中学习。在NLPR MCT和Duke MTMC多摄像机多目标跟踪基准上,我们的实验表明所提方法大幅减少了查询的帧数。
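The n-step bootstrapping referenced in the title combines n discounted rewards with a bootstrapped value estimate at step n. A sketch of the target computation for a single transition sequence (names hypothetical; episode termination mid-sequence is simplified):

```python
import torch

def n_step_q_target(rewards, next_state, done, q_target_net, gamma=0.99):
    """n-step bootstrapped target:
        G = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1}
            + gamma^n * max_a Q_target(s_{t+n}, a)

    rewards: (n,) tensor of rewards collected over the n steps
    done:    True if the episode ended within the n steps
    """
    n = len(rewards)
    g = torch.zeros(())
    for k in reversed(range(n)):
        g = rewards[k] + gamma * g              # discounted n-step return
    if not done:
        with torch.no_grad():
            g = g + (gamma ** n) * q_target_net(next_state).max()
    return g
```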
33. LSQ+: Improving low-bit quantization through learnable offsets and better initialization [PDF] 返回目录
Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak
Abstract: Unlike ReLU, newer activation functions (like Swish, H-swish, Mish) that are frequently employed in popular efficient architectures can also result in negative activation values, with skewed positive and negative ranges. Typical learnable quantization schemes [PACT, LSQ] assume unsigned quantization for activations and quantize all negative activations to zero which leads to significant loss in performance. Naively using signed quantization to accommodate these negative values requires an extra sign bit which is expensive for low-bit (2-, 3-, 4-bit) quantization. To solve this problem, we propose LSQ+, a natural extension of LSQ, wherein we introduce a general asymmetric quantization scheme with trainable scale and offset parameters that can learn to accommodate the negative activations. Gradient-based learnable quantization schemes also commonly suffer from high instability or variance in the final training performance, hence requiring a great deal of hyper-parameter tuning to reach a satisfactory performance. LSQ+ alleviates this problem by using an MSE-based initialization scheme for the quantization parameters. We show that this initialization leads to significantly lower variance in final performance across multiple training runs. Overall, LSQ+ shows state-of-the-art results for EfficientNet and MixNet and also significantly outperforms LSQ for low-bit quantization of neural nets with Swish activations (e.g.: 1.8% gain with W4A4 quantization and upto 5.6% gain with W2A2 quantization of EfficientNet-B0 on ImageNet dataset). To the best of our knowledge, ours is the first work to quantize such architectures to extremely low bit-widths.
摘要:与ReLU不同,流行的高效网络架构中经常采用的较新的激活函数(如Swish、H-swish、Mish)也会产生负激活值,且正负取值范围不对称。典型的可学习量化方案[PACT, LSQ]对激活采用无符号量化,将所有负激活量化为零,从而导致显著的性能损失。若简单地使用有符号量化来容纳这些负值,则需要额外的符号位,这对低比特(2、3、4比特)量化来说代价高昂。为解决该问题,我们提出LSQ+,它是LSQ的自然扩展:我们引入一种带有可训练尺度和偏移参数的通用非对称量化方案,能够学习容纳负激活。基于梯度的可学习量化方案还普遍存在最终训练性能不稳定或方差大的问题,因而需要大量超参数调优才能达到满意的性能。LSQ+通过对量化参数采用基于MSE的初始化方案缓解了这一问题;我们表明这种初始化能显著降低多次训练之间最终性能的方差。总体而言,LSQ+在EfficientNet和MixNet上取得了最先进的结果,并且在带Swish激活的神经网络的低比特量化上显著优于LSQ(例如在ImageNet数据集上,EfficientNet-B0在W4A4量化下提升1.8%,在W2A2量化下提升最多5.6%)。据我们所知,这是首个将此类架构量化到极低位宽的工作。
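The core of the scheme is an asymmetric fake-quantizer with a learnable scale s and offset beta, trained with a straight-through estimator. A sketch under an unsigned-grid parameterization (one of several possible configurations; the details here are our assumptions, not the paper's exact code):

```python
import torch

def lsq_plus_quantize(x, s, beta, n_bits=4):
    """Asymmetric fake-quantization with learnable scale s and offset beta:
        q    = clamp(round((x - beta) / s), qn, qp)
        xhat = q * s + beta
    Rounding uses a straight-through estimator so s and beta receive gradients.
    """
    qn, qp = 0, 2 ** n_bits - 1                 # unsigned grid; the offset absorbs negatives
    v = (x - beta) / s
    v_bar = torch.clamp(v, qn, qp)
    q = v_bar + (torch.round(v_bar) - v_bar).detach()   # STE: round fwd, identity bwd
    return q * s + beta

# MSE-style initialization, in the spirit of the paper: pick s and beta that
# minimize ||lsq_plus_quantize(x, s, beta) - x||^2 over a calibration batch.
```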
34. Utilizing Mask R-CNN for Waterline Detection in Canoe Sprint Video Analysis [PDF] 返回目录
Marie-Sophie von Braun, Patrick Frenzel, Christian Käding, Mirco Fuchs
Abstract: Determining a waterline in images recorded in canoe sprint training is an important component for the kinematic parameter analysis to assess an athlete's performance. Here, we propose an approach for the automated waterline detection. First, we utilized a pre-trained Mask R-CNN by means of transfer learning for canoe segmentation. Second, we developed a multi-stage approach to estimate a waterline from the outline of the segments. It consists of two linear regression stages and the systematic selection of canoe parts. We then introduced a parameterization of the waterline as a basis for further evaluations. Next, we conducted a study among several experts to estimate the ground truth waterlines. This not only included an average waterline drawn from the individual experts annotations but, more importantly, a measure for the uncertainty between individual results. Finally, we assessed our method with respect to the question whether the predicted waterlines are in accordance with the experts annotations. Our method demonstrated a high performance and provides opportunities for new applications in the field of automated video analysis in canoe sprint.
摘要:在皮划艇竞速训练录像中确定水线,是评估运动员表现的运动学参数分析的重要环节。本文提出一种自动水线检测方法。首先,我们借助迁移学习,利用预训练的Mask R-CNN对皮划艇进行分割;其次,我们提出一种多阶段方法,从分割区域的轮廓估计水线,该方法由两个线性回归阶段和对皮划艇部位的系统筛选组成;然后,我们引入水线的参数化表示,作为后续评估的基础;接着,我们开展了一项多位专家参与的研究来估计真值水线,其中不仅包括由各位专家标注得到的平均水线,更重要的是还包括个体结果之间不确定性的度量;最后,我们针对预测水线是否与专家标注一致这一问题评估了我们的方法。该方法表现出色,为皮划艇竞速自动视频分析领域的新应用提供了机会。
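A first regression stage of the kind described can be pictured as fitting a line to the lowest mask pixels of the segmented canoe; the sketch below is our simplification of that idea, not the paper's exact procedure:

```python
import numpy as np

def fit_waterline(mask):
    """First-stage estimate: for every column covered by the canoe mask,
    take the lowest foreground pixel and fit y = a*x + b by least squares."""
    cols = np.where(mask.any(axis=0))[0]
    xs = cols.astype(float)
    ys = np.array([np.max(np.nonzero(mask[:, c])[0]) for c in cols], dtype=float)
    a, b = np.polyfit(xs, ys, deg=1)
    return a, b   # a second regression pass could drop outlier canoe parts
```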
35. AANet: Adaptive Aggregation Network for Efficient Stereo Matching [PDF] 返回目录
Haofei Xu, Juyong Zhang
Abstract: Despite the remarkable progress made by learning-based stereo matching algorithms, one key challenge remains unsolved. Current state-of-the-art stereo models are mostly based on costly 3D convolutions; the cubic computational complexity and high memory consumption make them quite expensive to deploy in real-world applications. In this paper, we aim at completely replacing the commonly used 3D convolutions to achieve fast inference speed while maintaining comparable accuracy. To this end, we first propose a sparse points based intra-scale cost aggregation method to alleviate the well-known edge-fattening issue at disparity discontinuities. Further, we approximate the traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions. Both modules are simple, lightweight, and complementary, leading to an effective and efficient architecture for cost aggregation. With these two modules, we can not only significantly speed up existing top-performing models (e.g., $41\times$ faster than GC-Net, $4\times$ faster than PSMNet and $38\times$ faster than GA-Net), but also improve the performance of fast stereo models (e.g., StereoNet). We also achieve competitive results on Scene Flow and KITTI datasets while running at 62ms, demonstrating the versatility and high efficiency of the proposed method. Our full framework is available at this https URL .
摘要:尽管基于学习的立体匹配算法,取得了显着的进步,一个关键的挑战仍然没有解决。国家的最先进的电流立体模型大多是基于昂贵的3D卷积,立方计算复杂度和高内存消耗,使其相当昂贵,在现实世界的应用部署。在本文中,我们的目标是完全替代常用的3D卷积实现快速推理速度,同时保持相当的精度。为此,我们首先提出了一个基于稀疏点规模内的成本聚合方法,以缓解众所周知的边缘育肥问题在差距不连续性。此外,我们近似传统的跨尺度的成本聚合算法和神经网络层来处理大量的无纹理的区域。两个模块是简单的,重量轻且互补,导致有效和高效的架构成本聚集。有了这两个模块,我们不仅可以显著加快现有的顶级表现的模型(例如,$ 41 \倍$比GC-网,$ 4 \倍$比PSMNet和$ 38 \倍$比GA-网),同时也提高了快速立体模型的性能(例如,StereoNet)。我们还实现在62ms运行时,表明该方法的多功能性和高效率的场景流量和KITTI数据集有竞争力的结果。我们的全面框架可在此HTTPS URL。
36. Multi-Scale Thermal to Visible Face Verification via Attribute Guided Synthesis [PDF] 返回目录
Xing Di, Benjamin S. Riggan, Shuowen Hu, Nathaniel J. Short, Vishal M. Patel
Abstract: Thermal-to-visible face verification is a challenging problem due to the large domain discrepancy between the modalities. Existing approaches either attempt to synthesize visible faces from thermal faces or extract robust features from these modalities for cross-modal matching. In this paper, we use attributes extracted from visible images to synthesize attribute-preserved visible images from thermal imagery for cross-modal matching. A pre-trained VGG-Face network is used to extract the attributes from the visible image. Then, a novel multi-scale generator is proposed to synthesize the visible image from the thermal image, guided by the extracted attributes. Finally, a pre-trained VGG-Face network is leveraged to extract features from the synthesized image and the input visible image for verification. An extended dataset consisting of polarimetric thermal faces of 121 subjects is also introduced. Extensive experiments evaluated on various datasets and protocols demonstrate that the proposed method achieves state-of-the-art performance.
摘要:热 - 可见人脸验证是一个具有挑战性的问题,由于模式之间的巨大差异域。现有靠近任尝试来合成从热面可见的人脸或提取从这些方式鲁棒特征为跨通道匹配。在本文中,我们使用从可见光图像提取的属性来合成从热成像的attributepreserved可见光图像跨模态匹配。 A-预先训练VGG-工作面网络被用于提取从所述可见图像的属性。然后,一种新颖的多尺度发生器拟从由所提取的属性引导的热图像合成可见图像。最后,一个预训练的VGG-FACE网络是从所述合成图像和用于验证该输入的可视图像利用来提取特征。由121名受试者极化热面的扩展数据集也被引入。在各种数据集和协议评价了广泛的实验表明,该方法实现了国家的最先进的每formance。
37. Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation [PDF] 返回目录
Ryan Julian, Benjamin Swanson, Gaurav S. Sukhatme, Sergey Levine, Chelsea Finn, Karol Hausman
Abstract: One of the great promises of robot learning systems is that they will be able to learn from their mistakes and continuously adapt to ever-changing environments. Despite this potential, most robot learning systems today are deployed as fixed policies that are not adapted after deployment. Can we efficiently adapt previously learned behaviors to new environments, objects and percepts in the real world? In this paper, we present a method and empirical evidence towards a robot learning framework that facilitates continuous adaptation. In particular, we demonstrate how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning, including changes in background, object shape and appearance, lighting conditions, and robot morphology. Further, this adaptation uses less than 0.2% of the data necessary to learn the task from scratch. We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning, and that pre-training via RL is essential: training from scratch and adapting from supervised ImageNet features are both unsuccessful with such small amounts of data. We also find that these positive results hold in a limited continual learning setting, in which we repeatedly fine-tune a single lineage of policies using data from a succession of new tasks. Our empirical conclusions are consistently supported by experiments on simulated manipulation tasks, and by 52 unique fine-tuning experiments on a real robotic grasping system pre-trained on 580,000 grasps.
摘要:一个机器人学习系统的很有前途的是,他们将能够从错误中学习,不断适应不断变化的环境。尽管这种潜力,最机器人今天学习系统的部署为一个固定的政策,他们没有被他们的部署后调整。我们能否有效地适应以前学到的行为,新的环境,对象和知觉在现实世界?在本文中,我们提出朝向机器人学习框架有利于连续适配的方法和实验证据。特别是,我们将演示如何通过适应关政策强化学习基于视觉的机器人操作政策,新变化的微调,包括背景物体形状的变化,和外观,照明条件,和机器人形态。此外,这种适应使用所必需的数据低于0.2%,至从头学习任务。我们发现,我们的适应预先训练政策导致大幅的性能提升了微调的过程,并通过RL是前培训的方法是至关重要的:从头训练或监督ImageNet功能调整都是不成功等少量数据的。我们还发现,这些积极的结果保持在有限的持续学习环境,在其中我们反复微调的利用的新任务继承数据政策的单一血统。我们的实证结论一致通过实验模拟上的操作任务的支持,并通过一个真实的机器人抓取系统52独特的微调实验在58万个掌握预先训练。
38. 4D Spatio-Temporal Deep Learning with 4D fMRI Data for Autism Spectrum Disorder Classification [PDF] 返回目录
Marcel Bengs, Nils Gessert, Alexander Schlaefer
Abstract: Autism spectrum disorder (ASD) is associated with behavioral and communication problems. Often, functional magnetic resonance imaging (fMRI) is used to detect and characterize brain changes related to the disorder. Recently, machine learning methods have been employed to reveal new patterns by trying to classify ASD from spatio-temporal fMRI images. Typically, these methods have either focused on temporal or spatial information processing. Instead, we propose a 4D spatio-temporal deep learning approach for ASD classification where we jointly learn from spatial and temporal data. We employ 4D convolutional neural networks and convolutional-recurrent models which outperform a previous approach with an F1-score of 0.71 compared to an F1-score of 0.65.
摘要:自闭症谱系障碍(ASD)与行为和通信问题有关。通常,功能性磁共振成像(fMRI)被用于检测和相关的病症特征分析大脑的变化。近来,已经采用机器学习方法从时空的fMRI图像试图进行分类ASD透露新模式。典型地,这些方法要么集中于时间或空间的信息处理。相反,我们提出了ASD分类,我们共同的空间和时间数据学习四维时空深层学习方法。我们采用4D卷积神经网络和卷积经常车型,其跑赢大盘0.71的F1-得分相比,0.65的F1-得分前一种方法。
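Since standard frameworks stop at 3D convolutions, a natural question is how a 4D convolution over a stream of fMRI volumes can be realized at all. One common workaround, sketched below in PyTorch, composes a 4D convolution from one Conv3d per temporal kernel offset; this is an illustrative assumption about the mechanics, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Conv4d(nn.Module):
    """4D convolution over (B, C, T, D, H, W) built from per-time-offset Conv3d layers."""

    def __init__(self, in_ch, out_ch, k_t=3, k_s=3):
        super().__init__()
        self.k_t = k_t
        # one 3D kernel per temporal offset, sharing the spatial kernel size
        self.convs = nn.ModuleList(
            nn.Conv3d(in_ch, out_ch, k_s, padding=k_s // 2) for _ in range(k_t)
        )

    def forward(self, x):  # x: (B, C, T, D, H, W)
        B, C, T, D, H, W = x.shape
        pad = self.k_t // 2
        out = None
        for t_out in range(T):
            acc = 0
            for i, conv in enumerate(self.convs):
                t_in = t_out + i - pad
                if 0 <= t_in < T:           # zero padding at the temporal borders
                    acc = acc + conv(x[:, :, t_in])  # Conv3d over (D, H, W)
            out_t = acc.unsqueeze(2)        # restore the time axis
            out = out_t if out is None else torch.cat([out, out_t], dim=2)
        return out                          # (B, out_ch, T, D, H, W)

x = torch.randn(1, 1, 8, 16, 16, 16)
print(Conv4d(1, 4)(x).shape)  # torch.Size([1, 4, 8, 16, 16, 16])
```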
39. EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks [PDF] 返回目录
Sanchari Sen, Balaraman Ravindran, Anand Raghunathan
Abstract: Ensuring robustness of Deep Neural Networks (DNNs) is crucial to their adoption in safety-critical applications such as self-driving cars, drones, and healthcare. Notably, DNNs are vulnerable to adversarial attacks in which small input perturbations can produce catastrophic misclassifications. In this work, we propose EMPIR, ensembles of quantized DNN models with different numerical precisions, as a new approach to increase robustness against adversarial attacks. EMPIR is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. EMPIR overcomes this limitation to achieve the 'best of both worlds', i.e., the higher unperturbed accuracies of the full precision models combined with the higher robustness of the low precision models, by composing them in an ensemble. Further, as low precision DNN models have significantly lower computational and storage requirements than full precision models, EMPIR models only incur modest compute and memory overheads compared to a single full-precision model (<25% in our evaluations). We evaluate EMPIR across a suite of DNNs for 3 different image recognition tasks (MNIST, CIFAR-10 and ImageNet) and under 4 different adversarial attacks. Our results indicate that EMPIR boosts the average adversarial accuracies by 42.6%, 15.2% and 10.5% for the DNN models trained on the MNIST, CIFAR-10 and ImageNet datasets respectively, when compared to single full-precision models, without sacrificing accuracy on the unperturbed inputs.
摘要:确保深层神经网络(DNNs)的稳健性是其在安全关键应用,如自动驾驶汽车,无人驾驶飞机和医疗保健采纳的关键。值得注意的是,DNNs很容易受到其中小输入扰动可能产生灾难性的错误分类对抗性攻击。在这项工作中,我们提出EMPIR,量化DNN型号不同数值精度的合奏,作为一种新的方法,以增加对抗敌对攻击的鲁棒性。 EMPIR是基于量化神经网络通常表现出更高的稳健性比全精度的网络对抗攻击的观察,但在精度上的原始(未扰动)投入大幅亏损的成本。 EMPIR克服了这一限制,以实现“两全其美”,即,全精度模型与低精度模型的更高的鲁棒组合的较高精度未扰动,通过在合奏构成它们。此外,作为低精度模型DNN比全精度的模型显著较低的计算和存储需求,EMPIR车型只比单全精度模型招致适度的计算和存储开销(在我们的评估中<25%)。我们评估EMPIR跨越一套DNNs的3个不同的图像识别任务(MNIST,CIFAR-10和ImageNet),并在4次不同的敌对攻击。我们的研究结果表明,比起单全精度的模型,EMPIR将分别训练于MNIST,CIFAR-10和ImageNet数据集的DNN模型的平均对抗精度提升了42.6%,15.2%和10.5%,且没有牺牲对未扰动投入的精度。
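The ensembling step itself is straightforward to sketch: given member models already quantized to different precisions, average their class probabilities. The function below is a schematic PyTorch illustration; how the members are quantized is out of scope here and assumed done elsewhere.

```python
import torch

def empir_predict(models, x):
    """models: iterable of nets (e.g., full-, 4-bit-, and 2-bit-precision members)."""
    probs = [torch.softmax(m(x), dim=1) for m in models]  # per-member class beliefs
    avg = torch.stack(probs).mean(dim=0)                  # combine the ensemble
    return avg.argmax(dim=1)                              # final class prediction
```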
40. Spatio-spectral deep learning methods for in-vivo hyperspectral laryngeal cancer detection [PDF] 返回目录
Marcel Bengs, Stephan Westermann, Nils Gessert, Dennis Eggert, Andreas O. H. Gerstner, Nina A. Mueller, Christian Betz, Wiebke Laffers, Alexander Schlaefer
Abstract: Early detection of head and neck tumors is crucial for patient survival. Often, diagnoses are made based on endoscopic examination of the larynx followed by biopsy and histological analysis, leading to a high inter-observer variability due to subjective assessment. In this regard, early non-invasive diagnostics independent of the clinician would be a valuable tool. A recent study has shown that hyperspectral imaging (HSI) can be used for non-invasive detection of head and neck tumors, as precancerous or cancerous lesions show specific spectral signatures that distinguish them from healthy tissue. However, HSI data processing is challenging due to high spectral variations, various image interferences, and the high dimensionality of the data. Therefore, performance of automatic HSI analysis has been limited and so far, mostly ex-vivo studies have been presented with deep learning. In this work, we analyze deep learning techniques for in-vivo hyperspectral laryngeal cancer detection. For this purpose we design and evaluate convolutional neural networks (CNNs) with 2D spatial or 3D spatio-spectral convolutions combined with a state-of-the-art Densenet architecture. For evaluation, we use an in-vivo data set with HSI of the oral cavity or oropharynx. Overall, we present multiple deep learning techniques for in-vivo laryngeal cancer detection based on HSI and we show that jointly learning from the spatial and spectral domain improves classification accuracy notably. Our 3D spatio-spectral Densenet achieves an average accuracy of 81%.
摘要:头颈部肿瘤的早期检测是患者的生存至关重要。通常情况下,诊断是基于喉内窥镜检查,随后做活检和组织学分析,导致高观察员变异由于主观评价。在这方面,早期无创诊断依赖于临床医生将是一个有价值的工具。最近的研究已经表明,高光谱成像(HSI)可用于非侵入性检测的头颈部肿瘤,的癌前或癌性病变表明,区别于健康组织特定光谱特征。然而,HSI数据处理由于高光谱变化,各种图像的干扰,并且将数据的高维挑战。因此,自动分析恒指表现受到了限制,到目前为止,主要是离体研究已经呈现深度学习。在这项工作中,我们分析体内高光谱喉癌检测深度学习技术。为此目的,我们设计和评估卷积神经网络(细胞神经网络)与2D空间或3D空间 - 光谱卷积用状态的最先进的Densenet架构相结合。为了评价,我们使用与口腔或口咽的HSI的体内数据集。总体而言,在体内基于HSI喉癌的检测,我们表明,从空间和光谱领域共同学习,我们现在多深学习技术提高了分类精度显着。我们的3D空域 - 光谱Densenet达到的81%的平均精确度。
41. A Deep Learning Approach for Motion Forecasting Using 4D OCT Data [PDF] 返回目录
Marcel Bengs, Nils Gessert, Alexander Schlaefer
Abstract: Forecasting motion of a specific target object is a common problem for surgical interventions, e.g. for localization of a target region, guidance for surgical interventions, or motion compensation. Optical coherence tomography (OCT) is an imaging modality with a high spatial and temporal resolution. Recently, deep learning methods have shown promising performance for OCT-based motion estimation based on two volumetric images. We extend this approach and investigate whether using a time series of volumes enables motion forecasting. We propose 4D spatio-temporal deep learning for end-to-end motion forecasting and estimation using a stream of OCT volumes. We design and evaluate five different 3D and 4D deep learning methods using a tissue data set. Our best performing 4D method achieves motion forecasting with an overall average correlation coefficient of 97.41%, while also improving motion estimation performance by a factor of 2.5 compared to a previous 3D approach.
摘要:特定目标对象的预测运动是用于外科手术介入,例如一个共同的问题用于目标区域的定位,指导手术干预,或运动补偿。光学相干断层扫描(OCT)是一种成像模态具有高空间和时间分辨率。近日,深学习方法已经显示了基于两个体积图像基于OCT-运动估计有前途的性能。我们扩展这种方法,并探讨使用时间序列卷是否启用运动预测。我们建议使用四维华侨城卷流空域 - 时深度学习为终端到终端的运动预测和估计。我们设计和评估使用组织的数据集五种不同的3D和4D深学习方法。我们的最佳表现4D方法实现了运动预测与97.41%的总平均相关系数,同时通过相对于先前的三维方法的2.5倍提高运动估计的性能。
42. Spatio-Temporal Deep Learning Methods for Motion Estimation Using 4D OCT Image Data [PDF] 返回目录
Marcel Bengs, Nils Gessert, Matthias Schlüter, Alexander Schlaefer
Abstract: Purpose. Localizing structures and estimating the motion of a specific target region are common problems for navigation during surgical interventions. Optical coherence tomography (OCT) is an imaging modality with a high spatial and temporal resolution that has been used for intraoperative imaging and also for motion estimation, for example, in the context of ophthalmic surgery or cochleostomy. Recently, motion estimation between a template and a moving OCT image has been studied with deep learning methods to overcome the shortcomings of conventional, feature-based methods. Methods. We investigate whether using a temporal stream of OCT image volumes can improve deep learning-based motion estimation performance. For this purpose, we design and evaluate several 3D and 4D deep learning methods and we propose a new deep learning approach. Also, we propose a temporal regularization strategy at the model output. Results. Using a tissue dataset without additional markers, our deep learning methods using 4D data outperform previous approaches. The best performing 4D architecture achieves an average correlation coefficient (aCC) of 98.58%, compared to 85.0% for a previous 3D deep learning method. Also, our temporal regularization strategy at the output further improves 4D model performance to an aCC of 99.06%. In particular, our 4D method works well for larger motion and is robust to image rotations and motion distortions. Conclusions. We propose 4D spatio-temporal deep learning for OCT-based motion estimation. On a tissue dataset, we find that using 4D information for the model input improves performance while maintaining reasonable inference times. Our regularization strategy demonstrates that additional temporal information is also beneficial at the model output.
摘要:目的。本地化的结构和估计的特定目标区域的运动是在外科手术干预导航常见问题。光学相干断层扫描(OCT)是一种成像模态与已用于术中成像,并且还用于运动估计,例如,在眼科手术或内耳开窗的上下文中,高空间和时间分辨率。近日,模板和移动OCT图像之间的运动估算已经研究了深学习方法,克服传统的,基于特征的方法的缺点。方法。我们调查是否使用OCT图像体积的时间流可以提高深基础的学习运动估计性能。为此,我们设计和评估的几个3D和4D深度学习的方法和我们提出了一个新的深度学习的方法。此外,我们建议在模型输出时间正策略。结果。使用组织的数据集,无需额外的标记,我们使用4D数据深度学习方法优于以前的方法。表现最佳的4D体系结构实现了比以前的3D深度学习方法的85.0%的98.58%的相关系数(ACC)。此外,我们在输出时间正规化战略,进一步提高4D模型性能的99.06%的累计。特别是,我们4D方法适用于较大的运动,这是实现图像旋转和运动的扭曲强劲。结论。我们提出了基于OCT-运动估计四维时空深度学习。在一个组织的数据集,我们发现使用4D信息模型输入提高了性能,同时保持合理的推论倍。我们的正规化战略表明,额外的时间信息,也为模型的输出是有益的。
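Output-side temporal regularization lends itself to a one-line penalty. The sketch below penalizes squared jumps between consecutive motion estimates in a stream; the weighting factor is an assumption, and the exact form of the paper's regularizer may differ.

```python
import torch

def temporal_smoothness(pred_seq: torch.Tensor, weight: float = 0.1):
    """pred_seq: (T, B, 3) motion estimates over a stream of OCT volumes."""
    diffs = pred_seq[1:] - pred_seq[:-1]   # frame-to-frame changes
    return weight * diffs.pow(2).mean()    # discourage abrupt jumps

# total = task_loss + temporal_smoothness(predictions)
```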
43. AMP-Net: Denoising based Deep Unfolding for Compressive Image Sensing [PDF] 返回目录
Zhonghao Zhang, Yipeng Liu, Jiani Liu, Fei Wen, Ce Zhu
Abstract: Most compressive sensing (CS) reconstruction methods can be divided into two categories, i.e., model-based methods and classical deep network methods. By unfolding the iterative optimization algorithm of model-based methods into networks, the deep unfolding method combines the good interpretability of model-based methods with the high speed of classical deep network methods. In this paper, to solve the visual image CS problem, we propose a deep unfolding model dubbed AMP-Net. Rather than learning regularization terms, it is established by unfolding the iterative denoising process of the well-known approximate message passing algorithm. Furthermore, AMP-Net integrates deblocking modules in order to eliminate the blocking artifacts that usually appear in CS of visual images. In addition, the sampling matrix is jointly trained with other network parameters to enhance the reconstruction performance. Experimental results show that the proposed AMP-Net has better reconstruction accuracy than other state-of-the-art methods, with high reconstruction speed and a small number of network parameters.
摘要:大多数压缩感测(CS)的重建方法可以分为两类,即,基于模型的方法和经典深网络的方法。通过基于模型的方法,迭代优化算法展开成网,深展开方法的基于模型的方法,很好地诠释和经典的深网络方法高速。在本文中,解决了视觉形象CS的问题,我们提出了被称为AMP-Net的深展开模型。而不是学习正则化项,它是由展开众所周知的近似消息传递算法的迭代去噪声处理成立。此外,AMP-Net的集成的解块模块,以消除块假象,通常出现在可视图像的CS。此外,该采样矩阵共同与其他网络参数训练,以提高重建性能。实验结果表明,所提出的AMP-Net的具有更好的重构精度比其他国家的最先进的方法具有高重建速度和少量的网络参数。
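Deep unfolding is easiest to see in code: a fixed number of AMP-like iterations, each pairing a gradient step on the measurement residual with a small learned denoiser, and with the sampling matrix itself a trainable parameter as the abstract describes. The sketch below simplifies aggressively, omitting the Onsager correction and the deblocking modules, and the layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class UnfoldedAMP(nn.Module):
    """A fixed number of AMP-like iterations with learned per-iteration denoisers."""

    def __init__(self, A: torch.Tensor, iters: int = 6, hidden: int = 256):
        super().__init__()
        self.A = nn.Parameter(A.clone())             # (M, N) sampling matrix, trained jointly
        self.step = nn.Parameter(torch.ones(iters))  # learned step sizes
        n = A.shape[1]
        self.denoisers = nn.ModuleList(
            nn.Sequential(nn.Linear(n, hidden), nn.ReLU(), nn.Linear(hidden, n))
            for _ in range(iters)
        )

    def forward(self, y):                            # y: (B, M) measurements
        x = y @ self.A                               # initial estimate, A^T y per row
        for t, denoise in enumerate(self.denoisers):
            residual = y - x @ self.A.t()            # data misfit in measurement space
            x = x + self.step[t] * (residual @ self.A)  # gradient step toward consistency
            x = denoise(x)                           # learned denoising (Onsager term omitted)
        return x
```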
44. Tensor Networks for Medical Image Classification [PDF] 返回目录
Raghavendra Selvan, Erik B Dam
Abstract: With the increasing adoption of machine learning tools like neural networks across several domains, interesting connections and comparisons to concepts from other domains are coming to light. In this work, we focus on the class of Tensor Networks, which has been a workhorse for physicists in the last two decades to analyse quantum many-body systems. Building on the recent interest in tensor networks for machine learning, we extend the Matrix Product State tensor networks (which can be interpreted as linear classifiers operating in exponentially high dimensional spaces) to be useful in medical image analysis tasks. We focus on classification problems as a first step, where we motivate the use of tensor networks and propose adaptations for 2D images using classical image domain concepts such as local orderlessness of images. With the proposed locally orderless tensor network model (LoTeNet), we show that tensor networks are capable of attaining performance that is comparable to state-of-the-art deep learning methods. We evaluate the model on two publicly available medical imaging datasets and show performance improvements with fewer model hyperparameters and lesser computational resources compared to relevant baseline methods.
摘要:随着越来越多地采用机器学习工具,如跨多个域,有趣的联系和比较,以从其他域概念神经网络来光。在这项工作中,我们专注于类张量网络,这一直是一个工作的马物理学家在过去二十年来分析量子多体系统。最近在网络张的利益机器学习的基础上,我们扩展了矩阵产品国家张网络(可以理解为以指数高维空间操作线性分类)是在医学图像分析任务非常有用。我们专注于分类问题,我们鼓励使用张网络,并提出了使用经典形象域的概念,如图像的局部orderlessness 2D图像adaptions的第一步。利用所提出的局部无序张网络模型(LoTeNet),我们表明,张网络能够达到的性能是相当的国家的最先进的深学习方法。我们评估在两个可公开获得的医疗影像数据集和更少的模型超参数和较少的计算资源显示性能的提升相比,相关的基线方法模型。
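For readers new to tensor networks, the Matrix Product State classifier reduces to a chain of small tensor contractions. Below is a toy contraction over a flattened image, using the common [1-x, x] pixel feature map from the tensor network ML literature; the bond structure and placement of the output tensor are illustrative, not necessarily LoTeNet's exact layout.

```python
import torch

def mps_classify(pixels: torch.Tensor, cores, out_core: torch.Tensor):
    """pixels: (N,) intensities in [0, 1]; cores: N tensors of shape (D, 2, D);
    out_core: (D, n_classes, D) output tensor at the end of the chain."""
    feats = torch.stack([1 - pixels, pixels], dim=1)   # (N, 2) local feature map
    v = torch.ones(cores[0].shape[0])                  # left boundary vector
    for core, f in zip(cores, feats):
        mat = torch.einsum("ipj,p->ij", core, f)       # contract the physical leg
        v = v @ mat                                    # sweep along the chain
    right = torch.ones(out_core.shape[-1])             # right boundary vector
    return torch.einsum("i,icj,j->c", v, out_core, right)  # class scores

# usage: logits = mps_classify(img.flatten(), cores, out_core)
```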
45. MixNet: Multi-modality Mix Network for Brain Segmentation [PDF] 返回目录
Long Chen, Dorit Merhof
Abstract: Automated brain structure segmentation is important for many clinical quantitative analyses and diagnoses. In this work, we introduce MixNet, a 2D semantic-wise deep convolutional neural network to segment brain structures in multi-modality MRI images. The network is composed of our modified deep residual learning units. In each unit, we replace the traditional convolution layer with a dilated convolutional layer, which avoids the use of pooling layers and deconvolutional layers, reducing the number of network parameters. Final predictions are made by aggregating information from multiple scales and modalities. A pyramid pooling module is used to capture spatial information of the anatomical structures at the output end. In addition, we test three architectures (MixNetv1, MixNetv2 and MixNetv3), which fuse the modalities differently, to see the effect on the results. Our network achieves state-of-the-art performance. MixNetv2 was submitted to the MRBrainS challenge at MICCAI 2018 and won the 3rd place in the 3-label task. On the MRBrainS2018 dataset, which includes subjects with a variety of pathologies, overall DSC (Dice Coefficient) scores of 84.7% (gray matter), 87.3% (white matter) and 83.4% (cerebrospinal fluid) were obtained with only 7 subjects as training data.
摘要:自动化的大脑结构分割是众多临床定量分析和诊断非常重要。在这项工作中,我们介绍了混合网,在多模态MRI图像的二维语义明智深卷积神经网络分割大脑结构。该网络是由我们的修改深残留学习单元。在该单元中,我们替换为扩张卷积层,其避免了使用汇集层和解卷积层,减少的网络参数的数量的传统卷积层。最后的预言是通过聚合来自多个尺度和模式的资料制成。金字塔池模块用于捕获在输出端处的解剖结构的空间信息。此外,我们测试了三种架构(MixNetv1,MixNetv2和MixNetv3),它融合了不同的模式上看到结果的影响。我们的网络实现了国家的最先进的性能。 MixNetv2在MICCAI 2018提交MRBrainS挑战,并赢得了3标签任务第三名。在MRBrainS2018数据集,其包括与各种病理的,整体DSC的84.7%(骰子系数)(灰质),87.3%(白色物质)和83.4%(脑脊液)受试者仅7名受试者作为训练得到数据。
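The pooling-free design boils down to substituting dilated convolutions for downsampling. A minimal PyTorch fragment, with arbitrary channel counts, shows how dilation grows the receptive field while keeping full resolution:

```python
import torch.nn as nn

# two stacked dilated 3x3 convolutions; with kernel size 3, padding = dilation
# preserves the spatial resolution while widening the receptive field
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=4, dilation=4),
)
```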
46. Deep Cerebellar Nuclei Segmentation via Semi-Supervised Deep Context-Aware Learning from 7T Diffusion MRI [PDF] 返回目录
Jinyoung Kim, Remi Patriat, Jordan Kaplan, Oren Solomon, Noam Harel
Abstract: Deep cerebellar nuclei are a key structure of the cerebellum that is involved in processing motor and sensory information. Precisely segmenting the deep cerebellar nuclei is thus a crucial step, both for understanding the cerebellum system and for their utility in deep brain stimulation treatment. However, it is challenging to clearly visualize such small nuclei under standard clinical magnetic resonance imaging (MRI) protocols, and therefore automatic patient-specific segmentation is not feasible. Recent advances in 7 Tesla (T) MRI technology and the great potential of deep neural networks facilitate automatic, fast, and accurate segmentation. In this paper, we propose a novel deep learning framework (referred to as DCN-Net) for the segmentation of the deep cerebellar dentate and interposed nuclei on 7T diffusion MRI. DCN-Net effectively encodes contextual information from the image patches, without consecutive pooling operations and without adding complexity, via the proposed dilated dense blocks. During end-to-end training, label probabilities of the dentate and interposed nuclei are independently learned with a hybrid loss that handles highly imbalanced data. Finally, we utilize self-training strategies to cope with the problem of limited labeled data. To this end, auxiliary dentate and interposed nuclei labels are created on unlabeled data using a DCN-Net trained on manual labels. We validate the proposed framework using 7T B0 MRIs from 60 subjects. Experimental results demonstrate that DCN-Net provides better segmentation than atlas-based deep cerebellar nuclei segmentation tools and other state-of-the-art deep neural networks in terms of accuracy and consistency. We further prove the effectiveness of the proposed components within DCN-Net in dentate and interposed nuclei segmentation.
摘要:小脑深部核是参与在处理运动和感觉信息小脑的键结构体。因此,这是精确部分小脑深部核的关键步骤,小脑系统的认识及其在脑深部电刺激治疗效用。然而,可以在标准临床磁共振成像(MRI)的协议具有挑战性清楚地看到这样的小晶核,并且因此自动患者特定的分割是不可行的。在7特斯拉(T)MRI技术和深层神经网络的巨大潜力的最新进展方便自动,快速,准确的分割。在本文中,我们提出了小脑深部齿状的分段和插入核上扩散7T MRI的新型深学习框架(简称DCN-净)。 DCN-Net的有效编码不连续的池操作上的图像块的上下文信息,并通过添加提出扩张密集块的复杂性。在端至端的训练,齿状以及介于核的标签概率与混合动力损失独立地了解到,处理高不平衡数据。最后,我们利用自身的培训战略,应对有限的标记数据的问题。为此,辅助齿状以及介于核标签上的未标记的数据,通过使用DCN-Net的手动标签训练创建。我们验证使用7T核磁共振B0从60名受试者拟议的框架。实验结果表明,DCN-网提供比基于图谱-小脑深部核分割工具和在准确性和一致性方面的其他国家的最先进的深神经网络更好的分割。我们进一步证明齿状DCN-网内所提出的组件和插核分割的有效性。
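The self-training step can be sketched as a pseudo-labeling pass: the network trained on manual labels annotates unlabeled scans, and only confident voxels are kept for the next training round. The confidence threshold and the ignore-label convention below are assumptions for illustration, not the authors' exact procedure.

```python
import torch

@torch.no_grad()
def make_pseudo_labels(model, unlabeled_volumes, conf_thresh=0.9):
    pseudo = []
    for vol in unlabeled_volumes:                 # vol: (1, C, D, H, W)
        probs = torch.softmax(model(vol), dim=1)  # per-voxel class probabilities
        conf, labels = probs.max(dim=1)
        labels[conf < conf_thresh] = -1           # drop low-confidence voxels
        pseudo.append(labels)
    return pseudo                                 # train with CE using ignore_index=-1
```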
47. Alleviating the Incompatibility between Cross Entropy Loss and Episode Training for Few-shot Skin Disease Classification [PDF] 返回目录
Wei Zhu, Haofu Liao, Wenbin Li, Weijian Li, Jiebo Luo
Abstract: Skin disease classification from images is crucial to dermatological diagnosis. However, identifying skin lesions involves a variety of aspects in terms of size, color, shape, and texture. To make matters worse, many categories only contain very few samples, posing great challenges to conventional machine learning algorithms and even human experts. Inspired by the recent success of Few-Shot Learning (FSL) in natural image classification, we propose to apply FSL to skin disease identification to address the problem of extreme training-sample scarcity. However, directly applying FSL to this task does not work well in practice, and we find that the problem can be largely attributed to the incompatibility between Cross Entropy (CE) and episode training, which are both commonly used in FSL. Based on a detailed analysis, we propose the Query-Relative (QR) loss, which proves superior to CE under episode training and is closely related to the recently proposed mutual information estimation. Moreover, we further strengthen the proposed QR loss with a novel adaptive hard margin strategy. Comprehensive experiments validate the effectiveness of the proposed FSL scheme and the possibility of diagnosing rare skin diseases with only a few labeled samples.
摘要:从图像中的皮肤疾病分类是皮肤病诊断至关重要。然而,识别皮损涉及大小,颜色,形状和质地方面的各种方面。更糟糕的是,许多类别只含有极少量样品,构成了巨大的挑战传统的机器学习算法,甚至人类专家。通过近期对自然图像分类为数不多的射击学习(FSL)成功的启发,我们提出申请FSL皮肤疾病鉴别,解决训练样本问题极度匮乏。然而,直接应用到FSL这个任务不能很好地在实际工作中,我们发现,这个问题可以在很大程度上归因于交叉熵(CE)和集培训,这都是常用的FSL使用之间的不兼容性。基于一个详细的分析,我们提出查询相对(QR)的损失,这在情节的培训证明优于CE和密切相关,最近提出的互信息估计。此外,我们进一步加强了新的自适应硬缘战略所提出的QR损失。综合性实验验证了该方案FSL的有效性和可能性,诊断罕见的皮肤病有几个标记的样品。
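For context on what the Query-Relative loss replaces, the sketch below shows the standard prototype-based episodic cross-entropy baseline; the distance metric and prototype construction follow common few-shot practice, and the paper's exact QR formulation is not reproduced here.

```python
import torch
import torch.nn.functional as F

def episode_ce(support, s_labels, query, q_labels, n_way):
    """support: (S, D) and query: (Q, D) embeddings for one episode."""
    protos = torch.stack(
        [support[s_labels == c].mean(dim=0) for c in range(n_way)]
    )                                       # one prototype per class
    logits = -torch.cdist(query, protos)    # closer prototype -> larger logit
    return F.cross_entropy(logits, q_labels)
```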
48. Local Clustering with Mean Teacher for Semi-supervised Learning [PDF] 返回目录
Zexi Chen, Benjamin Dutton, Bharathkumar Ramachandra, Tianfu Wu, Ranga Raju Vatsavai
Abstract: The Mean Teacher (MT) model of Tarvainen and Valpola has shown favorable performance on several semi-supervised benchmark datasets. MT maintains a teacher model's weights as the exponential moving average of a student model's weights and minimizes the divergence between their probability predictions under diverse perturbations of the inputs. However, MT is known to suffer from confirmation bias, that is, reinforcing incorrect teacher model predictions. In this work, we propose a simple yet effective method called Local Clustering (LC) to mitigate the effect of confirmation bias. In MT, each data point is considered independent of other points during training; however, data points are likely to be close to each other in feature space if they share similar features. Motivated by this, we cluster data points locally by minimizing the pairwise distance between neighboring data points in feature space. Combined with a standard classification cross-entropy objective on labeled data points, the misclassified unlabeled data points are pulled towards high-density regions of their correct class with the help of their neighbors, thus improving model performance. We demonstrate on the semi-supervised benchmark datasets SVHN and CIFAR-10 that adding our LC loss to MT yields significant improvements over MT alone, and performance comparable to the state of the art in semi-supervised learning.
摘要:Tarvainen和Valpola的平均教师(MT)模型显示在几个半监督基准数据集良好的性能。 MT保持为指数老师模型的加权移动平均一个学生模型的权重,并最小化下的输入不同的扰动它们的概率预测之间的分歧。然而,MT被称为从确认偏见,即,加强教师不正确模型预测受苦。在这项工作中,我们提出了所谓的本地集群(LC)一个简单而有效的方法,以减轻确认偏见的影响。在MT中,每个数据点的训练过程中考虑独立于其他点;然而,数据点很可能是相互靠近的功能空间,如果他们有着相似的特点。这个启发,我们通过在特征空间最小化相邻数据点之间的距离成对本地集群中的数据点。用标准的分类交叉熵客观上标注的数据点相结合,误判未标记的数据点都被拉向自己与邻居的帮助下正确类的高密度区域,从而提高模型的性能。我们证明半监督基准数据集SVHN和CIFAR-10,加入我们的LC损失MT产生比MT显著改进和性能相媲美的技术半监督学习的状态。
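The local clustering idea reduces to a pairwise-distance penalty among nearby features. The sketch below pulls each unlabeled feature toward its k nearest neighbors in the batch; k and the self-pair masking constant are illustrative assumptions.

```python
import torch

def local_clustering_loss(feats: torch.Tensor, k: int = 5):
    """feats: (N, D) features of unlabeled samples in the current batch."""
    n = feats.shape[0]
    dists = torch.cdist(feats, feats)                        # (N, N) pairwise distances
    dists = dists + torch.eye(n, device=feats.device) * 1e9  # mask out self-pairs
    knn_dists, _ = dists.topk(k, dim=1, largest=False)       # k nearest neighbors
    return knn_dists.mean()                                  # pull neighbors together
```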
49. Self-Supervised Feature Extraction for 3D Axon Segmentation [PDF] 返回目录
Tzofi Klinghoffer, Peter Morales, Young-Gyun Park, Nicholas Evans, Kwanghun Chung, Laura J. Brattain
Abstract: Existing learning-based methods to automatically trace axons in 3D brain imagery often rely on manually annotated segmentation labels. Labeling is a labor-intensive process and is not scalable to whole-brain analysis, which is needed for improved understanding of brain function. We propose a self-supervised auxiliary task that utilizes the tube-like structure of axons to build a feature extractor from unlabeled data. The proposed auxiliary task constrains a 3D convolutional neural network (CNN) to predict the order of permuted slices in an input 3D volume. By solving this task, the 3D CNN is able to learn features without ground-truth labels that are useful for downstream segmentation with the 3D U-Net model. To the best of our knowledge, our model is the first to perform automated segmentation of axons imaged at subcellular resolution with the SHIELD technique. We demonstrate improved segmentation performance over the 3D U-Net model on both the SHIELD PVGPe dataset and the BigNeuron Project, single neuron Janelia dataset.
摘要:在3D影像大脑现有的基于学习的方法来自动跟踪轴突往往依靠手动注释分割标签。标签是一个劳动力密集的过程,而不是扩展到全脑分析,这是需要改善大脑功能的理解。我们建议,利用管样轴突建立无标签数据的特征提取的结构的自我监督的辅助任务。所提出的任务的辅助约束三维卷积神经网络(CNN),以预测在输入的3D体积置换片的顺序。通过解决这一任务,3D CNN能够学习功能,无需地面实况的标签,是能够与3D掌中模型下游的分割是有用的。据我们所知,我们的模型是首次进行,在与屏蔽技术的亚细胞分辨率成像轴突的自动分割。我们证明了上盾PVGPe数据集和BigNeuron项目,单个神经元Janelia数据集两种3D掌中模型改进的分割性能。
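The pretext task itself is easy to set up: split an unlabeled volume into slabs, shuffle them, and ask the network which permutation was applied. The sketch below uses four slabs and treats each permutation as a class; the slab count is an assumption, not the paper's exact configuration.

```python
import itertools
import random
import torch

PERMS = list(itertools.permutations(range(4)))  # 24 permutations, one class each

def make_pretext_sample(volume: torch.Tensor):
    """volume: (C, 4, H, W), i.e. an unlabeled volume split into 4 slabs."""
    label = random.randrange(len(PERMS))
    shuffled = volume[:, list(PERMS[label])]    # reorder the slabs
    return shuffled, label                      # train the 3D CNN with CE on `label`
```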
50. Deep variational network for rapid 4D flow MRI reconstruction [PDF] 返回目录
Valery Vishnevskiy, Jonas Walheim, Sebastian Kozerke
Abstract: Phase-contrast magnetic resonance imaging (MRI) provides time-resolved quantification of blood flow dynamics that can aid clinical diagnosis. Long in vivo scan times due to repeated three-dimensional (3D) volume sampling over cardiac phases and breathing cycles necessitate accelerated imaging techniques that leverage data correlations. Standard compressed sensing reconstruction methods require tuning of hyperparameters and are computationally expensive, which diminishes the potential reduction of examination times. We propose an efficient model-based deep neural reconstruction network and evaluate its performance on clinical aortic flow data. The network is shown to reconstruct undersampled 4D flow MRI data in under a minute on standard consumer hardware. Remarkably, the relatively low amounts of tunable parameters allowed the network to be trained on images from 11 reference scans while generalizing well to retrospective and prospective undersampled data for various acceleration factors and anatomies.
摘要:相衬磁共振成像(MRI)提供的血流动力学,可以帮助临床诊断的时间分辨定量。长在由于反复的三维(3D)体体内的扫描时间采样过心脏相位和呼吸周期必要加速成像技术利用数据相关性。标准压缩传感重建方法需要超参数的调谐和在计算上昂贵,这减少的检查时间的电势降低。我们提出了一种高效的基于模型的深层神经网络重建并评价其临床主动脉流量数据的表现。该网络示出为重建标准消费者硬件一分钟下在欠4D流MRI数据。值得注意的是,相对低量的可调参数的允许网络上的图像从11个参考扫描而概括好以各种加速因子和解剖回顾性和前瞻性欠采样数据被训练。
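Model-based (unrolled) reconstruction networks of this kind typically alternate a data-consistency gradient with a learned regularizer for a fixed number of iterations. The PyTorch sketch below illustrates that structure for a simplified 2D single-coil case with real/imaginary channels; the actual work targets 4D flow data, and VarNet, Regularizer, and the learned step sizes here are illustrative assumptions rather than the paper's architecture.

import torch
import torch.nn as nn

class Regularizer(nn.Module):
    # Small CNN standing in for the learned image prior.
    def __init__(self, ch=2):  # 2 channels: real and imaginary parts
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class VarNet(nn.Module):
    # Unrolled reconstruction: x <- x - step * (A^H(Ax - y) + R(x)).
    def __init__(self, n_iters=8):
        super().__init__()
        self.regs = nn.ModuleList(Regularizer() for _ in range(n_iters))
        self.steps = nn.Parameter(torch.full((n_iters,), 0.5))

    def forward(self, y, mask):
        # y: undersampled k-space (B, 2, H, W); mask: sampling pattern (B, 1, H, W).
        x = self.ifft(y)  # zero-filled initial estimate
        for reg, step in zip(self.regs, self.steps):
            grad_dc = self.ifft(mask * (self.fft(x) - y))  # data-consistency gradient
            x = x - step * (grad_dc + reg(x))
        return x

    @staticmethod
    def fft(x):
        z = torch.fft.fft2(torch.complex(x[:, 0], x[:, 1]), norm="ortho")
        return torch.stack((z.real, z.imag), dim=1)

    @staticmethod
    def ifft(x):
        z = torch.fft.ifft2(torch.complex(x[:, 0], x[:, 1]), norm="ortho")
        return torch.stack((z.real, z.imag), dim=1)

# Toy usage: random k-space data and a random sampling mask.
y = torch.randn(1, 2, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.7).float()
recon = VarNet()(y * mask, mask)  # (1, 2, 64, 64) image-domain estimate

Because only the step sizes and the small per-iteration CNNs are trainable, the parameter count stays low, which is consistent with the abstract's point about training on just 11 reference scans.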
51. Adversarial Distortion for Learned Video Compression [PDF] 返回目录
Vijay Veerabadrany, Reza Pourreza, Amirhossein Habibian, Taco Cohen
Abstract: In this paper, we present a novel adversarial lossy video compression model. At extremely low bit-rates, standard video coding schemes suffer from unpleasant reconstruction artifacts such as blocking and ringing. Existing learned neural approaches to video compression have achieved reasonable success at reducing the bit-rate for efficient transmission and at mitigating artifacts to an extent. However, they still tend to produce blurred results under extreme compression. In this paper, we present a deep adversarial learned video compression model that minimizes an auxiliary adversarial distortion objective. We find this adversarial objective to correlate better with human perceptual quality judgement than traditional quality metrics such as MS-SSIM and PSNR. Our experiments using a state-of-the-art learned video compression system demonstrate a reduction in perceptual artifacts and the recovery of detail that is otherwise lost, especially under extremely high compression.
摘要:本文提出一种新颖的对抗式有损视频压缩模型。在极低码率下,标准视频编码方案会出现块效应、振铃等令人不快的重建伪影。现有的基于学习的神经视频压缩方法在降低码率以实现高效传输方面取得了一定成功,并在一定程度上减轻了伪影的影响,但在极端压缩下仍倾向于产生模糊的结果。本文提出一种深度对抗学习的视频压缩模型,通过最小化一个辅助的对抗失真目标进行训练。我们发现,与 MS-SSIM、PSNR 等传统质量指标相比,该对抗目标与人类感知质量判断的相关性更好。基于最先进的学习式视频压缩系统的实验表明,该方法减少了感知伪影,并能重建尤其在极高压缩下丢失的细节。
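One way to picture the auxiliary adversarial distortion objective is as a discriminator term added to the usual rate-distortion loss. The frame-level PyTorch sketch below is a minimal illustration under that assumption; the discriminator, the trade-off weights lam and beta, and the precomputed rate term are placeholders, not the authors' model, which operates on video.

import torch
import torch.nn as nn

# Stand-in patch discriminator judging reconstructed frames.
disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),
)
bce = nn.BCEWithLogitsLoss()

def generator_loss(x, x_hat, rate, lam=0.01, beta=0.1):
    # Rate + pixel distortion + adversarial distortion on the reconstruction;
    # lam and beta are illustrative trade-off weights.
    dist = nn.functional.mse_loss(x_hat, x)
    d_fake = disc(x_hat)
    adv = bce(d_fake, torch.ones_like(d_fake))  # push reconstructions toward realism
    return rate + lam * dist + beta * adv

def discriminator_loss(x, x_hat):
    # Standard GAN update: real frames vs. detached reconstructions.
    d_real, d_fake = disc(x), disc(x_hat.detach())
    return bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

# Toy usage: a noisy copy stands in for the codec's reconstruction, and a
# constant stands in for the estimated bit-rate term.
x = torch.rand(1, 3, 64, 64)
x_hat = x + 0.05 * torch.randn_like(x)
g_loss = generator_loss(x, x_hat, rate=torch.tensor(0.2))
d_loss = discriminator_loss(x, x_hat)

The intuition matching the abstract is that the adversarial term penalizes the over-smoothed reconstructions that a pure MSE or MS-SSIM objective tends to produce at extreme compression.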
注:中文为机器翻译结果!