Table of Contents
2. PointIso: Point Cloud Based Deep Learning Model for Detecting Arbitrary-Precision Peptide Features in LC-MS Map through Attention Based Segmentation [PDF] Abstract
7. HGCN-GJS: Hierarchical Graph Convolutional Network with Groupwise Joint Sampling for Trajectory Prediction [PDF] Abstract
10. CSI2Image: Image Reconstruction from Channel State Information Using Generative Adversarial Networks [PDF] Abstract
14. FairCVtest Demo: Understanding Bias in Multimodal Learning with a Testbed in Fair Automatic Recruitment [PDF] Abstract
21. Multi-scale Attention U-Net (MsAUNet): A Modified U-Net Architecture for Scene Segmentation [PDF] Abstract
22. A Self Contour-based Rotation and Translation-Invariant Transformation for Point Clouds Recognition [PDF] Abstract
23. Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition [PDF] Abstract
26. Leveraging Domain Knowledge using Machine Learning for Image Compression in Internet-of-Things [PDF] Abstract
31. F3RNet: Full-Resolution Residual Registration Network for Multimodal Image Registration [PDF] Abstract
33. Multi-structure bone segmentation in pediatric MR images with combined regularization from shape priors and adversarial network [PDF] Abstract
34. RaLL: End-to-end Radar Localization on Lidar Map Using Differentiable Measurement Model [PDF] Abstract
39. Learning a Single Model with a Wide Range of Quality Factors for JPEG Image Artifacts Removal [PDF] Abstract
44. Qutrit-inspired Fully Self-supervised Shallow Quantum Learning Network for Brain Tumor Segmentation [PDF] Abstract
45. Simultaneous Denoising and Motion Estimation for Low-dose Gated PET using a Siamese Adversarial Network with Gate-to-Gate Consistency Learning [PDF] Abstract
47. Hold Tight and Never Let Go: Security of Deep Learning based Automated Lane Centering under Physical-World Attack [PDF] Abstract
Abstracts
1. Understanding Deformable Alignment in Video Super-Resolution [PDF] Back to contents
Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy
Abstract: Deformable convolution, originally proposed for the adaptation to geometric variations of objects, has recently shown compelling performance in aligning multiple frames and is increasingly adopted for video super-resolution. Despite its remarkable performance, its underlying mechanism for alignment remains unclear. In this study, we carefully investigate the relation between deformable alignment and the classic flow-based alignment. We show that deformable convolution can be decomposed into a combination of spatial warping and convolution. This decomposition reveals the commonality of deformable alignment and flow-based alignment in formulation, but with a key difference in their offset diversity. We further demonstrate through experiments that the increased diversity in deformable alignment yields better-aligned features, and hence significantly improves the quality of video super-resolution output. Based on our observations, we propose an offset-fidelity loss that guides the offset learning with optical flow. Experiments show that our loss successfully avoids the overflow of offsets and alleviates the instability problem of deformable alignment. Aside from the contributions to deformable alignment, our formulation inspires a more flexible approach to introduce offset diversity to flow-based alignment, improving its performance.
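The decomposition claimed in the abstract can be illustrated with a minimal 1D sketch. The abstract only states that deformable convolution splits into spatial warping followed by convolution; the 1D setting, the linear interpolation, the zero padding, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math


def sample_linear(signal, pos):
    """Linearly interpolate a 1D signal at a fractional position (zero outside)."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deformable_conv1d(signal, weights, offsets):
    """Direct form: y[p] = sum_k w[k] * x(p + p_k + offsets[k][p])."""
    half = len(weights) // 2
    out = []
    for p in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w * sample_linear(signal, p + (k - half) + offsets[k][p])
        out.append(acc)
    return out


def warp_then_conv1d(signal, weights, offsets):
    """Decomposed form: one spatial warp per kernel tap, then a weighted sum
    across the warped copies (the 1D analogue of a 1x1 convolution)."""
    half = len(weights) // 2
    warped = [
        [sample_linear(signal, p + (k - half) + offsets[k][p])
         for p in range(len(signal))]
        for k in range(len(weights))
    ]
    return [
        sum(w * warped[k][p] for k, w in enumerate(weights))
        for p in range(len(signal))
    ]
```

Each kernel tap carries its own offset field here, which is the "offset diversity" the abstract contrasts with flow-based alignment, where a single flow field warps the whole feature map.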
2. PointIso: Point Cloud Based Deep Learning Model for Detecting Arbitrary-Precision Peptide Features in LC-MS Map through Attention Based Segmentation [PDF] Back to contents
Fatema Tuz Zohora, M Ziaur Rahman, Ngoc Hieu Tran, Lei Xin, Baozhen Shan, Ming Li
Abstract: A promising technique of discovering disease biomarkers is to measure the relative protein abundance in multiple biofluid samples through liquid chromatography with tandem mass spectrometry (LC-MS/MS) based quantitative proteomics. The key step involves peptide feature detection in LC-MS map, along with its charge and intensity. Existing heuristic algorithms suffer from inaccurate parameters since different settings of the parameters result in significantly different outcomes. Therefore, we propose PointIso, to serve the necessity of an automated system for peptide feature detection that is able to find out the proper parameters itself, and is easily adaptable to different types of datasets. It consists of an attention based scanning step for segmenting the multi-isotopic pattern of peptide features along with charge and a sequence classification step for grouping those isotopes into potential peptide features. PointIso is the first point cloud based, arbitrary-precision deep learning network to address the problem and achieves 98% detection of high quality MS/MS identifications in a benchmark dataset, which is higher than several other widely used algorithms. Besides contributing to the proteomics study, we believe our novel segmentation technique should serve the general image processing domain as well.
3. 3D_DEN: Open-ended 3D Object Recognition using Dynamically Expandable Networks [PDF] Back to contents
Sudhakaran Jain, Hamidreza Kasaei
Abstract: Service robots, in general, have to work independently and adapt to the dynamic changes in the environment. One important aspect in such scenarios is to continually learn to recognize new objects when they become available. This combines two main research problems namely continual learning and 3D object recognition. Most of the existing research approaches include the use of deep Convolutional Neural Networks (CNNs) focusing on image datasets. A modified approach might be needed for continually learning 3D objects. A major concern in using CNNs is the problem of catastrophic forgetting when a model tries to learn new data. In spite of various recent proposed solutions to mitigate this problem, there still exist a few side-effects (such as time/computational complexity) of such solutions. We propose a model capable of learning 3D objects in an open-ended fashion by employing deep transfer learning-based approach combined with dynamically expandable layers, which also makes sure that these side-effects are minimized to a great extent. We show that this model sets a new state-of-the-art standard not only with regards to accuracy but also for computational complexity.
4. Switching Gradient Directions for Query-Efficient Black-Box Adversarial Attacks [PDF] Back to contents
Chen Ma, Shuyu Cheng, Li Chen, Junhai Yong
Abstract: We propose a simple and highly query-efficient black-box adversarial attack named SWITCH, which has a state-of-the-art performance under $\ell_2$ and $\ell_\infty$ norms in the score-based setting. In the black box attack setting, designing query-efficient attacks remains an open problem. The high query efficiency of the proposed approach stems from the combination of transfer-based attacks and random-search-based ones. The surrogate model's gradient $\hat{\mathbf{g}}$ is exploited for the guidance, which is then switched if our algorithm detects that it does not point to the adversarial region by using a query, thereby keeping the objective loss function of the target model rising as much as possible. Two switch operations are available, i.e., SWITCH$_\text{neg}$ and SWITCH$_\text{rnd}$. SWITCH$_\text{neg}$ takes $-\hat{\mathbf{g}}$ as the new direction, which is reasonable under an approximate local linearity assumption. SWITCH$_\text{rnd}$ computes the gradient from another model, which is randomly selected from a large model set, to help bypass the potential obstacle in optimization. Experimental results show that these strategies boost the optimization process whereas following the original surrogate gradients does not work. In SWITCH, no query is used to estimate the gradient, and all the queries aim to determine whether to switch directions, resulting in unprecedented query efficiency. We demonstrate that our approach outperforms 10 state-of-the-art attacks on CIFAR-10, CIFAR-100 and TinyImageNet datasets. SWITCH can serve as a strong baseline for future black-box attacks. The PyTorch source code is released in this https URL .
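A toy sketch of the SWITCH$_\text{neg}$ idea described above may help: follow the surrogate gradient, spend one query on the target model to check whether the loss rose, and take the negated direction if it did not. The function name `switch_neg_step`, the `query_loss` callback, and the plain list-of-floats representation are hypothetical, and norm projection and image clipping are omitted; this is not the released PyTorch code.

```python
def switch_neg_step(x, current_loss, surrogate_grad, query_loss, step):
    """One SWITCH_neg-style update on a list of floats (toy sketch).

    x              -- current point
    current_loss   -- target-model loss at x, known from the previous step
    surrogate_grad -- gradient of the loss taken from a surrogate model
    query_loss     -- callable issuing one query to the target model
    step           -- step size (assumed fixed here)
    """
    candidate = [xi + step * gi for xi, gi in zip(x, surrogate_grad)]
    # Single query: does the surrogate direction keep the target loss rising?
    if query_loss(candidate) >= current_loss:
        return candidate
    # It does not, so switch to the negated surrogate direction.
    return [xi - step * gi for xi, gi in zip(x, surrogate_grad)]
```

The query is used only to decide whether to switch, never to estimate a gradient, which matches the source of query efficiency the abstract names.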
5. ResNet-like Architecture with Low Hardware Requirements [PDF] Back to contents
Elena Limonova, Daniil Alfonso, Dmitry Nikolaev, Vladimir V. Arlazarov
Abstract: One of the most computationally intensive parts of modern recognition systems is the inference of deep neural networks that are used for image classification, segmentation, enhancement, and recognition. The growing popularity of edge computing makes us look for ways to reduce inference time on mobile and embedded devices. One way to decrease the neural network inference time is to modify the neuron model to make it more efficient for computations on a specific device. An example of such a model is the bipolar morphological neuron model. The bipolar morphological neuron is based on the idea of replacing multiplication with addition and maximum operations. This model has been demonstrated for simple image classification with LeNet-like architectures [1]. In the paper, we introduce a bipolar morphological ResNet (BM-ResNet) model obtained from a much more complex ResNet architecture by converting its layers to bipolar morphological ones. We apply BM-ResNet to image classification on MNIST and CIFAR-10 datasets with only a moderate accuracy decrease from 99.3% to 99.1% and from 85.3% to 85.1%. We also estimate the computational complexity of the resulting model. We show that for the majority of ResNet layers, the considered model requires 2.1-2.9 times fewer logic gates for implementation and 15-30% lower latency.
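The abstract says only that the bipolar morphological neuron replaces multiplication with addition and maximum. The sketch below follows one formulation of that idea, in which each of the four sign-split partial sums of a dot product is approximated by exp(max_j(ln x_j + ln w_j)); this formula, the helper names, and the handling of zeros are assumptions, not details given in the abstract.

```python
import math


def _log_or_neg_inf(value):
    """Natural log for positive values; -inf for zero so the term drops out."""
    return math.log(value) if value > 0.0 else -math.inf


def _max_plus_term(xs, ws):
    """Approximate sum_j xs[j] * ws[j] (all non-negative) by its largest
    product, computed with additions and one maximum in the log domain."""
    best = max(_log_or_neg_inf(x) + _log_or_neg_inf(w) for x, w in zip(xs, ws))
    return math.exp(best)


def bm_neuron(inputs, weights):
    """Bipolar morphological approximation of dot(inputs, weights).

    Inputs and weights are split into positive and negative parts, giving
    four non-negative partial sums that are each approximated separately.
    """
    x_pos = [max(x, 0.0) for x in inputs]
    x_neg = [max(-x, 0.0) for x in inputs]
    w_pos = [max(w, 0.0) for w in weights]
    w_neg = [max(-w, 0.0) for w in weights]
    return (_max_plus_term(x_pos, w_pos) - _max_plus_term(x_pos, w_neg)
            - _max_plus_term(x_neg, w_pos) + _max_plus_term(x_neg, w_neg))
```

The approximation is exact when one product dominates (for example, a single non-zero input) and lossy otherwise: inputs [1, 1] with weights [1, 1] give 1 rather than 2. In a hardware-oriented version the logarithms of the weights would presumably be stored ahead of time, so inference needs no multiplications.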
6. AMRNet: Chips Augmentation in Areial Images Object Detection [PDF] Back to contents
Zhiwei Wei, Chenzhen Duan
Abstract: Detecting objects in aerial images is a challenging task because 1) objects are often small and dense relative to the images, 2) object scale varies over a large range, and 3) the number of objects in different classes is imbalanced. Most current solutions adopt a cropping method: splitting high-resolution images into a series of subregions (chips) and running detection on them. However, few works notice that problems such as scale variation and object sparsity arise when a network is trained directly on chips. In this work, three augmentation methods are introduced. Specifically, we propose a scale-adaptive module compatible with all existing cropping methods. It dynamically adjusts the cropping size to balance the cover proportion between objects and chips, which narrows object scale variation in training and improves performance without bells and whistles. In addition, we introduce mosaic to effectively solve the object sparsity and background similarity problems in aerial datasets. To balance categories, we present mask resampling in chips, which provides higher-quality training samples. Our model achieves state-of-the-art performance on two popular aerial image datasets, VisDrone and UAVDT. Remarkably, all methods can be applied independently to detectors, steadily increasing performance without sacrificing inference efficiency.
7. HGCN-GJS: Hierarchical Graph Convolutional Network with Groupwise Joint Sampling for Trajectory Prediction [PDF] Back to contents
Yuying Chen, Congcong Liu, Bertram E. Shi, Ming Liu
Abstract: Accurate pedestrian trajectory prediction is of great importance for downstream tasks such as autonomous driving and mobile robot navigation. Fully investigating the social interactions within the crowd is crucial for accurate pedestrian trajectory prediction. However, most existing methods do not capture group-level interactions well, focusing only on pairwise interactions and neglecting group-wise interactions. In this work, we propose a hierarchical graph convolutional network, HGCN-GJS, for trajectory prediction that leverages group-level interactions within the crowd. Furthermore, we introduce a novel joint sampling scheme for modeling the joint distribution of multiple pedestrians in the future trajectories. Based on the group information, this scheme associates the trajectory of one person with the trajectory of other people in the group, but maintains the independence of the trajectories of outsiders. We demonstrate the performance of our network on several trajectory prediction datasets, achieving state-of-the-art results on all datasets considered.
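One simple way to obtain the behaviour described for the joint sampling scheme (members of a group are tied together, outsiders stay independent) is to draw a single latent noise vector per group and share it among the members. The abstract does not say how the scheme is implemented, so the shared-noise mechanism, the function name, and the Gaussian latent below are purely illustrative assumptions.

```python
import random


def sample_group_noise(group_ids, dim, rng):
    """Return one latent vector per pedestrian.

    Pedestrians with the same group id receive the same vector (correlated
    samples); pedestrians in different groups receive independent draws.
    """
    noise_by_group = {}
    latents = []
    for gid in group_ids:
        if gid not in noise_by_group:
            noise_by_group[gid] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        latents.append(list(noise_by_group[gid]))  # copy so callers can mutate
    return latents
```

For example, `sample_group_noise([0, 0, 1], 4, random.Random(0))` gives the first two pedestrians identical latents and the third an independent one; a decoder conditioned on these latents would then produce coordinated trajectories within the group.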
8. A Mobile App for Wound Localization using Deep Learning [PDF] Back to contents
D. M. Anisuzzaman, Yash Patel, Jeffrey Niezgoda, Sandeep Gopalakrishnan, Zeyun Yu
Abstract: We present an automated wound localizer for 2D wound and ulcer images using a deep neural network, as the first step towards building an automated and complete wound diagnostic system. The wound localizer was developed using the YOLOv3 model and then turned into an iOS mobile application. The developed localizer can detect the wound and its surrounding tissues and isolate the localized wound region from images, which should be very helpful for later processing such as wound segmentation and classification because unnecessary regions are removed from the wound images. For mobile app development with video processing, a lighter version of YOLOv3 named tiny-YOLOv3 is used. The model is trained and tested on our own image dataset in collaboration with the AZH Wound and Vascular Center, Milwaukee, Wisconsin. The YOLOv3 model is compared with the SSD model, showing that YOLOv3 gives a mAP value of 93.9%, which is much better than the SSD model (86.4%). The robustness and reliability of these models are also tested on a publicly available dataset named Medetec, where they show very good performance as well.
9. Polyp-artifact relationship analysis using graph inductive learned representations [PDF] Back to contents
Roger D. Soberanis-Mukul, Shadi Albarqouni, Nassir Navab
Abstract: The diagnosis process of colorectal cancer mainly focuses on the localization and characterization of abnormal growths in the colon tissue known as polyps. Despite recent advances in deep object localization, the localization of polyps remains challenging due to the similarities between tissues, and the high level of artifacts. Recent studies have shown the negative impact of the presence of artifacts in the polyp detection task, and have started to take them into account within the training process. However, the use of prior knowledge related to the spatial interaction of polyps and artifacts has not yet been considered. In this work, we incorporate artifact knowledge in a post-processing step. Our method models this task as an inductive graph representation learning problem, and is composed of training and inference steps. Detected bounding boxes around polyps and artifacts are considered as nodes connected by a defined criterion. The training step generates a node classifier with ground truth bounding boxes. In inference, we use this classifier to analyze a second graph, generated from artifact and polyp predictions given by region proposal networks. We evaluate how the choices in the connectivity and artifacts affect the performance of our method and show that it has the potential to reduce the false positives in the results of a region proposal network.
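The graph construction step can be sketched as follows: each detected bounding box becomes a node, and two nodes are connected when they satisfy some criterion. The abstract leaves the criterion unspecified ("a defined criterion"), so the intersection-over-union threshold used here, the `(x_min, y_min, x_max, y_max)` box layout, and the function names are assumptions chosen only to make the idea concrete.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def build_edges(boxes, threshold):
    """Connect every pair of boxes whose IoU exceeds the threshold.

    Returns undirected edges as (i, j) index pairs with i < j.
    """
    return [
        (i, j)
        for i in range(len(boxes))
        for j in range(i + 1, len(boxes))
        if iou(boxes[i], boxes[j]) > threshold
    ]
```

With node features attached (for instance the detection class, polyp or artifact, and its confidence), such a graph could be passed to an inductive node classifier at inference time, as the abstract describes.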
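The graph construction described in the abstract (bounding boxes as nodes, linked by a defined criterion) can be illustrated with a minimal sketch. The abstract does not state which criterion the authors use, so the center-distance threshold below is an assumption, and the names `box_center`, `build_box_graph`, and `max_center_dist` are hypothetical.

```python
from itertools import combinations
from math import hypot


def box_center(box):
    """Return the (x, y) center of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def build_box_graph(detections, max_center_dist):
    """Link every pair of boxes whose centers lie within max_center_dist.

    The distance threshold is an illustrative connectivity criterion,
    not necessarily the one used in the paper.
    """
    centers = [box_center(d["box"]) for d in detections]
    adjacency = {i: set() for i in range(len(detections))}
    for i, j in combinations(range(len(detections)), 2):
        dist = hypot(centers[i][0] - centers[j][0], centers[i][1] - centers[j][1])
        if dist <= max_center_dist:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Hypothetical detections: one polyp box and two artifact boxes.
detections = [
    {"label": "polyp", "box": (10, 10, 50, 50)},
    {"label": "artifact", "box": (40, 40, 80, 80)},
    {"label": "artifact", "box": (200, 200, 240, 240)},
]
graph = build_box_graph(detections, max_center_dist=60.0)
```

Here the polyp and the nearby artifact become neighbors, while the distant artifact stays isolated. A node classifier would then operate on each node together with its neighborhood.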
10. CSI2Image: Image Reconstruction from Channel State Information Using Generative Adversarial Networks
Sorachi Kato, Takeru Fukushima, Tomoki Murakami, Hirantha Abeysekera, Yusuke Iwasaki, Takuya Fujihashi, Takashi Watanabe, Shunsuke Saruwatari
Abstract: This study aims to find the upper limit of the wireless sensing capability of acquiring physical space information. This is a challenging objective, because at present, wireless sensing studies continue to succeed in acquiring novel phenomena. Thus, although a complete answer cannot be obtained yet, a step is taken towards it here. To achieve this, CSI2Image, a novel channel-state-information (CSI)-to-image conversion method based on generative adversarial networks (GANs), is proposed. The type of physical information acquired using wireless sensing can be estimated by checking whether the reconstructed image captures the desired physical space information. Three types of learning methods are demonstrated: generator-only learning, GAN-only learning, and hybrid learning. Evaluating the performance of CSI2Image is difficult, because both the clarity of the image and the presence of the desired physical space information must be evaluated. To solve this problem, a quantitative evaluation methodology using an object detection library is also proposed. CSI2Image was implemented using IEEE 802.11ac compressed CSI, and the evaluation results show that the image was successfully reconstructed. The results demonstrate that generator-only learning is sufficient for simple wireless sensing problems, but in complex wireless sensing problems, GANs are important for reconstructing generalized images with more accurate physical space information.
11. Old Photo Restoration via Deep Latent Space Translation
Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen
Abstract: We propose to restore old photos that suffer from severe degradation through a deep learning approach. Unlike conventional restoration tasks that can be solved through supervised learning, the degradation in real photos is complex and the domain gap between synthetic images and real old photos makes the network fail to generalize. Therefore, we propose a novel triplet domain translation network by leveraging real photos along with massive synthetic image pairs. Specifically, we train two variational autoencoders (VAEs) to respectively transform old photos and clean photos into two latent spaces. The translation between these two latent spaces is learned with synthetic paired data. This translation generalizes well to real photos because the domain gap is closed in the compact latent space. Besides, to address multiple degradations mixed in one old photo, we design a global branch with a partial nonlocal block targeting the structured defects, such as scratches and dust spots, and a local branch targeting the unstructured defects, such as noise and blurriness. The two branches are fused in the latent space, leading to improved capability to restore old photos from multiple defects. Furthermore, we apply another face refinement network to recover fine details of faces in the old photos, thus ultimately generating photos with enhanced perceptual quality. With comprehensive experiments, the proposed pipeline demonstrates superior performance over state-of-the-art methods as well as existing commercial tools in terms of visual quality for old photo restoration.
12. TCDesc: Learning Topology Consistent Descriptors for Image Matching
Honghu Pan, Fanyang Meng, Nana Fan, Zhenyu He
Abstract: The constraint of neighborhood consistency or local consistency is widely used for robust image matching. In this paper, we focus on learning neighborhood topology consistent descriptors (TCDesc), while former works of learning descriptors, such as HardNet and DSM, only consider point-to-point Euclidean distance among descriptors and totally neglect neighborhood information of descriptors. To learn topology consistent descriptors, first we propose the linear combination weights to depict the topological relationship between center descriptor and its kNN descriptors, where the difference between center descriptor and the linear combination of its kNN descriptors is minimized. Then we propose the global mapping function which maps the local linear combination weights to the global topology vector and define the topology distance of matching descriptors as l1 distance between their topology vectors. Last we employ adaptive weighting strategy to jointly minimize topology distance and Euclidean distance, which automatically adjust the weight or attention of two distances in triplet loss. Our method has the following two advantages: (1) We are the first to consider neighborhood information of descriptors, while former works mainly focus on neighborhood consistency of feature points; (2) Our method can be applied in any former work of learning descriptors by triplet loss. Experimental results verify the generalization of our method: We can improve the performances of both HardNet and DSM on several benchmarks.
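The two ingredients named in the abstract, linear combination weights over a descriptor's kNN and an l1 distance between topology vectors, can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the weights are obtained by ordinary least squares, the "global mapping" is assumed to scatter the local weights into a vector indexed by all descriptors in the batch, and the function names are hypothetical.

```python
import numpy as np


def knn_indices(descs, i, k):
    """Indices of the k nearest descriptors to descs[i] (Euclidean), excluding i."""
    dists = np.linalg.norm(descs - descs[i], axis=1)
    dists[i] = np.inf  # never pick the center descriptor itself
    return np.argsort(dists)[:k]


def topology_vector(descs, i, k):
    """Scatter least-squares kNN reconstruction weights into a batch-sized vector.

    Solves min_w || descs[i] - sum_j w_j * descs[nbrs[j]] ||_2 and places w at
    the neighbor positions (an assumed form of the global mapping).
    """
    nbrs = knn_indices(descs, i, k)
    weights, *_ = np.linalg.lstsq(descs[nbrs].T, descs[i], rcond=None)
    vec = np.zeros(len(descs))
    vec[nbrs] = weights
    return vec


def topology_distance(descs_a, descs_b, i, k):
    """l1 distance between the topology vectors of the i-th matching pair."""
    diff = topology_vector(descs_a, i, k) - topology_vector(descs_b, i, k)
    return float(np.abs(diff).sum())


# Hypothetical batch: 8 descriptors of dimension 16.
rng = np.random.default_rng(0)
descs = rng.normal(size=(8, 16))
vec = topology_vector(descs, 0, 3)
```

When two matched descriptor sets share the same neighborhood structure, their topology vectors coincide and the distance is zero. The abstract states that this term is minimized jointly with the Euclidean distance in a triplet loss, using adaptive weighting.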
13. SA-Net: A deep spectral analysis network for image clustering
Jinghua Wang, Jianmin Jiang
Abstract: Although supervised deep representation learning has attracted enormous attentions across areas of pattern recognition and computer vision, little progress has been made towards unsupervised deep representation learning for image clustering. In this paper, we propose a deep spectral analysis network for unsupervised representation learning and image clustering. While spectral analysis is established with solid theoretical foundations and has been widely applied to unsupervised data mining, its essential weakness lies in the fact that it is difficult to construct a proper affinity matrix and determine the involving Laplacian matrix for a given dataset. In this paper, we propose a SA-Net to overcome these weaknesses and achieve improved image clustering by extending the spectral analysis procedure into a deep learning framework with multiple layers. The SA-Net has the capability to learn deep representations and reveal deep correlations among data samples. Compared with the existing spectral analysis, the SA-Net achieves two advantages: (i) Given the fact that one spectral analysis procedure can only deal with one subset of the given dataset, our proposed SA-Net elegantly integrates multiple parallel and consecutive spectral analysis procedures together to enable interactive learning across different units towards a coordinated clustering model; (ii) Our SA-Net can identify the local similarities among different images at patch level and hence achieves a higher level of robustness against occlusions. Extensive experiments on a number of popular datasets support that our proposed SA-Net outperforms 11 benchmarks across a number of image clustering applications.
14. FairCVtest Demo: Understanding Bias in Multimodal Learning with a Testbed in Fair Automatic Recruitment
Alejandro Peña, Ignacio Serna, Aythami Morales, Julian Fierrez
Abstract: With the aim of studying how current multimodal AI algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data, this demonstrator experiments over an automated recruitment testbed based on Curriculum Vitae: FairCVtest. The presence of decision-making algorithms in society is rapidly increasing nowadays, while concerns about their transparency and the possibility of these algorithms becoming new sources of discrimination are arising. This demo shows the capacity of the Artificial Intelligence (AI) behind a recruitment tool to extract sensitive information from unstructured data, and exploit it in combination with data biases in undesirable (unfair) ways. Additionally, the demo includes a new algorithm (SensitiveNets) for discrimination-aware learning which eliminates sensitive information in our multimodal AI framework.
15. Decision-based Universal Adversarial Attack
Jing Wu, Mingyi Zhou, Shuaicheng Liu, Yipeng Liu, Ce Zhu
Abstract: A single perturbation can cause most natural images to be misclassified by classifiers. In the black-box setting, current universal adversarial attack methods utilize substitute models to generate the perturbation, then apply the perturbation to the attacked model. However, this transfer often produces inferior results. In this study, we work directly in the black-box setting to generate the universal adversarial perturbation. Besides, we aim to design an adversary that generates a single perturbation with a stripe-like texture based on an orthogonal matrix, as the top convolutional layers are sensitive to stripes. To this end, we propose an efficient Decision-based Universal Attack (DUAttack). With little data, the proposed adversary computes the perturbation based solely on the final inferred labels, yet good transferability has been realized not only across models but also across different vision tasks. The effectiveness of DUAttack is validated through comparisons with other state-of-the-art attacks. The efficiency of DUAttack is also demonstrated in real-world settings including Microsoft Azure. In addition, several representative defense methods struggle against DUAttack, indicating the practicability of the proposed method.
16. Group-Level Emotion Recognition Using a Unimodal Privacy-Safe Non-Individual Approach
Anastasia Petrova, Dominique Vaufreydaz, Philippe Dessus
Abstract: This article presents our unimodal privacy-safe and non-individual proposal for the audio-video group emotion recognition subtask at the Emotion Recognition in the Wild (EmotiW) Challenge 2020. This sub-challenge aims to classify in-the-wild videos into three categories: Positive, Neutral and Negative. Recent deep learning models have shown tremendous advances in analyzing interactions between people, predicting human behavior and affective evaluation. Nonetheless, their performance comes from individual-based analysis, which means summing up and averaging scores from individual detections, and this inevitably leads to some privacy issues. In this research, we investigated a frugal approach towards a model able to capture the global moods from the whole image without using face or pose detection, or any individual-based feature as input. The proposed methodology mixes state-of-the-art and dedicated synthetic corpora as training sources. With an in-depth exploration of neural network architectures for group-level emotion recognition, we built a VGG-based model achieving 59.13% accuracy on the VGAF test set (eleventh place in the challenge). Given that the analysis is unimodal, based only on global features, and that the performance is evaluated on a real-world dataset, these results are promising and let us envision extending this model to multimodality for classroom ambiance evaluation, our final target application.
17. Promoting Connectivity of Network-Like Structures by Enforcing Region Separation
Doruk Oner, Mateusz Koziński, Leonardo Citraro, Nathan C. Dadap, Alexandra G. Konings, Pascal Fua
Abstract: We propose a novel, connectivity-oriented loss function for training deep convolutional networks to reconstruct network-like structures, like roads and irrigation canals, from aerial images. The main idea behind our loss is to express the connectivity of roads, or canals, in terms of disconnections that they create between background regions of the image. In simple terms, a gap in the predicted road causes two background regions, that lie on the opposite sides of a ground truth road, to touch in prediction. Our loss function is designed to prevent such unwanted connections between background regions, and therefore close the gaps in predicted roads. It also prevents predicting false positive roads and canals by penalizing unwarranted disconnections of background regions. In order to capture even short, dead-ending road segments, we evaluate the loss in small image crops. We show, in experiments on two standard road benchmarks and a new data set of irrigation canals, that convnets trained with our loss function recover road connectivity so well, that it suffices to skeletonize their output to produce state of the art maps. A distinct advantage of our approach is that the loss can be plugged in to any existing training setup without further modifications.
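The central observation of the abstract, that a gap in a predicted road lets two background regions touch, can be checked on a toy mask. The sketch below only counts 4-connected background regions with a breadth-first flood fill. It is not the paper's loss function, and the name `count_background_regions` and the toy masks are illustrative assumptions.

```python
from collections import deque


def count_background_regions(mask):
    """Count 4-connected regions of background pixels (0) in a binary mask (1 = road)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 0 or seen[r][c]:
                continue
            regions += 1  # found an unvisited background pixel: new region
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


# A vertical road splits the background into a left and a right region.
ground_truth = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
# A one-pixel gap in the predicted road merges the two regions.
prediction_with_gap = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
gt_regions = count_background_regions(ground_truth)
pred_regions = count_background_regions(prediction_with_gap)
```

The ground truth has two background regions and the gapped prediction has one. A connectivity-oriented loss penalizes this kind of unwanted merge, which is why closing the gap is rewarded.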
18. Optimal Use of Multi-spectral Satellite Data with Convolutional Neural Networks [PDF]
Sagar Vaze, James Foley, Mohamed Seddiq, Alexey Unagaev, Natalia Efremova
Abstract: The analysis of satellite imagery will prove a crucial tool in the pursuit of sustainable development. While Convolutional Neural Networks (CNNs) have made large gains in natural image analysis, their application to multi-spectral satellite images (wherein input images have a large number of channels) remains relatively unexplored. In this paper, we compare different methods of leveraging multi-band information with CNNs, demonstrating the performance of all compared methods on the task of semantic segmentation of agricultural vegetation (vineyards). We show that the standard industry practice of using bands selected by a domain expert leads to significantly worse test accuracy than the other methods compared. Specifically, we compare: using bands specified by an expert; using all available bands; learning attention maps over the input bands; and leveraging Bayesian optimisation to dictate band choice. We show that simply using all available band information already increases test-time performance, and that Bayesian optimisation, first applied to band selection in this work, can be used to further boost accuracy.
Summary: The paper compares four ways of feeding multi-spectral satellite bands to CNNs for semantic segmentation of vineyards: expert-selected bands, all available bands, learned attention maps over the input bands, and band selection by Bayesian optimisation. Expert-selected bands give significantly worse test accuracy than the other methods; using all bands already helps, and Bayesian optimisation improves accuracy further.
19. Gravitational Models Explain Shifts on Human Visual Attention [PDF]
Dario Zanca, Marco Gori, Stefano Melacci, Alessandra Rufa
Abstract: Visual attention refers to the human brain's ability to select relevant sensory information for preferential processing, improving performance in visual and cognitive tasks. It proceeds in two phases: one in which visual feature maps are acquired and processed in parallel, and another in which the information from these maps is merged in order to select a single location to be attended for further and more complex computations and reasoning. Its computational description is challenging, especially if the temporal dynamics of the process are taken into account. Numerous methods to estimate saliency have been proposed in the last three decades. They achieve almost perfect performance in estimating saliency at the pixel level, but the way they generate shifts in visual attention fully depends on winner-take-all (WTA) circuitry. WTA is implemented by the biological hardware in order to select a location with maximum saliency, towards which to direct overt attention. In this paper we propose a gravitational model (GRAV) to describe the attentional shifts. Every single feature acts as an attractor, and the shifts are the result of the joint effects of the attractors. In the current framework, the assumption of a single, centralized saliency map is no longer necessary, though still plausible. Quantitative results on two large image datasets show that this model predicts shifts more accurately than winner-take-all.
Summary: Existing saliency models estimate pixel-level saliency almost perfectly but rely on winner-take-all (WTA) circuitry to generate shifts of visual attention. The paper proposes a gravitational model (GRAV) in which every feature acts as an attractor and attentional shifts result from the joint effect of all attractors, so a single centralized saliency map is no longer required. On two large image datasets the model predicts shifts more accurately than winner-take-all.
20. 360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales [PDF]
Ashesh Mishra, Hsuan-Tien Lin
Abstract: Gaze estimation involves predicting where a person is looking, given either a single input image or a sequence of images. One challenging task, gaze estimation in the wild, concerns data collected in unconstrained environments with varying camera-person distances, like the Gaze360 dataset. The varying distances result in varying face sizes in the images, which makes it hard for current CNN backbones to estimate the gaze robustly. Inspired by our natural skill to identify the gaze by taking a focused look at the face area, we propose a novel architecture that similarly zooms in on the face area of the image at multiple scales to improve prediction accuracy. Another challenging task, 360-degree gaze estimation (also introduced by the Gaze360 dataset), consists of estimating not only the forward gazes, but also the backward ones. The backward gazes introduce a discontinuity in the yaw angle values of the gaze, so that deep learning models incur very large losses around the points of discontinuity. We propose to convert the angle values by a sine-cosine transform to avoid the discontinuity and better represent the physical meaning of the yaw angle. We conduct ablation studies on both ideas, the novel architecture and the transform, to validate their effectiveness. The two ideas allow our proposed model to achieve state-of-the-art performance on both the Gaze360 dataset and the RT-Gene dataset when using single images. Furthermore, we extend the model to a sequential version that systematically zooms in on a given sequence of images. The sequential version again achieves state-of-the-art performance on the Gaze360 dataset, which further demonstrates the usefulness of our proposed ideas.
Summary: The paper addresses two difficulties of gaze estimation in the wild on data such as Gaze360. To handle varying face sizes, it proposes an architecture that zooms in on the face area at multiple scales; to handle the yaw-angle discontinuity caused by backward gazes, it encodes angles with a sine-cosine transform. Ablation studies support both ideas, and the resulting model reaches state-of-the-art performance on Gaze360 and RT-Gene with single images, and on Gaze360 with a sequential version.
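The effect of the sine-cosine transform on the yaw discontinuity can be checked with a few lines of arithmetic. This is a minimal sketch, assuming yaw is measured in radians in (-pi, pi] with the wrap-around at +/-180 degrees; the function names are hypothetical, and the abstract does not specify the exact encoding or loss the authors use.

```python
import math


def encode_yaw(yaw):
    """Map a yaw angle (radians) to a point on the unit circle."""
    return math.sin(yaw), math.cos(yaw)


def decode_yaw(sin_value, cos_value):
    """Recover the yaw angle in (-pi, pi] from its sine-cosine encoding."""
    return math.atan2(sin_value, cos_value)


def encoded_distance(yaw_a, yaw_b):
    """Euclidean distance between two yaw angles in the encoded space."""
    sin_a, cos_a = encode_yaw(yaw_a)
    sin_b, cos_b = encode_yaw(yaw_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Two backward gazes that are physically only 2 degrees apart.
yaw_a = math.radians(179.0)
yaw_b = math.radians(-179.0)

raw_gap = abs(yaw_a - yaw_b)                  # close to 2*pi: a huge error
encoded_gap = encoded_distance(yaw_a, yaw_b)  # small, matching the geometry
```

Regressing the raw angle would treat these two nearly identical gazes as almost a full turn apart, whereas the encoded targets stay close, so a loss on the encoded values remains small near the wrap-around.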
21. Multi-scale Attention U-Net (MsAUNet): A Modified U-Net Architecture for Scene Segmentation [PDF]
Soham Chattopadhyay, Hritam Basak
Abstract: Despite the growing success of convolutional neural networks (CNNs) in recent years in the task of scene segmentation, the standard models lack some important features, which might result in sub-optimal segmentation outputs. The widely used encoder-decoder architecture extracts and uses several redundant and low-level features at different steps and different scales. Also, these networks fail to map the long-range dependencies of local features, which results in discriminative feature maps corresponding to each semantic class in the resulting segmented image. In this paper, we propose a novel multi-scale attention network for scene segmentation that uses the rich contextual information from an image. Unlike the original UNet architecture, we use attention gates that take the features from the encoder and the output of the pyramid pool as input; the produced output is further concatenated with the up-sampled output of the previous pyramid-pool layer and mapped to the next layer. This network can map local features to their global counterparts with improved accuracy and emphasize discriminative image regions by focusing on relevant local features only. We also propose a compound loss function by optimizing the IoU loss and fusing Dice loss and weighted cross-entropy loss with it to achieve an optimal solution at a faster convergence rate. We evaluated our model on two standard datasets, PascalVOC2012 and ADE20k, achieved mean IoU of 79.88% and 44.88% on the two datasets respectively, and compared our results with widely known models to demonstrate the superiority of our model over them.
Summary: MsAUNet is a multi-scale attention variant of U-Net for scene segmentation. Attention gates combine encoder features with pyramid-pool outputs, and the result is concatenated with the up-sampled output of the previous pyramid-pool layer. Training uses a compound loss that fuses IoU loss, Dice loss, and weighted cross-entropy loss. The model reaches mean IoU of 79.88% on PascalVOC2012 and 44.88% on ADE20k.
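A compound loss of the kind described, combining IoU, Dice, and weighted cross-entropy terms, might look like the sketch below. The abstract does not give the exact formulation, so everything here is an assumption: the binary single-class setting, the soft (probability-based) IoU and Dice definitions, the unweighted sum of the three terms, and all function and parameter names.

```python
import math

EPS = 1e-7  # numerical guard against division by zero and log(0)


def soft_iou_loss(probs, targets):
    """1 - soft intersection-over-union for flat lists of probabilities."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - inter
    return 1.0 - (inter + EPS) / (union + EPS)


def dice_loss(probs, targets):
    """1 - soft Dice coefficient."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + EPS) / (sum(probs) + sum(targets) + EPS)


def weighted_cross_entropy(probs, targets, pos_weight=1.0):
    """Binary cross-entropy with an extra weight on the positive class."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)  # clamp away from 0 and 1
        total -= pos_weight * t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def compound_loss(probs, targets, pos_weight=1.0):
    """Assumed fusion: plain sum of the three terms."""
    return (
        soft_iou_loss(probs, targets)
        + dice_loss(probs, targets)
        + weighted_cross_entropy(probs, targets, pos_weight)
    )


targets = [1.0, 1.0, 0.0, 0.0]
good_prediction = [0.9, 0.8, 0.1, 0.2]
bad_prediction = [0.2, 0.1, 0.9, 0.8]

good_loss = compound_loss(good_prediction, targets)
bad_loss = compound_loss(bad_prediction, targets)
```

The overlap terms (IoU and Dice) reward region-level agreement, while the cross-entropy term supplies a per-pixel signal; a real implementation would operate on tensors per class and would likely tune the relative weights of the terms.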
22. A Self Contour-based Rotation and Translation-Invariant Transformation for Point Clouds Recognition [PDF]
Dongrui Liu, Chuanchuan Chen, Changqing Xu, Qi Cai, Lei Chu, Robert Caiming Qiu
Abstract: Recently, several models that process point clouds directly have achieved state-of-the-art performance on classification and segmentation tasks. However, these methods lack rotation robustness, and their performance degrades severely under random rotations, so they fail to extend to real-world applications with varying orientations. To address this problem, we propose a method named Self Contour-based Transformation (SCT), which can be flexibly integrated into a variety of existing point cloud recognition models to handle arbitrary rotations without any extra modifications. The SCT provides efficient and mathematically proven rotation and translation invariance by introducing a Rotation and Translation-Invariant Transformation. It linearly transforms Cartesian coordinates of points to self contour-based rotation-invariant representations while maintaining the global geometric structure. Moreover, to enhance discriminative feature extraction, a Frame Alignment module is further introduced, aiming to capture contours and transform self contour-based frames to the intra-class frame. Extensive experimental results and mathematical analyses show that the proposed method outperforms state-of-the-art approaches under arbitrary rotations, without any rotation augmentation, on standard benchmarks including ModelNet40, ScanObjectNN and ShapeNet.
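Summary: Point cloud models that operate directly on points degrade severely under random rotations. The paper proposes Self Contour-based Transformation (SCT), which maps Cartesian coordinates to self contour-based, rotation- and translation-invariant representations while preserving global geometric structure, and which can be integrated into existing recognition models without extra modifications. A Frame Alignment module further aligns self contour-based frames to an intra-class frame. The method outperforms state-of-the-art approaches under arbitrary rotations on ModelNet40, ScanObjectNN and ShapeNet without rotation augmentation.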
23. Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition [PDF]
Haisheng Su, Jing Su, Dongliang Wang, Weihao Gan, Wei Wu, Mengmeng Wang, Junjie Yan, Yu Qiao
Abstract: Recent years have witnessed significant progress on the action recognition task with deep networks. However, most current video networks require large memory and computational resources, which hinders their application in practice. Existing knowledge distillation methods are limited to the image-level spatial domain, ignoring the temporal and frequency information, which provides structural knowledge and is important for video analysis. This paper explores how to train small and efficient networks for action recognition. Specifically, we propose two distillation strategies in the frequency domain, namely feature spectrum distillation and parameter distribution distillation. Our insight is that appealing action recognition performance requires explicitly modeling the temporal frequency spectrum of video features. Therefore, we introduce a spectrum loss that forces the student network to mimic the temporal frequency spectrum of the teacher network, instead of implicitly distilling features as many previous works do. Second, the parameter frequency distribution is further adopted to guide the student network to learn the appearance modeling process from the teacher. In addition, a collaborative learning strategy is presented to optimize the training process from a probabilistic view. Extensive experiments are conducted on several action recognition benchmarks, such as Kinetics, Something-Something, and Jester, which consistently verify the effectiveness of our approach and demonstrate that our method can achieve higher performance than state-of-the-art methods with the same backbone.
Summary: To train small, efficient video networks for action recognition, the paper proposes two distillation strategies in the frequency domain. A spectrum loss makes the student mimic the temporal frequency spectrum of the teacher's features, and a parameter frequency distribution guides the student in learning the teacher's appearance modeling. A collaborative learning strategy optimizes training from a probabilistic view. Experiments on Kinetics, Something-Something, and Jester show higher performance than state-of-the-art methods with the same backbone.
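The idea of matching temporal frequency spectra can be sketched for a single feature channel observed over a few frames. This is an illustration under assumptions, not the authors' spectrum loss: it compares DFT magnitudes with a mean squared error, and the function names, the use of magnitudes only, and the toy signals are all hypothetical.

```python
import cmath


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coefficient = sum(
            value * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, value in enumerate(signal)
        )
        spectrum.append(abs(coefficient))
    return spectrum


def spectrum_loss(student_feature, teacher_feature):
    """Mean squared error between temporal magnitude spectra (assumed form)."""
    student = magnitude_spectrum(student_feature)
    teacher = magnitude_spectrum(teacher_feature)
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)


# One feature channel over 8 frames: the teacher oscillates, the student is flat.
teacher_feature = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
flat_student = [0.0] * 8

matched_loss = spectrum_loss(teacher_feature, teacher_feature)
mismatched_loss = spectrum_loss(flat_student, teacher_feature)
```

A student whose features carry no temporal variation is penalized even though no frame-by-frame feature comparison is made, which is the sense in which the temporal structure is modeled explicitly. In practice an FFT over the time axis of a feature tensor would replace this quadratic-time DFT.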
24. 3DPVNet: Patch-level 3D Hough Voting Network for 6D Pose Estimation [PDF]
Yuanpeng Liu, Jun Zhou, Yuqi Zhang, Chao Ding, Jun Wang
Abstract: In this paper, we focus on estimating the 6D pose of objects in point clouds. Although the topic has been widely studied, pose estimation in point clouds remains a challenging problem due to noise and occlusion. To address the problem, a novel 3DPVNet is presented in this work, which utilizes 3D local patches to vote for the object 6D poses. 3DPVNet comprises three modules. In particular, a Patch Unification (PU) module is first introduced to normalize the input patch and to create a standard local coordinate frame on it to generate a reliable vote. We then devise a Weight-guided Neighboring Feature Fusion (WNFF) module in the network, which fuses the neighboring features to yield a semi-global feature for the center patch. The WNFF module mines the neighboring information of a local patch, such that the capability to represent local geometric characteristics is significantly enhanced, making the method robust to a certain level of noise. Moreover, we present a Patch-level Voting (PV) module to regress transformations and generate pose votes. After the aggregation of all votes from patches and a refinement step, the final pose of the object can be obtained. Compared to recent voting-based methods, 3DPVNet is patch-level and is carried out directly on point clouds. Therefore, 3DPVNet requires less computation than point/pixel-level voting schemes and is robust to partial data. Experiments on several datasets demonstrate that 3DPVNet achieves state-of-the-art performance and is also robust against noise and occlusions.
Summary: 3DPVNet estimates the 6D pose of objects in point clouds by letting 3D local patches vote for the pose. A Patch Unification (PU) module normalizes each patch and builds a standard local coordinate frame, a Weight-guided Neighboring Feature Fusion (WNFF) module fuses neighboring features into a semi-global feature, and a Patch-level Voting (PV) module regresses transformations and generates pose votes, which are aggregated and refined into the final pose. Patch-level voting requires less computation than point/pixel-level voting, and the method is robust to noise, occlusion, and partial data.
25. Attention-SLAM: A Visual Monocular SLAM Learning from Human Gaze [PDF]
Jinquan Li, Ling Pei, Danping Zou, Songpengcheng Xia, Qi Wu, Tao Li, Zhen Sun, Wenxian Yu
Abstract: This paper proposes a novel simultaneous localization and mapping (SLAM) approach, namely Attention-SLAM, which simulates the human navigation mode by combining a visual saliency model (SalNavNet) with traditional monocular visual SLAM. Most SLAM methods treat all the features extracted from the images as equally important during the optimization process. However, the salient feature points in scenes have a more significant influence during the human navigation process. Therefore, we first propose a visual saliency model called SalNavNet, in which we introduce a correlation module and propose an adaptive Exponential Moving Average (EMA) module. These modules mitigate the center bias to enable the saliency maps generated by SalNavNet to pay more attention to the same salient object. Moreover, the saliency maps simulate human behavior for the refinement of SLAM results. The feature points extracted from the salient regions have greater importance in the optimization process. We add semantic saliency information to the Euroc dataset to generate an open-source saliency SLAM dataset. Comprehensive test results show that Attention-SLAM outperforms benchmarks such as Direct Sparse Odometry (DSO), ORB-SLAM, and Salient DSO in terms of efficiency, accuracy, and robustness in most test cases.
Summary: Attention-SLAM combines a visual saliency model, SalNavNet, with traditional monocular visual SLAM so that feature points from salient regions carry greater weight during optimization. SalNavNet adds a correlation module and an adaptive Exponential Moving Average (EMA) module to mitigate center bias and keep attention on the same salient object. The authors add semantic saliency information to the Euroc dataset to create an open-source saliency SLAM dataset, and report that Attention-SLAM outperforms DSO, ORB-SLAM, and Salient DSO in efficiency, accuracy, and robustness in most test cases.
26. Leveraging Domain Knowledge using Machine Learning for Image Compression in Internet-of-Things [PDF] Back to contents
Prabuddha Chakraborty, Jonathan Cruz, Swarup Bhunia
Abstract: The emergent ecosystems of intelligent edge devices in diverse Internet of Things (IoT) applications, from automatic surveillance to precision agriculture, increasingly rely on recording and processing a variety of image data. Due to resource constraints, e.g., energy and communication bandwidth requirements, these applications require compressing the recorded images before transmission. For these applications, image compression commonly requires: (1) maintaining features for coarse-grained pattern recognition instead of the high-level details for human perception, due to machine-to-machine communications; (2) a high compression ratio that leads to improved energy and transmission efficiency; (3) a large dynamic range of compression and an easy trade-off between compression factor and quality of reconstruction to accommodate a wide diversity of IoT applications as well as their time-varying energy/performance needs. To address these requirements, we propose MAGIC, a novel machine learning (ML) guided image compression framework that judiciously sacrifices visual quality to achieve much higher compression than traditional techniques, while maintaining accuracy for coarse-grained vision tasks. The central idea is to capture application-specific domain knowledge and efficiently utilize it in achieving high compression. We demonstrate that the MAGIC framework is configurable across a wide range of compression/quality and is capable of compressing beyond the standard quality factor limits of both JPEG 2000 and WebP. We perform experiments on representative IoT applications using two vision datasets and show up to 42.65x compression at similar accuracy with respect to the source. We highlight the low variance in compression rate across images using our technique as compared to JPEG 2000 and WebP.
27. Methods of the Vehicle Re-identification [PDF] Back to contents
Mohamed Nafzi, Michael Brauckmann, Tobias Glasmachers
Abstract: Most researchers approach vehicle re-identification through classification, which always requires an update when new vehicle models enter the market. In this paper, two types of vehicle re-identification are presented. The first is the standard method, which needs an image of the search vehicle. The VRIC and VehicleID data sets are suitable for training this module. We explain in detail how to improve the performance of this method using a trained network that is designed for classification. The second method takes as input a representative image of the search vehicle with a similar make/model, release year and colour. It is very useful when an image of the search vehicle is not available. It produces shape and colour features as output, which can be matched across a database to re-identify vehicles that look similar to the search vehicle. To get a robust module for re-identification, a fine-grained classification has been trained, in which each class consists of four elements: the make, which refers to the vehicle's manufacturer, e.g. Mercedes-Benz; the model, which refers to the type of model within that manufacturer's portfolio, e.g. C Class; the year, which refers to the iteration of the model, which may receive progressive alterations and upgrades by its manufacturer; and the perspective of the vehicle. Thus, all four elements describe the vehicle at an increasing degree of specificity. The aim of the vehicle shape classification is to classify the combination of these four elements. The colour classification has been trained separately. The results of vehicle re-identification are shown. Using a developed tool, the re-identification of vehicles on video images and on a controlled data set is demonstrated. This work was partially funded under the grant.
28. SML: Semantic Meta-learning for Few-shot Semantic Segmentation [PDF] Back to contents
Ayyappa Kumar Pambala, Titir Dutta, Soma Biswas
Abstract: The significant amount of training data required for training Convolutional Neural Networks has become a bottleneck for applications like semantic segmentation. Few-shot semantic segmentation algorithms address this problem, aiming to achieve good performance in the low-data regime, with few annotated training images. Recently, approaches based on class prototypes computed from available training data have achieved immense success for this task. In this work, we propose a novel meta-learning framework, Semantic Meta-Learning (SML), which incorporates class-level semantic descriptions in the generated prototypes for this problem. In addition, we propose to use the well-established technique of ridge regression, not only to bring in the class-level semantic information, but also to effectively utilise the information available from multiple images present in the training data for prototype computation. This has a simple closed-form solution, and thus can be implemented easily and efficiently. Extensive experiments on the benchmark PASCAL-5i dataset under different experimental settings show the effectiveness of the proposed framework.
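The closed-form solution mentioned above is the standard ridge regression estimate W = (X^T X + lam I)^{-1} X^T Y. The sketch below shows that generic formula in plain Python. How SML arranges its inputs (which matrices hold semantic descriptions and which hold image features) is not stated in the abstract, so the names `X`, `Y`, `lam` and the toy data are assumptions.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge_fit(X, Y, lam):
    """Closed-form ridge regression: W = (X^T X + lam * I)^{-1} X^T Y."""
    Xt = transpose(X)
    gram = matmul(Xt, X)
    for i in range(len(gram)):
        gram[i][i] += lam  # the regularizer keeps the system well conditioned
    return solve(gram, matmul(Xt, Y))


# Toy data generated from a known mapping W_true = [[2, 0], [0, 3]].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]

W_exact = ridge_fit(X, Y, lam=0.0)   # recovers W_true on noise-free data
W_shrunk = ridge_fit(X, Y, lam=1.0)  # regularization shrinks the weights
print(W_exact, W_shrunk)
```

Because the solution is a single linear solve, it can be recomputed cheaply for every new set of support images, which is presumably why a closed form is attractive in a few-shot setting.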
29. Data Augmentation and Clustering for Vehicle Make/Model Classification [PDF] Back to contents
Mohamed Nafzi, Michael Brauckmann, Tobias Glasmachers
Abstract: Vehicle shape information is very important in Intelligent Traffic Systems (ITS). In this paper we present a way to exploit a training data set of vehicles released in different years and captured under different perspectives. The efficacy of clustering in enhancing the make/model classification is also presented. Both steps led to improved classification results and greater robustness. A deeper convolutional neural network based on the ResNet architecture has been designed for training the vehicle make/model classification. The unequal class distribution of the training data produces an a priori probability. Its elimination, obtained by removing the bias and by hard normalization of the centroids in the classification layer, improves the classification results. A developed application has been used to test the vehicle re-identification on video data manually, based on make/model and color classification. This work was partially funded under the grant.
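One way to read "removing the bias and hard normalization of the centroids" is a classification layer with no bias term whose class weight vectors are rescaled to unit L2 norm, so that frequent classes cannot win simply by having larger weights. The sketch below illustrates that reading only; the unit-norm interpretation, the function names, and the toy values are assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def class_scores(feature, centroids, normalize=True):
    """Bias-free linear scores; optionally force every centroid to unit length."""
    if normalize:
        centroids = [l2_normalize(c) for c in centroids]
    return [sum(f * c for f, c in zip(feature, row)) for row in centroids]


def predict(feature, centroids, normalize=True):
    scores = class_scores(feature, centroids, normalize)
    return max(range(len(scores)), key=scores.__getitem__)


# Class 0 is over-represented in training, so its centroid grew a large norm.
centroids = [[10.0, 0.0], [0.0, 1.0]]
# The feature actually points closer to class 1 in direction.
feature = [0.6, 0.8]

raw_prediction = predict(feature, centroids, normalize=False)
normalized_prediction = predict(feature, centroids, normalize=True)
print(raw_prediction, normalized_prediction)
```

Without normalization the large-norm centroid wins; after normalization only the direction matters and the prediction flips to the class the feature is actually closest to.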
30. WDRN: A Wavelet Decomposed RelightNet for Image Relighting [PDF] Back to contents
Densen Puthussery, Hrishikesh P.S., Melvin Kuriakose, Jiji C.V
Abstract: The task of recalibrating the illumination settings in an image to a target configuration is known as relighting. Relighting techniques have potential applications in digital photography, the gaming industry and augmented reality. In this paper, we address the one-to-one relighting problem, where an image under target illumination settings is predicted given an input image with specific illumination conditions. To this end, we propose a wavelet decomposed RelightNet called WDRN, a novel encoder-decoder network employing wavelet-based decomposition followed by convolution layers under a multi-resolution framework. We also propose a novel loss function called gray loss that ensures efficient learning of the gradient in illumination along different directions of the ground truth image, giving rise to visually superior relit images. The proposed solution won first position in the relighting challenge event at the Advances in Image Manipulation (AIM) 2020 workshop, which proves its effectiveness measured in terms of a Mean Perceptual Score, which in turn is measured using SSIM and a Learned Perceptual Image Patch Similarity score.
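To make "wavelet-based decomposition under a multi-resolution framework" concrete, the sketch below performs one level of a 2D Haar decomposition in plain Python. The abstract does not say which wavelet WDRN uses, so the Haar basis, the function name `haar2d`, and the sub-band names are assumptions chosen for simplicity.

```python
def haar2d(image):
    """One-level orthonormal 2D Haar transform of an even-sized grayscale image.

    Returns four half-resolution bands: a low-pass approximation and three
    detail bands (differences across columns, across rows, and diagonally).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2.0)
            c_row.append((tl - tr + bl - br) / 2.0)
            r_row.append((tl + tr - bl - br) / 2.0)
            d_row.append((tl - tr - bl + br) / 2.0)
        approx.append(a_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return approx, col_diff, row_diff, diag


image = [[1.0, 2.0],
         [3.0, 4.0]]
approx, col_diff, row_diff, diag = haar2d(image)
print(approx, col_diff, row_diff, diag)
```

Applying `haar2d` again to the approximation band yields the next coarser level, which is the usual way a multi-resolution pyramid is built from a wavelet transform.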
31. F3RNet: Full-Resolution Residual Registration Network for Multimodal Image Registration [PDF] Back to contents
Zhe Xu, Jie Luo, Jiangpeng Yan, Xiu Li, Jagadeesan Jayender
Abstract: Multimodal deformable image registration is essential for many image-guided therapies. Recently, deep learning approaches have gained substantial popularity and success in deformable image registration. Most deep learning approaches use the so-called mono-stream "high-to-low, low-to-high" network structure, and can achieve satisfactory overall registration results. However, accurate alignments for some severely deformed local regions, which are crucial for pinpointing surgical targets, are often overlooked, especially for multimodal inputs with vast intensity differences. Consequently, these approaches are not sensitive to some hard-to-align regions, e.g., intra-patient registration of deformed liver lobes. In this paper, we propose a novel unsupervised registration network, namely Full-Resolution Residual Registration Network (F3RNet), for multimodal registration of severely deformed organs. The proposed method combines two parallel processing streams in a residual learning fashion. One stream takes advantage of the full-resolution information that facilitates accurate voxel-level registration. The other stream learns the deep multi-scale residual representations to obtain robust recognition. We also factorize the 3D convolution to reduce the training parameters and enhance network efficiency. We validate the proposed method on 50 sets of clinically acquired intra-patient abdominal CT-MRI data. Experiments on both CT-to-MRI and MRI-to-CT registration demonstrate promising results compared to state-of-the-art approaches.
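The parameter saving from factorizing a 3D convolution can be checked with simple arithmetic. The abstract does not specify the factorization, so this sketch assumes a common choice: a k x k x k kernel is replaced by a k x k x 1 convolution followed by a 1 x 1 x k convolution. Bias terms are ignored, and the channel counts are hypothetical.

```python
def full_conv3d_params(k, c_in, c_out):
    """Weights in a dense k x k x k 3D convolution (bias ignored)."""
    return k ** 3 * c_in * c_out


def factorized_conv3d_params(k, c_in, c_mid, c_out):
    """Weights in a k x k x 1 convolution followed by a 1 x 1 x k convolution."""
    in_plane = k * k * c_in * c_mid
    through_plane = k * c_mid * c_out
    return in_plane + through_plane


# Hypothetical layer: 3 x 3 x 3 kernel with 32 channels throughout.
full = full_conv3d_params(3, 32, 32)
factorized = factorized_conv3d_params(3, 32, 32, 32)
print(full, factorized, factorized / full)
```

Under these assumptions the factorized layer needs fewer than half the weights of the dense one, and the gap widens as the kernel size grows, since the dense count scales with k cubed while the factorized count scales with k squared plus k.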
32. Image Based Artificial Intelligence in Wound Assessment: A Systematic Review [PDF] Back to contents
D. M. Anisuzzaman, Chuanbo Wang, Behrouz Rostami, Sandeep Gopalakrishnan, Jeffrey Niezgoda, Zeyun Yu
Abstract: Efficient and effective assessment of acute and chronic wounds can help wound care teams in clinical practice to greatly improve wound diagnosis, optimize treatment plans, ease the workload and achieve health-related quality of life for the patient population. While artificial intelligence (AI) has found wide applications in health-related sciences and technology, AI-based systems remain to be developed clinically and computationally for high-quality wound care. To this end, we have carried out a systematic review of intelligent image-based data analysis and system developments for wound assessment. Specifically, we provide an extensive review of research methods on wound measurement (segmentation) and wound diagnosis (classification). We also review recent work on wound assessment systems (including hardware, software, and mobile apps). More than 250 articles were retrieved from various publication databases and online resources, and 115 of them were carefully selected to cover the breadth and depth of the most recent and relevant work for the current review.
33. Multi-structure bone segmentation in pediatric MR images with combined regularization from shape priors and adversarial network [PDF] Back to contents
Arnaud Boutillon, Bhushan Borotikar, Valérie Burdin, Pierre-Henri Conze
Abstract: Morphological and diagnostic evaluation of the pediatric musculoskeletal system is crucial in clinical practice. However, most segmentation models do not perform well on scarce pediatric imaging data. We propose a regularized convolutional encoder-decoder network for the challenging task of segmenting pediatric magnetic resonance (MR) images. To overcome the scarcity and heterogeneity of pediatric imaging datasets, we adopt a regularization strategy to improve the generalization of segmentation models. To this end, we have conceived a novel optimization scheme for the segmentation network that adds regularization terms to the loss function. In order to obtain globally consistent predictions, we incorporate a shape-prior-based regularization, derived from a non-linear shape representation learnt by an auto-encoder. Additionally, an adversarial regularization computed by a discriminator is integrated to encourage plausible delineations. Our method is evaluated for the task of multi-bone segmentation on two pediatric imaging datasets from different joints (ankle and shoulder), comprising pathological as well as healthy examinations. We illustrate that the proposed approach can be easily integrated into various multi-structure strategies and can improve the prediction accuracy of state-of-the-art models. The obtained results bring new perspectives for the management of pediatric musculoskeletal disorders.
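As a reading aid, the optimization scheme described above can be written as a composite objective: a segmentation loss plus a shape-prior term and an adversarial term. The abstract does not give the exact formulation, so the weights \lambda_1 and \lambda_2 and the names of the individual terms are assumptions for illustration.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{seg}}
  + \lambda_1 \, \mathcal{L}_{\text{shape}}
  + \lambda_2 \, \mathcal{L}_{\text{adv}}
```

Here \mathcal{L}_{\text{shape}} would compare predictions and ground truth in the latent space of the shape auto-encoder, and \mathcal{L}_{\text{adv}} would come from the discriminator's judgment of the predicted delineation.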
34. RaLL: End-to-end Radar Localization on Lidar Map Using Differentiable Measurement Model [PDF] Back to contents
Huan Yin, Yue Wang, Runjian Chen, Rong Xiong
Abstract: Radar sensors provide lighting- and weather-invariant sensing, which is naturally suitable for long-term localization in outdoor scenes. On the other hand, most currently available maps are built by lidar. In this paper, we propose a deep neural network for end-to-end learning of radar localization on a lidar map to bridge the gap. We first embed both sensor modalities into a common feature space with a neural network. Then multiple offsets are added to the map modality for similarity evaluation against the current radar modality, yielding a regression of the current pose. Finally, we apply this differentiable measurement model to a Kalman filter to learn the whole sequential localization process in an end-to-end manner. To validate the feasibility and effectiveness, we employ multi-session multi-scene datasets collected from the real world, and the results demonstrate that our proposed system achieves superior performance over 90 km of driving, even in generalization scenarios where the model is trained in the UK and tested in South Korea. We also release the source code publicly.
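The two steps described above (scoring several candidate offsets, then feeding the result to a Kalman filter) can be sketched in one dimension. This is an illustrative toy, not the RaLL implementation: using a softmax-weighted average ("soft-argmax") as the differentiable pose regression, the scalar Kalman update, and all names and values are assumptions.

```python
import math


def soft_argmax(offsets, similarities, temperature=1.0):
    """Differentiable pose estimate: softmax-weighted average of candidate offsets."""
    peak = max(similarities)  # subtract the max for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in similarities]
    total = sum(weights)
    return sum(o * w / total for o, w in zip(offsets, weights))


def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update for state x with variance p.

    z is the measurement and r its noise variance.
    """
    gain = p / (p + r)
    return x + gain * (z - x), (1.0 - gain) * p


# Candidate lateral offsets of the map (metres) and hypothetical similarity scores.
offsets = [-1.0, 0.0, 1.0]
similarities = [0.0, 5.0, 0.0]

measurement = soft_argmax(offsets, similarities)
state, variance = kalman_update(x=0.5, p=1.0, z=measurement, r=1.0)
print(measurement, state, variance)
```

Because both functions consist of smooth operations, gradients can flow from the filtered state back to the similarity scores, which is the property that lets the whole sequential process be trained end to end.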
35. Deep Transparent Prediction through Latent Representation Analysis [PDF]
D. Kollias, N. Bouas, Y. Vlaxos, V. Brillakis, M. Seferis, I. Kollia, L. Sukissian, J. Wingate, S. Kollias
Abstract: The paper presents a novel deep learning approach that extracts latent information from trained Deep Neural Networks (DNNs) and derives concise representations, which are analyzed in an effective, unified way for prediction purposes. It is well known that DNNs are capable of analyzing complex data; however, they lack transparency in their decision making, in the sense that it is not straightforward to justify their predictions or to visualize the features on which a decision was based. Moreover, they generally require large amounts of data in order to learn and to adapt to different environments. This makes their use difficult in healthcare, where trust and personalization are key issues. Transparency combined with high prediction accuracy is the goal of the proposed approach. It includes both supervised DNN training and unsupervised learning of latent variables extracted from the trained DNNs. Domain adaptation from multiple sources is also presented as an extension, in which the extracted latent variable representations are used to generate predictions in other, non-annotated environments. Successful application is illustrated through a large experimental study in various fields: prediction of Parkinson's disease from MRI and DaTScans; prediction of COVID-19 and pneumonia from CT scans and X-rays; and optical character verification in retail food packaging.
36. Light Can Hack Your Face! Black-box Backdoor Attack on Face Recognition Systems [PDF]
Haoliang Li, Yufei Wang, Xiaofei Xie, Yang Liu, Shiqi Wang, Renjie Wan, Lap-Pui Chau, Alex C. Kot
Abstract: Deep neural networks (DNNs) have shown great success in many computer vision applications. However, they are also known to be susceptible to backdoor attacks. When conducting backdoor attacks, most existing approaches assume that the targeted DNN is always available and that an attacker can always inject a specific pattern into the training data to further fine-tune the DNN model. In practice, however, such an attack may not be feasible, as the DNN model is encrypted and only available to the secure enclave. In this paper, we propose a novel black-box backdoor attack technique on face recognition systems, which can be conducted without knowledge of the targeted DNN model. Specifically, we propose a backdoor attack with a novel color stripe pattern trigger, which can be generated by modulating an LED with a specialized waveform. We also use an evolutionary computing strategy to optimize the waveform for the backdoor attack. Our backdoor attack can be conducted under very mild conditions: 1) the adversary cannot manipulate the input in an unnatural way (e.g., by injecting adversarial noise); 2) the adversary cannot access the training database; 3) the adversary has no knowledge of the training model or of the training set used by the victim party. We show that the backdoor trigger can be quite effective: the attack success rate is up to 88% in our simulation study and up to 40% in our physical-domain study, considering face recognition and verification with at most three attempts during authentication. Finally, we evaluate several state-of-the-art potential defenses against backdoor attacks and find that our attack can still be effective. We highlight that our study reveals a new physical backdoor attack, which calls for attention to the security of existing face recognition/verification techniques.
37. Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup [PDF]
Jang-Hyun Kim, Wonho Choo, Hyun Oh Song
Abstract: While deep neural networks achieve great performance in fitting the training distribution, the learned networks are prone to overfitting and are susceptible to adversarial attacks. In this regard, a number of mixup-based augmentation methods have recently been proposed. However, these approaches mainly focus on creating previously unseen virtual examples and can sometimes provide a misleading supervisory signal to the network. To this end, we propose Puzzle Mix, a mixup method that explicitly utilizes the saliency information and the underlying statistics of the natural examples. This leads to an interesting optimization problem that alternates between a multi-label objective for the optimal mixing mask and a saliency-discounted optimal transport objective. Our experiments show that Puzzle Mix achieves state-of-the-art generalization and adversarial robustness results compared to other mixup methods on the CIFAR-100, Tiny-ImageNet, and ImageNet datasets. The source code is available at this https URL.
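For readers unfamiliar with the mixup family this abstract builds on, the following minimal sketch shows the two baseline ideas it refers to: vanilla input mixup with a single mixing ratio, and a generic mask-based mix where the label ratio follows the mask. It does not implement Puzzle Mix itself; the saliency-driven mask optimization and the optimal transport step are omitted, and the function names and flat-list data layout are assumptions for illustration.

```python
import random

def mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Vanilla input mixup: one scalar ratio lam ~ Beta(alpha, alpha) per example pair."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

def masked_mixup(x1, y1, x2, y2, mask):
    """Mask-based mixing: mask[i] in [0, 1] is the share of x1 kept at position i."""
    x = [m * a + (1.0 - m) * b for m, a, b in zip(mask, x1, x2)]
    lam = sum(mask) / len(mask)  # label ratio follows the mean mask value
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Hypothetical 4-pixel "images" with one-hot labels for two classes.
mixed_x, mixed_y, ratio = masked_mixup(
    [1.0, 1.0, 1.0, 1.0], [1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0], [0.0, 1.0],
    mask=[1.0, 1.0, 0.0, 0.0],
)
```

In the mask-based variant, which positions come from which image is a free choice; the abstract's contribution is to choose that mask (and a transport of the inputs) using saliency rather than at random.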
38. AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results [PDF]
Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Xiaotong Luo, Liang Chen, Jiangtao Zhang, Maitreya Suin, Kuldeep Purohit, A. N. Rajagopalan, Xiaochuan Li, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Abdul Muqeet, Jiwon Hwang, Subin Yang, JungHeum Kang, Sung-Ho Bae, Yongwoo Kim, Liang Chen, Jiangtao Zhang, Xiaotong Luo, Yanyun Qu, Geun-Woo Jeon, Jun-Ho Choi, Jun-Hyuk Kim, Jong-Seok Lee, Steven Marty, Eric Marty, Dongliang Xiong, Siang Chen, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Haicheng Wang, Vineeth Bhaskara, Alex Levinshtein, Stavros Tsogkas, Allan Jepson, Xiangzhen Kong, Tongtong Zhao
Abstract: This paper reviews the AIM 2020 challenge on efficient single image super-resolution, with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of ×4 based on a set of prior examples of low-resolution and corresponding high-resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining the PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted final results. These results gauge the state of the art in efficient single image super-resolution.
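The quality constraint in this challenge is stated in terms of PSNR, so a short reminder of the standard definition may help. The sketch below computes PSNR = 10 log10(MAX^2 / MSE) over flat pixel sequences; the challenge's exact evaluation protocol (color space, border cropping, averaging) is not given in the abstract, so the 8-bit peak value and the helper names are assumptions.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def maintains_baseline(candidate_psnr, baseline_psnr):
    """Hypothetical check mirroring the 'at least maintain baseline PSNR' constraint."""
    return candidate_psnr >= baseline_psnr

# Hypothetical 4-pixel example: two pixels are off by 10 gray levels.
value_db = psnr([100, 100, 100, 100], [110, 90, 100, 100])
```

An efficiency-oriented entry would be compared on runtime, parameters, or FLOPs only after a check like `maintains_baseline` passes against the reference model's PSNR.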
39. Learning a Single Model with a Wide Range of Quality Factors for JPEG Image Artifacts Removal [PDF]
Jianwei Li, Yongtao Wang, Haihua Xie, Kai-Kuang Ma
Abstract: Lossy compression introduces artifacts into the compressed image and degrades its visual quality. In recent years, many compression artifact removal methods based on convolutional neural networks (CNNs) have been developed with great success. However, these methods usually train a model for one specific quality factor or a small range of quality factors. If the quality factor of the test image does not match the assumed range, performance degrades. With this motivation and further consideration of practical usage, a highly robust compression artifact removal network is proposed in this paper. Our proposed network is a single-model approach that can be trained to handle a wide range of quality factors while consistently delivering superior or comparable artifact removal performance. To demonstrate this, we focus on JPEG compression with quality factors ranging from 1 to 60. A key to the success of our proposed network lies in the novel use of the quantization tables as part of the training data. Furthermore, it has two parallel branches, i.e., the restoration branch and the global branch. The former effectively removes local artifacts, such as ringing. The latter extracts global features of the entire image, which are highly instrumental for image quality improvement and especially effective in dealing with global artifacts, such as blocking and color shifting. Extensive experimental results on color and grayscale images clearly demonstrate the effectiveness and efficacy of our proposed single-model approach for removing compression artifacts from the decoded image.
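To make the link between a JPEG quality factor and its quantization table concrete, the sketch below scales the example luminance table from Annex K of the JPEG standard using the convention found in the IJG libjpeg library. The abstract does not say how the tables are fed to the network, so `table_as_feature` is a purely hypothetical encoding, and encoders other than libjpeg may use different tables.

```python
# Example luminance quantization table from Annex K of the JPEG standard (row-major).
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]

def scaled_quant_table(quality, base=BASE_LUMA):
    """Scale a base table by quality factor (1..100) using the libjpeg convention."""
    quality = max(1, min(100, int(quality)))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Round to nearest, then clamp to the 8-bit baseline range.
    return [max(1, min(255, (q * scale + 50) // 100)) for q in base]

def table_as_feature(table):
    """Hypothetical conditioning input: table entries scaled to (0, 1]."""
    return [q / 255.0 for q in table]

coarse = scaled_quant_table(10)   # strong compression, large quantization steps
fine = scaled_quant_table(60)     # milder compression, smaller steps
feature = table_as_feature(coarse)
```

Because the table is stored in the JPEG file header, a restoration model can read it at test time without being told the quality factor, which is one plausible reason for using it as an input.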
40. Ensemble learning of diffractive optical networks [PDF]
Md Sadman Sakib Rahman, Jingxi Li, Deniz Mengu, Yair Rivenson, Aydogan Ozcan
Abstract: A plethora of research advances have emerged in the fields of optics and photonics that benefit from harnessing the power of machine learning. Specifically, there has been a revival of interest in optical computing hardware, due to its potential advantages for machine learning tasks in terms of parallelization, power efficiency and computation speed. Diffractive Deep Neural Networks (D2NNs) form such an optical computing framework, which benefits from deep learning-based design of successive diffractive layers to all-optically process information as the input light diffracts through these passive layers. D2NNs have demonstrated success in various tasks, including object classification, spectral encoding of information, optical pulse shaping and imaging, among others. Here, we significantly improve the inference performance of diffractive optical networks using feature engineering and ensemble learning. After independently training a total of 1252 D2NNs that were diversely engineered with a variety of passive input filters, we applied a pruning algorithm to select an optimized ensemble of D2NNs that collectively improve their image classification accuracy. Through this pruning, we numerically demonstrated that ensembles of N=14 and N=30 D2NNs achieve blind testing accuracies of 61.14% and 62.13%, respectively, on the classification of CIFAR-10 test images, providing an inference improvement of >16% compared to the average performance of the individual D2NNs within each ensemble. These results constitute the highest inference accuracies achieved to date by any diffractive optical neural network design on the same dataset and might provide a significant leap forward in extending the application space of diffractive optical image classification and machine vision systems.
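The abstract does not specify the pruning algorithm used to pick the ensemble, so the sketch below shows only one generic possibility: greedy forward selection that keeps adding the model which most improves validation accuracy of the summed class scores. The function names, the score-summing rule, and the toy data are all assumptions, and the real procedure over 1252 trained networks may differ substantially.

```python
def ensemble_accuracy(member_scores, labels):
    """Accuracy when the class scores of all members are summed per sample."""
    correct = 0
    for i, label in enumerate(labels):
        n_classes = len(member_scores[0][i])
        summed = [sum(m[i][k] for m in member_scores) for k in range(n_classes)]
        if max(range(n_classes), key=summed.__getitem__) == label:
            correct += 1
    return correct / len(labels)

def greedy_select(all_scores, labels, max_size):
    """Grow the ensemble one model at a time while validation accuracy improves."""
    selected, best_acc = [], -1.0
    while len(selected) < max_size:
        candidates = [j for j in range(len(all_scores)) if j not in selected]
        if not candidates:
            break
        trial = {
            j: ensemble_accuracy([all_scores[s] for s in selected + [j]], labels)
            for j in candidates
        }
        j_best = max(trial, key=trial.get)
        if trial[j_best] <= best_acc:
            break  # no candidate improves the ensemble
        selected.append(j_best)
        best_acc = trial[j_best]
    return selected, best_acc

# Hypothetical validation scores: 3 models, 4 samples, 2 classes.
labels = [0, 1, 0, 1]
scores = [
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]],      # model 0
    [[0.4, 0.6], [0.1, 0.9], [0.45, 0.55], [0.1, 0.9]],    # model 1
    [[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.55, 0.45]],    # model 2
]
chosen, accuracy = greedy_select(scores, labels, max_size=3)
```

In this toy case two individually imperfect models complement each other, which is the same effect the abstract reports when the ensemble outperforms the average of its members.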
41. Deep Reinforcement Learning for Unknown Anomaly Detection [PDF]
Guansong Pang, Anton van den Hengel, Chunhua Shen, Longbing Cao
Abstract: We address a critical yet largely unsolved anomaly detection problem, in which we aim to learn detection models from a small set of partially labeled anomalies and a large-scale unlabeled dataset. This is a common scenario in many important applications. Existing related methods either proceed unsupervised with the unlabeled data, or exclusively fit the limited anomaly examples that often do not span the entire set of anomalies. We propose here instead a deep reinforcement-learning-based approach that actively seeks novel classes of anomaly that lie beyond the scope of the labeled training data. This approach learns to balance exploiting its existing data model against exploring for new classes of anomaly. It is thus able to exploit the labeled anomaly data to improve detection accuracy, without limiting the set of anomalies sought to those given anomaly examples. This is of significant practical benefit, as anomalies are inevitably unpredictable in form and often expensive to miss. Extensive experiments on 48 real-world datasets show that our approach significantly outperforms five state-of-the-art competing methods.
42. Microscope Based HER2 Scoring System [PDF]
Jun Zhang, Kuan Tian, Pei Dong, Haocheng Shen, Kezhou Yan, Jianhua Yao, Junzhou Huang, Xiao Han
Abstract: The overexpression of human epidermal growth factor receptor 2 (HER2) has been established as a therapeutic target in multiple types of cancer, such as breast and gastric cancers. Immunohistochemistry (IHC) is employed as a basic HER2 test to identify HER2-positive, borderline, and HER2-negative patients. However, the reliability and accuracy of HER2 scoring are affected by many factors, such as the pathologist's experience. Recently, artificial intelligence (AI) has been used in the diagnosis of various diseases to improve diagnostic accuracy and reliability, but the interpretation of diagnosis results is still an open problem. In this paper, we propose a real-time HER2 scoring system that follows the HER2 scoring guidelines to complete the diagnosis, so that each step is explainable. Unlike previous scoring systems based on whole-slide imaging, our HER2 scoring system is integrated into an augmented reality (AR) microscope that can feed AI results back to the pathologist while the slide is being read. The pathologists can help select informative fields of view (FOVs), avoiding confounding regions such as DCIS. Importantly, we illustrate the intermediate results with membrane staining condition and cell classification results, making it possible to evaluate the reliability of the diagnostic results. We also support interactive modification of the selected regions of interest, making our system more flexible in clinical practice. The collaboration of AI and pathologists can significantly improve the robustness of our system. We evaluate our system on 285 breast IHC HER2 slides, and a classification accuracy of 95% shows the effectiveness of our HER2 scoring system.
43. Short-term synaptic plasticity optimally models continuous environments
Timoleon Moraitis, Abu Sebastian, Evangelos Eleftheriou
Abstract: Biological neural networks operate with extraordinary energy efficiency, owing to properties such as spike-based communication and synaptic plasticity driven by local activity. When emulated in silico, such properties also enable highly energy-efficient machine learning and inference systems. However, it is unclear whether these mechanisms only trade off performance for efficiency or rather they are partly responsible for the superiority of biological intelligence. Here, we first address this theoretically, proving rigorously that indeed the optimal prediction and inference of randomly but continuously transforming environments, a common natural setting, relies on adaptivity through short-term spike-timing dependent plasticity, a hallmark of biological neural networks. Secondly, we assess this theoretical optimality via simulations and also demonstrate improved artificial intelligence (AI). For the first time, a largely biologically modelled spiking neural network (SNN) surpasses state-of-the-art artificial neural networks (ANNs) in all relevant aspects, in an example task of recognizing video frames transformed by moving occlusions. The SNN recognizes the frames more accurately, even if trained on few, still, and untransformed images, with unsupervised and synaptically-local learning, binary spikes, and a single layer of neurons - all in contrast to the deep-learning-trained ANNs. These results indicate that on-line adaptivity and spike-based computation may optimize natural intelligence for natural environments. Moreover, this expands the goal of exploiting biological neuro-synaptic properties for AI, from mere efficiency, to computational supremacy altogether.
44. Qutrit-inspired Fully Self-supervised Shallow Quantum Learning Network for Brain Tumor Segmentation
Debanjan Konar, Siddhartha Bhattacharyya, Bijaya K. Panigrahi, Elizabeth Behrman
Abstract: Classical self-supervised networks suffer from convergence problems and reduced segmentation accuracy due to forceful termination. Quantum neural network models are often described with qubits, or bi-level quantum bits. In this article, a novel self-supervised shallow learning network model exploiting the sophisticated three-level qutrit-inspired quantum information system, referred to as the Quantum Fully Self-Supervised Neural Network (QFS-Net), is presented for automated segmentation of brain MR images. The QFS-Net model comprises a trinity of a layered structure of qutrits inter-connected through parametric Hadamard gates using an 8-connected second-order neighborhood-based topology. The non-linear transformation of the qutrit states allows the underlying quantum neural network model to encode the quantum states, thereby enabling a faster self-organized counter-propagation of these states between the layers without supervision. The suggested QFS-Net model is tailored and extensively validated on the Cancer Imaging Archive (TCIA) data set collected from the Nature repository, and is compared with state-of-the-art supervised networks (U-Net and URes-Net architectures) and the self-supervised QIS-Net model. Results show promising segmentation outcomes in detecting tumors, in terms of Dice similarity and accuracy, with minimal human intervention and computational resources.
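The abstract above reports its results in terms of Dice similarity and accuracy but does not spell out the formulas. As a minimal sketch, assuming binary tumor masks flattened to lists of 0/1 labels, the two metrics are commonly computed as follows. The function names and the empty-mask convention are illustrative assumptions, not taken from the paper.

```python
def dice_similarity(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two completely empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def pixel_accuracy(pred, truth):
    """Fraction of positions where the predicted and true labels agree."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    matches = sum(1 for p, t in zip(pred, truth) if bool(p) == bool(t))
    return matches / len(pred)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(dice_similarity(predicted, reference))
    print(pixel_accuracy(predicted, reference))
```

Dice ignores true negatives, so on images where the tumor occupies a small fraction of the pixels it is usually the more informative of the two numbers.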
45. Simultaneous Denoising and Motion Estimation for Low-dose Gated PET using a Siamese Adversarial Network with Gate-to-Gate Consistency Learning
Bo Zhou, Yu-Jung Tsai, Chi Liu
Abstract: Gating is commonly used in PET imaging to reduce respiratory motion blurring and facilitate more sophisticated motion correction methods. In the applications of low dose PET, however, reducing injection dose causes increased noise and reduces signal-to-noise ratio (SNR), subsequently corrupting the motion estimation/correction steps, causing inferior image quality. To tackle these issues, we first propose a Siamese adversarial network (SAN) that can efficiently recover high dose gated image volume from low dose gated image volume. To ensure the appearance consistency between the recovered gated volumes, we then utilize a pre-trained motion estimation network incorporated into SAN that enables the constraint of gate-to-gate (G2G) consistency. With high-quality recovered gated volumes, gate-to-gate motion vectors can be simultaneously outputted from the motion estimation network. Comprehensive evaluations on a low dose gated PET dataset of 29 subjects demonstrate that our method can effectively recover the low dose gated PET volumes, with an average PSNR of 37.16 and SSIM of 0.97, and simultaneously generate robust motion estimation that could benefit subsequent motion corrections.
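The abstract above quotes an average PSNR without defining it. As a rough sketch of the standard definition, PSNR = 10 * log10(peak^2 / MSE), the snippet below computes it for two intensity volumes flattened to lists. The function name and the `peak` argument are illustrative assumptions; the abstract does not state which peak value or normalization the authors used, and SSIM is not covered here.

```python
import math


def psnr(reference, estimate, peak):
    """Peak signal-to-noise ratio in dB between two equally sized intensity lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the recovered volume.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    high_dose = [0.0, 1.0]
    recovered = [0.0, 0.9]
    print(round(psnr(high_dose, recovered, peak=1.0), 2))
```

Because PSNR depends on the chosen peak value, numbers from different papers are only comparable when the intensity range is normalized the same way.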
46. Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
Abstract: Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed (Reformer, Linformer, Performer, Longformer, to name a few) which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains.
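To make the efficiency concern concrete, here is a small sketch of standard scaled dot-product self-attention in plain Python. It is not an implementation of any particular "X-former" from the survey; the function name and the returned score counter are illustrative assumptions. The point it shows is that every query is scored against every key, so a sequence of length n produces n * n scores.

```python
import math


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns the attended outputs and the number of query-key scores computed.
    """
    dim = len(queries[0])
    outputs = []
    score_count = 0
    for q in queries:
        # One score per (query, key) pair: this is the quadratic part.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        score_count += len(scores)
        # Numerically stable softmax over the scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        outputs.append([
            sum(w / norm * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs, score_count


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    _, count = self_attention(tokens, tokens, tokens)
    print(count)  # number of scores for a sequence of 3 tokens
```

Doubling the sequence length quadruples the number of scores, which is the cost that efficiency-oriented variants try to reduce.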
47. Hold Tight and Never Let Go: Security of Deep Learning based Automated Lane Centering under Physical-World Attack
Takami Sato, Junjie Shen, Ningfei Wang, Yunhan Jack Jia, Xue Lin, Qi Alfred Chen
Abstract: Automated Lane Centering (ALC) systems are convenient and widely deployed today, but are also highly security- and safety-critical. In this work, we are the first to systematically study the security of state-of-the-art deep learning based ALC systems in their designed operational domains under physical-world adversarial attacks. We formulate the problem with a safety-critical attack goal, and a novel and domain-specific attack vector: dirty road patches. To systematically generate the attack, we adopt an optimization-based approach and overcome domain-specific design challenges such as camera frame inter-dependencies due to dynamic vehicle actuation, and the lack of objective function design for lane detection models. We evaluate our attack method on a production ALC system using 80 attack scenarios from real-world driving traces. The results show that our attack is highly effective with over 92% success rates and less than 0.95 sec average success time, which is substantially lower than the average driver reaction time. Such high attack effectiveness is also found to be (1) robust to motion model inaccuracies, different lane detection model designs, and physical-world factors, and (2) stealthy from the driver's view. To concretely understand the end-to-end safety consequences, we further evaluate on concrete real-world attack scenarios using a production-grade simulator, and find that our attack can successfully cause the victim to hit the highway concrete barrier or a truck in the opposite direction with 98% and 100% success rates. We also discuss defense directions.
Note: The Chinese text is machine-translated. The cover image is a word cloud of the paper titles.