Table of Contents
1. Geometric Graph Representations and Geometric Graph Convolutions for Deep Learning on Three-Dimensional (3D) Graphs [PDF] Abstract
2. A Novel Nudity Detection Algorithm for Web and Mobile Application Development [PDF] Abstract
3. Fast and automated biomarker detection in breath samples with machine learning [PDF] Abstract
4. SeqXFilter: A Memory-efficient Denoising Filter for Dynamic Vision Sensors [PDF] Abstract
5. Channel Distillation: Channel-Wise Attention for Knowledge Distillation [PDF] Abstract
6. Interpretation of ResNet by Visualization of Preferred Stimulus in Receptive Fields [PDF] Abstract
7. Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge [PDF] Abstract
8. A Multi-Task Comparator Framework for Kinship Verification [PDF] Abstract
9. CNNs on Surfaces using Rotation-Equivariant Features [PDF] Abstract
10. Studying The Effect of MIL Pooling Filters on MIL Tasks [PDF] Abstract
11. Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution [PDF] Abstract
12. Recapture as You Want [PDF] Abstract
13. Distribution Aligned Multimodal and Multi-Domain Image Stylization [PDF] Abstract
14. Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining [PDF] Abstract
15. Monocular Human Pose Estimation: A Survey of Deep Learning-based Methods [PDF] Abstract
16. Resolving Class Imbalance in Object Detection with Weighted Cross Entropy Losses [PDF] Abstract
17. Transforming Multi-Concept Attention into Video Summarization [PDF] Abstract
18. A heterogeneous branch and multi-level classification network for person re-identification [PDF] Abstract
23. An embedded system for the automated generation of labeled plant images to enable machine learning applications in agriculture [PDF] Abstract
25. Fast and accurate aberration estimation from 3D bead images using convolutional neural networks [PDF] Abstract
28. AnalogNet: Convolutional Neural Network Inference on Analog Focal Plane Sensor Processors [PDF] Abstract
29. A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks [PDF] Abstract
31. A Review on End-To-End Methods for Brain Tumor Segmentation and Overall Survival Prediction [PDF] Abstract
33. CT-based COVID-19 Triage: Deep Multitask Learning Improves Joint Identification and Severity Quantification [PDF] Abstract
35. COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on Chest X-Ray images [PDF] Abstract
36. Exploring the role of Input and Output Layers of a Deep Neural Network in Adversarial Defense [PDF] Abstract
37. Adaptive convolutional neural networks for k-space data interpolation in fast magnetic resonance imaging [PDF] Abstract
39. Fusion of Real Time Thermal Image and 1D/2D/3D Depth Laser Readings for Remote Thermal Sensing in Industrial Plants by Means of UAVs and/or Robots [PDF] Abstract
40. A comparative study of 2D image segmentation algorithms for traumatic brain lesions using CT data from the ProTECTIII multicenter clinical trial [PDF] Abstract
Abstracts
1. Geometric Graph Representations and Geometric Graph Convolutions for Deep Learning on Three-Dimensional (3D) Graphs [PDF] Back to contents
Daniel T. Chang
Abstract: The geometry of three-dimensional (3D) graphs, consisting of nodes and edges, plays a crucial role in many important applications. An excellent example is molecular graphs, whose geometry influences important properties of a molecule including its reactivity and biological activity. To facilitate the incorporation of geometry in deep learning on 3D graphs, we define three types of geometric graph representations: positional, angle-geometric and distance-geometric. For proof of concept, we use the distance-geometric graph representation for geometric graph convolutions. Further, to utilize standard graph convolution networks, we employ a simple edge weight / edge distance correlation scheme, whose parameters can be fixed using reference values or determined through Bayesian hyperparameter optimization. The results of geometric graph convolutions, for the ESOL and Freesol datasets, show significant improvement over those of standard graph convolutions. Our work demonstrates the feasibility and promise of incorporating geometry, using the distance-geometric graph representation, in deep learning on 3D graphs.
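The abstract does not give the functional form of the edge weight / edge distance correlation, so the sketch below assumes a Gaussian (RBF) decay from interatomic distance to edge weight; the `sigma` parameter is hypothetical, standing in for the value the authors fix from reference values or tune by Bayesian hyperparameter optimization.

```python
import numpy as np

def edge_weight(distance, sigma=1.5):
    """Map edge distances to edge weights in (0, 1]; shorter edges get
    weights near 1. The Gaussian form and sigma=1.5 are assumptions."""
    return np.exp(-(distance ** 2) / (2.0 * sigma ** 2))

# Distance-geometric representation of a toy 3-node molecular graph:
# a weighted adjacency matrix built from pairwise distances, so the
# network never sees raw 3D coordinates (making it rotation-invariant).
positions = np.array([[0.0, 0.0, 0.0],
                      [1.4, 0.0, 0.0],
                      [2.8, 0.7, 0.0]])
dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
adjacency = np.where(dists > 0, edge_weight(dists), 0.0)  # no self-loops
print(adjacency.round(3))
```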
2. A Novel Nudity Detection Algorithm for Web and Mobile Application Development [PDF] Back to contents
Rahat Yeasin Emon
Abstract: In current web and mobile application development, runtime detection of nude image content is very important. This paper presents a runtime nudity detection method for web and mobile application development. We use two parameters to detect the nude content of an image: one is the number of skin pixels, the other is the face region. A skin color model based on the RGB and HSV color spaces is used to detect skin pixels in an image. The Google Vision API is used to detect the face region. By the percentage of skin regions and face regions, an image is identified as nude or not. The success of this algorithm lies in detecting skin regions and face regions. The skin detection algorithm can detect skin with 95% accuracy and a low false-positive rate, and the Google Vision API for web and mobile applications can detect faces with 99% accuracy in less than 1 second. From the experimental analysis, we have seen that the proposed algorithm can detect the nudity of an image with 95% accuracy.
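The abstract names the ingredients (an RGB/HSV skin color model plus skin and face percentages) but not the thresholds. The sketch below combines a classic published RGB skin rule with an illustrative HSV hue band; both are assumptions, not the paper's values.

```python
import colorsys
import numpy as np

def is_skin_pixel(r, g, b):
    """Heuristic skin test on 8-bit RGB values (illustrative thresholds)."""
    # Classic RGB rule (Peer et al.): bright, red-dominant pixels.
    rgb_rule = (r > 95 and g > 40 and b > 20 and
                max(r, g, b) - min(r, g, b) > 15 and
                abs(r - g) > 15 and r > g and r > b)
    # HSV rule: skin hues cluster near red/orange at moderate saturation.
    h, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hsv_rule = h < 0.14 and 0.15 < s < 0.75
    return rgb_rule and hsv_rule

def skin_ratio(image):
    """Fraction of skin pixels in an HxWx3 uint8 image."""
    flat = image.reshape(-1, 3)
    hits = sum(is_skin_pixel(int(r), int(g), int(b)) for r, g, b in flat)
    return hits / len(flat)

# An image would then be flagged as nude when skin_ratio exceeds a chosen
# threshold relative to the face area returned by the face detector.
image = np.random.randint(0, 256, (16, 16, 3), dtype=np.uint8)
print(skin_ratio(image))
```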
3. Fast and automated biomarker detection in breath samples with machine learning [PDF] Back to contents
Angelika Skarysz, Dahlia Salman, Michael Eddleston, Martin Sykora, Eugenie Hunsicker, William H Nailon, Kareen Darnley, Duncan B McLaren, C L Paul Thomas, Andrea Soltoggio
Abstract: Volatile organic compounds (VOCs) in human breath can reveal a large spectrum of health conditions and can be used for fast, accurate and non-invasive diagnostics. Gas chromatography-mass spectrometry (GC-MS) is used to measure VOCs, but its application is limited by expert-driven data analysis that is time-consuming, subjective and may introduce errors. We propose a system to perform GC-MS data analysis that exploits the pattern recognition ability of deep learning to learn and automatically detect VOCs directly from raw data, thus bypassing expert-led processing. The proposed approach was shown to outperform the expert-led analysis by detecting a significantly higher number of VOCs in just a fraction of the time while maintaining high specificity. These results suggest that the proposed method can help the large-scale deployment of breath-based diagnosis by reducing time and cost, and increasing accuracy and consistency.
4. SeqXFilter: A Memory-efficient Denoising Filter for Dynamic Vision Sensors [PDF] Back to contents
Shasha Guo, Lei Wang, Xiaofan Chen, Limeng Zhang, Ziyang Kang, Weixia Xu
Abstract: Neuromorphic event-based dynamic vision sensors (DVS) have much faster sampling rates and a higher dynamic range than frame-based imaging sensors. However, they are sensitive to unwanted background activity (BA) events. There are some filters for tackling this problem based on spatio-temporal correlation; however, they are either memory-intensive or computing-intensive. We propose \emph{SeqXFilter}, a spatio-temporal correlation filter that uses only a window of past events, has O(1) space complexity, and requires simple computations. We explore the spatial correlation of an event with its past few events by analyzing the distribution of the events when applying different functions on the spatial distances. We find the function that best checks the spatio-temporal correlation for an event for \emph{SeqXFilter}, best separating real events from noise events. We not only show the visual denoising effect of the filter but also use two metrics to quantitatively analyze the filter's performance. Four neuromorphic event-based datasets, recorded from four DVS with different output sizes, are used for validation of our method. The experimental results show that \emph{SeqXFilter} achieves performance similar to baseline NNb filters, but with extremely small memory cost and simple computation logic.
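As a rough illustration of the constant-memory idea (the paper's actual correlation function is not given in the abstract), the sketch below keeps a fixed-length window of past events and accepts a new event only if a recent event lies within assumed spatial and temporal bounds:

```python
from collections import deque

class PastWindowFilter:
    """Minimal sketch of a SeqXFilter-style denoiser: O(1) memory via a
    fixed-length window of past events, instead of per-pixel timestamp
    maps. The proximity test is a stand-in for the paper's correlation."""

    def __init__(self, window=8, max_dist=2, max_dt=5000):
        self.window = deque(maxlen=window)  # constant-size event memory
        self.max_dist = max_dist            # pixels (assumed bound)
        self.max_dt = max_dt                # microseconds (assumed bound)

    def accept(self, x, y, t):
        supported = any(abs(x - px) <= self.max_dist and
                        abs(y - py) <= self.max_dist and
                        t - pt <= self.max_dt
                        for px, py, pt in self.window)
        self.window.append((x, y, t))
        return supported

f = PastWindowFilter()
events = [(10, 10, 0), (11, 10, 100), (90, 40, 150), (11, 11, 200)]
print([f.accept(*e) for e in events])  # isolated event at (90, 40) rejected
```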
5. Channel Distillation: Channel-Wise Attention for Knowledge Distillation [PDF] Back to contents
Zaida Zhou, Chaoran Zhuge, Xinwei Guan, Wen Liu
Abstract: Knowledge distillation transfers the knowledge learned by a teacher network to a student network, so that the student has the advantage of fewer parameters and less computation while its accuracy is close to that of the teacher. In this paper, we propose a new distillation method, which contains two transfer distillation strategies and a loss decay strategy. The first transfer strategy is based on channel-wise attention, called Channel Distillation (CD). CD transfers the channel information from the teacher to the student. The second is Guided Knowledge Distillation (GKD). Unlike Knowledge Distillation (KD), which allows the student to mimic the teacher's prediction distribution for each sample, GKD only enables the student to mimic the correct outputs of the teacher. The last part is Early Decay Teacher (EDT). During the training process, we gradually decay the weight of the distillation loss. The purpose is to enable the student, rather than the teacher, to gradually control the optimization. Our proposed method is evaluated on ImageNet and CIFAR100. On ImageNet, we achieve a top-1 error of 27.68% with ResNet18, which outperforms state-of-the-art methods. On CIFAR100, we achieve the surprising result that the student outperforms the teacher. Code is available at this https URL.
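A minimal sketch of how the three components could combine into one training loss. The temperature, the linear EDT decay schedule, and the omission of the channel-attention (CD) term are all assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      epoch, total_epochs, T=4.0):
    """Cross-entropy plus a decayed, teacher-gated distillation term."""
    ce = F.cross_entropy(student_logits, targets)

    # GKD: only distill on samples the teacher classifies correctly.
    correct = teacher_logits.argmax(dim=1).eq(targets)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="none").sum(dim=1)
    kd = (kd * correct.float()).mean() * (T * T)

    # EDT: decay the distillation weight so the student gradually takes
    # over the optimization (the linear schedule is an assumption).
    alpha = max(0.0, 1.0 - epoch / total_epochs)
    return ce + alpha * kd

logits_s, logits_t = torch.randn(8, 100), torch.randn(8, 100)
y = torch.randint(0, 100, (8,))
print(distillation_loss(logits_s, logits_t, y, epoch=10, total_epochs=100))
```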
6. Interpretation of ResNet by Visualization of Preferred Stimulus in Receptive Fields [PDF] Back to contents
Genta Kobayashi, Hayaru Shouno
Abstract: One of the methods used in image recognition is the Deep Convolutional Neural Network (DCNN). A DCNN is a model in which the expressive power of features is greatly improved by deepening the hidden layers of a CNN. The architecture of CNNs is determined based on a model of the visual cortex of mammals. There is a model called Residual Network (ResNet) that has skip connections. ResNet is an advanced model in terms of the learning method, but it has no biological grounding. In this research, we investigate the receptive fields of a ResNet on the classification task in ImageNet. We find that ResNet has orientation-selective neurons and double-opponent color neurons. In addition, we suggest that some inactive neurons in the first layer of ResNet affect the classification task.
7. Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge [PDF] Back to contents
Peng Wang, Dongyang Liu, Hui Li, Qi Wu
Abstract: Conventional referring expression comprehension (REF) assumes that people query something from an image by describing its visual appearance and spatial location, but in practice, we often ask for an object by describing its affordance or other non-visual attributes, especially when we do not have a precise target. For example, sometimes we say 'Give me something to eat'. In this case, we need to use commonsense knowledge to identify the objects in the image. Unfortunately, there is no existing referring expression dataset reflecting this requirement, not to mention a model to tackle this challenge. In this paper, we collect a new referring expression dataset, called KB-Ref, containing 43k expressions on 16k images. In KB-Ref, to answer each expression (detect the target object referred to by the expression), at least one piece of commonsense knowledge is required. We then test state-of-the-art (SoTA) REF models on KB-Ref, finding that all of them present a large drop compared to their outstanding performance on general REF datasets. We also present an Expression Conditioned Image and Fact Attention (ECIFA) network that extracts information from correlated image regions and commonsense knowledge facts. Our method leads to a significant improvement over SoTA REF models, although there is still a gap between this strong baseline and human performance. The dataset and baseline models will be released.
8. A Multi-Task Comparator Framework for Kinship Verification [PDF] Back to contents
Stefan Hörmann, Martin Knoche, Gerhard Rigoll
Abstract: Approaches for kinship verification often rely on cosine distances between face identification features. However, due to gender bias inherent in these features, it is hard to reliably predict whether two opposite-gender pairs are related. Instead of fine tuning the feature extractor network on kinship verification, we propose a comparator network to cope with this bias. After concatenating both features, cascaded local expert networks extract the information most relevant for their corresponding kinship relation. We demonstrate that our framework is robust against this gender bias and achieves comparable results on two tracks of the RFIW Challenge 2020. Moreover, we show how our framework can be further extended to handle partially known or unknown kinship relations.
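A minimal sketch of the comparator idea: rather than thresholding a cosine distance between two face embeddings, learn the comparison from their concatenation. The layer sizes are arbitrary, and the paper's cascaded per-relation expert heads are collapsed into a single MLP here.

```python
import torch
import torch.nn as nn

class KinshipComparator(nn.Module):
    """Learned comparator over two concatenated face embeddings."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),  # P(pair is kin)
        )

    def forward(self, f1, f2):
        return self.head(torch.cat([f1, f2], dim=1)).squeeze(1)

model = KinshipComparator()
a, b = torch.randn(4, 512), torch.randn(4, 512)
print(model(a, b))  # kinship probabilities for 4 pairs
```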
9. CNNs on Surfaces using Rotation-Equivariant Features [PDF] Back to contents
Ruben Wiersma, Elmar Eisemann, Klaus Hildebrandt
Abstract: This paper is concerned with a fundamental problem in geometric deep learning that arises in the construction of convolutional neural networks on surfaces. Due to curvature, the transport of filter kernels on surfaces results in a rotational ambiguity, which prevents a uniform alignment of these kernels on the surface. We propose a network architecture for surfaces that consists of vector-valued, rotation-equivariant features. The equivariance property makes it possible to locally align features, which were computed in arbitrary coordinate systems, when aggregating features in a convolution layer. The resulting network is agnostic to the choices of coordinate systems for the tangent spaces on the surface. We implement our approach for triangle meshes. Based on circular harmonic functions, we introduce convolution filters for meshes that are rotation-equivariant at the discrete level. We evaluate the resulting networks on shape correspondence and shape classifications tasks and compare their performance to other approaches.
10. Studying The Effect of MIL Pooling Filters on MIL Tasks [PDF] Back to contents
Mustafa Umit Oner, Jared Marc Song Kye-Jet, Hwee Kuan Lee, Wing-Kin Sung
Abstract: There are different multiple instance learning (MIL) pooling filters used in MIL models. In this paper, we study the effect of different MIL pooling filters on the performance of MIL models in real-world MIL tasks. We designed a neural network based MIL framework with 5 different MIL pooling filters: 'max', 'mean', 'attention', 'distribution' and 'distribution with attention'. We also formulated 5 different MIL tasks on a real-world lymph node metastases dataset. We found that the performance of our framework on a task differs from filter to filter. We also observed that the performances of the five pooling filters differ from task to task. Hence, the selection of a correct MIL pooling filter for each MIL task is crucial for better performance. Furthermore, we noticed that models with 'distribution' and 'distribution with attention' pooling filters consistently perform well in almost all of the tasks. We attribute this phenomenon to the amount of information captured by 'distribution' based pooling filters. While point-estimate based pooling filters, like 'max' and 'mean', produce point estimates of distributions, 'distribution' based pooling filters capture the full information in distributions. Lastly, we compared the performance of our neural network model with the 'distribution' pooling filter against the best MIL methods in the literature on classical MIL datasets, and our model outperformed the others.
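The 'max', 'mean' and 'attention' filters are standard and sketched below; 'distribution' pooling, which summarizes a bag by an estimated feature distribution (e.g. a histogram) rather than a point statistic, is only noted in a comment. Dimensions and hidden sizes are illustrative.

```python
import torch
import torch.nn as nn

def max_pool(h):    # h: (num_instances, dim) -> (dim,)
    return h.max(dim=0).values

def mean_pool(h):
    return h.mean(dim=0)

class AttentionPool(nn.Module):
    """Learned weighted average over the bag; weights sum to 1."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))
    def forward(self, h):
        a = torch.softmax(self.score(h), dim=0)  # (num_instances, 1)
        return (a * h).sum(dim=0)

# A 'distribution' filter would instead bin each feature over the bag into
# a histogram, keeping the full marginal distribution rather than a point.
bag = torch.randn(12, 32)  # a bag of 12 instance embeddings
print(max_pool(bag).shape, mean_pool(bag).shape, AttentionPool(32)(bag).shape)
```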
11. Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution [PDF] Back to contents
Xibin Song, Yuchao Dai, Dingfu Zhou, Liu Liu, Wei Li, Hongdong Li, Ruigang Yang
Abstract: Despite the remarkable progress made in deep-learning based depth map super-resolution (DSR), how to tackle real-world degradation in low-resolution (LR) depth maps remains a major challenge. Existing DSR models are generally trained and tested on synthetic datasets, which are very different from what would be obtained from a real depth sensor. In this paper, we argue that DSR models trained under this setting are restrictive and not effective in dealing with real-world DSR tasks. We make two contributions in tackling the real-world degradation of different depth sensors. First, we propose to classify the generation of LR depth maps into two types: non-linear downsampling with noise and interval downsampling, for which DSR models are learned correspondingly. Second, we propose a new framework for real-world DSR, which consists of four modules: 1) an iterative residual learning module with deep supervision to learn effective high-frequency components of depth maps in a coarse-to-fine manner; 2) a channel attention strategy to enhance channels with abundant high-frequency components; 3) a multi-stage fusion module to effectively re-exploit the results in the coarse-to-fine process; and 4) a depth refinement module to improve the depth map by TGV regularization and input loss. Extensive experiments on benchmark datasets demonstrate the superiority of our method over current state-of-the-art DSR methods.
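A sketch of the two LR depth generation types named above. The abstract does not specify the non-linear operator, so a block-wise minimum plus Gaussian noise stands in for 'non-linear downsampling with noise'.

```python
import numpy as np

def interval_downsample(depth, factor=4):
    """Keep every factor-th depth sample (models sparse sensor scanlines)."""
    return depth[::factor, ::factor]

def noisy_nonlinear_downsample(depth, factor=4, noise_std=0.02):
    """Block-wise minimum plus additive noise; an assumed stand-in for the
    paper's 'non-linear downsampling with noise'."""
    h, w = depth.shape
    blocks = depth[:h - h % factor, :w - w % factor]
    blocks = blocks.reshape(h // factor, factor, w // factor, factor)
    lr = blocks.min(axis=(1, 3))  # non-linear reduction over each block
    return lr + np.random.normal(0, noise_std, lr.shape)

depth = np.random.rand(64, 64).astype(np.float32)
print(interval_downsample(depth).shape, noisy_nonlinear_downsample(depth).shape)
```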
12. Recapture as You Want [PDF] Back to contents
Chen Gao, Si Liu, Ran He, Shuicheng Yan, Bo Li
Abstract: With the increasing prevalence and more powerful camera systems of mobile devices, people can conveniently take photos in their daily life, which naturally brings the demand for more intelligent photo post-processing techniques, especially on those portrait photos. In this paper, we present a portrait recapture method enabling users to easily edit their portrait to desired posture/view, body figure and clothing style, which are very challenging to achieve since it requires to simultaneously perform non-rigid deformation of human body, invisible body-parts reasoning and semantic-aware editing. We decompose the editing procedure into semantic-aware geometric and appearance transformation. In geometric transformation, a semantic layout map is generated that meets user demands to represent part-level spatial constraints and further guides the semantic-aware appearance transformation. In appearance transformation, we design two novel modules, Semantic-aware Attentive Transfer (SAT) and Layout Graph Reasoning (LGR), to conduct intra-part transfer and inter-part reasoning, respectively. SAT module produces each human part by paying attention to the semantically consistent regions in the source portrait. It effectively addresses the non-rigid deformation issue and well preserves the intrinsic structure/appearance with rich texture details. LGR module utilizes body skeleton knowledge to construct a layout graph that connects all relevant part features, where graph reasoning mechanism is used to propagate information among part nodes to mine their relations. In this way, LGR module infers invisible body parts and guarantees global coherence among all the parts. Extensive experiments on DeepFashion, Market-1501 and in-the-wild photos demonstrate the effectiveness and superiority of our approach. Video demo is at: \url{this https URL}.
13. Distribution Aligned Multimodal and Multi-Domain Image Stylization [PDF] Back to contents
Minxuan Lin, Fan Tang, Weiming Dong, Xiao Li, Chongyang Ma, Changsheng Xu
Abstract: Multimodal and multi-domain stylization are two important problems in the field of image style transfer. Currently, there are few methods that can perform both multimodal and multi-domain stylization simultaneously. In this paper, we propose a unified framework for multimodal and multi-domain style transfer with the support of both exemplar-based reference and randomly sampled guidance. The key component of our method is a novel style distribution alignment module that eliminates the explicit distribution gaps between various style domains and reduces the risk of mode collapse. The multimodal diversity is ensured by either guidance from multiple images or random style code, while the multi-domain controllability is directly achieved by using a domain label. We validate our proposed framework on painting style transfer with a variety of different artistic styles and genres. Qualitative and quantitative comparisons with state-of-the-art methods demonstrate that our method can generate high-quality results of multi-domain styles and multimodal instances with reference style guidance or random sampled style.
14. Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining [PDF] Back to contents
Yiqun Mei, Yuchen Fan, Yuqian Zhou, Lichao Huang, Thomas S. Huang, Humphrey Shi
Abstract: Deep convolution-based single image super-resolution (SISR) networks embrace the benefits of learning from large-scale external image resources for local recovery, yet most existing works have ignored the long-range feature-wise similarities in natural images. Some recent works have successfully leveraged this intrinsic feature correlation by exploring non-local attention modules. However, none of the current deep models have studied another inherent property of images: cross-scale feature correlation. In this paper, we propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network. By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution (LR) image. The performance of SISR is significantly improved by exhaustively integrating all possible priors. Extensive experiments demonstrate the effectiveness of the proposed CS-NL module by setting new state-of-the-arts on multiple SISR benchmarks.
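A loose sketch of the cross-scale idea: queries at full resolution attend to keys/values taken from a downscaled copy of the same feature map, so structures that recur at a coarser scale can be matched. The actual CS-NL module matches patches and feeds a recurrent fusion cell; this pixel-level version is a simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleAttention(nn.Module):
    """Simplified cross-scale non-local attention over one feature map."""

    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        down = F.avg_pool2d(x, self.scale)               # coarser-scale copy
        q = self.q(x).flatten(2).transpose(1, 2)         # (b, hw, c)
        k = self.k(down).flatten(2)                      # (b, c, h'w')
        v = self.v(down).flatten(2).transpose(1, 2)      # (b, h'w', c)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)   # (b, hw, h'w')
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                   # residual connection

print(CrossScaleAttention(16)(torch.randn(1, 16, 24, 24)).shape)
```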
15. Monocular Human Pose Estimation: A Survey of Deep Learning-based Methods [PDF] Back to contents
Yucheng Chen, Yingli Tian, Mingyi He
Abstract: Vision-based monocular human pose estimation, as one of the most fundamental and challenging problems in computer vision, aims to obtain the posture of the human body from input images or video sequences. Recent developments in deep learning techniques have brought significant progress and remarkable breakthroughs in the field of human pose estimation. This survey extensively reviews the recent deep learning-based 2D and 3D human pose estimation methods published since 2014. It summarizes the challenges, main frameworks, benchmark datasets, evaluation metrics, and performance comparisons, and discusses some promising future research directions.
16. Resolving Class Imbalance in Object Detection with Weighted Cross Entropy Losses [PDF] Back to contents
Trong Huy Phan, Kazuma Yamamoto
Abstract: Object detection is an important task in computer vision which serves a lot of real-world applications such as autonomous driving, surveillance and robotics. Along with the rapid growth of large-scale data, numerous state-of-the-art generalized object detectors (e.g. Faster R-CNN, YOLO, SSD) were developed in the past decade. Despite continual efforts in model modification and improvement of training strategies to boost detection accuracy, there are still limitations in the performance of detectors when it comes to specialized datasets with uneven object class distributions. This originates from the common usage of the Cross Entropy loss function for the object classification sub-task, which simply ignores the frequency of appearance of each object class during training, and thus results in lower accuracies for object classes with fewer samples. Class imbalance in general machine learning has been widely studied; however, little attention has been paid to it on the subject of object detection. In this paper, we propose to explore and overcome this problem by applying several weighted variants of the Cross Entropy loss, for example Balanced Cross Entropy, Focal Loss and Class-Balanced Loss Based on Effective Number of Samples, to our object detector. Experiments with BDD100K (a highly class-imbalanced driving database acquired from on-vehicle cameras capturing mostly Car-class objects and other minority object classes such as Bus, Person and Motor) have proven better class-wise performances of detectors trained with the afore-mentioned loss functions.
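Two of the losses named above have compact, well-known forms: Focal Loss (Lin et al., 2017) down-weights easy examples by a factor (1 - p_t)^gamma, and the Class-Balanced Loss (Cui et al., 2019) weights each class by the inverse effective number of samples E_n = (1 - beta^n) / (1 - beta). A short PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss: down-weight easy, well-classified examples."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-(1 - pt) ** gamma * log_pt).mean()

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Per-class weights from the inverse effective number of samples."""
    n = torch.as_tensor(samples_per_class, dtype=torch.float)
    effective_num = (1.0 - beta ** n) / (1.0 - beta)
    w = 1.0 / effective_num
    return w * len(samples_per_class) / w.sum()  # normalize to mean 1

logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
weights = class_balanced_weights([50000, 5000, 100])
print(focal_loss(logits, targets),
      F.cross_entropy(logits, targets, weight=weights))  # weighted CE
```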
17. Transforming Multi-Concept Attention into Video Summarization [PDF] Back to contents
Yen-Ting Liu*, Yu-Jhe Li*, Yu-Chiang Frank Wang
Abstract: Video summarization is among the challenging tasks in computer vision, which aims at identifying highlight frames or shots over a lengthy video input. In this paper, we propose a novel attention-based framework for video summarization with complex video data. Unlike previous works which only apply an attention mechanism to the correspondence between frames, our multi-concept video self-attention (MC-VSA) model is presented to identify informative regions across temporal and concept video features, which jointly exploit context diversity over time and space for summarization purposes. Together with the consistency between video and summary enforced in our framework, our model can be applied to both labeled and unlabeled data, making our method preferable for real-world applications. Extensive and complete experiments on two benchmarks demonstrate the effectiveness of our model both quantitatively and qualitatively, and confirm its superiority over the state-of-the-arts.
18. A heterogeneous branch and multi-level classification network for person re-identification [PDF] Back to contents
Jiabao Wang, Yang Li, Yangshuo Zhang, Zhuang Miao, Rui Zhang
Abstract: Convolutional neural networks with multiple branches have recently proved highly effective in person re-identification (re-ID). Researchers design multi-branch networks using part models, yet they always attribute the effectiveness to the multiple parts. In addition, existing multi-branch networks always have isomorphic branches, which lack structural diversity. To address this problem, we propose a novel Heterogeneous Branch and Multi-level Classification Network (HBMCN), designed on top of the pre-trained ResNet-50 model. A new heterogeneous branch, SE-Res-Branch, is proposed based on the SE-Res module, which consists of a Squeeze-and-Excitation block and a residual block. Furthermore, a new multi-level classification joint objective function is proposed for the supervised learning of HBMCN, whereby multi-level features are extracted from multiple high-level layers and concatenated to represent a person. On three public person re-ID benchmarks (Market1501, DukeMTMC-reID and CUHK03), experimental results show that HBMCN reaches 94.4%, 85.7% and 73.8% in Rank-1, and 85.7%, 74.6% and 69.0% in mAP, respectively, achieving state-of-the-art performance. Further analysis demonstrates that the specially designed heterogeneous branch performs better than an isomorphic branch, and that multi-level classification provides more discriminative features than single-level classification. As a result, HBMCN provides substantial further improvements on person re-ID tasks.
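For reference, the Squeeze-and-Excitation block that the SE-Res module pairs with a residual block can be sketched in a few lines of PyTorch. The channel count and reduction ratio below are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze: global average pooling; excitation: two FC layers -> channel gates.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (batch, channels, h, w)
        gate = self.fc(x.mean(dim=(2, 3)))          # squeeze + excitation
        return x * gate.unsqueeze(-1).unsqueeze(-1) # channel-wise recalibration

feat = torch.randn(4, 256, 24, 8)                   # e.g. a re-ID feature map
recalibrated = SEBlock(256)(feat)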
19. Two-hand Global 3D Pose Estimation Using Monocular RGB [PDF] 返回目录
Fanqing Lin, Connor Wilhelm, Tony Martinez
Abstract: We tackle the challenging task of estimating global 3D joint locations for both hands via only monocular RGB input images. We propose a novel multi-stage convolutional neural network based pipeline that accurately segments and locates the hands despite occlusion between two hands and complex background noise and estimates the 2D and 3D canonical joint locations without any depth information. Global joint locations with respect to the camera origin are computed using the hand pose estimations and the actual length of the key bone with a novel projection algorithm. To train the CNNs for this new task, we introduce a large-scale synthetic 3D hand pose dataset. We demonstrate that our system outperforms previous works on 3D canonical hand pose estimation benchmark datasets with RGB-only information. Additionally, we present the first work that achieves accurate global 3D hand tracking on both hands using RGB-only inputs and provide extensive quantitative and qualitative evaluation.
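One way to picture how a real bone length pins down the global translation is the pinhole relation: a bone of metric length L spanning l pixels at depth z satisfies l ≈ f·L/z. The sketch below is our own illustration of that idea, not the paper's projection algorithm; the focal length and bone length are invented.

import numpy as np

def root_depth_from_bone(p2d_a, p2d_b, bone_len_m, focal_px):
    # z ≈ focal * L / l, with l the key bone's length in pixels.
    l_px = np.linalg.norm(np.asarray(p2d_a, float) - np.asarray(p2d_b, float))
    return focal_px * bone_len_m / l_px

def backproject(u, v, z, focal_px, cx, cy):
    # Lift pixel (u, v) at depth z into camera coordinates.
    return np.array([(u - cx) * z / focal_px, (v - cy) * z / focal_px, z])

z = root_depth_from_bone((320, 240), (320, 210), bone_len_m=0.08, focal_px=600)
root_cam = backproject(320, 240, z, focal_px=600, cx=320, cy=240)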
20. Multi-view Deep Features for Robust Facial Kinship Verification [PDF] 返回目录
Oualid Laiadi, Abdelmalik Ouamane, Abdelhamid Benakcha, Abdelmalik Taleb-Ahmed, Abdenour Hadid
Abstract: Automatic kinship verification from facial images is an emerging research topic in the machine learning community. In this paper, we propose an effective facial feature extraction model based on multi-view deep features. We use four pre-trained deep learning models with eight feature layers (the FC6 and FC7 layers of each of the VGG-F, VGG-M, VGG-S and VGG-Face models) to train the proposed Multilinear Side-Information based Discriminant Analysis integrating Within Class Covariance Normalization (MSIDA+WCCN) method. Furthermore, we show how metric learning methods that integrate WCCN improve the Simple Scoring Cosine similarity (SSC) method; we note that we used the SSC method with the concatenation of the eight deep features in the RFIW'20 competition. The integration of WCCN into the metric learning methods decreases the effect of intra-class variations introduced by the deep feature weights. We evaluate our proposed method on two kinship benchmarks, namely the KinFaceW-I and KinFaceW-II databases, using four parent-child relations (Father-Son, Father-Daughter, Mother-Son and Mother-Daughter). The proposed MSIDA+WCCN method improves over the SSC method by 12.80% and 14.65% on the KinFaceW-I and KinFaceW-II databases, respectively. The results obtained compare favorably with those of modern methods, including those that rely on deep learning.
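Within Class Covariance Normalization itself has a compact recipe: estimate the within-class covariance of the training features, whiten with the Cholesky factor of its inverse, then score pairs by cosine similarity. The sketch below follows that generic recipe under our own notation; it is not the authors' MSIDA+WCCN code.

import numpy as np

def wccn_projection(features, labels, eps=1e-6):
    # Average the per-class covariances to get the within-class covariance.
    dim = features.shape[1]
    classes = np.unique(labels)
    cov = sum(np.cov(features[labels == c], rowvar=False) for c in classes)
    cov /= len(classes)
    # WCCN projection: Cholesky factor of the inverse within-class covariance.
    return np.linalg.cholesky(np.linalg.inv(cov + eps * np.eye(dim)))

def ssc_score(a, b, B):
    a, b = a @ B, b @ B                             # project both feature vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

X = np.random.randn(200, 64)
y = np.repeat(np.arange(20), 10)                    # 20 classes, 10 samples each
B = wccn_projection(X, y)
score = ssc_score(X[0], X[1], B)                    # cosine similarity after WCCN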
21. Learning to Detect 3D Objects from Point Clouds in Real Time [PDF] 返回目录
Abhinav Sagar
Abstract: In this paper, we present a combined architecture using dilated and transposed convolutional neural networks for accurate and efficient semantic image segmentation. In contrast to previous fully convolutional neural networks such as FCN, in which almost all computation is shared over the entire image, we propose an additional architecture which we name dilated-transposed fully convolutional neural networks. To achieve this goal, we use dilated convolutional layers for downsampling and transposed convolutional layers for upsampling, with skip connections between the blocks formed by convolutions and max pooling layers. This type of architecture has been used successfully in the past for image classification with residual networks. In addition, we found that the SELU activation function gives better results than ReLU on the test set images. We reason that this is due to the model avoiding getting stuck in a local minimum, thereby avoiding the well-known vanishing gradient problem that can arise with the ReLU activation function. Our result achieves a pixel-wise class accuracy of 88% on the test set and a mean Intersection over Union (IoU) of 53.5, which is better than the state of the art using previous fully convolutional neural networks.
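A minimal sketch of the dilated-downsampling / transposed-upsampling pattern with a skip connection, using SELU activations as the abstract suggests. Layer sizes are illustrative; this is not the paper's exact network.

import torch
import torch.nn as nn

class DilatedTransposedNet(nn.Module):
    def __init__(self, in_ch=3, n_classes=21):
        super().__init__()
        # Downsampling path: dilated convolutions enlarge the receptive field.
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=2, dilation=2),
                                   nn.SELU(), nn.MaxPool2d(2))
        self.down2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=2, dilation=2),
                                   nn.SELU(), nn.MaxPool2d(2))
        # Upsampling path: transposed convolutions restore resolution.
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.up2 = nn.ConvTranspose2d(64, n_classes, 2, stride=2)

    def forward(self, x):
        d1 = self.down1(x)
        d2 = self.down2(d1)
        u1 = self.up1(d2) + d1                      # skip connection
        return self.up2(u1)                         # per-pixel class logits

logits = DilatedTransposedNet()(torch.randn(1, 3, 128, 128))  # (1, 21, 128, 128)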
22. BWCNN: Blink to Word, a Real-Time Convolutional Neural Network Approach [PDF] 返回目录
Albara Ah Ramli, Rex Liu, Rahul Krishnamoorthy, Vishal I B, Xiaoxiao Wang, Ilias Tagkopoulos, Xin Liu
Abstract: Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease of the brain and the spinal cord, which leads to paralysis of motor functions. Patients retain their ability to blink, which can be used for communication. Here, we present an Artificial Intelligence (AI) system that uses eye blinks to communicate with the outside world, running on real-time Internet-of-Things (IoT) devices. The system uses a Convolutional Neural Network (CNN) to find the blinking pattern, which is defined as a series of Open and Closed states. Each pattern is mapped to a collection of words that manifest the patient's intent. To find the best trade-off between accuracy and latency, we evaluated several convolutional network architectures, such as ResNet, SqueezeNet, DenseNet, and InceptionV3. We found that the InceptionV3 architecture, after hyper-parameter fine-tuning on the specific task, led to the best performance, with an accuracy of 99.20% and a latency of 94 ms. This work demonstrates how the latest advances in deep learning architectures can be adapted for clinical systems that ameliorate the patient's quality of life regardless of the point of care.
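The pattern-to-words step can be pictured as run-length encoding of the per-frame CNN predictions followed by a dictionary lookup. The sketch below is our own illustration; the pattern vocabulary is invented for the example.

from itertools import groupby

def frames_to_pattern(frame_states):
    # Collapse per-frame predictions ("O" = open, "C" = closed) into a
    # run-length pattern, e.g. O O C C O C O -> "OCOCO".
    return "".join(state for state, _ in groupby(frame_states))

# Hypothetical vocabulary mapping blink patterns to the patient's words.
VOCAB = {"OCO": ["yes"], "OCOCO": ["no"], "OCCCO": ["water", "thirsty"]}

states = ["O", "O", "C", "C", "O", "C", "O"]        # per-frame CNN output
words = VOCAB.get(frames_to_pattern(states), ["<unknown>"])   # -> ["no"]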
23. An embedded system for the automated generation of labeled plant images to enable machine learning applications in agriculture [PDF] 返回目录
Michael A. Beck, Chen-Yi Liu, Christopher P. Bidinosti, Christopher J. Henry, Cara M. Godee, Manisha Ajmani
Abstract: A lack of sufficient training data, both in terms of variety and quantity, is often the bottleneck in the development of machine learning (ML) applications in any domain. For agricultural applications, ML-based models designed to perform tasks such as autonomous plant classification will typically be coupled to just one or perhaps a few plant species. As a consequence, each crop-specific task is very likely to require its own specialized training data, and the question of how to serve this need for data now often overshadows the more routine exercise of actually training such models. To tackle this problem, we have developed an embedded robotic system to automatically generate and label large datasets of plant images for ML applications in agriculture. The system can image plants from virtually any angle, thereby ensuring a wide variety of data; and with an imaging rate of up to one image per second, it can produce labeled datasets on the scale of thousands to tens of thousands of images per day. As such, this system offers an important alternative to time- and cost-intensive methods of manual generation and labeling. Furthermore, the use of a uniform background made of blue keying fabric enables additional image processing techniques such as background replacement and plant segmentation. It also helps in the training process, essentially forcing the model to focus on the plant features and eliminating random correlations. To demonstrate the capabilities of our system, we generated a dataset of over 34,000 labeled images, with which we trained an ML model to distinguish grasses from non-grasses in test data from a variety of sources. We now plan to generate much larger datasets of Canadian crop plants and weeds that will be made publicly available in the hope of further enabling ML applications in the agriculture sector.
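The blue keying fabric reduces background replacement and plant segmentation to a chroma-key operation. A minimal OpenCV sketch follows; the HSV thresholds and file names are illustrative and would need tuning to the actual fabric.

import cv2
import numpy as np

def replace_blue_background(plant_bgr, new_background_bgr):
    hsv = cv2.cvtColor(plant_bgr, cv2.COLOR_BGR2HSV)
    # Pixels in this (illustrative) blue range are treated as backdrop.
    backdrop = cv2.inRange(hsv, (100, 80, 40), (130, 255, 255))
    plant_mask = cv2.bitwise_not(backdrop)          # everything not backdrop
    fg = cv2.bitwise_and(plant_bgr, plant_bgr, mask=plant_mask)
    bg = cv2.bitwise_and(new_background_bgr, new_background_bgr, mask=backdrop)
    return cv2.add(fg, bg), plant_mask              # composite + plant segmentation

plant = cv2.imread("plant.png")                     # hypothetical inputs
field = cv2.imread("field.png")
composite, mask = replace_blue_background(plant, field)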
24. High-quality Panorama Stitching based on Asymmetric Bidirectional Optical Flow [PDF] 返回目录
Mingyuan Meng, Shaojun Liu
Abstract: In this paper, we propose a panorama stitching algorithm based on asymmetric bidirectional optical flow. The algorithm takes as input multiple photos captured by fisheye lens cameras and merges them into a high-quality 360-degree spherical panoramic image. For photos taken from a distant perspective, the parallax among them is relatively small, and the obtained panoramic image can be nearly seamless and undistorted. For photos taken from a close perspective or with a relatively large parallax, a seamless though partially distorted panoramic image can also be obtained. Moreover, with the help of a Graphics Processing Unit (GPU), the algorithm can complete the whole stitching process at a very fast speed: typically, it takes less than 30 s to obtain a panoramic image of 9000-by-4000 pixels, which makes our panorama stitching algorithm valuable for many real-time applications. Our code is available at this https URL.
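The core idea, warping two overlapping views toward each other along bidirectional optical flow before blending, can be pictured with the much-simplified sketch below, which uses OpenCV's Farneback flow and a symmetric halfway warp. It stands in for, and does not reproduce, the paper's asymmetric flow and fisheye handling.

import cv2
import numpy as np

def warp_halfway(src_gray, dst_gray, alpha=0.5):
    flow = cv2.calcOpticalFlowFarneback(src_gray, dst_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = src_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Backward-warp src along a fraction of the flow (small-flow approximation).
    map_x = (grid_x + alpha * flow[..., 0]).astype(np.float32)
    map_y = (grid_y + alpha * flow[..., 1]).astype(np.float32)
    return cv2.remap(src_gray, map_x, map_y, cv2.INTER_LINEAR)

a = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical overlap crops
b = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
blend = cv2.addWeighted(warp_halfway(a, b), 0.5, warp_halfway(b, a), 0.5, 0)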
25. Fast and accurate aberration estimation from 3D bead images using convolutional neural networks [PDF] 返回目录
Debayan Saha, Uwe Schmidt, Qinrong Zhang, Aurelien Barbotin, Qi Hu, Na Ji, Martin J. Booth, Martin Weigert, Eugene W. Myers
Abstract: Estimating optical aberrations from volumetric intensity images is a key step in sensorless adaptive optics for microscopy. Here we describe a method (PHASENET) for fast and accurate aberration measurement from experimentally acquired 3D bead images using convolutional neural networks. Importantly, we show that networks trained only on synthetically generated data can successfully predict aberrations from experimental images. We demonstrate our approach on two data sets acquired with different microscopy modalities and find that PHASENET yields results better than or comparable to classical methods while being orders of magnitude faster. We furthermore show that the number of focal planes required for satisfactory prediction is related to different symmetry groups of Zernike modes. PHASENET is freely available as open-source software in Python.
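At its core the approach is a CNN regressing Zernike mode amplitudes from a 3D bead stack. A minimal sketch of that mapping is shown below; the architecture and the number of modes are illustrative, not the actual PHASENET network.

import torch
import torch.nn as nn

# 3D conv net mapping a bead stack to Zernike coefficient estimates.
phasenet_like = nn.Sequential(
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Flatten(),
    nn.Linear(16 * 8 * 16 * 16, 11),                # 11 Zernike modes (illustrative)
)

stack = torch.randn(1, 1, 32, 64, 64)               # (batch, channel, z, y, x)
zernike_coeffs = phasenet_like(stack)               # train with MSE on synthetic data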
26. Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision [PDF] 返回目录
Patrick Rosenberger, Akansel Cosgun, Rhys Newbury, Jun Kwan, Valerio Ortenzi, Peter Corke, Manfred Grafinger
Abstract: We present an approach for safe and object-independent human-to-robot handovers using real-time robotic vision and manipulation. We aim for general applicability by using a generic object detector, a fast grasp selection algorithm and a single gripper-mounted RGB-D camera, hence not relying on external sensors. The robot is controlled via visual servoing towards the object of interest. Placing a high emphasis on safety, we use two perception modules: human body part segmentation and hand/finger segmentation. Pixels that are deemed to belong to the human are filtered out from candidate grasp poses, ensuring that the robot safely picks the object without colliding with its human partner. The grasp selection and perception modules run concurrently in real time, which allows monitoring of the progress. In experiments with 13 objects, the robot was able to successfully take the object from the human in 81.9% of the trials.
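The safety filtering described above amounts to discarding grasp candidates that land on, or too close to, pixels segmented as human. A simple sketch of that step (our own illustration; the mask and grasp representation are assumptions):

import numpy as np

def filter_safe_grasps(grasps_px, human_mask, margin=10):
    # grasps_px: (u, v) pixel coordinates of candidate grasp centers.
    # human_mask: (H, W) boolean segmentation of body parts / hand / fingers.
    safe = []
    for u, v in grasps_px:
        patch = human_mask[max(0, v - margin):v + margin,
                           max(0, u - margin):u + margin]
        if not patch.any():                         # no human pixels nearby
            safe.append((u, v))
    return safe

mask = np.zeros((480, 640), dtype=bool)
mask[100:300, 200:400] = True                       # toy human segmentation
safe = filter_safe_grasps([(50, 50), (250, 200)], mask)   # keeps only (50, 50)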
27. Shapley Value as Principled Metric for Structured Network Pruning [PDF] 返回目录
Marco Ancona, Cengiz Öztireli, Markus Gross
Abstract: Structured pruning is a well-known technique for reducing the storage size and inference cost of neural networks. The usual pruning pipeline consists of ranking the network's internal filters and activations with respect to their contributions to the network performance, removing the units with the lowest contribution, and fine-tuning the network to reduce the harm induced by pruning. Recent results showed that random pruning performs on par with other metrics, given enough fine-tuning resources. In this work, we show that this is not true in a low-data regime, when fine-tuning is either not possible or not effective. In this case, reducing the harm caused by pruning becomes crucial to retain the performance of the network. First, we analyze the problem of estimating the contribution of hidden units with tools suggested by cooperative game theory and propose Shapley values as a principled ranking metric for this task. We compare with several alternatives proposed in the literature and discuss why Shapley values are theoretically preferable. Finally, we compare all ranking metrics on the challenging scenario of low-data pruning, where we demonstrate how Shapley values outperform other heuristics.
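Shapley values over filters are typically approximated by Monte Carlo sampling of random orderings: a filter's value is its average marginal contribution when added on top of a random coalition of the other filters. A generic sketch, with the score function standing in for validation accuracy or negative loss:

import random

def shapley_filter_values(filters, score, n_samples=200):
    # score(active_set) -> performance with only the filters in active_set kept.
    values = {f: 0.0 for f in filters}
    for _ in range(n_samples):
        order = random.sample(filters, len(filters))     # random permutation
        active, prev = set(), score(set())
        for f in order:
            active.add(f)
            cur = score(active)
            values[f] += (cur - prev) / n_samples        # marginal contribution
            prev = cur
    return values  # prune the filters with the lowest estimated Shapley value

# Toy example: the score is the sum of hidden per-filter utilities.
utility = {0: 0.5, 1: 0.1, 2: 0.4}
vals = shapley_filter_values([0, 1, 2], lambda s: sum(utility[f] for f in s))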
28. AnalogNet: Convolutional Neural Network Inference on Analog Focal Plane Sensor Processors [PDF] 返回目录
Matthew Z. Wong, Benoit Guillard, Riku Murai, Sajad Saeedi, Paul H.J. Kelly
Abstract: We present a high-speed, energy-efficient Convolutional Neural Network (CNN) architecture utilising the capabilities of a unique class of devices known as analog Focal Plane Sensor Processors (FPSP), in which the sensor and the processor are embedded together on the same silicon chip. Unlike traditional vision systems, where the sensor array sends collected data to a separate processor for processing, FPSPs allow data to be processed on the imaging device itself. This unique architecture enables ultra-fast image processing and high energy efficiency, at the expense of limited processing resources and approximate computations. In this work, we show how to convert standard CNNs to FPSP code, and demonstrate a method of training networks to increase their robustness to analog computation errors. Our proposed architecture, coined AnalogNet, reaches a testing accuracy of 96.9% on the MNIST handwritten digits recognition task, at a speed of 2260 FPS, for a cost of 0.7 mJ per frame.
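A common way to make a network tolerant to imprecise analog computation, in the spirit of the robustness training mentioned above, is to inject noise into activations during training. This sketch is our own illustration, not the AnalogNet training procedure; the noise level is invented.

import torch
import torch.nn as nn

class NoisyReLU(nn.Module):
    # Perturbs activations with Gaussian noise during training only,
    # simulating the approximate computations of an analog substrate.
    def __init__(self, sigma=0.05):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        x = torch.relu(x)
        return x + self.sigma * torch.randn_like(x) if self.training else x

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), NoisyReLU(),
                      nn.Flatten(), nn.Linear(8 * 28 * 28, 10))
logits = model(torch.randn(32, 1, 28, 28))          # MNIST-sized input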
29. A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks [PDF] 返回目录
Ruqian Hao, Khashayar Namdar, Lin Liu, Masoom A. Haider, Farzad Khalvati
Abstract: Data augmentation refers to a group of techniques whose goal is to combat the limited amount of available data, improving model generalization and pushing the sample distribution toward the true distribution. While different augmentation strategies and their combinations have been investigated for various computer vision tasks in the context of deep learning, work specific to the medical imaging domain is rare and, to the best of our knowledge, there has been no dedicated work exploring the effects of various augmentation methods on the performance of deep learning models in prostate cancer detection. In this work, we statically applied the five most frequently used augmentation techniques (random rotation, horizontal flip, vertical flip, random crop, and translation) separately to a prostate Diffusion-weighted Magnetic Resonance Imaging training dataset of 217 patients and evaluated the effect of each method on the accuracy of prostate cancer detection. The augmentation algorithms were applied independently to each data channel, and a shallow as well as a deep Convolutional Neural Network (CNN) were trained on each of the five augmented sets. We used the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) to evaluate the performance of the trained CNNs on a separate test set of 95 patients, using a validation set of 102 patients for fine-tuning. The shallow network outperformed the deep network, with the best 2D slice-based AUC of 0.85 obtained with the rotation method.
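The five techniques map directly onto standard transform pipelines. A torchvision-style sketch with illustrative parameter ranges; since the paper applies each technique separately and statically, each transform below would be used on its own rather than composed together.

from torchvision import transforms

# One transform per augmentation strategy, applied to the DWI slices separately.
augmentations = {
    "rotation":    transforms.RandomRotation(degrees=10),
    "h_flip":      transforms.RandomHorizontalFlip(p=1.0),
    "v_flip":      transforms.RandomVerticalFlip(p=1.0),
    "crop":        transforms.RandomCrop(size=128, padding=8),
    "translation": transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
}

train_tf = transforms.Compose([augmentations["rotation"], transforms.ToTensor()])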
30. Variational Inference and Learning of Piecewise-linear Dynamical Systems [PDF] 返回目录
Xavier Alameda-Pineda, Vincent Drouard, Radu Horaud
Abstract: Modeling the temporal behavior of data is of primordial importance in many scientific and engineering fields. The baseline method assumes that both the dynamic and observation models follow linear-Gaussian models. Non-linear extensions lead to intractable solvers. It is also possible to consider several linear models, or a piecewise linear model, and to combine them with a switching mechanism, which is also intractable because of the exponential explosion of the number of Gaussian components. In this paper, we propose a variational approximation of piecewise linear dynamic systems. We provide full details of the derivation of a variational expectation-maximization algorithm that can be used either as a filter or as a smoother. We show that the model parameters can be split into two sets, a set of static (or observation parameters) and a set of dynamic parameters. The immediate consequences are that the former set can be estimated off-line and that the number of linear models (or the number of states of the switching variable) can be learned based on model selection. We apply the proposed method to the problem of visual tracking and we thoroughly compare our algorithm with several visual trackers applied to the problem of head-pose estimation.
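In the usual notation for such switching (piecewise-linear) systems, the model being approximated can be written as follows, with $z_t \in \{1,\dots,K\}$ the discrete switching variable; the notation is ours, reconstructed from the abstract:

\begin{align*}
x_t &= A_{z_t} x_{t-1} + b_{z_t} + u_t, & u_t &\sim \mathcal{N}(0, Q_{z_t}),\\
y_t &= C_{z_t} x_t + v_t, & v_t &\sim \mathcal{N}(0, R_{z_t}).
\end{align*}

Exact inference over $(x_{1:T}, z_{1:T})$ is intractable because marginalizing the $K^T$ switching paths yields a Gaussian mixture whose number of components grows exponentially with $T$, which is what motivates the variational approximation.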
31. A Review on End-To-End Methods for Brain Tumor Segmentation and Overall Survival Prediction [PDF] 返回目录
Snehal Rajput, Mehul S Raval
Abstract: Brain tumor segmentation aims to delineate tumor tissues from healthy brain tissues. The tumor tissues include necrosis, peritumoral edema, and active tumor, whereas healthy brain tissues include white matter, gray matter, and cerebrospinal fluid. MRI-based brain tumor segmentation research is gaining popularity because (1) MRI does not expose the patient to ionizing radiation, unlike X-ray or computed tomography imaging, and (2) it produces detailed pictures of internal body structures. The MRI scans are input to deep learning-based approaches, which are useful for automatic brain tumor segmentation. The features from the segments are fed to a classifier that predicts the overall survival of the patient. The motive of this paper is to give an extensive overview of the state of the art, jointly covering brain tumor segmentation and overall survival prediction.
32. Perturbation Analysis of Gradient-based Adversarial Attacks [PDF] 返回目录
Utku Ozbulak, Manvel Gasparyan, Wesley De Neve, Arnout Van Messem
Abstract: After the discovery of adversarial examples and their adverse effects on deep learning models, many studies focused on finding more diverse methods to generate these carefully crafted samples. Although empirical results on the effectiveness of adversarial example generation methods against defense mechanisms are discussed in detail in the literature, an in-depth study of the theoretical properties and the perturbation effectiveness of these adversarial attacks has largely been lacking. In this paper, we investigate the objective functions of three popular methods for adversarial example generation: the L-BFGS attack, the Iterative Fast Gradient Sign attack, and Carlini & Wagner's attack (CW). Specifically, we perform a comparative and formal analysis of the loss functions underlying the aforementioned attacks while laying out large-scale experimental results on the ImageNet dataset. This analysis exposes (1) the faster optimization speed as well as the constrained optimization space of the cross-entropy loss, (2) the detrimental effects of using the signature of the cross-entropy loss on optimization precision as well as optimization space, and (3) the slow optimization speed of the logit loss in the context of adversariality. Our experiments reveal that the Iterative Fast Gradient Sign attack, which is thought to be fast at generating adversarial examples, is the worst attack in terms of the number of iterations required to create adversarial examples under an equal perturbation budget. Moreover, our experiments show that the underlying loss function of CW, which is criticized for being substantially slower than other adversarial attacks, is not that much slower than other loss functions. Finally, we analyze how well neural networks can identify adversarial perturbations generated by the attacks under consideration, thereby revisiting the idea of adversarial retraining on ImageNet.
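For reference, the Iterative Fast Gradient Sign attack analyzed above takes repeated signed-gradient steps on a loss and projects back onto an epsilon-ball around the input. A minimal PyTorch sketch with the cross-entropy objective (step sizes are illustrative):

import torch
import torch.nn.functional as F

def ifgsm(model, x, y, eps=8/255, step=2/255, iters=10):
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)     # cross-entropy objective
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step * grad.sign()           # signed step
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)     # project to eps-ball
        x_adv = x_adv.clamp(0, 1)                             # keep a valid image
    return x_adv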
33. CT-based COVID-19 Triage: Deep Multitask Learning Improves Joint Identification and Severity Quantification [PDF] 返回目录
Mikhail Goncharov, Maxim Pisov, Alexey Shevtsov, Boris Shirokikh, Anvar Kurmukov, Ivan Blokhin, Valeria Chernina, Alexander Solovev, Victor Gombolevskiy, Sergey Morozov, Mikhail Belyaev
Abstract: The current COVID-19 pandemic overloads healthcare systems, including radiology departments. Though several deep learning approaches have been developed to assist in CT analysis, nobody has considered study triage directly as a computer science problem. We describe two basic setups: identification of COVID-19, to prioritize studies of potentially infected patients and isolate them as early as possible; and severity quantification, to highlight studies of severe patients and direct them to a hospital or provide emergency medical care. We formalize these tasks as binary classification and estimation of the affected lung percentage. Though similar problems have been well studied separately, we show that existing methods provide reasonable quality only for one of these setups. To consolidate both triage approaches, we employ multitask learning and propose a convolutional neural network that combines all available labels within a single model. We train our model on approximately 2000 publicly available CT studies and test it with a carefully designed set consisting of 33 COVID patients, 32 healthy patients, and 36 patients with other lung pathologies, emulating a typical patient flow in an outpatient hospital. The developed model achieved 0.951 ROC AUC for identification of COVID-19 and 0.98 Spearman correlation for severity quantification. We release all the code and create a public leaderboard, where other community members can test their models on our dataset.
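The multitask design of combining identification and severity labels can be sketched as a shared encoder with two heads and a summed loss. This is our own illustration rather than the authors' released code; the encoder and loss weight are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TriageNet(nn.Module):
    def __init__(self, encoder, feat_dim=512):
        super().__init__()
        self.encoder = encoder                      # shared CT feature extractor
        self.cls_head = nn.Linear(feat_dim, 2)      # COVID-19 identification
        self.sev_head = nn.Linear(feat_dim, 1)      # affected-lung percentage

    def forward(self, ct):
        z = self.encoder(ct)
        return self.cls_head(z), self.sev_head(z).squeeze(-1)

def multitask_loss(cls_logits, sev_pred, cls_y, sev_y, w=1.0):
    return F.cross_entropy(cls_logits, cls_y) + w * F.mse_loss(sev_pred, sev_y)

encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 512))   # toy stand-in
net = TriageNet(encoder)
cls_logits, severity = net(torch.randn(4, 64, 64))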
34. Learning to do multiframe blind deconvolution unsupervisedly [PDF] 返回目录
A. Asensio Ramos
Abstract: Observations from ground-based telescopes are affected by the presence of the Earth's atmosphere, which severely perturbs them. The use of adaptive optics techniques has allowed us to partly overcome this limitation. However, image selection or post-facto image reconstruction methods are routinely needed to reach the diffraction limit of telescopes. Deep learning has recently been used to accelerate these image reconstructions. Currently, these deep neural networks are trained with supervision, so standard deconvolution algorithms need to be applied a priori to generate the training sets. Our aim is to propose an unsupervised method which can be trained simply with observations, and to check it with data from the FastCam instrument. We use a neural model composed of three neural networks that are trained end-to-end by leveraging linear image formation theory to construct a physically motivated loss function. The analysis of the trained neural model shows that multiframe blind deconvolution can be trained self-supervisedly, i.e., using only observations. The outputs of the network are the corrected images and also estimations of the instantaneous wavefronts. The network model is of the order of 1000 times faster than applying standard deconvolution based on optimization. With some work, the model could be used in real time at the telescope.
35. COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on Chest X-Ray images [PDF] 返回目录
S. Tabik, A. Gómez-Ríos, J.L. Martín-Rodríguez, I. Sevillano-García, M. Rey-Area, D. Charte, E. Guirado, J.L. Suárez, J. Luengo, M.A. Valero-González, P. García-Villanova, E. Olmedo-Sánchez, F. Herrera
Abstract: Currently, Coronavirus disease (COVID-19), one of the most infectious diseases of the 21st century, is diagnosed using RT-PCR testing, CT scans and/or Chest X-Ray (CXR) images. CT (Computed Tomography) scanners and RT-PCR testing are not available in most medical centers, and hence in many cases CXR images become the most time- and cost-effective tool for assisting clinicians in making decisions. Deep learning neural networks have great potential for building triage systems for detecting COVID-19 patients, especially patients with low severity. Unfortunately, current databases do not allow building such systems, as they are highly heterogeneous and biased towards severe cases. The contribution of this paper is three-fold: (i) we demystify the high sensitivities achieved by most recent COVID-19 classification models; (ii) in close collaboration with Hospital Universitario Clínico San Cecilio, Granada, Spain, we built COVIDGR-1.0, a homogeneous and balanced database that includes all levels of severity, from Normal with positive RT-PCR, through Mild and Moderate, to Severe. COVIDGR-1.0 contains 377 positive and 377 negative PA (PosteroAnterior) CXR views; and (iii) we propose the COVID Smart Data based Network (COVID-SDNet) methodology for improving the generalization capacity of COVID classification models. Our approach reaches good and stable results, with accuracies of $97.37\% \pm 1.86\%$, $88.14\% \pm 2.02\%$ and $66.5\% \pm 8.04\%$ on the severe, moderate and mild COVID severity levels, respectively. Our approach could help in the early detection of COVID-19. The COVIDGR-1.0 dataset will be made available after the review process.
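The accuracies above are reported as mean ± standard deviation per severity level. A trivial sketch of that aggregation, assuming per-run accuracies collected over repeated train/test partitions:

```python
import numpy as np

def mean_std(per_run_accuracies):
    # Summarize repeated evaluation runs as mean ± sample standard deviation,
    # e.g. 97.37 ± 1.86 for the "severe" level in the abstract.
    a = np.asarray(per_run_accuracies, dtype=float)
    return a.mean(), a.std(ddof=1)
```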
36. Exploring the role of Input and Output Layers of a Deep Neural Network in Adversarial Defense [PDF] 返回目录
Jay N. Paranjape, Rahul Kumar Dubey, Vijendran V Gopalan
Abstract: Deep neural networks are learning models that have achieved state-of-the-art performance in many fields, such as prediction, computer vision and language processing. However, it has been shown that certain inputs exist that would not normally fool a human but may mislead the model completely. These inputs are known as adversarial inputs. They pose a serious security threat when such models are used in real-world applications. In this work, we analyze the resistance of three different classes of fully connected dense networks against rarely tested non-gradient-based adversarial attacks. These classes are created by manipulating the input and output layers. We show empirically that, owing to certain characteristics of the networks, they provide high robustness against these attacks, and can be used to fine-tune other models to increase their defense against adversarial attacks.
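The abstract does not specify which non-gradient-based attacks were tested. Purely as an illustration of the attack family, the sketch below uses random search: it proposes small random perturbations and keeps any that lowers the true-class probability, without ever querying gradients. The `predict_proba` callable and the pixel range [0, 1] are assumptions.

```python
import numpy as np

def random_search_attack(predict_proba, x, true_label, eps=0.05, steps=200, rng=None):
    # predict_proba: callable mapping an input array to class probabilities.
    rng = rng or np.random.default_rng(0)
    x_adv = x.copy()
    best = predict_proba(x_adv)[true_label]
    for _ in range(steps):
        # Propose a bounded random perturbation; no gradients used anywhere.
        candidate = np.clip(x_adv + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
        p = predict_proba(candidate)[true_label]
        if p < best:                      # keep it if the true class got less likely
            x_adv, best = candidate, p
    return x_adv
```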
37. Adaptive convolutional neural networks for k-space data interpolation in fast magnetic resonance imaging [PDF] 返回目录
Tianming Du, Honggang Zhang, Hee Kwon Song, Yong Fan
Abstract: Deep learning in k-space has demonstrated great potential for image reconstruction from undersampled k-space data in fast magnetic resonance imaging (MRI). However, existing deep learning-based image reconstruction methods typically apply weight-sharing convolutional neural networks (CNNs) to k-space data without taking into consideration the k-space data's spatial frequency properties, leading to ineffective learning of the image reconstruction models. Moreover, complementary information of spatially adjacent slices is often ignored in existing deep learning methods. To overcome such limitations, we develop a deep learning algorithm, referred to as adaptive convolutional neural networks for k-space data interpolation (ACNN-k-Space), which adopts a residual Encoder-Decoder network architecture to interpolate the undersampled k-space data by integrating spatially contiguous slices as multi-channel input, along with k-space data from multiple coils if available. The network is enhanced by self-attention layers to adaptively focus on k-space data at different spatial frequencies and channels. We have evaluated our method on two public datasets and compared it with state-of-the-art existing methods. Ablation studies and experimental results demonstrate that our method effectively reconstructs images from undersampled k-space data and achieves significantly better image reconstruction performance than current state-of-the-art techniques.
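As a rough sketch of the ingredients named here (adjacent slices as input channels, a residual encoder-decoder, self-attention over k-space features), with all shapes and layer sizes assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn

class KSpaceInterpolator(nn.Module):
    """Toy stand-in for an ACNN-k-Space-style model; not the authors' network."""
    def __init__(self, n_slices=3, feat=32):
        super().__init__()
        in_ch = 2 * n_slices                      # real/imag parts of adjacent slices
        self.encoder = nn.Sequential(nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU())
        self.attn = nn.MultiheadAttention(feat, num_heads=4, batch_first=True)
        self.decoder = nn.Conv2d(feat, 2, 3, padding=1)   # centre slice, real/imag

    def forward(self, k):                         # k: (N, 2*n_slices, H, W)
        f = self.encoder(k)
        n, c, h, w = f.shape
        seq = f.flatten(2).transpose(1, 2)        # (N, H*W, feat) token sequence
        a, _ = self.attn(seq, seq, seq)           # self-attention across k-space
        f = f + a.transpose(1, 2).reshape(n, c, h, w)     # residual connection
        # Assumes channels are ordered so the centre slice's real/imag pair
        # sits in the middle; predict its missing samples residually.
        mid = k.shape[1] // 2
        return self.decoder(f) + k[:, mid - 1 : mid + 1]
```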
38. Eye Movements Biometrics: A Bibliometric Analysis from 2004 to 2019 [PDF] 返回目录
Antonio Ricardo Alexandre Brasil, Jefferson Oliveira Andrade, Karin Satie Komati
Abstract: Person identification based on eye movements is receiving more and more attention, as it is resistant to spoofing and can be useful for continuous authentication. It is therefore valuable for researchers to know who and what is relevant in the field, including authors, journals, conferences, and institutions. This paper presents a comprehensive quantitative overview of the field of eye movement biometrics using a bibliometric approach. All data and analyses are based on documents written in English and published between 2004 and 2019; Scopus was used to perform the information retrieval. This research focused on temporal evolution, leading authors, most cited papers, leading journals, competitions, and collaboration networks.
39. Fusion of Real Time Thermal Image and 1D/2D/3D Depth Laser Readings for Remote Thermal Sensing in Industrial Plants by Means of UAVs and/or Robots [PDF] 返回目录
Corneliu Arsene
Abstract: This paper presents fast procedures for thermal infrared remote sensing in dark, GPS-denied environments, such as those found in industrial plants like High-Voltage Direct Current (HVDC) converter stations. These procedures are based on combining the depth estimation obtained from either a 1-Dimensional LIDAR laser, a 2-Dimensional Hokuyo laser or a 3D MultiSense SLB laser sensor with the visible and thermal cameras of a FLIR Duo R dual-sensor thermal camera. The combination of these sensors/cameras is suitable to be mounted on Unmanned Aerial Vehicles (UAVs) and/or robots in order to provide reliable information about potential malfunctions within the hazardous environment. For example, the capabilities of the developed software and hardware system corresponding to the combination of the 1-D LIDAR sensor and the FLIR Duo R dual-sensor thermal camera are assessed in terms of the accuracy of the results and the required computational times: the obtained computational times are under 10 ms, with a maximum localization error of 8 mm and an average standard deviation of 1.11 degrees Celsius for the measured temperatures; these results were obtained for a number of test cases. The paper is structured as follows: the description of the system used for identification and localization of hotspots in industrial plants is presented in Section II. In Section III, the method for fault identification and localization in plants using a 1-Dimensional LIDAR laser sensor and thermal images is described together with results. In Section IV, the real-time thermal image processing is presented. Fusion of the 2-Dimensional Hokuyo depth laser and the thermal images is described in Section V. In Section VI, the combination of the 3D MultiSense SLB laser and thermal images is described. In Section VII, a discussion and several conclusions are presented.
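For the 1-D LIDAR case, the fusion essentially reduces to attaching a single range reading to a thermal detection. A back-of-the-envelope sketch, using an assumed pinhole camera model and made-up intrinsics (fx, fy, cx, cy), none of which come from the paper:

```python
import numpy as np

def localize_hotspot(thermal, depth_m, fx=500.0, fy=500.0, cx=80.0, cy=60.0):
    # thermal: 2D array of temperatures; depth_m: single LIDAR range in metres.
    v, u = np.unravel_index(np.argmax(thermal), thermal.shape)  # hottest pixel
    x = (u - cx) / fx * depth_m          # pinhole back-projection to 3D
    y = (v - cy) / fy * depth_m
    return np.array([x, y, depth_m]), thermal[v, u]   # 3D position, temperature
```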
40. A comparative study of 2D image segmentation algorithms for traumatic brain lesions using CT data from the ProTECTIII multicenter clinical trial [PDF] 返回目录
Shruti Jadon, Owen P. Leary, Ian Pan, Tyler J. Harder, David W. Wright, Lisa H. Merck, Derek L. Merck
Abstract: Automated segmentation of medical imaging is of broad interest to clinicians and machine learning researchers alike. The goal of segmentation is to increase the efficiency and simplicity of visualization and quantification of regions of interest within a medical image. Image segmentation is a difficult task because of multiparametric heterogeneity within the images, an obstacle that has proven especially challenging in efforts to automate the segmentation of brain lesions from non-contrast head computed tomography (CT). In this research, we have experimented with multiple available deep learning architectures to segment different phenotypes of hemorrhagic lesions found after moderate to severe traumatic brain injury (TBI). These include: intraparenchymal hemorrhage (IPH), subdural hematoma (SDH), epidural hematoma (EDH), and traumatic contusions. We were able to achieve an optimal Dice Coefficient score of 0.94 using the UNet++ 2D architecture with the Focal Tversky loss function, an increase from 0.85 using UNet 2D with the binary cross-entropy loss function, on intraparenchymal hemorrhage (IPH) cases. Furthermore, using the same setting, we achieved Dice Coefficient scores of 0.90 and 0.86 on extra-axial bleeds and traumatic contusions, respectively.
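For reference, the Focal Tversky loss generalizes the Dice loss by weighting false negatives and false positives separately (alpha, beta) and raising the result to a focal exponent gamma that emphasizes hard examples. The parameter values below are common defaults, not necessarily those used in the study:

```python
import torch

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    # pred: predicted probabilities; target: binary labels; same shape.
    tp = (pred * target).sum()            # true positives (soft counts)
    fn = ((1 - pred) * target).sum()      # false negatives, weighted by alpha
    fp = (pred * (1 - target)).sum()      # false positives, weighted by beta
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky) ** gamma         # focal exponent sharpens hard cases
```

With alpha = beta = 0.5 and gamma = 1 this reduces exactly to the soft Dice loss, which is why it is a natural drop-in replacement in the comparison above.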
41. BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients [PDF] 返回目录
Maria de la Iglesia Vayá, Jose Manuel Saborit, Joaquim Angel Montell, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco Garcia, Marisa Caparrós, Germán González, Jose María Salinas
Abstract: In this work we describe the BIMCV-COVID-19+ dataset, a large dataset from the Medical Imaging Databank of the Valencian Region (BIMCV) with chest X-ray (CXR) images (CR, DX) and computed tomography (CT) imaging of COVID-19+ patients, along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, and Polymerase Chain Reaction (PCR), Immunoglobulin G (IgG) and Immunoglobulin M (IgM) diagnostic antibody test results. The findings are mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, in contrast with the much smaller number of entities annotated in previous datasets. Images are stored in high resolution, and entities are localized with anatomical labels in the Medical Imaging Data Structure (MIDS) format. In addition, 10 images were annotated by a team of radiologists to include semantic segmentation of radiological findings. This first iteration of the database includes 1,380 CX, 885 DX and 163 CT studies from 1,311 COVID-19+ patients. To the best of our knowledge, this is the largest COVID-19+ image dataset available in an open format. The dataset can be downloaded from this http URL.