Contents
3. A Smooth Representation of Belief over SO(3) for Deep Rotation Learning with Uncertainty [PDF] Abstract
8. Bi-directional Exponential Angular Triplet Loss for RGB-Infrared Person Re-Identification [PDF] Abstract
11. Multi-scale Cloud Detection in Remote Sensing Images using a Dual Convolutional Neural Network [PDF] Abstract
20. Symbol Spotting on Digital Architectural Floor Plans Using a Deep Learning-based Framework [PDF] Abstract
28. Modified Segmentation Algorithm for Recognition of Older Geez Scripts Written on Vellum [PDF] Abstract
30. EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary Neuromorphic Vision Sensors [PDF] Abstract
33. Critical Assessment of Transfer Learning for Medical Image Segmentation with Fully Convolutional Neural Networks [PDF] Abstract
35. SDCT-AuxNet$^θ$: DCT Augmented Stain Deconvolutional CNN with Auxiliary Classifier for Cancer Diagnosis [PDF] Abstract
37. Positron Emission Tomography (PET) image enhancement using a gradient vector orientation based nonlinear diffusion filter (GVOF) for accurate quantitation of radioactivity concentration [PDF] Abstract
41. Complex Sequential Understanding through the Awareness of Spatial and Temporal Concepts [PDF] Abstract
53. Assessing the validity of saliency maps for abnormality localization in medical imaging [PDF] Abstract
56. Reducing the X-ray radiation exposure frequency in cardio-angiography via deep-learning based video interpolation [PDF] Abstract
58. Automatic classification between COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy on chest X-ray image: combination of data augmentation methods in a small dataset [PDF] Abstract
64. DC-UNet: Rethinking the U-Net Architecture with Dual Channel Efficient CNN for Medical Images Segmentation [PDF] Abstract
67. Hyperspectral Image Denoising via Global Spatial-Spectral Total Variation Regularized Nonconvex Local Low-Rank Tensor Approximation [PDF] Abstract
73. Approximating the Ideal Observer for joint signal detection and localization tasks by use of supervised learning methods [PDF] Abstract
74. Synthesizing lesions using contextual GANs improves breast cancer classification on mammograms [PDF] Abstract
75. Automatic segmentation of the pulmonary lobes with a 3D u-net and optimized loss function [PDF] Abstract
76. Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data [PDF] Abstract
77. Automatic Diagnosis of Pulmonary Embolism Using an Attention-guided Framework: A Large-scale Study [PDF] Abstract
80. Learning stochastic object models from medical imaging measurements using Progressively-Growing AmbientGANs [PDF] Abstract
81. Glaucoma Detection From Raw Circumapillary OCT Images Using Fully Convolutional Neural Networks [PDF] Abstract
Abstracts
1. DPDnet: A Robust People Detector using Deep Learning with an Overhead Depth Camera [PDF] Back to Contents
David Fuentes-Jimenez, Roberto Martin-Lopez, Cristina Losada-Gutierrez, David Casillas-Perez, Javier Macias-Guarasa, Daniel Pizarro, Carlos A.Luna
Abstract: In this paper we propose a method based on deep learning that detects multiple people from a single overhead depth image with high reliability. Our neural network, called DPDnet, consists of two fully-convolutional encoder-decoder neural blocks built from residual layers. The Main Block takes a depth image as input and generates a pixel-wise confidence map, where each detected person in the image is represented by a Gaussian-like distribution. The Refinement Block combines the depth image and the output of the Main Block to refine the confidence map. Both blocks are trained simultaneously, end-to-end, using depth images and head position labels. The experimental work shows that DPDnet outperforms state-of-the-art methods, with accuracies greater than 99% on three different publicly available datasets, without retraining or fine-tuning. In addition, the computational complexity of our proposal is independent of the number of people in the scene, and the method runs in real time on conventional GPUs.
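As a rough illustration of the confidence-map targets described above (not the authors' code; the image size and Gaussian spread below are arbitrary assumptions), a Gaussian-like blob can be rendered at each labeled head position:

```python
import numpy as np

def confidence_map(shape, head_positions, sigma=10.0):
    """Render a Gaussian-like blob at each labeled head position.

    shape: (H, W) of the depth image; sigma is an assumed spread in pixels.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    target = np.zeros(shape, dtype=np.float32)
    for cy, cx in head_positions:
        blob = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
        target = np.maximum(target, blob)  # overlapping people keep the max response
    return target

# Example: two people in a 240x320 overhead depth frame
target = confidence_map((240, 320), [(60, 100), (150, 220)])
```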
2. Deep Generation of Face Images from Sketches [PDF] Back to Contents
Shu-Yu Chen, Wanchao Su, Lin Gao, Shihong Xia, Hongbo Fu
Abstract: Recent deep image-to-image translation techniques allow fast generation of face images from freehand sketches. However, existing solutions tend to overfit to sketches, thus requiring professional sketches or even edge maps as input. To address this issue, our key idea is to implicitly model the shape space of plausible face images and synthesize a face image in this space to approximate an input sketch. We take a local-to-global approach. We first learn feature embeddings of key face components, and push corresponding parts of input sketches towards underlying component manifolds defined by the feature vectors of face component samples. We also propose another deep neural network to learn the mapping from the embedded component features to realistic images with multi-channel feature maps as intermediate results to improve the information flow. Our method essentially uses input sketches as soft constraints and is thus able to produce high-quality face images even from rough and/or incomplete sketches. Our tool is easy to use even for non-artists, while still supporting fine-grained control of shape details. Both qualitative and quantitative evaluations show the superior generation ability of our system to existing and alternative solutions. The usability and expressiveness of our system are confirmed by a user study.
3. A Smooth Representation of Belief over SO(3) for Deep Rotation Learning with Uncertainty [PDF] Back to Contents
Valentin Peretroukhin, Matthew Giamou, David M. Rosen, W. Nicholas Greene, Nicholas Roy, Jonathan Kelly
Abstract: Accurate rotation estimation is at the heart of robot perception tasks such as visual odometry and object pose estimation. Deep neural networks have provided a new way to perform these tasks, and the choice of rotation representation is an important part of network design. In this work, we present a novel symmetric matrix representation of the 3D rotation group, SO(3), with two important properties that make it particularly suitable for learned models: (1) it satisfies a smoothness property that improves convergence and generalization when regressing large rotation targets, and (2) it encodes a symmetric Bingham belief over the space of unit quaternions, permitting the training of uncertainty-aware models. We empirically validate the benefits of our formulation by training deep neural rotation regressors on two data modalities. First, we use synthetic point-cloud data to show that our representation leads to superior predictive accuracy over existing representations for arbitrary rotation targets. Second, we use image data collected onboard ground and aerial vehicles to demonstrate that our representation is amenable to an effective out-of-distribution (OOD) rejection technique that significantly improves the robustness of rotation estimates to unseen environmental effects and corrupted input images, without requiring the use of an explicit likelihood loss, stochastic sampling, or an auxiliary classifier. This capability is key for safety-critical applications where detecting novel inputs can prevent catastrophic failure of learned models.
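A minimal numpy sketch of the decoding step this representation implies: the network regresses the 10 free parameters of a symmetric 4x4 matrix, and the rotation estimate is the unit quaternion given by the eigenvector of the smallest eigenvalue, which is also the mode of the Bingham belief that the matrix encodes. The matrix layout below is our assumption, not the authors' code:

```python
import numpy as np

def quaternion_from_symmetric(theta10):
    """Map 10 regressed parameters to a unit quaternion.

    The parameters fill a symmetric 4x4 matrix A; the rotation estimate is
    the unit eigenvector of A with the smallest eigenvalue (the mode of the
    Bingham density parameterized by A).
    """
    A = np.zeros((4, 4))
    A[np.triu_indices(4)] = theta10
    A = A + np.triu(A, 1).T               # symmetrize (diagonal counted once)
    eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    q = eigvecs[:, 0]                     # eigenvector of the minimum eigenvalue
    return q / np.linalg.norm(q)

q = quaternion_from_symmetric(np.random.randn(10))
```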
4. GoodPoint: unsupervised learning of keypoint detection and description [PDF] Back to Contents
Anatoly Belikov, Alexey Potapov
Abstract: This paper introduces a new algorithm for unsupervised learning of keypoint detectors and descriptors, which demonstrates fast convergence and good performance across different datasets. The training procedure uses homographic transformations of images. The proposed model learns to detect points and generate descriptors on pairs of transformed images, which are easy for it to distinguish and repeatedly detect. The trained model follows the SuperPoint architecture for ease of comparison, and demonstrates similar performance on natural images from the HPatches dataset, and better performance on retina images from the Fundus Image Registration Dataset, which contain a low number of corner-like features. For HPatches and other datasets, coverage was also computed to provide a better estimate of model quality.
5. One Versus all for deep Neural Network Incertitude (OVNNI) quantification [PDF] Back to Contents
Gianni Franchi, Andrei Bursuc, Emanuel Aldea, Severine Dubuisson, Isabelle Bloch
Abstract: Deep neural networks (DNNs) are powerful learning models, yet their results are not always reliable. This is due to the fact that modern DNNs are usually uncalibrated and we cannot characterize their epistemic uncertainty. In this work, we propose a new technique to quantify the epistemic uncertainty of data easily. This method consists in mixing the predictions of an ensemble of DNNs trained to classify One class vs All the other classes (OVA) with predictions from a standard DNN trained to perform All vs All (AVA) classification. On the one hand, the adjustment provided by the AVA DNN to the scores of the base classifiers allows for a more fine-grained inter-class separation. On the other hand, the two types of classifiers mutually reinforce their detection of out-of-distribution (OOD) samples, entirely circumventing the requirement of using such samples during training. Our method achieves state-of-the-art performance in quantifying OOD data across multiple datasets and architectures while requiring little hyper-parameter tuning.
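A minimal sketch of the mixing step described above. The element-wise product of the AVA softmax and the per-class OVA scores is our assumption about the combination rule; the abstract only states that the two kinds of predictions are mixed:

```python
import numpy as np

def ovnni_scores(ava_probs, ova_probs):
    """Combine an All-vs-All softmax with per-class One-vs-All scores.

    ava_probs: (C,) softmax output of the AVA network.
    ova_probs: (C,) sigmoid "is this class" outputs, one per OVA network.
    A low maximum combined score can be read as high epistemic uncertainty.
    """
    return ava_probs * ova_probs       # element-wise adjustment of AVA by OVA

scores = ovnni_scores(np.array([0.7, 0.2, 0.1]), np.array([0.9, 0.1, 0.05]))
confidence = scores.max()              # low value -> likely out-of-distribution
```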
6. Multimodal grid features and cell pointers for Scene Text Visual Question Answering [PDF] Back to Contents
Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Dimosthenis Karatzas
Abstract: This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it. The proposed model is based on an attention mechanism that attends to multi-modal features conditioned on the question, allowing it to reason jointly about the textual and visual modalities in the scene. The output weights of this attention module over the grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance on two standard datasets. Furthermore, this paper provides a novel analysis of the ST-VQA dataset based on a human performance study.
7. Implementing AI-powered semantic character recognition in motor racing sports [PDF] Back to Contents
Jose David Fernández Rodríguez, David Daniel Albarracín Molina, Jesús Hormigo Cebolla
Abstract: Oftentimes TV producers of motor-racing programs overlay visual and textual media to provide on-screen context about drivers, such as a driver's name, position or photo. Typically this is accomplished by a human producer who visually identifies the drivers on screen, manually toggling the contextual media associated to each one and coordinating with cameramen and other TV producers to keep the racer in the shot while the contextual media is on screen. This labor-intensive and highly dedicated process is mostly suited to static overlays and makes it difficult to overlay contextual information about many drivers at the same time in short shots. This paper presents a system that largely automates these tasks and enables dynamic overlays using deep learning to track the drivers as they move on screen, without human intervention. This system is not merely theoretical: an implementation has already been deployed by a TV production company during live Formula E races. We present the challenges faced during the implementation and discuss the implications. Additionally, we cover future applications and the roadmap of this new technological development.
8. Bi-directional Exponential Angular Triplet Loss for RGB-Infrared Person Re-Identification [PDF] Back to Contents
Hanrong Ye, Hong Liu, Fanyang Meng, Xia Li
Abstract: RGB-Infrared person re-identification (RGB-IR Re-ID) is a cross-modality matching problem with promising applications in the dark environment. Most existing works use Euclidean metric based constraints to resolve the discrepancy between features of different modalities. However, these methods are incapable of learning angularly discriminative feature embeddings because Euclidean distance cannot measure the included angle between embedding vectors effectively. As an angularly discriminative feature space is important for classifying the human images based on their embedding vectors, in this paper, we propose a novel ranking loss function, named Bi-directional Exponential Angular Triplet Loss, to help learn an angularly separable common feature space by explicitly constraining the included angles between embedding vectors. Moreover, to help stabilize and learn the magnitudes of embedding vectors, we adopt a common space batch normalization layer. Quantitative experiments on the SYSU-MM01 and RegDB datasets support our analysis. On the SYSU-MM01 dataset, the performance is improved from 7.40%/11.46% to 38.57%/38.61% in rank-1 accuracy/mAP compared with the baseline. The proposed method can be generalized to the task of single-modality Re-ID, improving rank-1 accuracy/mAP from 92.0%/81.7% to 94.7%/86.6% on the Market-1501 dataset, and from 82.6%/70.6% to 87.6%/77.1% on the DukeMTMC-reID dataset.
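To make the notion of an angular ranking loss concrete, here is a hedged sketch that penalizes the included angle between anchor-positive embeddings relative to anchor-negative ones through an exponential term. The exact bi-directional form used in the paper may differ; this is an illustration of constraining included angles, not the authors' loss:

```python
import torch
import torch.nn.functional as F

def included_angle(a, b, eps=1e-7):
    """Included angle between embedding vectors, in radians."""
    cos = F.cosine_similarity(a, b).clamp(-1 + eps, 1 - eps)
    return torch.acos(cos)

def exponential_angular_triplet(anchor, pos, neg):
    """Sketch of an exponential angular triplet penalty: drive the
    anchor-positive angle below the anchor-negative angle."""
    return torch.exp(included_angle(anchor, pos) - included_angle(anchor, neg)).mean()

loss = exponential_angular_triplet(
    torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128))
```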
9. 3D Lidar Mapping Relative Accuracy Automatic Evaluation Algorithm [PDF] Back to Contents
Guibin Chen, Jiong Deng, Dongze Huang, Shuo Zhang
Abstract: HD (High Definition) maps based on 3D lidar play a vital role in autonomous vehicle localization, planning, decision-making, perception, etc. Many 3D lidar mapping technologies related to SLAM (Simultaneous Localization and Mapping) are used in HD map construction to ensure its high accuracy. To evaluate the accuracy of 3D lidar mapping, the most common methods use ground-truth poses to calculate the error between estimated poses and the ground truth; however, it is usually difficult to obtain ground-truth poses in actual lidar mapping for autonomous vehicles. In this paper, we propose a relative accuracy evaluation algorithm that can automatically evaluate the accuracy of an HD map built by 3D lidar mapping without ground truth. A method for quantitatively detecting the degree of ghosting in the point cloud map is designed to reflect the accuracy indirectly; it takes advantage of the principle that light travels in a straight line and the fact that light cannot penetrate opaque objects. Our experimental results confirm that the proposed evaluation algorithm can automatically and efficiently detect bad poses whose accuracy is below a set threshold (e.g., 0.1 m), then calculate the percentage of bad poses, P_bad, among all estimated poses to obtain the final accuracy metric P_acc = 1 - P_bad.
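The final metric reduces to simple arithmetic; an illustrative sketch (the threshold and error values below are made up):

```python
def relative_accuracy(pose_errors, threshold=0.1):
    """P_acc = 1 - P_bad, where P_bad is the fraction of estimated poses
    whose error exceeds the set threshold (0.1 m in the abstract's example)."""
    bad = sum(1 for e in pose_errors if e > threshold)
    p_bad = bad / len(pose_errors)
    return 1.0 - p_bad

p_acc = relative_accuracy([0.02, 0.05, 0.15, 0.08])  # -> 0.75
```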
10. LFTag: A Scalable Visual Fiducial System with Low Spatial Frequency [PDF] Back to Contents
Ben Wang
Abstract: Visual fiducial systems are a key component of many robotics and AR/VR applications for 6-DOF monocular relative pose estimation and target identification. This paper presents LFTag, a visual fiducial system based on topological detection and relative position data encoding which optimizes data density within spatial frequency constraints. The marker is constructed to resolve rotational ambiguity, which combined with the robust geometric and topological false positive rejection, allows all marker bits to be used for data. When compared to existing state-of-the-art square binary markers (AprilTag) and topological markers (TopoTag) in simulation, the proposed fiducial system (LFTag) offers significant advances in dictionary size and range. LFTag 3x3 achieves 546 times the dictionary size of AprilTag 25h9 and LFTag 4x4 achieves 126 thousand times the dictionary size of AprilTag 41h12 while simultaneously achieving longer detection range. LFTag 3x3 also achieves more than twice the detection range of TopoTag 4x4 at the same dictionary size.
11. Multi-scale Cloud Detection in Remote Sensing Images using a Dual Convolutional Neural Network [PDF] Back to Contents
Markku Luotamo, Sari Metsämäki, Arto Klami
Abstract: Semantic segmentation by convolutional neural networks (CNN) has advanced the state of the art in pixel-level classification of remote sensing images. However, processing large images typically requires analyzing the image in small patches, and hence features that have large spatial extent still cause challenges in tasks such as cloud masking. To support a wider scale of spatial features while simultaneously reducing computational requirements for large satellite images, we propose an architecture of two cascaded CNN model components successively processing undersampled and full resolution images. The first component distinguishes patches in the inner cloud area from patches at the cloud's boundary region. For the cloud-ambiguous edge patches requiring further segmentation, the framework then delegates computation to a fine-grained model component. We apply the architecture to a cloud detection dataset of complete Sentinel-2 multispectral images, approximately annotated for minimal false negatives in a land use application. On this specific task and data, we achieve a 16% relative improvement in pixel accuracy over a CNN baseline based on patching.
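A sketch of the cascade described above, with assumed model callables and an assumed routing rule (only cloud-ambiguous edge patches reach the fine-grained component); patch size, stride, and the downsampling factor are illustrative:

```python
import numpy as np

def cascaded_cloud_mask(image, coarse_model, fine_model, patch=64, stride=64):
    """Two-component cascade: the coarse model, run on an undersampled view
    of each patch, routes only ambiguous boundary patches to the fine model.
    `coarse_model` and `fine_model` are hypothetical callables."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for y in range(0, image.shape[0] - patch + 1, stride):
        for x in range(0, image.shape[1] - patch + 1, stride):
            tile = image[y:y + patch, x:x + patch]
            label = coarse_model(tile[::4, ::4])      # undersampled input
            if label == "edge":                       # cloud-ambiguous boundary
                mask[y:y + patch, x:x + patch] = fine_model(tile)
            else:
                mask[y:y + patch, x:x + patch] = (label == "cloud")
    return mask
```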
12. Temporal Aggregate Representations for Long Term Video Understanding [PDF] Back to Contents
Fadime Sener, Dipika Singhania, Angela Yao
Abstract: Future prediction requires reasoning from current and past observations and raises several fundamental questions. How much past information is necessary? What is a reasonable temporal scale to process the past? How much semantic abstraction is required? We address all of these questions with a flexible multi-granular temporal aggregation framework. We show that it is possible to achieve state-of-the-art results in both next action and dense anticipation using simple techniques such as max pooling and attention. To demonstrate the anticipation capabilities of our model, we conduct experiments on the Breakfast Actions, 50Salads and EPIC-Kitchens datasets where we achieve state-of-the-art or comparable results. We also show that our model can be used for temporal video segmentation and action recognition with minimal modifications.
13. Thermal Object Detection using Domain Adaptation through Style Consistency [PDF] Back to Contents
Farzeen Munir, Shoaib Azam, Muhammad Aasim Rafique, Ahmad Muqeem Sheri, Moongu Jeon
Abstract: A recent fatal accident of an autonomous vehicle has opened a debate about the use of infrared technology in the sensor suite for autonomous driving to increase visibility for robust object detection. Thermal imaging has an advantage over lidar, radar, and camera because it can detect the heat difference emitted by objects in the infrared spectrum. In contrast, lidar and camera capture in the visible spectrum, and adverse weather conditions can impact their accuracy. The limitations of object detection in images from conventional imaging sensors can be compensated for by thermal images. This paper presents a domain adaptation method for object detection in thermal images. We explore multiple ideas of domain adaptation. First, a generative adversarial network is used to transfer the low-level features from the visible spectrum to the infrared spectrum domain through style consistency. Second, a cross-domain model with style consistency is used for object detection in the infrared spectrum by transferring the trained visible spectrum model. The proposed strategies are evaluated on publicly available thermal image datasets (FLIR ADAS and KAIST Multi-Spectral). We find that adapting the low-level features from the source domain to the target domain through domain adaptation increases mean average precision by approximately 10%.
14. Real-Time Face and Landmark Localization for Eyeblink Detection [PDF] Back to Contents
Paul Bakker, Henk-Jan Boele, Zaid Al-Ars, Christos Strydis
Abstract: Pavlovian eyeblink conditioning is a powerful experiment used in the field of neuroscience to measure multiple aspects of how we learn in our daily life. To track the movement of the eyelid during an experiment, researchers have traditionally made use of potentiometers or electromyography. More recently, the use of computer vision and image processing alleviated the need for these techniques, but currently employed methods require human intervention and are not fast enough to enable real-time processing. In this work, face- and landmark-detection algorithms have been carefully combined in order to provide fully automated eyelid tracking, and have further been accelerated to make the first crucial step towards online, closed-loop experiments. Such experiments have not been achieved so far and are expected to offer significant insights into the workings of neurological and psychiatric disorders. Based on an extensive literature search, various algorithms for face detection and landmark detection have been analyzed and evaluated. Two algorithms were identified as most suitable for eyelid detection: the Histogram-of-Oriented-Gradients (HOG) algorithm for face detection and the Ensemble-of-Regression-Trees (ERT) algorithm for landmark detection. These two algorithms have been accelerated on GPU and CPU, achieving speedups of 1,753× and 11×, respectively. To demonstrate the usefulness of our eyelid-detection algorithm, a research hypothesis was formed and a well-established neuroscientific experiment was employed: eyeblink detection. Our experimental evaluation reveals an overall application runtime of 0.533 ms per frame, which is 1,101× faster than the sequential implementation and well within the real-time requirements of eyeblink conditioning in humans, i.e. faster than 500 frames per second.
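Both selected algorithms are available off the shelf in dlib: get_frontal_face_detector is HOG-based, and shape_predictor implements an ERT landmark model. As one common way to turn the landmarks into an eyeblink signal, the sketch below computes the eye aspect ratio of Soukupova and Cech; this closure measure is our assumption, not necessarily the authors' pipeline:

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()  # HOG-based face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # ERT landmarks

def eye_aspect_ratio(pts):
    """EAR: ratio of vertical to horizontal eye extent; small values
    indicate a closed eye, so a dip in EAR over time marks a blink."""
    v1 = np.linalg.norm(pts[1] - pts[5])
    v2 = np.linalg.norm(pts[2] - pts[4])
    h = np.linalg.norm(pts[0] - pts[3])
    return (v1 + v2) / (2.0 * h)

def blink_score(gray_frame):
    faces = detector(gray_frame)
    if not faces:
        return None
    shape = predictor(gray_frame, faces[0])
    # points 36-41 of the 68-point model outline one eye
    eye = np.array([(shape.part(i).x, shape.part(i).y) for i in range(36, 42)], dtype=float)
    return eye_aspect_ratio(eye)
```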
15. Foreground-aware Semantic Representations for Image Harmonization [PDF] Back to Contents
Konstantin Sofiiuk, Polina Popenova, Anton Konushin
Abstract: Image harmonization is an important step in photo editing to achieve visual consistency in composite images by adjusting the appearances of foreground to make it compatible with background. Previous approaches to harmonize composites are based on training of encoder-decoder networks from scratch, which makes it challenging for a neural network to learn a high-level representation of objects. We propose a novel architecture to utilize the space of high-level features learned by a pre-trained classification network. We create our models as a combination of existing encoder-decoder architectures and a pre-trained foreground-aware deep high-resolution network. We extensively evaluate the proposed method on the existing image harmonization benchmark and set up a new state-of-the-art in terms of MSE and PSNR metrics. The code and trained models are available at this https URL.
16. Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos [PDF] Back to Contents
Benet Oriol, Jordi Luque, Ferran Diego, Xavier Giro-i-Nieto
Abstract: In this work, we propose an effective approach for training unique embedding representations by combining three simultaneous modalities: image and spoken and textual narratives. The proposed methodology departs from a baseline system that spawns an embedding space trained with only spoken narratives and image cues. Our experiments on the EPIC-Kitchen and Places Audio Caption datasets show that introducing the human-generated textual transcriptions of the spoken narratives helps the training procedure yield better embedding representations. The triad of speech, image and words allows for a better estimate of the embedding point and shows an improvement of the performance within tasks like image and speech retrieval, even when the third modality, text, is not present in the task.
17. Structured Multimodal Attentions for TextVQA [PDF] Back to Contents
Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton van den Hengel, Qi Wu
Abstract: Text based Visual Question Answering (TextVQA) is a recently raised challenge that requires a machine to read text in images and answer natural language questions by jointly reasoning over the question, Optical Character Recognition (OCR) tokens and visual content. Most of the state-of-the-art (SoTA) VQA methods fail to answer these questions because of i) poor text reading ability; ii) a lack of text-visual reasoning capacity; and iii) the adoption of a discriminative answering mechanism instead of a generative one, which makes it hard to cover both OCR tokens and general text tokens in the final answer. In this paper, we propose a structured multimodal attention (SMA) neural network to solve the above issues. Our SMA first uses a structural graph representation to encode the object-object, object-text and text-text relationships appearing in the image, and then designs a multimodal graph attention network to reason over it. Finally, the outputs from the above modules are processed by a global-local attentional answering module to produce an answer that covers tokens from both OCR and general text iteratively. Our proposed model outperforms the SoTA models on the TextVQA dataset and all three tasks of the ST-VQA dataset. To provide an upper bound for our method and a fair testing base for further works, we also provide human-annotated ground-truth OCR annotations for the TextVQA dataset, which were not given in the original release.
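The core reasoning step, question-conditioned attention over a graph of object and OCR-token nodes, can be sketched as follows. This is a generic single-head graph-attention layer in PyTorch; the module name, the additive scoring form, and all dimensions are illustrative assumptions, not the authors' exact SMA formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedGraphAttention(nn.Module):
    """Single-head graph attention over object/OCR nodes, conditioned on a
    question embedding. Illustrative sketch only, not the exact SMA layer."""
    def __init__(self, node_dim, q_dim, hid=256):
        super().__init__()
        self.proj_node = nn.Linear(node_dim, hid)
        self.proj_q = nn.Linear(q_dim, hid)
        self.score = nn.Linear(hid, 1)

    def forward(self, nodes, adj, q):
        # nodes: (N, node_dim); adj: (N, N) 0/1 relation mask; q: (q_dim,)
        h = self.proj_node(nodes)                                 # (N, hid)
        qh = self.proj_q(q)                                       # (hid,)
        # pairwise compatibility, modulated by the question embedding
        pair = torch.tanh(h.unsqueeze(0) + h.unsqueeze(1) + qh)   # (N, N, hid)
        logits = self.score(pair).squeeze(-1)                     # (N, N)
        logits = logits.masked_fill(adj == 0, float('-inf'))
        attn = torch.nan_to_num(F.softmax(logits, dim=-1))        # isolated nodes
        return attn @ h                                           # updated nodes
```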
18. Global Distance-distributions Separation for Unsupervised Person Re-identification [PDF] 返回目录
Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen
Abstract: Supervised person re-identification (ReID) often has poor scalability and usability in real-world deployments due to domain gaps and the lack of annotations for the target domain data. Unsupervised person ReID through domain adaptation is attractive yet challenging. Existing unsupervised ReID approaches often fail to correctly identify positive and negative samples through distance-based matching/ranking. The two distributions of distances for positive sample pairs (Pos-distr) and negative sample pairs (Neg-distr) are often not well separated, having large overlap. To address this problem, we introduce a global distance-distributions separation (GDS) constraint over the two distributions to encourage the clear separation of positive and negative samples from a global view. We model the two global distance distributions as Gaussian distributions and push apart the two distributions while encouraging their sharpness in the unsupervised training process. In particular, to model the distributions from a global view and facilitate the timely updating of the distributions and the GDS-related losses, we leverage a momentum update mechanism for building and maintaining the distribution parameters (mean and variance) and calculate the loss on the fly during training. Distribution-based hard mining is proposed to further promote the separation of the two distributions. We validate the effectiveness of the GDS constraint in unsupervised ReID networks. Extensive experiments on multiple ReID benchmark datasets show that our method leads to significant improvement over the baselines and achieves state-of-the-art performance.
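The momentum-maintained Gaussian statistics and the separation objective can be sketched as below. The exact loss form, margin, and momentum value are illustrative assumptions; only the overall mechanism follows the abstract: running mean/variance per distance distribution, with a loss computed on the fly that separates the two means and keeps each distribution sharp.

```python
import torch

class GDSLoss:
    """Global distance-distributions separation sketch: maintain running
    Gaussian parameters of positive/negative pair distances with momentum,
    and penalize overlap. The loss form is an illustrative assumption."""
    def __init__(self, momentum=0.9, margin=1.0):
        self.m, self.margin = momentum, margin
        self.run = {'pos': (0.0, 1.0), 'neg': (1.0, 1.0)}  # (mean, var)

    def __call__(self, pos_d, neg_d):
        # pos_d, neg_d: 1-D tensors of pairwise distances in the batch
        loss, stats = 0.0, {}
        for key, d in (('pos', pos_d), ('neg', neg_d)):
            r_mean, r_var = self.run[key]
            mean = self.m * r_mean + (1 - self.m) * d.mean()   # momentum blend
            var = self.m * r_var + (1 - self.m) * d.var()
            self.run[key] = (mean.item(), var.item())          # update on the fly
            stats[key] = mean
            loss = loss + var                                  # encourage sharpness
        # push the global positive-distance mean below the negative one
        return loss + torch.relu(stats['pos'] - stats['neg'] + self.margin)
```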
19. DeepMark++: CenterNet-based Clothing Detection [PDF] 返回目录
Alexey Sidnev, Alexander Krapivin, Alexey Trushkov, Ekaterina Krasikova, Maxim Kazakov
Abstract: This paper proposes a single-stage approach to fast clothing detection, built as a modification of the multi-target CenterNet network. We introduce several powerful post-processing techniques that may be applied to increase the quality of keypoint localization tasks. The semantic keypoint grouping approach and post-processing techniques make it possible to achieve a state-of-the-art accuracy of 0.737 mAP for the bounding box detection task and 0.591 mAP for the landmark detection task on the DeepFashion2 validation dataset. We have also achieved second place in the DeepFashion2 Challenge 2020, with 0.582 mAP on the test dataset. The proposed approach can also be used on low-power devices with relatively high accuracy, without requiring any post-processing techniques.
20. Symbol Spotting on Digital Architectural Floor Plans Using a Deep Learning-based Framework [PDF] 返回目录
Alireza Rezvanifar, Melissa Cote, Alexandra Branzan Albu
Abstract: This paper focuses on symbol spotting on real-world digital architectural floor plans with a deep learning (DL)-based framework. Traditional on-the-fly symbol spotting methods are unable to address the semantic challenge of graphical notation variability, i.e. low intra-class symbol similarity, an issue that is particularly important in architectural floor plan analysis. The presence of occlusion and clutter, characteristic of real-world plans, along with a graphical symbol complexity varying from almost trivial to highly complex, also poses challenges to existing spotting methods. In this paper, we address all of the above issues by leveraging recent advances in DL and adapting an object detection framework based on the You-Only-Look-Once (YOLO) architecture. We propose a training strategy based on tiles, avoiding many issues particular to DL-based object detection networks related to the relatively small size of symbols compared to entire floor plans, aspect ratios, and data augmentation. Experiments on real-world floor plans demonstrate that our method successfully detects architectural symbols with low intra-class similarity and of variable graphical complexity, even in the presence of heavy occlusion and clutter. Additional experiments on the public SESYD dataset confirm that our proposed approach can deal with various degradation and noise levels and outperforms other symbol spotting methods.
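A minimal sketch of the tile-based training idea: split an oversized floor plan into overlapping windows and remap symbol boxes into each window's coordinates before feeding a YOLO-style detector. Tile size, overlap, and the visibility threshold are illustrative choices, not the paper's values.

```python
def make_tiles(width, height, tile=1024, overlap=128):
    """Yield (x0, y0, x1, y1) windows covering a large floor plan image."""
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step))
    ys = list(range(0, max(height - tile, 0) + 1, step))
    if xs[-1] + tile < width: xs.append(width - tile)    # cover right edge
    if ys[-1] + tile < height: ys.append(height - tile)  # cover bottom edge
    for y0 in ys:
        for x0 in xs:
            yield (x0, y0, min(x0 + tile, width), min(y0 + tile, height))

def boxes_in_tile(boxes, tile_xyxy, min_visible=0.5):
    """Remap symbol boxes (x0, y0, x1, y1) into tile coordinates, keeping
    those with at least `min_visible` of their area inside the tile."""
    tx0, ty0, tx1, ty1 = tile_xyxy
    kept = []
    for bx0, by0, bx1, by1 in boxes:
        ix0, iy0 = max(bx0, tx0), max(by0, ty0)
        ix1, iy1 = min(bx1, tx1), min(by1, ty1)
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        area = (bx1 - bx0) * (by1 - by0)
        if area > 0 and inter / area >= min_visible:
            kept.append((ix0 - tx0, iy0 - ty0, ix1 - tx0, iy1 - ty0))
    return kept
```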
21. Review on 3D Lidar Localization for Autonomous Driving Cars [PDF] 返回目录
Mahdi Elhousni, Xinming Huang
Abstract: LIDAR sensors are bound to become one of the core sensors in achieving full autonomy for self-driving cars. LIDARs are able to produce rich, dense and precise spatial data, which can tremendously help in localizing and tracking a moving vehicle. In this paper, we review the latest findings in 3D LIDAR localization for autonomous driving cars and analyze the results obtained by each method, in an effort to guide the research community towards the path that seems to be the most promising.
22. Automatic Building and Labeling of HD Maps with Deep Learning [PDF] 返回目录
Mahdi Elhousni, Yecheng Lyu, Ziming Zhang, Xinming Huang
Abstract: In a world where autonomous driving cars are becoming increasingly common, creating an adequate infrastructure for this new technology is essential. This includes building and labeling high-definition (HD) maps accurately and efficiently. Today, the process of creating HD maps requires a lot of human input, which takes time and is prone to errors. In this paper, we propose a novel method capable of generating labelled HD maps from raw sensor data. We implemented and tested our methods on several urban scenarios using data collected from our test vehicle. The results show that the proposed deep learning based method can produce highly accurate HD maps. This approach speeds up the process of building and labeling HD maps, which can make a meaningful contribution to the deployment of autonomous vehicles.
23. In the Eye of the Beholder: Gaze and Actions in First Person Video [PDF] 返回目录
Yin Li, Miao Liu, James M. Rehg
Abstract: We address the task of jointly determining what a person is doing and where they are looking, based on the analysis of video captured by a head-worn camera. To facilitate our research, we first introduce the EGTEA Gaze+ dataset. Our dataset comes with videos, gaze tracking data, hand masks and action annotations, thereby providing the most comprehensive benchmark for First Person Vision (FPV). Moving beyond the dataset, we propose a novel deep model for joint gaze estimation and action recognition in FPV. Our method describes the participant's gaze as a probabilistic variable and models its distribution using stochastic units in a deep network. We further sample from these stochastic units, generating an attention map to guide the aggregation of visual features for action recognition. Our method is evaluated on our EGTEA Gaze+ dataset and achieves a performance level that exceeds the state-of-the-art by a significant margin. More importantly, we demonstrate that our model can be applied to the larger-scale FPV dataset EPIC-Kitchens, even without using gaze, offering new state-of-the-art results on FPV action recognition.
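One common way to realize "gaze as a probabilistic variable with stochastic units" is the Gumbel-softmax trick: sample a soft spatial attention map from predicted gaze logits and use it to pool visual features. The sketch below assumes this reparameterization; the paper's actual stochastic units may differ.

```python
import torch
import torch.nn.functional as F

def sample_gaze_attention(gaze_logits, tau=1.0):
    """Sample a soft attention map over an HxW grid from gaze logits with
    the Gumbel-softmax trick (one common differentiable stochastic unit)."""
    b, h, w = gaze_logits.shape
    attn = F.gumbel_softmax(gaze_logits.view(b, h * w), tau=tau, hard=False)
    return attn.view(b, 1, h, w)            # each map sums to 1

def attend_features(feats, attn):
    """Aggregate visual features under the sampled gaze map for action
    recognition. feats: (B, C, H, W), attn: (B, 1, H, W) -> (B, C)."""
    return (feats * attn).sum(dim=(2, 3))

# usage sketch with random tensors
feats = torch.randn(2, 256, 7, 7)
logits = torch.randn(2, 7, 7)
pooled = attend_features(feats, sample_gaze_attention(logits))
```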
24. A General-Purpose Dehazing Algorithm based on Local Contrast Enhancement Approaches [PDF] 返回目录
Bangyong Sun, Vincent Whannou de Dravo, Zhe Yu
Abstract: Dehazing is, in the image processing and computer vision communities, the task of enhancing images taken in foggy conditions. To better understand this type of algorithm, we present in this document a dehazing method which is suitable for several local contrast adjustment algorithms. We base it on two filters. The first filter is built from a normalization step with some additional statistical tricks, while the second implements the local contrast improvement algorithm. Thus, it can run on both CPU and GPU for real-time applications. We hope that our approach will open the door to new ideas in the community. Other advantages of our method are that, first, it does not need to be trained, and second, it does not need additional optimization processing. Furthermore, it can be used as a pre-processing or post-processing step in many vision tasks. In addition, it does not need to convert the problem into a physical interpretation, and finally, it is very fast. This family of defogging algorithms is fairly simple, but it shows promising results compared to state-of-the-art algorithms, based not only on visual assessment but also on objective criteria.
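A minimal sketch of the two-filter design: a statistical normalization filter followed by a pluggable local contrast enhancement stage. CLAHE stands in for the second filter here, since the method is explicitly meant to accept several local contrast adjustment algorithms; the percentile stretch and all parameter values are illustrative.

```python
import cv2
import numpy as np

def dehaze_two_filter(img_bgr, clip=2.0, grid=8):
    """Two-filter dehazing sketch: (1) a statistical normalization filter,
    (2) a local contrast enhancement stage (CLAHE used as a stand-in)."""
    img = img_bgr.astype(np.float32)
    # filter 1: per-channel percentile stretch as a simple normalization
    lo, hi = np.percentile(img, (1, 99), axis=(0, 1))
    norm = np.clip((img - lo) / np.maximum(hi - lo, 1e-6), 0, 1)
    norm8 = (norm * 255).astype(np.uint8)
    # filter 2: local contrast enhancement on the luminance channel
    lab = cv2.cvtColor(norm8, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=(grid, grid))
    lab[..., 0] = clahe.apply(lab[..., 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```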
25. Face Authentication from Grayscale Coded Light Field [PDF] 返回目录
Dana Weitzner, David Mendlovic, Raja Giryes
Abstract: Face verification is a fast-growing authentication tool for everyday systems, such as smartphones. While current 2D face recognition methods are very accurate, it has been suggested recently that one may wish to add a 3D sensor to such solutions to make them more reliable and robust to spoofing, e.g., using a 2D print of a person's face. Yet, this requires an additional, relatively expensive depth sensor. To mitigate this, we propose a novel authentication system based on slim grayscale coded light field imaging. We provide a reconstruction-free, fast anti-spoofing mechanism that works directly on the coded image. It is followed by a multi-view, multi-modal face verification network that, given grayscale data together with a low-res depth map, achieves results competitive with the RGB case. We demonstrate the effectiveness of our solution on a simulated 3D (RGBD) version of LFW, which will be made public, and on a set of real faces acquired by a light field computational camera.
26. Exemplar-based Generative Facial Editing [PDF] 返回目录
Jingtao Guo, Yi Liu, Zhenzhen Qian, Zuowei Zhou
Abstract: Image synthesis has witnessed substantial progress due to the increasing power of generative models. In this paper, we propose a novel generative approach for exemplar-based facial editing in the form of region inpainting. Our method first masks the facial editing region to eliminate the pixel constraints of the original image; exemplar-based facial editing can then be achieved by learning the corresponding information from the reference image to complete the masked region. In addition, we impose an attribute-label constraint to model disentangled encodings, in order to avoid undesired information being transferred from the exemplar to the original image editing region. Experimental results demonstrate that our method can produce diverse and personalized face editing results and provides far more user control flexibility than nearly all existing methods.
27. End-to-End Change Detection for High Resolution Drone Images with GAN Architecture [PDF] 返回目录
Yura Zharkovsky, Ovadya Menadeva
Abstract: Monitoring large areas is presently feasible with high resolution drone cameras, as opposed to time-consuming and expensive ground surveys. In this work we reveal, for the first time, the potential of using a state-of-the-art GAN-based change detection algorithm with high resolution drone images for infrastructure inspection. We demonstrate this concept on solar panel installations. We propose a data-driven deep learning algorithm for identifying changes, built on a change detection network. We use the Conditional Adversarial Network approach to present a framework for change detection in images. The proposed network architecture is based on the pix2pix GAN framework. Extensive experimental results show that our proposed approach outperforms other state-of-the-art change detection methods.
28. Modified Segmentation Algorithm for Recognition of Older Geez Scripts Written on Vellum [PDF] 返回目录
Girma Negashe, Adane Mamuye
Abstract: Recognition of handwritten documents aims at transforming document images into a machine-understandable format. Handwritten document recognition is the most challenging area in the field of pattern recognition. It becomes more complex when a document was written on vellum hundreds of years ago, as with older Geez scripts. In this study, we introduce a modified segmentation approach to recognize older Geez scripts. We used adaptive filtering for noise reduction, Isodata iterative global thresholding for document image binarization, and a modified bounding box projection to segment distinct strokes between Geez characters, numbers, and punctuation marks. An SVM multiclass classifier scored 79.32% recognition accuracy with the modified segmentation algorithm.
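The Isodata (Ridler-Calvard) iterative global threshold used for binarization is a standard algorithm and can be stated compactly; the paper's variant may add preprocessing around it.

```python
import numpy as np

def isodata_threshold(gray, eps=0.5, max_iter=100):
    """Classical Isodata (Ridler-Calvard) iterative global threshold:
    repeatedly set t to the midpoint of the two class means until stable."""
    g = gray.astype(np.float64)
    t = g.mean()
    for _ in range(max_iter):
        fg, bg = g[g > t], g[g <= t]
        if fg.size == 0 or bg.size == 0:
            break
        t_new = 0.5 * (fg.mean() + bg.mean())
        if abs(t_new - t) < eps:
            return t_new
        t = t_new
    return t

# binarize a document image (ink assumed darker than the vellum background):
# binary = gray <= isodata_threshold(gray)
```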
29. Fast Enhancement for Non-Uniform Illumination Images using Light-weight CNNs [PDF] 返回目录
Feifan Lv, Bo Liu, Feng Lu
Abstract: This paper proposes a new light-weight convolutional neural network (5k parameters) for non-uniform illumination image enhancement that handles color, exposure, contrast, noise and artifacts, etc., simultaneously and effectively. More concretely, the input image is first enhanced using the Retinex model from two different aspects (enhancing under-exposure and suppressing over-exposure), respectively. Then, these two enhanced results and the original image are fused to obtain an image with satisfactory brightness, contrast and details. Finally, the extra noise and compression artifacts are removed to get the final result. To train this network, we propose a semi-supervised retouching solution and construct a new dataset (82k images) that contains various scenes and light conditions. Our model can enhance 0.5 mega-pixel (e.g., 600*800) images in real time (50 fps), which is faster than existing enhancement methods. Extensive experiments show that our solution is fast and effective in dealing with non-uniform illumination images.
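The dual-enhancement-and-fusion pipeline can be illustrated with classical single-scale Retinex branches: one branch lifts under-exposed regions, the other (run on the inverted image) tames over-exposed ones, and both are fused with the original. The Retinex stand-in and the fixed fusion weights are illustrative assumptions; the paper learns both enhancement and fusion with its light-weight CNN.

```python
import cv2
import numpy as np

def retinex_branch(img, sigma=80):
    """Single-scale Retinex: reflectance = log(I) - log(blur(I)), rescaled
    to [0, 1]. A classical stand-in for a learned enhancement branch."""
    f = img.astype(np.float32) / 255.0 + 1e-4
    illum = cv2.GaussianBlur(f, (0, 0), sigma)
    r = np.log(f) - np.log(illum)
    return (r - r.min()) / (r.max() - r.min() + 1e-6)

def dual_enhance_fuse(img_bgr, w=(0.4, 0.3, 0.3)):
    """Fuse the original with an under-exposure-enhanced branch and an
    over-exposure-suppressed branch (Retinex on the inverted image)."""
    under = retinex_branch(img_bgr)              # lifts dark regions
    over = 1.0 - retinex_branch(255 - img_bgr)   # tames bright regions
    orig = img_bgr.astype(np.float32) / 255.0
    fused = w[0] * orig + w[1] * under + w[2] * over
    return (np.clip(fused, 0, 1) * 255).astype(np.uint8)
```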
30. EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary Neuromorphic Vision Sensors [PDF] 返回目录
Deepak Singla, Vivek Mohan, Tarun Pulluri, Andres Ussa, Bharath Ramesh, Arindam Basu
Abstract: In this paper, we present a hybrid event-frame approach for detecting and tracking objects recorded by a stationary neuromorphic vision sensor (NVS) used in traffic monitoring applications, with a hardware-efficient processing pipeline that optimizes memory and computational needs. The usage of NVS gives the advantage of rejecting background, while it has a unique disadvantage of fragmented objects due to the lack of events generated by smooth areas such as glass windows. To exploit the background removal, we propose an event-based binary image (EBBI) creation that signals the presence or absence of events within a frame duration. This reduces the memory requirement and enables the use of simple algorithms such as median filtering and connected component labeling (CCL) for denoising and region proposal (RP), respectively. To overcome the fragmentation issue, a YOLO-inspired neural network based detector and classifier (NNDC) is proposed to merge fragmented region proposals. Finally, a simplified version of the Kalman filter, termed the overlap-based tracker (OT), which exploits the overlap between detections and tracks, is proposed with heuristics to overcome occlusion. The proposed pipeline is evaluated using more than 5 hours of traffic recordings. Our proposed hybrid architecture (AUC = $0.45$) outperformed the deep learning (DL) based tracker SiamMask (AUC = $0.33$) operating on simultaneously recorded RGB frames, while requiring $2200\times$ fewer computations. Compared to pure event-based mean shift (AUC = $0.31$), our approach requires $68\times$ more computations but provides much better performance. Finally, we also evaluated our performance on two different NVS, DAVIS and CeleX, and demonstrated similar gains. To the best of our knowledge, this is the first report where an NVS-based solution is directly compared to another simultaneously recorded frame-based method, and it shows tremendous promise.
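The EBBI construction and the simple RP stage it enables can be sketched directly: rasterize events within a frame duration into a binary image, median-filter it, and extract connected components. The event layout (t, x, y, polarity) and the minimum-area threshold are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def ebbi(events, shape, t0, dt):
    """Event-based binary image: a pixel is 1 if it fired at least one
    event in [t0, t0 + dt). `events` is an (N, 4) array of (t, x, y, p)."""
    img = np.zeros(shape, dtype=np.uint8)
    sel = (events[:, 0] >= t0) & (events[:, 0] < t0 + dt)
    img[events[sel, 2].astype(int), events[sel, 1].astype(int)] = 1
    return img

def region_proposals(binary, min_area=30):
    """Denoise with a 3x3 median filter, then propose regions via connected
    component labeling (CCL). `min_area` is an illustrative threshold."""
    clean = ndimage.median_filter(binary, size=3)
    labels, _ = ndimage.label(clean)
    boxes = []
    for sl in ndimage.find_objects(labels):
        h, w = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if h * w >= min_area:
            boxes.append((sl[1].start, sl[0].start, sl[1].stop, sl[0].stop))
    return boxes  # (x0, y0, x1, y1) proposals
```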
31. Attribute-Induced Bias Eliminating for Transductive Zero-Shot Learning [PDF] 返回目录
Hantao Yao, Shaobo Min, Yongdong Zhang, Changsheng Xu
Abstract: Transductive zero-shot learning (ZSL) aims to recognize the unseen categories by aligning the visual and semantic information in a joint embedding space. There exist four kinds of domain bias in transductive ZSL, i.e., the visual bias and the semantic bias between the two domains, and the two visual-semantic biases within the seen and unseen domains respectively; existing work only focuses on a subset of them, which leads to severe semantic ambiguity during the knowledge transfer. To solve the above problem, we propose a novel Attribute-Induced Bias Eliminating (AIBE) module for transductive ZSL. Specifically, for the visual bias between the two domains, the Mean-Teacher module is first leveraged to bridge the visual representation discrepancy between the two domains using unsupervised learning and unlabelled images. Then, an attentional graph attribute embedding is proposed to reduce the semantic bias between seen and unseen categories, which utilizes a graph operation to capture the semantic relationship between categories. Besides, to reduce the semantic-visual bias in the seen domain, we align the visual center of each category, instead of the individual visual data points, with the corresponding semantic attributes, which further preserves the semantic relationship in the embedding space. Finally, for the semantic-visual bias in the unseen domain, an unseen semantic alignment constraint is designed to align visual and semantic space in an unsupervised manner. Evaluations on several benchmarks demonstrate the effectiveness of the proposed method, e.g., obtaining 82.8%/75.5%, 97.1%/82.5%, and 73.2%/52.1% for the Conventional/Generalized ZSL settings on the CUB, AwA2, and SUN datasets, respectively.
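The Mean-Teacher module the abstract leverages follows a standard recipe: the teacher's weights are an exponential moving average (EMA) of the student's, with a consistency loss between their predictions on unlabelled images. A sketch of that standard formulation (the alpha value and the MSE consistency form are conventional choices, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    """Mean-Teacher: teacher weights track an exponential moving average
    of the student's weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)

def consistency_loss(student_logits, teacher_logits):
    """Penalize disagreement between student and teacher predictions on
    the same unlabelled images (teacher is not backpropagated through)."""
    return F.mse_loss(student_logits.softmax(dim=-1),
                      teacher_logits.detach().softmax(dim=-1))
```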
32. Entropy Decision Fusion for Smartphone Sensor based Human Activity Recognition [PDF] 返回目录
Olasimbo Ayodeji Arigbabu
Abstract: Human activity recognition plays an important part in building continuous behavioral monitoring systems, which are deployable for visual surveillance, patient rehabilitation, gaming, and even personally inclined smart homes. This paper presents our efforts to develop a collaborative decision fusion mechanism for integrating the predicted scores from multiple learning algorithms trained on smartphone-sensor-based human activity data. We present an approach for fusing a convolutional neural network, a recurrent convolutional network, and a support vector machine by computing and fusing the relative weighted scores from each classifier based on Tsallis entropy, to improve human activity recognition performance. To assess the suitability of this approach, experiments are conducted on two benchmark datasets, UCI-HAR and WISDM. The recognition results attained using the proposed approach are comparable to existing methods.
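Tsallis entropy and an entropy-driven score fusion can be sketched as follows. The entropy itself follows the standard definition $S_q = (1 - \sum_i p_i^q)/(q - 1)$; the inverse-entropy weighting is an illustrative choice, not necessarily the paper's exact scheme.

```python
import numpy as np

def tsallis_entropy(p, q=2.0):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1) of a probability
    vector p; recovers Shannon entropy in the limit q -> 1."""
    p = np.asarray(p, dtype=np.float64)
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def entropy_fusion(score_vectors, q=2.0):
    """Fuse per-classifier class scores with weights that decrease with
    each classifier's Tsallis entropy (confident classifiers count more)."""
    probs = [np.asarray(s, dtype=np.float64) / np.sum(s) for s in score_vectors]
    ents = np.array([tsallis_entropy(p, q) for p in probs])
    weights = 1.0 / (ents + 1e-6)
    weights /= weights.sum()
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused, weights

# e.g. fusing CNN, recurrent-CNN and SVM scores over six activity classes:
# fused, w = entropy_fusion([cnn_scores, rcnn_scores, svm_scores])
```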
33. Critical Assessment of Transfer Learning for Medical Image Segmentation with Fully Convolutional Neural Networks [PDF] 返回目录
Davood Karimi, Simon K. Warfield, Ali Gholipour
Abstract: Transfer learning is widely used for training machine learning models. Here, we study the role of transfer learning for training fully convolutional networks (FCNs) for medical image segmentation. Our experiments show that although transfer learning reduces the training time on the target task, the improvement in segmentation accuracy is highly task/data-dependent. Larger improvements in accuracy are observed when the segmentation task is more challenging and the target training data is smaller. We observe that convolutional filters of an FCN change little during training for medical image segmentation, and still look random at convergence. We further show that quite accurate FCNs can be built by freezing the encoder section of the network at random values and only training the decoder section. At least for medical image segmentation, this finding challenges the common belief that the encoder section needs to learn data/task-specific representations. We examine the evolution of FCN representations to gain a better insight into the effects of transfer learning on the training dynamics. Our analysis shows that although FCNs trained via transfer learning learn different representations than FCNs trained with random initialization, the variability among FCNs trained via transfer learning can be as high as that among FCNs trained with random initialization. Moreover, feature reuse is not restricted to the early encoder layers; rather, it can be more significant in deeper layers. These findings offer new insights and suggest alternative ways of training FCNs for medical image segmentation.
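The frozen-random-encoder experiment is easy to reproduce in outline: keep the encoder at its random initialization and optimize only the decoder. A minimal PyTorch sketch, assuming the FCN exposes `encoder` and `decoder` submodules (the attribute names are an assumption about the model class, not the paper's code):

```python
import torch

def freeze_encoder_train_decoder(model, lr=1e-3):
    """Keep the encoder at its random initialization and optimize only
    the decoder, as in the paper's frozen-encoder experiment."""
    for p in model.encoder.parameters():
        p.requires_grad = False          # encoder stays at random init
    return torch.optim.Adam(
        (p for p in model.decoder.parameters() if p.requires_grad), lr=lr)
```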
34. Semi-Supervised Fine-Tuning for Deep Learning Models in Remote Sensing Applications [PDF] 返回目录
Eftychios Protopapadakis, Anastasios Doulamis, Nikolaos Doulamis, Evangelos Maltezos
Abstract: A combinatory approach drawing on two well-known fields, deep learning and semi-supervised learning (SSL), is presented to tackle the land cover identification problem. The proposed methodology demonstrates the impact on the performance of deep learning models when SSL approaches are used as performance functions during training. Obtained results, on pixel-level segmentation tasks over orthoimages, suggest that SSL-enhanced loss functions can be beneficial to models' performance.
35. SDCT-AuxNet$^θ$: DCT Augmented Stain Deconvolutional CNN with Auxiliary Classifier for Cancer Diagnosis [PDF] 返回目录
Shiv Gehlot, Anubha Gupta, Ritu Gupta
Abstract: Acute lymphoblastic leukemia (ALL) is a pervasive pediatric white blood cell cancer across the globe. With the popularity of convolutional neural networks (CNNs), computer-aided diagnosis of cancer has attracted considerable attention. Such tools are easily deployable and cost-effective, and hence can enable extensive coverage of cancer diagnostic facilities. However, the development of such a tool for ALL cancer has so far been challenging due to the unavailability of a large training dataset. The visual similarity between malignant and normal cells adds to the complexity of the problem. This paper discusses the recent release of a large dataset and presents a novel deep learning architecture for the classification of cell images of ALL cancer. The proposed architecture, namely SDCT-AuxNet$^{\theta}$, is a two-module framework that utilizes a compact CNN as the main classifier in one module and a kernel SVM as the auxiliary classifier in the other. While the CNN classifier uses features obtained through bilinear pooling, spectral-averaged features are used by the auxiliary classifier. Further, this CNN is trained on stain-deconvolved quantity images in the optical density domain instead of conventional RGB images. A novel test strategy is proposed that exploits both classifiers for decision making, using the confidence scores of their predicted class labels. Elaborate experiments have been carried out on our recently released public dataset of 15114 images of ALL cancer and healthy cells to establish the validity of the proposed methodology, which is also robust to subject-level variability. A weighted F1 score of 94.8$\%$ is obtained, which is the best so far on this challenging dataset.
36. Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation [PDF] 返回目录
Jianqiang Wan, Yang Liu, Donglai Wei, Xiang Bai, Yongchao Xu
Abstract: Image segmentation is a fundamental vision task and a crucial step for many applications. In this paper, we propose a fast image segmentation method based on a novel super boundary-to-pixel direction (super-BPD) and a customized segmentation algorithm with super-BPD. Precisely, we define the BPD at each pixel as a two-dimensional unit vector pointing from its nearest boundary to the pixel. In the BPD, nearby pixels from different regions have opposite directions departing from each other, and adjacent pixels in the same region have directions pointing to each other (i.e., around medial points). We make use of this property to partition an image into super-BPDs, which are novel informative superpixels with robust direction similarity for fast grouping into segmentation regions. Extensive experimental results on BSDS500 and Pascal Context demonstrate the accuracy and efficiency of the proposed super-BPD in segmenting images. In practice, the proposed super-BPD achieves comparable or superior performance to MCG while running at ~25 fps vs. 0.07 fps. Super-BPD also exhibits a noteworthy transferability to unseen scenes. The code is publicly available at this https URL.
37. Positron Emission Tomography (PET) image enhancement using a gradient vector orientation based nonlinear diffusion filter (GVOF) for accurate quantitation of radioactivity concentration [PDF] 返回目录
Mahbubunnabi Tamal
Abstract: Accurately quantifying in vivo radiotracer uptake using Positron Emission Tomography (PET) is a challenging task due to the low signal-to-noise ratio (SNR) and poor spatial resolution of the PET camera, along with the finite image sampling constraint. Furthermore, inter-lesion variations of SNR and contrast, along with variations in lesion size, make the quantitation even more difficult. One way to improve the quantitation is via post-reconstruction filtering with a Gaussian Filter (GF). The edge-preserving Bilateral Filter (BF) and Nonlinear Diffusion Filter (NDF) are alternatives to GF that can improve the SNR without degrading the image resolution. However, the performance of these edge-preserving methods is only optimal for high-count, low-noise cases. A novel parameter-free gradient vector orientation based nonlinear diffusion filter (GVOF) is proposed in this paper that is insensitive to statistical fluctuations (e.g., in SNR, contrast, size, etc.). Applied to PET images of the NEMA phantom collected with varying levels of contrast and noise, the GVOF method provides the highest SNR, CNR (contrast-to-noise ratio), and resolution compared to the original and other filtered images. The percentage bias in estimating the maximum activity representing SUVmax (Maximum Standardized Uptake Value) for spheres with diameter > 2cm, where the partial volume effect (PVE) is negligible, is the lowest for the GVOF method. The GVOF method also improves the maximum intensity reproducibility. The robustness of GVOF against variations in size, contrast level, and SNR makes it a suitable post-filtering method for both accurate diagnosis and response assessment. Furthermore, given its capability to provide accurate quantitative measurements irrespective of the SNR, it can also be effective in reducing the radioactivity dose.
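GVOF belongs to the nonlinear-diffusion family; its orientation-based, parameter-free conductance is not reproduced here. As a hedged illustration of the underlying edge-preserving mechanism only, a classical Perona-Malik step:

```python
import numpy as np

def perona_malik_step(img, kappa=0.1, dt=0.2):
    """One explicit update step of classical Perona-Malik diffusion.

    The conductance c(d) shrinks where gradients are large, so edges are
    preserved while flat regions are smoothed. Borders are treated as
    periodic via np.roll, which is adequate for a sketch.
    """
    dN = np.roll(img, 1, axis=0) - img   # difference to north neighbour
    dS = np.roll(img, -1, axis=0) - img  # south
    dW = np.roll(img, 1, axis=1) - img   # west
    dE = np.roll(img, -1, axis=1) - img  # east
    c = lambda d: np.exp(-(d / kappa) ** 2)
    return img + dt * (c(dN) * dN + c(dS) * dS + c(dW) * dW + c(dE) * dE)

# denoise a synthetic noisy image with 20 iterations
noisy = np.random.default_rng(0).normal(0.5, 0.05, size=(64, 64))
smoothed = noisy
for _ in range(20):
    smoothed = perona_malik_step(smoothed)
```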
38. Is Depth Really Necessary for Salient Object Detection? [PDF] 返回目录
Shengwei Zhao, Yifan Zhao, Jia Li, Xiaowu Chen
Abstract: Salient object detection (SOD) is a crucial and preliminary task for many computer vision applications, and has made progress with deep CNNs. Most existing methods mainly rely on RGB information to distinguish the salient objects, which faces difficulties in some complex scenarios. To solve this, many recent RGBD-based networks adopt the depth map as an independent input and fuse its features with the RGB information. Taking the advantages of both RGB and RGBD methods, we propose a novel depth-aware salient object detection framework with the following superior designs: 1) It takes depth information only as training data and relies solely on RGB information in the testing phase. 2) It comprehensively optimizes SOD features with multi-level depth-aware regularizations. 3) The depth information also serves as an error-weighted map to correct the segmentation process. With these insightful designs combined, we make the first attempt at realizing a unified depth-aware framework with only RGB information as input for inference, which not only surpasses the state-of-the-art performances on five public RGB SOD benchmarks, but also surpasses RGBD-based methods on five benchmarks by a large margin, while using less information and a lighter-weight implementation. The code and model will be publicly available.
39. Web page classification with Google Image Search results [PDF] 返回目录
Fahri Aydos, A. Murat Özbayoğlu, Yahya Şirin, M. Fatih Demirci
Abstract: In this paper, we introduce a novel method that combines multiple neural network results to decide the class of the input. In our model, each element is represented by multiple descriptive images. After the training process of the neural network model, each element is classified by aggregating the results of its descriptive images. We apply our idea to the web page classification problem, using Google Image Search results as descriptive images. We obtain a classification rate of 94.90% on the WebScreenshots dataset, which contains 20000 web sites in 4 classes. The method is easily applicable to similar problems.
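Classifying an element from its multiple descriptive images can be as simple as averaging per-image class probabilities; the paper's exact combination rule may differ. A minimal sketch:

```python
import numpy as np

def classify_site(per_image_probs):
    """Classify one web site from the predictions of its descriptive images.

    per_image_probs: (n_images, n_classes) array of per-image class
    probabilities from the trained CNN. Averaging is one natural way to
    aggregate the descriptive-image results; the paper's combination
    rule may differ.
    """
    return int(np.asarray(per_image_probs).mean(axis=0).argmax())

# toy usage: three descriptive images, four classes
probs = [[0.1, 0.7, 0.1, 0.1],
         [0.2, 0.5, 0.2, 0.1],
         [0.6, 0.2, 0.1, 0.1]]
print(classify_site(probs))  # -> 1
```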
40. Self-adaptive Re-weighted Adversarial Domain Adaptation [PDF] 返回目录
Shanshan Wang, Lei Zhang
Abstract: Existing adversarial domain adaptation methods mainly consider the marginal distribution, and may therefore lead to either under-transfer or negative transfer. To address this problem, we present a self-adaptive re-weighted adversarial domain adaptation approach, which tries to enhance domain alignment from the perspective of the conditional distribution. In order to promote positive transfer and combat negative transfer, we reduce the weight of the adversarial loss for well-aligned features while increasing the adversarial force for poorly aligned ones, as measured by the conditional entropy. Additionally, a triplet loss leveraging source samples and pseudo-labeled target samples is employed on the confusing domain. Such a metric loss ensures that intra-class sample pairs are closer than inter-class pairs, achieving class-level alignment. In this way, highly accurate pseudo-labeled target samples and semantic alignment can be captured simultaneously in the co-training process. Our method achieves a low joint error of the ideal source and target hypothesis; the expected target error can then be upper-bounded following Ben-David's theorem. Empirical evidence demonstrates that the proposed model outperforms the state of the art on standard domain adaptation datasets.
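The conditional-entropy re-weighting can be sketched as follows: uncertain (high-entropy) samples receive a larger adversarial weight, well-aligned ones a smaller weight. The normalization and the exact mapping below are assumptions, not the paper's formula; a PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def entropy_weights(class_logits):
    """Per-sample weights from the conditional entropy of class predictions.

    High entropy (uncertain, poorly aligned sample) -> larger adversarial
    weight; low entropy (well aligned) -> smaller weight. The normalisation
    and the exact mapping are assumptions, not the paper's formula.
    """
    p = F.softmax(class_logits, dim=1)
    h = -(p * torch.log(p + 1e-8)).sum(dim=1)              # conditional entropy
    return h / torch.log(torch.tensor(float(p.size(1))))   # scale to [0, 1]

def weighted_adversarial_loss(domain_logits, domain_labels, class_logits):
    w = entropy_weights(class_logits).detach()   # weights are not backpropagated
    per_sample = F.cross_entropy(domain_logits, domain_labels, reduction="none")
    return (w * per_sample).mean()
```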
41. Complex Sequential Understanding through the Awareness of Spatial and Temporal Concepts [PDF] 返回目录
Bo Pang, Kaiwen Zha, Hanwen Cao, Jiajun Tang, Minghui Yu, Cewu Lu
Abstract: Understanding sequential information is a fundamental task for artificial intelligence. Current neural networks attempt to learn spatial and temporal information as a whole, which limits their ability to represent large-scale spatial representations over long-range sequences. Here, we introduce a new modeling strategy called Semi-Coupled Structure (SCS), which consists of deep neural networks that decouple the learning of complex spatial and temporal concepts. A Semi-Coupled Structure can learn to implicitly separate input information into independent parts and process these parts respectively. Experiments demonstrate that a Semi-Coupled Structure can successfully annotate the outline of an object in images sequentially and perform video action recognition. For sequence-to-sequence problems, a Semi-Coupled Structure can predict future meteorological radar echo images based on observed images. Taken together, our results demonstrate that a Semi-Coupled Structure has the capacity to improve the performance of LSTM-like models on large-scale sequential tasks.
42. Attention-Guided Discriminative Region Localization for Bone Age Assessment [PDF] 返回目录
Chao Chen, Zhihong Chen, Xinyu Jin, Lanjuan Li, William Speier, Corey W. Arnold
Abstract: Bone age assessment (BAA) is clinically important as it can be used to diagnose endocrine and metabolic disorders during child development. Existing deep learning based methods for classifying bone age generally use the global image as input, or exploit local information by annotating extra bounding boxes or key points. However, training with the global image underutilizes discriminative local information, while providing extra annotations is expensive and subjective. In this paper, we propose an attention-guided approach to automatically localize the discriminative regions for BAA without any extra annotations. Specifically, we first train a classification model to learn the attention heat maps of the discriminative regions, finding the hand region, the most discriminative region (the carpal bones), and the next most discriminative region (the metacarpal bones). We then crop these informative local regions from the original image and aggregate different regions for bone age regression. Extensive comparison experiments are conducted on the RSNA pediatric bone age data set. Using no training annotations, our method achieves competitive results compared with existing state-of-the-art semi-automatic deep learning-based methods that require manual annotation. Code is available at \url{this https URL}.
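Cropping a discriminative region from an attention heat map is commonly done by thresholding the map and taking the bounding box of the hot area; whether the paper uses exactly this rule is an assumption. A short sketch:

```python
import numpy as np

def crop_from_heatmap(image, heatmap, rel_thresh=0.5):
    """Crop the discriminative region suggested by an attention heat map.

    image: (H, W) or (H, W, C) array; heatmap: (H, W), same spatial size
    (upsample a CAM beforehand if needed). Threshold-and-bounding-box is a
    common heuristic, not necessarily the paper's exact rule.
    """
    hot = heatmap >= rel_thresh * heatmap.max()
    ys, xs = np.where(hot)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1]

# toy usage: the crop tightens around the hottest area
img = np.arange(100.0).reshape(10, 10)
cam = np.zeros((10, 10)); cam[3:6, 4:8] = 1.0
print(crop_from_heatmap(img, cam).shape)  # -> (3, 4)
```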
43. OPAL-Net: A Generative Model for Part-based Object Layout Generation [PDF] 返回目录
Rishabh Baghel, Ravi Kiran Sarvadevabhatla
Abstract: We propose OPAL-Net, a novel hierarchical architecture for part-based layout generation of objects from multiple categories using a single unified model. We adopt a coarse-to-fine strategy involving semantically conditioned autoregressive generation of bounding box layouts and pixel-level part layouts for objects. We use Graph Convolutional Networks and Deep Recurrent Networks along with custom-designed Conditional Variational Autoencoders to enable flexible, diverse and category-aware generation of object layouts. We train OPAL-Net on the PASCAL-Parts dataset. The generated samples and corresponding evaluation scores demonstrate the versatility of OPAL-Net compared to ablative variants and baselines.
44. An Efficient Planar Bundle Adjustment Algorithm [PDF] 返回目录
Lipu Zhou, Daniel Koppel, Hui Ju, Frank Steinbruecker, Michael Kaess
Abstract: This paper presents an efficient algorithm for the least-squares problem using the point-to-plane cost, which aims to jointly optimize depth sensor poses and plane parameters for 3D reconstruction. We call this least-squares problem \textbf{Planar Bundle Adjustment} (PBA), due to the similarity between this problem and the original Bundle Adjustment (BA) in visual reconstruction. As planes ubiquitously exist in man-made environments, they are generally used as landmarks in SLAM algorithms for various depth sensors. PBA is important to reduce drift and improve the quality of the map. However, directly adopting the well-established BA framework from visual reconstruction results in a very inefficient solution for PBA. This is because a 3D point has only one observation at a camera pose, whereas a depth sensor can record hundreds of points in a plane at a time, which results in a very large nonlinear least-squares problem even for a small-scale space. Fortunately, we find that there exists a special structure in the PBA problem. We introduce a reduced Jacobian matrix and a reduced residual vector, and prove that they can replace the original Jacobian matrix and residual vector in the generally adopted Levenberg-Marquardt (LM) algorithm. This significantly reduces the computational cost. Besides, when planes are combined with other features for 3D reconstruction, the reduced Jacobian matrix and residual vector can also replace the corresponding parts derived from planes. Our experimental results verify that our algorithm can significantly reduce the computational time compared to the solution using the traditional BA framework. Besides, our algorithm is faster, more accurate, and more robust to initialization errors than the state-of-the-art solution using the plane-to-plane cost.
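For reference, the point-to-plane cost in common notation (the paper's exact parametrization may differ): with sensor poses $(R_i, t_i)$, plane parameters $(n_j, d_j)$ with $\|n_j\| = 1$, and $p_{ijk}$ the $k$-th point of plane $j$ observed at pose $i$, PBA minimizes
$$ E = \sum_{i}\sum_{j}\sum_{k} \left( n_j^\top \left( R_i\, p_{ijk} + t_i \right) + d_j \right)^2 . $$
Each observed point contributes one scalar residual, which is why the problem grows much faster than visual BA and why the reduced Jacobian and residual are so effective.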
45. When2com: Multi-Agent Perception via Communication Graph Grouping [PDF] 返回目录
Yen-Cheng Liu, Junjiao Tian, Nathaniel Glaser, Zsolt Kira
Abstract: While significant advances have been made in single-agent perception, many applications require multiple sensing agents and cross-agent communication for benefits such as coverage and robustness. It is therefore critical to develop frameworks which support multi-agent collaborative perception in a distributed and bandwidth-efficient manner. In this paper, we address the collaborative perception problem, where one agent is required to perform a perception task and can communicate and share information with other agents working on the same task. Specifically, we propose a communication framework that learns both to construct communication groups and to decide when to communicate. We demonstrate the generalizability of our framework on two different perception tasks and show that it significantly reduces communication bandwidth while maintaining superior performance.
46. Retrieval of Family Members Using Siamese Neural Network [PDF] 返回目录
Jun Yu, Guochen Xie, Mengyan Li, Xinlong Hao
Abstract: Retrieval of family members in the wild aims at finding family members of a given subject in the dataset, which is useful for finding lost children and analyzing kinship. However, due to the diversity in age, gender, pose, and illumination of the collected data, this task is always challenging. To solve this problem, we propose a solution based on a deep Siamese neural network. Our solution can be divided into two parts: similarity computation and ranking. In the training procedure, the Siamese network first takes two candidate images as input and produces two feature vectors. Then, the similarity between the two vectors is computed with several fully connected layers. In the inference procedure, we try another similarity computation method: dropping the subsequent fully connected layers and directly computing the cosine similarity of the two feature vectors. After similarity computation, we use a ranking algorithm to merge the similarity scores with the same identity and output the ordered list according to the similarities. To gain further improvement, we try different combinations of backbones, training methods, and similarity computation methods. Finally, we submit the best combination as our solution, and our team (ustc-nelslip) obtains a favorable result, first runner-up, in track 3 of the RFIW2020 challenge, which verifies the effectiveness of our method. Our code is available at: this https URL
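The inference-time variant (cosine similarity on raw embeddings, then ranking) can be sketched directly:

```python
import numpy as np

def rank_gallery(probe_feat, gallery_feats):
    """Rank gallery images by cosine similarity to the probe embedding.

    probe_feat: (d,) embedding of the probe face; gallery_feats: (n, d)
    embeddings from the same Siamese backbone. This mirrors the inference
    variant in the abstract (cosine similarity on raw feature vectors).
    """
    p = probe_feat / np.linalg.norm(probe_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ p                      # cosine similarity per gallery image
    order = np.argsort(-sims)         # most similar first
    return order, sims[order]

# toy usage with random 128-dimensional embeddings
rng = np.random.default_rng(0)
order, sims = rank_gallery(rng.normal(size=128), rng.normal(size=(5, 128)))
print(order, sims)
```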
47. Joint Person Objectness and Repulsion for Person Search [PDF] 返回目录
Hantao Yao, Changsheng Xu
Abstract: Person search aims to find the probe person in unconstrained scene images, and can be treated as the combination of person detection and person matching. However, existing methods based on the Detection-Matching framework ignore the person objectness and repulsion (OR), which are both beneficial for reducing the effect of distractor images. In this paper, we propose an OR similarity that jointly considers objectness and repulsion information. Besides the traditional visual similarity term, the OR similarity also contains an objectness term and a repulsion term. The objectness term reduces the similarity of distractor images that do not contain a person and boosts the performance of person search by improving the ranking of positive samples. Because the probe person has a different person ID from its \emph{neighbors}, gallery images having a higher similarity with the \emph{neighbors of the probe} should have a lower similarity with the probe person. Based on this repulsion constraint, the repulsion term reduces the similarity of distractor images that are close to the probe's neighbors rather than to the probe person itself. Treating Faster R-CNN as the person detector, the OR similarity is evaluated on the PRW and CUHK-SYSU datasets with the Detection-Matching framework and six description models. Extensive experiments demonstrate that the proposed OR similarity can effectively reduce the similarity of distractor samples and further boost the performance of person search, e.g., improving the mAP from 92.32% to 93.23% on the CUHK-SYSU dataset and from 50.91% to 52.30% on the PRW dataset.
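How the three cues might combine into one score; the additive form and the weights below are illustrative assumptions, since the paper defines its own objectness and repulsion terms:

```python
def or_similarity(visual_sim, objectness, repulsion_sim, alpha=1.0, beta=1.0):
    """Combine the three cues of the OR similarity for one gallery box.

    visual_sim:    appearance similarity between probe and gallery box
    objectness:    confidence that the gallery box actually contains a person
    repulsion_sim: similarity of the gallery box to the probe's neighbours,
                   which should be penalised
    The additive form and the weights are illustrative assumptions; the
    paper defines its own objectness and repulsion terms.
    """
    return visual_sim + alpha * objectness - beta * repulsion_sim

# a box similar to the probe's neighbours is pushed down the ranking
print(or_similarity(0.80, 0.95, 0.40))  # -> 1.35
print(or_similarity(0.80, 0.95, 0.90))  # -> 0.85
```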
48. Challenge report: Recognizing Families In the Wild Data Challenge [PDF] 返回目录
Zhipeng Luo, Zhiguang Zhang, Zhenyu Xu, Lixuan Che
Abstract: This paper is a brief report on our submission to the Recognizing Families In the Wild Data Challenge (4th Edition), held in conjunction with the FG 2020 Forum. Automatic kinship recognition has attracted many researchers' attention for its wide applications, but it remains a very challenging task because of the limited information that can be used to determine whether a pair of faces are blood relatives or not. In this paper, we studied previous methods and propose our own. We tried many methods, such as deep metric learning, to extract a deep embedding feature for every image, and then determine whether two faces are blood relatives by Euclidean distance or by a classification-based method. Finally, we found some tricks, such as sampling more negative samples and using higher-resolution inputs, that help achieve better performance. Moreover, we propose a symmetric network with a binary-classification-based method to obtain our best score on all tasks.
49. Deep Fusion Siamese Network for Automatic Kinship Verification [PDF] 返回目录
Jun Yu, Mengyan Li, Xinlong Hao, Guochen Xie
Abstract: Automatic kinship verification aims to determine whether some individuals belong to the same family. It is of great research significance to help missing persons reunite with their families. In this work, the challenging problem is progressively addressed in two respects. First, we propose a deep siamese network to quantify the relative similarity between two individuals. When given two input face images, the deep siamese network extracts the features from them and fuses these features by combining and concatenating. Then, the fused features are fed into a fully-connected network to obtain the similarity score between two faces, which is used to verify the kinship. To improve the performance, a jury system is also employed for multi-model fusion. Second, two deep siamese networks are integrated into a deep triplet network for tri-subject (i.e., father, mother and child) kinship verification, which is intended to decide whether a child is related to a pair of parents or not. Specifically, the obtained similarity scores of father-child and mother-child are weighted to generate the parent-child similarity score for kinship verification. Recognizing Families In the Wild (RFIW) is a challenging kinship recognition task with multiple tracks, which is based on Families in the Wild (FIW), a large-scale and comprehensive image database for automatic kinship recognition. The Kinship Verification (track I) and Tri-Subject Verification (track II) are supported during the ongoing RFIW2020 Challenge. Our team (ustc-nelslip) ranked 1st in track II, and 3rd in track I. The code is available at {\color{blue}this https URL}.
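The tri-subject fusion reduces to a weighted combination of the two pairwise scores; the weights below are an illustrative default, not the paper's tuned values:

```python
def parent_child_score(father_child_sim, mother_child_sim, w_f=0.5):
    """Fuse the two pairwise similarities into one parent-child score.

    The 50/50 weighting is an illustrative default; the paper tunes the
    weights when generating the tri-subject verification score.
    """
    return w_f * father_child_sim + (1.0 - w_f) * mother_child_sim

print(parent_child_score(0.72, 0.64))  # -> 0.68
```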
50. Automated Neuron Shape Analysis from Electron Microscopy [PDF] 返回目录
Sharmishtaa Seshamani, Leila Elabbady, Casey Schneider-Mizell, Gayathri Mahalingam, Sven Dorkenwald, Agnes Bodor, Thomas Macrina, Daniel Bumbarger, JoAnn Buchanan, Marc Takeno, Wenjing Yin, Derrick Brittain, Russel Torres, Daniel Kapner, Kisuk Lee, Ran Lu, Jinpeng Wu, Nuno daCosta, Clay Reid, Forrest Collman
Abstract: Morphology-based analysis of cell types has been an area of great interest to the neuroscience community for several decades. Recently, high-resolution electron microscopy (EM) datasets of the mouse brain have opened up opportunities for data analysis at a level of detail that was previously impossible. These datasets are very large in nature, and thus manual analysis is not a practical solution. Of particular interest are details down to the level of postsynaptic structures. This paper proposes a fully automated framework for post-synaptic structure based neuron analysis from EM data. The processing framework involves shape extraction, representation with an autoencoder, and whole-cell modeling and analysis based on shape distributions. We apply our novel framework to a dataset of 1031 neurons obtained from imaging a 1mm x 1mm x 40 micrometer volume of the mouse visual cortex and show the strength of our method in clustering and classification of neuronal shapes.
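Downstream of the autoencoder, clustering per-cell shape descriptors is one way to realize the clustering of neuronal shapes; the pooling into one vector per cell and the cluster count below are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def cell_type_clusters(shape_embeddings, n_clusters=5, seed=0):
    """Cluster per-cell shape descriptors into putative cell types.

    shape_embeddings: (n_cells, d) array, e.g. autoencoder latent codes of
    a cell's postsynaptic shapes pooled into one vector per cell. The
    pooling and the cluster count are illustrative assumptions.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(shape_embeddings)

# toy usage: 1031 cells with 32-dimensional pooled shape codes
rng = np.random.default_rng(0)
labels = cell_type_clusters(rng.normal(size=(1031, 32)))
print(np.bincount(labels))  # cluster sizes
```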
51. Anatomical Predictions using Subject-Specific Medical Data [PDF] 返回目录
Marianne Rakic, John Guttag, Adrian V. Dalca
Abstract: Changes over time in brain anatomy can provide important insight for treatment design or scientific analyses. We present a method that predicts how a brain MRI for an individual will change over time. We model changes using a diffeomorphic deformation field that we predict using convolutional neural networks. Given a predicted deformation field, a baseline scan can be warped to give a prediction of the brain scan at a future time. We demonstrate the method using the ADNI cohort, and analyze how performance is affected by model variants and the subject-specific information provided. We show that the model provides good predictions and that external clinical data can improve predictions.
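Warping the baseline scan with the predicted field amounts to resampling it at displaced coordinates. A 2-D SciPy sketch (the paper's field is diffeomorphic, which this sketch does not enforce):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_scan(baseline, displacement):
    """Warp a baseline scan with a predicted per-pixel displacement field.

    baseline:     (H, W) image (a 2-D slice for simplicity; 3-D is analogous).
    displacement: (2, H, W) field, as a CNN might predict. The warped image
    samples the baseline at x + u(x). The paper's field is diffeomorphic,
    which this sketch does not enforce.
    """
    grid = np.indices(baseline.shape).astype(float)
    return map_coordinates(baseline, grid + displacement, order=1,
                           mode="nearest")

# toy usage: sampling one row below moves image content up by one pixel
img = np.zeros((8, 8)); img[4, 4] = 1.0
u = np.zeros((2, 8, 8)); u[0] += 1.0
print(np.argwhere(warp_scan(img, u) > 0.5))  # -> [[3 4]]
```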
52. Automated Measurements of Key Morphological Features of Human Embryos for IVF [PDF] 返回目录
B. D. Leahy, W.-D. Jang, H. Y. Yang, R. Struyven, D. Wei, Z. Sun, K. R. Lee, C. Royston, L. Cam, Y. Kalma, F. Azem, D. Ben-Yosef, H. Pfister, D. Needleman
Abstract: A major challenge in clinical In-Vitro Fertilization (IVF) is selecting the highest quality embryo to transfer to the patient in the hopes of achieving a pregnancy. Time-lapse microscopy provides clinicians with a wealth of information for selecting embryos. However, the resulting movies of embryos are currently analyzed manually, which is time consuming and subjective. Here, we automate feature extraction of time-lapse microscopy of human embryos with a machine-learning pipeline of five convolutional neural networks (CNNs). Our pipeline consists of (1) semantic segmentation of the regions of the embryo, (2) regression predictions of fragment severity, (3) classification of the developmental stage, and object instance segmentation of (4) cells and (5) pronuclei. Our approach greatly speeds up the measurement of quantitative, biologically relevant features that may aid in embryo selection.
53. Assessing the validity of saliency maps for abnormality localization in medical imaging [PDF] 返回目录
Nishanth Thumbavanam Arun, Nathan Gaw, Praveer Singh, Ken Chang, Katharina Viktoria Hoebel, Jay Patel, Mishka Gidwani, Jayashree Kalpathy-Cramer
Abstract: Saliency maps have become a widely used method to assess which areas of the input image are most pertinent to the prediction of a trained neural network. However, in the context of medical imaging, there is no study to our knowledge that has examined the efficacy of these techniques and quantified them using overlap with ground truth bounding boxes. In this work, we explored the credibility of the various existing saliency map methods on the RSNA Pneumonia dataset. We found that GradCAM was the most sensitive to model parameter and label randomization, and was highly agnostic to model architecture.
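One simple way to quantify overlap between a saliency map and a ground-truth box is the IoU of the thresholded map with the box mask; whether the paper uses exactly this metric is an assumption:

```python
import numpy as np

def saliency_box_overlap(saliency, box, rel_thresh=0.5):
    """IoU between a thresholded saliency map and a ground-truth box.

    saliency: (H, W) map; box: (y0, x0, y1, x1) ground-truth bounding box.
    This is one simple overlap measure in the spirit of the abstract; the
    paper's exact metric may differ.
    """
    hot = saliency >= rel_thresh * saliency.max()
    gt = np.zeros_like(hot)
    y0, x0, y1, x1 = box
    gt[y0:y1, x0:x1] = True
    inter = np.logical_and(hot, gt).sum()
    union = np.logical_or(hot, gt).sum()
    return inter / union if union else 0.0

# a saliency blob exactly covering the box gives IoU 1.0
sal = np.zeros((10, 10)); sal[2:6, 2:6] = 1.0
print(saliency_box_overlap(sal, (2, 2, 6, 6)))  # -> 1.0
```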
54. PlenoptiSign: an optical design tool for plenoptic imaging [PDF] 返回目录
Christopher Hahne, Amar Aggoun
Abstract: Plenoptic imaging enables a light-field to be captured by a single monocular objective lens and an array of micro lenses attached to an image sensor. Metric distances of the light-field's depth planes remain unapparent prior to acquisition. Recent research showed that sampled depth locations rely on the parameters of the system's optical components. This paper presents PlenoptiSign, which implements these findings as a Python software package to assist in the experimental or prototyping stage of a plenoptic system.
55. An Online Platform for Automatic Skull Defect Restoration and Cranial Implant Design [PDF] 返回目录
Jianning Li, Antonio Pepe, Christina Gsaxner, Jan Egger
Abstract: We introduce a fully automatic system for cranial implant design, a common task in cranioplasty operations. The system is currently integrated in Studierfenster (this http URL), an online, cloud-based medical image processing platform for medical imaging applications. Enhanced by deep learning algorithms, the system automatically restores the missing part of a skull (i.e., skull shape completion) and generates the desired implant by subtracting the defective skull from the completed skull. The generated implant can be downloaded in the STereoLithography (.stl) format directly via the browser interface of the system. The implant model can then be sent to a 3D printer for in loco implant manufacturing. Furthermore, thanks to the standard format, the user can thereafter load the model into another application for post-processing whenever necessary. Such an automatic cranial implant design system can be integrated into clinical practice to improve the current routine for surgeries related to skull defect repair (e.g., cranioplasty). Our system, although currently intended for educational and research use only, can be seen as an application of additive manufacturing for fast, patient-specific implant design.
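The subtraction step described above amounts to a voxel-wise boolean difference. A minimal sketch, assuming both skulls are given as aligned binary voxel volumes (True = bone):

```python
# Implant = voxels present in the shape-completed skull but absent from the
# defective skull (sketch; volumes must be co-registered binary arrays).
import numpy as np

def implant_from_completion(completed: np.ndarray, defective: np.ndarray) -> np.ndarray:
    return np.logical_and(completed, np.logical_not(defective))
```

In practice, the resulting binary volume would still need to be meshed (e.g., with marching cubes) before it could be exported as an .stl file.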
56. Reducing the X-ray radiation exposure frequency in cardio-angiography via deep-learning based video interpolation [PDF] 返回目录
Xiao-Lei Yin, Dong-Xue Liang, Lu Wang, Jing Qiu, Zhi-Yun Yang, Jun-Hui Xing, Jian-Zeng Dong, Zhao-Yuan Ma
Abstract: Cardiac coronary angiography is a major technology used to assist doctors during cardiac interventional surgeries. Under exposure to X-ray radiation, doctors inject contrast agents through catheters to determine the position and status of coronary vessels in real time. To obtain a coronary angiography video with a high frame rate, the doctor needs to increase the exposure frequency and intensity of the X-ray, which inevitably increases the X-ray harm to both patients and surgeons. In this work, we utilize a deep-learning based video interpolation algorithm to interpolate coronary angiography videos. Moreover, we establish a new coronary angiography image dataset, which contains 95,039 image triplets, to retrain the video interpolation network model. Using the retrained network, we synthesize high-frame-rate coronary angiography video from low-frame-rate video. The average peak signal-to-noise ratio (PSNR) of the synthesized video frames reaches 34 dB. Extensive experimental results demonstrate the feasibility of using the video frame interpolation algorithm to synthesize continuous and clear high-frame-rate coronary angiography video. With the help of this technology, doctors can significantly reduce the exposure frequency and intensity of the X-ray during coronary angiography.
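The 34 dB figure above is a standard peak signal-to-noise ratio computation; a minimal sketch, assuming frames scaled to [0, 1]:

```python
# PSNR between a synthesized frame and its reference (sketch).
import numpy as np

def psnr(reference: np.ndarray, synthesized: np.ndarray, peak: float = 1.0) -> float:
    mse = np.mean((reference.astype(np.float64) - synthesized.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```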
57. Residual Squeeze-and-Excitation Network for Fast Image Deraining [PDF] 返回目录
Jun Fu, Jianfeng Xu, Kazuyuki Tasaka, Zhibo Chen
Abstract: Image deraining is an important image processing task, as rain streaks not only severely degrade the visual quality of images but also significantly affect the performance of high-level vision tasks. Traditional methods progressively remove rain streaks via different recurrent neural networks. However, these methods fail to yield plausible rain-free images in an efficient manner. In this paper, we propose a residual squeeze-and-excitation network called RSEN, which achieves fast image deraining as well as superior deraining performance compared with state-of-the-art approaches. Specifically, RSEN adopts a lightweight encoder-decoder architecture to conduct rain removal in one stage. Besides, both the encoder and decoder adopt a novel residual squeeze-and-excitation block as the core of feature extraction, which contains a residual block for producing hierarchical features, followed by a squeeze-and-excitation block for channel-wise enhancement of the resulting hierarchical features. Experimental results demonstrate that our method not only considerably reduces the computational complexity but also significantly improves the deraining performance compared with state-of-the-art methods.
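One plausible reading of the residual squeeze-and-excitation block, as a PyTorch sketch (not the authors' code; kernel, channel, and reduction sizes are assumptions):

```python
# Residual conv block followed by an SE block that re-weights channels.
import torch
import torch.nn as nn

class ResidualSEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                      # squeeze
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),  # excitation
        )

    def forward(self, x):
        h = self.body(x) + x      # residual block -> hierarchical features
        return h * self.se(h)     # channel-wise re-weighting
```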
58. Automatic classification between COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy on chest X-ray image: combination of data augmentation methods in a small dataset [PDF] 返回目录
Mizuho Nishio, Shunjiro Noguchi, Hidetoshi Matsuo, Takamichi Murakami
Abstract: Purpose: To develop and validate a computer-aided diagnosis (CADx) system for classification between COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy on chest X-ray (CXR) images. Because CXR datasets related to COVID-19 were small, transfer learning with pretrained models and a combination of data augmentation methods were used to improve the accuracy and robustness of the CADx system. Materials and Methods: From two public datasets, 1248 CXR images were obtained, comprising 215, 533, and 500 CXR images of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy, respectively. The proposed CADx system utilized VGG16 as a pretrained model and a combination of a conventional method and mixup as data augmentation. Other types of pretrained models were used for comparison with the VGG16-based model. In addition, single-type and no-augmentation settings were also evaluated. A training/validation/test split was used when building and evaluating the CADx system. Three-category accuracy was evaluated on a test set of 125 CXR images. Results: The three-category accuracy of the CADx system was 83.6% between COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy. In addition, the sensitivity for COVID-19 pneumonia was more than 90%. The combination of the conventional method and mixup was more useful than single-type or no data augmentation. Conclusions: It was possible to build an accurate CADx system for the 3-category classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy.
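Mixup, the augmentation combined with the conventional method above, blends each batch with a shuffled copy of itself. A minimal sketch with one-hot labels; the Beta parameter is an assumed value:

```python
# Mixup data augmentation (sketch).
import numpy as np

def mixup(images: np.ndarray, labels: np.ndarray, alpha: float = 0.2):
    """images: (N, H, W, C); labels: (N, num_classes) one-hot."""
    lam = np.random.beta(alpha, alpha)
    perm = np.random.permutation(len(images))
    mixed_x = lam * images + (1.0 - lam) * images[perm]
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_x, mixed_y
```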
59. Using Generative Models for Pediatric wbMRI [PDF] 返回目录
Alex Chang, Vinith M. Suriyakumar, Abhishek Moturu, Nipaporn Tewattanarat, Andrea Doria, Anna Goldenberg
Abstract: Early detection of cancer is key to a good prognosis and requires frequent testing, especially in pediatrics. Whole-body magnetic resonance imaging (wbMRI) is an essential part of several well-established screening protocols, with screening starting in early childhood. To date, machine learning (ML) has been used on wbMRI images to stage adult cancer patients. It is not possible to use such tools in pediatrics due to the changing bone signal throughout growth, the difficulty of obtaining these images in young children due to movement and limited compliance, and the rarity of positive cases. We evaluate the quality of wbMRI images generated using generative adversarial networks (GANs) trained on wbMRI data from The Hospital for Sick Children in Toronto. We use the Fréchet Inception Distance (FID) metric, Domain Fréchet Distance (DFD), and blind tests with a radiology fellow for evaluation. We demonstrate that StyleGAN2 provides the best performance in generating wbMRI images with respect to all three metrics.
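Both FID and DFD reduce to the Fréchet distance between two Gaussians fitted to feature activations; a sketch, where the features would come from an Inception or domain-specific encoder:

```python
# Fréchet distance between Gaussians fitted to real and generated features.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real            # drop numerical noise from sqrtm
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2.0 * covmean))
```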
60. A multimodal approach for multi-label movie genre classification [PDF] 返回目录
Rafael B. Mangolin, Rodolfo M. Pereira, Alceu S. Britto Jr., Carlos N. Silla Jr., Valéria D. Feltrim, Diego Bertolini, Yandre M. G. Costa
Abstract: Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. In this paper, we address the multi-label classification of movie genres in a multimodal way. For this purpose, we created a dataset composed of trailer video clips, subtitles, synopses, and movie posters taken from 152,622 movie titles from The Movie Database. The dataset was carefully curated and organized, and it was also made available as a contribution of this work. Each movie in the dataset was labeled according to a set of eighteen genre labels. We extracted features from these data using different kinds of descriptors, namely Mel Frequency Cepstral Coefficients, Statistical Spectrum Descriptor, Local Binary Pattern with spectrograms, Long Short-Term Memory, and Convolutional Neural Networks. The descriptors were evaluated using different classifiers, such as BinaryRelevance and ML-kNN. We also investigated the performance of combining different classifiers/features using a late fusion strategy, which obtained encouraging results. Based on the F-Score metric, our best result, 0.628, was obtained by the fusion of a classifier created using LSTM on the synopses and a classifier created using a CNN on movie trailer frames. When considering the AUC-PR metric, the best result, 0.673, was also achieved by combining those representations, but in addition, a classifier based on LSTM created from the subtitles was used. These results corroborate the existence of complementarity among classifiers based on different sources of information in this field of application. As far as we know, this is the most comprehensive study developed in terms of the diversity of multimedia sources of information used to perform movie genre classification.
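A minimal sketch of the late fusion strategy mentioned above, assuming each modality's classifier outputs per-genre probabilities and using simple averaging (the paper may weight or combine them differently):

```python
# Late fusion of per-modality probability estimates for multi-label genres.
import numpy as np

def late_fusion(prob_lists, threshold: float = 0.5):
    """prob_lists: list of (N, num_genres) probability arrays, one per modality."""
    fused = np.mean(np.stack(prob_lists, axis=0), axis=0)
    return fused, (fused >= threshold).astype(int)
```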
61. Limited-angle CT reconstruction via the L1/L2 minimization [PDF] 返回目录
Chao Wang, Min Tao, James Nagy, Yifei Lou
Abstract: In this paper, we consider minimizing the L1/L2 term on the gradient for a limited-angle scanning problem in computed tomography (CT) reconstruction. We design a splitting framework for both constrained and unconstrained optimization models. In addition, we can incorporate a box constraint that is reasonable for imaging applications. Numerical schemes are based on the alternating direction method of multipliers (ADMM), and we provide the convergence analysis of all the proposed algorithms (constrained/unconstrained and with/without the box constraint). Experimental results demonstrate the efficiency of our proposed approaches, showing significant improvements over the state-of-the-art methods in limited-angle CT reconstruction. Particularly worth noting is an exact recovery of the Shepp-Logan phantom from noiseless projection data with a 30° scanning angle.
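For concreteness, the constrained model with the box constraint can be written as follows (a sketch of the formulation based on the description above; A denotes the limited-angle projection operator, f the measured data, and c an assumed upper bound):

```latex
\min_{u}\; \frac{\|\nabla u\|_{1}}{\|\nabla u\|_{2}}
\quad \text{s.t.} \quad A u = f, \qquad 0 \le u \le c.
```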
62. Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos [PDF] 返回目录
Ajay Kumar Tanwani, Pierre Sermanet, Andy Yan, Raghav Anand, Mariano Phielipp, Ken Goldberg
Abstract: Learning meaningful visual representations in an embedding space can facilitate generalization in downstream tasks such as action segmentation and imitation. In this paper, we learn a motion-centric representation of surgical video demonstrations by grouping them into action segments/sub-goals/options in a semi-supervised manner. We present Motion2Vec, an algorithm that learns a deep embedding feature space from video observations by minimizing a metric learning loss in a Siamese network: images from the same action segment are pulled together while being pushed away from randomly sampled images of other segments, while respecting the temporal ordering of the images. The embeddings are iteratively segmented with a recurrent neural network for a given parametrization of the embedding space after pre-training the Siamese network. We only use a small set of labeled video segments to semantically align the embedding space, and assign pseudo-labels to the remaining unlabeled data by inference on the learned model parameters. We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset. Results give 85.5% segmentation accuracy on average, suggesting performance improvement over several state-of-the-art baselines, while kinematic pose imitation gives 0.94 centimeter position error per observation on the test set. Videos, code and data are available at this https URL
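The metric learning objective can be illustrated with a standard triplet loss in the spirit of the description above (a sketch; the paper's exact loss and sampling scheme may differ):

```python
# Triplet loss: pull the anchor toward a frame from the same action segment,
# push it away from a randomly sampled frame of another segment (sketch).
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 1.0):
    """anchor/positive/negative: (N, D) embeddings from the Siamese network."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```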
63. Pseudo-Representation Labeling Semi-Supervised Learning [PDF] 返回目录
Song-Bo Yang, Tian-li Yu
Abstract: In recent years, semi-supervised learning (SSL) has shown tremendous success in leveraging unlabeled data to improve the performance of deep learning models, which significantly reduces the demand for large amounts of labeled data. Many SSL techniques have been proposed and have shown promising performance on famous datasets such as ImageNet and CIFAR-10. However, some existing techniques (especially those based on data augmentation) are empirically unsuitable for industrial applications. Therefore, this work proposes pseudo-representation labeling, a simple and flexible framework that utilizes pseudo-labeling techniques to iteratively label a small amount of unlabeled data and use it as training data. In addition, our framework is integrated with self-supervised representation learning such that the classifier benefits from representation learning on both labeled and unlabeled data. This framework is not limited to a specific model structure; rather, it is a general technique for improving existing models. Compared with existing approaches, pseudo-representation labeling is more intuitive and can effectively solve practical problems in the real world. Empirically, it outperforms the current state-of-the-art semi-supervised learning methods on industrial types of classification problems such as the WM-811K wafer map and the MIT-BIH Arrhythmia dataset.
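The iterative pseudo-labeling loop described above, in outline. This is a sketch: `model`, its scikit-learn-style `fit`/`predict_proba` interface, and the confidence threshold `tau` are assumptions.

```python
# Iteratively train, pseudo-label confident unlabeled samples, and retrain.
import numpy as np

def pseudo_label_rounds(model, x_labeled, y_labeled, x_unlabeled,
                        rounds: int = 5, tau: float = 0.95):
    x_train, y_train = x_labeled, y_labeled
    for _ in range(rounds):
        model.fit(x_train, y_train)
        probs = model.predict_proba(x_unlabeled)          # (N, C)
        conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
        keep = conf >= tau                                # confident subset only
        x_train = np.concatenate([x_labeled, x_unlabeled[keep]])
        y_train = np.concatenate([y_labeled, pseudo[keep]])
    return model
```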
64. DC-UNet: Rethinking the U-Net Architecture with Dual Channel Efficient CNN for Medical Images Segmentation [PDF] 返回目录
Ange Lou, Shuyue Guan, Murray Loew
Abstract: Recently, deep learning has become much more popular in the computer vision area. Convolutional Neural Networks (CNNs) have brought a breakthrough in image segmentation, especially for medical images. In this regard, U-Net is the predominant approach to the medical image segmentation task. U-Net not only performs well in segmenting multimodal medical images in general, but also in some tough cases. However, we found that the classical U-Net architecture is limited in several respects. Therefore, we applied two modifications: 1) we designed an efficient CNN architecture to replace the encoder and decoder; 2) we applied a residual module to replace the skip connection between encoder and decoder, improving on the state-of-the-art U-Net model. Following these modifications, we designed a novel architecture, DC-UNet, as a potential successor to U-Net. We created a new effective CNN architecture and built DC-UNet based on this CNN. We evaluated our model on three datasets with tough cases and obtained relative improvements in performance of 2.90%, 1.49% and 11.42%, respectively, compared with the classical U-Net. In addition, we used the Tanimoto similarity to replace the Jaccard similarity for gray-to-gray image comparisons.
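The Tanimoto similarity used above in place of Jaccard extends naturally to gray-level images; a minimal sketch (on binary inputs it coincides with the Jaccard index):

```python
# Tanimoto coefficient for continuous-valued (gray-level) arrays.
import numpy as np

def tanimoto(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    dot = np.dot(a, b)
    return float(dot / (np.dot(a, a) + np.dot(b, b) - dot + eps))
```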
65. Probabilistic self-learning framework for Low-dose CT Denoising [PDF] 返回目录
Ti Bai, Dan Nguyen, Biling Wang, Steve Jiang
Abstract: Despite the indispensable role of X-ray computed tomography (CT) in the field of diagnostic medicine, the associated ionizing radiation remains a major concern, considering that it may cause genetic and cancerous diseases. Decreasing the exposure can reduce the dose and hence the radiation-related risk, but will also induce higher quantum noise. Supervised deep learning can be used to train a neural network to denoise low-dose CT (LDCT). However, its success requires massive pixel-wise paired LDCT and normal-dose CT (NDCT) images, which are rarely available in real practice. To alleviate this problem, in this paper, a neural network based on the shift-invariant property was devised to learn the inherent pixel correlations and the noise distribution using only the LDCT images, forming our probabilistic self-learning framework. Experimental results demonstrated that the proposed method outperformed its competitors, producing an enhanced LDCT image with an image style similar to the routine NDCT, which is highly preferable in clinical practice.
66. Reconstructing undersampled photoacoustic microscopy images using deep learning [PDF] 返回目录
Anthony DiSpirito III, Daiwei Li, Tri Vu, Maomao Chen, Dong Zhang, Jianwen Luo, Roarke Horstmeyer, Junjie Yao
Abstract: One primary technical challenge in photoacoustic microscopy (PAM) is the necessary compromise between spatial resolution and imaging speed. In this study, we propose a novel application of deep learning principles to reconstruct undersampled PAM images and transcend the trade-off between spatial resolution and imaging speed. We compared various convolutional neural network (CNN) architectures, and selected a fully dense U-net (FD U-net) model that produced the best results. To mimic various undersampling conditions in practice, we artificially downsampled fully-sampled PAM images of mouse brain vasculature at different ratios. This allowed us to not only definitively establish the ground truth, but also train and test our deep learning model at various imaging conditions. Our results and numerical analysis have collectively demonstrated the robust performance of our model in reconstructing PAM images with as few as 2% of the original pixels, which may effectively shorten the imaging time without substantially sacrificing image quality.
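One simple way to mimic the undersampling described above is to keep a random subset of pixels from a fully-sampled image, so the discarded pixels serve as ground truth (a sketch; the paper's exact downsampling pattern may differ):

```python
# Randomly mask out pixels to simulate undersampled acquisition (sketch).
import numpy as np

def undersample(image: np.ndarray, keep_ratio: float = 0.02, seed: int = 0):
    rng = np.random.default_rng(seed)
    mask = rng.random(image.shape) < keep_ratio
    return image * mask, mask        # sparse input and its sampling mask
```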
67. Hyperspectral Image Denoising via Global Spatial-Spectral Total Variation Regularized Nonconvex Local Low-Rank Tensor Approximation [PDF] 返回目录
Haijin Zeng, Xiaozhen Xie, Jifeng Ning
Abstract: Hyperspectral image (HSI) denoising aims to restore a clean HSI from its noise-contaminated counterpart. Noise contamination is often introduced during data acquisition and conversion. In this paper, we propose a novel spatial-spectral total variation (SSTV) regularized nonconvex local low-rank (LR) tensor approximation method to remove mixed noise in HSIs. On the one hand, clean HSI data have an underlying local LR tensor property, even though real HSI data may not be globally low-rank due to outliers and non-Gaussian noise. Based on this fact, we propose a novel tensor $L_{\gamma}$-norm to formulate the local LR prior. On the other hand, HSIs are assumed to be piecewise smooth in the global spatial and spectral domains. Instead of traditional band-wise total variation, we use SSTV regularization to simultaneously consider the global spatial structure and the spectral correlation of neighboring bands. Results on simulated and real HSI datasets indicate that the use of the local LR tensor penalty and global SSTV can boost the preservation of local details and overall structural information in HSIs.
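A common form of the SSTV regularizer referred to above is sketched below; the paper's exact weighting may differ. Here $\mathcal{X}$ is the HSI cube, $D_x, D_y$ are spatial finite-difference operators, and $D_z$ is the spectral one:

```latex
\mathrm{SSTV}(\mathcal{X}) \;=\; \|D_z D_x \mathcal{X}\|_{1} \;+\; \|D_z D_y \mathcal{X}\|_{1}.
```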
68. The global information for land cover classification by dual-branch deep learning [PDF] 返回目录
Fan Zhang, MinChao Yan, Chen Hu, Jun Ni, Fei Ma
Abstract: Land cover classification has played an important role in remote sensing because it can intelligently identify objects in one huge remote sensing image, reducing human workload. However, many classification methods are designed based on the pixel features or limited spatial features of the remote sensing image, which limits their classification accuracy and universality. This paper proposes a novel method that takes global information of the remote sensing image, i.e., geographic latitude-longitude information, into account. In addition, a dual-channel convolutional neural network (CNN) classification method is designed to mine the pixel features of the image in combination with this global information. Firstly, a 1-dimensional CNN is designed to extract the pixel information of the remote sensing image, and a fully connected network (FCN) is employed to extract the latitude-longitude feature. Then, the features of the two networks are fused by another fully connected network to realize remote sensing image classification. Finally, two kinds of remote sensing data, hyperspectral imaging (HSI) and polarimetric synthetic aperture radar (PolSAR), are used to verify the effectiveness of our method. The results of the proposed method are superior to those of the traditional single-channel convolutional neural network.
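A minimal sketch of the dual-branch design described above (layer sizes are assumptions): a 1-D CNN branch for a pixel's spectral vector and a small fully connected branch for its latitude-longitude pair, fused by a fully connected head:

```python
# Dual-branch classifier: spectral 1-D CNN + latitude-longitude MLP (sketch).
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.pixel_branch = nn.Sequential(
            nn.Conv1d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),   # -> (N, 16)
        )
        self.geo_branch = nn.Sequential(nn.Linear(2, 16), nn.ReLU())
        self.head = nn.Linear(16 + 16, n_classes)

    def forward(self, spectra, latlon):
        """spectra: (N, n_bands); latlon: (N, 2)."""
        p = self.pixel_branch(spectra.unsqueeze(1))  # (N, 1, n_bands) in
        g = self.geo_branch(latlon)
        return self.head(torch.cat([p, g], dim=1))
```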
69. Blended Multi-Modal Deep ConvNet Features for Diabetic Retinopathy Severity Prediction [PDF] 返回目录
J.D. Bodapati, N. Veeranjaneyulu, S.N. Shareef, S. Hakak, M. Bilal, P.K.R. Maddikunta, O. Jo
Abstract: Diabetic Retinopathy (DR) is one of the major causes of visual impairment and blindness across the world. It is usually found in patients who have suffered from diabetes for a long period. The major focus of this work is to derive optimal representations of retinal images that further help to improve the performance of DR recognition models. To extract optimal representations, features extracted from multiple pre-trained ConvNet models are blended using the proposed multi-modal fusion module. These final representations are used to train a Deep Neural Network (DNN) for DR identification and severity level prediction. As each ConvNet extracts different features, fusing them using 1D pooling and cross pooling leads to a better representation than using features extracted from a single ConvNet. Experimental studies on the benchmark Kaggle APTOS 2019 contest dataset reveal that the model trained on the proposed blended feature representations is superior to the existing methods. In addition, we notice that cross average pooling based fusion of features from Xception and VGG16 is the most appropriate for DR recognition. With the proposed model, we achieve an accuracy of 97.41% and a kappa statistic of 94.82 for DR identification, and an accuracy of 81.7% and a kappa statistic of 71.1% for severity level prediction. Another interesting observation is that a DNN with dropout at the input layer converges more quickly when trained using blended features, compared to the same model trained using uni-modal deep features.
70. Advanced Single Image Resolution Upsurging Using a Generative Adversarial Network [PDF] 返回目录
Md. Moshiur Rahman, Samrat Kumar Dey, Kabid Hassan Shibly
Abstract: The resolution of an image is a very important criterion for evaluating its quality. A higher resolution is always preferable, as images of lower resolution are unsuitable due to their fuzzy quality. Higher image resolution is important for various fields such as medical imaging and astronomy, since images of lower resolution become unclear and indistinct when their sizes are enlarged. In recent times, various research works have been performed to generate a higher-resolution image from its lower-resolution counterpart. In this paper, we propose a technique for generating higher-resolution images from lower-resolution ones using a Residual-in-Residual Dense Block (RRDB) network architecture with a deep network. We also compare our method with other methods to show that it provides better visual quality images.
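The Residual-in-Residual Dense Block can be sketched as dense convolutional blocks with residual scaling, wrapped in an outer residual connection (an ESRGAN-style sketch; the growth rate and scaling factor are assumptions):

```python
# Dense block with residual scaling, and its residual-in-residual wrapper.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, ch: int, growth: int = 32):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch + i * growth, growth, 3, padding=1) for i in range(4)
        )
        self.fuse = nn.Conv2d(ch + 4 * growth, ch, 3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        return x + 0.2 * self.fuse(torch.cat(feats, dim=1))  # residual scaling

class RRDB(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.blocks = nn.Sequential(DenseBlock(ch), DenseBlock(ch), DenseBlock(ch))

    def forward(self, x):
        return x + 0.2 * self.blocks(x)   # residual-in-residual
```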
71. MetaInv-Net: Meta Inversion Network for Sparse View CT Image Reconstruction [PDF] 返回目录
Haimiao Zhang, Baodong Liu, Hengyong Yu, Bin Dong
Abstract: X-ray Computed Tomography (CT) is widely used in clinical applications such as diagnosis and image-guided interventions. In this paper, we propose a new deep-learning-based model for CT image reconstruction whose backbone network architecture is built by unrolling an iterative algorithm. However, unlike the existing strategy of including as many data-adaptive components in the unrolled dynamics model as possible, we find that it is enough to learn only the parts where traditional designs mostly rely on intuition and experience. More specifically, we propose to learn an initializer for the conjugate gradient (CG) algorithm involved in one of the subproblems of the backbone model. Other components, such as image priors and hyperparameters, are kept as in the original design. This makes the proposed model very lightweight. Since a hypernetwork is introduced to infer the initialization of the CG module, the proposed model is a kind of meta-learning model. Therefore, we call it the meta-inversion network (MetaInv-Net). The proposed MetaInv-Net has far fewer trainable parameters and superior image reconstruction performance compared with some state-of-the-art deep models in CT imaging. In simulated and real data experiments, MetaInv-Net performs very well and generalizes beyond the training setting, i.e., to other scanning settings, noise levels, and noise types.
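A compact illustration of the idea of learning only the CG initializer: a standard conjugate gradient solver is kept fixed, and a small network maps the right-hand side to the starting point x0. The hypernetwork architecture below is hypothetical; only the overall pattern follows the abstract.

```python
import torch
import torch.nn as nn

def conjugate_gradient(A, b, x0, n_iter=10):
    # Textbook CG for the SPD system A(x) = b, started from a learned x0.
    x = x0
    r = b - A(x)
    p, rs = r, (r * r).sum()
    for _ in range(n_iter):
        Ap = A(p)
        alpha = rs / (p * Ap).sum()
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = (r * r).sum()
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

class CGInitializer(nn.Module):
    # Hypothetical light-weight hypernetwork: maps the right-hand side
    # (e.g., a back-projected sinogram) to an initial guess for CG.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, b):
        return self.net(b)
```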
72. Overview of Scanner Invariant Representations [PDF] 返回目录
Daniel Moyer, Greg Ver Steeg, Paul M. Thompson
Abstract: Pooled imaging data from multiple sources is subject to bias from each source. Studies that do not correct for these scanner/site biases at best lose statistical power, and at worst leave spurious correlations in their data. Estimation of the bias effects is non-trivial due to the paucity of data with correspondence across sites, the so-called "traveling phantom" data, which are expensive to collect. Nevertheless, numerous solutions leveraging direct correspondence have been proposed. In contrast to this, Moyer et al. (2019) propose an unsupervised solution using invariant representations, one which does not require correspondence and thus does not require paired images. By leveraging the data processing inequality, an invariant representation can then be used to create an image reconstruction that is uninformative of its original source, yet still faithful to the underlying structure. In the present abstract we provide an overview of this method.
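One way to picture the correspondence-free setup is a site-conditional autoencoder: the encoder output z can be made scanner-agnostic because the decoder receives the site code separately. The sketch below is an assumption-laden simplification of this idea, not the authors' exact model.

```python
import torch
import torch.nn as nn

class SiteConditionalAE(nn.Module):
    # The decoder receives the site code s directly, so reconstruction does
    # not need z to encode scanner identity; z can then be made invariant.
    # Swapping s at test time re-renders an image "as if" from another site.
    def __init__(self, dim_x=4096, dim_z=128, n_sites=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_x, 512), nn.ReLU(),
                                 nn.Linear(512, dim_z))
        self.dec = nn.Sequential(nn.Linear(dim_z + n_sites, 512), nn.ReLU(),
                                 nn.Linear(512, dim_x))

    def forward(self, x, s_onehot):
        z = self.enc(x)
        x_hat = self.dec(torch.cat([z, s_onehot], dim=1))
        return x_hat, z
```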
73. Approximating the Ideal Observer for joint signal detection and localization tasks by use of supervised learning methods [PDF] 返回目录
Weimin Zhou, Hua Li, Mark A. Anastasio
Abstract: Medical imaging systems are commonly assessed and optimized by use of objective measures of image quality (IQ). The Ideal Observer (IO) performance has been advocated to provide a figure-of-merit for use in assessing and optimizing imaging systems because the IO sets an upper performance limit among all observers. When joint signal detection and localization tasks are considered, the IO that employs a modified generalized likelihood ratio test maximizes observer performance as characterized by the localization receiver operating characteristic (LROC) curve. Computations of likelihood ratios are analytically intractable in the majority of cases. Therefore, sampling-based methods that employ Markov-Chain Monte Carlo (MCMC) techniques have been developed to approximate the likelihood ratios. However, the applications of MCMC methods have been limited to relatively simple object models. Supervised learning-based methods that employ convolutional neural networks have been recently developed to approximate the IO for binary signal detection tasks. In this paper, the ability of supervised learning-based methods to approximate the IO for joint signal detection and localization tasks is explored. Both background-known-exactly and background-known-statistically signal detection and localization tasks are considered. The considered object models include a lumpy object model and a clustered lumpy model, and the considered measurement noise models include Laplacian noise, Gaussian noise, and mixed Poisson-Gaussian noise. The LROC curves produced by the supervised learning-based method are compared to those produced by the MCMC approach or analytical computation when feasible. The potential utility of the proposed method for computing objective measures of IQ for optimizing imaging system performance is explored.
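Since the LROC curve is central to this abstract, a small numpy helper for computing an empirical LROC curve from rating data may be useful. It assumes each signal-present case carries a correct-localization flag; all data below are synthetic.

```python
import numpy as np

def empirical_lroc(scores_absent, scores_present, loc_correct):
    # At each threshold t: x = false-positive fraction on signal-absent
    # cases, y = fraction of signal-present cases rated above t AND
    # localized correctly (the LROC ordinate).
    thresholds = np.sort(np.concatenate([scores_absent, scores_present]))[::-1]
    fpr = np.array([(scores_absent > t).mean() for t in thresholds])
    tplf = np.array([((scores_present > t) & loc_correct).mean()
                     for t in thresholds])
    return fpr, tplf

# Synthetic toy data:
rng = np.random.default_rng(0)
sa = rng.normal(0.0, 1.0, 500)        # signal-absent ratings
sp = rng.normal(1.5, 1.0, 500)        # signal-present ratings
loc = rng.random(500) < 0.8           # correct-localization flags
fpr, tplf = empirical_lroc(sa, sp, loc)
auc_lroc = np.trapz(tplf, fpr)        # area under the LROC curve
```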
74. Synthesizing lesions using contextual GANs improves breast cancer classification on mammograms [PDF] 返回目录
Eric Wu, Kevin Wu, William Lotter
Abstract: Data scarcity and class imbalance are two fundamental challenges in many machine learning applications to healthcare. Breast cancer classification in mammography exemplifies these challenges, with a malignancy rate of around 0.5% in a screening population, which is compounded by the relatively small size of lesions (~1% of the image) in malignant cases. Simultaneously, the prevalence of screening mammography creates a potential abundance of non-cancer exams to use for training. Altogether, these characteristics lead to overfitting on cancer cases, while under-utilizing non-cancer data. Here, we present a novel generative adversarial network (GAN) model for data augmentation that can realistically synthesize and remove lesions on mammograms. With self-attention and semi-supervised learning components, the U-net-based architecture can generate high resolution (256x256px) outputs, as necessary for mammography. When augmenting the original training set with the GAN-generated samples, we find a significant improvement in malignancy classification performance on a test set of real mammogram patches. Overall, the empirical results of our algorithm and the relevance to other medical imaging paradigms point to potentially fruitful further applications.
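The augmentation step itself is simple to picture: the real training patches are pooled with GAN-synthesized malignant patches. The sketch below uses random tensors as stand-ins for real and generated data; it illustrates only the dataset mixing, not the contextual GAN itself.

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

# Random tensors stand in for real patches and GAN outputs (256x256px,
# one channel); synthesized lesions are labeled malignant (class 1).
real_x = torch.rand(1000, 1, 256, 256)
real_y = torch.randint(0, 2, (1000,))
synth_x = torch.rand(500, 1, 256, 256)
synth_y = torch.ones(500, dtype=torch.long)

train_set = ConcatDataset([TensorDataset(real_x, real_y),
                           TensorDataset(synth_x, synth_y)])
```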
75. Automatic segmentation of the pulmonary lobes with a 3D u-net and optimized loss function [PDF] 返回目录
Bianca Lassen-Schmidt, Alessa Hering, Stefan Krass, Hans Meine
Abstract: Fully-automatic lung lobe segmentation is challenging due to anatomical variations, pathologies, and incomplete fissures. We trained a 3D u-net for pulmonary lobe segmentation on 49 mostly publicly available datasets and introduced a weighted Dice loss function to emphasize the lobar boundaries. To validate the performance of the proposed method, we compared the results to two other methods. The new loss function improved the mean distance to 1.46 mm (compared to 2.08 mm for a simple loss function without weighting).
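A boundary-weighted soft Dice loss of the kind described can be written in a few lines of PyTorch. The exact weight map used by the authors (how strongly the lobar boundaries are emphasized) is not specified here, so the weight tensor is left as an input.

```python
import torch

def weighted_dice_loss(pred, target, weight, eps=1e-6):
    # Soft Dice with a per-voxel weight map up-weighting lobar boundaries.
    # pred: softmax probabilities (N, C, D, H, W); target: one-hot labels
    # of the same shape; weight: (N, 1, D, H, W), broadcast over classes.
    inter = (weight * pred * target).sum(dim=(2, 3, 4))
    denom = (weight * (pred + target)).sum(dim=(2, 3, 4))
    dice = (2 * inter + eps) / (denom + eps)
    return 1 - dice.mean()
```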
76. Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data [PDF] 返回目录
Qi Chang, Hui Qu, Yikai Zhang, Mert Sabuncu, Chao Chen, Tong Zhang, Dimitris Metaxas
Abstract: In this paper, we propose a data-privacy-preserving and communication-efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN). Our proposed framework aims to train a central generator that learns from distributed discriminators, and to use the generated synthetic images alone to train the segmentation model. We validate the proposed framework on a health-entity learning application, which is known to be privacy-sensitive. Our experiments show that our approach: 1) can learn the real image distribution from multiple datasets without sharing patients' raw data; 2) is more efficient and requires lower bandwidth than other distributed deep learning methods; 3) achieves higher performance than a model trained on any single real dataset, and almost the same performance as a model trained on all real datasets; and 4) has provable guarantees that the generator can learn the distributed distribution in an unbiased fashion.
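A schematic PyTorch round of such distributed-discriminator training is sketched below: each site updates only its local discriminator on its own images, and the central generator is updated from the discriminators' feedback, so raw patient data never leaves a site. Shapes, losses, and the single-batch-per-site schedule are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def asyndgan_round(G, discriminators, site_loaders, opt_g, opt_ds, z_dim=100):
    # One communication round over all sites.
    bce = nn.BCELoss()
    for D, loader, opt_d in zip(discriminators, site_loaders, opt_ds):
        real, _ = next(iter(loader))
        ones = torch.ones(real.size(0), 1)
        zeros = torch.zeros(real.size(0), 1)
        fake = G(torch.randn(real.size(0), z_dim))
        opt_d.zero_grad()                       # local discriminator step
        d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
        d_loss.backward()
        opt_d.step()
        opt_g.zero_grad()                       # central generator step
        g_loss = bce(D(fake), ones)
        g_loss.backward()
        opt_g.step()
```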
77. Automatic Diagnosis of Pulmonary Embolism Using an Attention-guided Framework: A Large-scale Study [PDF] 返回目录
Luyao Shi, Deepta Rajan, Shafiq Abedin, Manikanta Srikar Yellapragada, David Beymer, Ehsan Dehghan
Abstract: Pulmonary Embolism (PE) is a life-threatening disorder associated with high mortality and morbidity. Prompt diagnosis and immediate initiation of therapeutic action are important. We explored a deep learning model to detect PE on volumetric contrast-enhanced chest CT scans using a 2-stage training strategy. First, a residual convolutional neural network (ResNet) was trained using annotated 2D images. In addition to the classification loss, an attention loss was added during training to help the network focus its attention on PE. Next, a recurrent network was used to scan sequentially through the features provided by the pre-trained ResNet to detect PE. This combination allows the network to be trained using both a limited, sparse set of pixel-level annotated images and a large number of easily obtainable patient-level image-label pairs. We used 1,670 sparsely annotated studies and more than 10,000 labeled studies in our training. On a test set of 2,160 patient studies, the proposed method achieved an area under the ROC curve (AUC) of 0.812. The proposed framework is also able to provide localized attention maps that indicate possible PE lesions, which could potentially help radiologists accelerate the diagnostic process.
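The two-term training objective can be summarized as a classification loss plus an attention loss tying the network's attention map to the annotated PE region. The weighting lam and the MSE form in this sketch are illustrative; the paper's exact formulation may differ.

```python
import torch.nn.functional as F

def combined_loss(logits, labels, attn_map, pe_mask, lam=0.5):
    # Classification loss plus an attention loss pushing the attention map
    # toward the annotated PE region.
    cls = F.cross_entropy(logits, labels)
    attn = F.mse_loss(attn_map, pe_mask)
    return cls + lam * attn
```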
78. Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving [PDF] 返回目录
Vasiliki Kondyli, Mehul Bhatt, Jakob Suchan
Abstract: We develop a human-centred, cognitive model of visuospatial complexity in everyday, naturalistic driving conditions. With a focus on visual perception, the model incorporates quantitative, structural, and dynamic attributes identifiable in the chosen context; the human-centred basis of the model lies in its behavioural evaluation with human subjects with respect to psychophysical measures pertaining to embodied visuoauditory attention. We report preliminary steps to apply the developed cognitive model of visuospatial complexity for human-factors guided dataset creation and benchmarking, and for its use as a semantic template for the (explainable) computational analysis of visuospatial complexity.
79. Applying the Decisiveness and Robustness Metrics to Convolutional Neural Networks [PDF] 返回目录
Christopher A. George, Eduardo A. Barrera, Kenric P. Nelson
Abstract: We review three recently-proposed classifier quality metrics and consider their suitability for large-scale classification challenges such as applying convolutional neural networks to the 1000-class ImageNet dataset. These metrics, referred to as the "geometric accuracy," "decisiveness," and "robustness," are based on the generalized mean ($\rho$ equals 0, 1, and -2/3, respectively) of the classifier's self-reported and measured probabilities of correct classification. We also propose some minor clarifications to standardize the metric definitions. With these updates, we show some examples of calculating the metrics using deep convolutional neural networks (AlexNet and DenseNet) acting on large datasets (the German Traffic Sign Recognition Benchmark and ImageNet).
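The three metrics reduce to one formula: the generalized mean of the classifier's probabilities for the correct class, evaluated at rho = 0, 1, and -2/3. A short numpy sketch with toy probabilities follows.

```python
import numpy as np

def generalized_mean(p, rho):
    p = np.asarray(p, dtype=float)
    if rho == 0:                              # limiting case: geometric mean
        return float(np.exp(np.mean(np.log(p))))
    return float(np.mean(p ** rho) ** (1.0 / rho))

# Toy self-reported probabilities of the correct class:
probs = [0.9, 0.8, 0.95, 0.6, 0.99]
geometric_accuracy = generalized_mean(probs, 0)       # rho = 0
decisiveness       = generalized_mean(probs, 1)       # rho = 1
robustness         = generalized_mean(probs, -2 / 3)  # rho = -2/3
```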
80. Learning stochastic object models from medical imaging measurements using Progressively-Growing AmbientGANs [PDF] 返回目录
Weimin Zhou, Sayantan Bhadra, Frank J. Brooks, Hua Li, Mark A. Anastasio
Abstract: It has been advocated that medical imaging systems and reconstruction algorithms should be assessed and optimized by use of objective measures of image quality that quantify the performance of an observer at specific diagnostic tasks. One important source of variability that can significantly limit observer performance is variation in the objects to-be-imaged. This source of variability can be described by stochastic object models (SOMs). A SOM is a generative model that can be employed to establish an ensemble of to-be-imaged objects with prescribed statistical properties. In order to accurately model variations in anatomical structures and object textures, it is desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system. Deep generative neural networks, such as generative adversarial networks (GANs) hold great potential for this task. However, conventional GANs are typically trained by use of reconstructed images that are influenced by the effects of measurement noise and the reconstruction process. To circumvent this, an AmbientGAN has been proposed that augments a GAN with a measurement operator. However, the original AmbientGAN could not immediately benefit from modern training procedures, such as progressive growing, which limited its ability to be applied to realistically sized medical image data. To circumvent this, in this work, a new Progressive Growing AmbientGAN (ProAmGAN) strategy is developed for establishing SOMs from medical imaging measurements. Stylized numerical studies corresponding to common medical imaging modalities are conducted to demonstrate and validate the proposed method for establishing SOMs.
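The key AmbientGAN move is that the discriminator never sees objects directly, only simulated measurements of the generator's outputs versus real measurements. A schematic training step, with a placeholder measurement operator H and Gaussian noise, is sketched below; it omits the progressive-growing strategy entirely.

```python
import torch

def ambient_gan_scores(G, D, z, real_measurements, H, noise_std=0.05):
    # The discriminator compares measurements of generated objects against
    # real measurements, so G learns the object (SOM) distribution rather
    # than the noisy-image distribution. H is a placeholder measurement
    # operator whose output shape matches real_measurements.
    fake_objects = G(z)
    fake_measurements = (H(fake_objects)
                         + noise_std * torch.randn_like(real_measurements))
    return D(real_measurements), D(fake_measurements)  # plug into GAN losses
```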
81. Glaucoma Detection From Raw Circumapillary OCT Images Using Fully Convolutional Neural Networks [PDF] 返回目录
Gabriel García, Rocío del Amor, Adrián Colomer, Valery Naranjo
Abstract: Nowadays, glaucoma is the leading cause of blindness worldwide. In this paper we propose two different deep-learning-based approaches to glaucoma detection directly from raw circumpapillary OCT images. The first is based on convolutional neural networks (CNNs) trained from scratch. The second lies in fine-tuning some of the most common state-of-the-art CNN architectures. The experiments were performed on a private database composed of 93 glaucomatous and 156 normal B-scans around the optic nerve head of the retina, diagnosed by expert ophthalmologists. The validation results show that fine-tuned CNNs outperform networks trained from scratch when small databases are addressed. Additionally, the VGG family of networks reports the most promising results, with an area under the ROC curve of 0.96 and an accuracy of 0.92 in the prediction of the independent test set.
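The fine-tuning route can be illustrated with torchvision's VGG16: freeze the convolutional backbone and retrain a new two-class head. This is a generic transfer-learning sketch, not the authors' exact configuration (which layers are unfrozen, how B-scans are fed in, etc.).

```python
import torch.nn as nn
from torchvision import models

# Freeze the ImageNet-pretrained convolutional backbone and retrain only a
# new two-class (glaucoma / normal) head; requires torchvision >= 0.13.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False
model.classifier[6] = nn.Linear(4096, 2)
```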
82. Image Restoration from Parametric Transformations using Generative Models [PDF] 返回目录
Kalliopi Basioti, George V. Moustakides
Abstract: When images are statistically described by a generative model, we can use this information to develop optimum techniques for various image restoration problems such as inpainting, super-resolution, image colorization, and generative model inversion. With the help of the generative model it is possible to formulate, in a natural way, these restoration problems as statistical estimation problems. Our approach, which combines maximum a-posteriori probability with maximum likelihood estimation, is capable of restoring images that are distorted by transformations even when the latter contain unknown parameters. This should be contrasted with the current state of the art, which requires exact knowledge of the transformations. We should also mention that our method does not contain any regularizer terms with unknown weights that need to be properly selected, as is common practice in all recent generative image restoration techniques. Finally, we extend our method to accommodate combinations of multiple images, where each image is described by its own generative model and the participating images are separated from a single combination.
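The estimation problem described can be written as a latent-space search: fit z (and the unknown transformation parameters) so that the transformed generator output matches the observation, with the Gaussian latent prior supplying the ||z||^2 term. The PyTorch sketch below assumes a callable generator G and distortion model T; both are placeholders, and the single scalar theta is an illustrative stand-in for the unknown parameters.

```python
import torch

def restore(G, T, y, z_dim=128, steps=500, lam=1e-2, lr=0.05):
    # Latent-space MAP search: minimize the data-fit term plus the
    # Gaussian-prior term lam * ||z||^2, jointly over z and theta.
    z = torch.zeros(1, z_dim, requires_grad=True)
    theta = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([z, theta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((T(G(z), theta) - y) ** 2).sum() + lam * (z ** 2).sum()
        loss.backward()
        opt.step()
    return G(z).detach()
```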