Contents
7. Ice Monitoring in Swiss Lakes from Optical Satellites and Webcams using Machine Learning
10. End-to-end trainable network for degraded license plate detection via vehicle-plate relation mining
11. A Simple and Efficient Registration of 3D Point Cloud and Image Data for Indoor Mobile Mapping System
15. A Multi-task Two-stream Spatiotemporal Convolutional Neural Network for Convective Storm Nowcasting
16. MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
19. Cross-directional Feature Fusion Network for Building Damage Assessment from Satellite Imagery
21. Developing Univariate Neurodegeneration Biomarkers with Low-Rank and Sparse Subspace Decomposition
26. Peak Detection On Data Independent Acquisition Mass Spectrometry Data With Semisupervised Convolutional Transformers
27. Application of sequential processing of computer vision methods for solving the problem of detecting the edges of a honeycomb block
31. Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa)
32. Deep Probabilistic Imaging: Uncertainty Quantification and Multi-modal Solution Characterization for Computational Imaging
35. CT Reconstruction with PDF: Parameter-Dependent Framework for Multiple Scanning Geometries and Dose Levels
39. Micro-CT Synthesis and Inner Ear Super Resolution via Bayesian Generative Adversarial Networks
41. Impact of Spherical Coordinates Transformation Pre-processing in Deep Convolution Neural Networks for Brain Tumor Segmentation and Survival Prediction
43. Improved Supervised Training of Physics-Guided Deep Learning Image Reconstruction with Multi-Masking
Abstracts
1. Robust Skeletonization for Plant Root Structure Reconstruction from MRI
Jannis Horn, Yi Zhao, Nils Wandel, Magdalena Landl, Andrea Schnepf, Sven Behnke
Abstract: Structural reconstruction of plant roots from MRI is challenging because of the low resolution and low signal-to-noise ratio of the 3D measurements, which may lead to disconnected and wrongly connected roots. We propose a two-stage approach for this task. The first stage is based on semantic root vs. soil segmentation and finds lowest-cost paths from any root voxel to the shoot. The second stage takes the largest fully connected component generated in the first stage and uses 3D skeletonization to extract a graph structure. We evaluate our method on 22 MRI scans and compare it to human expert reconstructions.
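As a rough illustration of the first stage, the sketch below runs Dijkstra's algorithm once from the shoot over a 26-connected voxel grid, which gives every reachable root voxel its lowest-cost path to the shoot. It is not the authors' implementation: the per-voxel root probabilities (assumed to come from the segmentation network), the cost of -log(p) for entering a voxel, and the `min_prob` cut-off are our assumptions, and the second stage (largest connected component plus 3D skeletonization) is not shown.

```python
import heapq
import math

# 26-connected neighbourhood in a 3D voxel grid.
NEIGHBOURS = [(dx, dy, dz)
              for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
              if (dx, dy, dz) != (0, 0, 0)]


def lowest_cost_paths(root_prob, shoot, min_prob=1e-6):
    """Dijkstra from the shoot voxel.

    root_prob: dict mapping (x, y, z) -> assumed root probability in (0, 1].
    Entering a voxel costs -log(p), so confident root voxels are cheap.
    Returns (dist, parent); following parent links leads back to the shoot.
    """
    dist = {shoot: 0.0}
    parent = {shoot: None}
    heap = [(0.0, shoot)]
    while heap:
        d, voxel = heapq.heappop(heap)
        if d > dist[voxel]:
            continue  # stale heap entry
        x, y, z = voxel
        for dx, dy, dz in NEIGHBOURS:
            nb = (x + dx, y + dy, z + dz)
            p = root_prob.get(nb)
            if p is None or p < min_prob:
                continue  # soil or unknown voxel
            nd = d - math.log(p)
            if nd < dist.get(nb, math.inf):
                dist[nb] = nd
                parent[nb] = voxel
                heapq.heappush(heap, (nd, nb))
    return dist, parent


def path_to_shoot(parent, voxel):
    """List of voxels from `voxel` back to the shoot."""
    path = []
    while voxel is not None:
        path.append(voxel)
        voxel = parent[voxel]
    return path
```

Voxels that never enter `dist` are unreachable from the shoot, and the union of all returned paths is connected through the shoot by construction.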
2. Structured Visual Search via Composition-aware Learning
Mert Kilickaya, Arnold W.M. Smeulders
Abstract: This paper studies visual search using structured queries. The structure takes the form of a 2D composition that encodes the position and the category of the objects. Transforming the position and the category of the objects leads to a continuous-valued relationship between visual compositions, which carries highly beneficial information that previous techniques have not leveraged. In this work, our goal is to leverage these continuous relationships by using the notion of symmetry in equivariance. Our model output is trained to change symmetrically with respect to the input transformations, leading to a sensitive feature space. This yields a highly efficient search technique, as our approach learns from less data using a smaller feature space. Experiments on two large-scale benchmarks, MS-COCO and HICO-DET, demonstrate that our approach achieves a considerable performance gain over competing techniques.
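The abstract does not specify the model, so the following is only a generic illustration of what equivariance means for a structured query: a 2D composition is rasterized into a grid of category ids, and a penalty measures how far a feature map `f` is from commuting with a translation. All names here are hypothetical; a trained model would be pushed to keep this gap small.

```python
def rasterize(objects, h, w):
    """Encode a 2D composition as an h x w grid of category ids (0 = background).
    objects: list of (category_id, row, col)."""
    grid = [[0] * w for _ in range(h)]
    for cat, r, c in objects:
        grid[r][c] = cat
    return grid


def shift(grid, dr, dc):
    """Translate a grid by (dr, dc), filling vacated cells with 0."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            rr, cc = r - dr, c - dc
            if 0 <= rr < h and 0 <= cc < w:
                out[r][c] = grid[rr][cc]
    return out


def equivariance_gap(f, grid, dr, dc):
    """Sum of squared differences between f(T x) and T f(x) for a translation T."""
    a = f(shift(grid, dr, dc))
    b = shift(f(grid), dr, dc)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

An element-wise map that sends 0 to 0 is exactly translation-equivariant here, while a map that depends on absolute row position is not.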
3. Improving Word Recognition using Multiple Hypotheses and Deep Embeddings
Siddhant Bansal, Praveen Krishnan, C.V. Jawahar
Abstract: We propose a novel scheme for improving word recognition accuracy using word image embeddings. We use a trained text recognizer, which can predict multiple text hypotheses for a given word image. Our fusion scheme improves the recognition process by utilizing the word image and text embeddings obtained from a trained word image embedding network. We propose EmbedNet, which is trained using a triplet loss to learn a suitable embedding space where the embedding of a word image lies closer to the embedding of its corresponding text transcription. The updated embedding space thus helps in choosing the correct prediction with higher confidence. To further improve accuracy, we propose a plug-and-play module called the Confidence-based Accuracy Booster (CAB). The CAB module takes the confidence scores obtained from the text recognizer and the Euclidean distances between the embeddings to generate an updated distance vector. The updated distance vector has lower distance values for correct words and higher distance values for incorrect words. We systematically evaluate our proposed method on a collection of books in the Hindi language. Our method achieves an absolute improvement of around 10 percent in word recognition accuracy.
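A minimal sketch of the two ingredients named above, under assumptions: `triplet_loss` is the standard triplet margin loss on Euclidean distances, and `rerank` is a hypothetical stand-in for CAB that shrinks each hypothesis' image-to-text distance by the recognizer's confidence. The exact CAB formula is not given in the abstract, and `alpha` is our own parameter.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull the word-image embedding (anchor) toward its transcription
    embedding (positive) and push it away from a wrong one (negative)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rerank(image_emb, hypotheses, alpha=0.5):
    """Hypothetical CAB-style fusion.

    hypotheses: list of (word, text_embedding, confidence in [0, 1]).
    Each embedding distance is scaled down by the recognizer confidence;
    the word with the smallest updated distance is returned.
    """
    updated = [(euclidean(image_emb, emb) * (1.0 - alpha * conf), word)
               for word, emb, conf in hypotheses]
    return min(updated)[1]
```

With zero confidence everywhere, `rerank` reduces to plain nearest-neighbour selection in the embedding space; a confident hypothesis can overtake a slightly closer but unconfident one.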
4. Pixel-based Facial Expression Synthesis
Arbish Akram, Nazar Khan
Abstract: Facial expression synthesis has achieved remarkable advances with the advent of Generative Adversarial Networks (GANs). However, GAN-based approaches mostly generate photo-realistic results only as long as the testing data distribution is close to the training data distribution. The quality of GAN results degrades significantly when testing images come from a slightly different distribution. Moreover, recent work has shown that facial expressions can be synthesized by changing localized face regions. In this work, we propose a pixel-based facial expression synthesis method in which each output pixel observes only one input pixel. The proposed method achieves good generalization capability while leveraging only a few hundred training images. Experimental results demonstrate that the proposed method performs comparably to state-of-the-art GANs on in-dataset images and significantly better on out-of-dataset images. In addition, the proposed model is two orders of magnitude smaller, which makes it suitable for deployment on resource-constrained devices.
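One simple way to realize "each output pixel observes only one input pixel" is an independent affine map per pixel, fitted by least squares on aligned neutral/expressive training pairs. The sketch below assumes that linear model and flat grayscale images; the paper's actual per-pixel mapping may differ.

```python
def fit_pixelwise(neutral, expressive):
    """Fit expressive[n][p] ~ a[p] * neutral[n][p] + b[p] independently per pixel.

    neutral, expressive: lists of N aligned images, each a flat list of P intensities.
    Returns a list of P (a, b) pairs from ordinary least squares.
    """
    n = len(neutral)
    params = []
    for p in range(len(neutral[0])):
        xs = [img[p] for img in neutral]
        ys = [img[p] for img in expressive]
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var if var > 0 else 0.0  # constant input: fall back to the mean output
        params.append((a, my - a * mx))
    return params


def synthesize(params, image):
    """Each output pixel depends only on the input pixel at the same location."""
    return [a * x + b for (a, b), x in zip(params, image)]
```

Because the model has only two numbers per pixel, it can be fitted from a few hundred images, which is consistent in spirit with the small model size the abstract reports.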
5. Learning to Infer Unseen Attribute-Object Compositions
Hui Chen, Zhixiong Nan, Jiang Jingjing, Nanning Zheng
Abstract: Recognizing unseen attribute-object compositions is critical for making machines learn to decompose and compose complex concepts as people do. Most existing methods are limited to recognizing single-attribute-object compositions and can hardly distinguish compositions with similar appearances. In this paper, we propose a graph-based model that can flexibly recognize both single- and multi-attribute-object compositions. The model maps the visual features of images and the attribute-object category labels, represented by word embedding vectors, into a latent space. Then, subject to the constraints of the attribute-object semantic association, distances are calculated between the visual features and the corresponding label semantic features in the latent space. During inference, the composition closest to the given image feature among all compositions is taken as the result. In addition, we build a large-scale Multi-Attribute Dataset (MAD) with 116,099 images and 8,030 composition categories. Experiments on MAD and two other single-attribute-object benchmark datasets demonstrate the effectiveness of our approach.
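The inference rule above (pick the composition nearest to the image feature) can be sketched as follows. We assume the image feature has already been projected into the latent space, and we use a hypothetical composition embedding, the object vector plus the sum of its attribute vectors; the paper's graph-based model and learned projections are not reproduced.

```python
import math


def compose(attr_vecs, obj_vec):
    """Hypothetical composition embedding: object vector plus all attribute vectors."""
    out = list(obj_vec)
    for vec in attr_vecs:
        out = [o + a for o, a in zip(out, vec)]
    return out


def infer(image_feat, compositions, words):
    """Return the (attributes, object) pair whose embedding is nearest to image_feat.

    compositions: list of (tuple_of_attribute_names, object_name).
    words: dict mapping a name to its (assumed) latent-space vector.
    """
    def distance(comp):
        attrs, obj = comp
        return math.dist(image_feat, compose([words[a] for a in attrs], words[obj]))

    return min(compositions, key=distance)
```

Single- and multi-attribute compositions are handled by the same rule, since a composition may carry any number of attribute names.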
6. SIRI: Spatial Relation Induced Network For Spatial Description Resolution
Peiyao Wang, Weixin Luo, Yanyu Xu, Haojie Li, Shugong Xu, Jianyu Yang, Shenghua Gao
Abstract: Spatial Description Resolution, a language-guided localization task, is proposed to locate a target in a panoramic street view given a corresponding language description. Explicitly characterizing object-level relationships while distilling spatial relationships is crucial to this task but currently absent. Mimicking humans, who sequentially traverse spatial relationship words and objects from a first-person view to locate their target, we propose a novel spatial relationship induced (SIRI) network. Specifically, visual features are first correlated at an implicit object level in a projected latent space; they are then distilled by each spatial relationship word, resulting in a differently activated feature for each spatial relationship. Further, we introduce global position priors to compensate for the absence of positional information, which may otherwise result in ambiguities in global positional reasoning. The linguistic and visual features are concatenated to finalize the target localization. Experimental results on the Touchdown dataset show that our method is around 24% better than the state-of-the-art method in terms of accuracy, measured within an 80-pixel radius. Our method also generalizes well to our proposed extended dataset, collected using the same settings as Touchdown.
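The reported accuracy counts a prediction as correct when it falls within an 80-pixel radius of the target. A minimal sketch of that metric, assuming predictions and targets are (x, y) pixel coordinates and treating a distance of exactly 80 pixels as a hit (our assumption):

```python
import math


def accuracy_within_radius(preds, targets, radius=80.0):
    """Fraction of predicted (x, y) locations within `radius` pixels of the target."""
    hits = sum(1 for p, t in zip(preds, targets) if math.dist(p, t) <= radius)
    return hits / len(targets)
```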
7. Ice Monitoring in Swiss Lakes from Optical Satellites and Webcams using Machine Learning
Manu Tom, Rajanie Prabha, Tianyu Wu, Emmanuel Baltsavias, Laura Leal-Taixe, Konrad Schindler
Abstract: Continuous observation of climate indicators, such as trends in lake freezing, is important for understanding the dynamics of the local and global climate system. Consequently, lake ice has been included among the Essential Climate Variables (ECVs) of the Global Climate Observing System (GCOS), and there is a need to set up operational monitoring capabilities. Multi-temporal satellite images and publicly available webcam streams are among the viable data sources for monitoring lake ice. In this work we investigate machine learning-based image analysis as a tool to determine the spatio-temporal extent of ice on Swiss Alpine lakes, as well as the ice-on and ice-off dates, from both multispectral optical satellite images (VIIRS and MODIS) and RGB webcam images. We model lake ice monitoring as a pixel-wise semantic segmentation problem, i.e., each pixel on the lake surface is classified to obtain a spatially explicit map of ice cover. We show experimentally that the proposed system produces consistently good results when tested on data from multiple winters and lakes. Our satellite-based method obtains mean Intersection-over-Union (mIoU) scores >93% for both sensors. It also generalises well across lakes and winters, with mIoU scores >78% and >80% respectively. On average, our webcam approach achieves mIoU values of approx. 87% and generalisation scores of approx. 71% and 69% across different cameras and winters respectively. Additionally, we put forward a new benchmark dataset of webcam images (Photi-LakeIce), which includes data from two winters and three cameras.
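Since lake ice monitoring is framed as per-pixel semantic segmentation, results are reported as mean Intersection-over-Union. A minimal sketch for flat label lists (assumed 0 = water, 1 = ice); pixels from many images can be concatenated before calling it, and classes absent from both lists are skipped, which is one common convention:

```python
def mean_iou(pred, truth, classes=(0, 1)):
    """Mean Intersection-over-Union over the given classes for flat label lists."""
    ious = []
    for c in classes:
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # skip classes that appear in neither prediction nor ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)
```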
8. Fast Local Attack: Generating Local Adversarial Examples for Object Detectors
Quanyu Liao, Xin Wang, Bin Kong, Siwei Lyu, Youbing Yin, Qi Song, Xi Wu
Abstract: Deep neural networks are vulnerable to adversarial examples: adding imperceptible adversarial perturbations to images is enough to make them fail. Most existing research focuses on attacking image classifiers or anchor-based object detectors, but these methods generate global perturbations over the whole image, which is unnecessary. In our work, we leverage higher-level semantic information to generate highly aggressive local perturbations for anchor-free object detectors. As a result, our method is less computationally intensive and achieves higher black-box and transfer attack performance. The adversarial examples generated by our method are not only capable of attacking anchor-free object detectors but can also be transferred to attack anchor-based object detectors.
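The abstract does not give the attack's update rule. As a swapped-in illustration of a local perturbation, the sketch below applies one fast-gradient-sign (FGSM) step only inside a binary mask, moving against the gradient of a detector's confidence score. The gradient and mask are assumed inputs (in a black-box setting the gradient would come from a surrogate model); the paper's use of higher-level semantic information is not modelled.

```python
def sign(v):
    return (v > 0) - (v < 0)


def local_fgsm(x, grad, mask, eps=8 / 255):
    """One masked FGSM step on a flat image with values in [0, 1].

    x:    pixel values
    grad: gradient of the detector's confidence w.r.t. each pixel (assumed given)
    mask: 1 where perturbation is allowed (the local region), 0 elsewhere
    Pixels outside the mask are left unchanged; results are clipped to [0, 1].
    """
    return [min(1.0, max(0.0, xi - eps * sign(g) * m))
            for xi, g, m in zip(x, grad, mask)]
```

Restricting the update to the mask is what keeps the perturbation local and cheap; iterating the step and projecting back to an eps-ball would give a PGD-style variant.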
9. A Method of Generating Measurable Panoramic Image for Indoor Mobile Measurement System
Hao Ma, Jingbin Liu, Zhirong Hu, Hongyu Qiu, Dong Xu, Zemin Wang, Xiaodong Gong, Sheng Yang
Abstract: This paper designs a technical pipeline to generate high-quality panoramic images with depth information, which involves two critical research hotspots: fusion of LiDAR and image data, and image stitching. For the fusion of 3D points and image data, a sparse depth map can first be generated by projecting LiDAR points onto the RGB image plane using our reliably calibrated and synchronized sensors; we then adopt a parameter self-adaptive framework to produce a dense 2D depth map. For image stitching, an optimal seamline in the overlapping area is searched using a graph-cuts-based method to alleviate geometric effects, and multi-band pyramid blending is used to eliminate photometric effects near the stitching line. Since each pixel is associated with a depth value, we use this depth value as the radius in the spherical projection, which further projects the panoramic image into world coordinates and consequently produces a high-quality measurable panoramic image. The proposed method is tested on data from our data collection platform and shows satisfactory application prospects.
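Using depth as the sphere radius can be illustrated with an equirectangular panorama: each pixel's column and row give a longitude and latitude, and its depth scales the unit direction into a 3D point, which a rigid transform then places in world coordinates. The axis conventions, the pixel-to-angle mapping (no half-pixel offset) and the pose (R, t) are our assumptions, not taken from the paper.

```python
import math


def pixel_to_point(u, v, depth, width, height):
    """Equirectangular pixel (u, v) with depth (used as the radius) -> 3D point
    in the panorama's own frame."""
    lon = (u / width) * 2.0 * math.pi - math.pi   # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (v / height) * math.pi  # latitude from +pi/2 (top) to -pi/2
    x = depth * math.cos(lat) * math.cos(lon)
    y = depth * math.cos(lat) * math.sin(lon)
    z = depth * math.sin(lat)
    return x, y, z


def to_world(point, R, t):
    """Apply a rigid transform (3x3 rotation R as nested lists, translation t)."""
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
```

The centre pixel of the panorama maps straight ahead along +x, and a quarter turn to the right maps to +y, so distances measured between back-projected pixels are metric as long as the depth values are.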
10. End-to-end trainable network for degraded license plate detection via vehicle-plate relation mining
Song-Lu Chen, Shu Tian, Jia-Wei Ma, Qi Liu, Chun Yang, Feng Chen, Xu-Cheng Yin
Abstract: License plate detection is the first and essential step of a license plate recognition system and is still challenging in real applications, such as on-road scenarios. In particular, small and oblique license plates, mainly caused by distant and mobile cameras, are difficult to detect. In this work, we propose a novel and applicable method for degraded license plate detection via vehicle-plate relation mining, which localizes the license plate in a coarse-to-fine scheme. First, we propose to estimate the local region around the license plate by using the relationship between the vehicle and the license plate, which can greatly reduce the search area and precisely detect very small license plates. Second, we propose to predict a quadrilateral bounding box in the local region by regressing the four corners of the license plate, to robustly detect oblique license plates. Moreover, the whole network can be trained end to end. Extensive experiments verify the effectiveness of our proposed method for small and oblique license plates. Code is available at this https URL.
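The coarse-to-fine idea can be sketched as two decoding steps. Both functions are hypothetical: the paper learns the vehicle-plate relationship, whereas `plate_search_region` uses a fixed heuristic (the lower part of the vehicle box), and `decode_corners` assumes the network outputs four corner offsets normalized to the search region.

```python
def plate_search_region(vehicle_box, frac=0.5):
    """Hypothetical coarse step: search the lower `frac` of a vehicle box.
    Boxes are (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = vehicle_box
    return (x1, y2 - frac * (y2 - y1), x2, y2)


def decode_corners(region, offsets):
    """Hypothetical fine step: four (dx, dy) offsets in [0, 1], ordered
    top-left, top-right, bottom-right, bottom-left, relative to `region`.
    Returns absolute corner coordinates of the (possibly oblique) plate."""
    x1, y1, x2, y2 = region
    w, h = x2 - x1, y2 - y1
    return [(x1 + dx * w, y1 + dy * h) for dx, dy in offsets]
```

Regressing four independent corners rather than an axis-aligned box is what lets the output follow an oblique plate.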
摘要:车牌检测是车牌识别系统的第一和必不可少的步骤,并在实际应用中,如在道路上的情况仍然充满挑战。特别地,小尺寸和倾斜车牌,主要是由远处的和移动相机,难以检测。在这项工作中,我们提出了通过车辆板关系挖掘,其中局部化的车牌在粗到细的方案降低车牌检测一种新颖的和应用的方法。首先,我们建议通过将车辆和车牌之间的关系,这样可以大大减少搜索区域,精确地探测到极小的尺寸车牌估计各地车牌的局部区域。第二,我们提出通过回归车牌的四个角稳健地检测倾斜车牌来预测在局部区域中的四边形边界框。此外,整个网络可以在端至端的方式来训练。广泛的实验验证我们提出的方法的对小尺寸和倾斜车牌的有效性。代码可在此HTTPS URL。
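To make the coarse-to-fine idea concrete, here is a minimal sketch of the fine stage. It assumes that the local region is an axis-aligned box `(x, y, w, h)` derived from the vehicle detection and that the four corners are regressed as offsets normalized to [0, 1] inside that box. Both the box format and the corner ordering are hypothetical, not the authors' parameterization.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def corners_to_image(region: Tuple[float, float, float, float],
                     offsets: List[Point]) -> List[Point]:
    """Map four normalized corner offsets inside a local region to image coordinates.

    region:  hypothetical (x, y, w, h) box around the plate, e.g. estimated
             from the vehicle box via vehicle-plate relations.
    offsets: four (u, v) pairs in [0, 1], ordered top-left, top-right,
             bottom-right, bottom-left (an assumed convention).
    """
    x, y, w, h = region
    if len(offsets) != 4:
        raise ValueError("expected exactly four corners")
    return [(x + u * w, y + v * h) for u, v in offsets]

def quad_area(quad: List[Point]) -> float:
    """Shoelace formula; a positive area means the quadrilateral is non-degenerate."""
    s = 0.0
    for i in range(len(quad)):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % len(quad)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# An oblique plate inside a 100x40 local region whose top-left corner is (200, 300).
quad = corners_to_image((200, 300, 100, 40),
                        [(0.1, 0.2), (0.9, 0.1), (0.9, 0.8), (0.1, 0.9)])
```

A quadrilateral rather than an axis-aligned box is what lets tilted plates be enclosed tightly; restricting the search to the local region is what makes very small plates tractable.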
11. A Simple and Efficient Registration of 3D Point Cloud and Image Data for Indoor Mobile Mapping System [PDF]
Hao Ma, Jingbin Liu, Keke Liu, Hongyu Qiu, Dong Xu, Zemin Wang, Xiaodong Gong, Sheng Yang
Abstract: Registration of 3D LiDAR point clouds with optical images is critical when combining multi-source data. The initial pose data between LiDAR point clouds and optical images contain geometric misalignment. To improve the accuracy of the initial pose and the applicability of integrating 3D points with image data, we develop a simple but efficient registration method. We first extract point features from LiDAR point clouds and images: point features are extracted from single-frame LiDAR data, and image point features are extracted with the classical Canny method. A cost map is then built from Canny image edge detection. The optimization direction is guided by the cost map, where low cost represents the desired direction, and a loss function is also considered to improve the robustness of the proposed method. Experiments show satisfactory results.
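The cost-map idea can be sketched as follows, assuming that the cost of a pixel is its distance to the nearest image edge pixel (a chamfer-style distance transform). Under that assumption, projected LiDAR edge points that land on image edges score low. In practice the binary edge map would come from a Canny detector (for example OpenCV's `cv2.Canny`); here it is given directly, and the city-block distance and mean-cost score are illustrative choices, not necessarily the paper's.

```python
from typing import List, Tuple

INF = float("inf")

def chamfer_cost_map(edges: List[List[int]]) -> List[List[float]]:
    """City-block distance from each pixel to the nearest edge pixel (edge == 1)."""
    h, w = len(edges), len(edges[0])
    d = [[0.0 if edges[r][c] else INF for c in range(w)] for r in range(h)]
    # Forward pass: propagate distances from the top-left neighbours.
    for r in range(h):
        for c in range(w):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom-right neighbours.
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < w - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d

def alignment_cost(cost: List[List[float]], points: List[Tuple[int, int]]) -> float:
    """Mean cost of projected LiDAR edge points given as (row, col); lower is better."""
    return sum(cost[r][c] for r, c in points) / len(points)

# Toy 5x5 edge map with one vertical image edge in column 2.
edges = [[0, 0, 1, 0, 0] for _ in range(5)]
cost = chamfer_cost_map(edges)
aligned = alignment_cost(cost, [(0, 2), (2, 2), (4, 2)])  # 0.0
shifted = alignment_cost(cost, [(0, 3), (2, 3), (4, 3)])  # 1.0
```

A pose search would then prefer pose updates that move the projected points toward lower cost, which matches the description of the cost map guiding the optimization direction.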
12. Reconstruction of Voxels with Position- and Angle-Dependent Weightings [PDF]
Lina Felsner, Tobias Würfl, Christopher Syben, Philipp Roser, Alexander Preuhs, Andreas Maier, Christian Riess
Abstract: The reconstruction problem of voxels with individual weightings can be modeled as a position- and angle-dependent function in the forward projection. This changes the system matrix and prevents the use of standard filtered backprojection. In this work, we first formulate this reconstruction problem in terms of a system matrix and a weighting part. We compute the pseudoinverse and show that the solution is rank-deficient and hence severely ill-posed. This is a fundamental limitation for reconstruction. We then derive an iterative solution and experimentally show its superiority to any closed-form solution.
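A toy illustration of why rank deficiency matters and how a generic iterative solver behaves. This uses Landweber iteration, $x_{k+1} = x_k + \tau A^\top (b - A x_k)$, as a stand-in. It is not the authors' derived scheme, and the 2x2 matrix is hypothetical. Started from zero with $0 < \tau < 2/\sigma_{\max}(A)^2$, Landweber converges to the minimum-norm least-squares solution $A^+ b$, even though infinitely many vectors fit the data exactly.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def landweber(a: Matrix, b: Vector, tau: float, iters: int) -> Vector:
    """Landweber iteration x <- x + tau * A^T (b - A x), started at x = 0.

    For 0 < tau < 2 / sigma_max(A)^2 the iterates converge to the
    minimum-norm least-squares solution, i.e. the pseudoinverse solution A^+ b.
    """
    at = transpose(a)
    x = [0.0] * len(a[0])
    for _ in range(iters):
        residual = [bi - axi for bi, axi in zip(b, matvec(a, x))]
        step = matvec(at, residual)
        x = [xi + tau * si for xi, si in zip(x, step)]
    return x

# Hypothetical rank-1 system: every x with x[0] + x[1] == 2 fits the data exactly.
A = [[1.0, 1.0], [1.0, 1.0]]
b = [2.0, 2.0]
x = landweber(A, b, tau=0.2, iters=100)  # approaches the minimum-norm solution [1, 1]
```

Here $\sigma_{\max}(A)^2 = 4$, so $\tau = 0.2$ is inside the convergence range. The non-uniqueness (for example, `[1.5, 0.5]` also fits exactly) is the ill-posedness the abstract refers to.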
13. Mining Generalized Features for Detecting AI-Manipulated Fake Faces [PDF]
Yang Yu, Rongrong Ni, Yao Zhao
Abstract: Recently, AI-based face manipulation techniques have developed rapidly and continuously, raising new security issues in society. Although existing detection methods consider different categories of fake faces, their performance on fake faces produced by "unseen" manipulation techniques is still poor because of the distribution bias across manipulation techniques. To solve this problem, we propose a novel framework that focuses on mining intrinsic features and eliminating the distribution bias to improve generalization. First, we mine intrinsic clues in the channel difference image (CDI) and the spectrum image (SI), which stem from the camera imaging process and from an indispensable step of the AI manipulation process. Then, we introduce Octave Convolution (OctConv) and an attention-based fusion module to mine intrinsic features from CDI and SI effectively and adaptively. Finally, we design an alignment module that eliminates the bias of manipulation techniques, yielding a more generalized detection framework. We evaluate the proposed framework on four categories of fake-face datasets generated by the most popular, state-of-the-art manipulation techniques, and achieve very competitive performance. To further verify the generalization ability of the proposed framework, we conduct cross-manipulation experiments, and the results show the advantages of our method.
14. Co-attentional Transformers for Story-Based Video Understanding [PDF]
Björn Bebensee, Byoung-Tak Zhang
Abstract: Inspired by recent trends in vision and language learning, we explore attention mechanisms for visio-lingual fusion, applied to story-based video understanding. Like other video-based QA tasks, video story understanding requires agents to grasp complex temporal dependencies. However, because it focuses on the narrative aspect of video, it also requires understanding the interactions between different characters, as well as their actions and motivations. We propose a novel co-attentional transformer model to better capture the long-term dependencies found in visual stories such as dramas, and we measure its performance on the video question answering task. We evaluate our approach on the recently introduced DramaQA dataset, which features character-centered video story understanding questions. Our model outperforms the baseline model by 8 percentage points overall and by 4.95 to 12.8 percentage points across all difficulty levels, and it beats the winner of the DramaQA challenge.
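Co-attention can be sketched as two cross-attention passes. Video tokens query the text stream, and text tokens query the video stream, each using scaled dot-product attention. This minimal sketch omits the learned query/key/value projections, multiple heads and feed-forward layers of a real co-attentional transformer. The feature vectors are toy values and do not reflect the authors' architecture.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention of each query over `keys`/`values`."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def co_attention(video: Matrix, text: Matrix) -> Tuple[Matrix, Matrix]:
    """Each stream attends to the other; learned projections are omitted here."""
    video_ctx = cross_attention(video, text, text)   # video tokens query the text
    text_ctx = cross_attention(text, video, video)   # text tokens query the video
    return video_ctx, text_ctx

video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy frame features
text = [[0.5, 0.5], [1.0, -1.0]]              # 2 toy token features
v_ctx, t_ctx = co_attention(video, text)
```

Each output row keeps the length of its own stream (3 video rows, 2 text rows) but is built from the other modality, which is what allows dependencies to be exchanged between the two streams.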
15. A Multi-task Two-stream Spatiotemporal Convolutional Neural Network for Convective Storm Nowcasting [PDF]
W. Zhang, H. Liu, P. Li, L. Han
Abstract: The goal of convective storm nowcasting is the local prediction of severe and imminent convective storms. Here, we consider the convective storm nowcasting problem from the perspective of machine learning. First, we use a pixel-wise sampling method to construct spatiotemporal features for nowcasting, and we flexibly adjust the proportions of positive and negative samples in the training set to mitigate class imbalance. Second, we employ a concise two-stream convolutional neural network to extract spatial and temporal cues for nowcasting. This simplifies the network structure, reduces training time, and improves classification accuracy. The two-stream network uses both radar and satellite data. In the resulting fused two-stream convolutional neural network, some of the parameters are fed into a single-stream convolutional neural network, yet the network can still learn features from multiple data sources. Further, considering the relevance of classification and regression tasks, we develop a multi-task learning strategy that predicts the labels used in both tasks. We integrate two-stream multi-task learning into a single convolutional neural network. Given its compact architecture, this network is more efficient and easier to optimize than existing recurrent neural networks.
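Adjusting the positive/negative proportions of pixel-wise samples can be sketched as random undersampling of the majority class. This example keeps every storm (positive) pixel and draws at most a fixed number of background (negative) pixels per positive. The target ratio and the choice of plain undersampling are assumptions for illustration; the abstract does not state which ratio or strategy the authors use.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label: 1 = storm, 0 = none)

def rebalance(samples: List[Sample], neg_per_pos: float, seed: int = 0) -> List[Sample]:
    """Keep every positive sample and at most `neg_per_pos` negatives per positive."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    n_neg = min(len(neg), int(round(neg_per_pos * len(pos))))
    kept = pos + rng.sample(neg, n_neg)  # sample negatives without replacement
    rng.shuffle(kept)
    return kept

# 10 storm pixels vs 990 background pixels -> 10 positives and 30 negatives.
data = [([float(i)], 1) for i in range(10)] + [([float(i)], 0) for i in range(990)]
balanced = rebalance(data, neg_per_pos=3.0)
```

Making `neg_per_pos` a parameter mirrors the "flexibly adjust" wording: the ratio can be tuned as a hyperparameter against validation skill.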
16. MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering [PDF]
Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
Abstract: We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring both individual and combined processing of multiple input modalities. Our approach benefits from processing multimodal data (video and text) with individual BERT encodings and from fusing them with a novel transformer-based fusion method. Our method decomposes the different modality sources into different BERT instances with similar architectures but different weights. This achieves state-of-the-art (SOTA) results on the TVQA dataset. Additionally, we provide TVQA-Visual, an isolated diagnostic subset of TVQA that strictly requires knowledge of the visual (V) modality, based on a human annotator's judgment. This set of questions helps us study the model's behavior and the challenges TVQA poses that prevent superhuman performance. Extensive experiments show the effectiveness and superiority of our method.
17. $P^2$ Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation [PDF]
Luanxuan Hou, Jie Cao, Yuan Zhao, Haifeng Shen, Jian Tang, Ran He
Abstract: We propose an augmented Parallel-Pyramid Net ($P^2$ Net) with feature refinement by a dilated bottleneck and an attention module. For data preprocessing, we propose a differentiable auto data augmentation ($DA^2$) method. We formulate the search for a data augmentation policy in a differentiable form, so that the optimal policy setting can easily be updated by backpropagation during training. $DA^2$ improves training efficiency. A parallel-pyramid structure follows to compensate for the information loss introduced by the network. We design two new fusion structures, Parallel Fusion and Progressive Fusion, to process pyramid features from the backbone network. Both fusion structures effectively leverage the rich spatial information available at high resolution and the semantic comprehension available at low resolution. We propose a refinement stage for the pyramid features to further boost the accuracy of our network. By introducing a dilated bottleneck and an attention module, we enlarge the receptive field of the features with limited complexity and tune the importance of different feature channels. To further refine the feature maps after the feature extraction stage, an Attention Module ($AM$) is defined to extract weighted features from the different-scale feature maps generated by the parallel-pyramid structure. Compared with traditional up-sampling refinement, $AM$ better captures the relationship between channels. Experiments corroborate the effectiveness of the proposed method. Notably, our method achieves the best performance on the challenging MSCOCO and MPII datasets.
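"Tuning the importance of different feature channels" is commonly done with squeeze-and-excitation (SE) style gating, sketched below. The sketch pools each channel to one number, mixes the pooled values across channels, squashes the result with a sigmoid, and rescales each channel. This is an assumed stand-in, not the paper's $AM$. The single `weights`/`bias` layer is a hypothetical simplification of SE's two-layer bottleneck, and the parameters would normally be learned.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap: FeatureMap, weights: List[List[float]],
                      bias: List[float]) -> FeatureMap:
    """SE-style gating: pool each channel, mix pooled values, rescale channels.

    `weights` (C x C) and `bias` (C) stand in for learned parameters; real
    SE blocks use two fully connected layers with a bottleneck instead of one.
    """
    # Squeeze: global average pool per channel.
    pooled = [sum(sum(row) for row in ch) / sum(len(row) for row in ch) for ch in fmap]
    # Excite: each gate may depend on every channel's pooled value.
    gates = [sigmoid(sum(w * p for w, p in zip(w_row, pooled)) + b)
             for w_row, b in zip(weights, bias)]
    return [[[g * v for v in row] for row in ch] for ch, g in zip(fmap, gates)]

fmap = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
        [[0.0, 2.0], [2.0, 0.0]]]   # channel 1
refined = channel_attention(fmap, weights=[[0.0, 0.0], [0.0, 0.0]], bias=[0.0, 100.0])
# Channel 0 is halved (gate 0.5); channel 1 passes through almost unchanged.
```

Because each gate can read every channel's pooled value, one channel's activity can suppress or amplify another. That cross-channel coupling is what a per-pixel up-sampling refinement lacks.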
18. Synthetic Training for Monocular Human Mesh Recovery [PDF]
Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei
Abstract: Recovering a 3D human mesh from monocular images is a popular topic in computer vision with a wide range of applications. This paper aims to estimate, from a single RGB image, the 3D mesh of multiple body parts (e.g., body and hands) that differ greatly in scale. Existing methods are mostly based on iterative optimization, which is very time-consuming. We propose to train a single-shot model to achieve this goal. The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images. To solve this problem, we design a multi-branch framework that disentangles the regression of different body properties, which lets us train each component separately in a synthetic training manner using the available unpaired data. Besides, to strengthen generalization, most existing methods use in-the-wild 2D pose datasets to supervise the estimated 3D pose via 3D-to-2D projection. However, we observe that the commonly used weak-perspective model performs poorly in handling the external foreshortening effect of camera projection. Therefore, we propose a depth-to-scale (D2S) projection that incorporates the depth difference into the projection function to derive per-joint scale variants for more appropriate supervision. According to the evaluation results, the proposed method outperforms previous methods on the CMU Panoptic Studio dataset and achieves comparable results on the Human3.6M body and STB hand benchmarks. More impressively, performance on close-shot images improves significantly when the proposed D2S projection is used for weak supervision, while the method retains a clear advantage in computational efficiency.
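The difference between the two projection models can be sketched as follows. Weak perspective applies one shared scale $s = f / Z_{\text{root}}$ to every joint. The D2S idea, as described in the abstract, folds each joint's depth offset into the scale. A natural reading, assumed here, is $s_j = f / (Z_{\text{root}} + \Delta z_j)$. The focal length, the root depth and this exact formula are illustrative assumptions, not the paper's definition.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (X, Y, dz) with dz = depth offset from the root joint

def weak_perspective(joints: List[Joint], f: float, z_root: float) -> List[Tuple[float, float]]:
    """One shared scale for all joints; ignores per-joint depth differences."""
    s = f / z_root
    return [(s * x, s * y) for x, y, _ in joints]

def depth_to_scale(joints: List[Joint], f: float, z_root: float) -> List[Tuple[float, float]]:
    """Per-joint scale f / (z_root + dz): joints nearer the camera (dz < 0) appear larger."""
    out = []
    for x, y, dz in joints:
        s = f / (z_root + dz)
        out.append((s * x, s * y))
    return out

# Hypothetical close shot: a hand held 0.5 m toward a camera that is 1 m from the root joint.
joints = [(0.0, 0.0, 0.0), (0.2, 0.1, -0.5)]
wp = weak_perspective(joints, f=1000.0, z_root=1.0)   # hand projects to about (200, 100)
d2s = depth_to_scale(joints, f=1000.0, z_root=1.0)    # hand projects to about (400, 200)
```

With all depth offsets at zero the two models coincide. In close shots they can disagree by a factor of two, which is why a weak-perspective loss gives misleading 2D supervision there.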
19. Cross-directional Feature Fusion Network for Building Damage Assessment from Satellite Imagery [PDF]
Yu Shen, Sijie Zhu, Taojiannan Yang, Chen Chen
Abstract: Fast and effective responses are required when a natural disaster (e.g., an earthquake or a hurricane) strikes. Assessing building damage from satellite imagery is a critical prerequisite for an effective response. High-resolution satellite images provide rich information, with pre- and post-disaster scenes available for analysis. However, most existing works simply use pre- and post-disaster images as input without considering their correlations. In this paper, we propose a novel cross-directional fusion strategy to better explore the correlations between pre- and post-disaster images. Moreover, the data augmentation method CutMix is exploited to tackle the challenge of hard classes. The proposed method achieves state-of-the-art performance on a large-scale building damage assessment dataset, xBD.
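A minimal sketch of standard CutMix, adapted to aligned image pairs. The mixing weight $\lambda$ is drawn from $\mathrm{Beta}(\alpha, \alpha)$, and a box covering roughly $1 - \lambda$ of the image is cut from sample B and pasted into sample A. Applying the same box to the pre-disaster image, the post-disaster image and the damage mask is an assumption made here so that the pair and its labels stay aligned; the abstract does not say how the authors apply CutMix.

```python
import math
import random
from typing import List, Tuple

Grid = List[List[int]]

def cutmix_box(h: int, w: int, lam: float, rng: random.Random) -> Tuple[int, int, int, int]:
    """Random box covering roughly (1 - lam) of the image, clipped to its borders."""
    cut_h = int(h * math.sqrt(1.0 - lam))
    cut_w = int(w * math.sqrt(1.0 - lam))
    cy, cx = rng.randrange(h), rng.randrange(w)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    return top, bottom, left, right

def cutmix(a: List[Grid], b: List[Grid], alpha: float = 1.0, seed: int = 0):
    """Paste the same box from each grid in `b` into the matching grid in `a`.

    `a` and `b` are lists of aligned grids, e.g. [pre_image, post_image, damage_mask].
    Returns the mixed grids and the actual fraction of pixels still taken from `a`.
    """
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)
    h, w = len(a[0]), len(a[0][0])
    top, bottom, left, right = cutmix_box(h, w, lam, rng)
    mixed = []
    for ga, gb in zip(a, b):
        g = [row[:] for row in ga]  # copy so the inputs are not modified
        for r in range(top, bottom):
            g[r][left:right] = gb[r][left:right]
        mixed.append(g)
    lam_adjusted = 1.0 - (bottom - top) * (right - left) / (h * w)
    return mixed, lam_adjusted

# Toy 8x8 grids: sample A is all zeros, sample B is all ones.
a = [[[0] * 8 for _ in range(8)] for _ in range(3)]  # [pre, post, mask]
b = [[[1] * 8 for _ in range(8)] for _ in range(3)]
mixed, lam = cutmix(a, b, seed=3)
```

Recomputing $\lambda$ from the clipped box keeps any label mixing consistent with the pixels actually pasted. For pixel-wise masks, the pasted mask region already carries the correct labels.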
20. Decentralized Attribution of Generative Models [PDF]
Changhoon Kim, Yi Ren, Yezhou Yang
Abstract: There are growing concerns regarding the fabrication of content through generative models. This paper investigates the feasibility of decentralized attribution of such models. Given a set of generative models learned from the same dataset, attributability is achieved when a public verification service exists that correctly identifies the source model of generated content. Attribution allows machine-generated content to be traced back to its source model, thus facilitating IP protection and content regulation. Existing attribution methods do not scale with the number of models and lack theoretical bounds on attributability. This paper studies decentralized attribution, where provable attributability can be achieved by only requiring each model to be distinguishable from the authentic data. Our major contributions are the derivation of sufficient conditions for decentralized attribution and the design of keys that satisfy these conditions. Specifically, we show that decentralized attribution can be achieved when keys are (1) orthogonal to each other and (2) belong to a subspace determined by the data distribution. This result is validated on MNIST and CelebA. Lastly, we use these datasets to examine the trade-off between generation quality and robust attributability against adversarial post-processing.
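The following is a toy illustration of the two sufficient conditions. It is not the authors' code, and every name in it is hypothetical. It assumes linear keys and authentic data confined to a fixed subspace. The keys are made orthonormal (condition 1) and placed in the complement of the data subspace (condition 2). A keyed generator shifts samples along its own key, and a public verifier attributes content to the only key with a positive response.

```python
import numpy as np

# Toy, linear illustration only (hypothetical names; not the authors' code).
rng = np.random.default_rng(1)
DIM, N_MODELS = 8, 3

def sample_authentic():
    """Hypothetical authentic data: varies only in the first 4 coordinates."""
    x = np.zeros(DIM)
    x[:4] = rng.standard_normal(4)
    return x

# Condition (2): keys lie in the complementary subspace (last 4 coordinates).
# Condition (1): keys are mutually orthonormal (QR of a random matrix).
q, _ = np.linalg.qr(rng.standard_normal((4, N_MODELS)))
keys = np.zeros((N_MODELS, DIM))
keys[:, 4:] = q.T

def generate(model_idx, strength=1.0):
    """Hypothetical keyed generator: authentic-like sample shifted along its own key."""
    return sample_authentic() + strength * keys[model_idx]

def attribute(x, keys, margin=0.5):
    """Public verification: index of the key with a positive response, else None."""
    scores = keys @ x
    best = int(np.argmax(scores))
    return best if scores[best] > margin else None

print([attribute(generate(i), keys) for i in range(N_MODELS)])  # [0, 1, 2]
print(attribute(sample_authentic(), keys))                      # None
```

Because the keys are orthogonal, shifting along one key leaves the responses of the other keys unchanged. Because the keys avoid the data subspace, authentic samples get a zero response.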
21. Developing Univariate Neurodegeneration Biomarkers with Low-Rank and Sparse Subspace Decomposition [PDF]
Gang Wang, Qunxi Dong, Jianfeng Wu, Yi Su, Kewei Chen, Qingtang Su, Xiaofeng Zhang, Jinguang Hao, Tao Yao, Li Liu, Caiming Zhang, Richard J Caselli, Eric M Reiman, Yalin Wang
Abstract: Cognitive decline due to Alzheimer's disease (AD) is closely associated with brain structure alterations captured by structural magnetic resonance imaging (sMRI), which supports developing sMRI-based univariate neurodegeneration biomarkers (UNB). However, existing UNB work either fails to model large group variances or does not capture changes induced by AD dementia (ADD). We propose a novel low-rank and sparse subspace decomposition method capable of stably quantifying the morphological changes induced by ADD. Specifically, we propose a numerically efficient rank minimization mechanism to extract the group common structure, and impose regularization constraints to encode the original 3D morphometry connectivity. Further, we generate regions of interest (ROIs) through a group difference study between the common subspaces of the $A\beta+$ AD and $A\beta-$ cognitively unimpaired (CU) groups. A univariate morphometry index (UMI) is constructed from these ROIs by summarizing individual morphological characteristics weighted by the normalized difference between the $A\beta+$ AD and $A\beta-$ CU groups. We use the hippocampal surface radial distance feature to compute the UMIs and validate our work in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. With hippocampal UMIs, the estimated minimum sample sizes needed to detect a 25% reduction in the mean annual change with 80% power and two-tailed $P=0.05$ are 116, 279 and 387 for the longitudinal $A\beta+$ AD, $A\beta+$ mild cognitive impairment (MCI) and $A\beta+$ CU groups, respectively. Additionally, for MCI patients, UMIs correlate well with the hazard ratio of conversion to AD within 18 months (4.3, 95% CI = 2.3-8.2). In our experiments, UMIs outperform traditional hippocampal volume measures, which suggests UMI as a potential UNB.
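The sample-size figures above come from a power analysis. Below is a minimal sketch that assumes the two-arm normal-approximation formula commonly used in ADNI biomarker studies; the paper's exact computation may differ. Here `mean_change` and `sd_change` are the mean and standard deviation of the biomarker's annual change. The inputs are hypothetical and do not reproduce the reported 116/279/387.

```python
import math
from statistics import NormalDist

def min_sample_size(mean_change, sd_change, reduction=0.25, power=0.80, alpha=0.05):
    """Per-arm sample size to detect a `reduction` of the mean annual change.

    Assumes the two-arm normal-approximation formula
        n = 2 * sd^2 * (z_{1-alpha/2} + z_{power})^2 / (reduction * mean)^2
    with a two-tailed test at level alpha.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    effect = reduction * mean_change
    return math.ceil(2 * sd_change**2 * (z_alpha + z_power) ** 2 / effect**2)

# Hypothetical inputs: a noisier marker (larger sd relative to mean) needs more subjects.
print(min_sample_size(mean_change=1.0, sd_change=1.0))  # 252
print(min_sample_size(mean_change=1.0, sd_change=2.0))  # 1005
```

Under this formula the required n scales with the squared ratio of standard deviation to mean annual change. A biomarker that changes steadily relative to its noise therefore needs fewer subjects.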
22. Neural Unsigned Distance Fields for Implicit Function Learning [PDF]
Julian Chibane, Aymen Mir, Gerard Pons-Moll
Abstract: In this work we target a learnable output representation that allows continuous, high-resolution outputs of arbitrary shape. Recent works represent 3D surfaces implicitly with a neural network, thereby breaking previous barriers in resolution and in the ability to represent diverse topologies. However, neural implicit representations are limited to closed surfaces, which divide space into inside and outside. Many real-world objects, such as the walls of a scene scanned by a sensor, clothing, or a car with inner structures, are not closed. This constitutes a significant barrier, both for data pre-processing (objects need to be artificially closed, which creates artifacts) and for the ability to output open surfaces. In this work, we propose Neural Distance Fields (NDF), a neural-network-based model that predicts the unsigned distance field for arbitrary 3D shapes given sparse point clouds. Like prior implicit models, NDF represent surfaces at high resolution, but they do not require closed surface data and significantly broaden the class of shapes that can be represented in the output. NDF allow extracting the surface as very dense point clouds and as meshes. We also show that NDF allow surface normal calculation and can be rendered using a slight modification of sphere tracing. We find that NDF can be used for multi-target regression (multiple outputs for one input) with techniques previously used exclusively for rendering in graphics. Experiments on ShapeNet show that NDF, while simple, are state of the art and allow reconstructing shapes with inner structures, such as the chairs inside a bus. Notably, we show that NDF are not restricted to 3D shapes and can approximate more general open surfaces such as curves, manifolds, and functions. Code is available for research at this https URL.
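Sphere tracing an unsigned distance field can be sketched with an analytic UDF of an open surface, a unit disk, standing in for the learned network. This is an illustration of plain sphere tracing with a hit threshold. The paper's specific modification is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def udf_disk(p):
    """Unsigned distance from point p to the open unit disk in the plane z = 0."""
    r = np.hypot(p[0], p[1])
    if r <= 1.0:
        return float(abs(p[2]))
    return float(np.hypot(r - 1.0, p[2]))

def sphere_trace(origin, direction, udf, eps=1e-4, max_steps=200, max_dist=10.0):
    """March along the ray by the unsigned distance; stop once within eps of the surface.

    An unsigned field has no sign change to bracket the crossing, so a hit is
    declared by a small distance threshold instead.
    """
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = udf(p)
        if d < eps:
            return p       # within eps of the surface: report a hit
        t += d             # no surface point lies closer than d, so this step is safe
        if t > max_dist:
            break
    return None            # the ray missed the open surface

hit = sphere_trace(np.array([0.0, 0.0, 2.0]), np.array([0.0, 0.0, -1.0]), udf_disk)
miss = sphere_trace(np.array([2.0, 0.0, 2.0]), np.array([0.0, 0.0, -1.0]), udf_disk)
print(hit, miss)  # hit lands on the disk at the origin; miss is None
```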
23. Processing of incomplete images by (graph) convolutional neural networks [PDF]
Tomasz Danel, Marek Śmieja, Łukasz Struski, Przemysław Spurek, Łukasz Maziarka
Abstract: We investigate the problem of training neural networks on incomplete images without replacing missing values. For this purpose, we first represent an image as a graph in which missing pixels are entirely ignored. The graph image representation is processed using a spatial graph convolutional network (SGCN), a type of graph convolutional network that is a proper generalization of classical CNNs operating on images. On the one hand, our approach avoids the problem of missing data imputation; on the other hand, there is a natural correspondence between CNNs and SGCN. Experiments confirm that our approach performs better than analogous CNNs combined with imputation of missing values on typical classification and reconstruction tasks.
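Below is a minimal sketch of the graph view of an incomplete image. It assumes 8-neighbour connectivity and a position-dependent 3x3 weight per edge offset, normalized by the number of neighbours actually present. The function names are hypothetical, and this simplified layer only stands in for the paper's SGCN. In the interior of a fully observed image it reduces to an ordinary 3x3 convolution scaled by 1/9, while a missing pixel simply has no node.

```python
import numpy as np

def image_to_graph(img, mask):
    """Observed pixels become nodes; edges link observed 8-neighbours (self-loops included)."""
    h, w = img.shape
    nodes = [(y, x) for y in range(h) for x in range(w) if mask[y, x]]
    index = {p: i for i, p in enumerate(nodes)}
    edges = []                                   # (target, source, dy, dx)
    for (y, x), i in index.items():
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                j = index.get((y + dy, x + dx))  # None if outside the image or missing
                if j is not None:
                    edges.append((i, j, dy, dx))
    feats = np.array([img[p] for p in nodes], dtype=float)
    return nodes, edges, feats

def spatial_graph_conv(feats, edges, kernel):
    """Weight each neighbour by kernel[dy+1, dx+1]; average over the neighbours present."""
    out = np.zeros_like(feats)
    count = np.zeros(len(feats))
    for i, j, dy, dx in edges:
        out[i] += kernel[dy + 1, dx + 1] * feats[j]
        count[i] += 1
    return out / count                           # count >= 1 thanks to the self-loop

img = np.full((4, 4), 3.0)
mask = np.ones((4, 4), dtype=bool)
mask[1, 2] = False                               # a missing pixel: it simply has no node
nodes, edges, feats = image_to_graph(img, mask)
out = spatial_graph_conv(feats, edges, np.ones((3, 3)))
print(len(nodes), np.allclose(out, 3.0))         # 15 True
```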
24. Multi-Class Zero-Shot Learning for Artistic Material Recognition [PDF]
Alexander W Olson, Andreea Cucu, Tom Bock
Abstract: Zero-Shot Learning (ZSL) is an extreme form of transfer learning in which no labelled examples of the data to be classified are provided during the training stage. Instead, ZSL uses additional information learned about the domain and relies on transfer learning algorithms to infer knowledge about the missing instances. ZSL approaches are an attractive solution for sparse datasets. Here we outline a model that identifies the materials with which a work of art was created by learning the relationship between English descriptions of the subject of a piece and its composite materials. After experimenting with a range of hyper-parameters, we produce a model that is capable of correctly identifying the materials used in pieces from an entirely distinct museum dataset. This model achieved a classification accuracy of 48.42% on 5,000 artworks taken from the Tate collection, which is distinct from the Rijksmuseum network used to create and train our model.
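Zero-shot prediction is commonly sketched as follows: map inputs into a shared semantic (attribute) space, then pick the nearest class embedding, which lets unseen classes be predicted. The code below is a generic ridge-regression version on synthetic data. It is not the authors' model, and all vectors and names are hypothetical.

```python
import numpy as np

def fit_embedding_map(X, A, lam=1e-3):
    """Ridge regression W minimising ||X W - A||^2 + lam * ||W||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ A)

def predict(x, W, class_emb):
    """Index of the class embedding with the highest cosine similarity to x W."""
    z = x @ W
    sims = class_emb @ z / (np.linalg.norm(class_emb, axis=1) * np.linalg.norm(z))
    return int(np.argmax(sims))

rng = np.random.default_rng(0)
seen = np.eye(4)                           # hypothetical attribute vectors of 4 seen materials
unseen = np.array([[1.0, 1.0, 0.0, 0.0]])  # a novel attribute combination, never trained on
M = rng.standard_normal((4, 6))            # hypothetical attributes -> text-feature mapping
labels = rng.integers(0, 4, size=200)
X = seen[labels] @ M + 0.01 * rng.standard_normal((200, 6))
W = fit_embedding_map(X, seen[labels])     # trained on seen classes only

all_classes = np.vstack([seen, unseen])    # at test time, unseen classes are candidates too
x_new = unseen[0] @ M + 0.01 * rng.standard_normal(6)
print(predict(x_new, W, all_classes))      # 4, i.e. the unseen class
```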
25. Enhancing road signs segmentation using photometric invariants [PDF]
Tarik Ayaou, Azeddine Beghdadi, Karim Afdel, Abdellah Amghar
Abstract: Road sign detection and recognition in natural scenes is one of the most important tasks in the design of Intelligent Transport Systems (ITS). However, illumination changes remain a major problem. In this paper, an efficient approach to road sign segmentation based on photometric invariants is proposed. The method uses color information through a hybrid distance that combines the chromatic distance with the red and blue ratio in the $l\theta\phi$ color space, which is invariant to highlights, shading and shadow changes. A comparative study demonstrates the robustness of this approach over the most frequently used methods for road sign segmentation. The experimental results and detailed analysis show the high performance of the proposed algorithm.
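The $l\theta\phi$ space can be sketched as spherical coordinates of the RGB vector, assuming $\theta = \arctan(G/R)$ and $\phi = \arccos(B/l)$. The abstract does not give the paper's exact definitions or its hybrid distance weighting, so the chromatic distance below is an illustrative assumption. Because $\theta$ and $\phi$ depend only on RGB ratios, scaling intensity (a simple shading model) leaves them unchanged.

```python
import numpy as np

def rgb_to_l_theta_phi(rgb):
    """Spherical coordinates of an RGB vector: l (intensity) and two chromatic angles."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    l = np.sqrt(r**2 + g**2 + b**2)
    theta = np.arctan2(g, r)                                        # assumed: arctan(G/R)
    phi = np.arccos(np.clip(b / np.maximum(l, 1e-12), -1.0, 1.0))   # assumed: arccos(B/l)
    return l, theta, phi

def chromatic_distance(rgb, ref_rgb):
    """Illustrative distance in (theta, phi) only, so overall intensity is ignored."""
    _, t1, p1 = rgb_to_l_theta_phi(rgb)
    _, t2, p2 = rgb_to_l_theta_phi(ref_rgb)
    return np.hypot(t1 - t2, p1 - p2)

sign_red = np.array([200.0, 30.0, 20.0])   # hypothetical sign colour
shaded = 0.4 * sign_red                    # same surface under weaker light
blue = np.array([20.0, 30.0, 200.0])
print(chromatic_distance(shaded, sign_red), chromatic_distance(blue, sign_red))
```

This only demonstrates invariance to a global intensity scale. Handling specular highlights needs the paper's full model.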
26. Peak Detection On Data Independent Acquisition Mass Spectrometry Data With Semisupervised Convolutional Transformers [PDF]
Leon L. Xu, Hannes L. Röst
Abstract: Liquid chromatography coupled to mass spectrometry (LC-MS) based methods are commonly used for high-throughput, quantitative measurements of the proteome (i.e. the set of all proteins in a sample at a given time). Targeted LC-MS produces data in the form of a two-dimensional time series spectrum, with the mass-to-charge ratio of analytes (m/z) on one axis and the retention time from the chromatography on the other. The elution of a peptide of interest produces highly specific patterns across multiple fragment ion traces (extracted ion chromatograms, or XICs). In this paper, we formulate this peak detection problem as a multivariate time series segmentation problem and propose a novel approach based on the Transformer architecture. We augment Transformers, which capture long-distance dependencies with a global view, with Convolutional Neural Networks (CNNs), which capture local context important to the task at hand, in the form of Transformers with Convolutional Self-Attention. We further train this model in a semisupervised manner by adapting state-of-the-art semisupervised image classification techniques to multi-channel time series data. Experiments on a representative LC-MS dataset are benchmarked against manual annotations and showcase the encouraging performance of our method: it outperforms baseline neural network architectures and is competitive with the current state of the art in automated peak detection.
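Convolutional self-attention can be sketched by computing queries and keys with a 1D convolution over time, so each position attends using its local context, while values use a plain linear map. This NumPy version uses "same" zero padding and hypothetical weight shapes. It is an illustration, not the paper's architecture.

```python
import numpy as np

def conv1d_same(x, w):
    """x: (T, d_in); w: (k, d_in, d_out) with odd k. Zero 'same' padding along time."""
    k = w.shape[0]
    pad = (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # Output at time t mixes inputs t-pad .. t+pad.
    return np.stack([np.einsum("kd,kde->e", xp[t:t + k], w) for t in range(x.shape[0])])

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def conv_self_attention(x, wq, wk, wv):
    """Single-head attention whose queries/keys see local context via convolution."""
    q = conv1d_same(x, wq)
    k = conv1d_same(x, wk)
    v = x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])    # (T, T): every time point attends globally
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 6))              # hypothetical: 10 time points, 6 XIC traces
wq, wk = rng.standard_normal((3, 6, 4)), rng.standard_normal((3, 6, 4))
wv = rng.standard_normal((6, 4))
out = conv_self_attention(x, wq, wk, wv)
print(out.shape)  # (10, 4)
```

With kernel size 1 the convolution becomes a pointwise projection, and the layer reduces to ordinary self-attention.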
27. Application of sequential processing of computer vision methods for solving the problem of detecting the edges of a honeycomb block [PDF]
M V Kubrikov, I A Paulin, M V Saramud, A S Kubrikova
Abstract: The article describes the application of the Hough transform to a honeycomb block image. The problem of cutting a mold from a honeycomb block is described. A number of image transformations are considered to increase the efficiency of the Hough algorithm: binarization with a simple threshold, binarization with Otsu's method, and the Canny edge detection algorithm. Binary skeletonization is also considered, in which the skeleton is obtained using two main morphological operations: dilation and erosion. Through a number of experiments, the optimal sequence for processing the original image was identified, which yields the coordinates of the maximum number of faces. This result allows one to choose the optimal places for cutting a honeycomb block, which will improve the quality of the resulting shapes.
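Otsu's method picks the threshold that maximizes the between-class variance of the grey-level histogram. In practice one would call OpenCV (`cv2.threshold` with the `cv2.THRESH_OTSU` flag, then `cv2.Canny` and a Hough transform). The NumPy-only sketch below shows just the Otsu step on a synthetic 8-bit image; the function name is hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the 8-bit threshold t maximising between-class variance (class 0: gray <= t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of class 0 for each threshold
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to each threshold
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0     # thresholds leaving one class empty
    return int(np.argmax(sigma_b))

# Synthetic image: dark background (50) with a bright 4x4 block (200).
img = np.full((8, 8), 50, dtype=np.uint8)
img[2:6, 2:6] = 200
t = otsu_threshold(img)
binary = img > t
print(t, binary.sum())  # 50 16
```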
28. Wavelet Flow: Fast Training of High Resolution Normalizing Flows [PDF]
Jason J. Yu, Konstantinos G. Derpanis, Marcus A. Brubaker
Abstract: Normalizing flows are a class of probabilistic generative models that allow both fast density computation and efficient sampling, and are effective at modelling complex distributions such as images. A drawback of current methods is their significant training cost, sometimes requiring months of GPU training time to achieve state-of-the-art results. This paper introduces Wavelet Flow, a multi-scale normalizing flow architecture based on wavelets. A Wavelet Flow has an explicit representation of signal scale that inherently includes models of lower-resolution signals and conditional generation of higher-resolution signals, i.e., super resolution. A major advantage of Wavelet Flow is the ability to construct generative models for high-resolution data (e.g., 1024 x 1024 images) that are impractical with previous models. Furthermore, Wavelet Flow is competitive with previous normalizing flows in terms of bits per dimension on standard (low-resolution) benchmarks while being up to 15x faster to train.
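The multi-scale idea rests on a wavelet transform. One level splits an image into a half-resolution low-pass image and three detail bands, and the inverse is exact. A flow in this style can model the low-pass image, then model the details conditioned on it. The orthonormal Haar step below is an assumption chosen for illustration. Its absolute Jacobian determinant is 1, which suits flows.

```python
import numpy as np

def haar_forward(x):
    """One level of the orthonormal 2D Haar transform for an (H, W) array, H and W even."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    low = (a + b + c + d) / 2.0                 # half-resolution image (2x the 2x2 mean)
    details = ((a - b + c - d) / 2.0,           # left-right difference
               (a + b - c - d) / 2.0,           # top-bottom difference
               (a - b - c + d) / 2.0)           # diagonal difference
    return low, details

def haar_inverse(low, details):
    h, v, g = details
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]))
    out[0::2, 0::2] = (low + h + v + g) / 2.0
    out[0::2, 1::2] = (low - h + v - g) / 2.0
    out[1::2, 0::2] = (low + h - v - g) / 2.0
    out[1::2, 1::2] = (low - h - v + g) / 2.0
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))

# Coarse-to-fine pyramid: 8x8 -> 4x4 -> 2x2 -> 1x1, keeping the detail bands.
coarse, pyramid = x, []
while coarse.shape[0] > 1:
    coarse, det = haar_forward(coarse)
    pyramid.append(det)

rebuilt = coarse
for det in reversed(pyramid):
    rebuilt = haar_inverse(rebuilt, det)
print(len(pyramid), np.allclose(rebuilt, x))  # 3 True
```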
29. Detector Algorithms of Bounding Box and Segmentation Mask of a Mask R-CNN Model [PDF]
Haruhiro Fujita, Masatoshi Itagaki, Yew Kwang Hooi, Kenta Ichikawa, Kazutaka Kawano, Ryo Yamamoto
Abstract: The detection performance of the bounding box and segmentation mask outputs of Mask R-CNN models is evaluated. There are significant differences between the detection performance of bounding boxes and segmentation masks, with the former consistently superior to the latter. The harmonic means of precision and recall (F1 scores) for linear cracks, joints, fillings, and shadows are significantly lower for segmentation masks than for bounding boxes. Other classes showed similar harmonic means for both outputs. The differing detection metrics of bounding boxes and segmentation masks are discussed with a focus on the detection algorithms of both detectors.
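The harmonic mean of precision and recall is the F1 score. The sketch below also contrasts box IoU with mask IoU on a thin diagonal "crack" shifted by one pixel. It is only a hypothetical illustration of why thin, linear objects can score much lower as masks than as boxes, not the paper's analysis. All helper names are hypothetical.

```python
import numpy as np

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mask_iou(m1, m2):
    union = np.logical_or(m1, m2).sum()
    return np.logical_and(m1, m2).sum() / union if union else 0.0

def box_from_mask(mask):
    """Tight box (y0, x0, y1, x1) with exclusive upper bounds."""
    ys, xs = np.nonzero(mask)
    return ys.min(), xs.min(), ys.max() + 1, xs.max() + 1

def box_iou(b1, b2):
    h = max(0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    w = max(0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = h * w
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(b1) + area(b2) - inter)

# Hypothetical thin diagonal "crack" and a prediction shifted right by one pixel.
idx = np.arange(19)
gt = np.zeros((20, 20), dtype=bool)
gt[idx, idx] = True
pred = np.zeros((20, 20), dtype=bool)
pred[idx, idx + 1] = True
print(mask_iou(gt, pred), box_iou(box_from_mask(gt), box_from_mask(pred)))  # 0.0 0.9
print(f1_score(tp=8, fp=2, fn=4))
```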
30. Memory Optimization for Deep Networks [PDF]
Aashaka Shah, Chao-Yuan Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krähenbühl
Abstract: Deep learning is slowly, but steadily, hitting a memory bottleneck. While the tensor computation in top-of-the-line GPUs increased by 32x over the last five years, the total available memory only grew by 2.5x. This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. MONeT jointly optimizes the checkpointing schedule and the implementation of various operators. MONeT is able to outperform all prior hand-tuned operations as well as automated checkpointing. MONeT reduces the overall memory requirement by 3x for various PyTorch models, with a 9-16% overhead in computation. For the same computation cost, MONeT requires 1.2-1.8x less memory than current state-of-the-art automated checkpointing frameworks. Our code is available at this https URL.
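Checkpointing trades memory for recomputation. Only some activations are stored in the forward pass, and each segment is recomputed from its checkpoint during backpropagation. MONeT additionally optimizes the schedule jointly with operator implementations. The pure-Python sketch below shows only uniform checkpointing every `every` layers; it uses a hypothetical helper with scalar layers given as `(f, df)` pairs. In PyTorch, `torch.utils.checkpoint.checkpoint` provides the recomputation mechanism.

```python
import math

def grad_with_checkpointing(x0, layers, every):
    """Gradient of a chain of scalar layers, storing only every `every`-th activation.

    `layers` is a list of (f, df) pairs (hypothetical format). Returns
    (gradient, number of stored activations, number of extra forward calls).
    """
    n = len(layers)
    saved = {0: x0}                        # checkpoints: inputs of layers 0, every, 2*every, ...
    x = x0
    for i, (f, _) in enumerate(layers):    # forward pass keeps only the checkpoints
        x = f(x)
        if (i + 1) % every == 0 and i + 1 < n:
            saved[i + 1] = x
    grad, recomputed, end = 1.0, 0, n
    for s in sorted(saved, reverse=True):  # backward, one segment [s, end) at a time
        acts = [saved[s]]
        for i in range(s, end - 1):        # recompute inputs of layers s+1 .. end-1
            acts.append(layers[i][0](acts[-1]))
            recomputed += 1
        for i in range(end - 1, s - 1, -1):
            grad *= layers[i][1](acts[i - s])  # chain rule at the layer's input
        end = s
    return grad, len(saved), recomputed

layers = [(math.sin, math.cos)] * 8        # hypothetical 8-layer scalar "network"
full = grad_with_checkpointing(0.3, layers, every=1)   # store every activation
ckpt = grad_with_checkpointing(0.3, layers, every=3)   # store inputs of layers 0, 3, 6
print(full[1:], ckpt[1:])  # (8, 0) (3, 5)
```

Both runs return the same gradient. Checkpointing stores 3 activations instead of 8, at the cost of 5 extra forward calls.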
31. Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa) [PDF]
Mladen Popović, Maruf A. Dhali, Lambert Schomaker
Abstract: The Dead Sea Scrolls are tangible evidence of the Bible's ancient scribal culture. Palaeography, the study of ancient handwriting, can provide access to this scribal culture. However, one of the problems of traditional palaeography is determining writer identity when the writing style is nearly uniform, as exemplified by the Great Isaiah Scroll (1QIsaa). To this end, we used pattern recognition and artificial intelligence techniques to advance the palaeography of the scrolls with respect to writer identification, and to pioneer analysis at the microlevel of individual scribes, opening access to the Bible's ancient scribal culture. Although many scholars believe that 1QIsaa was written by one scribe, we report new evidence for a breaking point in the series of columns in this scroll. Without any prior assumption of writer identity, and based on point clouds of the reduced-dimensionality feature space, we found that columns from the first and second halves of the manuscript ended up in two distinct zones of such scatter plots. This held notably for a range of digital palaeography tools, each addressing very different featural aspects of the script samples. In a secondary, independent analysis, now assuming writer difference and using yet another independent feature method and several different types of statistical testing, a switching point was found in the column series. A clear phase transition is apparent around column 27. Given the statistically significant differences between the two halves, a tertiary, post-hoc analysis was performed. By demonstrating that two main scribes were responsible for the Great Isaiah Scroll, this study sheds new light on the Bible's ancient scribal culture, providing new, tangible evidence that ancient biblical texts were not copied by a single scribe only but that multiple scribes could closely collaborate on one particular manuscript.
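A minimal single-change-point search illustrates the idea of a "switching point" in a column series. The criterion is a generic least-squares one, assumed here for illustration. The authors report several different statistical tests and feature methods, which this sketch does not reproduce. The input is a hypothetical per-column feature value or feature vector, in manuscript order.

```python
import numpy as np


def best_split(column_features):
    """Index t that best splits a column series into [:t] and [t:].

    column_features: per-column scalars or feature vectors, in manuscript order
    (hypothetical input). 'Best' means the split minimizing the total
    within-segment sum of squared deviations (least-squares, one change point).
    """
    x = np.asarray(column_features, dtype=float)
    x = x.reshape(len(x), -1)                  # treat scalars as 1-D features
    best_t, best_cost = None, np.inf
    for t in range(1, len(x)):                 # keep both segments non-empty
        left, right = x[:t], x[t:]
        cost = ((left - left.mean(axis=0)) ** 2).sum() \
            + ((right - right.mean(axis=0)) ** 2).sum()
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

The returned index is 0-based, so a result of t means the second segment starts at the (t+1)-th column of the input. A permutation test would be one way to ask whether the split is significant. Which test fits best is left open here.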
32. Deep Probabilistic Imaging: Uncertainty Quantification and Multi-modal Solution Characterization for Computational Imaging [PDF]
He Sun, Katherine L. Bouman
Abstract: Computational image reconstruction algorithms generally produce a single image without any measure of uncertainty or confidence. Regularized Maximum Likelihood (RML) and feed-forward deep learning approaches for inverse problems typically focus on recovering a point estimate. This is a serious limitation when working with underdetermined imaging systems, where it is conceivable that multiple image modes would be consistent with the measured data. Characterizing the space of probable images that explain the observational data is therefore crucial. In this paper, we propose a variational deep probabilistic imaging approach to quantify reconstruction uncertainty. Deep Probabilistic Imaging (DPI) employs an untrained deep generative model to estimate a posterior distribution of an unobserved image. This approach does not require any training data; instead, it optimizes the weights of a neural network to generate image samples that fit a particular measurement dataset. Once the network weights have been learned, the posterior distribution can be efficiently sampled. We demonstrate this approach in the context of interferometric radio imaging, which is used for black hole imaging with the Event Horizon Telescope.
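DPI targets posteriors that have no closed form. The kind of uncertainty it is meant to expose already appears in a linear-Gaussian toy problem, where the posterior is exact. The sketch below assumes y = A x + n, with Gaussian noise and an isotropic Gaussian prior. Both are assumptions, not DPI's setup. This is a reference point, not the DPI method, which instead optimizes a deep generative model to produce posterior samples.

```python
import numpy as np


def gaussian_posterior(A, y, noise_std, prior_std):
    """Exact posterior N(mean, cov) of x for y = A x + n.

    Assumes n ~ N(0, noise_std**2 I) and prior x ~ N(0, prior_std**2 I).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n_pixels = A.shape[1]
    precision = A.T @ A / noise_std**2 + np.eye(n_pixels) / prior_std**2
    cov = np.linalg.inv(precision)
    mean = cov @ (A.T @ y) / noise_std**2
    return mean, cov


def sample_posterior(mean, cov, n_samples, seed=0):
    """Draw posterior samples; their spread shows reconstruction uncertainty."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)
```

Consider A = [[1, 0]]. The second pixel is never measured, so its posterior variance stays equal to the prior variance. Many images are then equally consistent with the data. A single point estimate hides exactly this situation in underdetermined imaging.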
33. On the Transfer of Disentangled Representations in Realistic Settings [PDF]
Andrea Dittadi, Frederik Träuble, Francesco Locatello, Manuel Wüthrich, Vaibhav Agrawal, Ole Winther, Stefan Bauer, Bernhard Schölkopf
Abstract: Learning meaningful representations that disentangle the underlying structure of the data-generating process is considered to be of key importance in machine learning. While disentangled representations were found to be useful for diverse tasks such as abstract reasoning and fair classification, their scalability and real-world impact remain questionable. We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images of the same robotic setup. In contrast to previous work, this new dataset exhibits correlations and a complex underlying structure. It also allows evaluating transfer to unseen simulated and real-world settings where the encoder (i) remains in distribution or (ii) is out of distribution. We propose new architectures to scale disentangled representation learning to realistic high-resolution settings, and conduct a large-scale empirical study of disentangled representations on this dataset. We observe that disentanglement is a good predictor of out-of-distribution (OOD) task performance.
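A claim such as "disentanglement is a good predictor of OOD performance" is typically checked by correlating scores across many trained models. The sketch computes a Spearman rank correlation between hypothetical per-model disentanglement scores and OOD task scores. This is an assumption: the abstract does not state the paper's exact statistic.

```python
import numpy as np


def spearman(a, b):
    """Spearman rank correlation of two equal-length sequences (assumes no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)   # rank of each entry
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))
```

This minimal version assumes no ties. `scipy.stats.spearmanr` handles ties and also returns a p-value.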
34. Fourth-Order Nonlocal Tensor Decomposition Model for Spectral Computed Tomography [PDF]
Xiang Chen, Wenjun Xia, Yan Liu, Hu Chen, Jiliu Zhou, Yi Zhang
Abstract: Spectral computed tomography (CT) can reconstruct spectral images from different energy bins using photon-counting detectors (PCDs). However, due to the limited photons and counting rate in the corresponding spectral fraction, the reconstructed spectral images usually suffer from severe noise. In this paper, we propose a fourth-order nonlocal tensor decomposition model for spectral CT image reconstruction (FONT-SIR). Similar patches are collected in both the spatial and spectral dimensions simultaneously to form the basic tensor unit. Additionally, principal component analysis (PCA) is applied to extract latent features from the patches for a robust and efficient similarity measure. Then, low-rank and sparsity decomposition is performed on the resulting fourth-order tensor unit. The weighted nuclear norm and the total variation (TV) norm enforce the low-rank and sparsity constraints, respectively. The alternating direction method of multipliers (ADMM) is adopted to optimize the objective function. Experimental results show that the proposed FONT-SIR achieves superior qualitative and quantitative performance on both simulated and real data sets relative to several state-of-the-art methods, in terms of noise suppression and detail preservation.
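In ADMM-based low-rank models, the nuclear-norm sub-problem is usually solved by singular value thresholding on a matrix unfolding of the tensor. The sketch below shows that generic building block, with per-singular-value weights. The following are hypothetical:

- the choice of unfolding
- the threshold `tau`
- the weights
- the tensor layout (patch rows, patch columns, similar patches, energy bins)

The TV sub-problem and the other ADMM updates of FONT-SIR are omitted.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: that axis becomes the rows, all others are flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def weighted_svt(matrix, tau, weights=None):
    """Shrink each singular value by tau * weight, clipping at zero.

    With weights=None this is the proximal step of the plain nuclear norm.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    w = np.ones_like(s) if weights is None else np.asarray(weights, dtype=float)
    s_shrunk = np.maximum(s - tau * w, 0.0)
    return (u * s_shrunk) @ vt
```

Weighted soft-thresholding gives the exact proximal step of the weighted nuclear norm only when the weights are non-descending, matching the descending order of the singular values. For this reason, weighted nuclear norm minimization commonly sets the weights inversely proportional to the singular values.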
35. CT Reconstruction with PDF: Parameter-Dependent Framework for Multiple Scanning Geometries and Dose Levels [PDF]
Wenjun Xia, Zexin Lu, Yongqiang Huang, Yan Liu, Hu Chen, Jiliu Zhou, Yi Zhang
Abstract: Current mainstream deep-learning-based CT reconstruction methods usually require a fixed scanning geometry and dose level. This significantly increases the training cost and requires more training data for clinical application. In this paper, we propose a parameter-dependent framework (PDF) that trains on data with multiple scanning geometries and dose levels simultaneously. In the proposed PDF, the geometry and dose level are parameterized and fed into two multi-layer perceptrons (MLPs). The MLPs modulate the feature maps of the CT reconstruction network, conditioning the network outputs on different scanning geometries and dose levels. The experiments show that our proposed method achieves competitive performance, similar to that of the original network trained with a specific geometry and dose level, while efficiently saving the extra training cost for multiple scanning geometries and dose levels.
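Feature-wise linear modulation (FiLM) is one common way for an MLP to "modulate" feature maps: the conditioning vector is mapped to a per-channel scale and shift. The sketch assumes this form. The following are illustrative assumptions, not details confirmed by the abstract:

- the encoding of geometry and dose into `scan_params`
- the layer sizes
- the split into one MLP for the scale and one for the shift

```python
import numpy as np


def init_mlp(sizes, rng):
    """Random (W, b) pairs for an MLP with the given layer sizes."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(x, layers):
    """Apply the MLP: ReLU on hidden layers, linear output layer."""
    h = np.asarray(x, dtype=float)
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h


def modulate(features, scan_params, gamma_layers, beta_layers):
    """Channel-wise affine modulation of a (C, H, W) feature map.

    scan_params: vector encoding scanning geometry and dose (hypothetical encoding).
    """
    gamma = mlp(scan_params, gamma_layers)        # (C,) per-channel scale
    beta = mlp(scan_params, beta_layers)          # (C,) per-channel shift
    return features * gamma[:, None, None] + beta[:, None, None]
```

The modulation is a cheap per-channel affine map. One set of convolutional weights can therefore serve every geometry and dose level, and only the small MLPs see the scan parameters.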
36. Fit to Measure: Reasoning about Sizes for Robust Object Recognition [PDF]
Agnese Chiatti, Enrico Motta, Enrico Daga, Gianluca Bardaro
Abstract: Service robots can help with many of our daily tasks, especially in cases where it is inconvenient or unsafe for us to intervene, e.g., under extreme weather conditions or when social distance needs to be maintained. However, before we can successfully delegate complex tasks to robots, we need to enhance their ability to make sense of dynamic, real-world environments. In this context, the first prerequisite to improving the Visual Intelligence of a robot is building robust and reliable object recognition systems. Object recognition solutions are traditionally based on Machine Learning methods, and augmenting them with knowledge-based reasoners has been shown to improve their performance. In particular, based on our prior work on identifying the epistemic requirements of Visual Intelligence, we hypothesise that knowledge of the typical size of objects could significantly improve the accuracy of an object recognition system. To verify this hypothesis, in this paper we present an approach to integrating knowledge about object sizes into an ML-based architecture. Our experiments in a real-world robotic scenario show that this combined approach ensures a significant performance increase over state-of-the-art Machine Learning methods.
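One simple way to inject size knowledge into an ML classifier is to discard class hypotheses whose typical size is incompatible with the measured object size, then re-rank the rest. The class names, the size ranges (in metres) and the hard-filter rule below are all hypothetical. The paper's actual reasoner may represent sizes differently and combine them with the classifier in another way.

```python
# Hypothetical typical sizes: (min, max) longest dimension in metres.
TYPICAL_SIZE = {
    "mug": (0.07, 0.15),
    "book": (0.15, 0.35),
    "bin": (0.30, 1.00),
    "chair": (0.40, 1.20),
}


def rerank_by_size(scores, measured_size, size_ranges=TYPICAL_SIZE):
    """Keep classes whose typical size range contains `measured_size`,
    sorted by classifier score; fall back to all classes if none fit."""
    plausible = {
        cls: score
        for cls, score in scores.items()
        if cls in size_ranges
        and size_ranges[cls][0] <= measured_size <= size_ranges[cls][1]
    }
    candidates = plausible or scores
    return sorted(candidates, key=candidates.get, reverse=True)
```

When no class fits, the fallback keeps the original ranking. A bad size measurement, for example from noisy depth data, therefore cannot empty the candidate list.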
37. Robust Odometry and Mapping for Multi-LiDAR Systems with Online Extrinsic Calibration [PDF]
Jianhao Jiao, Haoyang Ye, Yilong Zhu, Ming Liu
Abstract: Combining multiple LiDARs enables a robot to maximize its perceptual awareness of environments and obtain sufficient measurements, which is promising for simultaneous localization and mapping (SLAM). This paper proposes a system to achieve robust and simultaneous extrinsic calibration, odometry, and mapping for multiple LiDARs. Our approach starts with measurement preprocessing to extract edge and planar features from raw measurements. After a motion and extrinsic initialization procedure, a sliding-window-based multi-LiDAR odometry runs onboard to estimate poses, with online calibration refinement and convergence identification. We further develop a mapping algorithm that constructs a global map and optimizes poses with sufficient features, together with a method to model and reduce data uncertainty. We validate our approach's performance with extensive calibration and SLAM experiments on ten sequences (4.60 km total length) and compare the results against state-of-the-art methods. We demonstrate that the proposed work is a complete, robust, and extensible system for various multi-LiDAR setups. The source code, datasets, and demonstrations are available at this https URL.
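Edge and planar features are commonly extracted per scan line with the local smoothness measure introduced by LOAM. Points whose neighbours deviate strongly from them are edge candidates, and near-collinear neighbourhoods indicate planes. Whether this system uses exactly this measure is an assumption, and so are the window size and thresholds below.

```python
import numpy as np


def smoothness(scan, half_window=5):
    """LOAM-style local smoothness of each point on one ordered scan line.

    scan: (N, 3) points in firing order, sensor at the origin (points are
    assumed to have non-zero range). Points without `half_window` neighbours
    on both sides are left as NaN.
    """
    pts = np.asarray(scan, dtype=float)
    c = np.full(len(pts), np.nan)
    for i in range(half_window, len(pts) - half_window):
        nbrs = np.vstack([pts[i - half_window:i], pts[i + 1:i + 1 + half_window]])
        offset = (nbrs - pts[i]).sum(axis=0)      # ~0 for evenly spaced collinear points
        c[i] = np.linalg.norm(offset) / (2 * half_window * np.linalg.norm(pts[i]))
    return c


def classify(c, edge_thresh=0.1, plane_thresh=0.01):
    """Label points 'edge' (high value), 'plane' (low value) or 'none'."""
    labels = np.full(len(c), "none", dtype=object)
    labels[c > edge_thresh] = "edge"
    labels[c < plane_thresh] = "plane"
    return labels
```

Real pipelines also select a bounded number of features per sector and reject occluded or nearly parallel points. Those steps are omitted here.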
38. Hyperspectral Anomaly Change Detection Based on Auto-encoder [PDF]
Meiqi Hu, Chen Wu, Liangpei Zhang, Bo Du
Abstract: With advances in hyperspectral imaging technology, hyperspectral data provide abundant spectral information and play an increasingly important role in geological survey, vegetation analysis and military reconnaissance. Unlike normal change detection, hyperspectral anomaly change detection (HACD) helps to find small but important anomalous changes between multi-temporal hyperspectral images (HSIs). In previous works, most classical methods use linear regression to establish the mapping relationship between two HSIs and then detect the anomalies from the residual image. However, the real spectral differences between multi-temporal HSIs are likely to be quite complex and nonlinear, which limits the performance of these linear predictors. In this paper, we propose an original HACD algorithm based on auto-encoders (ACDA) that provides a nonlinear solution. The proposed ACDA can construct an effective predictor model under complex imaging conditions. In the ACDA model, two systematic auto-encoder (AE) networks are deployed to construct two predictors from two directions. Each predictor models the spectral variation of the background to obtain the predicted image under the other imaging condition. Then the mean squared error (MSE) between the predicted image and the corresponding expected image is computed to obtain a loss map. In this map, the spectral differences of unchanged pixels are strongly suppressed and anomalous changes are highlighted. Finally, we take the minimum of the two loss maps from the two directions as the final anomaly change intensity map. Experimental results on the public "Viareggio 2013" datasets demonstrate the efficiency of the method and its superiority over traditional methods.
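The bidirectional pipeline (predict each image from the other, compute the per-pixel MSE, keep the minimum) can be sketched independently of the predictor. Here a least-squares linear predictor stands in for the two auto-encoders, only to keep the sketch short. This is the classical baseline the abstract calls limited. Taking the minimum pixel by pixel is also an interpretation.

```python
import numpy as np


def fit_predict(src, dst):
    """Stand-in predictor: least-squares linear map dst ≈ src @ W over all pixels."""
    w, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return src @ w


def anomaly_change_map(x, y):
    """Anomaly change intensity for co-registered images x, y of shape (pixels, bands)."""
    loss_xy = ((fit_predict(x, y) - y) ** 2).mean(axis=1)   # predict y from x
    loss_yx = ((fit_predict(y, x) - x) ** 2).mean(axis=1)   # predict x from y
    return np.minimum(loss_xy, loss_yx)                       # pixel-wise minimum
```

Swapping `fit_predict` for a pair of trained auto-encoders would give the nonlinear variant the abstract describes. The rest of the pipeline stays unchanged.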
39. Micro-CT Synthesis and Inner Ear Super Resolution via Bayesian Generative Adversarial Networks [PDF]
Hongwei Li, Rameshwara G. N. Prasad, Anjany Sekuboyina, Chen Niu, Siwei Bai, Werner Hemmert, Bjoern Menze
Abstract: Existing medical image super-resolution methods rely on pairs of low- and high-resolution images to learn a mapping in a fully supervised manner. However, such image pairs are often not available in clinical practice. In this paper, we address the super-resolution problem in a real-world scenario using unpaired data. We synthesize Micro-CT images of the temporal bone structure, which is embedded in the inner ear, at eight times higher linear resolution. We explore cycle-consistent generative adversarial networks for the super-resolution task and equip the translation approach with Bayesian inference. We further introduce Hu moments as an evaluation metric to quantify the structure of the temporal bone. We evaluate our method on a public inner ear CT dataset and observe both visual and quantitative improvements over state-of-the-art deep-learning-based methods. In addition, we perform a multi-rater visual evaluation experiment and find that trained experts consistently give the proposed method the highest quality scores among all methods. Implementing our approach as an end-to-end learning task, we are able to quantify uncertainty in the unpaired translation tasks, and find that the uncertainty mask can provide structural information about the temporal bone.
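Hu moments are seven image moments that are invariant to translation, scale and rotation. The abstract does not say which of them are used, or on what input (intensity images or segmentation masks, for example). As an illustration, the sketch computes only the first two invariants of a 2-D array from normalized central moments.

```python
import numpy as np


def hu_first_two(image):
    """First two Hu moment invariants (phi1, phi2) of a 2-D intensity array."""
    img = np.asarray(image, dtype=float)
    ys, xs = np.indices(img.shape)
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00   # centroid

    def eta(p, q):
        """Normalized central moment eta_pq = mu_pq / mu_00^(1 + (p+q)/2)."""
        mu = (((xs - xc) ** p) * ((ys - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2
```

For the full set of seven, OpenCV provides `cv2.moments` followed by `cv2.HuMoments`. One plausible way to use them as a structural metric is to compare the invariants of a synthesized scan against those of a reference scan. That usage is an assumption here.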
40. Triple-view Convolutional Neural Networks for COVID-19 Diagnosis with Chest X-ray [PDF]
Jianjia Zhang
Abstract: Coronavirus Disease 2019 (COVID-19) is affecting an increasingly large number of people worldwide, placing significant stress on health care systems. Early and accurate diagnosis of COVID-19 is critical for screening infected patients and breaking person-to-person transmission. Computer-aided diagnosis of COVID-19 from chest X-rays (CXR) using deep learning has become a promising solution to this end. However, the diverse radiographic features of COVID-19 make this challenging, especially since each CXR scan typically generates only a single image. Data scarcity is another issue, since collecting large-scale medical CXR datasets is currently difficult. Therefore, extracting more informative and relevant features from the limited available samples is essential. To address these issues, unlike traditional methods that process each CXR image from a single view, this paper proposes triple-view convolutional neural networks for COVID-19 diagnosis with CXR images. Specifically, the proposed networks extract individual features from three views of each CXR image, i.e., the left lung view, the right lung view and the overall view, in three streams, and then integrate them for joint diagnosis. The proposed network structure respects the anatomical structure of human lungs and is well aligned with clinical diagnosis of COVID-19 in practice. In addition, labeling the views does not require the expert domain knowledge needed by many existing methods. The experimental results show that the proposed method achieves state-of-the-art performance, especially in the more challenging three-class classification task, and offers wide generality and high flexibility.
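Below is a toy sketch of the triple-view layout: three streams extract features from three views of one CXR, and the features are concatenated for a joint three-class decision. Several parts are assumptions and do not come from the paper:

- The two lung views are approximated by splitting the image at its vertical midline. The paper may crop the lungs differently. In a standard PA radiograph, the patient's right lung appears on the image's left.
- Each "stream" is a mean/standard-deviation feature extractor standing in for a CNN.
- Fusion is concatenation followed by a linear softmax layer.

All function names are hypothetical.

```python
import math

def split_views(img):
    """Return (image-left half, image-right half, whole image) of a 2-D list."""
    mid = len(img[0]) // 2
    left = [row[:mid] for row in img]   # patient's right lung on a PA film
    right = [row[mid:] for row in img]  # patient's left lung on a PA film
    return left, right, img

def toy_stream(view):
    """Stand-in for one CNN stream: mean and standard deviation of intensities."""
    vals = [v for row in view for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [mean, math.sqrt(var)]

def fuse_and_classify(img, weights, biases):
    """Concatenate the three stream features, apply a linear layer and softmax.

    weights: one row of 6 weights per class (3 views x 2 features).
    """
    feats = [f for view in split_views(img) for f in toy_stream(view)]
    logits = [sum(w * f for w, f in zip(w_row, feats)) + b
              for w_row, b in zip(weights, biases)]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

In a real network, each stream would be a separately parameterized CNN, and the weights would be learned jointly. The sketch only fixes the data flow: split the image, run the per-view streams, fuse, then classify.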
41. Impact of Spherical Coordinates Transformation Pre-processing in Deep Convolution Neural Networks for Brain Tumor Segmentation and Survival Prediction [PDF]
Carlo Russo, Sidong Liu, Antonio Di Ieva
Abstract: Pre-processing and data augmentation play an important role in Deep Convolutional Neural Networks (DCNN). While several methods aim to standardize and augment the dataset, we propose a novel method that feeds DCNNs with input data transformed into spherical space, which could facilitate feature learning better than standard Cartesian-space images and volumes. In this work, the spherical coordinates transformation is applied as a pre-processing method that, used in conjunction with the untransformed MRI volumes, improves the accuracy of brain tumor segmentation and patient overall survival (OS) prediction on the Brain Tumor Segmentation (BraTS) Challenge 2020 dataset. The LesionEncoder framework was then applied to automatically extract features from the DCNN models, achieving an OS prediction accuracy of 0.586 on the validation set, one of the best results on the BraTS 2020 leaderboard.
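The abstract does not specify the centre of the spherical transform, the grid resolution or the interpolation. The plain-Python sketch below is therefore only one possible reading. It resamples a volume `vol[z][y][x]` onto an (r, theta, phi) grid around a chosen centre using nearest-neighbour lookup, with theta as the polar angle from the z axis and phi as the azimuth. Points falling outside the volume become 0. The centre choice (for example the volume centre or a lesion centroid), the grid sizes and the function names are hypothetical.

```python
import math

def to_spherical(x, y, z):
    """Cartesian offset -> (r, polar angle theta in [0, pi], azimuth phi)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi

def resample_spherical(vol, center, n_r, n_theta, n_phi, r_max):
    """Nearest-neighbour resampling of vol[z][y][x] onto an (r, theta, phi) grid.

    Returns out[i_r][i_theta][i_phi]; samples outside the volume are 0.
    """
    cz, cy, cx = center
    out = []
    for i in range(n_r):
        r = r_max * i / max(n_r - 1, 1)
        shell = []
        for j in range(n_theta):
            theta = math.pi * j / max(n_theta - 1, 1)  # includes both poles
            ring = []
            for k in range(n_phi):
                phi = 2 * math.pi * k / n_phi  # periodic, so 2*pi is excluded
                x = cx + r * math.sin(theta) * math.cos(phi)
                y = cy + r * math.sin(theta) * math.sin(phi)
                z = cz + r * math.cos(theta)
                xi, yi, zi = round(x), round(y), round(z)
                inside = (0 <= zi < len(vol) and 0 <= yi < len(vol[0])
                          and 0 <= xi < len(vol[0][0]))
                ring.append(vol[zi][yi][xi] if inside else 0)
            shell.append(ring)
        out.append(shell)
    return out
```

In this reading, a structure centred on the chosen point maps to a region near small r that extends along the angular axes. This is one intuition for why such inputs might complement the Cartesian volume, but the paper's own rationale may differ.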
42. MELD: Meta-Reinforcement Learning from Images via Latent State Models [PDF]
Tony Z. Zhao, Anusha Nagabandi, Kate Rakelly, Chelsea Finn, Sergey Levine
Abstract: Meta-reinforcement learning algorithms can enable autonomous agents, such as robots, to quickly acquire new behaviors by leveraging prior experience in a set of related training tasks. However, the onerous data requirements of meta-training, compounded by the challenge of learning from sensory inputs such as images, have made meta-RL difficult to apply to real robotic systems. Latent state models, which learn compact state representations from a sequence of observations, can accelerate representation learning from visual inputs. In this paper, we leverage the perspective of meta-learning as task inference to show that latent state models can also perform meta-learning, given an appropriately defined observation space. Building on this insight, we develop meta-RL with latent dynamics (MELD), an algorithm for meta-RL from images that performs inference in a latent state model to quickly acquire new skills given observations and rewards. MELD outperforms prior meta-RL methods on several simulated image-based robotic control problems, and enables a real WidowX robotic arm to insert an Ethernet cable into new locations given a sparse task-completion signal after only 8 hours of real-world meta-training. To our knowledge, MELD is the first meta-RL algorithm trained in a real-world robotic control setting from images.
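The "meta-learning as task inference" view treats the unknown task as a hidden part of the state, to be inferred from observations and rewards. MELD does this with a learned latent state model over images. The sketch below deliberately swaps in something much simpler: a discrete Bayes filter over a hypothetical set of candidate goal locations, such as candidate ports for the cable. It uses a hypothetical noisy sparse-reward model with parameter `p_correct`. It only illustrates the inference principle and does not reproduce MELD's architecture. All names are illustrative.

```python
def update_belief(belief, position, reward, p_correct=0.9):
    """One Bayes-filter step over candidate goals after a sparse reward.

    belief: dict mapping each candidate goal to its current probability.
    position: where the agent just attempted the task.
    reward: 1 if the (noisy) completion signal fired, else 0.
    p_correct: hypothetical probability that the signal is reported correctly.
    """
    posterior = {}
    for goal, prior in belief.items():
        # P(reward = 1 | goal): high if the agent is at this goal, low otherwise.
        p_success = p_correct if goal == position else 1 - p_correct
        likelihood = p_success if reward == 1 else 1 - p_success
        posterior[goal] = prior * likelihood
    total = sum(posterior.values())
    return {goal: p / total for goal, p in posterior.items()}


def most_likely_goal(belief):
    """Act greedily with respect to the inferred task."""
    return max(belief, key=belief.get)
```

For example, the belief can start uniform over goals `"A"` to `"D"`. A failed attempt at `"A"` lowers that goal's probability. A success at `"B"` then concentrates most of the mass on `"B"`, so the greedy policy switches to it. This is the fast adaptation from sparse rewards that the abstract describes, here in miniature.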
43. Improved Supervised Training of Physics-Guided Deep Learning Image Reconstruction with Multi-Masking [PDF]
Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Mehmet Akçakaya
Abstract: Physics-guided deep learning (PG-DL) via algorithm unrolling has received significant interest for improved image reconstruction, including in MRI applications. These methods unroll an iterative optimization algorithm into a series of regularizer and data consistency units. The unrolled networks are typically trained end-to-end using a supervised approach. Current supervised PG-DL approaches use all of the available sub-sampled measurements in their data consistency units, so the network learns to fit the rest of the measurements. In this study, we propose to improve the performance and robustness of supervised training by exploiting randomness: only a subset of the available measurements is retrospectively selected for the data consistency units. This process is repeated multiple times during training with different random masks for further enhancement. Results on knee MRI show that the proposed multi-mask supervised PG-DL improves reconstruction performance compared with conventional supervised PG-DL approaches.
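The multi-masking step can be sketched as follows. For one training sample, several random subsets of the acquired k-space locations are drawn, and each subset would be the only data the data-consistency (DC) units see for that copy of the sample. The sketch rests on several assumptions:

- Acquired locations are represented as a set of integer indices over a flattened k-space.
- Selection is uniform at random.
- `keep_fraction` and `num_masks` are hypothetical knobs.

The abstract does not state the paper's actual selection distribution or mask count.

```python
import random

def multi_mask_subsets(acquired, num_masks, keep_fraction, seed=0):
    """Draw num_masks random subsets of the acquired k-space locations.

    acquired: set of integer indices of sampled k-space locations.
    keep_fraction: hypothetical share of acquired points kept for the DC units.
    """
    rng = random.Random(seed)
    pool = sorted(acquired)  # random.sample needs a sequence, not a set
    k = max(1, round(keep_fraction * len(pool)))
    return [set(rng.sample(pool, k)) for _ in range(num_masks)]


def apply_mask(kspace, mask):
    """Zero every k-space sample whose index is not in mask."""
    return [v if i in mask else 0 for i, v in enumerate(kspace)]
```

Each training sample would then be paired with every mask in the list. The network sees the same fully sampled target several times, but its DC units work from a different retrospective subset each time.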
44. Diptychs of human and machine perceptions [PDF]
Vivien Cabannes, Thomas Kerdreux, Louis Thiry
Abstract: We propose visual creations that put differences between algorithmic and human perception into perspective. We exploit saliency maps of neural networks and the visual focus of humans to create diptychs that reinterpret an original image according to both machine and human attention. Using these diptychs as a qualitative evaluation of perception, we discuss some crucial issues of current task-oriented artificial intelligence.
Note: The Chinese text is machine-translated. The cover image is a word cloud of the paper titles.