Contents
4. A Generative Adversarial Approach with Residual Learning for Dust and Scratches Artifacts Removal [PDF] Abstract
5. What Do You See? Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors [PDF] Abstract
9. The Use of AI for Thermal Emotion Recognition: A Review of Problems and Limitations in Standard Design and Data [PDF] Abstract
24. Beyond Triplet Loss: Person Re-identification with Fine-grained Difference-aware Pairwise Loss [PDF] Abstract
26. Design of Efficient Deep Learning models for Determining Road Surface Condition from Roadside Camera Images and Weather Data [PDF] Abstract
27. Towards image-based automatic meter reading in unconstrained scenarios: A robust and efficient approach [PDF] Abstract
28. Segmentation and Defect Classification of the Power Line Insulators: A Deep Learning-based Approach [PDF] Abstract
33. Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time [PDF] Abstract
35. CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation [PDF] Abstract
38. Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey [PDF] Abstract
39. Semantic Workflows and Machine Learning for the Assessment of Carbon Storage by Urban Trees [PDF] Abstract
41. Survey of explainable machine learning with visual and granular methods beyond quasi-explanations [PDF] Abstract
43. Operator-valued formulas for Riemannian Gradient and Hessian and families of tractable metrics in optimization and machine learning [PDF] Abstract
44. CCBlock: An Effective Use of Deep Learning for Automatic Diagnosis of COVID-19 Using X-Ray Images [PDF] Abstract
Abstracts
1. MonoClothCap: Towards Temporally Coherent Clothing Capture from Monocular RGB Video [PDF] Back to contents
Donglai Xiang, Fabian Prada, Chenglei Wu, Jessica Hodgins
Abstract: We present a method to capture temporally coherent dynamic clothing deformation from a monocular RGB video input. In contrast to the existing literature, our method does not require a pre-scanned personalized mesh template, and thus can be applied to in-the-wild videos. To constrain the output to a valid deformation space, we build statistical deformation models for three types of clothing: T-shirt, short pants and long pants. A differentiable renderer is utilized to align our captured shapes to the input frames by minimizing the difference in both silhouette and texture. We develop a UV texture growing method which expands the visible texture region of the clothing sequentially in order to minimize drift in deformation tracking. We also extract fine-grained wrinkle detail from the input videos by fitting the clothed surface to the normal maps estimated by a convolutional neural network. Our method produces temporally coherent reconstruction of body and clothing from monocular video. We demonstrate successful clothing capture results from a variety of challenging videos. Extensive quantitative experiments demonstrate the effectiveness of our method on metrics including body pose error and surface reconstruction error of the clothing.
2. TSV Extrusion Morphology Classification Using Deep Convolutional Neural Networks [PDF] Back to contents
Brendan Reidy, Golareh Jalilvand, Tengfei Jiang, Ramtin Zand
Abstract: In this paper, we utilize deep convolutional neural networks (CNNs) to classify the morphology of through-silicon via (TSV) extrusion in three dimensional (3D) integrated circuits (ICs). TSV extrusion is a crucial reliability concern which can deform and crack interconnect layers in 3D ICs and cause device failures. Herein, the white light interferometry (WLI) technique is used to obtain the surface profile of the extruded TSVs. We have developed a program that uses raw data obtained from WLI to create a TSV extrusion morphology dataset, including TSV images with 54x54 pixels that are labeled and categorized into three morphology classes. Four CNN architectures with different network complexities are implemented and trained for TSV extrusion morphology classification application. Data augmentation and dropout approaches are utilized to realize a balance between overfitting and underfitting in the CNN models. Results obtained show that the CNN model with optimized complexity, dropout, and data augmentation can achieve a classification accuracy comparable to that of a human expert.
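The abstract names data augmentation as one of the tools used to balance overfitting and underfitting, but it does not say which transformations were applied. As a minimal sketch, assuming simple label-preserving flips and quarter-turn rotations (an illustrative choice, not the authors' pipeline), the following shows how a 54x54 image can be augmented without changing its size. The helper names `rotate90`, `flip_horizontal` and `augment` are hypothetical.

```python
import random


def rotate90(image):
    """Rotate a 2D list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


def augment(image, rng):
    """Return a randomly rotated and optionally mirrored copy of `image`.

    Assumption: quarter turns and mirroring do not change the morphology
    class of the sample. The source does not state which augmentations
    were actually used.
    """
    out = [row[:] for row in image]
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate90(out)
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return out


rng = random.Random(0)
# Synthetic 54x54 image with unique pixel values, standing in for a WLI profile.
image = [[r * 54 + c for c in range(54)] for r in range(54)]
augmented = augment(image, rng)
```

Because the image is square, every transformed copy keeps the 54x54 input shape, so it can be fed to the same network without resizing.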
3. An embedded deep learning system for augmented reality in firefighting applications [PDF] Back to contents
Manish Bhattarai, Aura Rose Jensen-Curtis, Manel Martínez-Ramón
Abstract: Firefighting is a dynamic activity, in which numerous operations occur simultaneously. Maintaining situational awareness (i.e., knowledge of current conditions and activities at the scene) is critical to the accurate decision-making necessary for the safe and successful navigation of a fire environment by firefighters. Conversely, the disorientation caused by hazards such as smoke and extreme heat can lead to injury or even fatality. This research implements recent advancements in technology such as deep learning, point cloud and thermal imaging, and augmented reality platforms to improve a firefighter's situational awareness and scene navigation through improved interpretation of that scene. We have designed and built a prototype embedded system that can leverage data streamed from cameras built into a firefighter's personal protective equipment (PPE) to capture thermal, RGB color, and depth imagery and then deploy already developed deep learning models to analyze the input data in real time. The embedded system analyzes and returns the processed images via wireless streaming, where they can be viewed remotely and relayed back to the firefighter using an augmented reality platform that visualizes the results of the analyzed inputs and draws the firefighter's attention to objects of interest, such as doors and windows otherwise invisible through smoke and flames.
4. A Generative Adversarial Approach with Residual Learning for Dust and Scratches Artifacts Removal [PDF] Back to contents
Ionuţ Mironică
Abstract: Retouching can significantly elevate the visual appeal of photos, but many casual photographers lack the expertise to operate in a professional manner. One particularly challenging task for old photo retouching remains the removal of dust and scratches artifacts. Traditionally, this task has been completed manually with special image enhancement software and represents a tedious task that requires special know-how of photo editing applications. However, recent research utilizing Generative Adversarial Networks (GANs) has been proven to obtain good results in various automated image enhancement tasks compared to traditional methods. This motivated us to explore the use of GANs in the context of film photo editing. In this paper, we present a GAN based method that is able to remove dust and scratches errors from film scans. Specifically, residual learning is utilized to speed up the training process, as well as boost the denoising performance. An extensive evaluation of our model on a community provided dataset shows that it generalizes remarkably well, not being dependent on any particular type of image. Finally, we significantly outperform the state-of-the-art methods and software applications, providing superior results.
5. What Do You See? Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors [PDF] Back to contents
Yi-Shan Lin, Wen-Chuan Lee, Z. Berkay Celik
Abstract: EXplainable AI (XAI) methods have been proposed to interpret how a deep neural network predicts inputs through model saliency explanations that highlight the parts of the inputs deemed important to arrive a decision at a specific target. However, it remains challenging to quantify correctness of their interpretability as current evaluation approaches either require subjective input from humans or incur high computation cost with automated evaluation. In this paper, we propose backdoor trigger patterns--hidden malicious functionalities that cause misclassification--to automate the evaluation of saliency explanations. Our key observation is that triggers provide ground truth for inputs to evaluate whether the regions identified by an XAI method are truly relevant to its output. Since backdoor triggers are the most important features that cause deliberate misclassification, a robust XAI method should reveal their presence at inference time. We introduce three complementary metrics for systematic evaluation of explanations that an XAI method generates and evaluate seven state-of-the-art model-free and model-specific posthoc methods through 36 models trojaned with specifically crafted triggers using color, shape, texture, location, and size. We discovered six methods that use local explanation and feature relevance fail to completely highlight trigger regions, and only a model-free approach can uncover the entire trigger region.
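The key idea in this abstract is that a known trigger region can serve as ground truth for a saliency map. The abstract does not define its three metrics, so the sketch below uses intersection over union (IoU) between the trigger pixels and the top-scoring saliency pixels purely as an assumed stand-in. The function names `top_k_mask` and `iou` and the toy data are hypothetical.

```python
def top_k_mask(saliency, k):
    """Return the (row, col) positions of the k highest saliency scores."""
    scored = [
        (value, (r, c))
        for r, row in enumerate(saliency)
        for c, value in enumerate(row)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return {position for _, position in scored[:k]}


def iou(region_a, region_b):
    """Intersection over union of two sets of pixel positions."""
    union = region_a | region_b
    return len(region_a & region_b) / len(union) if union else 0.0


# Toy example: a 2x2 trigger stamped in the top-left corner of a 4x4 input.
trigger = {(0, 0), (0, 1), (1, 0), (1, 1)}
saliency = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.0],
    [0.0, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
highlighted = top_k_mask(saliency, k=len(trigger))
score = iou(highlighted, trigger)  # 3 shared pixels out of 5 in the union
```

A score of 1.0 would mean the explanation highlights exactly the trigger. Here one trigger pixel is missed and one unrelated pixel is highlighted, which is the kind of partial coverage the abstract reports for most of the evaluated methods.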
6. Whole page recognition of historical handwriting [PDF] Back to contents
Hans J.G.A. Dolfing
Abstract: Historical handwritten documents guard an important part of human knowledge only within reach of a few scholars and experts. Recent developments in machine learning and handwriting research have the potential of rendering this information accessible and searchable to a larger audience. To this end, we investigate an end-to-end inference approach without text localization which takes a handwritten page and transcribes its full text. No explicit character, word or line segmentation is involved in inference which is why we call this approach "segmentation free". We explore its robustness and accuracy compared to a line-by-line segmented approach based on the IAM, RODRIGO and ScribbleLens corpora, in three languages with handwriting styles spanning 400 years. We concentrate on model types and sizes which can be deployed on a hand-held or embedded device. We conclude that a whole page inference approach without text localization and segmentation is competitive.
7. Curriculum Learning with Diversity for Supervised Computer Vision Tasks [PDF] Back to contents
Petru Soviany
Abstract: Curriculum learning techniques are a viable solution for improving the accuracy of automatic models, by replacing the traditional random training with an easy-to-hard strategy. However, the standard curriculum methodology does not automatically provide improved results, but it is constrained by multiple elements like the data distribution or the proposed model. In this paper, we introduce a novel curriculum sampling strategy which takes into consideration the diversity of the training data together with the difficulty of the inputs. We determine the difficulty using a state-of-the-art estimator based on the human time required for solving a visual search task. We consider this kind of difficulty metric to be better suited for solving general problems, as it is not based on certain task-dependent elements, but more on the context of each image. We ensure the diversity during training, giving higher priority to elements from less visited classes. We conduct object detection and instance segmentation experiments on Pascal VOC 2007 and Cityscapes data sets, surpassing both the randomly-trained baseline and the standard curriculum approach. We prove that our strategy is very efficient for unbalanced data sets, leading to faster convergence and more accurate results, when other curriculum-based strategies fail.
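The sampling strategy described here combines two signals: easier inputs come first, and classes that have been visited less often get higher priority. The abstract gives no formula, so the weighting below is an assumption made only for illustration. Difficulty scores are taken as given numbers in [0, 1] rather than produced by the human-time estimator the authors mention, and `sample_batch` and its `diversity` parameter are hypothetical names.

```python
import random
from collections import Counter


def sample_batch(examples, batch_size, rng, diversity=1.0):
    """Draw a batch without replacement, favouring easy and rarely seen classes.

    examples: list of (example_id, class_label, difficulty) with difficulty in [0, 1].
    The weight (1 - difficulty) / (1 + diversity * visits) is an illustrative
    choice, not the formula from the paper.
    """
    visits = Counter()
    remaining = list(examples)
    batch = []
    for _ in range(min(batch_size, len(remaining))):
        weights = [
            (1.0 - difficulty + 1e-6) / (1.0 + diversity * visits[label])
            for _, label, difficulty in remaining
        ]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen = remaining.pop(pick)
        visits[chosen[1]] += 1  # later draws from this class are down-weighted
        batch.append(chosen)
    return batch


rng = random.Random(0)
# Unbalanced toy data: eight "car" examples and two "bicycle" examples.
examples = [(i, "car", 0.1 * (i % 5)) for i in range(8)]
examples += [(8, "bicycle", 0.3), (9, "bicycle", 0.6)]
batch = sample_batch(examples, batch_size=4, rng=rng)
```

Setting `diversity=0.0` reduces the sketch to a plain easy-first sampler, which makes it simple to compare the two behaviours on an unbalanced data set.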
8. Detection Of Concrete Cracks using Dual-channel Deep Convolutional Network [PDF] Back to contents
Babloo Kumar, Sayantari Ghosh
Abstract: Due to cyclic loading and fatigue stress cracks are generated, which affect the safety of any civil infrastructure. Nowadays machine vision is being used to assist us for appropriate maintenance, monitoring and inspection of concrete structures by partial replacement of human-conducted onsite inspections. The current study proposes a crack detection method based on deep convolutional neural network (CNN) for detection of concrete cracks without explicitly calculating the defect features. In the course of the study, a database of 3200 labelled images with concrete cracks has been created, where the contrast, lighting conditions, orientations and severity of the cracks were extremely variable. In this paper, starting from a deep CNN trained with these images of 256 x 256 pixel-resolution, we have gradually optimized the model by identifying the difficulties. Using an augmented dataset, which takes into account the variations and degradations compatible to drone videos, like, random zooming, rotation and intensity scaling and exhaustive ablation studies, we have designed a dual-channel deep CNN which shows high accuracy (~ 92.25%) as well as robustness in finding concrete cracks in realistic situations. The model has been tested on the basis of performance and analyzed with the help of feature maps, which establishes the importance of the dual-channel structure.
9. The Use of AI for Thermal Emotion Recognition: A Review of Problems and Limitations in Standard Design and Data [PDF] Back to contents
Catherine Ordun, Edward Raff, Sanjay Purushotham
Abstract: With the increased attention on thermal imagery for Covid-19 screening, the public sector may believe there are new opportunities to exploit thermal as a modality for computer vision and AI. Thermal physiology research has been ongoing since the late nineties. This research lies at the intersections of medicine, psychology, machine learning, optics, and affective computing. We will review the known factors of thermal vs. RGB imaging for facial emotion recognition. But we also propose that thermal imagery may provide a semi-anonymous modality for computer vision, over RGB, which has been plagued by misuse in facial recognition. However, the transition to adopting thermal imagery as a source for any human-centered AI task is not easy and relies on the availability of high fidelity data sources across multiple demographics and thorough validation. This paper takes the reader on a short review of machine learning in thermal FER and the limitations of collecting and developing thermal FER data for AI training. Our motivation is to provide an introductory overview into recent advances for thermal FER and stimulate conversation about the limitations in current datasets.
10. Heuristic Rank Selection with Progressively Searching Tensor Ring Network [PDF]
Nannan Li, Yu Pan, Yaran Chen, Zixiang Ding, Dongbin Zhao, Zenglin Xu
Abstract: Recently, Tensor Ring Networks (TRNs) have been applied in deep networks, achieving remarkable success in compression ratio and accuracy. Although rank strongly affects the performance of TRNs, it is seldom studied in previous work and is usually set to equal values in experiments. Moreover, there is no heuristic method for choosing the rank, and finding an appropriate rank by enumeration is extremely time-consuming. Interestingly, we discover that some of the rank elements are sensitive and usually aggregate in a certain region, which we call the interest region. Based on this phenomenon, we propose a novel progressive genetic algorithm named Progressively Searching Tensor Ring Network Search (PSTRN), which can find the optimal rank precisely and efficiently. Through an evolutionary phase and a progressive phase, PSTRN converges to the interest region quickly and achieves good performance. Experimental results show that PSTRN significantly reduces the complexity of rank search compared with enumeration. Furthermore, our method is validated on public benchmarks such as MNIST, CIFAR10/100 and HMDB51, achieving state-of-the-art performance.
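To make the role of rank concrete, the following minimal sketch counts the parameters of a tensor ring decomposition and the size of an exhaustive rank search. It assumes the standard tensor ring format, in which core k has shape (r_k, n_k, r_{k+1}) and the last rank wraps around to the first. The function names, mode sizes and candidate counts are illustrative and are not taken from the paper; the sketch does not implement PSTRN itself.

```python
from math import prod


def tensor_ring_params(mode_sizes, ranks):
    """Parameter count of a tensor ring: sum over cores of r_k * n_k * r_{k+1}."""
    if len(mode_sizes) != len(ranks):
        raise ValueError("need exactly one rank element per core")
    d = len(mode_sizes)
    # The ring closes: the core after the last one is the first one.
    return sum(ranks[k] * mode_sizes[k] * ranks[(k + 1) % d] for k in range(d))


def enumeration_size(num_rank_elements, num_candidates):
    """Number of rank settings an exhaustive search would have to evaluate."""
    return num_candidates ** num_rank_elements


# Hypothetical example: a 4 x 8 x 8 x 4 weight tensor (1024 dense entries).
mode_sizes = [4, 8, 8, 4]
dense = prod(mode_sizes)
equal = tensor_ring_params(mode_sizes, [3, 3, 3, 3])   # all rank elements equal
mixed = tensor_ring_params(mode_sizes, [2, 4, 3, 2])   # per-element ranks
print(dense, equal, mixed)
print(enumeration_size(len(mode_sizes), 10))
```

Even with only four rank elements and ten candidate values each, enumeration would mean evaluating 10,000 rank settings, which is the cost a guided search tries to avoid.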
11. Improving Point Cloud Semantic Segmentation by Learning 3D Object Proposal Generation [PDF]
Ozan Unal, Luc Van Gool, Dengxin Dai
Abstract: Point cloud semantic segmentation plays an essential role in autonomous driving, providing vital information about drivable surfaces and nearby objects that can aid higher-level tasks such as path planning and collision avoidance. While current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes, they show a significant drop in performance for underrepresented classes that share similar geometric features. We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task. Through multitask training, the shared feature representation of the network is guided to be aware of per-class detection features that help differentiate geometrically similar classes. We additionally provide a pipeline that uses DASS to generate high-recall proposals for existing two-stage detectors and demonstrate that the added supervisory signal can be used to improve 3D orientation estimation. Extensive experiments on both the SemanticKITTI and KITTI object datasets show that DASS can improve 3D semantic segmentation results of geometrically similar classes up to 37.8% IoU in image FOV while maintaining high-precision BEV detection results.
12. A survey on Kornia: an Open Source Differentiable Computer Vision Library for PyTorch [PDF]
E. Riba, D. Mishkin, J. Shi, D. Ponsa, F. Moreno-Noguer, G. Bradski
Abstract: This work presents Kornia, an open source computer vision library built upon a set of differentiable routines and modules that aims to solve generic computer vision problems. The package uses PyTorch as its main backend, not only for efficiency but also to take advantage of the reverse auto-differentiation engine to define and compute the gradients of complex functions. Inspired by OpenCV, Kornia is composed of a set of modules containing operators that can be integrated into neural networks to train models to perform a wide range of operations, including image transformations, camera calibration, epipolar geometry, and low-level image processing techniques such as filtering and edge detection. These operators work directly on high-dimensional tensor representations on graphics processing units, yielding faster systems. Examples of classical vision problems implemented using our framework are provided, including a benchmark comparison with existing vision libraries.
13. OpenREALM: Real-time Mapping for Unmanned Aerial Vehicles [PDF]
Alexander Kern, Markus Bobbe, Yogesh Khedar, Ulf Bestmann
Abstract: This paper presents OpenREALM, a real-time mapping framework for Unmanned Aerial Vehicles (UAVs). A camera attached to the onboard computer of a moving UAV is used to acquire high-resolution image mosaics of a targeted area of interest. Different modes of operation allow OpenREALM to perform simple stitching under the assumption of an approximately planar ground, or to fully recover complex 3D surface information and extract both elevation maps and geometrically corrected orthophotos. Additionally, the global position of the UAV is used to georeference the data. In all modes, the incremental progress of the resulting map can be viewed live by an operator on the ground. The resulting up-to-date surface information will benefit a variety of UAV applications. For the benefit of the community, source code is public at this https URL.
14. Spatial-Temporal Block and LSTM Network for Pedestrian Trajectories Prediction [PDF]
Xiong Dan
Abstract: Pedestrian trajectory prediction is critical for avoiding collisions in autonomous driving. However, this prediction is challenging because of social forces and cluttered scenes. Such human-human and human-space interactions lead to many socially plausible trajectories. In this paper, we propose a novel LSTM-based algorithm. We tackle the problem by considering both the static scene and the pedestrians, combining Graph Convolutional Networks and Temporal Convolutional Networks to extract features from pedestrians. Each pedestrian in the scene is regarded as a node, and we obtain the relationship between each node and its neighbors by graph embedding. An LSTM encodes these relationships so that our model predicts the trajectories of all nodes in crowd scenarios simultaneously. To effectively predict multiple possible future trajectories, we further introduce a Spatio-Temporal Convolutional Block (ST-Block) to make the network flexible. Experimental results on two public datasets, ETH and UCY, demonstrate the effectiveness of the proposed ST-Block, and we achieve state-of-the-art results in human trajectory prediction.
15. Self-Supervised Learning of Non-Rigid Residual Flow and Ego-Motion [PDF]
Ivan Tishchenko, Sandro Lombardi, Martin R. Oswald, Marc Pollefeys
Abstract: Most current scene flow methods model scene flow as a per-point translation vector without differentiating between the static and dynamic components of 3D motion. In this work we present an alternative method for end-to-end scene flow learning by joint estimation of non-rigid residual flow and ego-motion flow for dynamic 3D scenes. We propose to learn the relative rigid transformation from a pair of point clouds, followed by an iterative refinement. We then learn the non-rigid flow from the transformed inputs, from which the rigid part of the flow has been removed. Furthermore, we extend the supervised framework with self-supervisory signals based on the temporal consistency of a point cloud sequence. Our solution allows both training in a supervised mode complemented by self-supervisory loss terms and training in a fully self-supervised mode. We demonstrate that decomposing scene flow into non-rigid flow and ego-motion flow, together with the introduction of the self-supervisory signals, allows us to outperform the current state-of-the-art supervised methods.
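The decomposition described in the abstract can be illustrated with a small numeric sketch: the total flow of a point is the flow induced by a rigid transform plus a per-point non-rigid residual. The code assumes, for simplicity, a rotation about the z-axis followed by a translation. The function names and values are hypothetical, and the transform and residual are set by hand here rather than learned as in the paper.

```python
import math


def rigid_flow(point, yaw, translation):
    """Flow induced on a point by a rigid transform (rotation about z, then translation)."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    moved = (c * x - s * y + translation[0],
             s * x + c * y + translation[1],
             z + translation[2])
    return tuple(m - p for m, p in zip(moved, point))


def scene_flow(point, yaw, translation, residual):
    """Total flow = ego-motion (rigid) flow + non-rigid residual flow."""
    ego = rigid_flow(point, yaw, translation)
    return tuple(e + r for e, r in zip(ego, residual))


# Hypothetical values: a static point has zero residual, a moving one does not.
yaw, t = math.pi / 2, (1.0, 0.0, 0.0)
static_flow = scene_flow((1.0, 0.0, 0.0), yaw, t, (0.0, 0.0, 0.0))
moving_flow = scene_flow((1.0, 0.0, 0.0), yaw, t, (0.0, 0.5, 0.0))
print(static_flow, moving_flow)
```

For a static point the residual is zero and the rigid term explains all of the motion; only independently moving points need a non-zero residual.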
16. Deep N-ary Error Correcting Output Codes [PDF]
Hao Zhang, Joey Tianyi Zhou, Tianying Wang, Ivor W. Tsang, Rick Siow Mong Goh
Abstract: Ensemble learning consistently improves the performance of multi-class classification by aggregating a series of base classifiers. To this end, data-independent ensemble methods such as Error Correcting Output Codes (ECOC) attract increasing attention because they are easy to implement and parallelize. Specifically, traditional ECOCs and their general extension, N-ary ECOC, decompose the original multi-class classification problem into a series of independent, simpler classification subproblems. Unfortunately, integrating ECOCs, especially N-ary ECOC, with deep neural networks (termed deep N-ary ECOC) is not straightforward and has not yet been fully exploited in the literature, owing to the high expense of training base learners. To facilitate the training of N-ary ECOC with deep learning base learners, we propose three different variants of parameter-sharing architectures for deep N-ary ECOC. To verify the generalization ability of deep N-ary ECOC, we conduct experiments by varying the backbone with different deep neural network architectures for both image and text classification tasks. Furthermore, extensive ablation studies on deep N-ary ECOC show its superior performance over other deep data-independent ensemble methods.
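As a rough illustration of how an N-ary code decomposes a multi-class problem, the sketch below builds a random code matrix (one codeword per class, one column per base learner) and decodes a vector of base-learner outputs by nearest Hamming distance. The random construction, the Hamming decoder and all names and sizes are assumptions made for illustration; the abstract does not specify the coding or decoding scheme, and no base learners are trained here.

```python
import random


def make_code_matrix(num_classes, num_symbols, length, seed=0):
    """Random N-ary code matrix: one codeword (row) per class, redrawn until rows are distinct."""
    if num_symbols ** length < num_classes:
        raise ValueError("not enough distinct codewords for this many classes")
    rng = random.Random(seed)
    while True:
        rows = [tuple(rng.randrange(num_symbols) for _ in range(length))
                for _ in range(num_classes)]
        if len(set(rows)) == num_classes:
            return rows


def hamming(a, b):
    """Number of positions at which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(code_matrix, predicted_symbols):
    """Return the class whose codeword is closest to the base learners' outputs."""
    distances = [hamming(row, predicted_symbols) for row in code_matrix]
    return distances.index(min(distances))


# Hypothetical setup: 10 classes, N = 4 meta-classes per column, 12 base learners.
codes = make_code_matrix(num_classes=10, num_symbols=4, length=12)
# Column j gives the relabelling of the 10 classes used to train base learner j.
column_0_labels = [row[0] for row in codes]
print(column_0_labels)
print(decode(codes, codes[7]))
```

Each column is an independent N-class subproblem, so the base learners can be trained in parallel; in this sketch a prediction is recovered only at decoding time.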
17. Performance Indicator in Multilinear Compressive Learning [PDF]
Dat Thanh Tran, Moncef Gabbouj, Alexandros Iosifidis
Abstract: Recently, the Multilinear Compressive Learning (MCL) framework was proposed to efficiently optimize the sensing and learning steps when working with multidimensional signals, i.e. tensors. In Compressive Learning in general, and in MCL in particular, the number of compressed measurements captured by a compressive sensing device characterizes the storage requirement or the bandwidth requirement for transmission. This number, however, does not completely characterize the learning performance of an MCL system. In this paper, we analyze the relationship between the input signal resolution, the number of compressed measurements and the learning performance of MCL. Our empirical analysis shows that the reconstruction error obtained at the initialization step of MCL strongly correlates with the learning performance, and can thus act as a good indicator for efficiently characterizing the learning performance obtained from different sensor configurations without optimizing the entire system.
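The point that the measurement count alone does not pin down a sensor configuration can be seen from simple arithmetic. The sketch below assumes mode-wise (separable) sensing, where a tensor of shape (n_1, n_2, n_3) is multiplied along mode k by a matrix of shape (m_k, n_k), giving a measurement of shape (m_1, m_2, m_3). The function name and the example shapes are hypothetical and not taken from the paper.

```python
from math import prod


def compressed_shape_stats(input_shape, measurement_shape):
    """Measurement count and operator sizes for mode-wise (separable) sensing."""
    if len(input_shape) != len(measurement_shape):
        raise ValueError("one measurement size per tensor mode is required")
    num_measurements = prod(measurement_shape)
    # One (m_k x n_k) sensing matrix per mode.
    separable_params = sum(m * n for m, n in zip(measurement_shape, input_shape))
    # A single matrix acting on the vectorized signal, for comparison.
    vectorized_params = num_measurements * prod(input_shape)
    return num_measurements, separable_params, vectorized_params


# Hypothetical sensor configurations producing the same number of measurements.
for shape in [(8, 8, 3), (16, 12, 1), (12, 16, 1)]:
    print(shape, compressed_shape_stats((32, 32, 3), shape))
```

All three configurations yield 192 measurements and therefore the same storage and bandwidth cost, yet they compress the modes differently, which is why a separate indicator of learning performance is useful.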
18. Frame-wise Cross-modal Match for Video Moment Retrieval [PDF]
Haoyu Tang, Jihua Zhu, Meng Liu, Zan Gao, Zhiyong Cheng
Abstract: Video moment retrieval aims to retrieve a golden moment in a video for a given natural language query. The main challenges of this task include 1) accurately localizing the relevant moment (i.e., its start time and end time) in an untrimmed video stream, and 2) bridging the semantic gap between the textual query and the video contents. To tackle these problems, one mainstream approach is to generate a multimodal feature vector for the target query and video frames (e.g., by concatenation) and then apply a regression approach to the multimodal feature vector for boundary detection. Although some progress has been achieved by this approach, we argue that such methods do not capture the cross-modal interactions between the query and video frames well. In this paper, we propose an Attentive Cross-modal Relevance Matching (ACRM) model, which predicts the temporal boundaries based on interaction modeling between the two modalities. In addition, an attention module is introduced to automatically assign higher weights to query words with richer semantic cues, which are considered more important for finding relevant video contents. Another contribution is an additional predictor that utilizes the internal frames during model training to improve the localization accuracy. Extensive experiments on two public datasets demonstrate the superiority of our method over several state-of-the-art methods.
19. Conditional Sequential Modulation for Efficient Global Image Retouching [PDF]
Jingwen He, Yihao Liu, Yu Qiao, Chao Dong
Abstract: Photo retouching aims at enhancing the aesthetic visual quality of images that suffer from photographic defects such as over/under-exposure, poor contrast, and inharmonious saturation. In practice, photo retouching can be accomplished by a series of image processing operations. In this paper, we investigate some commonly used retouching operations and show mathematically that these pixel-independent operations can be approximated or formulated by multi-layer perceptrons (MLPs). Based on this analysis, we propose an extremely lightweight framework, Conditional Sequential Retouching Network (CSRNet), for efficient global image retouching. CSRNet consists of a base network and a condition network. The base network acts like an MLP that processes each pixel independently, and the condition network extracts the global features of the input image to generate a condition vector. To realize retouching operations, we modulate the intermediate features using Global Feature Modulation (GFM), whose parameters are derived from the condition vector. Benefiting from the use of $1\times1$ convolution, CSRNet contains fewer than 37k trainable parameters, orders of magnitude fewer than existing learning-based methods. Extensive experiments show that our method achieves state-of-the-art performance on the benchmark MIT-Adobe FiveK dataset, both quantitatively and qualitatively. Code is available at this https URL.
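The idea of a per-pixel MLP steered by a global condition can be sketched in a few lines of plain Python. The sketch makes several assumptions that the abstract does not confirm: the modulation is taken to be a channel-wise scale and shift, the weights are hand-picked identity matrices rather than learned, and the "condition network" is replaced by a simple global mean. All names are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weight, bias):
    """Dense layer on one pixel; applied to every pixel it equals a 1x1 convolution."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weight, bias)]


def retouch(image, w1, b1, w2, b2, scale, shift):
    """Per-pixel MLP whose hidden features are modulated by global scale/shift vectors."""
    out = []
    for pixel in image:
        hidden = linear(pixel, w1, b1)
        # Assumed form of the modulation: channel-wise scale and shift.
        hidden = [h * s + t for h, s, t in zip(hidden, scale, shift)]
        out.append(linear(relu(hidden), w2, b2))
    return out


# Hypothetical 2-pixel RGB "image" and identity weights, for illustration only.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [0.0, 0.0, 0.0]
image = [[0.2, 0.4, 0.6], [0.1, 0.1, 0.1]]
# Stand-in for the condition network: a global statistic of the whole image.
mean = sum(sum(p) for p in image) / (3 * len(image))
scale = [1.0 + (0.5 - mean)] * 3   # brighten dark images, darken bright ones
result = retouch(image, identity, zeros, identity, zeros, scale, zeros)
print(result)
```

Because each output pixel depends only on its own input value and on the shared condition, the operation is global and pixel-independent, which is what allows the parameter count to stay small.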
20. Visual Methods for Sign Language Recognition: A Modality-Based Review
Bassem Seddik, Najoua Essoukri Ben Amara
Abstract: Sign language visual recognition from continuous multi-modal streams is still one of the most challenging fields. Recent advances in human action recognition are exploiting the rise of GPU-based learning from massive data, and are getting closer to human-like performance. They are thus well placed to create interactive services for the deaf and hearing-impaired communities, a population that is expected to grow considerably in the years to come. This paper aims at reviewing the human action recognition literature with sign-language visual understanding as its scope. The methods analyzed will be mainly organized according to the different types of unimodal inputs exploited, their relative multi-modal combinations and pipeline steps. In each section, we will detail and compare the related datasets and approaches, then distinguish the still open contribution paths suitable for the creation of sign language related services. Special attention will be paid to the approaches and commercial solutions handling facial expressions and continuous signing.
21. Neural Face Models for Example-Based Visual Speech Synthesis
Wolfgang Paier, Anna Hilsmann, Peter Eisert
Abstract: Creating realistic animations of human faces with computer graphic models is still a challenging task. It is often solved either with tedious manual work or motion capture based techniques that require specialised and costly hardware. Example based animation approaches circumvent these problems by re-using captured data of real people. This data is split into short motion samples that can be looped or concatenated in order to create novel motion sequences. The obvious advantages of this approach are the simplicity of use and the high realism, since the data exhibits only real deformations. Rather than tuning weights of a complex face rig, the animation task is performed on a higher level by arranging typical motion samples in a way such that the desired facial performance is achieved. Two difficulties with example based approaches, however, are high memory requirements as well as the creation of artefact-free and realistic transitions between motion samples. We solve these problems by combining the realism and simplicity of example-based animations with the advantages of neural face models. Our neural face model is capable of synthesising high quality 3D face geometry and texture according to a compact latent parameter vector. This latent representation reduces memory requirements by a factor of 100 and helps creating seamless transitions between concatenated motion samples. In this paper, we present a marker-less approach for facial motion capture based on multi-view video. Based on the captured data, we learn a neural representation of facial expressions, which is used to seamlessly concatenate facial performances during the animation procedure. We demonstrate the effectiveness of our approach by synthesising mouthings for Swiss-German sign language based on viseme query sequences.
22. SAMOT: Switcher-Aware Multi-Object Tracking and Still Another MOT Measure
Weitao Feng, Zhihao Hu, Baopu Li, Weihao Gan, Wei Wu, Wanli Ouyang
Abstract: Multi-Object Tracking (MOT) is a popular topic in computer vision. However, the identity issue, i.e., an object being wrongly associated with another object of a different identity, still remains a challenging problem. To address it, switchers, i.e., confusing targets that may cause identity issues, should be focused on. Based on this motivation, this paper proposes a novel switcher-aware framework for multi-object tracking, which consists of a Spatial Conflict Graph model (SCG) and Switcher-Aware Association (SAA). The SCG eliminates spatial switchers within one frame by building a conflict graph and working out the optimal subgraph. The SAA utilizes additional information from potential temporal switchers across frames, enabling more accurate data association. Besides, we propose a new MOT evaluation measure, Still Another IDF score (SAIDF), aiming to focus more on identity issues. This new measure may overcome some problems of the previous measures and provide better insight into identity issues in MOT. Finally, the proposed framework is tested under both the traditional measures and the new measure we propose. Extensive experiments show that our method achieves competitive results on all measures.
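The identity issue that the SAMOT abstract describes, an object being associated with a track of a different identity, can be made concrete by counting identity switches over a sequence. The sketch below is only a generic illustration of that notion; it is not the SAIDF measure, whose definition the abstract does not give, and the function name and input layout are assumptions.

```python
# Minimal sketch: count how often a ground-truth object changes its assigned track id.
# This is NOT the SAIDF measure from the paper; names and data layout are hypothetical.

def count_identity_switches(frames):
    """frames: list of dicts mapping ground-truth id -> predicted track id, one dict per frame."""
    last_seen = {}
    switches = 0
    for assignment in frames:
        for gt_id, track_id in assignment.items():
            # A switch occurs when the same object is matched to a different track than before.
            if gt_id in last_seen and last_seen[gt_id] != track_id:
                switches += 1
            last_seen[gt_id] = track_id
    return switches


# Two objects cross paths in the third frame and the tracker swaps their track ids.
example_frames = [
    {1: "a", 2: "b"},
    {1: "a", 2: "b"},
    {1: "b", 2: "a"},
]
example_switches = count_identity_switches(example_frames)
```

The two objects in the last frame are exactly the kind of mutually confusing targets that the abstract calls switchers.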
23. Learning Image Labels On-the-fly for Training Robust Classification Models
Xiaosong Wang, Ziyue Xu, Dong Yang, Leo Tam, Holger Roth, Daguang Xu
Abstract: Current deep learning paradigms largely benefit from the tremendous amount of annotated data. However, the quality of the annotations often varies among labelers. Multi-observer studies have been conducted to study these annotation variances (by labeling the same data multiple times) and their effects on critical applications like medical image analysis. This process indeed adds an extra burden to the already tedious annotation work that usually requires professional training and expertise in the specific domains. On the other hand, automated annotation methods based on NLP algorithms have recently shown promise as a reasonable alternative, relying on the existing diagnostic reports of those images that are widely available in the clinical system. Compared to human labelers, different algorithms provide labels with varying qualities that are even noisier. In this paper, we show how noisy annotations (e.g., from different algorithm-based labelers) can be utilized together and mutually benefit the learning of classification tasks. Specifically, the concept of attention-on-label is introduced to sample better label sets on-the-fly as the training data. A meta-training based label-sampling module is designed to attend to the labels that benefit the model learning the most through additional back-propagation processes. We apply the attention-on-label scheme on the classification task of a synthetic noisy CIFAR-10 dataset to prove the concept, and then demonstrate superior results (3-5% increase on average in multiple disease classification AUCs) on the chest x-ray images from a hospital-scale dataset (MIMIC-CXR) and hand-labeled dataset (OpenI) in comparison to regular training paradigms.
24. Beyond Triplet Loss: Person Re-identification with Fine-grained Difference-aware Pairwise Loss
Cheng Yan, Guansong Pang, Xiao Bai, Jun Zhou, Lin Gu
Abstract: Person Re-IDentification (ReID) aims at re-identifying persons from different viewpoints across multiple cameras. Capturing the fine-grained appearance differences is often the key to accurate person ReID, because many identities can be differentiated only when looking into these fine-grained differences. However, most state-of-the-art person ReID approaches, typically driven by a triplet loss, fail to effectively learn the fine-grained features as they are focused more on differentiating large appearance differences. To address this issue, we introduce a novel pairwise loss function that enables ReID models to learn the fine-grained features by adaptively enforcing an exponential penalization on the images of small differences and a bounded penalization on the images of large differences. The proposed loss is generic and can be used as a plugin to replace the triplet loss to significantly enhance different types of state-of-the-art approaches. Experimental results on four benchmark datasets show that the proposed loss substantially outperforms a number of popular loss functions by large margins; and it also enables significantly improved data efficiency.
25. PennSyn2Real: Training Object Recognition Models without Human Labeling
Ty Nguyen, Ian D. Miller, Avi Cohen, Dinesh Thakur, Shashank Prasad, Arjun Guru, Camillo J. Taylor, Pratik Chaudrahi, Vijay Kumar
Abstract: Scalability is a critical problem in generating training images for deep learning models. We propose PennSyn2Real - a photo-realistic synthetic dataset with more than 100,000 4K images of more than 20 types of micro aerial vehicles (MAV) that can be used to generate an arbitrary number of training images for MAV detection and classification. Our data generation framework bootstraps chroma-keying, a matured cinematography technique with a motion tracking system, providing artifact-free and curated annotated images where object orientations and lighting are controlled. This framework is easy to set up and can be applied to a broad range of objects, reducing the gap between synthetic and real-world data. We demonstrate that CNNs trained on the synthetic data perform on par with those trained on real-world data in both semantic segmentation and object detection setups.
26. Design of Efficient Deep Learning models for Determining Road Surface Condition from Roadside Camera Images and Weather Data
Juan Carrillo, Mark Crowley, Guangyuan Pan, Liping Fu
Abstract: Road maintenance during the Winter season is a safety-critical and resource-demanding operation. One of its key activities is determining road surface condition (RSC) in order to prioritize roads and allocate cleaning efforts such as plowing or salting. Two conventional approaches for determining RSC are: visual examination of roadside camera images by trained personnel and patrolling the roads to perform on-site inspections. However, with more than 500 cameras collecting images across Ontario, visual examination becomes a resource-intensive activity, difficult to scale especially during periods of snowstorms. This paper presents the results of a study focused on improving the efficiency of road maintenance operations. We use multiple Deep Learning models to automatically determine RSC from roadside camera images and weather variables, extending previous research where similar methods have been used to deal with the problem. The dataset we use was collected during the 2017-2018 Winter season from 40 stations connected to the Ontario Road Weather Information System (RWIS); it includes 14,000 labeled images and 70,000 weather measurements. We train and evaluate the performance of seven state-of-the-art models from the Computer Vision literature, including the recent DenseNet, NASNet, and MobileNet. Moreover, by following systematic ablation experiments we adapt previously published Deep Learning models and reduce their number of parameters to about 1.3% of their original parameter count, and by integrating observations from weather variables the models are able to better ascertain RSC under poor visibility conditions.
27. Towards image-based automatic meter reading in unconstrained scenarios: A robust and efficient approach
Rayson Laroca, Alessandra B. Araujo, Luiz A. Zanlorensi, Eduardo C. de Almeida, David Menotti
Abstract: Existing approaches for image-based Automatic Meter Reading (AMR) have been evaluated on images captured in well-controlled scenarios. However, real-world meter reading presents unconstrained scenarios that are way more challenging due to dirt, various lighting conditions, scale variations, in-plane and out-of-plane rotations, among other factors. In this work, we present an end-to-end approach for AMR focusing on unconstrained scenarios. Our main contribution is the insertion of a new stage in the AMR pipeline, called corner detection and counter classification, which enables the counter region to be rectified -- as well as the rejection of illegible/faulty meters -- prior to the recognition stage. We also introduce a publicly available dataset, called Copel-AMR, that contains 12,500 meter images acquired in the field by the service company's employees themselves, including 2,500 images of faulty meters or cases where the reading is illegible due to occlusions. Experimental evaluation demonstrates that the proposed system outperforms six baselines in terms of recognition rate while still being quite efficient. Moreover, as very few reading errors are tolerated in real-world applications, we show that our AMR system achieves impressive recognition rates (i.e., > 99%) when rejecting readings made with lower confidence values.
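The last point of the meter-reading abstract, trading coverage for accuracy by rejecting low-confidence readings, can be sketched as a simple threshold rule. The abstract does not say how confidence is aggregated or which threshold is used, so the minimum-over-digits rule, the 0.9 default, and the function name below are illustrative assumptions only.

```python
# Minimal sketch of confidence-based rejection for a recognized meter reading.
# Assumptions: one confidence per digit, the weakest digit decides, threshold 0.9 is arbitrary.

def accept_reading(digits, confidences, threshold=0.9):
    """Return the reading as a string, or None when the weakest digit is below the threshold."""
    if not digits or len(digits) != len(confidences):
        raise ValueError("digits and confidences must be non-empty and of equal length")
    if min(confidences) < threshold:
        return None  # rejected: hand over to a human or request a new image
    return "".join(str(d) for d in digits)


confident = accept_reading([0, 4, 2, 7], [0.99, 0.98, 0.97, 0.99])
doubtful = accept_reading([0, 4, 2, 7], [0.99, 0.55, 0.97, 0.99])
```

Raising the threshold rejects more readings but leaves fewer errors among the accepted ones, which is the trade-off the abstract refers to.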
28. Segmentation and Defect Classification of the Power Line Insulators: A Deep Learning-based Approach
Arman Alahyari, Anton Hinneck, Rahim Tariverdi, David Pozo
Abstract: The power transmission network physically connects the power generators to the electric consumers, extending over hundreds of kilometers. There are many components in the transmission infrastructure that require proper inspection to guarantee flawless performance and reliable delivery, which, if done manually, can be very costly and time-consuming. One of the essential components is the insulator, whose failure could cause the interruption of the entire transmission line or widespread power failure. Automated fault detection of insulators could significantly decrease inspection time and its related cost. Recently, several works based on convolutional neural networks have been proposed to deal with the issue mentioned above. However, the existing studies in the literature focus on specific types of fault for insulators. Thus, in this study, we introduce a two-stage model in which we first segment insulators from the background images and then classify their state into four different categories, namely: healthy, broken, burned, and missing cap. The test results show that the proposed approach can realize the effective segmentation of insulators and achieve high accuracy in detecting several types of faults.
29. Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations [PDF]
Alex Wong, Mukund Mundhra, Stefano Soatto
Abstract: We study the effect of adversarial perturbations of images on the estimates of disparity by deep learning models trained for stereo. We show that imperceptible additive perturbations can significantly alter the disparity map, and correspondingly the perceived geometry of the scene. These perturbations not only affect the specific model they are crafted for, but transfer to models with different architecture, trained with different loss functions. We show that, when used for adversarial data augmentation, our perturbations result in trained models that are more robust, without sacrificing overall accuracy of the model. This is unlike what has been observed in image classification, where adding the perturbed images to the training set makes the model less vulnerable to adversarial perturbations, but to the detriment of overall accuracy. We test our method using the most recent stereo networks and evaluate their performance on public benchmark datasets.
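Summary: The paper studies how adversarial image perturbations affect disparity estimates from deep stereo networks. Imperceptible additive perturbations can significantly alter the disparity map and the perceived scene geometry, and they transfer to models with different architectures and loss functions. Used for adversarial data augmentation, they yield more robust models without sacrificing overall accuracy, unlike what has been observed in image classification. The method is tested on recent stereo networks and public benchmark datasets.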
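As a rough illustration of a bounded additive perturbation, the sketch below applies a one-step sign-gradient perturbation to a toy linear regressor in pure Python. The toy model, the function names, and the epsilon value are assumptions made for illustration; this is not the attack or the stereo networks used in the paper.

```python
def predict(weights, pixels):
    """Toy stand-in for a network: a linear read-out of the input pixels."""
    return sum(w * p for w, p in zip(weights, pixels))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_gradient_perturbation(weights, pixels, target, epsilon):
    """One-step additive perturbation bounded by epsilon per pixel.

    For the squared error (predict - target)^2, the gradient with respect
    to pixel i is 2 * (predict - target) * weights[i]; stepping along its
    sign increases the error.
    """
    residual = predict(weights, pixels) - target
    return [p + epsilon * sign(2.0 * residual * w) for w, p in zip(weights, pixels)]


weights = [0.5, -1.0, 2.0]   # hypothetical model parameters
pixels = [0.2, 0.4, 0.6]     # hypothetical input intensities
target = 0.5                 # hypothetical ground-truth value

adversarial = sign_gradient_perturbation(weights, pixels, target, epsilon=0.01)
clean_error = (predict(weights, pixels) - target) ** 2
adversarial_error = (predict(weights, adversarial) - target) ** 2
```

Each pixel moves by at most epsilon, yet the squared error of the toy model grows, which is the basic effect the abstract describes at a much larger scale.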
30. Deep Learning Applied to Chest X-Rays: Exploiting and Preventing Shortcuts [PDF]
Sarah Jabbour, David Fouhey, Ella Kazerooni, Michael W. Sjoding, Jenna Wiens
Abstract: While deep learning has shown promise in improving the automated diagnosis of disease based on chest X-rays, deep networks may exhibit undesirable behavior related to shortcuts. This paper studies the case of spurious class skew in which patients with a particular attribute are spuriously more likely to have the outcome of interest. For instance, clinical protocols might lead to a dataset in which patients with pacemakers are disproportionately likely to have congestive heart failure. This skew can lead to models that take shortcuts by heavily relying on the biased attribute. We explore this problem across a number of attributes in the context of diagnosing the cause of acute hypoxemic respiratory failure. Applied to chest X-rays, we show that i) deep nets can accurately identify many patient attributes including sex (AUROC = 0.96) and age (AUROC >= 0.90), ii) they tend to exploit correlations between such attributes and the outcome label when learning to predict a diagnosis, leading to poor performance when such correlations do not hold in the test population (e.g., everyone in the test set is male), and iii) a simple transfer learning approach is surprisingly effective at preventing the shortcut and promoting good generalization performance. On the task of diagnosing congestive heart failure based on a set of chest X-rays skewed towards older patients (age >= 63), the proposed approach improves generalization over standard training from 0.66 (95% CI: 0.54-0.77) to 0.84 (95% CI: 0.73-0.92) AUROC. While simple, the proposed approach has the potential to improve the performance of models across populations by encouraging reliance on clinically relevant manifestations of disease, i.e., those that a clinician would use to make a diagnosis.
Summary: Deep networks for chest X-ray diagnosis can take shortcuts when a patient attribute is spuriously correlated with the outcome, such as pacemakers with congestive heart failure. In the context of diagnosing the cause of acute hypoxemic respiratory failure, deep nets accurately identify attributes including sex (AUROC = 0.96) and age (AUROC >= 0.90), exploit correlations between such attributes and the outcome label, and perform poorly when those correlations do not hold in the test population. A simple transfer learning approach prevents the shortcut: for congestive heart failure, with a set of chest X-rays skewed towards older patients (age >= 63), AUROC improves from 0.66 (95% CI: 0.54-0.77) to 0.84 (95% CI: 0.73-0.92).
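The AUROC figures quoted above can be read as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The sketch below computes it that way in pure Python; the scores and labels are made-up values, not data from the paper.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.4, 0.35, 0.1]   # hypothetical model outputs
labels = [1, 0, 1, 0, 0]              # hypothetical ground truth
value = auroc(scores, labels)         # 5 of the 6 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a rank-based formula would be the usual choice for large test sets.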
31. Extreme compression of grayscale images [PDF]
Franklin Mendivil, Örjan Stenflo
Abstract: Given a grayscale digital image and a positive integer $n$, how well can we store the image at a compression ratio of $n:1$? In this paper we address the above question in extreme cases when $n \gg 50$ using "$\mathbf{V}$-variable image compression".
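Summary: Given a grayscale digital image and a positive integer $n$, how well can the image be stored at a compression ratio of $n:1$? The paper addresses this question in the extreme case $n \gg 50$ using "$\mathbf{V}$-variable image compression".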
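To give a sense of scale for a compression ratio of $n:1$, the short sketch below computes the byte budget left for an image at a few values of $n$. The 512 x 512, 8-bit image size is an assumed example, not a figure from the paper.

```python
def byte_budget(width, height, bits_per_pixel, n):
    """Bytes available when an image is stored at a compression ratio of n:1."""
    raw_bytes = width * height * bits_per_pixel / 8
    return raw_bytes / n


# Hypothetical 512 x 512 image with 8 bits per pixel (262144 raw bytes).
for n in (50, 100, 500):
    print(n, byte_budget(512, 512, 8, n))
```

At $n = 500$ the whole image must fit in roughly half a kilobyte, which is the regime the abstract calls extreme.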
32. Deep Learning based NAS Score and Fibrosis Stage Prediction from CT and Pathology Data [PDF]
Ananya Jana, Hui Qu, Puru Rattan, Carlos D. Minacapelli, Vinod Rustgi, Dimitris Metaxas
Abstract: Non-Alcoholic Fatty Liver Disease (NAFLD) is becoming increasingly prevalent in the world population. Without timely diagnosis, NAFLD can lead to non-alcoholic steatohepatitis (NASH) and subsequent liver damage. The diagnosis and treatment of NAFLD depend on the NAFLD activity score (NAS) and the liver fibrosis stage, which are usually evaluated from liver biopsies by pathologists. In this work, we propose a novel method to automatically predict the NAS and fibrosis stage from CT data, which is non-invasive and inexpensive to obtain compared with liver biopsy. We also present a method to combine the information from CT and H&E stained pathology data to improve the prediction of NAS and fibrosis stage when both types of data are available. This is of great value in assisting pathologists in the computer-aided diagnosis process. Experiments on a 30-patient dataset illustrate the effectiveness of our method.
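Summary: Non-alcoholic fatty liver disease (NAFLD) is increasingly prevalent and, without timely diagnosis, can lead to non-alcoholic steatohepatitis (NASH) and liver damage. Diagnosis and treatment depend on the NAFLD activity score (NAS) and the liver fibrosis stage, which pathologists usually evaluate from liver biopsies. The paper proposes predicting both from CT data, which is non-invasive and inexpensive compared with biopsy, and presents a method for combining CT with H&E stained pathology data when both are available. Experiments on a 30-patient dataset illustrate the method's effectiveness.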
33. Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time [PDF]
Ferran Alet, Kenji Kawaguchi, Tomas Lozano-Perez, Leslie Pack Kaelbling
Abstract: From CNNs to attention mechanisms, encoding inductive biases into neural networks has been a fruitful source of improvement in machine learning. Auxiliary losses are a general way of encoding biases in order to help networks learn better representations by adding extra terms to the loss function. However, since they are minimized on the training data, they suffer from the same generalization gap as regular task losses. Moreover, by changing the loss function, the network is optimizing a different objective than the one we care about. In this work we solve both problems: first, we take inspiration from transductive learning and note that, after receiving an input but before making a prediction, we can fine-tune our models on any unsupervised objective. We call this process tailoring, because we customize the model to each input. Second, we formulate a nested optimization (similar to those in meta-learning) and train our models to perform well on the task loss after adapting to the tailoring loss. The advantages of tailoring and meta-tailoring are discussed theoretically and demonstrated empirically on several diverse examples: encoding inductive conservation laws from physics to improve predictions, improving local smoothness to increase robustness to adversarial examples, and using contrastive losses on the query image to improve generalization.
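Summary: Auxiliary losses encode inductive biases by adding terms to the training loss, but they suffer from the same generalization gap as the task loss and change the objective being optimized. Tailoring instead fine-tunes the model on an unsupervised objective after an input is received and before a prediction is made, customizing the model to each input. Meta-tailoring adds a nested optimization, similar to those in meta-learning, that trains the model to perform well on the task loss after adapting to the tailoring loss. The examples include encoding conservation laws from physics, improving local smoothness for adversarial robustness, and using contrastive losses on the query image to improve generalization.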
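The tailoring step can be sketched on a toy problem: before predicting for a given input, take a few gradient steps on an unsupervised loss evaluated at that input only. In the sketch below the model, the conservation-style loss (the two outputs should sum to the input), the learning rate, and the step count are all assumptions made for illustration. It shows only the prediction-time inner loop, not the nested meta-tailoring training described in the abstract.

```python
def predict(params, x):
    """Toy model: splits the input x into two output parts."""
    a, b = params
    return a * x, b * x


def tailoring_loss(params, x):
    """Unsupervised objective: the two outputs should sum to the input."""
    first, second = predict(params, x)
    return (first + second - x) ** 2


def tailor(params, x, learning_rate=0.1, steps=50):
    """Fine-tune a copy of the parameters on the tailoring loss for this input."""
    a, b = params
    for _ in range(steps):
        violation = a * x + b * x - x
        # d(loss)/da and d(loss)/db are both 2 * violation * x
        grad = 2.0 * violation * x
        a -= learning_rate * grad
        b -= learning_rate * grad
    return a, b


trained_params = (0.7, 0.4)   # hypothetical trained weights; outputs sum to 1.1 * x
x = 1.5                       # hypothetical query input
tailored_params = tailor(trained_params, x)
before = tailoring_loss(trained_params, x)
after = tailoring_loss(tailored_params, x)
```

The original parameters are left untouched, so each new input starts from the same trained model and receives its own adapted copy.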
34. Dual Encoder Fusion U-Net (DEFU-Net) for Cross-manufacturer Chest X-ray Segmentation [PDF]
Lipei Zhang, Aozhi Liu, Jing Xiao, Paul Taylor
Abstract: A number of deep learning methods have been applied to medical image segmentation and have achieved state-of-the-art performance. Given the importance of chest X-ray data in studying COVID-19, there is demand for state-of-the-art models that can precisely segment chest X-rays before mask annotations are available for this kind of dataset. The dataset used to explore the best pre-trained model comes from the Montgomery and Shenzhen hospitals and was released in 2014. The best-known technique is U-Net, which has been applied to many medical datasets, including chest X-rays. However, most U-Net variants focus mainly on extracting contextual information and on dense skip connections, leaving considerable room to improve the extraction of spatial features. In this paper, we propose a dual encoder fusion U-Net framework for chest X-rays, named DEFU-Net, based on an Inception Convolutional Neural Network with dilation and a Densely Connected Recurrent Convolutional Neural Network. The densely connected recurrent path makes the network deeper to facilitate context feature extraction. Inception blocks with dilation are used to widen the network and enrich the feature representation; they capture global and local spatial information through various receptive fields. Meanwhile, fusing the features of the two paths by summation preserves the context and spatial information for the decoding part. This multi-scale learning model benefits a chest X-ray dataset drawn from two different manufacturers (Montgomery and Shenzhen hospitals). DEFU-Net achieves better performance than basic U-Net, residual U-Net, BCDU-Net, modified R2U-Net, and modified attention R2U-Net, demonstrating its feasibility on a mixed dataset. The open source code for the proposed framework will be made public soon.
Summary: Most U-Net variants focus on contextual information and dense skip connections, leaving room to improve spatial feature extraction. DEFU-Net is a dual encoder fusion U-Net for chest X-rays that pairs an Inception path with dilation, which widens the network and captures global and local spatial information, with a densely connected recurrent convolutional path, which deepens the network for context features. The two paths are fused by summation before decoding. On chest X-ray data from two manufacturers (Montgomery and Shenzhen hospitals), it outperforms basic U-Net, residual U-Net, BCDU-Net, modified R2U-Net, and modified attention R2U-Net. The authors state that the source code will be made public.
35. CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation [PDF]
Ran Gu, Guotai Wang, Tao Song, Rui Huang, Michael Aertsen, Jan Deprest, Sébastien Ourselin, Tom Vercauteren, Shaoting Zhang
Abstract: Accurate medical image segmentation is essential for diagnosis and treatment planning of diseases. Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance for automatic medical image segmentation. However, they are still challenged by complicated conditions where the segmentation target has large variations of position, shape and scale, and existing CNNs have a poor explainability that limits their application to clinical decisions. In this work, we make extensive use of multiple attentions in a CNN architecture and propose a comprehensive attention-based CNN (CA-Net) for more accurate and explainable medical image segmentation that is aware of the most important spatial positions, channels and scales at the same time. In particular, we first propose a joint spatial attention module to make the network focus more on the foreground region. Then, a novel channel attention module is proposed to adaptively recalibrate channel-wise feature responses and highlight the most relevant feature channels. Also, we propose a scale attention module implicitly emphasizing the most salient feature maps among multiple scales so that the CNN is adaptive to the size of an object. Extensive experiments on skin lesion segmentation from ISIC 2018 and multi-class segmentation of fetal MRI found that our proposed CA-Net significantly improved the average segmentation Dice score from 87.77% to 92.08% for skin lesion, 84.79% to 87.08% for the placenta and 93.20% to 95.88% for the fetal brain respectively compared with U-Net. It reduced the model size to around 15 times smaller with close or even better accuracy compared with state-of-the-art DeepLabv3+. In addition, it has a much higher explainability than existing networks by visualizing the attention weight maps. Our code is available at this https URL
Summary: CA-Net is a comprehensive attention-based CNN for medical image segmentation that attends to spatial positions, channels, and scales at the same time, using a joint spatial attention module, a channel attention module, and a scale attention module. Compared with U-Net, it raises the average Dice score from 87.77% to 92.08% for skin lesions (ISIC 2018), from 84.79% to 87.08% for the placenta, and from 93.20% to 95.88% for the fetal brain (fetal MRI). Its model is around 15 times smaller than DeepLabv3+ with close or even better accuracy, and its attention weight maps can be visualized for explainability.
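The Dice scores reported above measure the overlap between a predicted mask and a reference mask. A minimal sketch of the metric for flattened binary masks follows; the example masks are made up, and the handling of two empty masks is a convention assumed here rather than something stated in the abstract.

```python
def dice_score(prediction, target):
    """Dice = 2 * |intersection| / (|prediction| + |target|) for binary masks."""
    if len(prediction) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(prediction, target))
    total = sum(prediction) + sum(target)
    if total == 0:
        return 1.0  # assumed convention: two empty masks match perfectly
    return 2.0 * intersection / total


predicted_mask = [1, 1, 1, 0, 0, 0]   # hypothetical flattened prediction
reference_mask = [0, 1, 1, 1, 0, 0]   # hypothetical flattened ground truth
score = dice_score(predicted_mask, reference_mask)   # 2 * 2 / (3 + 3)
```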
36. EI-MTD: Moving Target Defense for Edge Intelligence against Adversarial Attacks [PDF]
Yaguan Qian, Qiqi Shao, Jiamin Wang, Xiang Lin, Yankai Guo, Zhaoquan Gu, Bin Wang, Chunming Wu
Abstract: With the boom of edge intelligence, its vulnerability to adversarial attacks has become an urgent problem. A so-called adversarial example can fool a deep learning model on an edge node into misclassifying its input. Because adversarial examples transfer between models, an adversary can easily mount a black-box attack using a local substitute model. However, resource-limited edge nodes cannot afford the complicated defense mechanisms used in a cloud data center. To overcome this challenge, we propose a dynamic defense mechanism, namely EI-MTD. It first obtains robust, small member models through differential knowledge distillation from a complicated teacher model in the cloud data center. Then, a dynamic scheduling policy based on a Bayesian Stackelberg game is applied to choose the target model for service. This dynamic defense can prevent the adversary from selecting an optimal substitute model for black-box attacks. Our experimental results show that this dynamic scheduling can effectively protect edge intelligence against adversarial attacks in the black-box setting.
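Summary: Edge intelligence is vulnerable to black-box adversarial attacks built with a local substitute model, and edge nodes lack the resources for the complicated defenses used in cloud data centers. EI-MTD is a dynamic defense: it distills robust, small member models from a complicated teacher model in the cloud through differential knowledge distillation, then applies a scheduling policy based on a Bayesian Stackelberg game to choose the target model for service. Experiments show that this scheduling effectively protects edge intelligence against adversarial attacks in the black-box setting.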
37. Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [PDF]
Alejandro R. Martinez
Abstract: Since December 2019, the novel coronavirus disease COVID-19 has spread around the world, infecting millions of people and upending the global economy. One of the driving reasons behind its high rate of infection is the unreliability and scarcity of RT-PCR testing. At times results take as long as a couple of days, only to yield a sensitivity of roughly 70%. As an alternative, recent research has investigated the use of Computer Vision with Convolutional Neural Networks (CNNs) for the classification of COVID-19 from CT scans. Due to an inherent lack of available COVID-19 CT data, these research efforts have been forced to rely on Transfer Learning. This commonly employed Deep Learning technique has been shown to improve model performance on tasks with relatively small amounts of data, as long as the Source feature space somewhat resembles the Target feature space. Unfortunately, a lack of similarity is often encountered in the classification of medical images, as publicly available Source datasets usually lack the visual features found in medical images. In this study, we propose the use of Multi-Source Transfer Learning (MSTL) to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans. With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet. We additionally propose an unsupervised label creation process, which enhances the performance of our Deep Residual Networks. Our best performing model achieved an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
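Summary: RT-PCR testing for COVID-19 is scarce and unreliable, with results sometimes taking a couple of days and a sensitivity of roughly 70%, so recent work has explored classifying COVID-19 from CT scans with CNNs. Because COVID-19 CT data is limited, these efforts rely on transfer learning, which helps as long as the source feature space somewhat resembles the target feature space. The study proposes Multi-Source Transfer Learning (MSTL), whose multi-source fine-tuning outperforms baseline models fine-tuned with ImageNet, together with an unsupervised label creation process that improves its Deep Residual Networks. The best model reaches an accuracy of 0.893 and a Recall of 0.897, outperforming its baseline Recall by 9.3%.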
38. Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey
Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Abstract: Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is considered to be a neighbor of all other points with some probability, and the method tries to preserve this probability in the embedding space. SNE uses a Gaussian distribution for the probability in both the input and embedding spaces. t-SNE, however, uses the Gaussian and Student-t distributions in these spaces, respectively. In this tutorial and survey paper, we explain SNE, symmetric SNE, t-SNE (or Cauchy-SNE), and t-SNE with general degrees of freedom. We also cover the out-of-sample extension and acceleration for these methods. Some simulations to visualize the embeddings are also provided.
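To make the two kernels concrete, here is a minimal sketch of the probabilities involved: Gaussian conditional probabilities in the input space, and the Student-t (one degree of freedom, i.e. Cauchy) joint probabilities that t-SNE uses in the embedding space. This is an illustration under simplifying assumptions, not code from the paper: the function names are hypothetical, and a single shared bandwidth `sigma` stands in for the per-point bandwidths that SNE normally tunes via perplexity.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_conditional(points, sigma=1.0):
    """Input-space probabilities p(j|i) with a Gaussian kernel.

    Assumption: one shared bandwidth `sigma` for every point.
    """
    n = len(points)
    probs = []
    for i in range(n):
        weights = [
            0.0 if j == i
            else math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def student_t_joint(embedding):
    """Embedding-space probabilities q_ij with a Student-t kernel (1 d.o.f.)."""
    n = len(embedding)
    weights = [
        [
            0.0 if i == j
            else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


# Toy data: two nearby points and one distant point.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
embedding = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
P = gaussian_conditional(points)
Q = student_t_joint(embedding)
```

The heavier tail of the Student-t kernel lets moderately distant points sit farther apart in the embedding than a Gaussian kernel would allow, which is the usual motivation given for t-SNE over SNE.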
39. Semantic Workflows and Machine Learning for the Assessment of Carbon Storage by Urban Trees
Juan Carrillo, Daniel Garijo, Mark Crowley, Rober Carrillo, Yolanda Gil, Katherine Borda
Abstract: Climate science is critical for understanding both the causes and consequences of changes in global temperatures and has become imperative for decisive policy-making. However, climate science studies commonly require addressing complex interoperability issues between data, software, and experimental approaches from multiple fields. Scientific workflow systems provide unparalleled advantages to address these issues, including reproducibility of experiments, provenance capture, software reusability and knowledge sharing. In this paper, we introduce a novel workflow with a series of connected components to perform spatial data preparation, classification of satellite imagery with machine learning algorithms, and assessment of carbon stored by urban trees. To the best of our knowledge, this is the first study that estimates carbon storage for a region in Africa following the guidelines from the Intergovernmental Panel on Climate Change (IPCC).
40. ALICE: Active Learning with Contrastive Natural Language Explanations
Weixin Liang, James Zou, Zhou Yu
Abstract: Training a supervised neural network classifier typically requires many annotated training samples. Collecting and annotating a large number of data points is costly and sometimes even infeasible. The traditional annotation process uses a low-bandwidth human-machine communication interface: classification labels, each of which provides only several bits of information. We propose Active Learning with Contrastive Explanations (ALICE), an expert-in-the-loop training framework that utilizes contrastive natural language explanations to improve data efficiency in learning. ALICE first uses active learning to select the most informative pairs of label classes, for which it elicits contrastive natural language explanations from experts. It then extracts knowledge from these explanations using a semantic parser. Finally, it incorporates the extracted knowledge by dynamically changing the learning model's structure. We applied ALICE to two visual recognition tasks: bird species classification and social relationship classification. We found that by incorporating contrastive explanations, our models outperform baseline models that are trained with 40-100% more training data. We also found that adding 1 explanation leads to a performance gain similar to that of adding 13-30 labeled training data points.
41. Survey of explainable machine learning with visual and granular methods beyond quasi-explanations
Boris Kovalerchuk, Muhammad Aurangzeb Ahmad, Ankur Teredesai
Abstract: This paper surveys visual methods of explainability of Machine Learning (ML), with a focus on moving from the quasi-explanations that dominate in ML to domain-specific explanations supported by granular visuals. ML interpretation is fundamentally a human activity, and visual methods are more readily interpretable. While efficient visual representations of high-dimensional data exist, the loss of interpretable information, occlusion, and clutter continue to be a challenge and lead to quasi-explanations. We start with the motivation and the different definitions of explainability. The paper focuses on a clear distinction between quasi-explanations and domain-specific explanations, and between an explainable and an actually explained ML model, distinctions that are critically important for the explainability domain. We discuss foundations of interpretability, give an overview of visual interpretability, and present several types of methods to visualize ML models. Next, we present methods of visual discovery of ML models, with a focus on interpretable models, based on the recently introduced concept of General Line Coordinates (GLC). These methods take the critical step of creating visual explanations that are not merely quasi-explanations but also domain-specific visual explanations, while the methods themselves are domain-agnostic. The paper includes results on theoretical limits to preserving n-D distances in lower dimensions, based on the Johnson-Lindenstrauss lemma, point-to-point and point-to-graph GLC approaches, and real-world case studies. The paper also covers traditional visual methods for understanding ML models, including deep learning and time series models. We show that many of these methods are quasi-explanations and need further enhancement to become domain-specific explanations. We conclude by outlining open problems and current research frontiers.
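For a sense of why the Johnson-Lindenstrauss lemma limits 2-D and 3-D visualization, the sketch below evaluates one commonly quoted form of the bound: n points can be mapped to k dimensions with all pairwise distances preserved within a factor of (1 ± eps) once k >= 4 ln(n) / (eps^2/2 - eps^3/3). The constant comes from a standard proof of the lemma and is an assumption here; the survey itself may state the bound differently. The function name `jl_min_dim` is hypothetical.

```python
import math


def jl_min_dim(n_samples, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3).

    Assumption: this particular constant; other statements of the
    lemma use different constants with the same ln(n)/eps^2 scaling.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if n_samples < 2:
        raise ValueError("need at least two points")
    denom = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return math.ceil(4.0 * math.log(n_samples) / denom)


# Even a loose 50% distortion tolerance on 1000 points asks for
# far more than the 2 or 3 dimensions a plot can offer.
k = jl_min_dim(1000, 0.5)
```

The bound is sufficient rather than necessary, so it does not rule out good low-dimensional views of particular datasets; it only shows that distance preservation cannot be guaranteed in general at plot-sized dimensions.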
42. Federated Learning for Computational Pathology on Gigapixel Whole Slide Images
Ming Y. Lu, Dehan Kong, Jana Lipkova, Richard J. Chen, Rajendra Singh, Drew F. K. Williamson, Tiffany Y. Chen, Faisal Mahmood
Abstract: Deep learning-based computational pathology algorithms have demonstrated a profound ability to excel in a wide array of tasks, ranging from the characterization of well-known morphological phenotypes to the prediction of non-human-identifiable features from histology, such as molecular alterations. However, the development of robust, adaptable, and accurate deep learning-based models often relies on the collection and time-costly curation of large, high-quality annotated training data, which should ideally come from diverse sources and patient populations to cater for the heterogeneity that exists in such datasets. Multi-centric and collaborative integration of medical data across multiple institutions can naturally help overcome this challenge and boost model performance, but it is limited by privacy concerns, among other difficulties that may arise in the complex data-sharing process as models scale towards using hundreds of thousands of gigapixel whole slide images. In this paper, we introduce privacy-preserving federated learning for gigapixel whole slide images in computational pathology using weakly-supervised attention multiple instance learning and differential privacy. We evaluated our approach on two different diagnostic problems using thousands of histology whole slide images with only slide-level labels. Additionally, we present a weakly-supervised learning framework for survival prediction and patient stratification from whole slide images and demonstrate its effectiveness in a federated setting. Our results show that using federated learning, we can effectively develop accurate weakly-supervised deep learning models from distributed data silos without direct data sharing and its associated complexities, while also preserving differential privacy using randomized noise generation.
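The general pattern of federated training with randomized noise can be sketched as follows: each site clips its local model update, adds Gaussian noise, and sends only the noisy update to a server that forms a size-weighted average. This is a generic illustration, not the authors' mechanism. The function names are hypothetical, and `max_norm` and `noise_std` are placeholder values that are not calibrated to any formal privacy budget.

```python
import random


def clip_update(update, max_norm):
    """Rescale an update so its Euclidean norm is at most max_norm."""
    norm = sum(v * v for v in update) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def noisy_update(update, max_norm, noise_std, rng):
    """Clip a local update, then add independent Gaussian noise per coordinate."""
    clipped = clip_update(update, max_norm)
    return [v + rng.gauss(0.0, noise_std) for v in clipped]


def federated_average(updates, sizes):
    """Average site updates, weighted by each site's number of samples."""
    total = sum(sizes)
    dim = len(updates[0])
    return [
        sum(u[k] * s for u, s in zip(updates, sizes)) / total
        for k in range(dim)
    ]


# Two hypothetical sites holding 1 and 3 slides, with 2-parameter updates.
rng = random.Random(0)
local_updates = [[3.0, 4.0], [0.0, 1.0]]
site_sizes = [1, 3]
shared = [noisy_update(u, max_norm=1.0, noise_std=0.01, rng=rng)
          for u in local_updates]
global_update = federated_average(shared, site_sizes)
```

Clipping bounds how much any single site can move the global model, which is what makes the added noise meaningful; raw slides never leave the site in this scheme, only the perturbed updates do.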
43. Operator-valued formulas for Riemannian Gradient and Hessian and families of tractable metrics in optimization and machine learning
Du Nguyen
Abstract: We provide an explicit formula for the Levi-Civita connection and Riemannian Hessian when the tangent space at each point of a Riemannian manifold is embedded in an inner product space with a non-constant metric. Together with a classical formula for projection, this allows us to evaluate the Riemannian gradient and Hessian for several families of metrics extending existing ones on classical manifolds: a family of metrics on Stiefel manifolds connecting both the constant and canonical ambient metrics with closed-form geodesics; a family of quotient metrics on a manifold of positive-semidefinite matrices of fixed rank, considered as a quotient of a product of Stiefel and positive-definite matrix manifolds with affine-invariant metrics; and a large family of new metrics on flag manifolds. We show that in many instances this method allows us to apply symbolic calculus to derive formulas for the Riemannian gradient and Hessian. The method greatly extends the list of potential metrics that could be used in manifold optimization and machine learning.
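The role projection plays in computing a Riemannian gradient is easiest to see in the simplest textbook case, which is much more basic than the non-constant metrics the paper treats: the unit sphere with the constant Euclidean metric. There the Riemannian gradient is the ambient gradient with its component along x removed. The sketch below uses the objective f(x) = x^T A x as an assumed example; all function names are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def euclidean_grad(A, x):
    """Ambient gradient of f(x) = x^T A x, assuming A is symmetric."""
    return [2.0 * v for v in matvec(A, x)]


def sphere_rgrad(x, egrad):
    """Project the ambient gradient onto the tangent space at unit vector x.

    On the unit sphere with the Euclidean metric the projection is
    (I - x x^T), so the result is egrad - <x, egrad> x.
    """
    c = dot(x, egrad)
    return [g - c * xi for g, xi in zip(egrad, x)]


A = [[2.0, 0.0], [0.0, 1.0]]
x = [0.6, 0.8]  # a unit vector
rgrad = sphere_rgrad(x, euclidean_grad(A, x))
```

The result is orthogonal to x, and it vanishes exactly when x is an eigenvector of A, which matches the known critical points of this objective on the sphere. With a non-constant metric, as in the paper, the projection and the gradient both acquire metric-dependent terms that this sketch does not model.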
44. CCBlock: An Effective Use of Deep Learning for Automatic Diagnosis of COVID-19 Using X-Ray Images
Ali Al-Bawi, Karrar Ali Al-Kaabi, Mohammed Jeryo, Ahmad Al-Fatlawi
Abstract: Purpose: Troubling countries one after another, the COVID-19 pandemic has dramatically affected the health and well-being of the world's population. The disease may continue to persist more extensively due to the increasing number of new cases daily, the rapid spread of the virus, and delays in PCR analysis results. Therefore, it is necessary to consider developing assistive methods for detecting and diagnosing COVID-19 to eradicate the spread of the novel coronavirus among people. Automated detection systems based on convolutional neural networks (CNNs) have shown promising results in diagnosing patients with COVID-19 through radiography; thus, they are introduced as a workable solution to COVID-19 diagnosis. Materials and Methods: Based on the enhancement of the classical visual geometry group (VGG) network with the convolutional COVID block (CCBlock), an efficient screening model was proposed in this study to diagnose and distinguish patients with COVID-19 from those with pneumonia and from healthy people through radiography. The model testing dataset included 1,828 x-ray images available on public platforms: 310 images showing confirmed COVID-19 cases, 864 images showing pneumonia cases, and 654 images showing healthy people. Results: According to the test results, enhancing the classical VGG network with radiography provided the highest diagnosis performance, with an overall accuracy of 98.52% for two classes and 95.34% for three classes. Conclusions: According to the results, using the enhanced VGG deep neural network can help radiologists automatically diagnose COVID-19 through radiography.
Note: The Chinese text is machine-translated. The cover image is a word cloud of the paper titles.