Contents
1. Vulnerability of Face Recognition Systems Against Composite Face Reconstruction Attack [PDF] Abstract
2. Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [PDF] Abstract
3. Interactive Visual Study of Multiple Attributes Learning Model of X-Ray Scattering Images [PDF] Abstract
4. Imbalanced Image Classification with Complement Cross Entropy [PDF] Abstract
5. Looking for change? Roll the Dice and demand Attention [PDF] Abstract
6. TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator [PDF] Abstract
7. Real-Time Selfie Video Stabilization [PDF] Abstract
8. SSP-Net: Scalable Sequential Pyramid Networks for Real-Time 3D Human Pose Regression [PDF] Abstract
9. A Hybrid Deep Learning Model for Arabic Text Recognition [PDF] Abstract
10. Attribute Adaptive Margin Softmax Loss using Privileged Information [PDF] Abstract
11. Compression-aware Continual Learning using Singular Value Decomposition [PDF] Abstract
12. Depth Completion via Inductive Fusion of Planar LIDAR and Monocular Camera [PDF] Abstract
13. A general approach to bridge the reality-gap [PDF] Abstract
14. Improving axial resolution in SIM using deep learning [PDF] Abstract
15. SketchPatch: Sketch Stylization via Seamless Patch-level Synthesis [PDF] Abstract
16. Dual Precision Deep Neural Network [PDF] Abstract
17. Naive Artificial Intelligence [PDF] Abstract
18. Improving Self-Organizing Maps with Unsupervised Feature Extraction [PDF] Abstract
19. Multi-Attention-Network for Semantic Segmentation of High-Resolution Remote Sensing Images [PDF] Abstract
20. Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity [PDF] Abstract
23. The Little W-Net That Could: State-of-the-Art Retinal Vessel Segmentation with Minimalistic Models [PDF] Abstract
Abstracts
1. Vulnerability of Face Recognition Systems Against Composite Face Reconstruction Attack [PDF] Back to Contents
Hadi Mansourifar, Weidong Shi
Abstract: Rounding the confidence score is considered a trivial yet simple and effective countermeasure for stopping gradient-descent-based image reconstruction attacks. However, its capability in the face of more sophisticated reconstruction attacks is an uninvestigated research area. In this paper, we show that face reconstruction attacks based on composite faces reveal the inefficiency of the rounding policy as a countermeasure. We assume that the attacker takes advantage of composite face parts, which give access to the most important features of the face or decompose it into independent segments. Afterwards, the decomposed segments are exploited as search parameters to create a search path for reconstructing the optimal face. Face composition parts enable the attacker to violate the privacy of face recognition models even with a blind search. However, we assume that the attacker may take advantage of random search to reconstruct the target face faster. The algorithm starts with a random composition of face parts as the initial face, and the confidence score is used as the fitness value. Our experiments show that, since the rounding policy cannot stop the random search process, current face recognition systems are extremely vulnerable to such sophisticated attacks. To address this problem, we successfully test Face Detection Score Filtering (FDSF) as a countermeasure to protect the privacy of training data against the proposed attack.
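To make the interplay between the rounding countermeasure and the blind search concrete, here is a minimal Python sketch under stated assumptions: `query_fn` stands in for the attacked model's query interface and `part_pools` for the attacker's collection of composite face parts, and the mutation loop is an illustration of random search with the confidence score as fitness, not the authors' exact algorithm.

```python
import random

def rounded_confidence(score: float, digits: int = 2) -> float:
    # Rounding countermeasure: the service truncates the confidence
    # score before returning it to the client.
    return round(score, digits)

def composite_search(query_fn, part_pools, iters=1000):
    # Blind random search over composite face parts, using the (rounded)
    # confidence score as the fitness value.
    best = {slot: random.choice(pool) for slot, pool in part_pools.items()}
    best_score = query_fn(best)
    for _ in range(iters):
        candidate = dict(best)
        slot = random.choice(list(part_pools))        # mutate one face part
        candidate[slot] = random.choice(part_pools[slot])
        score = query_fn(candidate)
        if score > best_score:                        # rounding leaves this ordering intact
            best, best_score = candidate, score
    return best, best_score
```

Since the loop only needs the ordering of scores, rounding the confidence to a couple of digits rarely changes which candidate wins, which is the inefficiency the abstract points at.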
2. Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [PDF] Back to Contents
Jonas Geiping, Liam Fowl, W. Ronny Huang, Wojciech Czaja, Gavin Taylor, Michael Moeller, Tom Goldstein
Abstract: Data poisoning attacks involve an attacker modifying training data to maliciously control a model trained on this data. Previous poisoning attacks against deep neural networks have been limited in scope and success, working only in simplified settings or being prohibitively expensive for large datasets. In this work, we focus on a particularly malicious poisoning attack that is both "from scratch" and "clean label", meaning we analyze an attack that successfully works against new, randomly initialized models and is nearly imperceptible to humans, all while perturbing only a small fraction of the training data. The central mechanism of this attack is matching the gradient direction of malicious examples. We analyze why this works, supplement it with practical considerations, and show its threat to real-world practitioners, finding that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset. Finally, we demonstrate the limitations of existing defensive strategies against such an attack, concluding that data poisoning is a credible threat, even for large-scale deep learning systems.
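The central mechanism, matching the gradient direction of poisoned examples to an adversarial target gradient, can be sketched as a cosine-similarity loss in PyTorch; this is a hedged reconstruction from the abstract, with `poison_x`, `poison_y`, and `target_grad` as hypothetical placeholders for the perturbed batch and the gradient of the attacker's objective.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, poison_x, poison_y, target_grad):
    # Gradient of the training loss on the (perturbed) poison batch.
    loss = F.cross_entropy(model(poison_x), poison_y)
    poison_grad = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    # Negative cosine similarity between the poison gradient and the
    # adversarial target gradient; minimizing it aligns their directions,
    # so ordinary training on the poison batch serves the attacker's goal.
    dot = sum((p * t).sum() for p, t in zip(poison_grad, target_grad))
    p_norm = torch.sqrt(sum(p.pow(2).sum() for p in poison_grad))
    t_norm = torch.sqrt(sum(t.pow(2).sum() for t in target_grad))
    return 1.0 - dot / (p_norm * t_norm + 1e-12)
```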
3. Interactive Visual Study of Multiple Attributes Learning Model of X-Ray Scattering Images [PDF] Back to Contents
Xinyi Huang, Suphanut Jamonnak, Ye Zhao, Boyu Wang, Minh Hoai, Kevin Yager, Wei Xu
Abstract: Existing interactive visualization tools for deep learning are mostly applied to the training, debugging, and refinement of neural network models working on natural images. However, visual analytics tools are lacking for the specific application of x-ray image classification with multiple structural attributes. In this paper, we present an interactive system for domain scientists to visually study the multiple attributes learning models applied to x-ray scattering images. It allows domain scientists to interactively explore this important type of scientific images in embedded spaces that are defined on the model prediction output, the actual labels, and the discovered feature space of neural networks. Users can flexibly select instance images and their clusters, and compare them with regard to the specified visual representation of attributes. The exploration is guided by the manifestation of model performance related to mutual relationships among attributes, which often affect the learning accuracy and effectiveness. The system thus supports domain scientists to improve the training dataset and model, find questionable attribute labels, and identify outlier images or spurious data clusters. Case studies and scientists' feedback demonstrate its functionalities and usefulness.
4. Imbalanced Image Classification with Complement Cross Entropy [PDF] Back to Contents
Yechan Kim, Younkwan Lee, Moongu Jeon
Abstract: Recently, deep learning models have achieved great success in computer vision applications, relying on large-scale class-balanced datasets. However, imbalanced class distributions still limit the wide applicability of these models due to degradation in performance. To solve this problem, we focus on the study of cross entropy: it mostly ignores output scores on wrong classes. In this work, we discover that neutralizing predicted probabilities on incorrect classes helps improve the accuracy of prediction for imbalanced image classification. This paper proposes a simple but effective loss named complement cross entropy (CCE) based on this finding. Our loss makes the ground truth class overwhelm the other classes in terms of softmax probability, by neutralizing the probabilities of incorrect classes, without additional training procedures. In addition, this loss helps the models learn key information, especially from samples of minority classes. It ensures more accurate and robust classification results for imbalanced class distributions. Extensive experiments on imbalanced datasets demonstrate the effectiveness of our method compared to other state-of-the-art methods.
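A possible form of the loss, assuming a complement-entropy term with a 1/(K-1) balancing factor as in related work (the exact formulation may differ from the paper's):

```python
import torch
import torch.nn.functional as F

def complement_cross_entropy(logits, target, eps=1e-7):
    # Standard cross entropy on the ground-truth class ...
    ce = F.cross_entropy(logits, target)
    # ... plus a complement term that neutralizes (flattens) the predicted
    # probabilities over the incorrect classes.
    num_classes = logits.size(1)
    probs = F.softmax(logits, dim=1)
    p_true = probs.gather(1, target.unsqueeze(1))          # (N, 1)
    p_wrong = probs / (1.0 - p_true + eps)                 # renormalized wrong-class probs
    mask = F.one_hot(target, num_classes).bool()
    p_wrong = p_wrong.masked_fill(mask, 1.0)               # log 1 = 0 for the true class
    comp_entropy = -(p_wrong * p_wrong.clamp_min(eps).log()).sum(dim=1).mean()
    # A flat (high-entropy) distribution over wrong classes lowers the loss,
    # letting the ground-truth class overwhelm the others in softmax probability.
    return ce - comp_entropy / (num_classes - 1)
```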
5. Looking for change? Roll the Dice and demand Attention [PDF] Back to Contents
Foivos I. Diakogiannis, François Waldner, Peter Caccetta
Abstract: Change detection, i.e. the per-pixel identification of changes for some classes of interest from a set of bi-temporal co-registered images, is a fundamental task in the field of remote sensing. It remains challenging due to unrelated forms of change that appear at different times in input images. These are changes due to different environmental conditions or simply changes of objects that are not of interest. Here, we propose a reliable deep learning framework for the task of semantic change detection in very high-resolution aerial images. Our framework consists of a new loss function, new attention modules, new feature extraction building blocks, and a new backbone architecture that is tailored for the task of semantic change detection. Specifically, we define a new form of set similarity that is based on an iterative evaluation of a variant of the Dice coefficient. We use this similarity metric to define a new loss function as well as a new spatial and channel convolution Attention layer (the FracTAL). The new attention layer, designed specifically for vision tasks, is memory efficient, thus suitable for use in all levels of deep convolutional networks. Based on these, we introduce two new efficient self-contained feature extraction convolution units. We term these units CEECNet and FracTAL ResNet units. We validate the performance of these feature extraction building blocks on the CIFAR10 reference data and compare the results with standard ResNet modules. Further, we introduce a new encoder/decoder scheme, a network macro-topology, that is tailored for the task of change detection. We validate our approach by showing excellent performance and achieving state of the art scores (F1 and Intersection over Union, hereafter IoU) on two building change detection datasets, namely the LEVIRCD (F1: 0.918, IoU: 0.848) and the WHU (F1: 0.938, IoU: 0.882) datasets.
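The quantity at the heart of the loss is the soft Dice coefficient; a minimal sketch of its standard form is below, with the caveat that the paper builds its loss on an iterative evaluation of a variant of this coefficient, which the sketch does not reproduce.

```python
import torch

def soft_dice(pred, target, eps=1e-6):
    # Soft Dice coefficient between a predicted probability map and a
    # binary ground-truth mask, computed per batch element over (C, H, W).
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (2.0 * inter + eps) / (union + eps)

def dice_loss(pred, target):
    return 1.0 - soft_dice(pred, target).mean()
```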
6. TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator [PDF] Back to Contents
Doyeon Kim, Donggyu Joo, Junmo Kim
Abstract: Advances in technology have led to the development of methods that can create desired visual multimedia. In particular, image generation using deep learning has been extensively studied across diverse fields. In comparison, video generation, especially on conditional inputs, remains a challenging and less explored area. To narrow this gap, we aim to train our model to produce a video corresponding to a given text description. We propose a novel training framework, Text-to-Image-to-Video Generative Adversarial Network (TiVGAN), which evolves frame-by-frame and finally produces a full-length video. In the first phase, we focus on creating a high-quality single video frame while learning the relationship between the text and an image. As the steps proceed, our model is trained gradually on an increasing number of consecutive frames. This step-by-step learning process helps stabilize the training and enables the creation of high-resolution video based on conditional text descriptions. Qualitative and quantitative experimental results on various datasets demonstrate the effectiveness of the proposed method.
7. Real-Time Selfie Video Stabilization [PDF] Back to Contents
Jiyang Yu, Ravi Ramamoorthi, Keli Cheng, Michel Sarkis, Ning Bi
Abstract: We propose a novel real-time selfie video stabilization method. Our method is completely automatic and runs at 26 fps. We use a 1D linear convolutional network to directly infer the rigid moving least squares warping which implicitly balances between the global rigidity and local flexibility. Our network structure is specifically designed to stabilize the background and foreground at the same time, while providing optional control of stabilization focus (relative importance of foreground vs. background) to the users. To train our network, we collect a selfie video dataset with 1005 videos, which is significantly larger than previous selfie video datasets. We also propose a grid approximation method to the rigid moving least squares warping that enables the real-time frame warping. Our method is fully automatic and produces visually and quantitatively better results than previous real-time general video stabilization methods. Compared to previous offline selfie video methods, our approach produces comparable quality with a speed improvement of orders of magnitude.
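The grid approximation can be pictured as predicting the warp only at a coarse set of control points and interpolating it to full resolution; the following PyTorch sketch is an assumption-laden illustration (the shapes and the bilinear upsampling are choices made here, not the paper's specification).

```python
import torch
import torch.nn.functional as F

def warp_with_coarse_grid(frame, coarse_grid):
    # frame: (B, C, H, W); coarse_grid: (B, 2, gh, gw) holding normalized
    # sampling coordinates in [-1, 1] at a few control points.
    b, c, h, w = frame.shape
    # Upsample the control grid to full resolution instead of evaluating
    # the rigid moving-least-squares warp at every pixel.
    dense = F.interpolate(coarse_grid, size=(h, w),
                          mode="bilinear", align_corners=True)
    # grid_sample expects the sampling grid as (B, H, W, 2).
    return F.grid_sample(frame, dense.permute(0, 2, 3, 1), align_corners=True)
```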
8. SSP-Net: Scalable Sequential Pyramid Networks for Real-Time 3D Human Pose Regression [PDF] Back to Contents
Diogo Luvizon, Hedi Tabia, David Picard
Abstract: In this paper we propose a highly scalable convolutional neural network, end-to-end trainable, for real-time 3D human pose regression from still RGB images. We call this approach the Scalable Sequential Pyramid Networks (SSP-Net) as it is trained with refined supervision at multiple scales in a sequential manner. Our network requires a single training procedure and is capable of producing its best predictions at 120 frames per second (FPS), or acceptable predictions at more than 200 FPS when cut at test time. We show that the proposed regression approach is invariant to the size of feature maps, allowing our method to perform multi-resolution intermediate supervisions and reaching results comparable to the state-of-the-art with very low resolution feature maps. We demonstrate the accuracy and the effectiveness of our method by providing extensive experiments on two of the most important publicly available datasets for 3D pose estimation, Human3.6M and MPI-INF-3DHP. Additionally, we provide relevant insights about our decisions on the network architecture and show its flexibility to meet the best precision-speed compromise.
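Refined supervision at multiple scales can be read as averaging a pose loss over every intermediate prediction head, so that any prefix of the sequence remains a usable network at test time; a minimal sketch under that assumption (per-scale target resizing omitted):

```python
import torch.nn as nn

def pyramid_supervision_loss(intermediate_preds, target, criterion=nn.MSELoss()):
    # Every intermediate pose prediction in the sequence is penalized
    # against the same ground truth; cutting the network after an early
    # head at test time then still yields acceptable predictions.
    losses = [criterion(pred, target) for pred in intermediate_preds]
    return sum(losses) / len(losses)
```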
9. A Hybrid Deep Learning Model for Arabic Text Recognition [PDF] Back to Contents
Mohammad Fasha, Bassam Hammo, Nadim Obeid, Jabir Widian
Abstract: Arabic text recognition is a challenging task because of the cursive nature of the Arabic writing system, its joint writing scheme, the large number of ligatures, and many other challenges. Deep Learning (DL) models have achieved significant progress in numerous domains, including computer vision and sequence modelling. This paper presents a model that can recognize Arabic text printed using multiple font types, including fonts that mimic Arabic handwritten scripts. The proposed model employs a hybrid DL network that can recognize Arabic printed text without the need for character segmentation. The model was tested on a custom dataset comprised of over two million word samples generated using 18 different Arabic font types. The objective of the testing process was to assess the model's capability in recognizing a diverse set of Arabic fonts representing varied cursive styles. The model achieved good results in recognizing characters and words, and it also achieved promising results in recognizing characters when tested on unseen data. The prepared model, the custom datasets, and the toolkit for generating similar datasets are made publicly available; these tools can be used to prepare models for recognizing other font types as well as to further extend and enhance the performance of the proposed model.
10. Attribute Adaptive Margin Softmax Loss using Privileged Information [PDF] Back to Contents
Seyed Mehdi Iranmanesh, Ali Dabouei, Nasser M. Nasrabadi
Abstract: We present a novel framework to exploit privileged information for recognition which is provided only during the training phase. Here, we focus on recognition task where images are provided as the main view and soft biometric traits (attributes) are provided as the privileged data (only available during training phase). We demonstrate that more discriminative feature space can be learned by enforcing a deep network to adjust adaptive margins between classes utilizing attributes. This tight constraint also effectively reduces the class imbalance inherent in the local data neighborhood, thus carving more balanced class boundaries locally and using feature space more efficiently. Extensive experiments are performed on five different datasets and the results show the superiority of our method compared to the state-of-the-art models in both tasks of face recognition and person re-identification.
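One way to implement an attribute-adaptive margin, sketched here as a CosFace-style additive margin whose size is predicted from the privileged attribute vector; the `margin_net` head and the softplus clamp are assumptions for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeAdaptiveMarginSoftmax(nn.Module):
    # Cosine softmax whose per-sample additive margin is predicted from
    # privileged attributes that are available only during training.
    def __init__(self, feat_dim, num_classes, attr_dim, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin_net = nn.Linear(attr_dim, 1)    # attributes -> margin
        self.scale = scale

    def forward(self, feats, labels, attrs):
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        margin = F.softplus(self.margin_net(attrs)).squeeze(1)   # keep margin positive
        shifted = cos.gather(1, labels.unsqueeze(1)).squeeze(1) - margin
        logits = cos.scatter(1, labels.unsqueeze(1), shifted.unsqueeze(1))
        return F.cross_entropy(self.scale * logits, labels)
```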
11. Compression-aware Continual Learning using Singular Value Decomposition [PDF] Back to Contents
Varigonda Pavan Teja, Priyadarshini Panda
Abstract: We propose a compression based continual task learning method that can dynamically grow a neural network. Inspired by recent model compression techniques, we employ compression-aware training and perform low-rank weight approximations using singular value decomposition (SVD) to achieve network compaction. By encouraging the network to learn low-rank weight filters, our method achieves compressed representations with minimal performance degradation without the need for costly fine-tuning. Specifically, we decompose the weight filters using SVD and train the network on incremental tasks in its factorized form. Such a factorization allows us to directly impose sparsity-inducing regularizers over the singular values and allows us to use a smaller number of parameters for each task. We further introduce a novel shared representational space based learning between tasks. This promotes the incoming tasks to only learn residual task-specific information on top of the previously learnt weight filters and greatly helps in learning under fixed capacity constraints. Our method significantly outperforms prior continual learning approaches on three benchmark datasets, demonstrating accuracy improvements of 10.3%, 12.3%, and 15.6% on 20-split CIFAR-100, miniImageNet, and a 5-sequence dataset, respectively, over the state-of-the-art. Further, our method yields compressed models that have ~3.64x, 2.88x, and 5.91x fewer parameters, respectively, on the above mentioned datasets in comparison to baseline individual task models. Our source code is available at this https URL.
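The factorized training can be illustrated on a linear layer: the weight is stored as U diag(s) V^T so that an L1 penalty on the singular values s pushes the layer toward low rank. This is a simplified sketch (the paper factorizes convolutional filters) with names chosen here for illustration.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    # Weight kept in factorized form W = U diag(s) V^T; a sparsity-inducing
    # regularizer on s drives the layer toward a compact low-rank form.
    def __init__(self, in_features, out_features):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.xavier_uniform_(w)
        u, s, vt = torch.linalg.svd(w, full_matrices=False)
        self.U, self.s, self.Vt = nn.Parameter(u), nn.Parameter(s), nn.Parameter(vt)

    def forward(self, x):
        return x @ ((self.U * self.s) @ self.Vt).t()

    def rank_penalty(self):
        return self.s.abs().sum()   # add lambda * rank_penalty() to the task loss
```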
12. Depth Completion via Inductive Fusion of Planar LIDAR and Monocular Camera [PDF] Back to Contents
Chen Fu, Chiyu Dong, Christoph Mertz, John M. Dolan
Abstract: Modern high-definition LIDAR is expensive for commercial autonomous driving vehicles and small indoor robots. An affordable solution to this problem is fusion of planar LIDAR with RGB images to provide a similar level of perception capability. Even though state-of-the-art methods provide approaches to predict depth information from limited sensor input, they are usually a simple concatenation of sparse LIDAR features and dense RGB features through an end-to-end fusion architecture. In this paper, we introduce an inductive late-fusion block which better fuses different sensor modalities inspired by a probability model. The proposed demonstration and aggregation network propagates the mixed context and depth features to the prediction network and serves as a prior knowledge of the depth completion. This late-fusion block uses the dense context features to guide the depth prediction based on demonstrations by sparse depth features. In addition to evaluating the proposed method on benchmark depth completion datasets including NYUDepthV2 and KITTI, we also test the proposed method on a simulated planar LIDAR dataset. Our method shows promising results compared to previous approaches on both the benchmark datasets and simulated dataset with various 3D densities.
13. A general approach to bridge the reality-gap [PDF] Back to Contents
Michael Lomnitz, Zigfried Hampel-Arias, Nina Lopatina, Felipe A. Mejia
Abstract: Employing machine learning models in the real world requires collecting large amounts of data, which is both time consuming and costly. A common approach to circumvent this is to leverage existing, similar data-sets with large amounts of labelled data. However, models trained on these canonical distributions do not readily transfer to real-world ones. Domain adaptation and transfer learning are often used to bridge this "reality gap", though both require a substantial amount of real-world data. In this paper we discuss a more general approach: we propose learning a general transformation to bring arbitrary images towards a canonical distribution where we can naively apply the trained machine learning models. This transformation is trained in an unsupervised regime, leveraging data augmentation to generate off-canonical examples of images and training a Deep Learning model to recover their original counterpart. We quantify the performance of this transformation using pre-trained ImageNet classifiers, demonstrating that this procedure can recover half of the loss in performance on the distorted data-set. We then validate the effectiveness of this approach on a series of pre-trained ImageNet models on a real world data set collected by printing and photographing images in different lighting conditions.
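The unsupervised training regime the abstract describes reduces to a simple loop: distort canonical images with augmentations and regress back to the originals. A hedged sketch, where `net`, `augment`, and the L1 reconstruction loss stand in for the authors' actual choices:

```python
import torch
import torch.nn.functional as F

def train_canonicalizer(net, canonical_images, augment, epochs=10, lr=1e-4):
    # Generate off-canonical examples by augmentation and train the network
    # to recover the original, yielding a general transform that pulls
    # arbitrary inputs back toward the canonical distribution before a
    # frozen, pre-trained classifier is applied.
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for clean in canonical_images:
            distorted = augment(clean)          # off-canonical example
            loss = F.l1_loss(net(distorted), clean)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```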
14. Improving axial resolution in SIM using deep learning [PDF] Back to Contents
Miguel Boland, Edward A.K. Cohen, Seth Flaxman, Mark A.A. Neil
Abstract: Structured Illumination Microscopy is a widespread methodology to image live and fixed biological structures smaller than the diffraction limits of conventional optical microscopy. Using recent advances in image up-scaling through deep learning models, we demonstrate a method to reconstruct 3D SIM image stacks with twice the axial resolution attainable through conventional SIM reconstructions. We further evaluate our method's robustness to noise and generalisability to varying observed specimens, and discuss potential adaptations of the method for further improvements in resolution.
15. SketchPatch: Sketch Stylization via Seamless Patch-level Synthesis [PDF] Back to Contents
Noa Fish, Lilach Perry, Amit Bermano, Daniel Cohen-Or
Abstract: The paradigm of image-to-image translation is leveraged for the benefit of sketch stylization via transfer of geometric textural details. Lacking the necessary volumes of data for standard training of translation systems, we advocate for operation at the patch level, where a handful of stylized sketches provide ample mining potential for patches featuring basic geometric primitives. Operating at the patch level necessitates special consideration of full sketch translation, as individual translation of patches with no regard to neighbors is likely to produce visible seams and artifacts at patch borders. Aligned pairs of styled and plain primitives are combined to form input hybrids containing styled elements around the border and plain elements within, and given as input to a seamless translation (ST) generator, whose output patches are expected to reconstruct the fully styled patch. An adversarial addition promotes generalization and robustness to diverse geometries at inference time, forming a simple and effective system for arbitrary sketch stylization, as demonstrated upon a variety of styles and sketches.
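The input hybrids are straightforward to construct: styled pixels around the border, plain pixels within, with the generator asked to output the fully styled patch. A small NumPy sketch (the border width here is an arbitrary choice):

```python
import numpy as np

def make_hybrid(styled, plain, border=8):
    # Input hybrid for the seamless-translation (ST) generator: styled
    # elements around the border, plain elements within; the training
    # target is the fully styled patch.
    hybrid = styled.copy()
    hybrid[border:-border, border:-border] = plain[border:-border, border:-border]
    return hybrid
```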
16. Dual Precision Deep Neural Network [PDF] Back to Contents
Jae Hyun Park, Ji Sub Choi, Jong Hwan Ko
Abstract: On-line precision scalability of deep neural networks (DNNs) is a critical feature to support the accuracy and complexity trade-off during DNN inference. In this paper, we propose a dual-precision DNN that includes two different precision modes in a single model, thereby supporting an on-line precision switch without re-training. The proposed two-phase training process optimizes both low- and high-precision modes.
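One plausible reading of the on-line precision switch is a single weight tensor with two forward views, full precision and fake-quantized; the sketch below is illustrative only (the bit width, the uniform quantizer, and the mode flag are assumptions, and the two-phase training is not shown).

```python
import torch
import torch.nn as nn

def quantize(w, bits):
    # Uniform symmetric fake-quantization of a weight tensor.
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(w / scale) * scale

class DualPrecisionLinear(nn.Module):
    # One set of weights, two inference modes selectable at run time
    # without re-training: full precision or a low-bit quantized view.
    def __init__(self, in_f, out_f, low_bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
        self.low_bits = low_bits

    def forward(self, x, mode="high"):
        w = self.weight if mode == "high" else quantize(self.weight, self.low_bits)
        return x @ w.t()
```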
17. Naive Artificial Intelligence [PDF] Back to Contents
Tomer Barak, Yehonatan Avidan, Yonatan Loewenstein
Abstract: In the cognitive sciences, it is common to distinguish between crystal intelligence, the ability to utilize knowledge acquired through past learning or experience and fluid intelligence, the ability to solve novel problems without relying on prior knowledge. Using this cognitive distinction between the two types of intelligence, extensively-trained deep networks that can play chess or Go exhibit crystal but not fluid intelligence. In humans, fluid intelligence is typically studied and quantified using intelligence tests. Previous studies have shown that deep networks can solve some forms of intelligence tests, but only after extensive training. Here we present a computational model that solves intelligence tests without any prior training. This ability is based on continual inductive reasoning, and is implemented by deep unsupervised latent-prediction networks. Our work demonstrates the potential fluid intelligence of deep networks. Finally, we propose that the computational principles underlying our approach can be used to model fluid intelligence in the cognitive sciences.
18. Improving Self-Organizing Maps with Unsupervised Feature Extraction [PDF] Back to Contents
Lyes Khacef, Laurent Rodriguez, Benoit Miramond
Abstract: The Self-Organizing Map (SOM) is a brain-inspired neural model that is very promising for unsupervised learning, especially in embedded applications. However, it is unable to learn efficient prototypes when dealing with complex datasets. We propose in this work to improve the SOM performance by using extracted features instead of raw data. We conduct a comparative study on the SOM classification accuracy with unsupervised feature extraction using two different approaches: a machine learning approach with Sparse Convolutional Auto-Encoders using gradient-based learning, and a neuroscience approach with Spiking Neural Networks using Spike Timing Dependent Plasticity learning. The SOM is trained on the extracted features, then very few labeled samples are used to label the neurons with their corresponding class. We investigate the impact of the feature maps, the SOM size, and the labeled subset size on the classification accuracy using the different feature extraction methods. We improve the SOM classification accuracy by +6.09% and reach state-of-the-art performance on unsupervised image classification.
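The pipeline is: extract features (with a CAE or an SNN), train a SOM on them, then label the neurons with a handful of labeled samples. A classic SOM update on feature vectors looks like the following NumPy sketch (the grid size and decay schedules are arbitrary here):

```python
import numpy as np

def train_som(features, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0):
    # Classic SOM trained on extracted feature vectors instead of raw data.
    h, w = grid
    weights = np.random.rand(h, w, features.shape[1])
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), -1)
    steps, t = epochs * len(features), 0
    for _ in range(epochs):
        for x in np.random.permutation(features):
            lr = lr0 * np.exp(-t / steps)
            sigma = sigma0 * np.exp(-t / steps)
            # Best matching unit: the prototype closest to the input.
            bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), (h, w))
            # Gaussian neighborhood pulls nearby prototypes toward x.
            d2 = ((coords - np.array(bmu)) ** 2).sum(-1)
            g = np.exp(-d2 / (2 * sigma ** 2))[..., None]
            weights += lr * g * (x - weights)
            t += 1
    return weights
```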
19. Multi-Attention-Network for Semantic Segmentation of High-Resolution Remote Sensing Images [PDF] 返回目录
Rui Li, Shunyi Zheng, Chenxi Duan, Jianlin Su
Abstract: Semantic segmentation of remote sensing images plays an important role in land resource management, yield estimation, and economic assessment. Even though the semantic segmentation of remote sensing images has been markedly improved by convolutional neural networks, several limitations remain in standard models. First, for encoder-decoder architectures like U-Net, the use of multi-scale features causes overuse of information, where similar low-level features are exploited multiple times at multiple scales. Second, long-range dependencies of feature maps are not sufficiently explored, so the feature representations associated with each semantic class are not optimal. Third, although the dot-product attention mechanism has been introduced and widely harnessed in semantic segmentation to model long-range dependencies, the high time and space complexity of attention impedes its use in application scenarios with large inputs. In this paper, we propose a Multi-Attention-Network (MANet) to remedy these drawbacks; it extracts contextual dependencies through multiple efficient attention mechanisms. A novel attention mechanism, named kernel attention, with linear complexity is proposed to alleviate the high computational demand of attention. Based on kernel attention and channel attention, we integrate local feature maps extracted by ResNeXt-101 with their corresponding global dependencies and adaptively highlight interdependent channel maps. Experiments conducted on two remote sensing image datasets captured by different satellites demonstrate that the performance of our MANet surpasses DeepLab V3+, PSPNet, FastFCN, and other baseline algorithms.
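Linear-complexity attention of the kind the abstract names can be sketched with the standard kernel feature-map trick: apply a positive feature map phi to queries and keys, then compute phi(K)^T V once so the cost is linear in sequence length. The elu+1 feature map below is a common choice and an assumption here, not necessarily the paper's exact kernel:

```python
import torch
import torch.nn.functional as F

def kernel_attention(q, k, v, eps=1e-6):
    """Linear-complexity attention via a kernel feature map: computes
    phi(Q) (phi(K)^T V) instead of softmax(Q K^T) V, so the cost is
    O(length * dim^2) rather than O(length^2 * dim).
    q, k, v: (batch, length, dim). The elu+1 feature map is an assumption."""
    phi_q = F.elu(q) + 1.0  # positive feature map
    phi_k = F.elu(k) + 1.0
    kv = torch.einsum('bld,ble->bde', phi_k, v)                 # (batch, dim, dim)
    norm = torch.einsum('bld,bd->bl', phi_q, phi_k.sum(dim=1))  # row-wise normalizer
    return torch.einsum('bld,bde->ble', phi_q, kv) / (norm[..., None] + eps)

x = torch.randn(2, 4096, 64)     # long inputs stay affordable
out = kernel_attention(x, x, x)  # (2, 4096, 64)
```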
20. Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity [PDF] 返回目录
Youngwoo Yoon, Bok Cha, Joo-Haeng Lee, Minsu Jang, Jaeyeon Lee, Jaehong Kim, Geehyuk Lee
Abstract: For human-like agents, including virtual avatars and social robots, making proper gestures while speaking is crucial in human-agent interaction. Co-speech gestures enhance interaction experiences and make the agents look alive. However, it is difficult to generate human-like gestures due to the lack of understanding of how people gesture. Data-driven approaches attempt to learn gesticulation skills from human demonstrations, but the ambiguous and individual nature of gestures hinders learning. In this paper, we present an automatic gesture generation model that uses the multimodal context of speech text, audio, and speaker identity to reliably generate gestures. By incorporating a multimodal context and an adversarial training scheme, the proposed model outputs gestures that are human-like and that match speech content and rhythm. We also introduce a new quantitative evaluation metric for gesture generation models. Experiments with the introduced metric and subjective human evaluation showed that the proposed gesture generation model is better than existing end-to-end generation models. We further confirm that our model is able to work with synthesized audio in a scenario where contexts are constrained, and show that different gesture styles can be generated for the same speech by specifying different speaker identities in the style embedding space, which is learned from videos of various speakers. All the code and data is available at this https URL.
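A hedged sketch of the trimodal conditioning: per-frame text and audio features are fused with a learned speaker style embedding and decoded into a pose sequence, so that swapping the speaker identity changes the gesture style for the same speech. The dimensions and the GRU decoder below are assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TrimodalGestureGenerator(nn.Module):
    """Per-frame text and audio features fused with a learned speaker style
    embedding, decoded into a pose sequence. All sizes and the GRU decoder
    are illustrative assumptions."""
    def __init__(self, text_dim=300, audio_dim=128, n_speakers=100,
                 style_dim=16, pose_dim=27, hidden=256):
        super().__init__()
        self.style = nn.Embedding(n_speakers, style_dim)  # style embedding space
        self.rnn = nn.GRU(text_dim + audio_dim + style_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, text_feats, audio_feats, speaker_id):
        t = text_feats.shape[1]
        style = self.style(speaker_id)[:, None, :].expand(-1, t, -1)
        h, _ = self.rnn(torch.cat([text_feats, audio_feats, style], dim=-1))
        return self.out(h)  # (batch, frames, pose_dim)

# Changing speaker_id at inference yields a different gesture style for the same speech.
```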
21. S3NAS: Fast NPU-aware Neural Architecture Search Methodology [PDF] 返回目录
Jaeseong Lee, Duseok Kang, Soonhoi Ha
Abstract: As the application area of convolutional neural networks (CNNs) grows in embedded devices, it has become popular to use a hardware CNN accelerator, called a neural processing unit (NPU), to achieve higher performance per watt than CPUs or GPUs. Recently, automated neural architecture search (NAS) has emerged as the default technique for finding a state-of-the-art CNN architecture with higher accuracy than manually designed architectures for image classification. In this paper, we present a fast NPU-aware NAS methodology, called S3NAS, to find a CNN architecture with higher accuracy than the existing ones under a given latency constraint. It consists of three steps: supernet design, Single-Path NAS for fast architecture exploration, and scaling. To widen the search space of the supernet structure, which consists of stages, we allow stages to have a different number of blocks, and blocks to have parallel layers of different kernel sizes. For fast neural architecture search, we apply a modified Single-Path NAS technique to the proposed supernet structure. In this step, we assume a shorter latency constraint than required, to reduce the search space and the search time. The last step is to scale up the network maximally within the latency constraint. For accurate latency estimation, an analytical latency estimator is devised, based on a cycle-level NPU simulator that runs an entire CNN while accurately accounting for memory access overhead. With the proposed methodology, we are able to find a network in 3 hours using TPUv3 that shows 82.72% top-1 accuracy on ImageNet with 11.66 ms latency. Code is released at this https URL.
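The final scaling step can be pictured as a sweep over a width multiplier, accepting the largest network whose estimated latency stays within budget. Here latency_of stands in for the paper's cycle-level analytical estimator; the step size and cap are illustrative:

```python
def scale_to_latency(base_channels, latency_of, budget_ms, max_mult=8.0, step=0.05):
    """Scale the searched network maximally within the latency budget by
    sweeping a width multiplier. latency_of stands in for the paper's
    cycle-level analytical latency estimator; step/max_mult are illustrative."""
    best, mult = 1.0, 1.0
    while mult + step <= max_mult:
        mult += step
        channels = [max(1, int(c * mult)) for c in base_channels]
        if latency_of(channels) > budget_ms:
            break
        best = mult
    return best

# e.g. with a toy linear-in-channels estimator:
# scale_to_latency([16, 32, 64], lambda chs: 0.01 * sum(chs), budget_ms=2.0)
```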
22. Introduction to Medical Image Registration with DeepReg, Between Old and New [PDF] 返回目录
N. Montana Brown, Y. Fu, S. U. Saeed, A. Casamitjana, Z. M. C. Baum, R. Delaunay, Q. Yang, A. Grimwood, Z. Min, E. Bonmati, T. Vercauteren, M. J. Clarkson, Y. Hu
Abstract: This document outlines a tutorial for getting started with medical image registration using the open-source package DeepReg. The basic concepts of medical image registration are discussed, linking classical methods to newer methods that use deep learning. Two iterative, classical algorithms using optimisation and one learning-based algorithm using deep learning are coded step by step using DeepReg utilities, all with real, openly accessible medical data.
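The optimise-a-transform loop that such classical methods share can be shown in miniature: parameterize a transform, warp the moving image, and minimize a similarity loss by gradient descent. This sketch uses a plain 2-D translation with a sum-of-squared-differences loss and is not DeepReg's API:

```python
import torch
import torch.nn.functional as F

def register_translation(fixed, moving, steps=200, lr=0.05):
    """Classical iterative registration in miniature: optimise a 2-D translation
    (in normalized coordinates) that minimises an SSD loss.
    fixed, moving: (1, 1, H, W) tensors. A sketch, not DeepReg's API."""
    shift = torch.zeros(1, 2, requires_grad=True)
    opt = torch.optim.Adam([shift], lr=lr)
    identity = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]])
    for _ in range(steps):
        theta = identity.clone()
        theta[:, :, 2] = shift                  # translation parameters
        grid = F.affine_grid(theta, fixed.shape, align_corners=False)
        warped = F.grid_sample(moving, grid, align_corners=False)
        loss = ((warped - fixed) ** 2).mean()   # SSD similarity measure
        opt.zero_grad()
        loss.backward()
        opt.step()
    return shift.detach()
```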
23. The Little W-Net That Could: State-of-the-Art Retinal Vessel Segmentation with Minimalistic Models [PDF] 返回目录
Adrian Galdran, André Anjos, José Dolz, Hadi Chakor, Hervé Lombaert, Ismail Ben Ayed
Abstract: The segmentation of the retinal vasculature from eye fundus images represents one of the most fundamental tasks in retinal image analysis. Over recent years, increasingly complex approaches based on sophisticated Convolutional Neural Network architectures have been slowly pushing performance on well-established benchmark datasets. In this paper, we take a step back and analyze the real need for such complexity. Specifically, we demonstrate that a minimalistic version of a standard U-Net with several orders of magnitude fewer parameters, carefully trained and rigorously evaluated, closely approximates the performance of the current best techniques. In addition, we propose a simple extension, dubbed W-Net, which reaches outstanding performance on several popular datasets while still using orders of magnitude fewer learnable weights than any previously published approach. Furthermore, we provide the most comprehensive cross-dataset performance analysis to date, involving up to 10 different databases. Our analysis demonstrates that the retinal vessel segmentation problem is far from solved when test images differ substantially from the training data, and that this task represents an ideal scenario for the exploration of domain adaptation techniques. In this context, we experiment with a simple self-labeling strategy that allows us to moderately enhance cross-dataset performance, indicating that there is still much room for improvement in this area. Finally, we also test our approach on the Artery/Vein segmentation problem, where we again achieve results well aligned with the state of the art, at a fraction of the model complexity found in recent literature. All the code to reproduce the results in this paper is released.
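The cascade the abstract describes can be sketched as two small networks in sequence, the second refining the first's vessel map given the original image. The conv_block below is a placeholder where the paper's minimal U-Nets would go:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, base=8):
    """Placeholder for each of the paper's minimal U-Nets; kept as a plain
    conv stack only to keep the sketch short."""
    return nn.Sequential(
        nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(),
        nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
        nn.Conv2d(base, out_ch, 1),
    )

class WNet(nn.Module):
    """Two small networks in cascade: the second refines the first's vessel
    map given the input image. Channel counts are illustrative."""
    def __init__(self):
        super().__init__()
        self.first = conv_block(3, 1)
        self.second = conv_block(4, 1)  # image (3 ch) + first prediction (1 ch)

    def forward(self, img):
        coarse = self.first(img)
        fine = self.second(torch.cat([img, coarse], dim=1))
        return coarse, fine
```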
24. Federated Learning for Breast Density Classification: A Real-World Implementation [PDF] 返回目录
Holger R. Roth, Ken Chang, Praveer Singh, Nir Neumark, Wenqi Li, Vikash Gupta, Sharut Gupta, Liangqiong Qu, Alvin Ihsani, Bernardo C. Bizzo, Yuhong Wen, Varun Buch, Meesam Shah, Felipe Kitamura, Matheus Mendonça, Vitor Lavor, Ahmed Harouni, Colin Compas, Jesse Tetreault, Prerna Dogra, Yan Cheng, Selnur Erdal, Richard White, Behrooz Hashemian, Thomas Schultz, Miao Zhang, Adam McCarthy, B. Min Yun, Elshaimaa Sharaf, Katharina V. Hoebel, Jay B. Patel, Bryan Chen, Sean Ko, Evan Leibovitz, Etta D. Pisano, Laura Coombs, Daguang Xu, Keith J. Dreyer, Ittai Dayan, Ram C. Naidu, Mona Flores, Daniel Rubin, Jayashree Kalpathy-Cramer
Abstract: Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on the Breast Imaging, Reporting & Data System (BI-RADS). We show that, despite substantial differences among the datasets from all sites (mammography system, class distribution, and data set size) and without centralizing data, we can successfully train AI models in federation. The results show that models trained using FL perform on average 6.3% better than their counterparts trained on an institute's local data alone. Furthermore, we show a 45.8% relative improvement in the models' generalizability when evaluated on the other participating sites' testing data.
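One aggregation round of such a federation can be sketched as a dataset-size-weighted average of client weights (FedAvg-style); the study's actual protocol and weighting may differ:

```python
def federated_average(state_dicts, n_samples):
    """Average client model weights (torch state dicts), weighting each site
    by its local dataset size. A sketch of one FedAvg-style aggregation
    round; the study's exact protocol may differ."""
    total = float(sum(n_samples))
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(state_dicts, n_samples))
    return avg  # broadcast back to all sites for the next round
```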
25. ESMFL: Efficient and Secure Models for Federated Learning [PDF] 返回目录
Sheng Lin, Chenghong Wang, Hongjia Li, Jieren Deng, Yanzhi Wang, Caiwen Ding
Abstract: Deep Neural Networks are widely applied to various domains. These applications are deployed everywhere, and their success depends on the availability of big data. However, the massive data collection required for deep neural networks raises potential privacy issues and also consumes large amounts of communication bandwidth. To address this problem, we propose a privacy-preserving method for the federated learning distributed system, operated on Intel Software Guard Extensions, a set of instructions that increases the security of application code and data. Meanwhile, the encrypted models make the transmission overhead larger. Hence, we reduce the communication cost by sparsification and achieve reasonable accuracy with different model architectures. Experimental results under our privacy-preserving framework show that, for LeNet-5, we obtain 98.78% accuracy on IID data and 97.60% accuracy on Non-IID data with 34.85% communication saving, and a 1.8X total elapsed time acceleration. For MobileNetV2, we obtain 85.40% accuracy on IID data and 81.66% accuracy on Non-IID data with 15.85% communication saving, and a 1.2X total elapsed time acceleration.
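The sparsification step behind the communication saving can be sketched as magnitude-based top-k selection on each update tensor: only the largest entries (and their indices) are transmitted. The keep ratio below is illustrative, not the paper's setting:

```python
import torch

def sparsify_update(delta, keep_ratio=0.1):
    """Magnitude-based top-k sparsification: zero all but the largest entries
    of a model update, so only k values plus their indices need to be sent.
    keep_ratio is illustrative."""
    flat = delta.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    idx = flat.abs().topk(k).indices
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.view_as(delta), idx  # values and indices to transmit

update = torch.randn(256, 256)
sparse_update, kept = sparsify_update(update, keep_ratio=0.05)
```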
26. Mutual Teaching for Graph Convolutional Networks [PDF] 返回目录
Kun Zhan, Chaoxi Niu
Abstract: Graph convolutional networks produce good predictions for unlabeled samples due to transductive label propagation. Since samples have different predicted confidences, we take high-confidence predictions as pseudo labels to expand the label set, so that more samples are selected for updating the models. We propose a new training method named mutual teaching: we train dual models and let them teach each other during each batch. First, each network feeds forward all samples and selects those with high-confidence predictions. Second, each model is updated using the samples selected by its peer network. We view the high-confidence predictions as useful knowledge, and the useful knowledge of one network teaches its peer through model updating in each batch. In mutual teaching, the pseudo-label set of a network comes from its peer network. With this new training strategy, performance improves significantly. Extensive experimental results demonstrate that our method achieves superior performance over state-of-the-art methods under very low label rates.
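One batch of mutual teaching can be sketched as follows: each model pseudo-labels the samples its peer is confident about and is updated on those plus the ground-truth labels. The confidence threshold and the plain model(x) forward (standing in for a GCN forward pass over the graph) are assumptions:

```python
import torch
import torch.nn.functional as F

def mutual_teaching_step(model_a, model_b, opt_a, opt_b, x, y, labeled_mask, tau=0.9):
    """One batch of mutual teaching (illustrative). model(x) stands in for a
    GCN forward pass over the graph; tau is an assumed confidence threshold."""
    for student, teacher, opt in ((model_a, model_b, opt_a), (model_b, model_a, opt_b)):
        with torch.no_grad():
            probs = F.softmax(teacher(x), dim=1)
            conf, pseudo = probs.max(dim=1)
            # high-confidence peer predictions on unlabeled samples become pseudo labels
            pseudo_mask = (conf > tau) & ~labeled_mask
        logits = student(x)
        loss = F.cross_entropy(logits[labeled_mask], y[labeled_mask])
        if pseudo_mask.any():
            loss = loss + F.cross_entropy(logits[pseudo_mask], pseudo[pseudo_mask])
        opt.zero_grad()
        loss.backward()
        opt.step()
```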