Contents
3. Computational optimization of convolutional neural networks using separated filters architecture [PDF] Abstract
8. An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization [PDF] Abstract
13. Automated Cardiothoracic Ratio Calculation and Cardiomegaly Detection using Deep Learning Approach [PDF] Abstract
14. Registration of multi-view point sets under the perspective of expectation-maximization [PDF] Abstract
16. EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [PDF] Abstract
Abstracts
1. Camera Model Anonymisation with Augmented cGANs [PDF] Back to contents
Jerone T. A. Andrews, Yidan Zhang, Lewis D. Griffin
Abstract: The model of camera that was used to capture a particular photographic image (model attribution) can be inferred from model-specific artefacts present within the image. Typically these artefacts are found in high-frequency pixel patterns, rather than image content. Model anonymisation is the process of transforming these artefacts such that the apparent capture model is changed. Improved methods for attribution and anonymisation are important for improving digital forensics, and understanding its limits. Through conditional adversarial training, we present an approach for learning these transformations. Significantly, we augment the objective with the losses from pre-trained auxiliary model attribution classifiers that constrain the generator to not only synthesise discriminative high-frequency artefacts, but also salient image-based artefacts lost during image content suppression. Quantitative comparisons against a recent representative approach demonstrate the efficacy of our framework in a non-interactive black-box setting.
2. MAST: A Memory-Augmented Self-supervised Tracker [PDF] Back to contents
Zihang Lai, Erika Lu, Weidi Xie
Abstract: Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. In this paper, we first reassess the traditional choices used for self-supervised training and reconstruction loss by conducting thorough experiments that finally elucidate the optimal choices. Second, we further improve on existing methods by augmenting our architecture with a crucial memory component. Third, we benchmark on large-scale semi-supervised video object segmentation (a.k.a. dense tracking), and propose a new metric: generalizability. Our first two contributions yield a self-supervised network that for the first time is competitive with supervised methods on standard evaluation metrics of dense tracking. When measuring generalizability, we show self-supervised approaches are actually superior to the majority of supervised methods. We believe this new generalizability metric can better capture the real-world use-cases for dense tracking, and will spur new interest in this research direction.
3. Computational optimization of convolutional neural networks using separated filters architecture [PDF] Back to contents
Elena Limonova, Alexander Sheshkus, Dmitry Nikolaev
Abstract: This paper considers a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing. Convolutional neural networks (CNNs) are the standard approach to image recognition, even though they can be too computationally demanding, for example for recognition on mobile platforms or in embedded systems. In this paper we propose a CNN structure transformation that expresses 2D convolution filters as a linear combination of separable filters. This allows separated convolutional filters to be obtained by standard training algorithms. We study the computational efficiency of this structure transformation and suggest a fast implementation that is easily handled by a CPU or GPU. We demonstrate that CNNs of the proposed structure designed for letter and digit recognition show a 15% speedup without accuracy loss in an industrial image recognition system. In conclusion, we discuss the question of a possible accuracy decrease and the application of the proposed transformation to different recognition problems. Keywords: convolutional neural networks, computational optimization, separable filters, complexity reduction.
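As an illustration of what "a linear combination of separable filters" means, the following minimal sketch decomposes a fixed 2D kernel into rank-1 (column, row) pairs and checks that summing the separable passes reproduces the full 2D convolution. It is not the authors' method: the abstract says the separated filters are obtained by standard training, whereas this sketch swaps in an SVD of an already-known kernel purely for demonstration. All function names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation, written out for clarity."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + oh, j:j + ow]
    return out

def separable_terms(kernel, rank):
    """Split a 2D kernel into `rank` (column, row) pairs via SVD (illustrative stand-in for training)."""
    u, s, vt = np.linalg.svd(kernel)
    return [(u[:, r] * s[r], vt[r, :]) for r in range(rank)]

def conv2d_separated(image, terms):
    """Sum of separable passes: a vertical 1D filter followed by a horizontal 1D filter."""
    total = None
    for col, row in terms:
        vertical = conv2d_valid(image, col[:, None])   # k x 1 pass
        both = conv2d_valid(vertical, row[None, :])    # 1 x k pass
        total = both if total is None else total + both
    return total

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((5, 5))

full = conv2d_valid(image, kernel)
exact = conv2d_separated(image, separable_terms(kernel, rank=5))   # all terms: exact
approx = conv2d_separated(image, separable_terms(kernel, rank=2))  # truncated: approximate

max_exact_error = np.abs(full - exact).max()
max_approx_error = np.abs(full - approx).max()
```

For a k x k kernel, the full convolution costs k^2 multiplications per output pixel, while r separable terms cost about 2kr, so the decomposition only saves work when r is smaller than k/2.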
4. Towards Bounding-Box Free Panoptic Segmentation [PDF] Back to contents
Ujwal Bonde, Pablo F. Alcantarilla, Stefan Leutenegger
Abstract: In this work we introduce a new bounding-box free network (BBFNet) for panoptic segmentation. Panoptic segmentation is an ideal problem for a bounding-box free approach as it already requires per-pixel semantic class labels. We use this observation to exploit class boundaries from an off-the-shelf semantic segmentation network and refine them to predict instance labels. Towards this goal BBFNet predicts coarse watershed levels and uses them to detect large instance candidates where boundaries are well defined. For smaller instances, whose boundaries are less reliable, BBFNet also predicts instance centers by means of Hough voting followed by mean-shift to reliably detect small objects. A novel triplet loss network helps merge fragmented instances while refining boundary pixels. Our approach is distinct from previous works in panoptic segmentation that rely on a combination of a semantic segmentation network with a computationally costly instance segmentation network based on bounding boxes, such as Mask R-CNN, to guide the prediction of instance labels using a Mixture-of-Expert (MoE) approach. We benchmark our non-MoE method on the Cityscapes and Microsoft COCO datasets and show competitive performance with other MoE-based approaches while outperforming existing non-proposal-based approaches. We achieve this while being computationally more efficient in terms of number of parameters and FLOPs.
5. Voxel-Based Indoor Reconstruction From HoloLens Triangle Meshes [PDF] Back to contents
P. Hübner, M. Weinmann, S. Wursthorn
Abstract: Current mobile augmented reality devices are often equipped with range sensors. The Microsoft HoloLens, for instance, is equipped with a Time-Of-Flight (ToF) range camera providing coarse triangle meshes that can be used in custom applications. We suggest using the triangle meshes for the automatic generation of indoor models that can serve as a basis for augmenting their physical counterpart with location-dependent information. In this paper, we present a novel voxel-based approach for automated indoor reconstruction from unstructured three-dimensional geometries like triangle meshes. After an initial voxelization of the input data, rooms are detected in the resulting voxel grid by segmenting connected voxel components of ceiling candidates and extruding them downwards to find floor candidates. Semantic class labels like 'Wall', 'Wall Opening', 'Interior Object' and 'Empty Interior' are then assigned to the room voxels in between ceiling and floor by a rule-based voxel sweep algorithm. Finally, the geometry of the detected walls and their openings is refined in voxel representation. The proposed approach is not restricted to Manhattan World scenarios and does not rely on room surfaces being planar.
6. FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings [PDF] Back to contents
Niels Ole Salscheider
Abstract: Most state-of-the-art object detectors output multiple detections per object. The duplicates are removed in a post-processing step called Non-Maximum Suppression. Classical Non-Maximum Suppression has shortcomings in scenes that contain objects with high overlap: the idea of this heuristic is that a high bounding box overlap corresponds to a high probability of having a duplicate. We propose FeatureNMS to solve this problem. FeatureNMS recognizes duplicates not only based on the intersection over union between bounding boxes, but also based on the difference of feature vectors. These feature vectors can encode more information like visual appearance. Our approach outperforms classical NMS and derived approaches and achieves state-of-the-art performance.
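The idea of combining box overlap with feature-vector distance can be sketched as follows. This is a toy illustration under stated assumptions, not the FeatureNMS implementation: the abstract does not give the decision rule, so the three-way rule used here (low overlap keeps both boxes, very high overlap suppresses, anything in between is decided by embedding distance), the thresholds `iou_low`, `iou_high` and `max_dist`, and the hand-written feature vectors are all hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def feature_aware_nms(boxes, scores, features,
                      iou_low=0.1, iou_high=0.9, max_dist=1.0):
    """Greedy NMS where ambiguous overlaps are resolved by feature distance.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    for idx in order:
        duplicate = False
        for kept in keep:
            overlap = iou(boxes[idx], boxes[kept])
            if overlap <= iou_low:
                continue                       # far apart: cannot be a duplicate
            if overlap >= iou_high:
                duplicate = True               # almost identical boxes
                break
            # Ambiguous overlap: compare the feature vectors instead.
            if np.linalg.norm(features[idx] - features[kept]) < max_dist:
                duplicate = True
                break
        if not duplicate:
            keep.append(int(idx))
    return keep

boxes = np.array([[0, 0, 10, 10],     # object A
                  [1, 0, 11, 10],     # duplicate of A (similar features)
                  [2, 0, 12, 10],     # different object overlapping A
                  [50, 50, 60, 60]],  # isolated object
                 dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
features = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [0.0, 0.0]])

kept = feature_aware_nms(boxes, scores, features)
print(kept)  # [0, 2, 3]
```

In this toy scene the third box overlaps the first with an IoU of about 0.67, so classical NMS with a typical 0.5 threshold would discard it, while the feature distance keeps it as a separate object.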
7. Neural arbitrary style transfer for portrait images using the attention mechanism [PDF] Back to contents
S. A. Berezin, V.M. Volkova
Abstract: Arbitrary style transfer is the task of synthesizing an image that has never been seen before from two given images: a content image and a style image. The content image forms the structure, the basic geometric lines and shapes of the resulting image, while the style image sets the color and texture of the result. The word "arbitrary" in this context means the absence of any one pre-learned style. So, for example, convolutional neural networks capable of transferring a new style only after training or retraining on a new amount of data are not considered to solve such a problem, while networks based on the attention mechanism that are capable of performing such a transformation without retraining are. An original image can be, for example, a photograph, and a style image can be a painting by a famous artist. The resulting image in this case will be the scene depicted in the original photograph, made in the style of that painting. Recent arbitrary style transfer algorithms make it possible to achieve good results in this task; however, when processing portrait images of people, the result of such algorithms is either unacceptable due to excessive distortion of facial features, or weakly expressed, not bearing the characteristic features of the style image. In this paper, we consider an approach to solving this problem using a combined architecture of deep neural networks with an attention mechanism that transfers style based on the contents of a particular image segment: with a clear predominance of style over form for the background part of the image, and with the prevalence of content over form in the image part directly containing the image of a person.
8. An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization [PDF] Back to contents
Yiqiu Shen, Nan Wu, Jason Phang, Jungkyu Park, Kangning Liu, Sudarshini Tyagi, Laura Heacock, S. Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras
Abstract: Medical images differ from natural images in significantly higher resolutions and smaller regions of interest. Because of these differences, neural network architectures that work well for natural images might not be applicable to medical image analysis. In this work, we extend the globally-aware multiple instance classifier, a framework we proposed to address these unique properties of medical images. This model first uses a low-capacity, yet memory-efficient, network on the whole image to identify the most informative regions. It then applies another higher-capacity network to collect details from chosen regions. Finally, it employs a fusion module that aggregates global and local information to make a final prediction. While existing methods often require lesion segmentation during training, our model is trained with only image-level labels and can generate pixel-level saliency maps indicating possible malignant findings. We apply the model to screening mammography interpretation: predicting the presence or absence of benign and malignant lesions. On the NYU Breast Cancer Screening Dataset, consisting of more than one million images, our model achieves an AUC of 0.93 in classifying breasts with malignant findings, outperforming ResNet-34 and Faster R-CNN. Compared to ResNet-34, our model is 4.1x faster for inference while using 78.4% less GPU memory. Furthermore, we demonstrate, in a reader study, that our model surpasses radiologist-level AUC by a margin of 0.11. The proposed model is available online: this https URL.
9. Few-Shot Few-Shot Learning and the role of Spatial Attention [PDF] Back to contents
Yann Lifchitz, Yannis Avrithis, Sylvaine Picard
Abstract: Few-shot learning is often motivated by the ability of humans to learn new tasks from few examples. However, standard few-shot classification benchmarks assume that the representation is learned on a limited amount of base class data, ignoring the amount of prior knowledge that a human may have accumulated before learning new tasks. At the same time, even if a powerful representation is available, it may happen in some domain that base class data are limited or non-existent. This motivates us to study a problem where the representation is obtained from a classifier pre-trained on a large-scale dataset of a different domain, assuming no access to its training process, while the base class data are limited to few examples per class and their role is to adapt the representation to the domain at hand rather than learn from scratch. We adapt the representation in two stages, namely on the few base class data if available and on the even fewer data of new tasks. In doing so, we obtain from the pre-trained classifier a spatial attention map that allows focusing on objects and suppressing background clutter. This is important in the new problem, because when base class data are few, the network cannot learn where to focus implicitly. We also show that a pre-trained network may be easily adapted to novel classes, without meta-learning.
10. NoiseBreaker: Gradual Image Denoising Guided by Noise Analysis [PDF]
Florian Lemarchand, Erwan Nogues, Maxime Pelcat
Abstract: Fully supervised deep-learning-based denoisers are currently the best-performing image denoising solutions. However, they require clean reference images. When the target noise is complex, e.g. composed of an unknown mixture of primary noises with unknown intensity, fully supervised solutions are limited by the difficulty of building a suitable training set for the problem. This paper proposes a gradual denoising strategy that iteratively detects the dominating noise in an image and removes it using a tailored denoiser. The method is shown to keep up with state-of-the-art blind denoisers on mixture noises. Moreover, noise analysis is demonstrated to guide denoisers efficiently not only on noise type, but also on noise intensity. The method provides insight into the nature of the encountered noise, and it makes it possible to extend an existing denoiser to new noise types. This feature makes the method adaptive to varied denoising cases.
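The detect-then-remove loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `gradual_denoise`, the `"clean"` stop label, the step limit, and the toy analyser and denoisers are all hypothetical names introduced here, and a 1-D list of floats stands in for an image.

```python
from typing import Callable, Dict, List, Tuple

Image = List[float]  # toy 1-D "image", used only for illustration


def gradual_denoise(
    image: Image,
    analyse: Callable[[Image], str],
    denoisers: Dict[str, Callable[[Image], Image]],
    max_steps: int = 5,
) -> Tuple[Image, List[str]]:
    """Repeatedly remove the dominant noise until the analyser reports 'clean'."""
    history: List[str] = []
    for _ in range(max_steps):
        label = analyse(image)           # detect the dominating noise class
        if label == "clean":
            break
        image = denoisers[label](image)  # apply the denoiser tailored to that class
        history.append(label)
    return image, history


# Hypothetical stand-ins for a noise classifier and per-noise denoisers.
def toy_analyse(img: Image) -> str:
    if any(abs(v) > 10 for v in img):
        return "impulse"
    if any(v != round(v) for v in img):
        return "jitter"
    return "clean"


toy_denoisers = {
    "impulse": lambda img: [v if abs(v) <= 10 else 0.0 for v in img],
    "jitter": lambda img: [float(round(v)) for v in img],
}

restored, steps = gradual_denoise([1.2, 50.0, 3.0], toy_analyse, toy_denoisers)
```

Because each step handles a single noise class, supporting a new noise type in this sketch only requires a new label in the analyser and a new entry in the `denoisers` dictionary.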
11. Motion Deblurring using Spatiotemporal Phase Aperture Coding [PDF]
Shay Elmalem, Raja Giryes, Emanuel Marom
Abstract: Motion blur is a known issue in photography, as it limits the exposure time while capturing moving objects. Extensive research has been carried out to compensate for it. In this work, a computational imaging approach for motion deblurring is proposed and demonstrated. Using dynamic phase-coding in the lens aperture during the image acquisition, the trajectory of the motion is encoded in an intermediate optical image. This encoding embeds both the motion direction and extent by coloring the spatial blur of each object. The color cues serve as prior information for a blind deblurring process, implemented using a convolutional neural network (CNN) trained to utilize such coding for image restoration. We demonstrate the advantage of the proposed approach over blind deblurring without coding and over other solutions that use coded acquisition, both in simulation and in real-world experiments.
12. Knowledge Integration Networks for Action Recognition [PDF]
Shiwen Zhang, Sheng Guo, Limin Wang, Weilin Huang, Matthew R. Scott
Abstract: In this work, we propose Knowledge Integration Networks (referred to as KINet) for video action recognition. KINet is capable of aggregating meaningful context features which are of great importance for identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsing and scene recognition which allow the model to encode human and scene knowledge for action recognition. We explore two pre-trained models as teacher networks to distill human and scene knowledge for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves state-of-the-art performance on the large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate the strong transfer capability of KINet by transferring the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy.
13. Automated Cardiothoracic Ratio Calculation and Cardiomegaly Detection using Deep Learning Approach [PDF]
Isarun Chamveha, Treethep Promwiset, Trongtum Tongdee, Pairash Saiviroonporn, Warasinee Chaisangmongkon
Abstract: We propose an algorithm for calculating the cardiothoracic ratio (CTR) from chest X-ray films. Our approach applies a deep learning model based on U-Net with a VGG16 encoder to extract lung and heart masks from chest X-ray images and calculates CTR from the extents of the obtained masks. Human radiologists evaluated our CTR measurements, and $76.5\%$ were accepted for inclusion in medical reports without any need for adjustment. This result translates to a large amount of time and labor saved for radiologists using our automated tools.
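The abstract states only that CTR is calculated "from the extents of obtained masks". A minimal sketch of one way to do that, assuming CTR is the horizontal extent of the heart mask divided by the horizontal extent of the lung mask, and assuming masks are plain nested lists of 0/1 values. The function names and the toy masks are hypothetical; the 0.5 cutoff is the conventional clinical threshold for cardiomegaly and is not taken from the abstract.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask: rows of 0/1 values


def horizontal_extent(mask: Mask) -> Optional[Tuple[int, int]]:
    """Return (leftmost, rightmost) column indices holding a 1, or None if the mask is empty."""
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not cols:
        return None
    return min(cols), max(cols)


def cardiothoracic_ratio(heart: Mask, lungs: Mask) -> float:
    """Heart width divided by thoracic width, both measured in pixels."""
    heart_ext = horizontal_extent(heart)
    lung_ext = horizontal_extent(lungs)
    if heart_ext is None or lung_ext is None:
        raise ValueError("both masks must contain at least one foreground pixel")
    heart_width = heart_ext[1] - heart_ext[0] + 1
    thorax_width = lung_ext[1] - lung_ext[0] + 1
    return heart_width / thorax_width


# Toy single-row masks (hypothetical data): lungs span columns 0..9, heart spans 2..7.
lungs = [[1, 1, 1, 0, 0, 0, 0, 1, 1, 1]]
heart = [[0, 0, 1, 1, 1, 1, 1, 1, 0, 0]]

ctr = cardiothoracic_ratio(heart, lungs)
is_cardiomegaly = ctr > 0.5  # conventional cutoff, assumed here
```

Since the ratio is dimensionless, pixel widths can be used directly without knowing the physical pixel spacing of the X-ray.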
14. Registration of multi-view point sets under the perspective of expectation-maximization [PDF]
Jihua Zhu, Jing Zhang, Zhongyu Li
Abstract: Registration of multi-view point sets is a prerequisite for 3D model reconstruction. To solve this problem, most previous approaches either only partially explore the available information or blindly utilize unnecessary information to align each point set, which may lead to undesired results or introduce extra computational complexity. To this end, this paper considers the multi-view registration problem as a maximum likelihood estimation problem and proposes a novel multi-view registration approach under the perspective of Expectation-Maximization (EM). The basic idea of our approach is that different data points are generated by the same number of Gaussian mixture models (GMMs). For each data point in one well-aligned point set, its nearest neighbors can be searched from other well-aligned point sets to explore more available information. Then, we can suppose this data point is generated by a special GMM, which is composed of its nearest neighbors, each associated with one Gaussian distribution. Based on this assumption, it is reasonable to define the likelihood function, which contains all rigid transformations required to be estimated for multi-view registration. Subsequently, the EM algorithm is utilized to maximize the likelihood function so as to estimate all rigid transformations. Finally, the proposed approach is tested on several benchmark data sets and compared with some state-of-the-art algorithms. Experimental results illustrate its superior accuracy and efficiency for the registration of multi-view point sets.
15. V4D: 4D Convolutional Neural Networks for Video-level Representation Learning [PDF]
Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Limin Wang
Abstract: Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, referred to as V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions, and at the same time, to preserve strong 3D spatio-temporal representation with residual connections. Specifically, we design a new 4D residual block that is able to capture inter-clip interactions, which could enhance the representation power of the original clip-level 3D CNNs. The 4D residual blocks can be easily integrated into existing 3D CNNs to perform long-range modeling hierarchically. We further introduce the training and inference methods for the proposed V4D. Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.
16. EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [PDF]
Linpu Fang, Hang Xu, Zhili Liu, Sarah Parisot, Zhenguo Li
Abstract: Object detectors trained on fully-annotated data currently yield state-of-the-art performance but require expensive manual annotations. On the other hand, weakly-supervised detectors have much lower performance and cannot be used reliably in a realistic setting. In this paper, we study the hybrid-supervised object detection problem, aiming to train a high-quality detector with only a limited amount of fully-annotated data while fully exploiting cheap data with image-level labels. State-of-the-art methods typically propose an iterative approach, alternating between generating pseudo-labels and updating a detector. This paradigm requires careful manual hyper-parameter tuning for mining good pseudo-labels at each round and is quite time-consuming. To address these issues, we present EHSOD, an end-to-end hybrid-supervised object detection system which can be trained in one shot on both fully and weakly-annotated data. Specifically, based on a two-stage detector, we propose two modules to fully utilize the information from both kinds of labels: 1) a CAMRPN module that aims at finding foreground proposals guided by a class activation heat-map; 2) a hybrid-supervised cascade module that further refines the bounding-box position and classification with the help of an auxiliary head compatible with image-level data. Extensive experiments demonstrate the effectiveness of the proposed method, which achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data, e.g. 37.5% mAP on COCO. We will release the code and the trained models.
17. Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN [PDF]
Hang Xu, Linpu Fang, Xiaodan Liang, Wenxiong Kang, Zhenguo Li
Abstract: The dominant object detection approaches treat each dataset separately and fit a specific domain, so they cannot adapt to other domains without extensive retraining. In this paper, we address the problem of designing a universal object detection model that exploits diverse category granularity from multiple domains and predicts all kinds of categories in one system. Existing works treat this problem by integrating multiple detection branches upon one shared backbone network. However, this paradigm overlooks the crucial semantic correlations between multiple domains, such as category hierarchy, visual similarity, and linguistic relationship. To address these drawbacks, we present a novel universal object detector called Universal-RCNN that incorporates graph transfer learning for propagating relevant semantic information across multiple datasets to reach semantic coherency. Specifically, we first generate a global semantic pool by integrating all high-level semantic representations of all the categories. Then an Intra-Domain Reasoning Module learns and propagates the sparse graph representation within one dataset guided by a spatial-aware GCN. Finally, an Inter-Domain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally. Extensive experiments demonstrate that the proposed method significantly outperforms multiple-branch models and achieves state-of-the-art results on multiple object detection benchmarks (mAP: 49.1% on COCO).
18. DivideMix: Learning with Noisy Labels as Semi-supervised Learning [PDF]
Junnan Li, Richard Socher, Steven C.H. Hoi
Abstract: Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks, where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at this https URL.
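The dataset-division step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: it assumes a two-component one-dimensional Gaussian mixture fitted with plain EM, treats the component with the smaller mean loss as "clean", and uses a 0.5 posterior threshold. The function names, the threshold, and the toy loss values are hypothetical, and the two-network co-division and the MixMatch phase are not shown.

```python
import math
from typing import List, Tuple


def _weighted_pdfs(x: float, weights, means, variances) -> List[float]:
    """Weighted Gaussian density of x under each of the two components."""
    return [
        weights[k]
        * math.exp(-((x - means[k]) ** 2) / (2 * variances[k]))
        / math.sqrt(2 * math.pi * variances[k])
        for k in range(2)
    ]


def fit_two_gaussians(xs: List[float], iters: int = 50) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 2-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    lo, hi = min(xs), max(xs)
    means = [lo, hi]                       # initialise the means at the extremes
    var0 = max((hi - lo) ** 2 / 4, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in xs:
            p = _weighted_pdfs(x, weights, means, variances)
            total = sum(p) or 1e-300       # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)  # variance floor keeps the density finite
    return weights, means, variances


def divide(losses: List[float], threshold: float = 0.5) -> Tuple[List[int], List[int]]:
    """Split sample indices into (labeled/clean, unlabeled/noisy) by clean-component posterior."""
    weights, means, variances = fit_two_gaussians(losses)
    clean = 0 if means[0] <= means[1] else 1   # low-loss component is taken as clean
    labeled, unlabeled = [], []
    for i, x in enumerate(losses):
        p = _weighted_pdfs(x, weights, means, variances)
        total = sum(p)
        prob_clean = p[clean] / total if total > 0 else 0.0
        (labeled if prob_clean > threshold else unlabeled).append(i)
    return labeled, unlabeled


# Hypothetical per-sample losses: five small (likely clean) and three large (likely noisy).
losses = [0.05, 0.1, 0.08, 0.12, 2.0, 2.3, 1.9, 0.07]
labeled, unlabeled = divide(losses)
```

In the setting the abstract describes, the labels of the `unlabeled` indices would be discarded and those samples used as unlabeled data in the semi-supervised phase.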
19. High-Order Paired-ASPP Networks for Semantic Segmenation [PDF]
Yu Zhang, Xin Sun, Junyu Dong, Changrui Chen, Yue Shen
Abstract: Current semantic segmentation models only exploit first-order statistics, while rarely exploring high-order statistics. However, common first-order statistics are insufficient to support a solid unanimous representation. In this paper, we propose the High-Order Paired-ASPP Network to exploit high-order statistics from various feature levels. The network first introduces a High-Order Representation module to extract contextual high-order information from all stages of the backbone. These statistics can provide more semantic clues and discriminative information than first-order ones. In addition, a Paired-ASPP module is proposed to embed high-order statistics of the early stages into the last stage. It can further preserve the boundary-related and spatial context in the low-level features for final prediction. Our experiments show that the high-order statistics significantly boost the performance on confusing objects. Our method achieves competitive performance without bells and whistles on three benchmarks, i.e., Cityscapes, ADE20K and Pascal-Context, with mIoU of 81.6%, 45.3% and 52.9%, respectively.
20. Multi-Task Learning from Videos via Efficient Inter-Frame Attention [PDF]
Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni
Abstract: Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos. Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the lightweight shallow network runs on non-key frames at a high frame rate. We further propose an effective adversarial learning strategy to encourage the slow and fast networks to learn similar features. The proposed architecture ensures low-latency multi-task learning while maintaining high-quality predictions. Experiments show competitive accuracy compared to the state of the art on two multi-task learning benchmarks while reducing the number of floating point operations (FLOPs) by 70%. Meanwhile, our attention-based feature propagation outperforms other feature propagation methods in accuracy, with up to a 90% reduction in FLOPs.
21. Constraining Temporal Relationship for Action Localization [PDF]
Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Qi Tian
Abstract: Recently, temporal action localization (TAL), i.e., finding specific action segments in untrimmed videos, has attracted increasing attention from the computer vision community. State-of-the-art solutions for TAL involve predicting three values at each time point, corresponding to the probabilities that the action starts, continues and ends, and post-processing these curves for the final localization. This paper delves deep into this mechanism and argues that existing approaches mostly ignore the potential relationship among these curves, which results in low-quality action proposals. To alleviate this problem, we add extra constraints to these curves, e.g., the probability of "action continues" should be relatively high between the probability peaks of "action starts" and "action ends", so that the entire framework is aware of these latent constraints during an end-to-end optimization process. Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet1.3. Our approach clearly outperforms the baseline both quantitatively (in terms of AR@AN and mAP) and qualitatively (the curves in the testing stage become much smoother). In particular, when we build our constraints on top of TSA-Net and PGCN, we achieve state-of-the-art performance, especially at strict high-IoU settings. The code will be available.
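The example constraint in the abstract ("action continues" should be high between the start and end peaks) can be sketched as a simple penalty term. This is only an illustration under stated assumptions: the function name `continuity_penalty`, the use of a single argmax peak per curve, and the mean of `1 - p` as the penalty are all hypothetical and are not taken from the paper.

```python
from typing import List


def continuity_penalty(p_start: List[float],
                       p_cont: List[float],
                       p_end: List[float]) -> float:
    """Mean shortfall of the 'continues' curve between the start and end peaks.

    Assumption: one action per clip, so each curve has a single peak (argmax).
    Returns 0.0 when the 'continues' probability is 1 everywhere in between.
    """
    s = max(range(len(p_start)), key=p_start.__getitem__)  # start peak index
    e = max(range(len(p_end)), key=p_end.__getitem__)      # end peak index
    if e <= s:
        return 0.0  # no valid interval to constrain
    inside = p_cont[s:e + 1]
    return sum(1.0 - p for p in inside) / len(inside)


p_start = [0.1, 0.9, 0.1, 0.1, 0.1]
p_end = [0.1, 0.1, 0.1, 0.9, 0.1]

consistent = continuity_penalty(p_start, [0.0, 1.0, 1.0, 1.0, 0.0], p_end)
inconsistent = continuity_penalty(p_start, [0.0, 1.0, 0.0, 1.0, 0.0], p_end)
```

Adding such a term to the training loss would push the three curves toward mutually consistent shapes; a curve that dips in the middle of the action is penalized, while a curve that stays high is not.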
22. Restricted Structural Random Matrix for Compressive Sensing [PDF]
Thuong Nguyen Canh, Byeungwoo Jeon
Abstract: Compressive sensing (CS) is well known for its unique functionalities of sensing, compressing, and security (i.e., CS measurements are equally important). However, there is a tradeoff: improving sensing and compression efficiency with prior signal information tends to favor particular measurements, thus decreasing security. This work aims to improve sensing and compression efficiency without compromising security, using a novel sampling matrix named the Restricted Structural Random Matrix (RSRM). RSRM unifies the advantages of frame-based and block-based sensing together with the global smoothness prior (i.e., low-resolution signals are highly correlated). RSRM acquires compressive measurements by random projection (equally important) of multiple randomly sub-sampled signals, which are restricted to be low-resolution signals (equal in energy); its observations are therefore equally important. RSRM is proven to satisfy the Restricted Isometry Property and shows reconstruction performance comparable to recent state-of-the-art compressive sensing and deep learning-based methods.
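As background for the "random projection" step, the following is a minimal sketch of generic compressive measurement, y = Φx, with a random sign matrix whose rows all have unit energy. It is not RSRM itself: the sub-sampling and low-resolution restrictions described in the abstract are omitted, and the function names and the choice of ±1 entries are assumptions made for illustration.

```python
import math
import random
from typing import List


def random_sign_matrix(m: int, n: int, seed: int = 0) -> List[List[float]]:
    """m x n matrix with entries +-1/sqrt(n), so every row has unit energy."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n)
    return [[scale * rng.choice((-1.0, 1.0)) for _ in range(n)]
            for _ in range(m)]


def measure(phi: List[List[float]], x: List[float]) -> List[float]:
    """Compressive measurements y = phi @ x (m values for an n-sample signal)."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


signal = [float(i % 4) for i in range(16)]   # toy 16-sample signal
phi = random_sign_matrix(m=4, n=16)          # 4 measurements: 4x compression
y = measure(phi, signal)
row_energies = [sum(p * p for p in row) for row in phi]
```

Because every row of the matrix carries the same energy, no single measurement is structurally privileged, which is the intuition behind describing the measurements as "equally important".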
23. 3D Gated Recurrent Fusion for Semantic Scene Completion [PDF]
Yu Liu, Jie Li, Qingsen Yan, Xia Yuan, Chunxia Zhao, Ian Reid, Cesar Cadena
Abstract: This paper tackles the problem of data fusion in the semantic scene completion (SSC) task, which can simultaneously deal with semantic labeling and scene completion. RGB images contain texture details of the object(s) which are vital for semantic scene understanding. Meanwhile, depth images capture geometric clues of high relevance for shape completion. Using both RGB and depth images can further boost the accuracy of SSC over employing one modality in isolation. We propose a 3D gated recurrent fusion network (GRFNet), which learns to adaptively select and fuse the relevant information from depth and RGB by making use of the gate and memory modules. Based on the single-stage fusion, we further propose a multi-stage fusion strategy, which could model the correlations among different stages within the network. Extensive experiments on two benchmark datasets demonstrate the superior performance and the effectiveness of the proposed GRFNet for data fusion in SSC. Code will be made available.
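The idea of adaptively selecting between RGB and depth features with a gate can be illustrated with a much-simplified, single-step sketch. It assumes a scalar sigmoid gate per feature and hypothetical fixed weights (`w_rgb`, `w_depth`, `bias`); GRFNet's actual gate and memory modules are recurrent and learned, and are not reproduced here.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fuse(rgb: List[float], depth: List[float],
               w_rgb: float, w_depth: float, bias: float) -> List[float]:
    """Fuse two feature vectors element-wise with a data-dependent gate.

    gate -> 1 keeps the RGB feature, gate -> 0 keeps the depth feature.
    The weights are hypothetical stand-ins for learned parameters.
    """
    fused = []
    for r, d in zip(rgb, depth):
        gate = sigmoid(w_rgb * r + w_depth * d + bias)
        fused.append(gate * r + (1.0 - gate) * d)
    return fused


rgb_feat = [1.0, 0.2]
depth_feat = [0.0, 0.8]
fused_feat = gated_fuse(rgb_feat, depth_feat, w_rgb=1.0, w_depth=1.0, bias=0.0)
```

Each fused value is a convex combination of the two inputs, so the gate decides per feature how much each modality contributes rather than simply averaging them.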
24. Dual-Attention GAN for Large-Pose Face Frontalization [PDF]
Yu Yin, Songyao Jiang, Joseph P. Robinson, Yun Fu
Abstract: Face frontalization provides an effective and efficient way to augment face data and further improves face recognition performance in extreme pose scenarios. Despite recent advances in deep learning-based face synthesis approaches, this problem is still challenging due to significant pose and illumination discrepancies. In this paper, we present a novel Dual-Attention Generative Adversarial Network (DA-GAN) for photo-realistic face frontalization by capturing both contextual dependencies and local consistency during GAN training. Specifically, a self-attention-based generator is introduced to integrate local features with their long-range dependencies, yielding better feature representations and hence generating faces that preserve identities better, especially for larger pose angles. Moreover, a novel face-attention-based discriminator is applied to emphasize local features of face regions and hence reinforce the realism of synthetic frontal faces. Guided by semantic segmentation, four independent discriminators are used to distinguish between different aspects of a face (i.e., skin, keypoints, hairline, and frontalized face). By introducing these two complementary attention mechanisms in the generator and discriminator separately, we can learn a richer feature representation and generate identity-preserving inference of frontal views with much finer details (i.e., more accurate facial appearance and textures) compared to the state of the art. Quantitative and qualitative experimental results demonstrate the effectiveness and efficiency of our DA-GAN approach.
25. Multilinear Compressive Learning with Prior Knowledge [PDF]
Dat Thanh Tran, Moncef Gabbouj, Alexandros Iosifidis
Abstract: The recently proposed Multilinear Compressive Learning (MCL) framework combines Multilinear Compressive Sensing and Machine Learning into an end-to-end system that takes into account the multidimensional structure of the signals when designing the sensing and feature synthesis components. The key idea behind MCL is the assumption that there exists a tensor subspace which can capture the essential features of the signal for the downstream learning task. Thus, the ability to find such a discriminative tensor subspace and to optimize the system to project the signals onto that data manifold plays an important role in Multilinear Compressive Learning. In this paper, we propose a novel solution to address both of these requirements, i.e., how to find those tensor subspaces in which the signals of interest are highly separable, and how to optimize the sensing and feature synthesis components to transform the original signals to the data manifold found in the first question. In our proposal, the discovery of a high-quality data manifold is conducted by training a nonlinear compressive learning system on the inference task. Its knowledge of the data manifold of interest is then progressively transferred to the MCL components via multi-stage supervised training, with the supervisory information encoding what the compressed measurements, the synthesized features, and the predictions should look like. The proposed knowledge transfer algorithm also comes with a semi-supervised adaptation that enables compressive learning models to utilize unlabeled data effectively. Extensive experiments demonstrate that the proposed knowledge transfer method can effectively train MCL models to compressively sense and synthesize better features for the learning tasks with improved performance, especially when the complexity of the learning task increases.
26. The Tree Ensemble Layer: Differentiability meets Conditional Computation [PDF]
Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder
Abstract: Neural networks and tree ensembles are state-of-the-art learners, each with its unique statistical and computational advantages. We aim to combine these advantages by introducing a new layer for neural networks, composed of an ensemble of differentiable decision trees (a.k.a. soft trees). While differentiable trees demonstrate promising results in the literature, in practice they are typically slow in training and inference as they do not support conditional computation. We mitigate this issue by introducing a new sparse activation function for sample routing, and implement true conditional computation by developing specialized forward and backward propagation algorithms that exploit sparsity. Our efficient algorithms pave the way for jointly training over deep and wide tree ensembles using first-order methods (e.g., SGD). Experiments on 23 classification datasets indicate over 10x speed-ups compared to the differentiable trees used in the literature and over 20x reduction in the number of parameters compared to gradient boosted trees, while maintaining competitive performance. Moreover, experiments on CIFAR, MNIST, and Fashion MNIST indicate that replacing dense layers in CNNs with our tree layer reduces the test loss by 7-53% and the number of parameters by 8x. We provide an open-source TensorFlow implementation with a Keras API.
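A small sketch may help show why a sparse routing activation enables conditional computation in a soft tree. The abstract does not give the activation's formula; the cubic `smooth_step` below is an assumed example of a function that is differentiable yet outputs exactly 0 or 1 outside a band of width `gamma`. The heap-style tree layout and the `route` function are likewise hypothetical, not the authors' TensorFlow implementation.

```python
from typing import List, Tuple


def smooth_step(t: float, gamma: float = 1.0) -> float:
    """Assumed sparse activation: exactly 0 or 1 outside [-gamma/2, gamma/2]."""
    half = gamma / 2.0
    if t <= -half:
        return 0.0
    if t >= half:
        return 1.0
    return -2.0 / gamma ** 3 * t ** 3 + 3.0 / (2.0 * gamma) * t + 0.5


def route(x: List[float], weights: List[List[float]],
          depth: int) -> Tuple[List[float], int]:
    """Route one sample through a perfect binary soft tree.

    Internal node i has children 2i+1 (left) and 2i+2 (right).
    Returns the leaf probabilities and the number of internal nodes evaluated.
    """
    n_internal = 2 ** depth - 1
    leaf = [0.0] * (2 ** depth)
    visited = 0
    stack = [(0, 1.0)]
    while stack:
        node, prob = stack.pop()
        if node >= n_internal:
            leaf[node - n_internal] = prob
            continue
        visited += 1
        t = sum(w * v for w, v in zip(weights[node], x))
        left = smooth_step(t)
        for child, p in ((2 * node + 1, prob * left),
                         (2 * node + 2, prob * (1.0 - left))):
            if p > 0.0:  # conditional computation: skip unreachable subtrees
                stack.append((child, p))
    return leaf, visited


leaf_probs, nodes_evaluated = route([1.0], [[2.0], [0.1], [1.0]], depth=2)
```

In this toy tree the root routes the sample entirely to the left, so the right subtree is never evaluated: only two of the three internal nodes are visited. With a plain sigmoid every routing probability would be strictly positive and the whole tree would have to be computed.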
27. Learning Bijective Feature Maps for Linear ICA [PDF]
Alexander Camuto, Matthew Willetts, Brooks Paige, Chris Holmes, Stephen Roberts
Abstract: Separating high-dimensional data like images into independent latent factors remains an open research problem. Here we develop a method that jointly learns a linear independent component analysis (ICA) model with non-linear bijective feature maps. By combining these two methods, ICA can learn interpretable latent structure for images. For non-square ICA, where we assume the number of sources is less than the dimensionality of data, we achieve better unsupervised latent factor discovery than flow-based models and linear ICA. This performance scales to large image datasets such as CelebA
28. Deep Learning in Medical Ultrasound Image Segmentation: a Review [PDF]
Ziyang Wang
Abstract: Applying machine learning technologies, especially deep learning, to medical image segmentation is being widely studied because of its state-of-the-art performance and results. It can be a key step in providing a reliable basis for clinical diagnosis, such as 3D reconstruction of human tissues, image-guided interventions, image analysis and visualization. In this review article, deep-learning-based methods for ultrasound image segmentation are first categorized into six main groups according to their architectures and training. Secondly, for each group, several current representative algorithms are selected, introduced, analyzed and summarized in detail. In addition, common evaluation methods for image segmentation and ultrasound image segmentation datasets are summarized. Further, the performance of the current methods and their evaluations are reviewed. In the end, the challenges and potential research directions for medical ultrasound image segmentation are discussed.
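The abstract does not list which evaluation methods the review covers. As an assumption, the Dice coefficient and intersection over union (IoU) are two overlap metrics widely used for segmentation, and a minimal sketch on flattened binary masks looks like this (the function name is hypothetical):

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice coefficient and IoU for two flattened binary masks (1 = foreground)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in target if t)
    union = p_sum + t_sum - inter
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2.0 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


dice, iou = dice_and_iou(pred=[1, 1, 0, 0], target=[1, 0, 1, 0])
```

For the same pair of masks Dice is never lower than IoU, so scores reported under the two metrics are not directly comparable.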
29. Robust Quantization: One Model to Rule Them All [PDF]
Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yuri Nahshan, Alex Bronstein, Uri Weiser
Abstract: Neural network quantization methods often involve simulating the quantization process during training. This makes the trained model highly dependent on the precise way quantization is performed. Since low-precision accelerators differ in their quantization policies and their supported mix of data-types, a model trained for one accelerator may not be suitable for another. To address this issue, we propose KURE, a method that provides intrinsic robustness to the model against a broad range of quantization implementations. We show that KURE yields a generic model that may be deployed on numerous inference accelerators without a significant loss in accuracy.
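To illustrate why a model can depend on "the precise way quantization is performed", the sketch below applies symmetric uniform quantization to the same weights under three different policies. It does not implement KURE, whose mechanism the abstract does not describe; the `quantize` function, the bit-widths, and the clipping ranges are assumptions chosen for illustration.

```python
from typing import List


def quantize(values: List[float], bits: int, clip: float) -> List[float]:
    """Symmetric uniform quantization onto a grid in [-clip, clip]."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    step = clip / levels
    out = []
    for v in values:
        clipped = max(-clip, min(clip, v))
        out.append(round(clipped / step) * step)
    return out


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


weights = [-0.9, -0.3, 0.05, 0.4, 1.2]

# Same weights, three hypothetical accelerator policies.
err_8bit = mse(weights, quantize(weights, bits=8, clip=1.0))
err_4bit = mse(weights, quantize(weights, bits=4, clip=1.0))
err_4bit_wide = mse(weights, quantize(weights, bits=4, clip=2.0))
```

The error changes with both bit-width and clipping range, so a model tuned during training to one simulated policy sees a different perturbation when deployed under another.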
30. Image Entropy for Classification and Analysis of Pathology Slides
Steven J. Frank
Abstract: Pathology slides of lung malignancies are classified using the "Salient Slices" technique described in Frank et al., 2020. A four-fold cross-validation study using a small image set (42 adenocarcinoma slides and 42 squamous cell carcinoma slides) produced fully correct classifications in each fold. Probability maps enable visualization of the underlying basis for a classification.
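The four-fold protocol on 42 + 42 slides can be sketched as a class-balanced split. The abstract does not say how the folds were drawn, so the round-robin assignment, the function name `stratified_folds`, and the label strings below are assumptions made only to show the arithmetic of the split.

```python
from collections import Counter


def stratified_folds(labels, k=4):
    """Assign sample indices to k folds so that each class is spread evenly.

    Indices are grouped by label (stable sort) and dealt out round-robin.
    """
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    folds = [[] for _ in range(k)]
    for position, index in enumerate(order):
        folds[position % k].append(index)
    return folds


# 42 adenocarcinoma and 42 squamous cell carcinoma slides, as in the abstract.
labels = ["adenocarcinoma"] * 42 + ["squamous"] * 42
folds = stratified_folds(labels, k=4)

for fold_number, test_indices in enumerate(folds):
    # Every slide outside the held-out fold is used for training.
    train_indices = [i for f in folds if f is not test_indices for i in f]
    counts = Counter(labels[i] for i in test_indices)
    print(fold_number, len(train_indices), len(test_indices), dict(counts))
```

With 84 slides, each fold holds out 21 slides (10 or 11 per class) and trains on the remaining 63.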
31. Deflecting Adversarial Attacks
Yao Qin, Nicholas Frosst, Colin Raffel, Garrison Cottrell, Geoffrey Hinton
Abstract: There has been an ongoing cycle where stronger defenses against adversarial attacks are subsequently broken by a more advanced defense-aware attack. We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that semantically resembles the attack's target class. To this end, we first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance on both standard and defense-aware attacks. We then show that undetected attacks against our defense often perceptually resemble the adversarial target class by performing a human study where participants are asked to label images produced by the attack. These attack images can no longer be called "adversarial" because our network classifies them the same way as humans do.
32. Picking Winning Tickets Before Training by Preserving Gradient Flow
Chaoqi Wang, Guodong Zhang, Roger Grosse
Abstract: Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
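The abstract does not state the GraSP formula, so the following is only a brute-force illustration of the idea of pruning at initialization while preserving gradient flow, not the paper's criterion. It assumes a toy quadratic loss and scores each weight by how the squared gradient norm changes when that weight alone is zeroed; every name here is hypothetical.

```python
def gradient(theta, A, b):
    """Gradient of the toy loss L(theta) = 0.5 * theta^T A theta - b^T theta."""
    n = len(theta)
    return [sum(A[i][j] * theta[j] for j in range(n)) - b[i] for i in range(n)]


def grad_norm_sq(theta, A, b):
    """Squared Euclidean norm of the gradient, used here as a proxy for gradient flow."""
    return sum(g * g for g in gradient(theta, A, b))


def removal_scores(theta, A, b):
    """Change in squared gradient norm when each weight alone is set to zero."""
    base = grad_norm_sq(theta, A, b)
    scores = []
    for i in range(len(theta)):
        pruned = list(theta)
        pruned[i] = 0.0
        scores.append(grad_norm_sq(pruned, A, b) - base)
    return scores


def keep_mask(scores, num_to_prune):
    """Prune the weights whose removal reduces the gradient norm the least."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pruned = set(order[:num_to_prune])
    return [i not in pruned for i in range(len(scores))]


# Toy problem with a diagonal curvature matrix (assumed values).
A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
b = [1.0, 1.0, 1.0]
theta_init = [1.0, -1.0, 0.5]

scores = removal_scores(theta_init, A, b)
print(scores)                 # per-weight change in squared gradient norm
print(keep_mask(scores, 1))   # True marks weights that are kept
```

Zeroing each weight in turn costs one gradient evaluation per weight, which is only practical for a toy problem like this one.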
33. Evolutionary Optimization of Deep Learning Activation Functions
Garrett Bingham, William Macke, Risto Miikkulainen
Abstract: The choice of activation function can have a large effect on the performance of a neural network. While there have been some attempts to hand-engineer novel activation functions, the Rectified Linear Unit (ReLU) remains the most commonly used in practice. This paper shows that evolutionary algorithms can discover novel activation functions that outperform ReLU. A tree-based search space of candidate activation functions is defined and explored with mutation, crossover, and exhaustive search. Experiments on training wide residual networks on the CIFAR-10 and CIFAR-100 image datasets show that this approach is effective. Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy. Optimal performance is achieved when evolution is allowed to customize activation functions to a particular task; however, these novel activation functions are shown to generalize, achieving high performance across tasks. Evolutionary optimization of activation functions is therefore a promising new dimension of metalearning in neural networks.
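A tree-based search space with mutation and crossover might look roughly like the sketch below. The operator sets, the tuple encoding, and the specific mutation and crossover rules are assumptions chosen for illustration; the abstract does not specify the paper's actual operators or procedures.

```python
import math
import random

# Hypothetical operator sets; the abstract does not list the paper's operators.
UNARY = {"relu": lambda v: max(0.0, v), "tanh": math.tanh,
         "neg": lambda v: -v, "abs": abs}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b, "max": max}


def evaluate(tree, x):
    """Evaluate an activation tree at x. A tree is "x" or a tuple (op, child[, child])."""
    if tree == "x":
        return x
    op, *children = tree
    args = [evaluate(child, x) for child in children]
    table = UNARY if len(args) == 1 else BINARY
    return table[op](*args)


def mutate(tree, rng):
    """Return a copy of the tree with one randomly chosen node changed."""
    if tree == "x":
        return (rng.choice(sorted(UNARY)), "x")        # grow a leaf into a unary node
    op, *children = tree
    if rng.random() < 0.5:
        pool = UNARY if len(children) == 1 else BINARY
        return (rng.choice(sorted(pool)), *children)   # swap this node's operator
    i = rng.randrange(len(children))
    children[i] = mutate(children[i], rng)             # otherwise recurse into a child
    return (op, *children)


def crossover(parent_a, parent_b, rng):
    """Replace one child of parent_a's root with parent_b (a simple subtree swap)."""
    if parent_a == "x":
        return parent_b
    op, *children = parent_a
    children[rng.randrange(len(children))] = parent_b
    return (op, *children)


relu = ("relu", "x")
candidate = ("mul", "x", ("tanh", "x"))   # x * tanh(x), an arbitrary example tree

rng = random.Random(0)
child = crossover(candidate, mutate(relu, rng), rng)
print(child, [round(evaluate(child, x), 3) for x in (-1.0, 0.0, 1.0)])
```

In a full search, each candidate tree would be plugged into a network and scored by validation accuracy; that fitness step is omitted here.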
34. AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite
Wanling Gao, Fei Tang, Jianfeng Zhan, Chuanxin Lan, Chunjie Luo, Lei Wang, Jiahui Dai, Zheng Cao, Xiongwang Xiong, Zihan Jiang, Tianshu Hao, Fanda Fan, Xu Wen, Fan Zhang, Yunyou Huang, Jianan Chen, Mengjia Du, Rui Ren, Chen Zheng, Daoyi Zheng, Haoning Tang, Kunlin Zhan, Biao Wang, Defei Kong, Minghe Yu, Chongkang Tan, Huan Li, Xinhui Tian, Yatao Li, Gang Lu, Junchao Shao, Zhenyu Wang, Xiaoyu Wang, Hainan Ye
Abstract: Domain-specific software and hardware co-design is encouraging, as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process, as it provides not only relevant design inputs but also relevant metrics and tools. Unfortunately, modern workloads like big data, AI, and Internet services dwarf traditional ones in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges. This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, from which sixteen representative AI tasks are distilled as the AI component benchmarks. We propose the permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which we propose the guideline for building end-to-end benchmarks and present the first end-to-end Internet service AI benchmark. The preliminary evaluation shows the value of our benchmark suite, AIBench, against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from the web site (this http URL).
Note: the Chinese abstracts are machine translations of the English originals.