Contents
3. Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation [PDF] Abstract
7. Deep Multi-Shot Network for modelling Appearance Similarity in Multi-Person Tracking applications [PDF] Abstract
11. Simultaneous Learning from Human Pose and Object Cues for Real-Time Activity Recognition [PDF] Abstract
15. MNEW: Multi-domain Neighborhood Embedding and Weighting for Sparse Point Clouds Segmentation [PDF] Abstract
20. Robust Self-Supervised Convolutional Neural Network for Subspace Clustering and Classification [PDF] Abstract
21. Improving BPSO-based feature selection applied to offline WI handwritten signature verification through overfitting control [PDF] Abstract
22. A white-box analysis on the writer-independent dichotomy transformation applied to offline handwritten signature verification [PDF] Abstract
29. Multiform Fonts-to-Fonts Translation via Style and Content Disentangled Representations of Chinese Character [PDF] Abstract
30. An End-to-End Approach for Recognition of Modern and Historical Handwritten Numeral Strings [PDF] Abstract
33. Streaming Networks: Increase Noise Robustness and Filter Diversity via Hard-wired and Input-induced Sparsity [PDF] Abstract
44. Real-time Classification from Short Event-Camera Streams using Input-filtering Neural ODEs [PDF] Abstract
45. Adaptive Multiscale Illumination-Invariant Feature Representation for Undersampled Face Recognition [PDF] Abstract
46. Predicting Camera Viewpoint Improves Cross-dataset Generalization for 3D Human Pose Estimation [PDF] Abstract
57. Field-Level Crop Type Classification with k Nearest Neighbors: A Baseline for a New Kenya Smallholder Dataset [PDF] Abstract
59. LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood [PDF] Abstract
63. Efficient Scale Estimation Methods using Lightweight Deep Convolutional Neural Networks for Visual Tracking [PDF] Abstract
64. Beyond Background-Aware Correlation Filters: Adaptive Context Modeling by Hand-Crafted and Deep RGB Features for Visual Tracking [PDF] Abstract
65. Empirical Upper Bound, Error Diagnosis and Invariance Analysis of Modern Object Detectors [PDF] Abstract
69. Harmony-Search and Otsu based System for Coronavirus Disease (COVID-19) Detection using Lung CT Scan Images [PDF] Abstract
71. Deep Learning on Chest X-ray Images to Detect and Evaluate Pneumonia Cases at the Era of COVID-19 [PDF] Abstract
73. Convolutional Neural Networks based automated segmentation and labelling of the lumbar spine X-ray [PDF] Abstract
75. Two-Stage Resampling for Convolutional Neural Network Training in the Imbalanced Colorectal Cancer Image Classification [PDF] Abstract
77. Autoencoders for Unsupervised Anomaly Segmentation in Brain MR Images: A Comparative Study [PDF] Abstract
80. Deep Attentive Generative Adversarial Network for Photo-Realistic Image De-Quantization [PDF] Abstract
83. COVID-Xpert: An AI Powered Population Screening of COVID-19 Cases Using Chest Radiography Images [PDF] Abstract
Abstracts
1. Feature Pyramid Grids [PDF] Back to Contents
Kai Chen, Yuhang Cao, Chen Change Loy, Dahua Lin, Christoph Feichtenhofer
Abstract: Feature pyramid networks have been widely adopted in the object detection literature to improve feature representations for better handling of variations in scale. In this paper, we present Feature Pyramid Grids (FPG), a deep multi-pathway feature pyramid that represents the feature scale-space as a regular grid of parallel bottom-up pathways fused by multi-directional lateral connections. FPG can improve single-pathway feature pyramid networks by significantly increasing their performance at similar computation cost, highlighting the importance of deep pyramid representations. In addition, its general and uniform structure compares favorably against the more complicated structures found by neural architecture search, without relying on search. We hope that FPG, with its uniform and effective nature, can serve as a strong component for future work in object recognition.
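As an illustration of the grid idea, here is a minimal PyTorch sketch of parallel pathways fused by same-level, bottom-up, and top-down lateral connections. The connectivity, channel counts, and 1x1 fusion convolutions are simplifying assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPGLite(nn.Module):
    """Toy grid of parallel pathways with multi-directional lateral fusion.
    Illustrative only; not the paper's exact connectivity or channel plan."""
    def __init__(self, channels=64, levels=4, pathways=3):
        super().__init__()
        self.levels, self.pathways = levels, pathways
        # one 1x1 conv per grid node to fuse the incoming features
        self.fuse = nn.ModuleList(
            nn.ModuleList(nn.Conv2d(channels, channels, 1) for _ in range(levels))
            for _ in range(pathways))

    def forward(self, feats):  # feats: `levels` maps, fine -> coarse, each 2x smaller
        grid = list(feats)     # pathway 0 is the backbone's bottom-up pathway
        for p in range(self.pathways):
            new = []
            for l in range(self.levels):
                x = grid[l]                                     # same-level lateral
                if l > 0:                                       # bottom-up edge
                    x = x + F.max_pool2d(new[l - 1], 2)
                if l < self.levels - 1:                         # top-down edge
                    x = x + F.interpolate(grid[l + 1], scale_factor=2)
                new.append(F.relu(self.fuse[p][l](x)))
            grid = new
        return grid

pyr = FPGLite()([torch.randn(1, 64, 2 ** (6 - l), 2 ** (6 - l)) for l in range(4)])
```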
2. Event Based, Near Eye Gaze Tracking Beyond 10,000Hz [PDF] Back to Contents
Anastasios N. Angelopoulos, Julien N.P. Martel, Amit P.S. Kohli, Jorg Conradt, Gordon Wetzstein
Abstract: Fast and accurate eye tracking is crucial for many applications. Current camera-based eye tracking systems, however, are fundamentally limited by their bandwidth, forcing a tradeoff between image resolution and framerate, i.e. between latency and update rate. Here, we propose a hybrid frame-event-based near-eye gaze tracking system offering update rates beyond 10,000 Hz with an accuracy that matches that of high-end desktop-mounted commercial eye trackers when evaluated in the same conditions. Our system builds on emerging event cameras that simultaneously acquire regularly sampled frames and adaptively sampled events. We develop an online 2D pupil fitting method that updates a parametric model every one or few events. Moreover, we propose a polynomial regressor for estimating the gaze vector from the parametric pupil model in real time. Using the first hybrid frame-event gaze dataset, which will be made public, we demonstrate that our system achieves accuracies of 0.45 degrees - 1.75 degrees for fields of view ranging from 45 degrees to 98 degrees.
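To make the regression step concrete, here is a minimal NumPy sketch of fitting a quadratic polynomial regressor from pupil-model parameters to gaze angles. The five-parameter ellipse parametrization and the synthetic calibration data are hypothetical; only the closed-form fit and the cheap run-time evaluation reflect the abstract.

```python
import numpy as np

def poly_features(P):
    """Quadratic feature expansion of pupil-model parameters
    (hypothetical parametrization: ellipse cx, cy, axes a, b, angle)."""
    n = P.shape[1]
    cols = [np.ones(len(P))] + [P[:, i] for i in range(n)]
    cols += [P[:, i] * P[:, j] for i in range(n) for j in range(i, n)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
P_calib = rng.normal(size=(200, 5))   # pupil parameters during calibration
G_calib = rng.normal(size=(200, 2))   # corresponding known gaze angles
W, *_ = np.linalg.lstsq(poly_features(P_calib), G_calib, rcond=None)

# at run time, gaze estimation is one feature expansion plus one matmul,
# cheap enough to run per incoming event
gaze = poly_features(P_calib[:1]) @ W
```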
3. Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation [PDF] Back to Contents
Jiaming Sun, Linghao Chen, Yiming Xie, Siyu Zhang, Qinhong Jiang, Xiaowei Zhou, Hujun Bao
Abstract: In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. Many recent works solve this problem by first recovering a point cloud with disparity estimation and then applying a 3D detector. The disparity map is computed for the entire image, which is costly and fails to leverage category-specific priors. In contrast, we design an instance disparity estimation network (iDispNet) that predicts disparity only for pixels on objects of interest and learns a category-specific shape prior for more accurate disparity estimation. To address the challenge posed by the scarcity of disparity annotations in training, we propose to use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds, which makes our system more widely applicable. Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
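The step from instance disparity to 3D-detector input rests on standard stereo geometry (Z = fx * baseline / d). A minimal sketch, with camera intrinsics and an instance mask assumed given:

```python
import numpy as np

def instance_disparity_to_points(disp, mask, fx, fy, cx, cy, baseline):
    """Back-project an instance disparity map into a 3D point cloud via
    standard stereo geometry (Z = fx * baseline / d); only pixels on the
    object of interest (boolean mask) are lifted, as in instance-level
    estimation."""
    v, u = np.nonzero(mask & (disp > 0))
    z = fx * baseline / disp[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # (N, 3) points for a 3D detector
```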
4. Temporal Pyramid Network for Action Recognition [PDF] Back to Contents
Ceyuan Yang, Yinghao Xu, Jianping Shi, Bo Dai, Bolei Zhou
Abstract: Visual tempo characterizes the dynamics and the temporal scale of an action. Modeling such visual tempos of different actions facilitates their recognition. Previous works often capture the visual tempo through sampling raw videos at multiple rates and constructing an input-level frame pyramid, which usually requires a costly multi-branch network to handle. In this work we propose a generic Temporal Pyramid Network (TPN) at the feature-level, which can be flexibly integrated into 2D or 3D backbone networks in a plug-and-play manner. Two essential components of TPN, the source of features and the fusion of features, form a feature hierarchy for the backbone so that it can capture action instances at various tempos. TPN also shows consistent improvements over other challenging baselines on several action recognition datasets. Specifically, when equipped with TPN, the 3D ResNet-50 with dense sampling obtains a 2% gain on the validation set of Kinetics-400. A further analysis also reveals that TPN gains most of its improvements on action classes that have large variances in their visual tempos, validating the effectiveness of TPN.
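A minimal sketch of a feature-level temporal pyramid: re-sample clip features at several temporal rates, align the levels back to the original length, and fuse. Summation stands in for the paper's learned fusion, and the rates are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def temporal_pyramid(feat, rates=(1, 2, 4)):
    """Feature-level tempo pyramid over a clip feature (N, C, T, H, W):
    subsampling time at several rates yields fast/slow tempo views, which
    are re-aligned to T steps and fused (naively, by summation)."""
    T, H, W = feat.shape[2:]
    levels = [feat[:, :, ::r] for r in rates]
    aligned = [F.interpolate(l, size=(T, H, W), mode="nearest") for l in levels]
    return torch.stack(aligned, 0).sum(0)

out = temporal_pyramid(torch.randn(2, 64, 16, 7, 7))   # same shape as input
```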
5. Unsupervised Person Re-identification via Softened Similarity Learning [PDF] Back to Contents
Yutian Lin, Lingxi Xie, Yu Wu, Chenggang Yan, Qi Tian
Abstract: Person re-identification (re-ID) is an important topic in computer vision. This paper studies the unsupervised setting of re-ID, which does not require any labeled information and thus can be freely deployed to new scenarios. There are very few studies under this setting, and one of the best approaches to date uses iterative clustering and classification, so that unlabeled images are clustered into pseudo classes for a classifier to get trained, the updated features are used for clustering, and so on. This approach suffers from two problems, namely, the difficulty of determining the number of clusters, and the hard quantization loss in clustering. In this paper, we follow the iterative training mechanism but discard clustering, since it incurs loss from hard quantization, yet its only product, image-level similarity, can be easily replaced by pairwise computation and a softened classification task. With these improvements, our approach becomes more elegant and is more robust to hyper-parameter changes. Experiments on two image-based and video-based datasets demonstrate state-of-the-art performance under the unsupervised re-ID setting.
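As a sketch of replacing hard cluster assignments with pairwise, softened supervision, the NumPy snippet below turns pairwise cosine similarities into soft target distributions; the temperature and the exact target construction are assumptions, not the paper's formulation.

```python
import numpy as np

def softened_targets(features, temperature=0.1):
    """Pairwise stand-in for hard clustering: each image's training target
    is a softened distribution over its similarity to every other image,
    avoiding the hard quantization loss of cluster assignment."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                        # pairwise cosine similarity, (N, N)
    np.fill_diagonal(sim, -np.inf)       # exclude trivial self-matches
    e = np.exp(sim / temperature)
    return e / e.sum(axis=1, keepdims=True)
```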
6. Dense Regression Network for Video Grounding [PDF] Back to Contents
Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan
Abstract: We address the problem of video grounding from natural language queries. The key challenge in this task is that one training video might only contain a few annotated starting/ending frames that can be used as positive examples for model training. Most conventional approaches directly train a binary classifier using such imbalanced data, thus achieving inferior results. The key idea of this paper is to use the distances between the frames within the ground truth and the starting (ending) frame as dense supervision to improve the video grounding accuracy. Specifically, we design a novel dense regression network (DRN) to regress the distances from each frame to the starting (ending) frame of the video segment described by the query. We also propose a simple but effective IoU regression head module to explicitly consider the localization quality of the grounding results (i.e., the IoU between the predicted location and the ground truth). Experimental results show that our approach significantly outperforms the state of the art on three datasets (i.e., Charades-STA, ActivityNet-Captions, and TACoS).
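A minimal NumPy sketch of the two supervision signals the abstract describes: dense distances from every in-segment frame to the segment boundaries, and the temporal IoU the regression head is trained to predict. Using -1 to mark unsupervised frames is an assumption for illustration.

```python
import numpy as np

def dense_regression_targets(num_frames, start, end):
    """Every frame inside the ground-truth segment regresses its distance
    to the segment's starting and ending frame (dense supervision)."""
    t = np.arange(num_frames)
    inside = (t >= start) & (t <= end)
    d_start = np.where(inside, t - start, -1)   # -1: no supervision here
    d_end = np.where(inside, end - t, -1)
    return d_start, d_end, inside

def temporal_iou(pred, gt):
    """IoU between predicted and ground-truth (start, end) segments, the
    quantity the IoU regression head estimates."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0
```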
7. Deep Multi-Shot Network for modelling Appearance Similarity in Multi-Person Tracking applications [PDF] Back to Contents
María J. Gómez-Silva
Abstract: The automation of Multi-Object Tracking becomes a demanding task in real unconstrained scenarios, where the algorithms have to deal with crowds, crossing people, occlusions, disappearances and the presence of visually similar individuals. In those circumstances, the data association between the incoming detections and their corresponding identities could miss some tracks or produce identity switches. In order to reduce these tracking errors, and even their propagation in further frames, this article presents a Deep Multi-Shot neural model for measuring the Degree of Appearance Similarity (MS-DoAS) between person observations. This model provides temporal consistency to the individuals' appearance representation, and provides an affinity metric to perform frame-by-frame data association, allowing online tracking. The model has been deliberately trained to be able to manage the presence of previous identity switches and missed observations in the handled tracks. To that end, a novel data generation tool has been designed to create training tracklets that simulate such situations. The model has demonstrated a high capacity to discern when a new observation corresponds to a certain track, achieving a classification accuracy of 97% in a hard test that simulates tracks with previous mistakes. Moreover, the tracking efficiency of the model in a Surveillance application has been demonstrated by integrating it into the frame-by-frame association of a Tracking-by-Detection algorithm.
8. Bayesian aggregation improves traditional single image crop classification approaches [PDF] Back to Contents
Ivan Matvienko, Mikhail Gasanov, Anna Petrovskaia, Raghavendra Belur Jana, Maria Pukalchik, Ivan Oseledets
Abstract: Machine learning (ML) methods and neural networks (NN) are widely implemented for crop type recognition and classification based on satellite images. However, most of these studies use several multi-temporal images, which can be inapplicable for cloudy regions. We present a comparison between classical ML approaches and a U-Net NN for classifying crops with a single satellite image. The results show the advantages of using field-wise classification over a pixel-wise approach. We then use Bayesian aggregation for field-wise classification, improving results by 1.5% over majority-voting aggregation. The best result for single-satellite-image crop classification is achieved by gradient boosting, with an overall accuracy of 77.4% and a macro F1-score of 0.66.
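One common reading of Bayesian aggregation, shown here as an assumption rather than the paper's exact formulation, is to pool the per-pixel class posteriors of a field in log space. A toy comparison against majority voting:

```python
import numpy as np

def majority_vote(pixel_probs):
    """Field label = most common per-pixel argmax (the baseline)."""
    votes = pixel_probs.argmax(axis=1)
    return np.bincount(votes, minlength=pixel_probs.shape[1]).argmax()

def bayesian_aggregation(pixel_probs):
    """Treat per-pixel class posteriors as independent evidence for the
    whole field and combine them in log space."""
    return np.log(pixel_probs + 1e-12).sum(axis=0).argmax()

probs = np.array([[0.6, 0.4], [0.55, 0.45], [0.1, 0.9]])   # 3 pixels, 2 crops
print(majority_vote(probs), bayesian_aggregation(probs))    # 0 vs 1: may disagree
```

The toy field shows why the two aggregations can differ: two weakly confident pixels outvote one very confident pixel, while the log-space pool lets the confident pixel dominate.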
9. Hierarchical Image Classification using Entailment Cone Embeddings [PDF] Back to Contents
Ankit Dhall, Anastasia Makarova, Octavian Ganea, Dario Pavllo, Michael Greeff, Andreas Krause
Abstract: Image classification has been studied extensively, but there has been limited work in using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels. We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier and empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance. Taking a step further in this direction, we model more explicitly the label-label and label-image interactions using order-preserving embeddings governed by both Euclidean and hyperbolic geometries, prevalent in natural language, and tailor them to hierarchical image classification and representation learning. We empirically validate all the models on the hierarchical ETHEC dataset.
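The paper works with entailment cones in Euclidean and hyperbolic geometry; as a simpler Euclidean illustration of order-preserving embeddings (not the paper's exact cone formulation), an order-violation penalty between a concept and its hypernym can be written as:

```python
import numpy as np

def order_violation(general, specific):
    """Order-embedding style penalty: a more specific concept should
    dominate its hypernym coordinate-wise; any shortfall is penalized.
    Zero iff `specific` >= `general` in every dimension."""
    return float(np.sum(np.maximum(0.0, general - specific) ** 2))

animal = np.array([0.2, 0.1])
insect = np.array([0.5, 0.4])   # dominates "animal" -> entailment holds
bird   = np.array([0.1, 0.6])   # falls short in dim 0 -> positive penalty
print(order_violation(animal, insect), order_violation(animal, bird))  # 0.0, 0.01
```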
10. Learning Formation of Physically-Based Face Attributes [PDF] Back to Contents
Ruilong Li, Karl Bladin, Yajie Zhao, Chinmay Chinara, Owen Ingraham, Pengda Xiang, Xinglei Ren, Pratusha Prasad, Bipin Kishore, Jun Xing, Hao Li
Abstract: Based on a combined data set of 4000 high resolution facial scans, we introduce a non-linear morphable face model, capable of producing multifarious face geometry of pore-level resolution, coupled with material attributes for use in physically-based rendering. We aim to maximize the variety of identities, while increasing the robustness of correspondence between unique components, including middle-frequency geometry, albedo maps, specular intensity maps and high-frequency displacement details. Our deep learning based generative model learns to correlate albedo and geometry, which ensures the anatomical correctness of the generated assets. We demonstrate potential use of our generative model for novel identity generation, model fitting, interpolation, animation, high fidelity data visualization, and low-to-high resolution data domain transferring. We hope the release of this generative model will encourage further cooperation between all graphics, vision, and data focused professionals, while demonstrating the cumulative value of every individual's complete biometric profile.
11. Simultaneous Learning from Human Pose and Object Cues for Real-Time Activity Recognition [PDF] Back to Contents
Brian Reily, Qingzhao Zhu, Christopher Reardon, Hao Zhang
Abstract: Real-time human activity recognition plays an essential role in real-world human-centered robotics applications, such as assisted living and human-robot collaboration. Although previous methods based on skeletal data to encode human poses showed promising results on real-time activity recognition, they lacked the capability to consider the context provided by objects within the scene and in use by the humans, which can provide a further discriminant between human activity categories. In this paper, we propose a novel approach to real-time human activity recognition, through simultaneously learning from observations of both human poses and objects involved in the human activity. We formulate human activity recognition as a joint optimization problem under a unified mathematical framework, which uses a regression-like loss function to integrate human pose and object cues and defines structured sparsity-inducing norms to identify discriminative body joints and object attributes. To evaluate our method, we perform extensive experiments on two benchmark datasets and a physical robot in a home assistance setting. Experimental results have shown that our method outperforms previous methods and obtains real-time performance for human activity recognition with a processing speed of 10^4 Hz.
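A schematic stand-in for the unified objective: a regression-like data term plus an l2,1 group norm whose groups correspond to body joints, so that minimizing it selects discriminative joints. The grouping (three rows per joint) and the weighting are hypothetical choices for illustration.

```python
import numpy as np

def l21_norm(W, group_size):
    """Structured sparsity-inducing l2,1 norm: rows of W are grouped (say,
    the x/y/z weights of one body joint per group), the l2 norm is taken
    within each group and summed, which zeroes out entire joints when
    minimized. Assumes W's row count is divisible by group_size."""
    groups = W.reshape(-1, group_size, W.shape[1])
    return np.sqrt((groups ** 2).sum(axis=(1, 2))).sum()

def joint_objective(W, X, Y, lam=0.1, group_size=3):
    """Regression-like data term plus the structured regularizer."""
    return 0.5 * np.linalg.norm(X @ W - Y) ** 2 + lam * l21_norm(W, group_size)
```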
12. Strategies for Robust Image Classification [PDF] Back to Contents
Jason Stock, Andy Dolan, Tom Cavey
Abstract: In this work we evaluate the impact of digitally altered images on the performance of artificial neural networks. We explore factors that negatively affect the ability of an image classification model to produce consistent and accurate results. A model's ability to classify is negatively influenced by alterations to images as a result of digital abnormalities or changes in the physical environment. The focus of this paper is to discover and replicate scenarios that modify the appearance of an image and evaluate them on state-of-the-art machine learning models. Our contributions present various training techniques that enhance a model's ability to generalize and improve robustness against these alterations.
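As an illustration of the kind of digitally altered inputs such an evaluation pipeline might generate, here is a sketch with a hypothetical set of three perturbations (sensor noise, illumination change, occlusion); the paper's exact scenarios and parameters may differ.

```python
import numpy as np

def alter(image, rng):
    """Apply one random digital abnormality (hypothetical set) to a
    uint8 image, of the kind a robustness evaluation might replicate."""
    img = image.astype(np.float32)
    choice = rng.integers(3)
    if choice == 0:
        img += rng.normal(0, 10, img.shape)    # additive sensor noise
    elif choice == 1:
        img *= rng.uniform(0.5, 1.5)           # global illumination change
    else:
        h, w = img.shape[:2]
        y, x = rng.integers(h // 2), rng.integers(w // 2)
        img[y:y + h // 4, x:x + w // 4] = 0    # occluding patch
    return np.clip(img, 0, 255).astype(np.uint8)
```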
13. RSS-Net: Weakly-Supervised Multi-Class Semantic Segmentation with FMCW Radar [PDF] Back to Contents
Prannay Kaul, Daniele De Martini, Matthew Gadd, Paul Newman
Abstract: This paper presents an efficient annotation procedure and an application thereof to end-to-end, rich semantic segmentation of the sensed environment using FMCW scanning radar. We advocate radar over the traditional sensors used for this task as it operates at longer ranges and is substantially more robust to adverse weather and illumination conditions. We avoid laborious manual labelling by exploiting the largest radar-focused urban autonomy dataset collected to date, correlating radar scans with RGB cameras and LiDAR sensors, for which semantic segmentation is an already consolidated procedure. The training procedure leverages a state-of-the-art natural image segmentation system which is publicly available and as such, in contrast to previous approaches, allows for the production of copious labels for the radar stream by incorporating four camera and two LiDAR streams. Additionally, the losses are computed taking into account labels to the radar sensor horizon by accumulating LiDAR returns along a pose-chain ahead and behind of the current vehicle position. Finally, we present the network with multi-channel radar scan inputs in order to deal with ephemeral and dynamic scene objects.
14. Nonparametric Data Analysis on the Space of Perceived Colors [PDF] Back to Contents
Vic Patrangenaru, Yifang Deng
Abstract: Moving around in a 3D world requires the visual system of a living individual to rely on three channels of image recognition, which is done through three types of retinal cones. Newton, Grassmann, Helmholtz and Schrödinger laid down the basic assumptions needed to understand colored vision. Such concepts were furthered by Resnikoff, who imagined the space of perceived colors as a 3D homogeneous space. This article is concerned with perceived colors regarded as random objects on a Resnikoff 3D homogeneous space model. Two applications to color differentiation in machine vision are illustrated for the proposed statistical methodology, applied to the Euclidean model for perceived colors.
15. MNEW: Multi-domain Neighborhood Embedding and Weighting for Sparse Point Clouds Segmentation [PDF] Back to Contents
Yang Zheng, Izzat H. Izzat, Sanling Song
Abstract: Point clouds have been widely adopted in 3D semantic scene understanding. However, point clouds for typical tasks such as 3D shape segmentation or indoor scenario parsing are much denser than the outdoor LiDAR sweeps used in autonomous driving perception. Due to this disparity in spatial properties, many successful methods designed for dense point clouds show degraded effectiveness on sparse data. In this paper, we focus on the semantic segmentation task of sparse outdoor point clouds. We propose a new method called MNEW, which includes multi-domain neighborhood embedding and attention weighting based on geometry distance, feature similarity, and neighborhood sparsity. The network architecture inherits PointNet, which directly processes point clouds to capture pointwise details and global semantics, and is improved by involving multi-scale local neighborhoods in the static geometry domain and the dynamic feature space. The distance/similarity attention and sparsity-adapted weighting mechanism of MNEW enable its capability for a wide range of data sparsity distributions. With experiments conducted on virtual and real KITTI semantic datasets, MNEW achieves the top performance for sparse point clouds, which is important to the application of LiDAR-based automated driving perception.
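A rough sketch of weighting a point's k nearest neighbors by both geometric proximity and feature similarity; the exponential combination and the omission of the sparsity term are simplifying assumptions, not MNEW's exact weighting.

```python
import numpy as np

def neighborhood_weights(xyz, feats, k=16, alpha=1.0, beta=1.0):
    """For each point, find k geometric nearest neighbors and weight them
    by both geometric distance (static domain) and feature distance
    (dynamic domain). O(N^2) brute force, for illustration only."""
    d_geo = np.linalg.norm(xyz[:, None] - xyz[None], axis=-1)       # (N, N)
    d_feat = np.linalg.norm(feats[:, None] - feats[None], axis=-1)  # (N, N)
    idx = np.argsort(d_geo, axis=1)[:, 1:k + 1]                     # kNN, skip self
    rows = np.arange(len(xyz))[:, None]
    w = np.exp(-alpha * d_geo[rows, idx]) * np.exp(-beta * d_feat[rows, idx])
    return idx, w / w.sum(axis=1, keepdims=True)                    # (N, k) each
```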
16. Differential 3D Facial Recognition: Adding 3D to Your State-of-the-Art 2D Method [PDF] Back to Contents
J. Matias Di Martino, Fernando Suzacq, Mauricio Delbracio, Qiang Qiu, Guillermo Sapiro
Abstract: Active illumination is a prominent complement to enhance 2D face recognition and make it more robust, e.g., to spoofing attacks and low-light conditions. In the present work we show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features, while bypassing the complicated task of 3D reconstruction. The key idea is to project over the test face a high spatial frequency pattern, which allows us to simultaneously recover real 3D information plus a standard 2D facial image. Therefore, state-of-the-art 2D face recognition solution can be transparently applied, while from the high frequency component of the input image, complementary 3D facial features are extracted. Experimental results on ND-2006 dataset show that the proposed ideas can significantly boost face recognition performance and dramatically improve the robustness to spoofing attacks.
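The decomposition the abstract relies on can be sketched as a simple frequency split: a low-frequency component feeds the off-the-shelf 2D recognizer, while the high-frequency residual carries the projected pattern from which 3D cues are decoded. The Gaussian split below is an assumption, not the paper's exact filter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def split_frequencies(image, sigma=3.0):
    """Separate a face image captured under a high-spatial-frequency
    projected pattern into a low-frequency part (a standard-looking 2D
    face image) and a high-frequency part (the pattern, carrying the
    depth cues)."""
    img = image.astype(np.float32)
    low = gaussian_filter(img, sigma)
    high = img - low
    return low, high
```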
17. Attribution in Scale and Space [PDF] Back to Contents
Shawn Xu, Subashini Venugopalan, Mukund Sundararajan
Abstract: We study the attribution problem [28] for deep networks applied to perception tasks. For vision tasks, attribution techniques attribute the prediction of a network to the pixels of the input image. We propose a new technique called \emph{Blur Integrated Gradients}. This technique has several advantages over other methods. First, it can tell at what scale a network recognizes an object. It produces scores in the scale/frequency dimension, that we find captures interesting phenomena. Second, it satisfies the scale-space axioms [14], which imply that it employs perturbations that are free of artifact. We therefore produce explanations that are cleaner and consistent with the operation of deep networks. Third, it eliminates the need for a 'baseline' parameter for Integrated Gradients [31] for perception tasks. This is desirable because the choice of baseline has a significant effect on the explanations. We compare the proposed technique against previous techniques and demonstrate application on three tasks: ImageNet object recognition, Diabetic Retinopathy prediction, and AudioSet audio event identification.
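A Riemann-sum sketch of the idea: walk the scale-space path from a maximally blurred image toward the input, accumulating gradient times path increment. The `model_grad` callback, the maximum scale, and the step count are assumptions; this is a generic path-integrated-gradients approximation along a blur path, not the authors' reference implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_integrated_gradients(model_grad, x, max_sigma=10.0, steps=20):
    """Approximate the scale-space path integral for a scalar-output
    model. `model_grad(img)` must return dF/dimg at `img` (same shape
    as x). The path runs from heavily blurred (sigma=max_sigma) to the
    unblurred input (sigma=0), so no 'baseline' image is needed."""
    sigmas = np.linspace(max_sigma, 0.0, steps + 1)
    attribution = np.zeros_like(x, dtype=np.float32)
    prev = gaussian_filter(x, sigmas[0])
    for s in sigmas[1:]:
        cur = x if s == 0 else gaussian_filter(x, s)
        attribution += model_grad(cur) * (cur - prev)   # grad . dx along path
        prev = cur
    return attribution
```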
18. Error-Corrected Margin-Based Deep Cross-Modal Hashing for Facial Image Retrieval [PDF] Back to Contents
Fariborz Taherkhani, Veeru Talreja, Matthew C. Valenti, Nasser M. Nasrabadi
Abstract: Cross-modal hashing facilitates mapping of heterogeneous multimedia data into a common Hamming space, which can be utilized for fast and flexible retrieval across different modalities. In this paper, we propose a novel cross-modal hashing architecture, deep neural decoder cross-modal hashing (DNDCMH), which uses a binary vector specifying the presence of certain facial attributes as an input query to retrieve relevant face images from a database. The DNDCMH network consists of two separate components: an attribute-based deep cross-modal hashing (ADCMH) module, which uses a margin (m)-based loss function to efficiently learn compact binary codes that preserve similarity between modalities in the Hamming space, and a neural error correcting decoder (NECD), which is an error correcting decoder implemented with a neural network. The goal of the NECD network in DNDCMH is to error-correct the hash codes generated by ADCMH to improve the retrieval efficiency. The NECD network is trained such that it has an error correcting capability greater than or equal to the margin (m) of the margin-based loss function. This enables NECD to correct the corrupted hash codes generated by ADCMH up to a Hamming distance of m. We have evaluated and compared DNDCMH with state-of-the-art cross-modal hashing methods on standard datasets to demonstrate the superiority of our method.
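An illustrative contrastive form of a margin-(m) hashing loss, with squared L2 on tanh-relaxed codes standing in for Hamming distance; this sketches the general idea, not the exact ADCMH objective.

```python
import torch
import torch.nn.functional as F

def margin_loss(img_codes, attr_codes, same_identity, m=4.0):
    """Margin-based cross-modal loss sketch: matching image/attribute
    code pairs are pulled together, non-matching pairs are pushed at
    least m apart. `same_identity` is a float {0,1} tensor per pair;
    codes are relaxed (e.g., tanh) binary vectors."""
    d = (img_codes - attr_codes).pow(2).sum(dim=1)   # squared L2 ~ Hamming
    pos = same_identity * d
    neg = (1.0 - same_identity) * F.relu(m - d)      # hinge beyond margin
    return (pos + neg).mean()
```

The margin m is the same quantity the NECD is trained to cover: codes corrupted by fewer than m bit errors stay within the decoder's correction radius.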
19. Composition of Saliency Metrics for Channel Pruning with a Myopic Oracle [PDF] 返回目录
Kaveena Persand, Andrew Anderson, David Gregg
Abstract: The computation and memory needed for Convolutional Neural Network (CNN) inference can be reduced by pruning weights from the trained network. Pruning is guided by a pruning saliency, which heuristically approximates the change in the loss function associated with the removal of specific weights. Many pruning signals have been proposed, but the performance of each heuristic depends on the particular trained network. This leaves the data scientist with a difficult choice. When using any one saliency metric for the entire pruning process, we run the risk of the metric assumptions being invalidated, leading to poor decisions being made by the metric. Ideally we could combine the best aspects of different saliency metrics. However, despite an extensive literature review, we are unable to find any prior work on composing different saliency metrics. The chief difficulty lies in combining the numerical output of different saliency metrics, which are not directly comparable. We propose a method to compose several primitive pruning saliencies, to exploit the cases where each saliency measure does well. Our experiments show that the composition of saliencies avoids many poor pruning choices identified by individual saliencies. In most cases our method finds better selections than even the best individual pruning saliency.
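A minimal sketch of how such a composition could work, assuming each saliency metric exposes only a ranking (their raw scores are not comparable) and a hypothetical model.prune(channel) method; the oracle is "myopic" in that it looks only one pruning step ahead on a small evaluation batch:

    import copy

    def myopic_compose(model, saliencies, eval_loss, candidates_per_metric=1):
        # saliencies: list of functions mapping model -> channels ranked
        # lowest-saliency first. eval_loss: loss of a candidate model on a
        # small held-out batch. Only rankings are used, never raw scores.
        best_channel, best_loss = None, float("inf")
        for saliency in saliencies:
            for ch in saliency(model)[:candidates_per_metric]:
                trial = copy.deepcopy(model)
                trial.prune(ch)              # hypothetical pruning method
                loss = eval_loss(trial)
                if loss < best_loss:
                    best_channel, best_loss = ch, loss
        return best_channel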
20. Robust Self-Supervised Convolutional Neural Network for Subspace Clustering and Classification [PDF] 返回目录
Dario Sitnik, Ivica Kopriva
Abstract: Insufficient capability of existing subspace clustering methods to handle data coming from nonlinear manifolds, data corruptions, and out-of-sample data hinders their applicability to real-world clustering and classification problems. This paper proposes a robust formulation of the self-supervised convolutional subspace clustering network ($S^2$ConvSCN) that incorporates a fully connected (FC) layer and is thus capable of handling out-of-sample data by classifying them using a softmax classifier. $S^2$ConvSCN clusters data coming from nonlinear manifolds by learning a linear self-representation model in the feature space. Robustness to data corruptions is achieved by using the correntropy induced metric (CIM) of the error. Furthermore, the block-diagonal (BD) structure of the representation matrix is enforced explicitly through BD regularization. In a truly unsupervised training environment, Robust $S^2$ConvSCN outperforms its baseline version by a significant amount for both seen and unseen data on four well-known datasets. Arguably, such an ablation study has not been reported before.
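For reference, the correntropy induced metric with a Gaussian kernel is commonly defined as $\kappa_\sigma(e) = \exp(-e^2 / (2\sigma^2))$ and $\mathrm{CIM}(x, y) = \sqrt{\kappa_\sigma(0) - \kappa_\sigma(x - y)} = \sqrt{1 - \exp(-(x - y)^2 / (2\sigma^2))}$. It behaves like the squared error for small residuals but saturates for large ones, which is what makes the formulation robust to corrupted entries.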
21. Improving BPSO-based feature selection applied to offline WI handwritten signature verification through overfitting control [PDF] 返回目录
Victor L. F. Souza, Adriano L. I. Oliveira, Rafael M. O. Cruz, Robert Sabourin
Abstract: This paper investigates the presence of overfitting when using Binary Particle Swarm Optimization (BPSO) to perform feature selection in a context of Handwritten Signature Verification (HSV). SigNet is a state-of-the-art deep CNN model for feature representation in the HSV context and contains 2048 dimensions. Some of these dimensions may include redundant information in the dissimilarity representation space generated by the dichotomy transformation (DT) used by the writer-independent (WI) approach. The analysis is carried out on the GPDS-960 dataset. Experiments demonstrate that the proposed method is able to control overfitting during the search for the most discriminant representation.
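A minimal sketch of one BPSO iteration over binary feature masks, using the standard sigmoid transfer function; the fitness function and the overfitting-control mechanism studied in the paper are not shown:

    import numpy as np

    def bpso_step(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0, rng=np.random):
        # x: (n_particles, n_features) binary masks (1 = feature kept).
        # v: velocities of the same shape; pbest/gbest: best masks so far.
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        prob = 1.0 / (1.0 + np.exp(-v))      # sigmoid transfer function
        x = (rng.random(x.shape) < prob).astype(int)
        return x, v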
22. A white-box analysis on the writer-independent dichotomy transformation applied to offline handwritten signature verification [PDF] 返回目录
Victor L. F. Souza, Adriano L. I. Oliveira, Rafael M. O. Cruz, Robert Sabourin
Abstract: A high number of writers, a small number of training samples per writer, high intra-class variability, and heavily imbalanced class distributions are among the challenges and difficulties of the offline Handwritten Signature Verification (HSV) problem. A good alternative to tackle these issues is to use a writer-independent (WI) framework. In WI systems, a single model is trained to perform signature verification for all writers from a dissimilarity space generated by the dichotomy transformation. Among the advantages of this framework are its scalability to deal with some of these challenges and its ease in managing new writers, and hence its suitability for a transfer learning context. In this work, we present a white-box analysis of this approach highlighting how it handles the challenges, the dynamic selection of references through a fusion function, and its application to transfer learning. All the analyses are carried out at the instance level using the instance hardness (IH) measure. The experimental results show that, using the IH analysis, we were able to characterize "good" and "bad" quality skilled forgeries as well as the frontier region between positive and negative samples. This enables future investigations on methods for improving discrimination between genuine signatures and skilled forgeries by considering these characterizations.
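The dichotomy transformation itself is compact enough to show directly; a sketch, assuming feature vectors such as SigNet embeddings:

    import numpy as np

    def dichotomy_transform(x_query, x_reference):
        # Maps a pair of feature vectors into the dissimilarity space, where
        # one writer-independent 2-class model serves *all* writers: pairs
        # from the same writer should land near the origin, forgeries should not.
        return np.abs(np.asarray(x_query) - np.asarray(x_reference))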
23. Knife and Threat Detectors [PDF] 返回目录
David A. Noever, Sam E. Miller Noever
Abstract: Despite rapid advances in image-based machine learning, the threat identification of a knife-wielding attacker has not garnered substantial academic attention. This relative research gap appears less understandable given the high knife assault rate (>100,000 annually) and the increasing availability of public video surveillance to analyze and forensically document. We present three complementary methods for scoring automated threat identification using multiple knife image datasets, each with the goal of narrowing down possible assault intentions while minimizing misidentifying false positives and risky false negatives. To alert an observer to the knife-wielding threat, we test and deploy a classifier built around MobileNet in a sparse and pruned neural network with a small memory requirement (<2.2 megabytes) and 95% test accuracy. We secondly train a detection algorithm (MaskRCNN) to segment the hand from the knife in a single image and assign probable certainty to their relative location. This segmentation accomplishes not only localization with bounding boxes but also relative positions from which to infer overhand threats. A final model built on the PoseNet architecture assigns anatomical waypoints or skeletal features to narrow the threat characteristics and reduce misunderstood intentions. We further identify and supplement existing data gaps that might blind a deployed knife threat detector, such as collecting innocuous hand and fist images as important negative training sets. When automated on commodity hardware and software solutions, one original research contribution is this systematic survey of timely and readily available image-based alerts to task and prioritize crime prevention countermeasures prior to a tragic outcome.
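A minimal transfer-learning sketch of such a MobileNet-based binary classifier in Keras, assuming MobileNetV2 as the backbone; the sparsification and pruning needed to reach the reported <2.2 MB footprint are not shown:

    import tensorflow as tf

    def build_knife_classifier(input_shape=(224, 224, 3)):
        base = tf.keras.applications.MobileNetV2(
            input_shape=input_shape, include_top=False, weights="imagenet")
        base.trainable = False               # fine-tune only the new head
        model = tf.keras.Sequential([
            base,
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(1, activation="sigmoid"),  # knife vs. none
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model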
24. A Machine Learning Based Framework for the Smart Healthcare Monitoring [PDF] 返回目录
Abrar Zahin, Le Thanh Tan, Rose Qingyang Hu
Abstract: In this paper, we propose a novel framework for the smart healthcare system, in which we employ compressed sensing (CS) and the combination of a state-of-the-art machine-learning-based denoiser with the alternating direction method of multipliers (ADMM) structure. This integration significantly simplifies the software implementation for the low-complexity encoder, thanks to the modular structure of ADMM. Furthermore, we focus on detecting fall-down actions from image streams. Thus, the primary purpose of this study is to reconstruct the image as visibly clearly as possible, which in turn aids the detection step of the trained classifier. For this efficient smart health monitoring framework, we employ a trained binary convolutional neural network (CNN) as the fall-action classifier, because this scheme is part of a surveillance scenario. In this scenario, we deal with fall images; thus, we compress, transmit, and reconstruct the fall images. Experimental results demonstrate the impacts of network parameters and the significant performance gain of the proposal compared to traditional methods.
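To make the ADMM structure concrete, here is a sketch of the classical LASSO-style ADMM iteration for CS recovery; in the paper's framework the soft-threshold step would be replaced by the learned denoiser:

    import numpy as np

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def cs_admm(A, b, lam=0.1, rho=1.0, n_iter=100):
        # minimize 0.5*||Ax - b||^2 + lam*||z||_1  subject to  x = z
        m, n = A.shape
        x = z = u = np.zeros(n)
        AtA, Atb = A.T @ A, A.T @ b
        L = np.linalg.cholesky(AtA + rho * np.eye(n))   # factor once, reuse
        for _ in range(n_iter):
            rhs = Atb + rho * (z - u)
            x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # x-update
            z = soft_threshold(x + u, lam / rho)               # z-update
            u = u + x - z                                      # dual update
        return x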
25. Deep learning approaches in food recognition [PDF] 返回目录
Chairi Kiourt, George Pavlidis, Stella Markantonatou
Abstract: Automatic image-based food recognition is a particularly challenging task. Traditional image analysis approaches have achieved low classification accuracy in the past, whereas deep learning approaches enabled the identification of food types and their ingredients. The contents of food dishes are typically deformable objects, usually including complex semantics, which makes the task of defining their structure very difficult. Deep learning methods have already shown very promising results in such challenges, so this chapter focuses on the presentation of some popular approaches and techniques applied in image-based food recognition. The three main lines of solutions, namely the design from scratch, the transfer learning and the platform-based approaches, are outlined, particularly for the task at hand, and are tested and compared to reveal the inherent strengths and weaknesses. The chapter is complemented with basic background material, a section devoted to the relevant datasets that are crucial in light of the empirical approaches adopted, and some concluding remarks that underline the future directions.
26. Inclusive GAN: Improving Data and Minority Coverage in Generative Models [PDF] 返回目录
Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz
Abstract: Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images. Yet the equitable allocation of their modeling capacity among subgroups has received less attention, which could lead to potential biases against underrepresented minorities if left uncontrolled. In this work, we first formalize the problem of minority inclusion as one of data coverage, and then propose to improve data coverage by harmonizing adversarial training with reconstructive generation. The experiments show that our method outperforms the existing state-of-the-art methods in terms of data coverage on both seen and unseen data. We develop an extension that allows explicit control over the minority subgroups that the model should ensure to include, and validate its effectiveness at little compromise from the overall performance on the entire dataset. Code, models, and supplemental videos are available at GitHub.
27. An Image Labeling Tool and Agricultural Dataset for Deep Learning [PDF] 返回目录
Patrick Wspanialy, Justin Brooks, Medhat Moussa
Abstract: We introduce a labeling tool and dataset aimed to facilitate computer vision research in agriculture. The annotation tool introduces novel methods for labeling with a variety of manual, semi-automatic, and fully-automatic tools. The dataset includes original images collected from commercial greenhouses, images from PlantVillage, and images from Google Images. Images were annotated with segmentations for foreground leaf, fruit, and stem instances, and diseased leaf area. Labels were in an extended COCO format. In total the dataset contained 10k tomatoes, 7k leaves, 2k stems, and 2k diseased leaf annotations.
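For readers unfamiliar with the format, a single instance annotation in (extended) COCO style is a JSON record along these lines; the field values here are illustrative only:

    annotation = {
        "id": 1,
        "image_id": 42,
        "category_id": 3,                      # e.g., "leaf"
        "segmentation": [[10, 10, 60, 10, 60, 50, 10, 50]],  # polygon x,y pairs
        "bbox": [10, 10, 50, 40],              # [x, y, width, height]
        "area": 2000,
        "iscrowd": 0,
    }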
28. Automatic Generation of Chinese Handwriting via Fonts Style Representation Learning [PDF] 返回目录
Fenxi Xiao, Bo Huang, Xia Wu
Abstract: In this paper, we propose an end-to-end deep Chinese font generation system. This system can generate new-style fonts by interpolating latent style-related embedding variables, which can achieve a smooth transition between different styles. Our method is simpler and more effective than other methods, which will help to improve font design efficiency.
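The interpolation step amounts to a convex combination of two latent style embeddings, each blend being decoded into a font of intermediate style; a sketch with hypothetical embeddings z_a and z_b:

    import numpy as np

    def interpolate_styles(z_a, z_b, n_steps=5):
        # Returns latent codes sweeping from style A to style B; feeding each
        # one to the decoder yields a smooth transition between the two fonts.
        alphas = np.linspace(0.0, 1.0, n_steps)
        return [(1 - a) * np.asarray(z_a) + a * np.asarray(z_b) for a in alphas]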
29. Multiform Fonts-to-Fonts Translation via Style and Content Disentangled Representations of Chinese Character [PDF] 返回目录
Fenxi Xiao, Jie Zhang, Bo Huang, Xia Wu
Abstract: This paper mainly treats the generation of personalized fonts as an image style transfer problem. The main purpose of this paper is to design a network framework that can extract and recombine the content and style of characters. These attempts can be used to synthesize an entire font set from only a small number of characters. The paper combines various deep networks such as a Convolutional Neural Network, Multi-layer Perceptron, and Residual Network to find the optimal model for extracting the features of font characters. The results show that the characters we generate are very close to real characters under the Structural Similarity index and Peak Signal-to-Noise Ratio evaluation criteria.
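The two evaluation criteria named above are standard and available in scikit-image; a sketch for a pair of grayscale glyph images with values in [0, 255]:

    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def evaluate_glyph(real, generated):
        real = real.astype(np.float64)
        generated = generated.astype(np.float64)
        psnr = peak_signal_noise_ratio(real, generated, data_range=255)
        ssim = structural_similarity(real, generated, data_range=255)
        return psnr, ssim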
30. An End-to-End Approach for Recognition of Modern and Historical Handwritten Numeral Strings [PDF] 返回目录
Andre G. Hochuli, Alceu S. Britto Jr., Jean P. Barddal, Luiz E. S. Oliveira, Robert Sabourin
Abstract: An end-to-end solution for handwritten numeral string recognition is proposed, in which the numeral string is considered as composed of objects automatically detected and recognized by a YoLo-based model. The main contribution of this paper is to avoid heuristic-based methods for string preprocessing and segmentation, the need for task-oriented classifiers, and the use of specific constraints related to the string length. A robust experimental protocol based on several numeral string datasets, including one composed of historical documents, has shown that the proposed method is a feasible end-to-end solution for numeral string recognition. Besides, it reduces the complexity of the string recognition task considerably, since it drops classical steps, in particular preprocessing, segmentation, and the use of a set of classifiers devoted to strings of a specific length.
31. Predict the model of a camera [PDF] 返回目录
Ciro Javier Diaz Penedo
Abstract: In this work we address the problem of predicting the model of a camera based on the content of its photographs. We use two sets of features: one set consists of properties extracted from a Discrete Wavelet Domain (DWD) obtained by applying a 4-level Fast Wavelet Decomposition to the images, and a second set consists of Local Binary Patterns (LBP) features extracted from the residual noise of the images after filtering. The algorithms used for classification were logistic regression, k-NN, and artificial neural networks.
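A sketch of both feature sets, assuming PyWavelets and scikit-image, and assuming the after-filter noise is the residual between the image and some denoised version of it (the exact filter is not specified in the abstract):

    import numpy as np
    import pywt
    from skimage.feature import local_binary_pattern

    def camera_features(gray_image, denoised_image):
        # Set 1: statistics of the detail subbands of a 4-level 2-D DWT.
        coeffs = pywt.wavedec2(gray_image, "haar", level=4)
        stats = []
        for level in coeffs[1:]:              # (cH, cV, cD) per level
            for sub in level:
                stats += [sub.mean(), sub.std(), np.abs(sub).mean()]
        # Set 2: LBP histogram of the noise residual left after filtering.
        residual = gray_image.astype(float) - denoised_image.astype(float)
        lbp = local_binary_pattern(residual, P=8, R=1.0, method="uniform")
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        return np.concatenate([stats, hist])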
32. FusedProp: Towards Efficient Training of Generative Adversarial Networks [PDF] 返回目录
Zachary Polizzi, Chuan-Yung Tsai
Abstract: Generative adversarial networks (GANs) are capable of generating strikingly realistic samples but state-of-the-art GANs can be extremely computationally expensive to train. In this paper, we propose the fused propagation (FusedProp) algorithm which can be used to efficiently train the discriminator and the generator of common GANs simultaneously using only one forward and one backward propagation. We show that FusedProp achieves 1.49 times the training speed compared to the conventional training of GANs, although further studies are required to improve its stability. By reporting our preliminary results and open-sourcing our implementation, we hope to accelerate future research on the training of GANs.
33. Streaming Networks: Increase Noise Robustness and Filter Diversity via Hard-wired and Input-induced Sparsity [PDF] 返回目录
Sergey Tarasenko, Fumihiko Takahashi
Abstract: CNNs have achieved state-of-the-art performance in many applications. Recent studies illustrate that a CNN's recognition accuracy drops drastically if images are corrupted by noise. We focus on the problem of robust recognition accuracy for noise-corrupted images. We introduce a novel network architecture called Streaming Networks. Each stream takes a certain intensity slice of the original image as its input, and stream parameters are trained independently. We use network capacity, hard-wired sparsity, and input-induced sparsity as the dimensions for experiments. The results indicate that only the presence of both hard-wired and input-induced sparsity enables robust noisy image recognition. Streaming Nets is the only architecture which has both types of sparsity and exhibits higher robustness to noise. Finally, to illustrate the increase in filter diversity, we show that the distribution of filter weights of the first conv layer gradually approaches a uniform distribution as the degree of hard-wired and domain-induced sparsity and capacity increases.
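A sketch of the input-induced sparsity: each stream sees only one intensity band of the image, with all other pixels zeroed (equally spaced band boundaries are an assumption here):

    import numpy as np

    def intensity_slices(image, n_streams=3):
        image = np.asarray(image, dtype=np.float32)
        edges = np.linspace(image.min(), image.max(), n_streams + 1)
        bins = np.digitize(image, edges[1:-1])   # stream index per pixel
        # One sparse copy of the input per parallel stream.
        return [np.where(bins == k, image, 0.0) for k in range(n_streams)]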
34. Cascaded Refinement Network for Point Cloud Completion [PDF] 返回目录
Xiaogang Wang, Marcelo H Ang Jr, Gim Hee Lee
Abstract: Point clouds are often sparse and incomplete. Existing shape completion methods are incapable of generating details of objects or learning the complex point distributions. To this end, we propose a cascaded refinement network together with a coarse-to-fine strategy to synthesize the detailed object shapes. Considering the local details of partial input with the global shape information together, we can preserve the existing details in the incomplete point set and generate the missing parts with high fidelity. We also design a patch discriminator that guarantees every local area has the same pattern with the ground truth to learn the complicated point distribution. Quantitative and qualitative experiments on different datasets show that our method achieves superior results compared to existing state-of-the-art approaches on the 3D point cloud completion task. Our source code is available at this https URL.
35. Towards Efficient Unconstrained Palmprint Recognition via Deep Distillation Hashing [PDF] 返回目录
Huikai Shao, Dexing Zhong, Xuefeng Du
Abstract: Deep palmprint recognition has become an emerging issue with great potential for personal authentication on handheld and wearable consumer devices. Previous studies of palmprint recognition are mainly based on constrained datasets collected by dedicated devices in controlled environments, which reduces flexibility and convenience. In addition, general deep palmprint recognition algorithms are often too heavy to meet the real-time requirements of embedded systems. In this paper, a new palmprint benchmark is established, which consists of more than 20,000 images collected by 5 brands of smart phones in an unconstrained manner. Each image has been manually labeled with 14 key points for region of interest (ROI) extraction. Further, an approach called Deep Distillation Hashing (DDH) is proposed as a benchmark for efficient deep palmprint recognition. Palmprint images are converted to binary codes to improve the efficiency of feature matching. Derived from knowledge distillation, novel distillation loss functions are constructed to compress the deep model and further improve the efficiency of feature extraction on light networks. Comprehensive experiments are conducted on both constrained and unconstrained palmprint databases. Using DDH, the accuracy of palmprint identification can be increased by up to 11.37%, and the Equal Error Rate (EER) of palmprint verification can be reduced by up to 3.11%. The results indicate the feasibility of our database, and DDH can outperform other baselines to achieve state-of-the-art performance. The collected dataset and related source codes are publicly available at this http URL.
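The efficiency claim rests on binary codes being matched with XOR plus popcount rather than floating-point distances; a sketch, assuming codes packed into uint8 arrays:

    import numpy as np

    def hamming_match(query_code, gallery_codes):
        # query_code: (K/8,) uint8; gallery_codes: (N, K/8) uint8.
        x = np.bitwise_xor(gallery_codes, query_code)
        return np.unpackbits(x, axis=1).sum(axis=1)   # Hamming distance to each entry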
36. Pyramid Focusing Network for mutation prediction and classification in CT images [PDF] 返回目录
Xukun Zhang, Wenxin Hu, Wen Wu
Abstract: Predicting the mutation status of genes in tumors is of great clinical significance. Recent studies have suggested that certain mutations may be noninvasively predicted by studying image features of the tumors from Computed Tomography (CT) data. Currently, this kind of image feature identification method mainly relies on manual processing to extract generalized image features alone, or on machine processing that does not consider the morphological differences of the tumor itself, which makes it difficult to achieve further breakthroughs. In this paper, we propose a pyramid focusing network (PFNet) for mutation prediction and classification based on CT images. Firstly, we use Space Pyramid Pooling to collect semantic cues in feature maps from multiple scales, following the observation that the shape and size of the tumors vary. Secondly, we improve the loss function based on the consideration that the features required for proper mutation detection are often not obvious in cross-sections of tumor edges, which directs more attention to these hard examples in the network. Finally, we devise a training scheme based on data augmentation to enhance the generalization ability of networks. Extensively verified on clinical gastric CT datasets of 20 testing volumes with 63648 CT images, our method achieves an accuracy of 94.90% in predicting the HER-2 gene mutation status at the CT image level.
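Assuming the pooling module is the familiar pyramid pooling over several grid sizes (the abstract does not spell out its exact form), a sketch in PyTorch:

    import torch
    import torch.nn.functional as F

    def spatial_pyramid_pool(x, levels=(1, 2, 4)):
        # x: (N, C, H, W) feature map. Pool at several grid sizes and
        # concatenate, collecting semantic cues from multiple scales into
        # one fixed-length descriptor.
        feats = [F.adaptive_avg_pool2d(x, (g, g)).flatten(1) for g in levels]
        return torch.cat(feats, dim=1)   # (N, C * sum(g*g for g in levels))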
37. Super-resolution of clinical CT volumes with modified CycleGAN using micro CT volumes [PDF] 返回目录
Tong ZHENG, Hirohisa ODA, Takayasu MORIYA, Takaaki SUGINO, Shota NAKAMURA, Masahiro ODA, Masaki MORI, Hirotsugu TAKABATAKE, Hiroshi NATORI, Kensaku MORI
Abstract: This paper presents a super-resolution (SR) method with an unpaired training dataset of clinical CT and micro CT volumes. For obtaining very detailed information, such as cancer invasion, from pre-operative clinical CT volumes of lung cancer patients, SR of clinical CT volumes to the $\mu$CT level is desired. While most SR methods require paired low- and high-resolution images for training, it is infeasible to obtain paired clinical CT and $\mu$CT volumes. We propose an SR approach based on CycleGAN, which can perform SR on clinical CT to the $\mu$CT level. We propose new loss functions to keep cycle consistency while training without paired volumes. Experimental results demonstrated that our proposed method successfully performed SR of clinical CT volumes of lung cancer patients to the $\mu$CT level.
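The cycle-consistency term that permits unpaired training is the standard CycleGAN ingredient; a sketch with hypothetical generators G_clin2micro and G_micro2clin (the paper's modified losses are not reproduced here):

    import torch.nn.functional as F

    def cycle_consistency_loss(G_clin2micro, G_micro2clin, clinical, micro, lam=10.0):
        # Translate each domain to the other and back; the reconstructions
        # should match the originals even though no paired volumes exist.
        rec_clinical = G_micro2clin(G_clin2micro(clinical))
        rec_micro = G_clin2micro(G_micro2clin(micro))
        return lam * (F.l1_loss(rec_clinical, clinical) + F.l1_loss(rec_micro, micro))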
38. SC4D: A Sparse 4D Convolutional Network for Skeleton-Based Action Recognition [PDF] 返回目录
Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu
Abstract: In this paper, a new perspective is presented for skeleton-based action recognition. Specifically, we regard the skeletal sequence as a spatial-temporal point cloud and voxelize it into a 4-dimensional grid. A novel sparse 4D convolutional network (SC4D) is proposed to directly process the generated 4D grid for high-level perception. Without manually designed, hand-crafted transformation rules, it makes better use of the advantages of the convolutional network, resulting in a more concise, general, and robust framework for skeletal data. Besides, by processing space and time simultaneously, it largely keeps the spatial-temporal consistency of the skeletal data, and thus brings better expressiveness. Moreover, with the help of sparse tensors, it can be executed efficiently with fewer computations. To verify the superiority of SC4D, extensive experiments are conducted on two challenging datasets, namely NTU-RGBD and SHREC, where SC4D achieves state-of-the-art performance on both of them.
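A sketch of the voxelization step, assuming joint coordinates normalized per sequence and a dense occupancy grid for clarity (a sparse-convolution library would instead consume the nonzero coordinates directly):

    import numpy as np

    def voxelize_skeleton(joints, grid=(32, 32, 32, 16)):
        # joints: (T, J, 3) xyz joint positions over T frames.
        joints = np.asarray(joints, dtype=float)
        T, J, _ = joints.shape
        pts = joints.reshape(-1, 3)
        mins, maxs = pts.min(0), pts.max(0)
        xyz = (pts - mins) / np.maximum(maxs - mins, 1e-6)     # to [0, 1]
        idx = np.minimum((xyz * np.array(grid[:3])).astype(int),
                         np.array(grid[:3]) - 1)
        t_idx = np.repeat(np.minimum(np.arange(T) * grid[3] // max(T, 1),
                                     grid[3] - 1), J)
        vol = np.zeros(grid, dtype=np.float32)
        vol[idx[:, 0], idx[:, 1], idx[:, 2], t_idx] = 1.0
        return vol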
39. Hierarchical Opacity Propagation for Image Matting [PDF] 返回目录
Yaoyi Li, Qingyao Xu, Hongtao Lu
Abstract: Natural image matting is a fundamental problem in computational photography and computer vision. Deep neural networks have brought a surge of successful methods in natural image matting in recent years. In contrast to traditional propagation-based matting methods, some top-tier deep image matting approaches tend to perform propagation in the neural network implicitly. A novel structure for more direct alpha matte propagation between pixels is therefore in demand. To this end, this paper presents a hierarchical opacity propagation (HOP) matting method, in which the opacity information is propagated in the neighborhood of each point at different semantic levels. The hierarchical structure is based on one global and multiple local propagation blocks. With the HOP structure, every feature-point pair in the high-resolution feature maps is connected based on the appearance of the input image. We further propose a scale-insensitive positional encoding tailored for image matting to deal with the unfixed size of input images, and introduce random interpolation augmentation into image matting. Extensive experiments and an ablation study show that HOP matting is capable of outperforming state-of-the-art matting methods.
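The scale-insensitive positional encoding can be illustrated with a short PyTorch sketch. The paper's exact encoding is not reproduced here; this version simply makes a pixel's code depend on its relative rather than absolute position by normalizing coordinates to [0, 1] before the sinusoids, so feature maps of any size receive encodings with identical statistics. The channel count and frequency ladder are illustrative.

```python
import torch

def scale_insensitive_position_encoding(h, w, dim=16):
    """Sinusoidal encoding of (row, col) positions normalized to [0, 1]."""
    ys = torch.linspace(0, 1, h)
    xs = torch.linspace(0, 1, w)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    freqs = 2.0 ** torch.arange(dim // 4)          # geometric frequency ladder
    feats = []
    for grid in (yy, xx):
        ang = grid[None] * freqs[:, None, None] * torch.pi
        feats += [torch.sin(ang), torch.cos(ang)]
    return torch.cat(feats, dim=0)                 # (dim, h, w)

pe_small = scale_insensitive_position_encoding(32, 48)
pe_large = scale_insensitive_position_encoding(64, 96)
print(pe_small.shape, pe_large.shape)              # (16, 32, 48) (16, 64, 96)
```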
40. Motion-supervised Co-Part Segmentation [PDF] 返回目录
Aliaksandr Siarohin*, Subhankar Roy*, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe
Abstract: Recent co-part segmentation methods mostly operate in a supervised learning setting, which requires a large amount of annotated data for training. To overcome this limitation, we propose a self-supervised deep learning method for co-part segmentation. Differently from previous works, our approach develops the idea that motion information inferred from videos can be leveraged to discover meaningful object parts. To this end, our method relies on pairs of frames sampled from the same video. The network learns to predict part segments together with a representation of the motion between two frames, which permits reconstruction of the target image. Through extensive experimental evaluation on publicly available video sequences we demonstrate that our approach can produce improved segmentation maps with respect to previous self-supervised co-part segmentation approaches.
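The self-supervised objective sketched below shows the core mechanic the abstract describes: warp a source frame with a predicted motion field and score the reconstruction against the target frame from the same video. The part-segmentation and motion networks are omitted (a zero flow stands in for their prediction), so this is a hedged illustration of the training signal, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp(source, flow):
    """Backward-warp `source` (B,C,H,W) with a dense flow field in [-1,1] grid units."""
    B, _, H, W = source.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
    return F.grid_sample(source, base + flow, align_corners=True)

# One (source, target) frame pair sampled from the same video; the loss is
# simply how well the motion-warped source reconstructs the target.
B, C, H, W = 2, 3, 64, 64
source, target = torch.rand(B, C, H, W), torch.rand(B, C, H, W)
flow = torch.zeros(B, H, W, 2, requires_grad=True)   # stand-in for a prediction
loss = F.l1_loss(warp(source, flow), target)
loss.backward()  # gradients reach the motion (and, in the paper, part) predictor
```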
41. Utilising Prior Knowledge for Visual Navigation: Distil and Adapt [PDF] 返回目录
M. Mahdi Kazemi Moghaddam, Qi Wu, Ehsan Abbasnejad, Javen Shi
Abstract: We, as humans, can impeccably navigate to localise a target object, even in an unseen environment. We argue that this impressive ability is largely due to the incorporation of prior knowledge (or experience) and visual cues, which current visual navigation approaches lack. In this paper, we propose to use externally learned prior knowledge of object relations, which is integrated into our model via constructing a neural graph. To combine appropriate assessment of the states and the prior (knowledge), we propose to decompose the value function in the actor-critic reinforcement learning algorithm and incorporate the prior in the critic in a novel way that reduces the model complexity and improves model generalisation. Our approach outperforms the current state-of-the-art on the AI2THOR visual navigation dataset.
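The abstract does not spell out the exact form of the value-function decomposition, so the sketch below assumes the simplest additive split between a state term and a prior-knowledge term; treat the architecture and dimensions as placeholders rather than the paper's design.

```python
import torch
import torch.nn as nn

class DecomposedCritic(nn.Module):
    """Value function split into a state term and a prior-knowledge term,
    V = V_state(s) + V_prior(k), echoing the decomposition described above.
    """
    def __init__(self, state_dim, prior_dim, hidden=64):
        super().__init__()
        self.v_state = nn.Sequential(nn.Linear(state_dim, hidden),
                                     nn.ReLU(), nn.Linear(hidden, 1))
        self.v_prior = nn.Sequential(nn.Linear(prior_dim, hidden),
                                     nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, state, prior):
        return self.v_state(state) + self.v_prior(prior)

critic = DecomposedCritic(state_dim=128, prior_dim=32)
v = critic(torch.randn(4, 128), torch.randn(4, 32))
print(v.shape)  # (4, 1)
```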
42. Neural Image Inpainting Guided with Descriptive Text [PDF] 返回目录
Lisai Zhang, Qingcai Chen, Baotian Hu, Shuoran Jiang
Abstract: Neural image inpainting has achieved promising performance in generating semantically plausible content. Most of the recent works mainly focus on inpainting images depending on vision information, while neglecting the semantic information implied in human languages. To acquire more semantically accurate inpainting images, this paper proposes a novel inpainting model named Neural Image Inpainting Guided with Descriptive Text (NIGDT). First, a dual multi-modal attention mechanism is designed to extract the explicit semantic information about corrupted regions. The mechanism is trained to combine the descriptive text and two complementary images through reciprocal attention maps. Second, an image-text matching loss is designed to enforce the model output following the descriptive text. Its goal is to maximize the semantic similarity of the generated image and the text. Finally, experiments are conducted on two open datasets with captions. Experimental results show that the proposed NIGDT model outperforms all compared models on both quantitative and qualitative comparison. The results also demonstrate that the proposed model can generate images consistent with the guidance text, which provides a flexible way for user-guided inpainting. Our systems and code will be released soon.
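The image-text matching loss can be written down directly if one assumes, as is common, cosine similarity between paired embeddings; the encoders producing `img_emb` and `txt_emb` are stand-ins, since the paper's networks are not reproduced here.

```python
import torch
import torch.nn.functional as F

def image_text_matching_loss(img_emb, txt_emb):
    """Negative cosine similarity between paired image and text embeddings:
    minimizing it pushes the generated image toward the descriptive text.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    return 1.0 - (img * txt).sum(dim=-1).mean()

# Toy usage with random stand-ins for the two encoders' outputs.
loss = image_text_matching_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```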
43. Multi-Task Learning via Co-Attentive Sharing for Pedestrian Attribute Recognition [PDF] 返回目录
Haitian Zeng, Haizhou Ai, Zijie Zhuang, Long Chen
Abstract: Learning to predict multiple attributes of a pedestrian is a multi-task learning problem. To share feature representations between two individual task networks, conventional methods like Cross-Stitch and Sluice network learn a linear combination of features or feature subspaces. However, linear combination rules out the complex interdependency between channels. Moreover, spatial information exchange is rarely considered. In this paper, we propose a novel Co-Attentive Sharing (CAS) module which extracts discriminative channels and spatial regions for more effective feature sharing in multi-task learning. The module consists of three branches, which leverage different channels for between-task feature fusion, attention generation and task-specific feature enhancement, respectively. Experiments on two pedestrian attribute recognition datasets show that our module outperforms conventional sharing units and achieves superior results compared to state-of-the-art approaches on many metrics.
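A minimal PyTorch sketch of the co-attentive sharing idea: a channel gate selects discriminative channels, a spatial gate selects regions, and each task receives the attended features of the other. The real CAS module has three dedicated branches (fusion, attention generation, enhancement); this two-gate version only illustrates the channel-plus-spatial attention exchange, and all names are assumptions.

```python
import torch
import torch.nn as nn

class CoAttentiveShare(nn.Module):
    """Exchange channel- and spatially-gated features between two task branches."""
    def __init__(self, ch):
        super().__init__()
        self.channel_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                          nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())

    def forward(self, feat_a, feat_b):
        share_a = feat_a * self.channel_gate(feat_a) * self.spatial_gate(feat_a)
        share_b = feat_b * self.channel_gate(feat_b) * self.spatial_gate(feat_b)
        return feat_a + share_b, feat_b + share_a  # cross-task feature exchange

cas = CoAttentiveShare(ch=64)
a, b = torch.rand(2, 64, 32, 32), torch.rand(2, 64, 32, 32)
out_a, out_b = cas(a, b)
print(out_a.shape, out_b.shape)  # both (2, 64, 32, 32)
```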
44. Real-time Classification from Short Event-Camera Streams using Input-filtering Neural ODEs [PDF] 返回目录
Giorgio Giannone, Asha Anoosheh, Alessio Quaglino, Pierluca D'Oro, Marco Gallieri, Jonathan Masci
Abstract: Event-based cameras are novel, efficient sensors inspired by the human vision system, generating an asynchronous, pixel-wise stream of data. Learning from such data is generally performed through heavy preprocessing and event integration into images. This requires buffering of possibly long sequences and can limit the response time of the inference system. In this work, we instead propose to directly use events from a DVS camera, a stream of intensity changes and their spatial coordinates. This sequence is used as the input for a novel asynchronous RNN-like architecture, the Input-filtering Neural ODEs (INODE). This is inspired by the dynamical systems and filtering literature. INODE is an extension of Neural ODEs (NODE) that allows for input signals to be continuously fed to the network, as in filtering. The approach naturally handles batches of time series with irregular time-stamps by implementing a batch forward Euler solver. INODE is trained like a standard RNN; it learns to discriminate short event sequences and to perform event-by-event online inference. We demonstrate our approach on a series of classification tasks, comparing against a set of LSTM baselines. We show that, independently of the camera resolution, INODE can outperform the baselines by a large margin on the ASL task and is on par with a much larger LSTM for the NCALTECH task. Finally, we show that INODE is accurate even when provided with very few events.
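The batch forward-Euler mechanism is easy to sketch: integrate a learned vector field whose input includes the current event, with per-sample step sizes taken from the (possibly irregular) timestamps. The hidden size, event features, and dynamics network below are assumptions for illustration, not INODE's actual architecture.

```python
import torch
import torch.nn as nn

class InputFilteringODE(nn.Module):
    """Forward-Euler integration of h' = f(h, x(t)): the event stream x is fed
    continuously into the dynamics, as in filtering, rather than preprocessed.
    """
    def __init__(self, hidden_dim, input_dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(hidden_dim + input_dim, hidden_dim),
                               nn.Tanh())

    def forward(self, events, timestamps):
        # events: (B, N, input_dim); timestamps: (B, N), possibly irregular.
        B, N, _ = events.shape
        h = torch.zeros(B, self.f[0].out_features, device=events.device)
        t_prev = torch.zeros(B, device=events.device)
        for i in range(N):
            dt = (timestamps[:, i] - t_prev).unsqueeze(-1)  # per-sample step
            h = h + dt * self.f(torch.cat([h, events[:, i]], dim=-1))
            t_prev = timestamps[:, i]
        return h  # final state, e.g. fed to a classifier head

model = InputFilteringODE(hidden_dim=32, input_dim=3)  # (x, y, polarity) events
ev = torch.rand(4, 100, 3)
ts = torch.sort(torch.rand(4, 100), dim=1).values
print(model(ev, ts).shape)  # (4, 32)
```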
45. Adaptive Multiscale Illumination-Invariant Feature Representation for Undersampled Face Recognition [PDF] 返回目录
Yang Zhang, Changhui Hu, Xiaobo Lu
Abstract: This paper presents a novel illumination-invariant feature representation approach to eliminate the effect of varying illumination in undersampled face recognition. Firstly, a new illumination level classification technique based on Singular Value Decomposition (SVD) is proposed to judge the illumination level of the input image. Secondly, we construct the logarithm edgemaps feature (LEF) based on the Lambertian model and the local near-neighbor feature of the face image, applied to local regions at multiple scales. Then, the illumination level is referenced to construct the high-performance LEF as well as to realize adaptive fusion of multi-scale LEFs for the face image, yielding the JLEF-feature. In addition, a constraint operation is used to remove useless high-frequency interference, disentangling useful facial feature edges and constructing the AJLEF-face. Finally, the effects of our methods and other state-of-the-art algorithms, including deep learning methods, are tested on Extended Yale B, CMU PIE, AR as well as our Self-build Driver database (SDB). The experimental results demonstrate that the JLEF-feature and AJLEF-face outperform other related approaches for undersampled face recognition under varying illumination.
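The Lambertian reasoning behind the logarithm edge maps can be verified numerically: with I = R * L and slowly varying L, local differences of log I depend on reflectance rather than lighting. The sketch below shows only this underlying idea, not the paper's exact LEF construction; the epsilon and the first-difference operator are illustrative choices.

```python
import numpy as np

def log_edge_map(image, eps=1e-6):
    """Illumination-insensitive edges: log I = log R + log L, and since L
    varies slowly, local differences of log I are dominated by structure.
    """
    log_i = np.log(image.astype(np.float64) + eps)
    gx = np.abs(np.diff(log_i, axis=1, prepend=log_i[:, :1]))
    gy = np.abs(np.diff(log_i, axis=0, prepend=log_i[:1, :]))
    return gx + gy

# The same scene under two global illumination levels gives near-identical maps.
face = np.random.rand(64, 64) + 0.1
dim, bright = 0.3 * face, 1.7 * face
print(np.abs(log_edge_map(dim) - log_edge_map(bright)).max())  # ~0 (up to eps)
```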
46. Predicting Camera Viewpoint Improves Cross-dataset Generalization for 3D Human Pose Estimation [PDF] 返回目录
Zhe Wang, Daeyun Shin, Charless C. Fowlkes
Abstract: Monocular estimation of 3d human pose has attracted increased attention with the availability of large ground-truth motion capture datasets. However, the diversity of training data available is limited and it is not clear to what extent methods generalize outside the specific datasets they are trained on. In this work we carry out a systematic study of the diversity and biases present in specific datasets and its effect on cross-dataset generalization across a compendium of 5 pose datasets. We specifically focus on systematic differences in the distribution of camera viewpoints relative to a body-centered coordinate frame. Based on this observation, we propose an auxiliary task of predicting the camera viewpoint in addition to pose. We find that models trained to jointly predict viewpoint and pose systematically show significantly improved cross-dataset generalization.
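The auxiliary-task setup amounts to a second head on a shared trunk trained with a weighted joint loss. Everything concrete below (a quaternion viewpoint head, the 0.1 auxiliary weight, the MLP trunk, random targets) is an assumption for illustration; the abstract only commits to jointly predicting pose and camera viewpoint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseWithViewpoint(nn.Module):
    """Shared trunk with two heads: 3D pose (main task) and camera viewpoint
    (auxiliary task), trained jointly.
    """
    def __init__(self, in_dim=34, n_joints=17):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.pose_head = nn.Linear(256, n_joints * 3)
        self.view_head = nn.Linear(256, 4)  # e.g. camera rotation as a quaternion

    def forward(self, kp2d):
        z = self.trunk(kp2d)
        return self.pose_head(z), self.view_head(z)

model = PoseWithViewpoint()
kp2d = torch.randn(8, 34)                   # flattened 2D keypoints
pose3d, view = model(kp2d)
loss = F.mse_loss(pose3d, torch.randn_like(pose3d)) \
     + 0.1 * F.mse_loss(view, torch.randn_like(view))  # aux weight assumed
loss.backward()
```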
47. Human Motion Transfer from Poses in the Wild [PDF] 返回目录
Jian Ren, Menglei Chai, Sergey Tulyakov, Chen Fang, Xiaohui Shen, Jianchao Yang
Abstract: In this paper, we tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video. It is a video-to-video translation task in which the estimated poses are used to bridge two domains. Despite substantial progress on the topic, there exist several problems with the previous methods. First, there is a domain gap between training and testing pose sequences--the model is tested on poses it has not seen during training, such as difficult dancing moves. Furthermore, pose detection errors are inevitable, making the job of the generator harder. Finally, generating realistic pixels from sparse poses is challenging in a single step. To address these challenges, we introduce a novel pose-to-video translation framework for generating high-quality videos that are temporally coherent even for in-the-wild pose sequences unseen during training. We propose a pose augmentation method to minimize the training-test gap, a unified paired and unpaired learning strategy to improve the robustness to detection errors, and two-stage network architecture to achieve superior texture quality. To further boost research on the topic, we build two human motion datasets. Finally, we show the superiority of our approach over the state-of-the-art studies through extensive experiments and evaluations on different datasets.
48. Toward Fine-grained Facial Expression Manipulation [PDF] 返回目录
Jun Ling, Han Xue, Li Song, Shuhui Yang, Rong Xie, Xiao Gu
Abstract: Facial expression manipulation, as an image-to-image translation problem, aims at editing facial expression with a given condition. Previous methods edit an input image under the guidance of a discrete emotion label or absolute condition (e.g., facial action units) to possess the desired expression. However, these methods either suffer from changing condition-irrelevant regions or are inefficient at preserving image quality. In this study, we take these two objectives into consideration and propose a novel conditional GAN model. First, we replace the continuous absolute condition with a relative condition, specifically, relative action units. With relative action units, the generator learns to only transform regions of interest which are specified by non-zero-valued relative AUs, avoiding estimating the current AUs of the input image. Second, our generator is built on a U-Net architecture and strengthened by a multi-scale feature fusion (MSF) mechanism for high-quality expression editing. Extensive experiments on both quantitative and qualitative evaluation demonstrate the improvements of our proposed approach compared with state-of-the-art expression editing methods.
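A worked toy example of the relative-condition idea: the generator is conditioned on the difference between target and source action units, so zero entries mark regions to leave untouched and no estimate of the source AUs is needed at test time (below, the source AUs appear only to construct the toy difference). All AU values are made up.

```python
import numpy as np

# Relative-AU conditioning: the generator sees only the *change* in action
# units rather than the absolute target values.
source_aus = np.array([0.0, 0.2, 0.0, 0.7, 0.1])   # unknown to the generator
target_aus = np.array([0.0, 0.9, 0.0, 0.7, 0.1])
relative_aus = target_aus - source_aus              # condition fed to the generator
edit_mask = relative_aus != 0                       # only AU 1 changes
print(relative_aus, edit_mask)  # regions tied to zero entries stay untouched
```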
49. Generative Adversarial Zero-shot Learning via Knowledge Graphs [PDF] 返回目录
Yuxia Geng, Jiaoyan Chen, Zhuo Chen, Zhiquan Ye, Zonggang Yuan, Yantao Jia, Huajun Chen
Abstract: Zero-shot learning (ZSL) addresses the prediction of unseen classes that have no labeled training data. Recently, generative methods like Generative Adversarial Networks (GANs) have been widely investigated for ZSL due to their high accuracy, generalization capability and so on. However, the side information of classes used so far is limited to text descriptions and attribute annotations, which fall short of capturing the semantics of the classes. In this paper, we introduce a new generative ZSL method named KG-GAN by incorporating rich semantics from a knowledge graph (KG) into GANs. Specifically, we build upon Graph Neural Networks and encode the KG from two views: a class view and an attribute view, considering the different semantics of the KG. With well-learned semantic embeddings for each node (representing a visual category), we leverage GANs to synthesize compelling visual features for unseen classes. According to our evaluation on multiple image classification datasets, KG-GAN can achieve better performance than the state-of-the-art baselines.
50. End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [PDF] 返回目录
Rui Qian, Divyansh Garg, Yan Wang, Yurong You, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger, Wei-Lun Chao
Abstract: Reliable and accurate 3D object detection is a necessity for safe autonomous driving. Although LiDAR sensors can provide accurate 3D point cloud estimates of the environment, they are also prohibitively expensive for many settings. Recently, the introduction of pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras. PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs. However, so far these two networks have to be trained separately. In this paper, we introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end. The resulting framework is compatible with most state-of-the-art networks for both tasks and in combination with PointRCNN improves over PL consistently across all benchmarks -- yielding the highest entry on the KITTI image-based 3D object detection leaderboard at the time of submission. Our code will be made available at this https URL.
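The 2D-depth-map-to-3D-point-cloud conversion at the heart of pseudo-LiDAR is standard pinhole back-projection, sketched below; the intrinsics are illustrative KITTI-like values rather than calibrated ones, and the random depth map stands in for a stereo network's output.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H, W) into a 3D point cloud using the
    pinhole camera model.
    """
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)  # (H*W, 3)

# KITTI-like image size and intrinsics (illustrative values only).
cloud = depth_to_pseudo_lidar(np.random.uniform(1, 80, (375, 1242)),
                              fx=721.5, fy=721.5, cx=609.6, cy=172.9)
print(cloud.shape)  # (465750, 3) -- input to a LiDAR-style 3D detector
```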
51. A Method for Curation of Web-Scraped Face Image Datasets [PDF] 返回目录
Kai Zhang, Vítor Albiero, Kevin W. Bowyer
Abstract: Web-scraped, in-the-wild datasets have become the norm in face recognition research. The numbers of subjects and images acquired in web-scraped datasets are usually very large, with the number of images on the scale of millions. A variety of issues occur when collecting a dataset in-the-wild, including images with the wrong identity label, duplicate images, duplicate subjects and variation in quality. With the number of images in the millions, a manual cleaning procedure is not feasible, but fully automated methods used to date result in a less-than-ideal level of dataset cleanliness. We propose a semi-automated method, where the goal is to have a clean dataset for testing face recognition methods, with similar quality across men and women, to support comparison of accuracy across gender. Our approach removes near-duplicate images, merges duplicate subjects, corrects mislabeled images, and removes images outside a defined range of pose and quality. We conduct the curation on the Asian Face Dataset (AFD) and the VGGFace2 test dataset. The experiments show that a state-of-the-art method achieves a much higher accuracy on the datasets after they are curated. Finally, we release our cleaned versions of both datasets to the research community.
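One step of such a curation pipeline, near-duplicate removal, can be sketched as greedy filtering on deep-feature cosine similarity. The threshold and the O(n^2) greedy scheme are illustrative assumptions; the paper's semi-automated method also merges subjects, fixes labels, and filters on pose and quality.

```python
import numpy as np

def remove_near_duplicates(embeddings, threshold=0.95):
    """Greedy near-duplicate filtering: keep an image only if its deep-feature
    cosine similarity to every already-kept image stays below the threshold.
    """
    feats = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    keep = []
    for i, f in enumerate(feats):
        if all(f @ feats[j] < threshold for j in keep):
            keep.append(i)
    return keep

emb = np.random.randn(100, 512)
emb[1] = emb[0] + 0.01 * np.random.randn(512)   # plant a near-duplicate
kept = remove_near_duplicates(emb)
print(len(kept), 1 in kept)  # 99 False -- the duplicate of image 0 is dropped
```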
52. MGGR: MultiModal-Guided Gaze Redirection with Coarse-to-Fine Learning [PDF] 返回目录
Jingjing Chen, Jichao Zhang, Jiayuan Fan, Tao Chen, Enver Sangineto, Nicu Sebe
Abstract: Gaze redirection aims at manipulating a given eye gaze to a desirable direction according to a reference angle and can be applied to many real-life scenarios, such as video-conferencing or taking group photos. However, previous works suffer from two limitations: (1) low-quality generation and (2) low redirection precision. To this end, we propose an innovative MultiModal-Guided Gaze Redirection (MGGR) framework that fully exploits eye-map images and target angles to adjust a given eye appearance through a designed coarse-to-fine learning scheme. Our contribution is combining flow learning and adversarial learning for coarse-to-fine generation. More specifically, the role of the proposed coarse branch with a flow field is to rapidly learn the spatial transformation for attaining the warped result with the desired gaze. The proposed fine-grained branch consists of a generator network with conditional residual image learning and a multi-task discriminator to reduce the gap between the warped image and the ground-truth image for recovering finer texture details. Moreover, we propose leveraging the gazemap for desired angles as an extra guide to further improve the precision of gaze redirection. Extensive experiments on a benchmark dataset show that the proposed method outperforms state-of-the-art methods in terms of image quality and redirection precision. Further evaluations demonstrate the effectiveness of the proposed coarse-to-fine and gazemap modules.
53. Depth Sensing Beyond LiDAR Range [PDF] 返回目录
Kai Zhang, Jiaxin Xie, Noah Snavely, Qifeng Chen
Abstract: Depth sensing is a critical component of autonomous driving technologies, but today's LiDAR- or stereo camera-based solutions have limited range. We seek to increase the maximum range of self-driving vehicles' depth perception modules for the sake of better safety. To that end, we propose a novel three-camera system that utilizes small field of view cameras. Our system, along with our novel algorithm for computing metric depth, does not require full pre-calibration and can output dense depth maps with practically acceptable accuracy for scenes and objects at long distances not well covered by most commercial LiDARs.
54. Manifold-driven Attention Maps for Weakly Supervised Segmentation [PDF] 返回目录
Sukesh Adiga V, Jose Dolz, Herve Lombaert
Abstract: Segmentation using deep learning has shown promising directions in medical imaging as it aids in the analysis and diagnosis of diseases. Nevertheless, a main drawback of deep models is that they require a large amount of pixel-level labels, which are laborious and expensive to obtain. To mitigate this problem, weakly supervised learning has emerged as an efficient alternative, which employs image-level labels, scribbles, points, or bounding boxes as supervision. Among these, image-level labels are easier to obtain. However, since this type of annotation only contains object category information, the segmentation task under this learning paradigm is a challenging problem. To address this issue, visual salient regions derived from trained classification networks are typically used. Despite their success in identifying important regions on classification tasks, these saliency regions only focus on the most discriminant areas of an image, limiting their use in semantic segmentation. In this work, we propose a manifold-driven attention-based network to enhance visual salient regions, thereby improving segmentation accuracy in a weakly supervised setting. Our method generates superior attention maps directly during inference without the need for extra computations. We evaluate the benefits of our approach on the task of segmentation using a public benchmark of skin lesion images. Results demonstrate that our method outperforms the state-of-the-art GradCAM by a margin of ~22% in terms of Dice score.
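For reference, the GradCAM baseline mentioned in the comparison can be computed in a few lines; this is a sketch on torchvision's ResNet-18 with a random input, while the paper's manifold-driven attention maps are a different mechanism and are not reproduced here.

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))

x = torch.rand(1, 3, 224, 224)
score = model(x)[0].max()                            # top-class logit
grads = torch.autograd.grad(score, feats["a"])[0]    # d(score)/d(activations)
weights = grads.mean(dim=(2, 3), keepdim=True)       # global-average-pool grads
cam = torch.relu((weights * feats["a"]).sum(dim=1))  # (1, 7, 7) attention map
print(cam.shape)
```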
55. When, Where, and What? A New Dataset for Anomaly Detection in Driving Videos [PDF] 返回目录
Yu Yao, Xizi Wang, Mingze Xu, Zelin Pu, Ella Atkins, David Crandall
Abstract: Video anomaly detection (VAD) has been extensively studied. However, research on egocentric traffic videos with dynamic scenes lacks large-scale benchmark datasets as well as effective evaluation metrics. This paper proposes traffic anomaly detection with a when-where-what pipeline to detect, localize, and recognize anomalous events from egocentric videos. We introduce a new dataset called Detection of Traffic Anomaly (DoTA) containing 4,677 videos with temporal, spatial, and categorical annotations. A new spatial-temporal area under curve (STAUC) evaluation metric is proposed and used with DoTA. State-of-the-art methods are benchmarked for two VAD-related tasks. Experimental results show STAUC is an effective VAD metric. To our knowledge, DoTA is the largest traffic anomaly dataset to date and is the first to support traffic anomaly studies across when-where-what perspectives. Our code and dataset can be found at: this https URL
56. Learning Generative Models of Shape Handles [PDF] 返回目录
Matheus Gadelha, Giorgio Gori, Duygu Ceylan, Radomir Mech, Nathan Carr, Tamy Boubekeur, Rui Wang, Subhransu Maji
Abstract: We present a generative model to synthesize 3D shapes as sets of handles -- lightweight proxies that approximate the original 3D shape -- for applications in interactive editing, shape parsing, and building compact 3D representations. Our model can generate handle sets with varying cardinality and different types of handles (Figure 1). Key to our approach is a deep architecture that predicts both the parameters and existence of shape handles, and a novel similarity measure that can easily accommodate different types of handles, such as cuboids or sphere-meshes. We leverage recent advances in semantic 3D annotation as well as automatic shape-summarizing techniques to supervise our approach. We show that the resulting shape representations are intuitive and achieve superior quality to the previous state of the art. Finally, we demonstrate how our method can be used in applications such as interactive shape editing, completion, and interpolation, leveraging the latent space learned by our model to guide these tasks. Project page: this http URL.
57. Field-Level Crop Type Classification with k Nearest Neighbors: A Baseline for a New Kenya Smallholder Dataset [PDF] 返回目录
Hannah Kerner, Catherine Nakalembe, Inbal Becker-Reshef
Abstract: Accurate crop type maps provide critical information for ensuring food security, yet there has been limited research on crop type classification for smallholder agriculture, particularly in sub-Saharan Africa where risk of food insecurity is highest. Publicly-available ground-truth data such as the newly-released training dataset of crop types in Kenya (Radiant MLHub) are catalyzing this research, but it is important to understand the context of when, where, and how these datasets were obtained when evaluating classification performance and using them as a benchmark across methods. In this paper, we provide context for the new western Kenya dataset which was collected during an atypical 2019 main growing season and demonstrate classification accuracy up to 64% for maize and 70% for cassava using k Nearest Neighbors--a fast, interpretable, and scalable method that can serve as a baseline for future work.
58. Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment [PDF] 返回目录
Qiuyu Chen, Wei Zhang, Ning Zhou, Peng Lei, Yi Xu, Yu Zheng, Jianping Fan
Abstract: To leverage deep learning for image aesthetics assessment, one critical but unsolved issue is how to seamlessly incorporate the information of image aspect ratios to learn more robust models. In this paper, an adaptive fractional dilated convolution (AFDC), which is aspect-ratio-embedded, composition-preserving and parameter-free, is developed to tackle this issue natively at the convolutional kernel level. Specifically, the fractional dilated kernel is adaptively constructed according to the image aspect ratio, where interpolation of the two nearest integer dilated kernels is used to cope with the misalignment of fractional sampling. Moreover, we provide a concise formulation for mini-batch training and utilize a grouping strategy to reduce computational overhead. As a result, it can be easily implemented by common deep learning libraries and plugged into popular CNN architectures in a computation-efficient manner. Our experimental results demonstrate that our proposed method achieves state-of-the-art performance on image aesthetics assessment on the AVA dataset.
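The key mechanism, realizing a fractional dilation rate by interpolating the two nearest integer-dilated convolutions, can be sketched directly in PyTorch. This is an illustrative reconstruction from the abstract, not the authors' implementation; padding is chosen so both integer-dilated outputs stay aligned for odd kernel sizes:

    import math
    import torch
    import torch.nn.functional as F

    def fractional_dilated_conv2d(x, weight, rate):
        # rate >= 1 assumed; weight has an odd kernel size
        k = weight.shape[-1]
        lo, hi = math.floor(rate), math.ceil(rate)
        y_lo = F.conv2d(x, weight, padding=lo * (k - 1) // 2, dilation=lo)
        if hi == lo:
            return y_lo
        y_hi = F.conv2d(x, weight, padding=hi * (k - 1) // 2, dilation=hi)
        # linear interpolation between the two nearest integer dilations
        return (hi - rate) * y_lo + (rate - lo) * y_hi

    x = torch.randn(1, 3, 32, 48)  # aspect ratio 1.5 -> hypothetical rate 1.5
    w = torch.randn(8, 3, 3, 3)
    print(fractional_dilated_conv2d(x, w, rate=1.5).shape)  # torch.Size([1, 8, 32, 48])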
59. LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood [PDF] 返回目录
Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Ye Wang, Michael Jones, Anoop Cherian, Toshiaki Koike-Akino, Xiaoming Liu, Chen Feng
Abstract: Modern face alignment methods have become quite accurate at predicting the locations of facial landmarks, but they do not typically estimate the uncertainty of their predicted locations nor predict whether landmarks are visible. In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. We model these as mixed random variables and estimate them using a deep network trained with our proposed Location, Uncertainty, and Visibility Likelihood (LUVLi) loss. In addition, we release an entirely new labeling of a large face alignment dataset with over 19,000 face images in a full range of head poses. Each face is manually labeled with the ground-truth locations of 68 landmarks, with the additional information of whether each landmark is unoccluded, self-occluded (due to extreme head poses), or externally occluded. Not only does our joint estimation yield accurate estimates of the uncertainty of predicted landmark locations, but it also yields state-of-the-art estimates for the landmark locations themselves on multiple standard face alignment datasets. Our method's estimates of the uncertainty of predicted landmark locations could be used to automatically identify input images on which face alignment fails, which can be critical for downstream tasks.
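To make the joint objective concrete, here is a heavily simplified sketch of a location/uncertainty/visibility loss in the spirit of LUVLi, assuming an isotropic Gaussian per landmark and a binary visibility target (the actual loss models mixed random variables with richer occlusion labels); all shapes are illustrative:

    import torch
    import torch.nn.functional as F

    def joint_landmark_loss(mu, log_var, vis_logit, gt_xy, gt_vis):
        # mu, gt_xy: (B, L, 2); log_var, vis_logit, gt_vis: (B, L)
        # Gaussian negative log-likelihood of the visible landmark locations
        nll = 0.5 * ((gt_xy - mu).pow(2).sum(-1) / log_var.exp() + 2 * log_var)
        nll = (nll * gt_vis).sum() / gt_vis.sum().clamp(min=1)
        # binary cross-entropy on the visibility prediction
        vis = F.binary_cross_entropy_with_logits(vis_logit, gt_vis)
        return nll + vis

    B, L = 4, 68
    loss = joint_landmark_loss(torch.randn(B, L, 2), torch.zeros(B, L),
                               torch.randn(B, L), torch.randn(B, L, 2),
                               torch.randint(0, 2, (B, L)).float())
    print(loss.item())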
60. Deblurring using Analysis-Synthesis Networks Pair [PDF] 返回目录
Adam Kaufman, Raanan Fattal
Abstract: Blind image deblurring remains a challenging problem for modern artificial neural networks. Unlike other image restoration problems, deblurring networks fall behind the performance of existing deblurring algorithms in the case of uniform and 3D blur models. This follows from the diverse and profound effect that the unknown blur-kernel has on the deblurring operator. We propose a new architecture which breaks the deblurring network into an analysis network which estimates the blur, and a synthesis network that uses this kernel to deblur the image. Unlike existing deblurring networks, this design allows us to explicitly incorporate the blur-kernel in the network's training. In addition, we introduce new cross-correlation layers that allow better blur estimation, as well as unique components that allow the estimated blur to control the synthesis network's deblurring action. Evaluating the new approach over established benchmark datasets shows its ability to achieve state-of-the-art deblurring accuracy on various tests, as well as offer a major speedup in runtime.
61. Objectness-Aware One-Shot Semantic Segmentation [PDF] 返回目录
Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari
Abstract: While deep convolutional neural networks have led to great progress in image semantic segmentation, they typically require collecting a large number of densely-annotated images for training. Moreover, once trained, the model can only make predictions in a pre-defined set of categories. Therefore, few-shot image semantic segmentation has been explored to learn to segment from only a few annotated examples. In this paper, we tackle the challenging one-shot semantic segmentation problem by taking advantage of objectness. In order to capture prior knowledge of object and background, we first train an objectness segmentation module which generalizes well to unseen categories. Then we use the objectness module to predict the objects present in the query image, and train an objectness-aware few-shot segmentation model that takes advantage of both the object information and the limited annotations of the unseen category to perform segmentation in the query image. Our method achieves mIoU scores of 57.9% and 22.6% on PASCAL-5i and COCO-20i, respectively, given only one annotated example of an unseen category, outperforming related baselines overall.
62. Fingerprint Presentation Attack Detection: A Sensor and Material Agnostic Approach [PDF] 返回目录
Steven A. Grosz, Tarang Chugh, Anil K. Jain
Abstract: The vulnerability of automated fingerprint recognition systems to presentation attacks (PA), i.e., spoof or altered fingers, has been a growing concern, warranting the development of accurate and efficient presentation attack detection (PAD) methods. However, one major limitation of existing PAD solutions is their poor generalization to new PA materials and fingerprint sensors not used in training. In this study, we propose a robust PAD solution with improved cross-material and cross-sensor generalization. Specifically, we build on top of any CNN-based architecture trained for fingerprint spoof detection, combined with cross-material spoof generalization using a style transfer network wrapper. We also incorporate adversarial representation learning (ARL) in deep neural networks (DNN) to learn sensor- and material-invariant representations for PAD. Experimental results on the LivDet 2015 and 2017 public domain datasets demonstrate the effectiveness of the proposed approach.
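Adversarial representation learning of this kind is commonly implemented with a gradient-reversal layer, which trains the feature extractor to unlearn sensor/material cues while an auxiliary head tries to predict them. The snippet shows that generic building block only, not the paper's architecture:

    import torch

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # reverse (and scale) gradients flowing back to the feature extractor
            return -ctx.lambd * grad_output, None

    features = torch.randn(8, 128, requires_grad=True)
    sensor_head = torch.nn.Linear(128, 4)  # hypothetical 4 sensor classes
    sensor_head(GradReverse.apply(features, 1.0)).sum().backward()
    print(features.grad.shape)  # gradients arrive reversed: torch.Size([8, 128])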
63. Efficient Scale Estimation Methods using Lightweight Deep Convolutional Neural Networks for Visual Tracking [PDF] 返回目录
Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei
Abstract: In recent years, visual tracking methods based on discriminative correlation filters (DCF) have been very promising. However, most of these methods suffer from a lack of robust scale estimation. Although a wide range of recent DCF-based methods exploit features extracted from deep convolutional neural networks (CNNs) in their translation model, the scale of the visual target is still estimated with hand-crafted features. Because the exploitation of CNNs imposes a high computational burden, this paper exploits pre-trained lightweight CNN models to propose two efficient scale estimation methods, which not only improve the visual tracking performance but also provide acceptable tracking speeds. The proposed methods are formulated based on either a holistic or region representation of convolutional feature maps, and integrate efficiently into DCF formulations to learn a robust scale model in the frequency domain. Moreover, in contrast to conventional scale estimation methods that iteratively extract features from different target regions, the proposed methods use one-pass feature extraction processes that significantly improve computational efficiency. Comprehensive experimental results on the OTB-50, OTB-100, TC-128 and VOT-2018 visual tracking datasets demonstrate that the proposed visual tracking methods effectively outperform state-of-the-art methods.
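For reference, the DCF formulation these trackers build on can be written in a few lines: a single-channel correlation filter has a closed-form ridge-regression solution in the frequency domain (MOSSE-style). The papers' models are multi-channel and feature-based, so this is only a sketch of the underlying math:

    import numpy as np

    def train_dcf(patch, response, lam=1e-2):
        F_ = np.fft.fft2(patch)
        G = np.fft.fft2(response)
        # closed-form ridge solution for the (conjugate) filter
        return (G * np.conj(F_)) / (F_ * np.conj(F_) + lam)

    def detect(h_conj, patch):
        return np.real(np.fft.ifft2(h_conj * np.fft.fft2(patch)))

    size = 64
    yy, xx = np.mgrid[0:size, 0:size]
    gauss = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 3.0 ** 2))  # desired response
    patch = np.random.rand(size, size)
    # response approximates the desired Gaussian (peak near 1 at the target)
    print(detect(train_dcf(patch, gauss), patch).max())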
64. Beyond Background-Aware Correlation Filters: Adaptive Context Modeling by Hand-Crafted and Deep RGB Features for Visual Tracking [PDF] 返回目录
Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei
Abstract: In recent years, background-aware correlation filters have attracted a lot of research interest in visual target tracking. However, these methods cannot suitably model the target appearance due to their reliance on hand-crafted features. On the other hand, recent deep learning-based visual tracking methods have provided competitive performance, albeit with extensive computation. In this paper, an adaptive background-aware correlation filter-based tracker is proposed that effectively models the target appearance by using either histogram of oriented gradients (HOG) or convolutional neural network (CNN) feature maps. The proposed method exploits the fast 2D non-maximum suppression (NMS) algorithm and semantic information comparison to detect challenging situations. When the HOG-based response map is not reliable, or the context region has a low semantic similarity with prior regions, the proposed method constructs the CNN context model to improve the target region estimation. Furthermore, the rejection option allows the proposed method to update the CNN context model only on valid regions. Comprehensive experimental results demonstrate that the proposed adaptive method clearly outperforms state-of-the-art methods in terms of accuracy and robustness of visual target tracking on the OTB-50, OTB-100, TC-128, UAV-123, and VOT-2015 datasets.
65. Empirical Upper Bound, Error Diagnosis and Invariance Analysis of Modern Object Detectors [PDF] 返回目录
Ali Borji
Abstract: Object detection remains one of the most notorious open problems in computer vision. Despite large strides in accuracy in recent years, modern object detectors have started to saturate on popular benchmarks, raising the question of how far we can reach with deep learning tools and tricks. Here, by employing two state-of-the-art object detection benchmarks, and analyzing more than 15 models over 4 large scale datasets, we I) carefully determine the upper bound in AP, which is 91.6% on VOC (test2007), 78.2% on COCO (val2017), and 58.9% on OpenImages V4 (validation), regardless of the IOU threshold. These numbers are much better than the mAP of the best model (47.9% on VOC, and 46.9% on COCO; IOUs=.5:.05:.95), II) characterize the sources of errors in object detectors, in a novel and intuitive way, and find that classification error (confusion with other classes and misses) explains the largest fraction of errors and weighs more than localization and duplicate errors, and III) analyze the invariance properties of models when the surrounding context of an object is removed, when an object is placed in an incongruent background, and when images are blurred or flipped vertically. We find that models generate a lot of boxes in empty regions and that context is more important for detecting small objects than larger ones. Our work taps into the tight relationship between object detection and object recognition and offers insights for building better models. Our code is publicly available at this https URL bound.git.
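All of the AP and threshold numbers above rest on box IoU; for completeness, a self-contained helper (boxes given as [x1, y1, x2, y2]):

    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 0.1428..., below the usual 0.5 threshold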
66. U-Net Using Stacked Dilated Convolutions for Medical Image Segmentation [PDF] 返回目录
Shuhang Wang, Szu-Yeu Hu, Eugene Cheah, Xiaohong Wang, Jingchao Wang, Lei Chen, Masoud Baikpour, Arinc Ozturk, Qian Li, Shinn-Huey Chou, Connie Lehman, Viksit Kumar, Anthony Samir
Abstract: This paper proposes a novel U-Net variant using stacked dilated convolutions for medical image segmentation (SDU-Net). SDU-Net adopts the architecture of vanilla U-Net with modifications in the encoder and decoder operations (an operation indicates all the processing for feature maps of the same resolution). Unlike vanilla U-Net, which incorporates two standard convolutions in each encoder/decoder operation, SDU-Net uses one standard convolution followed by multiple dilated convolutions and concatenates all dilated convolution outputs as input to the next operation. Experiments showed that SDU-Net outperformed vanilla U-Net, attention U-Net (AttU-Net), and recurrent residual U-Net (R2U-Net) in all four tested segmentation tasks while using only around 40% of vanilla U-Net's parameters, 17% of AttU-Net's, and 15% of R2U-Net's.
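One possible reading of an SDU-Net operation, reconstructed from the description above (a standard convolution, then a stack of dilated convolutions whose outputs are all concatenated); channel widths and dilation rates are assumptions:

    import torch
    import torch.nn as nn

    class StackedDilatedOp(nn.Module):
        def __init__(self, in_ch, mid_ch, dilations=(1, 2, 4, 8)):
            super().__init__()
            self.standard = nn.Conv2d(in_ch, mid_ch, 3, padding=1)
            self.dilated = nn.ModuleList(
                nn.Conv2d(mid_ch, mid_ch, 3, padding=d, dilation=d) for d in dilations
            )
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            x = self.act(self.standard(x))
            outs = []
            for conv in self.dilated:  # stacked dilated convolutions
                x = self.act(conv(x))
                outs.append(x)
            return torch.cat(outs, dim=1)  # all dilated outputs feed the next operation

    op = StackedDilatedOp(3, 16)
    print(op(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 64, 64, 64])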
67. Learning to Accelerate Decomposition for Multi-Directional 3D Printing [PDF] 返回目录
Chenming Wu, Yong-Jin Liu, Charlie C.L. Wang
Abstract: As a strong complement to additive manufacturing, multi-directional 3D printing has the capability of decreasing or eliminating the need for support structures. Recent work proposed a beam-guided search algorithm to find an optimized sequence of plane-clipping, which gives a volume decomposition of a given 3D model. Different printing directions are employed in different regions so that a model can be fabricated with tremendously fewer supports (or even no supports in many cases). To obtain an optimized decomposition, a large beam width needs to be used in the search algorithm, which leads to a very time-consuming computation. In this paper, we propose a learning framework that can accelerate the beam-guided search by using only 1/2 of the original beam width to obtain results of similar quality. Specifically, we train a classifier for each pair of candidate clipping planes based on six newly proposed feature metrics computed from the results of beam-guided search with a large beam width. With the help of these feature metrics, both the current and the sequence-dependent information are captured by the classifier to score clipping candidates. As a result, we can achieve around 2 times acceleration. We test and demonstrate the performance of our accelerated decomposition on a large dataset of models for 3D printing.
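The acceleration hinges on a learned scorer pruning beam candidates. A generic beam-search skeleton shows where such a classifier slots in; `expand` and `scorer` here are hypothetical stand-ins, not the paper's clipping-plane candidates or feature metrics:

    def beam_search(initial_state, expand, scorer, beam_width=4, steps=5):
        beam = [initial_state]
        for _ in range(steps):
            candidates = [c for s in beam for c in expand(s)]
            # a learned scorer ranks candidates so a smaller beam width suffices
            candidates.sort(key=scorer, reverse=True)
            beam = candidates[:beam_width]
        return max(beam, key=scorer)

    # toy usage: states are numbers, expansion proposes increments, score is the value
    best = beam_search(0, expand=lambda s: [s + d for d in (1, 2, 3)],
                       scorer=lambda s: s, beam_width=2)
    print(best)  # 15: always keeps the highest-scoring partial sequences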
68. Deep Open Space Segmentation using Automotive Radar [PDF] 返回目录
Farzan Erlik Nowruzi, Dhanvin Kolhatkar, Prince Kapoor, Fahed Al Hassanat, Elnaz Jahani Heravi, Robert Laganiere, Julien Rebut, Waqas Malik
Abstract: In this work, we propose the use of radar with advanced deep segmentation models to identify open space in parking scenarios. A publicly available dataset of radar observations called SCORP was collected. Deep models are evaluated with various radar input representations. Our proposed approach achieves low memory usage and real-time processing speeds, and is thus very well suited for embedded deployment.
69. Harmony-Search and Otsu based System for Coronavirus Disease (COVID-19) Detection using Lung CT Scan Images [PDF] 返回目录
V. Rajinikanth, Nilanjan Dey, Alex Noel Joseph Raj, Aboul Ella Hassanien, K.C. Santosh, N. Sri Madhava Raja
Abstract: Pneumonia is one of the foremost lung diseases, and untreated pneumonia can lead to serious threats for all age groups. The proposed work aims to extract and evaluate pneumonia infections in the lung caused by Coronavirus disease (COVID-19) using CT scans. We propose an image-assisted system to extract COVID-19 infected sections from lung CT scans (coronal view). It includes the following steps: (i) threshold filtering to extract the lung region by eliminating possible artifacts; (ii) image enhancement using Harmony-Search optimization and Otsu thresholding; (iii) image segmentation to extract infected region(s); and (iv) region-of-interest (ROI) feature extraction from the binary image to compute the level of severity. The features extracted from the ROI are then employed to compute the pixel ratio between the lung and infection sections, and thereby the severity of the infection. The primary objective of the tool is to assist the pulmonologist not only in detection but also in planning the treatment process. As a consequence, for mass screening, it will help reduce the diagnostic burden.
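Of the steps above, Otsu thresholding is the standard, easily reproduced ingredient (the Harmony-Search optimization is omitted here). A minimal NumPy version that picks the threshold maximizing between-class variance:

    import numpy as np

    def otsu_threshold(img):
        hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
        p = hist / hist.sum()
        best_t, best_var = 0, -1.0
        for t in range(1, 256):
            w0, w1 = p[:t].sum(), p[t:].sum()
            if w0 == 0 or w1 == 0:
                continue
            m0 = (np.arange(t) * p[:t]).sum() / w0
            m1 = (np.arange(t, 256) * p[t:]).sum() / w1
            var_between = w0 * w1 * (m0 - m1) ** 2
            if var_between > best_var:
                best_t, best_var = t, var_between
        return best_t

    img = np.concatenate([np.full(500, 60), np.full(500, 180)]).astype(np.uint8)
    print(otsu_threshold(img))  # 61: separates the two intensity modes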
70. Automated Smartphone based System for Diagnosis of Diabetic Retinopathy [PDF] 返回目录
Misgina Tsighe Hagos, Shri Kant, Surayya Ado Bala
Abstract: Early diagnosis of diabetic retinopathy for treatment of the disease has been failing to reach diabetic people living in rural areas. A shortage of trained ophthalmologists, limited availability of healthcare centers, and the expense of diagnostic equipment are among the reasons. Although many deep learning-based automatic diagnosis techniques for diabetic retinopathy have been implemented in the literature, these methods still fail to provide a point-of-care diagnosis. This raises the need for an independent diagnostic of diabetic retinopathy that can be used by a non-expert. Recently the usage of smartphones has been increasing across the world. Automated diagnoses of diabetic retinopathy can be deployed on smartphones in order to provide an instant diagnosis to diabetic people residing in remote areas. In this paper, an Inception-based convolutional neural network and a binary decision tree-based ensemble of classifiers are proposed and implemented to detect and classify diabetic retinopathy. The proposed method was further imported into a smartphone application for mobile-based classification, which provides an offline and automatic system for diagnosis of diabetic retinopathy.
71. Deep Learning on Chest X-ray Images to Detect and Evaluate Pneumonia Cases at the Era of COVID-19 [PDF] 返回目录
Karim Hammoudi, Halim Benhabiles, Mahmoud Melkemi, Fadi Dornaika, Ignacio Arganda-Carreras, Dominique Collard, Arnaud Scherpereel
Abstract: Coronavirus disease 2019 (COVID-19) is an infectious disease with first symptoms similar to the flu. COVID-19 first appeared in China and spread very quickly to the rest of the world, causing the 2019-20 coronavirus pandemic. In many cases, this disease causes pneumonia. Since pulmonary infections can be observed through radiography images, this paper investigates deep learning methods for automatically analyzing query chest X-ray images, with the hope of bringing precision tools to health professionals for screening for COVID-19 and diagnosing confirmed patients. In this context, training datasets, deep learning architectures and analysis strategies have been experimented with on publicly open sets of chest X-ray images. Tailored deep learning models are proposed to detect pneumonia infection cases, notably viral cases. It is assumed that viral pneumonia cases detected during a COVID-19 epidemic have a high probability of being COVID-19 infections. Moreover, easy-to-apply health indicators are proposed for estimating infection status and predicting patient status from the detected pneumonia cases. Experimental results show the feasibility of training deep learning models on publicly open sets of chest X-ray images for screening viral pneumonia. Chest X-ray test images of COVID-19 infected patients are successfully diagnosed through detection models retained for their performance. The efficiency of the proposed health indicators is highlighted through simulated scenarios of patients presenting infections and health problems, combining real and synthetic health data.
72. Complete CVDL Methodology for Investigating Hydrodynamic Instabilities [PDF] 返回目录
Re'em Harel, Matan Rusanovsky, Yehonatan Fridman, Assaf Shimony, Gal Oren
Abstract: In fluid dynamics, one of the most important research fields is hydrodynamic instabilities and their evolution in different flow regimes. The investigation of said instabilities is concerned with highly non-linear dynamics. Currently, three main methods are used to understand such phenomena, namely analytical models, experiments and simulations, and all of them are primarily investigated and correlated using human expertise. In this work we claim and demonstrate that a major portion of this research effort could and should be analysed using recent breakthrough advancements in the field of Computer Vision with Deep Learning (CVDL, or Deep Computer-Vision). Specifically, we target and evaluate specific state-of-the-art techniques, such as Image Retrieval, Template Matching, Parameters Regression and Spatiotemporal Prediction, for the quantitative and qualitative benefits they provide. In order to do so we focus in this research on one of the most representative instabilities, the Rayleigh-Taylor instability, simulate its behaviour and create an open-sourced state-of-the-art annotated database (RayleAI). Finally, we use adjusted experimental results and novel physical loss methodologies to validate the correspondence of the predicted results to actual physical reality and prove the models' efficiency. The techniques developed and proved in this work can serve as essential tools for physicists in the field of hydrodynamics for investigating a variety of physical systems, and could also be applied via Transfer Learning to other instabilities research. Part of the techniques can be easily applied to already existing simulation results. All models, as well as the data-set that was created for this work, are publicly available at: this https URL.
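Among the listed techniques, template matching has a standard OpenCV entry point; a toy example locating a crop within its source image (the real use would match instability patterns across simulation frames):

    import numpy as np
    import cv2

    img = (np.random.rand(100, 100) * 255).astype(np.uint8)
    tmpl = img[30:50, 40:60]  # 20x20 template cut out at (x=40, y=30)
    res = cv2.matchTemplate(img, tmpl, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    print(max_loc, max_val)  # (40, 30) with a score near 1.0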
73. Convolutional Neural Networks based automated segmentation and labelling of the lumbar spine X-ray [PDF] 返回目录
Sandor Konya, Sai Natarajan T R, Hassan Allouch, Kais Abu Nahleh, Omneya Yakout Dogheim, Heinrich Boehm
Abstract: The aim of this study is to investigate the segmentation accuracies of different segmentation networks trained on 730 manually annotated lateral lumbar spine X-rays. Instance segmentation networks were compared to semantic segmentation networks. The study cohort comprised diseased spines and postoperative images with metallic implants. The average mean accuracy and mean intersection over union (IoU) were up to 3 percent better for the best performing instance segmentation model, while the average pixel accuracy and weighted IoU were slightly better for the best performing semantic segmentation model. Moreover, the inferences of the instance segmentation models are easier to integrate into further processing pipelines in clinical decision support.
74. Binary Neural Networks: A Survey [PDF] 返回目录
Haotong Qin, Ruihao Gong, Xianglong Liu, Xiao Bai, Jingkuan Song, Nicu Sebe
Abstract: The binary neural network, largely saving storage and computation, serves as a promising technique for deploying deep models on resource-limited devices. However, the binarization inevitably causes severe information loss, and even worse, its discontinuity brings difficulty to the optimization of the deep network. To address these issues, a variety of algorithms have been proposed, and they have achieved satisfactory progress in recent years. In this paper, we present a comprehensive survey of these algorithms, mainly categorized into the native solutions that directly conduct binarization, and the optimized ones that use techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error. We also investigate other practical aspects of binary neural networks such as hardware-friendly design and training tricks. Then, we give evaluations and discussions on different tasks, including image classification, object detection and semantic segmentation. Finally, we discuss the challenges that may be faced in future research.
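The "native" binarization the survey starts from is typically a sign function trained with a straight-through estimator (STE); a minimal PyTorch rendering of that idea:

    import torch

    class BinarizeSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.sign(x)  # 1-bit forward pass

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            # straight-through estimator: pass gradients only where |x| <= 1
            return grad_output * (x.abs() <= 1).float()

    x = torch.randn(5, requires_grad=True)
    BinarizeSTE.apply(x).sum().backward()
    print(x.grad)  # 1.0 where |x| <= 1, else 0.0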
75. Two-Stage Resampling for Convolutional Neural Network Training in the Imbalanced Colorectal Cancer Image Classification [PDF] 返回目录
Michał Koziarski
Abstract: Data imbalance remains one of the open challenges in contemporary machine learning. It is especially prevalent in the case of medical data, such as histopathological images. Traditional data-level approaches for dealing with data imbalance are ill-suited to image data: oversampling methods such as SMOTE and its derivatives create unrealistic synthetic observations, whereas undersampling reduces the amount of available data, which is critical for successful training of convolutional neural networks. To alleviate the problems associated with over- and undersampling, we propose a novel two-stage resampling methodology in which we initially use oversampling techniques in the image space to leverage a large amount of data for training a convolutional neural network, and afterwards apply undersampling in the feature space to fine-tune the last layers of the network. Experiments conducted on a colorectal cancer image dataset indicate the usefulness of the proposed approach.
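A minimal sketch of the two-stage idea, using plain random over- and undersampling as stand-ins for the specific resampling methods the paper evaluates:

    import numpy as np

    def oversample_indices(labels, rng):
        """Stage 1 (image space): repeat minority-class indices until classes balance."""
        labels = np.asarray(labels)
        classes, counts = np.unique(labels, return_counts=True)
        target = counts.max()
        idx = [rng.choice(np.flatnonzero(labels == c), size=target, replace=True)
               for c in classes]
        return np.concatenate(idx)

    def undersample_indices(labels, rng):
        """Stage 2 (feature space): drop majority-class samples so classes balance
        before fine-tuning the network's final layers."""
        labels = np.asarray(labels)
        classes, counts = np.unique(labels, return_counts=True)
        target = counts.min()
        idx = [rng.choice(np.flatnonzero(labels == c), size=target, replace=False)
               for c in classes]
        return np.concatenate(idx)

    rng = np.random.default_rng(0)
    # Stage 1: train the full CNN on images selected by oversample_indices(...).
    # Stage 2: extract CNN features, then fine-tune the last layers on features
    #          selected by undersample_indices(...).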
76. Teacher-Class Network: A Neural Network Compression Mechanism [PDF] 返回目录
Shaiq Munir Malik, Mohbat Tharani, Murtaza Taj
Abstract: To address the overwhelming size of Deep Neural Networks (DNNs), several compression schemes have been proposed; one of them is teacher-student, which tries to transfer knowledge from a complex teacher network to a simple student network. In this paper, we propose a novel method called a teacher-class network, consisting of a single teacher and multiple student networks (i.e., a class of students). Instead of transferring knowledge to one student only, the proposed method transfers a chunk of knowledge about the entire solution to each student. Our students are not trained on problem-specific logits; they are trained to mimic the knowledge (dense representation) learned by the teacher network. Thus, unlike the logits-based single-student approach, the combined knowledge learned by the class of students can be used to solve other problems as well. The students can be designed to satisfy a given budget; for comparative purposes, we kept the collective parameter count of all the students less than or equal to that of a single student in the teacher-student approach. These small student networks are trained independently, making it possible to train and deploy models on memory-deficient devices as well as on parallel processing systems such as data centers. The proposed teacher-class architecture is evaluated on several benchmark datasets, including MNIST, FashionMNIST, IMDB Movie Reviews, and CAMVid, on multiple tasks including classification, sentiment classification, and segmentation. Our approach outperforms the state-of-the-art single-student approach in terms of accuracy as well as computational cost, and in many cases it achieves accuracy equivalent to the teacher network while having 10-30 times fewer parameters.
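A hedged sketch of the teacher-class setup as we read it: each of k students regresses one chunk of the teacher's dense representation. Architectures, dimensions, and the training loop below are illustrative assumptions, not the paper's:

    import torch
    import torch.nn as nn

    teacher_dim, k = 256, 4
    chunk = teacher_dim // k
    # k small, independent students (toy MLPs over flattened 28x28 inputs).
    students = [nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                              nn.Linear(128, chunk)) for _ in range(k)]
    opts = [torch.optim.Adam(s.parameters(), lr=1e-3) for s in students]
    mse = nn.MSELoss()

    def train_step(x, teacher_embedding):
        # teacher_embedding: (batch, teacher_dim) from a frozen teacher network.
        for i, (s, opt) in enumerate(zip(students, opts)):
            target = teacher_embedding[:, i * chunk:(i + 1) * chunk].detach()
            loss = mse(s(x), target)      # student i mimics its knowledge chunk
            opt.zero_grad(); loss.backward(); opt.step()

    # At inference, concatenating the k student outputs recovers a dense
    # representation that a small head can map to task-specific outputs.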
77. Autoencoders for Unsupervised Anomaly Segmentation in Brain MR Images: A Comparative Study [PDF] 返回目录
Christoph Baur, Stefan Denner, Benedikt Wiestler, Shadi Albarqouni, Nassir Navab
Abstract: Deep unsupervised representation learning has recently led to new approaches in the field of Unsupervised Anomaly Detection (UAD) in brain MRI. The main principle behind these works is to learn a model of normal anatomy by learning to compress and recover healthy data. This makes it possible to spot abnormal structures from the erroneous recoveries of compressed, potentially anomalous samples. The concept is of great interest to the medical image analysis community as it i) removes the need for vast amounts of manually segmented training data (a necessity for, and pitfall of, current supervised deep learning) and ii) theoretically allows the detection of arbitrary, even rare, pathologies that supervised approaches might fail to find. To date, the experimental design of most works hinders a valid comparison, because i) they are evaluated against different datasets and different pathologies, ii) they use different image resolutions, and iii) they use different model architectures of varying complexity. The intent of this work is to establish comparability among recent methods by utilizing a single architecture, a single resolution, and the same dataset(s). Besides providing a ranking of the methods, we also try to answer questions such as i) how many healthy training subjects are needed to model normality and ii) whether the reviewed approaches are sensitive to domain shift. Further, we identify open challenges and provide suggestions for future community efforts and research directions.
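The core mechanism shared by the compared methods fits in a few lines; this is our illustration, not the study's code, and the autoencoder and threshold are assumed inputs:

    import numpy as np

    def anomaly_map(x, autoencoder, threshold):
        """Residual-based UAD: an autoencoder trained only on healthy scans
        should fail to reconstruct pathology, so large |x - x_hat| flags
        candidate anomalies."""
        x_hat = autoencoder(x)                  # assumed callable: image -> reconstruction
        residual = np.abs(x - x_hat)
        return residual, residual > threshold   # heatmap and binary segmentation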
78. Inspector Gadget: A Data Programming-based Labeling System for Industrial Images [PDF] 返回目录
Geon Heo, Yuji Roh, Seonghyeon Hwang, Dayun Lee, Steven Euijong Whang
Abstract: As machine learning for images becomes democratized in the Software 2.0 era, one of the serious bottlenecks is securing enough labeled data for training. This problem is especially critical in manufacturing settings where smart factories rely on machine learning for product quality control by analyzing industrial images. Such images are typically large and may only need to be partially analyzed, where only a small portion is problematic (e.g., identifying defects on a surface). Since manually labeling these images is expensive, weak supervision is an attractive alternative, where the idea is to generate weak labels that are not perfect but can be produced at scale. Data programming is a recent paradigm in this category: it uses human knowledge in the form of labeling functions and combines them into a generative model. Data programming has been successful in applications based on text or structured data, and it can also be applied to images if one can find a way to convert them into structured data. In this work, we expand the horizon of data programming by applying it directly to images without this conversion, which is a common scenario for industrial applications. We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification. We perform experiments on real industrial image datasets and show that Inspector Gadget obtains better accuracy than state-of-the-art techniques: Snuba, GOGGLES, and self-learning baselines using convolutional neural networks (CNNs) without pre-training.
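To illustrate data programming's labeling functions, a toy sketch with hypothetical heuristics; the majority vote here is a simple stand-in for the generative label model that data programming actually fits:

    import numpy as np

    ABSTAIN, OK, DEFECT = -1, 0, 1

    # Hypothetical labeling functions over simple image statistics; real LFs
    # encode domain heuristics and are typically noisier and more numerous.
    def lf_dark_spot(img): return DEFECT if img.min() < 0.05 else ABSTAIN
    def lf_high_var(img):  return DEFECT if img.std() > 0.30 else ABSTAIN
    def lf_uniform(img):   return OK if img.std() < 0.02 else ABSTAIN

    def weak_label(img, lfs=(lf_dark_spot, lf_high_var, lf_uniform)):
        """Combine non-abstaining votes into one weak label."""
        votes = [lf(img) for lf in lfs if lf(img) != ABSTAIN]
        if not votes:
            return ABSTAIN
        return max(set(votes), key=votes.count)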
79. Iconify: Converting Photographs into Icons [PDF] 返回目录
Takuro Karamatsu, Gibran Benitez-Garcia, Keiji Yanai, Seiichi Uchida
Abstract: In this paper, we tackle a challenging domain conversion task between photo and icon images. Although icons often originate from real object images (i.e., photographs), professional graphic designers apply severe abstractions and simplifications when creating them. Moreover, there is no one-to-one correspondence between the two domains, so paired ground truth for learning a direct conversion function is unavailable. Since generative adversarial networks (GANs) can undertake the problem of domain conversion without any correspondence, we test CycleGAN and UNIT to generate icons from objects segmented out of photo images. Our experiments with several image datasets show that CycleGAN learns sufficient abstraction and simplification ability to generate icon-like images.
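Because the two domains are unpaired, CycleGAN's cycle-consistency term stands in for a paired reconstruction loss. In the standard CycleGAN formulation (quoted for context; this paper applies the method rather than introducing it), with generators $G$ (photo to icon) and $F$ (icon to photo), the term is

    $L_{cyc}(G, F) = \mathbb{E}_{x}\big[\|F(G(x)) - x\|_1\big] + \mathbb{E}_{y}\big[\|G(F(y)) - y\|_1\big]$,

added to the two adversarial losses with a weight $\lambda$.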
80. Deep Attentive Generative Adversarial Network for Photo-Realistic Image De-Quantization [PDF] 返回目录
Yang Zhang, Changhui Hu, Xiaobo Lu
Abstract: Most current display devices have a bit depth of eight or more, but most multimedia tools cannot generate images at this bit depth. De-quantization can improve the visual quality of a low bit-depth image for display on a high bit-depth screen. This paper proposes the DAGAN algorithm to perform super-resolution on image intensity resolution, which is orthogonal to spatial resolution, realizing photo-realistic de-quantization via an end-to-end learning pattern. To our knowledge, this is the first attempt to apply the Generative Adversarial Network (GAN) framework to image de-quantization. Specifically, we propose the Dense Residual Self-attention (DenseResAtt) module, which consists of dense residual blocks equipped with a self-attention mechanism, to pay more attention to high-frequency information. Moreover, connecting sequential DenseResAtt modules in series forms a deep attentive network with superior discriminative learning ability for image de-quantization, modeling representative feature maps to recover as much useful information as possible. In addition, because the adversarial learning framework can reliably produce high-quality natural images, the specified content loss as well as the adversarial loss are back-propagated to optimize the training of the model. As a result, DAGAN is able to generate photo-realistic high bit-depth images without banding artifacts. Experimental results on several public benchmarks demonstrate that the DAGAN algorithm achieves excellent visual quality and satisfactory quantitative performance.
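A minimal sketch of the degradation that de-quantization must invert, simulated by dropping the low-order bits of an 8-bit image; this is a standard way to synthesize low bit-depth inputs and is assumed here rather than taken from the paper:

    import numpy as np

    def requantize(img8, bits):
        """Keep only the top `bits` bits of an 8-bit image, producing the
        banded, low bit-depth input a de-quantization model must restore."""
        shift = 8 - bits
        return (img8 >> shift) << shift   # values now lie on a 2**bits-level grid

    img8 = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
    img4 = requantize(img8, 4)            # 16 intensity levels: visible banding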
81. Plug-and-play ISTA converges with kernel denoisers [PDF] 返回目录
Ruturaj G. Gavaskar, Kunal N. Chaudhury
Abstract: The plug-and-play (PnP) method is a recent paradigm for image regularization in which the proximal operator (associated with some given regularizer) in an iterative algorithm is replaced with a powerful denoiser. Algorithmically, this involves repeated inversion (of the forward model) and denoising until convergence. Remarkably, PnP regularization produces promising results for several restoration applications. However, a fundamental question in this regard is the theoretical convergence of the PnP iterations, since the algorithm is not strictly derived from an optimization framework. This question has been investigated in recent works, but there are still many unresolved problems. For example, it is not known whether convergence can be guaranteed if we use generic kernel denoisers (e.g., nonlocal means) within the ISTA framework (PnP-ISTA). We prove that, under reasonable assumptions, fixed-point convergence of PnP-ISTA is indeed guaranteed for linear inverse problems such as deblurring, inpainting, and superresolution (the assumptions are verifiable for inpainting). We compare our theoretical findings with existing results, validate them numerically, and explain their practical relevance.
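The PnP-ISTA iteration under study can be sketched as follows; the Gaussian smoother is a crude stand-in for the kernel denoisers (e.g., nonlocal means) the paper actually analyzes:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def pnp_ista(A, y, denoise, step, n_iter=100):
        """Gradient step on the data term 0.5*||Ax - y||^2 (the 'inversion'),
        then a plugged-in denoiser in place of the proximal operator.
        For stability, step should not exceed 1 / ||A||_2^2."""
        x = A.T @ y
        for _ in range(n_iter):
            x = x - step * (A.T @ (A @ x - y))
            x = denoise(x)
        return x

    denoise = lambda x: gaussian_filter(x, sigma=1.0)   # stand-in denoiser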
82. Generalized Label Enhancement with Sample Correlations [PDF] 返回目录
Qinghai Zheng, Jihua Zhu, Haoyu Tang, Xinyuan Liu, Zhongyu Li, Huimin Lu
Abstract: Recently, label distribution learning (LDL), in which an LDL model is learned from labeled instances, has drawn much attention in machine learning. Different from single-label and multi-label annotations, label distributions describe an instance by multiple labels with different intensities and accommodate more general conditions. Because most existing machine learning datasets merely provide logical labels, label distributions are unavailable in many real-world applications. To handle this problem, we propose two novel label enhancement methods: Label Enhancement with Sample Correlations (LESC) and generalized Label Enhancement with Sample Correlations (gLESC). More specifically, LESC employs a low-rank representation of samples in the feature space, and gLESC leverages a tensor multi-rank minimization to further investigate sample correlations in both the feature space and the label space. Benefiting from these sample correlations, the proposed methods can boost the performance of LE. Extensive experiments on 14 benchmark datasets demonstrate that LESC and gLESC achieve state-of-the-art results compared to previous label enhancement baselines.
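For intuition about the label enhancement (LE) task itself, a generic baseline sketch, explicitly not the authors' LESC/gLESC (which rely on low-rank and tensor multi-rank models): smooth each sample's logical label vector with its neighbors' labels and normalize into a distribution:

    import numpy as np

    def enhance_labels(X, L, k=5, tau=1.0):
        """X: (n, d) features; L: (n, c) logical 0/1 labels.
        Returns (n, c) label distributions via kNN smoothing + softmax."""
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise distances
        np.fill_diagonal(d2, np.inf)
        nn = np.argsort(d2, axis=1)[:, :k]
        smoothed = L + L[nn].mean(axis=1)     # mix own labels with neighbors'
        expd = np.exp(smoothed / tau)
        return expd / expd.sum(axis=1, keepdims=True)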
83. COVID-Xpert: An AI Powered Population Screening of COVID-19 Cases Using Chest Radiography Images [PDF] 返回目录
Xin Li, Dongxiao Zhu
Abstract: With the increasing demand for millions of COVID-19 tests, Computed Tomography (CT)-based testing has emerged as a promising alternative to the gold-standard RT-PCR test. However, it is primarily provided in emergency department and hospital settings due to the need for expensive equipment and trained radiologists. An accurate, rapid, yet inexpensive test suitable for population screening of COVID-19 cases in mobile, urgent, and primary care settings is urgently needed. Here we design a deep convolutional neural network (CNN) that extracts X-ray Chest Radiography (XCR) imaging features from large-scale pneumonia and normal training cases and refines them with a small number of COVID-19 cases to learn imaging features capable of automatically discriminating COVID-19 cases from pneumonia and/or normal XCR imaging cases. We demonstrate the strong potential of our XCR-based population screening approach, COVID-Xpert, for detecting COVID-19 cases through impressive experimental performance. The trained models and information about the compiled data set are available from this https URL.
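A hedged PyTorch sketch of the described recipe (train on large pneumonia-vs-normal data, then refine on a small COVID-19 set); the DenseNet backbone, three-class head, and feature freezing are our assumptions for illustration:

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.densenet121(weights=None)   # backbone choice is an assumption
    model.classifier = nn.Linear(model.classifier.in_features, 3)  # normal / pneumonia / COVID-19
    # ...after training on the large pneumonia-vs-normal corpus, refine on the
    # small COVID-19 set while keeping the learned imaging features frozen:
    for p in model.features.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)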
84. Dense Steerable Filter CNNs for Exploiting Rotational Symmetry in Histology Images [PDF] 返回目录
Simon Graham, David Epstein, Nasir Rajpoot
Abstract: Histology images are inherently symmetric under rotation, where each orientation is equally likely to appear. However, this rotational symmetry is not widely utilised as prior knowledge in modern Convolutional Neural Networks (CNNs), resulting in data-hungry models that learn independent features at each orientation. Allowing CNNs to be rotation-equivariant removes the need to learn this set of transformations from the data and instead frees up model capacity, allowing more discriminative features to be learned. This reduction in the number of required parameters also reduces the risk of overfitting. In this paper, we propose Dense Steerable Filter CNNs (DSF-CNNs), which use group convolutions with multiple rotated copies of each filter in a densely connected framework. Each filter is defined as a linear combination of steerable basis filters, enabling exact rotation and decreasing the number of trainable parameters compared to standard filters. We also provide the first in-depth comparison of different rotation-equivariant CNNs for histology image analysis and demonstrate the advantage of encoding rotational symmetry into modern architectures. We show that DSF-CNNs achieve state-of-the-art performance, with significantly fewer parameters, when applied to three different tasks in the area of computational pathology: breast tumour classification, colon gland segmentation, and multi-tissue nuclear segmentation.
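The "linear combination of steerable basis filters" admits a compact illustration with first-order Gaussian derivatives, which are exactly steerable; this is our simplified rendering of the principle, not the DSF-CNN basis itself:

    import numpy as np

    def gaussian_dx(size=7, sigma=1.5):
        """x-derivative of a 2D Gaussian, one element of a steerable pair."""
        ax = np.arange(size) - size // 2
        xx, yy = np.meshgrid(ax, ax)
        g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
        return -xx * g / sigma**2

    gx = gaussian_dx()
    gy = gx.T                                  # 90-degree rotated copy

    def steer(theta):
        # Exact rotation by theta without resampling: the rotated filter is a
        # linear combination of just two basis filters. Learning coefficients
        # over such a basis is what keeps the parameter count low.
        return np.cos(theta) * gx + np.sin(theta) * gy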
85. Evolving Normalization-Activation Layers [PDF] 返回目录
Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le
Abstract: Normalization layers and activation functions are critical components in deep neural networks that frequently co-locate with each other. Instead of designing them separately, we unify them into a single computation graph, and evolve its structure starting from low-level primitives. Our layer search algorithm leads to the discovery of EvoNorms, a set of new normalization-activation layers that go beyond existing design patterns. Several of these layers enjoy the property of being independent from the batch statistics. Our experiments show that EvoNorms not only excel on a variety of image classification models including ResNets, MobileNets and EfficientNets, but also transfer well to Mask R-CNN for instance segmentation and BigGAN for image synthesis, outperforming BatchNorm and GroupNorm based layers by a significant margin in many cases.
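One discovered layer reported in the paper, EvoNorm-S0, combines a sigmoid-gated numerator with a per-sample group standard deviation, which is why it needs no batch statistics. A sketch with simplified parameter handling (gamma, beta, v are assumed to broadcast over (1, C, 1, 1)):

    import torch

    def evonorm_s0(x, gamma, beta, v, groups=32, eps=1e-5):
        """EvoNorm-S0: x * sigmoid(v*x) divided by a per-sample group std,
        followed by an affine transform; no batch statistics involved."""
        n, c, h, w = x.shape
        g = x.reshape(n, groups, c // groups, h, w)
        std = g.var(dim=(2, 3, 4), keepdim=True).add(eps).sqrt()
        std = std.expand_as(g).reshape(n, c, h, w)
        return x * torch.sigmoid(v * x) / std * gamma + beta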