Contents
2. Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification [PDF]
5. Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation [PDF]
14. Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization [PDF]
24. Two-stage framework for optic disc localization and glaucoma classification in retinal fundus images using deep learning [PDF]
26. Overview: Computer vision and machine learning for microstructural characterization and analysis [PDF]
28. Fuzziness-based Spatial-Spectral Class Discriminant Information Preserving Active Learning for Hyperspectral Image Classification [PDF]
29. FCN+RL: A Fully Convolutional Network followed by Refinement Layers to Offline Handwritten Signature Segmentation [PDF]
33. Hyperspectral Image Super-resolution via Deep Spatio-spectral Convolutional Neural Networks [PDF]
34. A Light-Weighted Convolutional Neural Network for Bitemporal SAR Image Change Detection [PDF]
35. Extracting low-dimensional psychological representations from convolutional neural networks [PDF]

Abstracts
1. PnPNet: End-to-End Perception and Prediction with Tracking in the Loop [PDF]
Ming Liang, Bin Yang, Wenyuan Zeng, Yun Chen, Rui Hu, Sergio Casas, Raquel Urtasun
Abstract: We tackle the problem of joint perception and motion forecasting in the context of self-driving vehicles. Towards this goal we propose PnPNet, an end-to-end model that takes as input sequential sensor data, and outputs at each time step object tracks and their future trajectories. The key component is a novel tracking module that generates object tracks online from detections and exploits trajectory level features for motion forecasting. Specifically, the object tracks get updated at each time step by solving both the data association problem and the trajectory estimation problem. Importantly, the whole model is end-to-end trainable and benefits from joint optimization of all tasks. We validate PnPNet on two large-scale driving datasets, and show significant improvements over the state-of-the-art with better occlusion recovery and more accurate future prediction.
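The per-time-step loop described above (detect, associate, estimate trajectories, forecast) can be organized as follows. This Python skeleton is only an illustrative sketch; every module name and interface here is hypothetical, not the authors' actual code.

```python
class PnPLoop:
    """Hypothetical skeleton of a perception-and-prediction loop with
    tracking in the loop, following the abstract's description."""

    def __init__(self, detector, associator, trajectory_estimator, forecaster):
        self.detector = detector                          # sensor sweep -> detections
        self.associator = associator                      # tracks x detections -> matches
        self.trajectory_estimator = trajectory_estimator  # refine matched track states
        self.forecaster = forecaster                      # trajectory features -> futures
        self.tracks = []

    def step(self, sensor_data):
        detections = self.detector(sensor_data)
        # Data association: extend existing tracks with new detections;
        # unmatched detections start new tracks.
        matched, new_tracks = self.associator(self.tracks, detections)
        # Trajectory estimation: refine the past states of updated tracks.
        self.tracks = self.trajectory_estimator(matched) + new_tracks
        # Motion forecasting from trajectory-level features.
        futures = self.forecaster(self.tracks)
        return self.tracks, futures
```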
2. Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification [PDF]
Fei Shen, Jianqing Zhu, Xiaobin Zhu, Yi Xie, Jingchang Huang
Abstract: Existing vehicle re-identification methods commonly use spatial pooling operations to aggregate feature maps extracted via off-the-shelf backbone networks. They neglect the spatial significance of feature maps, which eventually degrades vehicle re-identification performance.
3. Federated Face Anti-spoofing [PDF]
Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel
Abstract: Face presentation attack detection plays a critical role in the modern face recognition pipeline. A face anti-spoofing (FAS) model with good generalization can be obtained when it is trained with face images from different input distributions and different types of spoof attacks. In reality, training data (both real face images and spoof images) are not directly shared between data owners due to legal and privacy issues. In this paper, with the motivation of circumventing this challenge, we propose Federated Face Anti-spoofing (FedFAS) framework. FedFAS simultaneously takes advantage of rich FAS information available at different data owners while preserving data privacy. In the proposed framework, each data owner (referred to as \textit{data centers}) locally trains its own FAS model. A server learns a global FAS model by iteratively aggregating model updates from all data centers without accessing private data in each of them. Once the learned global model converges, it is used for FAS inference. We introduce the experimental setting to evaluate the proposed FedFAS framework and carry out extensive experiments to provide various insights about federated learning for FAS.
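The server-side aggregation described above can be illustrated with a FedAvg-style weighted average of the data centers' model updates; the abstract does not name the exact aggregation rule, so this numpy sketch is an assumption.

```python
import numpy as np

def aggregate_updates(center_weights, num_samples):
    """Hypothetical FedAvg-style aggregation: average each parameter
    across data centers, weighted by local dataset size.
    `center_weights` is a list of dicts: parameter name -> np.ndarray."""
    total = float(sum(num_samples))
    return {
        name: sum(w[name] * (n / total)
                  for w, n in zip(center_weights, num_samples))
        for name in center_weights[0]
    }

# One communication round: each data center trains locally and sends its
# updated weights; the server aggregates without seeing any private data.
centers = [{"conv1": np.random.randn(3, 3)} for _ in range(4)]
global_model = aggregate_updates(centers, num_samples=[120, 80, 200, 60])
```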
4. Fixed-size Objects Encoding for Visual Relationship Detection [PDF]
Hengyue Pan, Xin Niu, Rongchun Li, Siqi Shen, Yong Dou
Abstract: In this paper, we propose a fixed-size object encoding method (FOE-VRD) to improve the performance of visual relationship detection tasks. Compared with previous methods, FOE-VRD has an important feature, i.e., it uses one fixed-size vector to encode all objects in each input image to assist the process of relationship detection. Firstly, we use a regular convolutional neural network as a feature extractor to generate high-level features of input images. Then, for each relationship triplet in the input images, i.e., $<$subject-predicate-object$>$, we apply ROI-pooling to get feature vectors of the two regions on the feature maps that correspond to the bounding boxes of the subject and object. Besides the subject and object, our analysis implies that the results of predicate classification may also be related to the remaining objects in the input image (we call them background objects). Due to the variable number of background objects in different images and the computational cost, we cannot generate feature vectors for them one-by-one using the ROI pooling technique. Instead, we propose a novel method to encode all background objects in each image using one fixed-size vector (i.e., the FBE vector). By concatenating the 3 vectors generated above, we successfully encode the objects using one fixed-size vector. The generated feature vector is then fed into a fully connected neural network to get predicate classification results. Experimental results on the VRD database (entire set and zero-shot tests) show that the proposed method works well on both predicate classification and relationship detection.
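As a concrete illustration of the encoding just described, here is a hedged PyTorch sketch in which ROI-pooled subject and object features are concatenated with a single fixed-size background-encoding (FBE) vector and classified by fully connected layers. All dimensions, and the use of mean-pooling to build the FBE vector, are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn

class PredicateHead(nn.Module):
    """Hypothetical FOE-VRD-style head: one fixed-size vector per
    <subject, predicate, object> triplet, regardless of how many
    background objects the image contains."""

    def __init__(self, feat_dim=512, num_predicates=70):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(3 * feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_predicates),
        )

    def forward(self, subj_feat, obj_feat, background_feats):
        # FBE vector: encode ALL background objects into one fixed-size
        # vector (mean-pooling is an assumed choice of aggregation).
        fbe = background_feats.mean(dim=0)
        triplet = torch.cat([subj_feat, obj_feat, fbe], dim=-1)
        return self.fc(triplet)

head = PredicateHead()
# Subject and object ROI features plus 7 background-object features.
logits = head(torch.randn(512), torch.randn(512), torch.randn(7, 512))
```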
5. Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation [PDF]
Christos Sakaridis, Dengxin Dai, Luc Van Gool
Abstract: We address the problem of semantic nighttime image segmentation and improve the state-of-the-art, by adapting daytime models to nighttime without using nighttime annotations. Moreover, we design a new evaluation framework to address the substantial uncertainty of semantics in nighttime images. Our central contributions are: 1) a curriculum framework to gradually adapt semantic segmentation models from day to night through progressively darker times of day, exploiting cross-time-of-day correspondences between daytime images from a reference map and dark images to guide the label inference in the dark domains; 2) a novel uncertainty-aware annotation and evaluation framework and metric for semantic segmentation, including image regions beyond human recognition capability in the evaluation in a principled fashion; 3) the Dark Zurich dataset, comprising 2416 unlabeled nighttime and 2920 unlabeled twilight images with correspondences to their daytime counterparts plus a set of 201 nighttime images with fine pixel-level annotations created with our protocol, which serves as a first benchmark for our novel evaluation. Experiments show that our map-guided curriculum adaptation significantly outperforms state-of-the-art methods on nighttime sets both for standard metrics and our uncertainty-aware metric. Furthermore, our uncertainty-aware evaluation reveals that selective invalidation of predictions can improve results on data with ambiguous content such as our benchmark and profit safety-oriented applications involving invalid inputs.
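Contribution 1) describes a curriculum over progressively darker times of day. A minimal structural sketch of such a loop follows, with all callables injected as placeholders; none of these names are the authors' API.

```python
def curriculum_adaptation(model, splits, daytime_of, predict, fuse, finetune):
    """Hypothetical sketch: adapt a daytime segmentation model through
    progressively darker times of day (e.g., day -> twilight -> night).
    `splits` is an ordered list of image sets, lighter to darker;
    `daytime_of` returns the corresponding daytime image from the
    reference map; `fuse` and `finetune` are placeholder callables."""
    for images in splits:
        pseudo_labels = [
            # Guide label inference in the dark domain with the model's
            # prediction on the geometrically corresponding daytime image.
            fuse(predict(model, img), predict(model, daytime_of(img)))
            for img in images
        ]
        model = finetune(model, images, pseudo_labels)
    return model
```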
6. Enhanced nonconvex low-rank representation for tensor completion [PDF]
Haijin Zeng, Xiaozhen Xie, Jifeng Ning
Abstract: Higher-order low-rank tensors arise in many data processing applications and have attracted great interest. In this paper, we propose a new low-rank model for the higher-order tensor completion task based on the double nonconvex $L_{\gamma}$ norm, which is used to better approximate the rank minimization of the tensor mode-matrix. A block successive upper-bound minimization method-based algorithm is designed to efficiently solve the proposed model, and it can be demonstrated that our numerical scheme converges to the coordinatewise minimizers. Numerical results on three types of public multi-dimensional datasets show that our algorithms can recover a variety of low-rank tensors with significantly fewer samples than the compared methods.
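For concreteness, the completion model can be written as below. The abstract does not define the double nonconvex $L_{\gamma}$ norm, so the second line shows one common nonconvex rank surrogate over the singular values $\sigma_i$ of the mode-$k$ unfolding $\mathcal{X}_{(k)}$, purely as an assumed stand-in; $\Omega$ indexes the observed entries of the data tensor $\mathcal{T}$, and $\alpha_k$ weights the modes.

```latex
\min_{\mathcal{X}}\ \sum_{k=1}^{K} \alpha_k \,\big\|\mathcal{X}_{(k)}\big\|_{L_\gamma}
\quad \text{s.t.}\quad
\mathcal{P}_{\Omega}(\mathcal{X}) = \mathcal{P}_{\Omega}(\mathcal{T}),
\qquad
\|M\|_{L_\gamma} := \sum_{i} \frac{(1+\gamma)\,\sigma_i(M)}{\gamma+\sigma_i(M)}.
```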
7. NuClick: A Deep Learning Framework for Interactive Segmentation of Microscopy Images [PDF]
Navid Alemi Koohbanani, Mostafa Jahanifar, Neda Zamani Tajadin, Nasir Rajpoot
Abstract: Object segmentation is an important step in the work-flow of computational pathology. Deep learning based models, as the best performing models, require a huge amount of labeled data for precise and reliable prediction. However, collecting labeled data is expensive, because it necessarily involves expert knowledge. This is perhaps best illustrated by medical tasks where measurements call for expensive machinery and labels are the fruit of a time-consuming analysis that draws from multiple human experts. As nuclei, cells and glands are fundamental objects for downstream analysis in histology, in this paper we propose a simple CNN-based approach to speed up collecting segmentation annotations for these objects by utilizing minimum input from an annotator. We show that for small objects such as nuclei and cells, one click inside the object is enough to obtain a precise annotation. For large objects such as glands, providing a squiggle to show the extent of the gland can guide the model to outline the exact boundaries. These supervisory signals are fed to the network as auxiliary channels along with the RGB channels. With detailed experiments, we show that our approach is generalizable, robust against variations in the user input, and that it can be used to obtain annotations for completely different domains. Practically, a model trained on the masks generated by NuClick could achieve first rank in the LYON19 challenge. Furthermore, as the output of our framework, we release two datasets: 1) a dataset of lymphocyte annotations within IHC images and 2) a dataset of WBCs annotated in blood sample images.
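The guiding signals described above (a click per nucleus or cell, a squiggle per gland) enter the network as auxiliary input channels alongside RGB. A minimal numpy sketch of that input assembly, under assumed conventions rather than the NuClick code itself:

```python
import numpy as np

def build_input(rgb, clicks):
    """Stack an RGB image (H, W, 3) with a binary guiding-signal map
    built from annotator clicks [(row, col), ...] as a 4th channel.
    A squiggle would be rasterized into the same auxiliary map."""
    guide = np.zeros(rgb.shape[:2], dtype=rgb.dtype)
    for r, c in clicks:
        guide[r, c] = 1.0          # one click inside each nucleus/cell
    return np.dstack([rgb, guide])  # (H, W, 4) network input

x = build_input(np.random.rand(256, 256, 3).astype(np.float32),
                clicks=[(40, 60), (120, 200)])
```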
8. Unconstrained Matching of 2D and 3D Descriptors for 6-DOF Pose Estimation [PDF]
Uzair Nadeem, Mohammed Bennamoun, Roberto Togneri, Ferdous Sohel
Abstract: This paper proposes a novel concept to directly match feature descriptors extracted from 2D images with feature descriptors extracted from 3D point clouds. We use this concept to directly localize images in a 3D point cloud. We generate a dataset of matching 2D and 3D points and their corresponding feature descriptors, which is used to learn a Descriptor-Matcher classifier. To localize the pose of an image at test time, we extract keypoints and feature descriptors from the query image. The trained Descriptor-Matcher is then used to match the features from the image and the point cloud. The locations of the matched features are used in a robust pose estimation algorithm to predict the location and orientation of the query image. We carried out an extensive evaluation of the proposed method for indoor and outdoor scenarios and with different types of point clouds to verify the feasibility of our approach. Experimental results demonstrate that direct matching of feature descriptors from images and point clouds is not only a viable idea but can also be reliably used to estimate the 6-DOF poses of query cameras in any type of 3D point cloud in an unconstrained manner with high precision.
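Downstream of the learned Descriptor-Matcher, the matched 2D-3D correspondences feed a robust pose solver. The abstract does not name the solver, so the sketch below assumes a standard RANSAC PnP step via OpenCV, with the matcher stubbed as any scikit-learn-style binary classifier (a hypothetical interface).

```python
import numpy as np
import cv2

def localize(matcher, kp2d, desc2d, pts3d, desc3d, K):
    """Hypothetical pipeline: score every 2D-3D descriptor pair with a
    trained classifier, keep the pairs predicted as matches, then solve
    for the 6-DOF camera pose with RANSAC PnP."""
    img_pts, obj_pts = [], []
    for i, d2 in enumerate(desc2d):
        for j, d3 in enumerate(desc3d):
            # Assumed matcher interface: 1 = "these descriptors match".
            if matcher.predict(np.hstack([d2, d3])[None])[0] == 1:
                img_pts.append(kp2d[i])
                obj_pts.append(pts3d[j])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(obj_pts), np.float32(img_pts), K, None)
    return rvec, tvec  # rotation (Rodrigues vector) and translation
```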
9. Weakly Supervised Lesion Localization With Probabilistic-CAM Pooling [PDF]
Wenwu Ye, Jin Yao, Hui Xue, Yi Li
Abstract: Localizing thoracic diseases on chest X-ray plays a critical role in clinical practices such as diagnosis and treatment planning. However, current deep learning based approaches often require strong supervision, e.g. annotated bounding boxes, for training such systems, which is infeasible to harvest in large-scale. We present Probabilistic Class Activation Map (PCAM) pooling, a novel global pooling operation for lesion localization with only image-level supervision. PCAM pooling explicitly leverages the excellent localization ability of CAM during training in a probabilistic fashion. Experiments on the ChestX-ray14 dataset show a ResNet-34 model trained with PCAM pooling outperforms state-of-the-art baselines on both the classification task and the localization task. Visual examination on the probability maps generated by PCAM pooling shows clear and sharp boundaries around lesion regions compared to the localization heatmaps generated by CAM. PCAM pooling is open sourced at this https URL.
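One plausible reading of PCAM pooling, sketched in PyTorch: per-pixel class logits are mapped to probabilities, and the spatially normalized probabilities weight the pooling, so the image-level prediction concentrates on likely lesion regions while the probability map doubles as a localization heatmap. This is a hedged approximation, not the paper's exact formulation.

```python
import torch

def pcam_pool(logit_map):
    """logit_map: (B, C, H, W) per-pixel class logits from a 1x1 conv.
    Returns (B, C) image-level logits pooled with probabilistic weights."""
    prob = torch.sigmoid(logit_map)                      # per-pixel class probability
    weight = prob / prob.sum(dim=(2, 3), keepdim=True)   # spatial attention weights
    pooled = (logit_map * weight).sum(dim=(2, 3))        # weighted average of logits
    return pooled  # `prob` itself can serve as the localization heatmap

scores = pcam_pool(torch.randn(2, 14, 16, 16))           # e.g., 14 thorax classes
```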
10. WaveSNet: Wavelet Integrated Deep Networks for Image Segmentation [PDF]
Qiufu Li, Linlin Shen
Abstract: In deep networks, the lost data details significantly degrade the performance of image segmentation. In this paper, we propose to apply the Discrete Wavelet Transform (DWT) to extract the data details during feature map down-sampling, and adopt the Inverse DWT (IDWT) with the extracted details during up-sampling to recover them. We first implement DWT/IDWT as general network layers, which are applicable to 1D/2D/3D data and various wavelets such as Haar, Cohen, and Daubechies. Then, we design wavelet integrated deep networks for image segmentation (WaveSNets) based on various architectures, including U-Net, SegNet, and DeepLabv3+. Due to the effectiveness of the DWT/IDWT in processing data details, experimental results on CamVid, Pascal VOC, and Cityscapes show that our WaveSNets achieve better segmentation performance than their vanilla versions.
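A minimal 1-D Haar DWT/IDWT pair shows the property the abstract relies on: the detail coefficients that ordinary down-sampling discards are kept and restored exactly by the inverse transform. WaveSNet's layers generalize this to 2-D/3-D data and other wavelets; this numpy snippet covers only the Haar case.

```python
import numpy as np

def haar_dwt(x):
    """Split a 1-D signal (even length) into low- and high-pass halves."""
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (down-sampled)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail, kept for the decoder
    return lo, hi

def haar_idwt(lo, hi):
    """Perfectly reconstruct the signal from both coefficient bands."""
    x = np.empty(2 * lo.size)
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

x = np.random.randn(8)
assert np.allclose(haar_idwt(*haar_dwt(x)), x)  # details are not lost
```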
11. HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens [PDF]
Zhaohui Yang, Yunhe Wang, Dacheng Tao, Xinghao Chen, Jianyuan Guo, Chunjing Xu, Chao Xu, Chang Xu
Abstract: Neural Architecture Search (NAS) refers to automatically designing network architectures. We propose an hourglass-inspired approach (HourNAS) for this problem, motivated by the fact that the effects of the architecture often proceed from the vital few blocks. Acting like the narrow neck of an hourglass, vital blocks in the guaranteed path from the input to the output of a deep neural network restrict the information flow and influence the network accuracy. The other blocks occupy the major volume of the network and determine the overall network complexity, corresponding to the bulbs of an hourglass. To achieve an extremely fast NAS while preserving high accuracy, we propose to identify the vital blocks and make them the priority in the architecture search. The search space of the non-vital blocks is further shrunk to only cover the candidates that are affordable under the computational resource constraints. Experimental results on ImageNet show that, using only 3 hours (0.1 days) with one GPU, our HourNAS can search an architecture that achieves a 77.0% Top-1 accuracy, which outperforms state-of-the-art methods.
12. Dynamic Routing with Path Diversity and Consistency for Compact Network Learning [PDF]
Huanyu Wang, Zequn Qin, Xi Li
Abstract: In this paper, we propose a novel dynamic routing inference method with diversity and consistency that better takes advantage of the network capacity. Specifically, diverse routing achieves better utilization of the network, while consistent routing enables better optimization of the routing mechanism. Moreover, we propose a customizable computational cost controlling method that can balance the trade-off between cost and accuracy. Extensive ablation studies and experiments show that our method achieves state-of-the-art results compared with the original full network, other dynamic networks, and model compression methods. Our code will be made publicly available.
13. High-order structure preserving graph neural network for few-shot learning [PDF]
Guangfeng Lin, Ying Yang, Yindi Fan, Xiaobing Kang, Kaiyang Liao, Fan Zhao
Abstract: Few-shot learning can find the latent structure information between the prior knowledge and the queried data by the similarity metric of meta-learning, to construct a discriminative model for recognizing new categories from rare labeled samples. Most existing methods try to model the similarity relationship of the samples within tasks, and generalize the model to identify new categories. However, the relationship of samples between separated tasks is difficult to consider because of the different metric criteria in the respective tasks. In contrast, the proposed high-order structure preserving graph neural network (HOSP-GNN) can further explore the rich structure of the samples to predict the label of the queried data on the graph, enabling the structure evolution to explicitly discriminate the categories by iteratively updating the high-order structure relationship (the relative metric over multiple samples, instead of a pairwise sample metric) with manifold structure constraints. HOSP-GNN can not only mine the high-order structure to complement the relevance between samples that may be divided into different tasks in meta-learning, but also generate the rule of structure updating by the manifold constraint. Furthermore, HOSP-GNN does not need to retrain the learning model for recognizing new classes, and has a well-generalizable high-order structure for model adaptability. Experiments show that HOSP-GNN outperforms state-of-the-art methods on supervised and semi-supervised few-shot learning on three benchmark datasets: miniImageNet, tieredImageNet and FC100.
14. Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization [PDF]
Komal Chugh, Parul Gupta, Abhinav Dhall, Ramanathan Subramanian
Abstract: We propose detection of deepfake videos based on the dissimilarity between the audio and visual modalities, termed the Modality Dissonance Score (MDS). We hypothesize that manipulation of either modality will lead to dis-harmony between the two modalities, e.g., loss of lip-sync, unnatural facial and lip movements, etc. MDS is computed as an aggregate of dissimilarity scores between audio and visual segments in a video. Discriminative features are learnt for the audio and visual channels in a chunk-wise manner, employing the cross-entropy loss for individual modalities, and a contrastive loss that models inter-modality similarity. Extensive experiments on the DFDC and DeepFake-TIMIT datasets show that our approach outperforms the state-of-the-art by up to 7%. We also demonstrate temporal forgery localization, and show how our technique identifies the manipulated video segments.
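The MDS computation described above can be sketched as follows, under stated assumptions: chunk-wise audio and visual embeddings (however obtained), Euclidean distance as the per-chunk dissimilarity, and a plain sum as the aggregate; a video would be flagged as fake when the score crosses a validation-tuned threshold. All of these concrete choices are assumptions for illustration.

```python
import numpy as np

def modality_dissonance_score(audio_emb, visual_emb):
    """audio_emb, visual_emb: (num_chunks, dim) per-segment embeddings
    from the audio and visual streams. Returns the aggregate MDS."""
    per_chunk = np.linalg.norm(audio_emb - visual_emb, axis=1)  # dissimilarity
    return per_chunk.sum()  # assumed aggregation; larger -> more dissonant

# A manipulated video should score higher than a pristine one.
real = modality_dissonance_score(np.ones((10, 128)), np.ones((10, 128)) + 0.01)
fake = modality_dissonance_score(np.ones((10, 128)), np.random.randn(10, 128))
is_fake = fake > 1.0  # threshold tuned on a validation set (assumed)
```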
15. Deep graph learning for semi-supervised classification [PDF]
Guangfeng Lin, Xiaobing Kang, Kaiyang Liao, Fan Zhao, Yajun Chen
Abstract: Graph learning (GL) can dynamically capture the distribution structure (graph structure) of data based on graph convolutional networks (GCN), and the learning quality of the graph structure directly influences GCN for semi-supervised classification. Existing methods mostly combine the computational layer and the related losses into GCN for exploring the global graph (measuring graph structure from all data samples) or the local graph (measuring graph structure from local data samples). The global graph emphasizes the whole-structure description of inter-class data, while the local graph tends to represent the neighborhood structure of intra-class data. However, it is difficult to simultaneously balance these graphs during learning for semi-supervised classification because of their interdependence. To simulate this interdependence, deep graph learning (DGL) is proposed to find a better graph representation for semi-supervised classification. DGL can not only learn the global structure through the previous layer's metric computation updating, but also mine the local structure through the next layer's local weight reassignment. Furthermore, DGL can fuse the different structures by dynamically encoding their interdependence, and deeply mine the relationship between the different structures by hierarchical progressive learning to improve the performance of semi-supervised classification. Experiments demonstrate that DGL outperforms state-of-the-art methods on three benchmark datasets (Citeseer, Cora, and Pubmed) for citation networks and two benchmark datasets (MNIST and Cifar10) for images.
16. Privacy-Protection Drone Patrol System based on Face Anonymization [PDF]
Harim Lee, Myeung Un Kim, Yeongjun Kim, Hyeonsu Lyu, Hyun Jong Yang
Abstract: The robot market has been growing significantly and is expected to become 1.5 times larger in 2024 than it was in 2019. Robots have attracted the attention of security companies thanks to their mobility. These days, for security robots, unmanned aerial vehicles (UAVs) have quickly emerged by highlighting their advantage: they can even go to hazardous places that humans cannot access. Among UAVs, the drone has been a representative model and has several merits, such as carrying various sensors including high-resolution cameras. Therefore, the drone is the most suitable as a mobile surveillance robot. These attractive advantages, such as high-resolution cameras and mobility, can be a double-edged sword, i.e., privacy infringement. Surveillance drones take high-resolution videos to fulfill their role; however, those contain a lot of privacy-sensitive information. Indiscriminate shooting is a critical issue for those who are very reluctant to be exposed. To tackle the privacy infringement, this work proposes a face-anonymizing drone patrol system. In this system, a person's face in a video is transformed into a different face with facial components maintained. To construct our privacy-preserving system, we have adopted the latest generative adversarial network frameworks and made some modifications to their losses. Our face-anonymizing approach is evaluated with various public face-image and video datasets. Moreover, our system is evaluated with a customized drone consisting of a high-resolution camera, a companion computer, and a drone control computer. Finally, we confirm that our system can protect privacy-sensitive information with our face-anonymizing algorithm while preserving the performance of robot perception, i.e., simultaneous localization and mapping.
17. Controlling Length in Image Captioning [PDF]
Ruotian Luo, Greg Shakhnarovich
Abstract: We develop and evaluate captioning models that allow control of caption length. Our models can leverage this control to generate captions of different style and descriptiveness.
18. UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content [PDF]
Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
Abstract: Recent years have witnessed an explosion of user-generated content (UGC) videos shared and streamed over the Internet, thanks to the evolution of affordable and reliable consumer capture devices, and the tremendous popularity of social media platforms. Accordingly, there is a great need for accurate video quality assessment (VQA) models for UGC/consumer videos to monitor, control, and optimize this vast content. Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of UGC content are unpredictable, complicated, and often commingled. Here we contribute to advancing the UGC-VQA problem by conducting a comprehensive evaluation of leading no-reference/blind VQA (BVQA) features and models on a fixed evaluation architecture, yielding new empirical insights on both subjective video quality studies and VQA model design. By employing a feature selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models to create a new fusion-based BVQA model, which we dub the \textbf{VID}eo quality \textbf{EVAL}uator (VIDEVAL), that effectively balances the trade-off between VQA performance and efficiency. Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models. Our study protocol also defines a reliable benchmark for the UGC-VQA problem, which we believe will facilitate further research on deep learning-based VQA modeling, as well as perceptually-optimized efficient UGC video processing, transcoding, and streaming. To promote reproducible research and public evaluation, an implementation of VIDEVAL has been made available online: \url{this https URL}.
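The fusion step described above (select 60 of 763 statistical features, then learn a quality predictor) can be illustrated with a generic scikit-learn pipeline. The abstract fixes neither the selector nor the regressor, so recursive feature elimination over a linear SVR below is an assumed stand-in, with random placeholder data.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

# X: (num_videos, 763) statistical features pooled from leading BVQA
# models; y: subjective quality scores (MOS). Placeholder data only.
X, y = np.random.randn(200, 763), np.random.rand(200) * 100

# Keep 60 of the 763 features (the selection strategy is assumed here).
selector = RFE(SVR(kernel="linear"), n_features_to_select=60, step=50)
selector.fit(X, y)

# Fused BVQA quality predictor trained on the selected features.
model = SVR().fit(selector.transform(X), y)
predicted_quality = model.predict(selector.transform(X[:5]))
```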
19. Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences [PDF] 返回目录
Alwyn Mathew, Aditya Prakash Patra, Jimson Mathew
Abstract: The dense depth estimation of a 3D scene has numerous applications, mainly in robotics and surveillance. LiDAR and radar sensors are the hardware solution for real-time depth estimation, but these sensors produce sparse depth maps and are sometimes unreliable. In recent years, research aimed at tackling depth estimation using a single 2D image has received a lot of attention. Deep learning based self-supervised depth estimation methods trained on rectified stereo and monocular video frames have shown promising results. We propose a self-attention based depth and ego-motion network for unrectified images. We also incorporate the camera's non-differentiable distortion into the training pipeline. Our approach performs competitively when compared to other established approaches that use rectified images for depth estimation.
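As a concrete illustration of the self-attention ingredient, the block below is a minimal 2D self-attention module of the kind commonly inserted into depth-network encoders; the layer widths and the residual gating are assumptions, not the authors' exact design.

import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key   = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))       # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)    # (b, hw, c/8)
        k = self.key(x).flatten(2)                      # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)             # pairwise spatial attention
        v = self.value(x).flatten(2)                    # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                     # residual connection

feat = torch.randn(1, 64, 32, 32)
print(SelfAttention2d(64)(feat).shape)                  # torch.Size([1, 64, 32, 32])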
20. Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning [PDF] 返回目录
Zhibo Yang, Lihan Huang, Yupei Chen, Zijun Wei, Seoyoung Ahn, Gregory Zelinsky, Dimitris Samaras, Minh Hoai
Abstract: Being able to predict human gaze behavior has obvious importance for behavioral vision and for computer vision applications. Most models have mainly focused on predicting free-viewing behavior using saliency maps, but these predictions do not generalize to goal-directed behavior, such as when a person searches for a visual target object. We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search. The viewer's internal belief states were modeled as dynamic contextual belief maps of object locations. These maps were learned by IRL and then used to predict behavioral scanpaths for multiple target categories. To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence. COCO-Search18 has 10 participants searching for each of 18 target-object categories in 6202 images, making about 300,000 goal-directed fixations. When trained and evaluated on COCO-Search18, the IRL model outperformed baseline models in predicting search fixation scanpaths, both in terms of similarity to human search behavior and search efficiency. Finally, reward maps recovered by the IRL model reveal distinctive target-dependent patterns of object prioritization, which we interpret as a learned object context.
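To make the scanpath output concrete, here is a toy decoding step, not the authors' IRL policy: once a reward map has been recovered for a target category, successive fixations can be read off greedily with inhibition of return; the map, radius, and fixation count are all assumed.

import numpy as np

def greedy_scanpath(reward_map, n_fixations=5, inhibit_radius=2):
    """Pick successive fixations at the reward maximum, suppressing visited areas."""
    reward = reward_map.astype(float).copy()
    path = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(reward), reward.shape)
        path.append((int(y), int(x)))
        y0, x0 = max(0, y - inhibit_radius), max(0, x - inhibit_radius)
        reward[y0:y + inhibit_radius + 1, x0:x + inhibit_radius + 1] = -np.inf  # inhibition of return
    return path

rng = np.random.default_rng(0)
print(greedy_scanpath(rng.random((10, 10))))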
21. Combining Fine- and Coarse-Grained Classifiers for Diabetic Retinopathy Detection [PDF] 返回目录
Muhammad Naseer Bajwa, Yoshinobu Taniguchi, Muhammad Imran Malik, Wolfgang Neumeier, Andreas Dengel, Sheraz Ahmed
Abstract: Visual artefacts of early diabetic retinopathy in retinal fundus images are usually small in size, inconspicuous, and scattered all over the retina. Detecting diabetic retinopathy requires physicians to look at the whole image and fixate on specific regions to locate potential biomarkers of the disease. Therefore, taking inspiration from ophthalmologists, we propose to combine coarse-grained classifiers, which detect discriminating features from whole images, with a recent breed of fine-grained classifiers that discover and pay particular attention to pathologically significant regions. To evaluate the performance of this proposed ensemble, we used the publicly available EyePACS and Messidor datasets. Extensive experimentation on binary, ternary and quaternary classification shows that this ensemble largely outperforms individual image classifiers, as well as most published works, in most training setups for diabetic retinopathy detection. Furthermore, the performance of fine-grained classifiers is found to be notably superior to coarse-grained image classifiers, encouraging the development of task-oriented fine-grained classifiers modelled after specialist ophthalmologists.
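At inference time, combining the two classifiers can be as simple as blending their class-probability outputs, as sketched below; the equal weighting and the three-way grading are assumptions for illustration, and both probability arrays stand in for real model outputs.

import numpy as np

def ensemble_predict(p_coarse, p_fine, weight=0.5):
    """Blend softmax outputs of the coarse- and fine-grained classifiers."""
    p = weight * p_coarse + (1.0 - weight) * p_fine
    return p.argmax(axis=1)

p_coarse = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])  # whole-image classifier
p_fine   = np.array([[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]])  # region-attentive classifier
print(ensemble_predict(p_coarse, p_fine))                # [0 2]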
22. Monocular Depth Estimators: Vulnerabilities and Attacks [PDF] 返回目录
Alwyn Mathew, Aditya Prakash Patra, Jimson Mathew
Abstract: Recent advancements in neural networks have led to reliable monocular depth estimation. Monocular depth estimation techniques have the upper hand over traditional depth estimation techniques as they only need one image during inference. Depth estimation is one of the essential tasks in robotics, and monocular depth estimation has a wide variety of safety-critical applications, such as in self-driving cars and surgical devices. Thus, the robustness of such techniques is crucial. It has been shown in recent works that these deep neural networks are highly vulnerable to adversarial samples for tasks like classification, detection and segmentation. Such adversarial samples can completely ruin the output of the system, making their credibility in real-time deployment questionable. In this paper, we investigate the robustness of state-of-the-art monocular depth estimation networks against adversarial attacks. Our experiments show that tiny perturbations on an image that are invisible to the naked eye (perturbation attack), and corruption of less than about 1% of an image (patch attack), can affect the depth estimation drastically. We introduce a novel deep feature annihilation loss that corrupts the hidden feature space representation, forcing the decoder of the network to output poor depth maps. White-box and black-box tests confirm the effectiveness of the proposed attack. We also perform adversarial example transferability tests, mainly cross-data transferability.
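The perturbation attack follows the familiar gradient-sign pattern. The sketch below is a generic FGSM-style stand-in, using a plain mean-depth objective rather than the paper's deep feature annihilation loss, and a one-layer placeholder in place of a real depth network.

import torch
import torch.nn as nn

def perturbation_attack(depth_net, image, epsilon=2.0 / 255):
    """One gradient-sign step that degrades the predicted depth map."""
    image = image.clone().requires_grad_(True)
    loss = -depth_net(image).mean()                 # any loss that worsens the depth
    loss.backward()
    adv = image + epsilon * image.grad.sign()       # imperceptibly small perturbation
    return adv.clamp(0, 1).detach()

depth_net = nn.Conv2d(3, 1, 3, padding=1)           # placeholder "depth network"
adv = perturbation_attack(depth_net, torch.rand(1, 3, 64, 64))
print(adv.shape)                                    # torch.Size([1, 3, 64, 64])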
23. ePillID Dataset: A Low-Shot Fine-Grained Benchmark for Pill Identification [PDF] 返回目录
Naoto Usuyama, Natalia Larios Delgado, Amanda K. Hall, Jessica Lundin
Abstract: Identifying prescription medications is a frequent task for patients and medical professionals; however, this is an error-prone task as many pills have similar appearances (e.g. white round pills), which increases the risk of medication errors. In this paper, we introduce ePillID, the largest public benchmark on pill image recognition, composed of 13k images representing 8184 appearance classes (two sides for 4092 pill types). For most of the appearance classes, there exists only one reference image, making it a challenging low-shot recognition setting. We present our experimental setup and evaluation results of various baseline models on the benchmark. The best baseline using a multi-head metric-learning approach with bilinear features performed remarkably well; however, our error analysis suggests that they still fail to distinguish particularly confusing classes. The code and data are available at \url{this https URL}.
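The low-shot setting above reduces, at test time, to nearest-reference matching in an embedding space. The sketch below assumes random stand-in embeddings; in the benchmark these would come from the bilinear metric-learning model.

import numpy as np

def match_pill(query, references):
    """Index of the appearance class whose reference is closest by cosine similarity."""
    q = query / np.linalg.norm(query)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    return int(np.argmax(r @ q))

rng = np.random.default_rng(0)
references = rng.normal(size=(8184, 128))        # one embedding per appearance class
query = references[42] + 0.1 * rng.normal(size=128)
print(match_pill(query, references))             # 42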
24. Two-stage framework for optic disc localization and glaucoma classification in retinal fundus images using deep learning [PDF] 返回目录
Muhammad Naseer Bajwa, Muhammad Imran Malik, Shoaib Ahmed Siddiqui, Andreas Dengel, Faisal Shafait, Wolfgang Neumeier, Sheraz Ahmed
Abstract: With the advancement of powerful image processing and machine learning techniques, CAD has become ever more prevalent in all fields of medicine including ophthalmology. Since the optic disc is the most important part of the retinal fundus image for glaucoma detection, this paper proposes a two-stage framework that first detects and localizes the optic disc and then classifies it as healthy or glaucomatous. The first stage is based on RCNN and is responsible for localizing and extracting the optic disc from a retinal fundus image, while the second stage uses a Deep CNN to classify the extracted disc as healthy or glaucomatous. In addition to the proposed solution, we also developed a rule-based semi-automatic ground truth generation method that provides the necessary annotations for training the RCNN-based model for automated disc localization. The proposed method is evaluated on seven publicly available datasets for disc localization and on the ORIGA dataset, the largest publicly available dataset for glaucoma classification. The results of automatic localization mark a new state-of-the-art on six datasets, with accuracy reaching 100% on four of them. For glaucoma classification we achieved an AUC of 0.874, a 2.7% relative improvement over the state-of-the-art results previously obtained for classification on ORIGA. Once trained on carefully annotated data, Deep Learning based methods for optic disc detection and localization are not only robust, accurate and fully automated but also eliminate the need for dataset-dependent heuristic algorithms. Our empirical evaluation of glaucoma classification on ORIGA reveals that reporting only AUC, for datasets with class imbalance and without pre-defined train and test splits, does not portray a true picture of the classifier's performance, and calls for additional performance metrics to substantiate the results.
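Structurally, the framework is a simple composition of a localizer and a classifier. The sketch below shows that control flow with placeholder callables; the actual stages are an RCNN-based detector and a deep CNN, respectively.

import numpy as np

def two_stage_glaucoma(image, localize, classify):
    """Stage 1: find the optic disc box. Stage 2: classify the cropped disc."""
    x0, y0, x1, y1 = localize(image)     # optic disc bounding box
    disc = image[y0:y1, x0:x1]           # extract the region of interest
    return classify(disc)                # 'healthy' or 'glaucomatous'

fundus = np.zeros((512, 512, 3))
label = two_stage_glaucoma(
    fundus,
    localize=lambda img: (200, 180, 320, 300),                               # dummy detector
    classify=lambda roi: "healthy" if roi.mean() < 0.5 else "glaucomatous",  # dummy CNN
)
print(label)                             # healthy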
25. LR-CNN: Local-aware Region CNN for Vehicle Detection in Aerial Imagery [PDF] 返回目录
Wentong Liao, Xiang Chen, Jingfeng Yang, Stefan Roth, Michael Goesele, Michael Ying Yang, Bodo Rosenhahn
Abstract: State-of-the-art object detection approaches such as Fast/Faster R-CNN, SSD, or YOLO have difficulties detecting dense, small targets with arbitrary orientation in large aerial images. The main reason is that using interpolation to align RoI features can result in a lack of accuracy or even loss of location information. We present the Local-aware Region Convolutional Neural Network (LR-CNN), a novel two-stage approach for vehicle detection in aerial imagery. We enhance translation invariance to detect dense vehicles and address the boundary quantization issue amongst dense vehicles by aggregating the high-precision RoIs' features. Moreover, we resample high-level semantic pooled features, making them regain location information from the features of a shallower convolutional block. This strengthens the local feature invariance for the resampled features and enables detecting vehicles in an arbitrary orientation. The local feature invariance enhances the learning ability of the focal loss function, and the focal loss further helps to focus on the hard examples. Taken together, our method better addresses the challenges of aerial imagery. We evaluate our approach on several challenging datasets (VEDAI, DOTA), demonstrating a significant improvement over state-of-the-art methods. We demonstrate the good generalization ability of our approach on the DLR 3K dataset.
26. Overview: Computer vision and machine learning for microstructural characterization and analysis [PDF] 返回目录
Elizabeth A. Holm, Ryan Cohn, Nan Gao, Andrew R. Kitahara, Thomas P. Matson, Bo Lei, Srujana Rao Yarasi
Abstract: The characterization and analysis of microstructure is the foundation of microstructural science, connecting the materials structure to its composition, process history, and properties. Microstructural quantification traditionally involves a human deciding a priori what to measure and then devising a purpose-built method for doing so. However, recent advances in data science, including computer vision (CV) and machine learning (ML) offer new approaches to extracting information from microstructural images. This overview surveys CV approaches to numerically encode the visual information contained in a microstructural image, which then provides input to supervised or unsupervised ML algorithms that find associations and trends in the high-dimensional image representation. CV/ML systems for microstructural characterization and analysis span the taxonomy of image analysis tasks, including image classification, semantic segmentation, object detection, and instance segmentation. These tools enable new approaches to microstructural analysis, including the development of new, rich visual metrics and the discovery of processing-microstructure-property relationships.
27. Human Recognition Using Face in Computed Tomography [PDF] 返回目录
Jiuwen Zhu, Hu Han, S. Kevin Zhou
Abstract: With the mushrooming use of computed tomography (CT) images in clinical decision making, management of CT data becomes increasingly difficult. From the patient identification perspective, using the standard DICOM tag to track patient information is challenged by issues such as misspelling, lost files, and site variation. In this paper, we explore the feasibility of leveraging the faces in 3D CT images as biometric features. Specifically, we propose an automatic processing pipeline that first detects facial landmarks in 3D for ROI extraction and then generates aligned 2D depth images, which are used for automatic recognition. To boost the recognition performance, we employ transfer learning to reduce the data sparsity issue and introduce a group sampling strategy to increase inter-class discrimination when training the recognition network. Our proposed method is capable of capturing underlying identity characteristics in medical images while reducing memory consumption. To test its effectiveness, we curate 600 3D CT images of 280 patients from multiple sources for performance evaluation. Experimental results demonstrate that our method achieves a 1:56 identification accuracy of 92.53% and a 1:1 verification accuracy of 96.12%, outperforming other competing approaches.
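The 1:1 verification protocol mentioned above boils down to thresholding a similarity between two embeddings; the sketch uses cosine similarity on synthetic vectors, and the threshold value is an assumption.

import numpy as np

def verify(emb_a, emb_b, threshold=0.6):
    """Accept a 1:1 match if the cosine similarity of the embeddings clears a threshold."""
    cos = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    return cos >= threshold, float(cos)

rng = np.random.default_rng(0)
a = rng.normal(size=256)                 # embedding of one aligned depth image
b = a + 0.3 * rng.normal(size=256)       # same patient, slightly different scan
print(verify(a, b))                      # (True, ~0.96)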
28. Fuzziness-based Spatial-Spectral Class Discriminant Information Preserving Active Learning for Hyperspectral Image Classification [PDF] 返回目录
Muhammad Ahmad
Abstract: Traditional Active/Self/Interactive Learning for Hyperspectral Image Classification (HSIC) increases the size of the training set without considering the class scatter and randomness among the existing and new samples. Second, very limited research has been carried out on joint spectral-spatial information. Finally, a minor but still noteworthy issue is the stopping criterion, which has received little attention from the community. Therefore, this work proposes a novel fuzziness-based method that preserves spatial-spectral, within- and between-class discriminant information at both local and global scales (FLG). We first investigate spatial-prior, fuzziness-based information about misclassified samples. We then compute the total local and global within- and between-class information and formulate it in a fine-grained manner. This information is then fed to a discriminative objective function to query heterogeneous samples, which eliminates randomness among the training samples. Experimental results on benchmark HSI datasets demonstrate the effectiveness of the FLG method with Generative, Extreme Learning Machine and Sparse Multinomial Logistic Regression (SMLR)-LORSAL classifiers.
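The fuzziness score that drives sample querying has a standard closed form; below is a minimal version over per-class membership probabilities (the spatial-spectral and class-discriminant terms of the full FLG method are omitted).

import numpy as np

def fuzziness(memberships, eps=1e-12):
    """Per-sample fuzzy entropy of a membership matrix of shape (n_samples, n_classes)."""
    mu = np.clip(memberships, eps, 1 - eps)
    h = -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))
    return h.mean(axis=1)

probs = np.array([[0.98, 0.01, 0.01],    # confident prediction -> low fuzziness
                  [0.40, 0.35, 0.25]])   # ambiguous prediction -> high fuzziness
f = fuzziness(probs)
print(f.round(3), f.argmax())            # the fuzziest sample is queried for labeling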
29. FCN+RL: A Fully Convolutional Network followed by Refinement Layers to Offline Handwritten Signature Segmentation [PDF] 返回目录
Celso A. M. Lopes Junior, Matheus Henrique M. da Silva, Byron Leite Dantas Bezerra, Bruno Jose Torres Fernandes, Donato Impedovo
Abstract: Although centuries old, the handwritten signature is one of the most reliable biometric methods used by most countries. In the last ten years, the application of technology to the verification of handwritten signatures has evolved strongly, including in forensic contexts. Some factors, such as the complexity of the background and the small size of the region of interest (the signature pixels), increase the difficulty of the task. Other factors that make it challenging are the variations present in handwritten signatures, such as location, type of ink, color and type of pen, and type of stroke. In this work, we propose an approach to locate and extract the pixels of handwritten signatures on identification documents, without any prior information on the location of the signatures. The technique is based on a fully convolutional encoder-decoder network combined with a block of refinement layers for the alpha channel of the predicted image. The experimental results demonstrate that the technique outputs clean signatures, with higher fidelity in the strokes than traditional approaches and preservation of the characteristics pertinent to the signer's writing. To evaluate the quality of our proposal, we use the following image similarity metrics: SSIM, SIFT, and the Dice coefficient. The qualitative and quantitative results show a significant improvement in comparison with the baseline system.
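Of the three similarity metrics listed, the Dice coefficient is the simplest to state; a minimal implementation for comparing a predicted signature mask against ground truth follows (the toy masks are for illustration).

import numpy as np

def dice_coefficient(pred, truth, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)

a = np.zeros((8, 8)); a[2:6, 2:6] = 1    # ground-truth signature pixels
b = np.zeros((8, 8)); b[3:7, 3:7] = 1    # predicted signature pixels
print(round(dice_coefficient(a, b), 3))  # 0.562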
30. Depth-aware Blending of Smoothed Images for Bokeh Effect Generation [PDF] 返回目录
Saikat Dutta
Abstract: The bokeh effect is used in photography to capture images where the closer objects look sharp and everything else stays out of focus. Bokeh photos are generally captured using single-lens reflex cameras with a shallow depth of field. Most modern smartphones can take bokeh images by leveraging dual rear cameras or good auto-focus hardware. However, for smartphones with a single rear camera and without good auto-focus hardware, we have to rely on software to generate bokeh images. This kind of system is also useful for generating the bokeh effect in already captured images. In this paper, an end-to-end deep learning framework is proposed to generate a high-quality bokeh effect from images. The original image and different versions of smoothed images are blended to generate the bokeh effect with the help of a monocular depth estimation network. The proposed approach is compared against a saliency-detection-based baseline and a number of approaches proposed in the AIM 2019 Challenge on Bokeh Effect Synthesis. Extensive experiments are shown in order to understand different parts of the proposed algorithm. The network is lightweight and can process an HD image in 0.03 seconds. This approach ranked second in the AIM 2019 Bokeh Effect Challenge (Perceptual Track).
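The depth-aware blending at the heart of the method can be caricatured in a few lines: near-focus pixels keep the sharp image, far pixels take progressively smoother versions. Gaussian blurs stand in for the learned smoothing, and the focus depth and blur strengths are assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def depth_aware_bokeh(image, depth, focus_depth=0.2, sigmas=(2.0, 5.0)):
    """Blend the sharp image with blurred copies according to |depth - focus|."""
    out = image.copy()
    defocus = np.abs(depth - focus_depth)
    for i, sigma in enumerate(sigmas):
        blurred = gaussian_filter(image, sigma=sigma)
        weight = np.clip(defocus * (i + 1), 0.0, 1.0)   # farther -> heavier blur
        out = (1.0 - weight) * out + weight * blurred
    return out

rng = np.random.default_rng(0)
img = rng.random((64, 64))
depth = np.tile(np.linspace(0, 1, 64), (64, 1))         # toy depth ramp
print(depth_aware_bokeh(img, depth).shape)              # (64, 64)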
31. Improving Community Resiliency and Emergency Response With Artificial Intelligence [PDF] 返回目录
Ben Ortiz, Laura Kahn, Marc Bosch, Philip Bogden, Viveca Pavon-Harr, Onur Savas, Ian McCulloh
Abstract: New crisis response and management approaches that incorporate the latest information technologies are essential in all phases of emergency preparedness and response, including the planning, response, recovery, and assessment phases. Accurate and timely information is as crucial as rapid and coherent coordination among the responding organizations. We are working towards a multipronged emergency response tool that provides stakeholders timely access to comprehensive, relevant, and reliable information. The faster emergency personnel are able to analyze, disseminate, and act on key information, the more effective and timelier their response will be and the greater the benefit to affected populations. Our tool consists of encoding multiple layers of open source geospatial data, including flood risk locations, road network strength, inundation maps that proxy inland flooding, and computer vision semantic segmentation for estimating flooded areas and damaged infrastructure. These data layers are combined and used as input data for machine learning algorithms, such as finding the best evacuation routes before, during and after an emergency, or providing a list of available lodging for first responders in an impacted area. Even though our system could be used in a number of use cases where people are forced from one location to another, we demonstrate its feasibility for the use case of Hurricane Florence in Lumberton, North Carolina.
32. Probabilistic Object Classification using CNN ML-MAP layers [PDF] 返回目录
G. Melotti, C. Premebida, J.J. Bird, D.R. Faria, N. Gonçalves
Abstract: Deep networks are currently the state-of-the-art for sensory perception in autonomous driving and robotics. However, deep models often generate overconfident predictions precluding proper probabilistic interpretation, which we argue is due to the nature of the SoftMax layer. To reduce this overconfidence without compromising classification performance, we introduce a CNN probabilistic approach based on distributions calculated in the network's Logit layer. The approach enables Bayesian inference by means of ML and MAP layers. Experiments with both calibrated and the proposed prediction layers are carried out on object classification using data from the KITTI database. Results are reported for camera ($RGB$) and LiDAR (range-view) modalities, where the new approach shows promising performance compared to SoftMax.
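One way to read the ML/MAP idea, sketched under strong simplifying assumptions: fit a one-dimensional Gaussian to each class's training-time logit, then score test logits by likelihood and class prior rather than trusting the SoftMax output. This is an interpretation for illustration, not the paper's exact layers.

import numpy as np

def fit_logit_gaussians(logits, labels, n_classes):
    """Mean/std of each class's own logit over its training samples."""
    return [(logits[labels == c, c].mean(), logits[labels == c, c].std() + 1e-6)
            for c in range(n_classes)]

def map_posterior(test_logits, stats, priors):
    like = np.stack([np.exp(-0.5 * ((test_logits[:, c] - m) / s) ** 2) / s
                     for c, (m, s) in enumerate(stats)], axis=1)
    post = like * priors                       # MAP: likelihood times class prior
    return post / post.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = rng.normal(size=(200, 3)) + 4.0 * np.eye(3)[labels]   # synthetic logits
stats = fit_logit_gaussians(logits, labels, 3)
priors = np.bincount(labels, minlength=3) / len(labels)
print(map_posterior(logits[:2], stats, priors).round(3))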
33. Hyperspectral Image Super-resolution via Deep Spatio-spectral Convolutional Neural Networks [PDF] 返回目录
Jin-Fan Hu, Ting-Zhu Huang, Liang-Jian Deng, Tai-Xiang Jiang, Gemine Vivone, Jocelyn Chanussot
Abstract: Hyperspectral images are of crucial importance for better understanding the features of different materials. To reach this goal, they rely on a large number of spectral bands. However, this interesting characteristic often comes at the price of reduced spatial resolution compared with traditional multispectral image systems. In order to alleviate this issue, in this work we propose a simple and efficient architecture for deep convolutional neural networks to fuse a low-resolution hyperspectral image (LR-HSI) and a high-resolution multispectral image (HR-MSI), yielding a high-resolution hyperspectral image (HR-HSI). The network is designed to preserve both spatial and spectral information thanks to a two-fold architecture: one part utilizes the HR-HSI at a different scale to get an output with satisfactory spectral preservation; the other applies concepts of multi-resolution analysis to extract high-frequency information, aiming to output high-quality spatial details. Finally, a plain mean squared error loss function is used to measure performance during training. Extensive experiments demonstrate that the proposed network architecture achieves the best performance (both qualitatively and quantitatively) compared with recent state-of-the-art hyperspectral image super-resolution approaches. Moreover, the proposed approach offers other significant advantages, such as better network generalization ability, a limited computational burden, and robustness with respect to the number of training samples.
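A minimal fusion skeleton conveys the data flow (band counts, widths, and the residual connection are assumptions, and the multi-resolution branch is omitted): upsample the LR hyperspectral cube to the MSI grid, concatenate, and let a small CNN predict the HR hyperspectral output.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNet(nn.Module):
    def __init__(self, hsi_bands=31, msi_bands=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(hsi_bands + msi_bands, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, hsi_bands, 3, padding=1),
        )

    def forward(self, lr_hsi, hr_msi):
        up = F.interpolate(lr_hsi, size=hr_msi.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.body(torch.cat([up, hr_msi], dim=1)) + up   # spectral residual

net = FusionNet()
out = net(torch.rand(1, 31, 16, 16), torch.rand(1, 3, 64, 64))
print(out.shape)                                # torch.Size([1, 31, 64, 64])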
34. A Light-Weighted Convolutional Neural Network for Bitemporal SAR Image Change Detection [PDF] 返回目录
Rongfang Wang, Fan Ding, Licheng Jiao, Jia-Wei Chen, Bo Liu, Wenping Ma, Mi Wang
Abstract: Recently, many Convolutional Neural Networks (CNNs) have been successfully employed in bitemporal SAR image change detection. However, most existing networks are too heavy and occupy a large volume of memory for storage and calculation. Motivated by this, in this paper we propose a lightweight neural network to reduce the computational and spatial complexity and to facilitate change detection on edge devices. In the proposed network, we replace normal convolutional layers with bottleneck layers that keep the same number of channels between input and output. Next, we employ dilated convolutional kernels with a few non-zero entries that reduce the running time of the convolutional operators. Compared with a conventional convolutional neural network, our lightweight network is more efficient, with fewer parameters. We verify it on four sets of bitemporal SAR images. The experimental results show that the proposed network obtains better performance than a conventional CNN and generalizes better, especially on challenging datasets with complex scenes.
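The two ingredients named above, channel-preserving bottleneck layers and dilated kernels, combine naturally into one residual block; the widths and dilation rate below are assumptions rather than the paper's exact configuration.

import torch
import torch.nn as nn

class DilatedBottleneck(nn.Module):
    """Residual block that keeps the channel count fixed between input and output."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=dilation,
                      dilation=dilation), nn.ReLU(),   # dilated 3x3: larger receptive field
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, x):
        return torch.relu(self.block(x) + x)

x = torch.randn(1, 32, 64, 64)
print(DilatedBottleneck(32)(x).shape)  # torch.Size([1, 32, 64, 64])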
摘要:近日,许多卷积神经网络(CNN)已经成功地在双颞SAR图像变化检测使用。然而,大多数现有网络的过于沉重和占用的存储和计算大量的内存。这个启发,在本文中,我们提出了一个轻量级的神经网络,以减少计算和空间复杂度,并促进边缘设备上的变化检测。在所提出的网络,我们替换与保持相同数量的输入和输出之间的信道瓶颈层正常卷积层。接下来,我们采用扩张性的卷积核与减少卷积运营商的运行时间几个非零项。与传统的卷积神经网络相比,我们的光加权神经网络将与更少的参数更有效。我们确认我们的光加权神经网络的四套双颞SAR图像。实验结果表明,该网络能够获得比传统的CNN更好的性能,具有较好的推广模式,特别是对复杂场景的具有挑战性的数据集。
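To make the two ingredients concrete, below is a hedged PyTorch sketch of a channel-preserving bottleneck block with a dilated 3x3 convolution. This is a sketch under stated assumptions: `LightBlock`, the reduction factor, and the dilation rate are illustrative, not the paper's configuration.

```python
# Sketch of a light-weight block: squeeze channels, apply a dilated 3x3
# convolution, expand back, and keep input/output channel counts equal.
import torch
import torch.nn as nn

class LightBlock(nn.Module):
    def __init__(self, channels=32, reduction=4, dilation=2):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1),                                  # squeeze
            nn.ReLU(),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),  # dilated 3x3
            nn.ReLU(),
            nn.Conv2d(mid, channels, 1))                                  # expand back

    def forward(self, x):
        return x + self.body(x)   # same channel count in and out

x = torch.randn(1, 32, 64, 64)    # e.g. features from a bitemporal SAR pair
y = LightBlock()(x)
```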
35. Extracting low-dimensional psychological representations from convolutional neural networks [PDF] 返回目录
Aditi Jha, Joshua Peterson, Thomas L. Griffiths
Abstract: Deep neural networks are increasingly being used in cognitive modeling as a means of deriving representations for complex stimuli such as images. While the predictive power of these networks is high, it is often not clear whether they also offer useful explanations of the task at hand. Convolutional neural network representations have been shown to be predictive of human similarity judgments for images after appropriate adaptation. However, these high-dimensional representations are difficult to interpret. Here we present a method for reducing these representations to a low-dimensional space which is still predictive of similarity judgments. We show that these low-dimensional representations also provide insightful explanations of factors underlying human similarity judgments.
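A simple way to see the general idea is to project high-dimensional CNN features down to a few components and check that pairwise similarity is preserved. The sketch below uses PCA and synthetic data purely as stand-ins: the paper learns a reduction predictive of human similarity judgments, not plain PCA.

```python
# Illustrative pipeline (not the authors' exact method): reduce CNN features
# with PCA, then test whether the reduced space still predicts similarity.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
feats = rng.normal(size=(120, 2048))   # stand-in for CNN image features
sim = feats @ feats.T                  # stand-in for human similarity ratings

z = PCA(n_components=10).fit_transform(feats)   # low-dimensional representation
sim_hat = z @ z.T                               # similarity predicted from z

# Correlate predicted and "observed" similarities over the upper triangle.
iu = np.triu_indices(len(feats), k=1)
r = np.corrcoef(sim_hat[iu], sim[iu])[0, 1]
print(f"correlation in 10-D space: {r:.3f}")
```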
36. Enhancing Foreground Boundaries for Medical Image Segmentation [PDF] 返回目录
Dong Yang, Holger Roth, Xiaosong Wang, Ziyue Xu, Andriy Myronenko, Daguang Xu
Abstract: Object segmentation plays an important role in modern medical image analysis, benefiting clinical study, disease diagnosis, and surgery planning. Given the various modalities of medical images, automated or semi-automated segmentation approaches have been used to identify and parse organs, bones, tumors, and other regions of interest (ROIs). However, these contemporary segmentation approaches tend to fail to predict the boundary areas of an ROI because of the fuzzy appearance contrast introduced during the imaging procedure. To further improve the segmentation quality of boundary areas, we propose a boundary enhancement loss that enforces additional constraints when optimizing machine learning models. The proposed loss function is lightweight and easy to implement, without any pre- or post-processing. Our experimental results validate that our loss function is better than, or at least comparable to, other state-of-the-art loss functions in terms of segmentation accuracy.
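One plausible realization of a boundary enhancement term (an assumption; the abstract does not give the exact formulation) is to up-weight the per-pixel loss inside a thin band around the foreground boundary, extracted with a morphological-gradient trick via max pooling:

```python
# Hypothetical boundary-weighted loss: pixels in the band between the dilated
# and eroded masks receive a larger weight (band width and weight are guesses).
import torch
import torch.nn.functional as F

def boundary_weights(mask, width=3, w_boundary=5.0):
    # mask: (N, 1, H, W) binary ground truth, float tensor
    pad = width // 2
    dilated = F.max_pool2d(mask, width, stride=1, padding=pad)
    eroded = 1.0 - F.max_pool2d(1.0 - mask, width, stride=1, padding=pad)
    band = dilated - eroded                    # 1 inside the boundary band
    return 1.0 + (w_boundary - 1.0) * band

logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
w = boundary_weights(target)
loss = (w * F.binary_cross_entropy_with_logits(logits, target,
                                               reduction='none')).mean()
```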
37. Bipartite Distance for Shape-Aware Landmark Detection in Spinal X-Ray Images [PDF] 返回目录
Abdullah-Al-Zubaer Imran, Chao Huang, Hui Tang, Wei Fan, Kenneth M.C. Cheung, Michael To, Zhen Qian, Demetri Terzopoulos
Abstract: Scoliosis is a congenital disease that causes lateral curvature in the spine. Its assessment relies on the identification and localization of vertebrae in spinal X-ray images, conventionally via tedious and time-consuming manual radiographic procedures that are prone to subjectivity and observational variability. Reliability can be improved through the automatic detection and localization of spinal landmarks. To guide a CNN in the learning of spinal shape while detecting landmarks in X-ray images, we propose a novel loss based on a bipartite distance (BPD) measure, and show that it consistently improves landmark detection performance.
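The abstract does not define the bipartite distance precisely, so the following is only a hedged, Chamfer-style reading of a distance between the predicted and ground-truth landmark sets; `bipartite_distance` and the landmark count are illustrative, and the paper's exact BPD may differ.

```python
# Symmetric set distance between two landmark sets: each predicted point is
# matched to its nearest ground-truth point and vice versa.
import torch

def bipartite_distance(pred, gt):
    # pred, gt: (N, K, 2) landmark coordinates
    d = torch.cdist(pred, gt)                  # (N, K, K) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

pred = torch.randn(4, 68, 2, requires_grad=True)  # 68 landmarks per image (illustrative)
gt = torch.randn(4, 68, 2)
loss = bipartite_distance(pred, gt)               # e.g. added to a point-wise loss
loss.backward()
```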
38. Uncertainty Evaluation Metric for Brain Tumour Segmentation [PDF] 返回目录
Raghav Mehta, Angelos Filos, Yarin Gal, Tal Arbel
Abstract: In this paper, we develop a metric designed to assess and rank uncertainty measures for the task of brain tumour sub-tissue segmentation in the BraTS 2019 sub-challenge on uncertainty quantification. The metric is designed to: (1) reward uncertainty measures where high confidence is assigned to correct assertions, and where incorrect assertions are assigned low confidence and (2) penalize measures that have higher percentages of under-confident correct assertions. Here, the workings of the components of the metric are explored based on a number of popular uncertainty measures evaluated on the BraTS 2019 dataset.
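The reward/penalty structure can be sketched with the usual threshold-and-filter protocol: at each uncertainty threshold, the most uncertain voxels are dropped, and one tracks both the Dice on the retained voxels and the fraction of correct assertions that were filtered out. The NumPy demo below uses synthetic data and illustrative thresholds; the actual BraTS metric aggregates these components differently.

```python
# Threshold-and-filter sketch: higher thresholds keep more voxels; a good
# uncertainty measure raises Dice while filtering out few correct voxels.
import numpy as np

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2 * inter / max(a.sum() + b.sum(), 1)

rng = np.random.default_rng(0)
gt = rng.random((64, 64)) > 0.5
pred = gt ^ (rng.random((64, 64)) > 0.9)   # prediction with some errors
unc = rng.random((64, 64))                 # per-voxel uncertainty in [0, 1]

for tau in (0.25, 0.5, 0.75, 1.0):
    keep = unc <= tau                      # retain confident voxels only
    d = dice(pred & keep, gt & keep)
    correct = pred == gt
    filt = (correct & ~keep).sum() / max(correct.sum(), 1)
    print(f"tau={tau:.2f}  Dice={d:.3f}  filtered-correct ratio={filt:.3f}")
```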
39. Joint Total Variation ESTATICS for Robust Multi-Parameter Mapping [PDF] 返回目录
Yaël Balbastre, Mikael Brudfors, Michela Azzarito, Christian Lambert, Martina F. Callaghan, John Ashburner
Abstract: Quantitative magnetic resonance imaging (qMRI) derives tissue-specific parameters -- such as the apparent transverse relaxation rate R2*, the longitudinal relaxation rate R1 and the magnetisation transfer saturation -- that can be compared across sites and scanners and carry important information about the underlying microstructure. The multi-parameter mapping (MPM) protocol takes advantage of multi-echo acquisitions with variable flip angles to extract these parameters in a clinically acceptable scan time. In this context, ESTATICS performs a joint loglinear fit of multiple echo series to extract R2* and multiple extrapolated intercepts, thereby improving robustness to motion and decreasing the variance of the estimators. In this paper, we extend this model in two ways: (1) by introducing a joint total variation (JTV) prior on the intercepts and decay, and (2) by deriving a nonlinear maximum \emph{a posteriori} estimate. We evaluated the proposed algorithm by predicting left-out echoes in a rich single-subject dataset. In this validation, we outperformed other state-of-the-art methods and additionally showed that the proposed approach greatly reduces the variance of the estimated maps, without introducing bias.
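The core ESTATICS step, the joint log-linear fit in which all contrasts share one R2* but keep separate intercepts, is easy to demonstrate for a single voxel. The sketch below omits the paper's contributions (the JTV prior and the nonlinear MAP estimate) and uses illustrative echo times and signal values.

```python
# Minimal joint log-linear ESTATICS fit for one voxel:
# log s_ce = log a_c - R2* * te_e, with R2* shared across contrasts.
import numpy as np

te = np.array([2.3, 4.6, 6.9, 9.2]) * 1e-3       # echo times in seconds (illustrative)
r2s_true, a_true = 40.0, np.array([1.0, 0.7])    # ground truth for the demo
s = a_true[:, None] * np.exp(-r2s_true * te)     # two contrasts x four echoes
s += 0.002 * np.random.default_rng(0).normal(size=s.shape)

# Design matrix: one intercept column per contrast, one shared decay column.
y = np.log(s).ravel()
X = np.zeros((s.size, 3))
X[:4, 0] = 1.0            # intercept of contrast 0
X[4:, 1] = 1.0            # intercept of contrast 1
X[:, 2] = -np.tile(te, 2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("R2* estimate:", coef[2], " intercepts:", np.exp(coef[:2]))
```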