Table of Contents
7. Multi-Scale Profiling of Brain Multigraphs by Eigen-based Cross-Diffusion and Heat Tracing for Brain State Profiling [PDF] Abstract
13. Automatic identification of fossils and abiotic grains during carbonate microfacies analysis using deep convolutional neural networks [PDF] Abstract
15. Dense Forecasting of Wildfire Smoke Particulate Matter Using Sparsity Invariant Convolutional Neural Networks [PDF] Abstract
17. ECOVNet: An Ensemble of Deep Convolutional Neural Networks Based on EfficientNet to Detect COVID-19 From Chest X-rays [PDF] Abstract
22. Eye Movement Feature Classification for Soccer Expertise Identification in Virtual Reality [PDF] Abstract
24. Adversarial Brain Multiplex Prediction From a Single Network for High-Order Connectional Gender-Specific Brain Mapping [PDF] Abstract
26. Generative Modelling of 3D in-silico Spongiosa with Controllable Micro-Structural Parameters [PDF] Abstract
Abstracts
1. Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection [PDF] Back to Contents
Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon
Abstract: A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing. We use knowledge distillation to bridge the gap between a model trained on high-quality inputs at training time and another tested on low-quality inputs at inference time. In particular, we design a two-stage training pipeline for point cloud object detection. First, we train an object detection model on dense point clouds, which are generated from multiple frames using extra information only available at training time. Then, we train the model's identical counterpart on sparse single-frame point clouds with consistency regularization on features from both models. We show that this procedure improves performance on low-quality data during testing, without additional overhead.
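To make the two-stage pipeline concrete, here is a minimal sketch of the second (distillation) stage. It is not the authors' code: the assumption that each detector returns a (features, predictions) pair, the hypothetical detection_loss callable, and the plain MSE consistency term are all stand-ins for illustration.

import torch
import torch.nn.functional as F

def distillation_step(teacher, student, detection_loss, optimizer,
                      dense_points, sparse_points, targets, consistency_weight=1.0):
    """One stage-two training step: the teacher (already trained on dense,
    multi-frame point clouds) stays frozen, while its identical counterpart,
    the student, sees only the sparse single-frame input and is regularized
    toward the teacher's features. Both models are assumed to return a
    (features, predictions) pair."""
    teacher.eval()
    with torch.no_grad():
        teacher_feats, _ = teacher(dense_points)            # high-quality input
    student_feats, predictions = student(sparse_points)      # low-quality input

    task_loss = detection_loss(predictions, targets)          # user-supplied detection loss
    consistency = F.mse_loss(student_feats, teacher_feats)    # feature consistency regularizer
    loss = task_loss + consistency_weight * consistency

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()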
2. Attribute Propagation Network for Graph Zero-shot Learning [PDF] Back to Contents
Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
Abstract: The goal of zero-shot learning (ZSL) is to train a model to classify samples of classes that were not seen during training. To address this challenging task, most ZSL methods relate unseen test classes to seen (training) classes via a pre-defined set of attributes that can describe all classes in the same semantic space, so the knowledge learned on the training classes can be adapted to unseen classes. In this paper, we aim to optimize the attribute space for ZSL by training a propagation mechanism to refine the semantic attributes of each class based on its neighbors and related classes on a graph of classes. We show that the propagated attributes can produce classifiers for zero-shot classes with significantly improved performance in different ZSL settings. The graph of classes is usually free or very cheap to acquire, such as the WordNet or ImageNet class graphs. When the graph is not provided, given pre-defined semantic embeddings of the classes, we can learn a mechanism to generate the graph in an end-to-end manner along with the propagation mechanism. However, this graph-aided technique has not been well explored in the literature. In this paper, we introduce the attribute propagation network (APNet), which is composed of 1) a graph propagation model generating an attribute vector for each class and 2) a parameterized nearest neighbor (NN) classifier categorizing an image to the class whose attribute vector is nearest to the image's embedding. For better generalization over unseen classes, different from previous methods, we adopt a meta-learning strategy to train the propagation mechanism and the similarity metric for the NN classifier on multiple sub-graphs, each associated with a classification task over a subset of training classes. In experiments with two zero-shot learning settings and five benchmark datasets, APNet achieves either compelling performance or new state-of-the-art results.
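A minimal sketch of the two ingredients named in the abstract, under simplifying assumptions of my own: one smoothing step over a row-normalized class adjacency stands in for the learned propagation mechanism, and a temperature-scaled negative-distance softmax stands in for the parameterized nearest-neighbour classifier.

import torch
import torch.nn.functional as F

def propagate_attributes(attributes, adjacency, alpha=0.5):
    """Refine each class's attribute vector with a weighted average of its
    neighbours' attributes on the class graph (a simplified propagation rule,
    not the paper's exact parameterization)."""
    norm_adj = adjacency / adjacency.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return (1 - alpha) * attributes + alpha * norm_adj @ attributes

def nn_classify(image_embeddings, class_attributes, temperature=10.0):
    """Parameterized nearest-neighbour classifier: assign an image to the class
    whose (propagated) attribute vector is closest to the image's embedding."""
    dists = torch.cdist(image_embeddings, class_attributes)   # (n_images, n_classes)
    return F.softmax(-temperature * dists, dim=1)              # class probabilities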
3. Heuristics based Mosaic of Social-Sensor Services for Scene Reconstruction [PDF] Back to Contents
Tooba Aamir, Hai Dong, Athman Bouguettaya
Abstract: We propose a heuristics-based social-sensor cloud service selection and composition model to reconstruct mosaic scenes. The proposed approach leverages crowdsourced social media images to create an image mosaic that reconstructs a scene at a designated location and within an interval of time. The approach relies on a set of features defined on the basis of the image metadata to determine the relevance and composability of services. Novel heuristics are developed to filter out non-relevant services. Multiple machine learning strategies are employed to produce a smooth service composition, resulting in a mosaic of relevant images indexed by geolocation and time. Preliminary analytical results demonstrate the feasibility of the proposed composition model.
4. Cloud Cover Nowcasting with Deep Learning [PDF] Back to Contents
Léa Berthomier, Bruno Pradel, Lior Perez
Abstract: Nowcasting is a field of meteorology which aims at forecasting weather on a short term of up to a few hours. Within the meteorology landscape this field is rather specific, as it requires particular techniques such as data extrapolation, whereas conventional meteorology is generally based on physical modeling. In this paper, we focus on cloud cover nowcasting, which has various application areas such as satellite shot optimisation and photovoltaic energy production forecasting. Following recent deep learning successes on multiple imagery tasks, we applied deep convolutional neural networks to Meteosat satellite images for cloud cover nowcasting. We present the results of several architectures specialized in image segmentation and time series prediction. We selected the best models according to machine learning metrics as well as meteorological metrics. All selected architectures showed significant improvements over the persistence baseline, and the well-known U-Net surpasses the AROME physical model.
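The persistence baseline that the selected architectures are compared against simply assumes the most recent observation does not change; a minimal sketch of that baseline (my reading, not the authors' code):

import numpy as np

def persistence_forecast(past_frames, horizon):
    """Naive nowcasting baseline: the last observed cloud-cover frame is
    assumed to persist unchanged for every future time step."""
    last = past_frames[-1]
    return np.repeat(last[np.newaxis, ...], horizon, axis=0)

# Example: 6 past frames of a 256x256 cloud-cover field, forecast 4 steps ahead.
history = np.random.rand(6, 256, 256)
forecast = persistence_forecast(history, horizon=4)   # shape (4, 256, 256)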
5. Local Context Attention for Salient Object Segmentation [PDF] Back to Contents
Jing Tan, Pengfei Xiong, Yuwen He, Kuntao Xiao, Zhengyi Lv
Abstract: Salient object segmentation aims at distinguishing various salient objects from backgrounds. Despite the lack of semantic consistency, salient objects often have obvious texture and location characteristics in the local area. Based on this prior, we propose a novel Local Context Attention Network (LCANet) to generate locally reinforced feature maps in a uniform representational architecture. The proposed network introduces an Attentional Correlation Filter (ACF) module to generate explicit local attention by calculating the correlation feature map between the coarse prediction and the global context. It is then expanded into a Local Context Block (LCB). Furthermore, a one-stage coarse-to-fine structure is implemented based on the LCB to adaptively enhance the local context description ability. Comprehensive experiments are conducted on several salient object segmentation datasets, demonstrating the superior performance of the proposed LCANet against state-of-the-art methods, especially with 0.883 max F-score and 0.034 MAE on the DUTS-TE dataset.
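One plausible reading of the ACF idea, sketched under assumptions of my own (a 1x1 projection of the context features and element-wise correlation with the coarse prediction); the published module may differ in its details.

import torch
import torch.nn as nn

class AttentionalCorrelationFilter(nn.Module):
    """Rough sketch: correlate a coarse saliency prediction with global context
    features to obtain an explicit local attention map, then re-weight the
    context features with it."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, context_feats, coarse_pred):
        # context_feats: (B, C, H, W); coarse_pred: (B, 1, H, W) coarse saliency map.
        correlated = self.proj(context_feats) * coarse_pred         # spatial correlation
        attention = torch.sigmoid(correlated.mean(dim=1, keepdim=True))
        return context_feats * attention                             # locally re-weighted features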
6. Multi-View Brain HyperConnectome AutoEncoder For Brain State Classification [PDF] Back to Contents
Alin Banka, Inis Buzi, Islem Rekik
Abstract: Graph embedding is a powerful method to represent graph neurological data (e.g., brain connectomes) in a low-dimensional space for brain connectivity mapping, prediction and classification. However, existing embedding algorithms have two major limitations. First, they primarily focus on preserving one-to-one topological relationships between nodes (i.e., regions of interest (ROIs) in a connectome), but they have mostly ignored many-to-many relationships (i.e., set to set), which can be captured using a hyperconnectome structure. Second, existing graph embedding techniques cannot be easily adapted to multi-view graph data with heterogeneous distributions. In this paper, while cross-pollinating adversarial deep learning with hypergraph theory, we aim to jointly learn deep latent embeddings of subject-specific multi-view brain graphs to eventually disentangle different brain states. First, we propose a new simple strategy to build a hyperconnectome for each brain view, based on a nearest-neighbour algorithm, to preserve the connectivities across pairs of ROIs. Second, we design a hyperconnectome autoencoder (HCAE) framework which operates directly on the multi-view hyperconnectomes based on hypergraph convolutional layers to better capture the many-to-many relationships between brain regions (i.e., nodes). For each subject, we further regularize the hypergraph autoencoding by adversarial regularization to align the distribution of the learned hyperconnectome embeddings with that of the input hyperconnectomes. We formalize our hyperconnectome embedding within a geometric deep learning framework to optimize for a given subject, thereby designing an individual-based learning framework. Our experiments showed that the embeddings learned by HCAE yield better results for brain state classification than other deep graph embedding methods.
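A minimal sketch of the nearest-neighbour hyperconnectome construction, assuming each ROI's connectivity profile serves as its feature vector and each ROI spawns one hyperedge containing itself and its k nearest neighbours; this is an illustration of the idea, not the authors' implementation.

import numpy as np

def build_hyperconnectome(connectome, k=3):
    """Build a node-by-hyperedge incidence matrix from one brain view: hyperedge i
    groups ROI i with the k ROIs whose connectivity profiles are closest to it,
    capturing set-to-set (many-to-many) relationships."""
    n_rois = connectome.shape[0]
    incidence = np.zeros((n_rois, n_rois))          # rows: nodes, columns: hyperedges
    for i in range(n_rois):
        dists = np.linalg.norm(connectome - connectome[i], axis=1)
        members = np.argsort(dists)[:k + 1]          # the ROI itself plus k neighbours
        incidence[members, i] = 1.0
    return incidence

# Toy example on a 35-ROI brain view.
H = build_hyperconnectome(np.random.rand(35, 35), k=3)
print(H.shape)   # (35, 35): 35 nodes x 35 hyperedges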
7. Multi-Scale Profiling of Brain Multigraphs by Eigen-based Cross-Diffusion and Heat Tracing for Brain State Profiling [PDF] Back to Contents
Mustafa Saglam, Islem Rekik
Abstract: The individual brain can be viewed as a highly-complex multigraph (i.e. a set of graphs also called connectomes), where each graph represents a unique connectional view of pairwise brain region (node) relationships such as function or morphology. Due to its multifold complexity, understanding how brain disorders alter not only a single view of the brain graph, but its multigraph representation at the individual and population scales, remains one of the most challenging obstacles to profiling brain connectivity for ultimately disentangling a wide spectrum of brain states (e.g., healthy vs. disordered). In this work, while cross-pollinating the fields of spectral graph theory and diffusion models, we unprecedentedly propose an eigen-based cross-diffusion strategy for multigraph brain integration, comparison, and profiling. Specifically, we first devise a brain multigraph fusion model guided by eigenvector centrality to rely on most central nodes in the cross-diffusion process. Next, since the graph spectrum encodes its shape (or geometry) as if one can hear the shape of the graph, for the first time, we profile the fused multigraphs at several diffusion timescales by extracting the compact heat-trace signatures of their corresponding Laplacian matrices. Here, we reveal for the first time autistic and healthy profiles of morphological brain multigraphs, derived from T1-w magnetic resonance imaging (MRI), and demonstrate their discriminability in boosting the classification of unseen samples in comparison with state-of-the-art methods. This study presents the first step towards hearing the shape of the brain multigraph that can be leveraged for profiling and disentangling comorbid neurological disorders, thereby advancing precision medicine.
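The heat-trace signature can be written directly in terms of the Laplacian spectrum: at diffusion timescale t, tr(exp(-t L)) = sum_i exp(-t * lambda_i) over the eigenvalues lambda_i of the graph Laplacian L. A minimal sketch, assuming the fused multigraph is given as a symmetric weighted adjacency matrix:

import numpy as np

def heat_trace_signature(adjacency, timescales=(0.1, 1.0, 5.0, 10.0)):
    """Compact heat-trace signature of a fused graph: for each diffusion
    timescale t, return the trace of the heat kernel exp(-t L), i.e. the sum
    of exp(-t * lambda_i) over the Laplacian eigenvalues."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigenvalues = np.linalg.eigvalsh(laplacian)      # spectrum ("shape") of the graph
    return np.array([np.exp(-t * eigenvalues).sum() for t in timescales])

# Toy example on a symmetric 35-node fused graph.
A = np.random.rand(35, 35)
A = (A + A.T) / 2
print(heat_trace_signature(A))   # one scalar per diffusion timescale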
8. MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection [PDF] Back to Contents
Xin Lu, Quanquan Li, Buyu Li, Junjie Yan
Abstract: Modern object detection methods can be divided into one-stage approaches and two-stage ones. One-stage detectors are more efficient owing to their straightforward architectures, but two-stage detectors still take the lead in accuracy. Although recent works try to improve one-stage detectors by imitating the structural design of two-stage ones, the accuracy gap is still significant. In this paper, we propose MimicDet, a novel and efficient framework to train a one-stage detector by directly mimicking the two-stage features, aiming to bridge the accuracy gap between one-stage and two-stage detectors. Unlike conventional mimic methods, MimicDet has a shared backbone for the one-stage and two-stage detectors; it then branches into two heads which are well designed to have compatible features for mimicking. Thus MimicDet can be trained end-to-end without pre-training the teacher network. Moreover, the cost does not increase much, which makes it practical to adopt large networks as backbones. We also make several specialized designs, such as dual-path mimicking and a staggered feature pyramid, to facilitate the mimicking process. Experiments on the challenging COCO detection benchmark demonstrate the effectiveness of MimicDet. It achieves 46.1 mAP with a ResNeXt-101 backbone on the COCO test-dev set, which significantly surpasses current state-of-the-art methods.
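A hedged sketch of the feature-mimicking term alone; the normalized-MSE form and the detached two-stage branch are assumptions for illustration, not the paper's exact loss. In training, this term would be added to the usual detection losses of both heads.

import torch.nn.functional as F

def mimic_loss(one_stage_feats, two_stage_feats):
    """Push the one-stage head's features toward the features produced by the
    two-stage head on the same shared-backbone image."""
    s = F.normalize(one_stage_feats.flatten(1), dim=1)
    t = F.normalize(two_stage_feats.flatten(1), dim=1).detach()   # mimic the two-stage branch
    return F.mse_loss(s, t)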
9. Understanding Fairness of Gender Classification Algorithms Across Gender-Race Groups [PDF] Back to Contents
Anoop Krishnan, Ali Almadan, Ajita Rattani
Abstract: Automated gender classification has important applications in many domains, such as demographic research, law enforcement, online advertising, as well as human-computer interaction. Recent research has questioned the fairness of this technology across gender and race. Specifically, the majority of the studies raised concerns about the higher error rates of face-based gender classification systems for darker-skinned people, such as African-Americans, and for women. However, to date, the majority of existing studies were limited to African-American and Caucasian subjects only. The aim of this paper is to investigate the differential performance of gender classification algorithms across gender-race groups. To this aim, we investigate the impact of (a) architectural differences in the deep learning algorithms and (b) training set imbalance, as potential sources of bias causing differential performance across gender and race. Experimental investigations are conducted on two of the latest large-scale publicly available facial attribute datasets, namely, UTKFace and FairFace. The experimental results suggested that algorithms with architectural differences varied in performance, but consistently so towards specific gender-race groups. For instance, for all the algorithms used, Black females (and the Black race in general) always obtained the lowest accuracy rates. Middle Eastern males and Latino females obtained higher accuracy rates most of the time. Training set imbalance further widens the gap in the unequal accuracy rates across all gender-race groups. Further investigations using facial landmarks suggested that facial morphological differences, due to bone structure influenced by genetic and environmental factors, could be the cause of the lowest performance for Black females and the Black race in general.
10. BWCFace: Open-set Face Recognition using Body-worn Camera [PDF] Back to Contents
Ali Almadan, Anoop Krishnan, Ajita Rattani
Abstract: With computer vision reaching an inflection point in the past decade, face recognition technology has become pervasive in policing, intelligence gathering, and consumer applications. Recently, face recognition technology has been deployed on body-worn cameras to keep officers safe, enabling situational awareness and providing evidence for trial. However, limited academic research has been conducted on this topic, using traditional techniques on datasets with small sample sizes. This paper aims to bridge the gap in state-of-the-art face recognition using body-worn cameras (BWC). To this aim, the contribution of this work is two-fold: (1) collection of a dataset called BWCFace consisting of a total of 178K facial images of 132 subjects captured using a body-worn camera in indoor and daylight conditions, and (2) open-set evaluation of the latest deep-learning-based Convolutional Neural Network (CNN) architectures combined with five different loss functions for face identification on the collected dataset. Experimental results on our BWCFace dataset suggest a maximum Rank-1 accuracy of 33.89% when facial features are extracted using SENet-50 trained on the large-scale VGGFace2 facial image dataset. However, performance improved up to a maximum of 99.00% Rank-1 accuracy when pretrained CNN models were fine-tuned on a subset of identities in our BWCFace dataset. Equivalent performances were obtained across the body-worn camera sensor models used in existing face datasets. The collected BWCFace dataset and the pretrained/fine-tuned algorithms are publicly available to promote further research and development in this area. A downloadable link for this dataset and the algorithms is available by contacting the authors.
11. 3D Object Localization Using 2D Estimates for Computer Vision Applications [PDF] Back to Contents
Taha Hasan Masood Siddique, Muhammad Usman
Abstract: A technique for object localization based on pose estimation and camera calibration is presented. The 3-dimensional (3D) coordinates are estimated by collecting multiple 2-dimensional (2D) images of the object and are utilized for the calibration of the camera. The calibration steps, involving a number of parameter calculations including intrinsic and extrinsic parameters for the removal of lens distortion, computation of the object's size, and calculation of the camera's position, are discussed. A transformation strategy to estimate the 3D pose using the 2D images is presented. The proposed method is implemented in MATLAB, and validation experiments are carried out for both pose estimation and camera calibration.
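A generic OpenCV sketch in the spirit of the described pipeline (intrinsic calibration from multiple views of a known pattern, then object pose from 2D estimates); the paper's implementation is in MATLAB, so the function and argument names here are illustrative only.

import cv2

def calibrate_and_localize(pattern_3d_points, pattern_2d_points, image_size,
                           object_3d_model, object_2d_points):
    """Estimate camera intrinsics and lens distortion from several calibration
    views, then recover the object's rotation and translation (extrinsics)
    relative to the camera from its 2D image points."""
    _, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
        pattern_3d_points, pattern_2d_points, image_size, None, None)
    _, rvec, tvec = cv2.solvePnP(object_3d_model, object_2d_points,
                                 camera_matrix, dist_coeffs)
    rotation, _ = cv2.Rodrigues(rvec)                 # rotation vector -> 3x3 matrix
    return camera_matrix, dist_coeffs, rotation, tvec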
12. Unifying data for fine-grained visual species classification [PDF] Back to Contents
Sayali Kulkarni, Tomer Gadot, Chen Luo, Tanya Birch, Eric Fegraus
Abstract: Wildlife monitoring is crucial to nature conservation and has been done by manual observations from motion-triggered camera traps deployed in the field. Widespread adoption of such in-situ sensors has resulted in unprecedented data volumes being collected over the last decade. A significant challenge is to efficiently process these images and reliably identify what is in them. Advances in computer vision are poised to provide effective solutions, with custom AI models built to automatically identify images of interest and label the species in them. Here we outline the data unification effort for the Wildlife Insights platform from various conservation partners, and the challenges involved. Then we present an initial deep convolutional neural network model, trained on 2.9M images across 465 fine-grained species, with the goal of reducing the load on human experts who manually classify species in images. The long-term goal is to enable scientists to make conservation recommendations from near real-time analysis of species abundance and population health.
13. Automatic identification of fossils and abiotic grains during carbonate microfacies analysis using deep convolutional neural networks [PDF] Back to Contents
Xiaokang Liu, Haijun Song
Abstract: Petrographic analysis based on microfacies identification in thin sections is widely used in sedimentary environment interpretation and paleoecological reconstruction. Fossil recognition from microfacies is an essential procedure for petrographers to complete this task. Distinguishing the morphological and microstructural diversity of skeletal fragments requires extensive prior knowledge of fossil morphotypes in microfacies and long training sessions under the microscope. This requirement engenders certain challenges for sedimentologists and paleontologists, especially novices. However, a machine classifier can help address this challenge. We collected a microfacies image dataset comprising both public data from 1,149 references and our own materials (including a total of 30,815 images of 22 fossil and abiotic grain groups). We employed a high-performance workstation to implement four classic deep convolutional neural networks (DCNNs), which have proven to be highly efficient in computer vision over the last several years. Our framework uses a transfer learning technique, which reuses parameters pre-trained on the larger ImageNet dataset as initialization for the network, achieving high accuracy at low computing cost. We obtained top-1 and top-3 test accuracies of up to 95% and 99%, respectively, with the Inception-ResNet v2 architecture. The machine classifier exhibited 0.99 precision on minerals such as dolomite and pyrite. Although it had some difficulty on samples with similar morphologies, such as bivalves, brachiopods, and ostracods, it nevertheless obtained 0.88 precision. Our machine learning framework demonstrated high accuracy, with reproducibility and bias avoidance comparable to those of human classifiers. Its application can thus eliminate much of the tedious, manually intensive effort by human experts conducting routine identification.
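A minimal transfer-learning sketch of the described setup, with a torchvision ResNet-50 standing in for the Inception-ResNet v2 used in the paper; the frozen-backbone policy and single linear head are assumptions for illustration.

import torch.nn as nn
from torchvision import models

def build_grain_classifier(num_classes=22, freeze_backbone=True):
    """Reuse ImageNet pre-trained weights as initialization and retrain only a
    customized top layer for the 22 fossil and abiotic grain groups."""
    backbone = models.resnet50(pretrained=True)   # older torchvision API; newer versions use weights=...
    if freeze_backbone:
        for param in backbone.parameters():
            param.requires_grad = False            # fine-tune only the new head
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone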
14. FTN: Foreground-Guided Texture-Focused Person Re-Identification [PDF] Back to Contents
Donghaisheng Liu, Shoudong Han, Yang Chen, Chenfei Xia, Jun Zhao
Abstract: Person re-identification (Re-ID) is a challenging task, as persons are often in different backgrounds. Most recent Re-ID methods treat the foreground and background information equally for person discriminative learning, but this can easily lead to potential false alarm problems when different persons are in similar backgrounds or the same person is in different backgrounds. In this paper, we propose a Foreground-Guided Texture-Focused Network (FTN) for Re-ID, which can weaken the representation of the unrelated background and highlight person-related attributes in an end-to-end manner. FTN consists of a semantic encoder (S-Enc) and a compact foreground attention module (CFA) for the Re-ID task, and a texture-focused decoder (TF-Dec) for the reconstruction task. In particular, we build a foreground-guided semi-supervised learning strategy for TF-Dec, because the reconstruction ground-truths are simply the inputs of FTN weighted by the Gaussian mask and the attention mask generated by CFA. Moreover, a new gradient loss is introduced to encourage the network to mine the texture consistency between the inputs and the reconstructed outputs. Our FTN is computationally efficient, and extensive experiments on three commonly used datasets, Market1501, CUHK03 and MSMT17, demonstrate that the proposed method performs favorably against state-of-the-art methods.
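The abstract does not spell out the gradient loss; a common formulation, assumed here for illustration, compares finite-difference image gradients of the reconstruction and its target under an L1 penalty.

import torch.nn.functional as F

def gradient_loss(reconstruction, target):
    """Texture-consistency term: penalize differences between the horizontal and
    vertical finite-difference gradients of the reconstructed output and of the
    (masked) input it should match."""
    dx_r = reconstruction[:, :, :, 1:] - reconstruction[:, :, :, :-1]
    dy_r = reconstruction[:, :, 1:, :] - reconstruction[:, :, :-1, :]
    dx_t = target[:, :, :, 1:] - target[:, :, :, :-1]
    dy_t = target[:, :, 1:, :] - target[:, :, :-1, :]
    return F.l1_loss(dx_r, dx_t) + F.l1_loss(dy_r, dy_t)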
15. Dense Forecasting of Wildfire Smoke Particulate Matter Using Sparsity Invariant Convolutional Neural Networks [PDF] Back to Contents
Renhao Wang, Ashutosh Bhudia, Brandon Dos Remedios, Minnie Teng, Raymond Ng
Abstract: Accurate forecasts of fine particulate matter (PM 2.5) from wildfire smoke are crucial to safeguarding cardiopulmonary public health. Existing forecasting systems are trained on sparse and inaccurate ground truths, and do not take sufficient advantage of important spatial inductive biases. In this work, we present a convolutional neural network which preserves sparsity invariance throughout, and leverages multitask learning to perform dense forecasts of PM 2.5 values. We demonstrate that our model outperforms two existing smoke forecasting systems during the 2018 and 2019 wildfire seasons in British Columbia, Canada, predicting PM 2.5 at a grid resolution of 10 km, 24 hours in advance, with high fidelity. Most interestingly, our model also generalizes to meaningful smoke dispersion patterns despite training with irregularly distributed ground truth PM 2.5 values available in only 0.5% of grid cells.
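A sketch of a sparsity-invariant convolution layer in the style the title refers to: only observed grid cells contribute, the response is renormalized by the number of valid cells under the kernel, and the validity mask is propagated by max pooling. Bias handling and other details are simplified relative to full implementations.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparsityInvariantConv2d(nn.Module):
    """Convolution over sparse observations: zeros from missing PM 2.5 readings
    are excluded from the average so they do not bias the output toward zero."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.mask_pool = nn.MaxPool2d(kernel_size, stride=1, padding=padding)
        self.kernel_size = kernel_size

    def forward(self, x, mask):
        # mask: (B, 1, H, W), 1 where an observation exists, 0 elsewhere.
        features = self.conv(x * mask)
        ones = torch.ones(1, 1, self.kernel_size, self.kernel_size, device=mask.device)
        valid = F.conv2d(mask, ones, padding=self.kernel_size // 2)   # valid cells per window
        features = features / valid.clamp(min=1e-5)                    # renormalize
        return features, self.mask_pool(mask)                          # propagate the mask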
16. Insights on Evaluation of Camera Re-localization Using Relative Pose Regression [PDF] Back to Contents
Amir Shalev, Omer Achrack, Brian Fulkerson, Ben-Zion Bobrovsky
Abstract: We consider the problem of relative pose regression in visual relocalization. Recently, several promising approaches have emerged in this area. We claim that even though they demonstrate on the same datasets using the same train/test split, a faithful comparison between them has not been available, since under the currently used evaluation metric some approaches might appear to perform favorably while in reality performing worse. We reveal a tradeoff between accuracy and the 3D volume of the regressed subspace. We believe that unlike other relocalization approaches, in the case of relative pose regression the regressed subspace 3D volume is less dependent on the scene and more affected by the method used to score the overlap, which determines how closely viewpoints are sampled. We propose three new metrics to remedy the issue mentioned above. The proposed metrics incorporate statistics about the regression subspace volume. We also propose a new pose regression network that serves as a new baseline for this task. We compare the performance of our trained model on the Microsoft 7-Scenes and Cambridge Landmarks datasets with both the standard metrics and the newly proposed metrics, and adjust the overlap score to reveal the tradeoff between the subspace and performance. The results show that the proposed metrics are more robust to different overlap thresholds than the conventional approaches. Finally, we show that our network generalizes well; specifically, training on a single scene leads to little loss of performance on the other scenes.
摘要:我们认为相对位姿回归的视觉重新定位的问题。最近,几个有前途的方法已经出现在这一领域。我们主张,即使他们证明上使用相同的分裂训练和测试相同的数据集,它们之间的忠实比较是不是因为在当前采用的评估指标可用,一些方法可能顺利地执行,而在现实中效果较差。我们揭示准确性和回归的子空间的三维体积之间的权衡。我们相信,不像其他方法重新定位,在相对位姿回归的情况下,回归的子空间三维体积较少依赖于现场和用来评定重叠的方法,其中确定密切采样观点如何更影响。我们提出了三个新的指标来解决上述问题。所提出的指标纳入对回归子空间量统计信息。我们还提出了一种新的姿态回归网络作为完成这个任务,新的基准。我们比较了微软7花絮我们的训练模型的性能和剑桥地标数据集都与标准指标和新提出的指标,并调整重叠得分揭示子空间和性能之间的权衡。结果表明,所提出的度量是更健壮到比传统方法不同的重叠阈值。最后,我们表明,我们的网络推广顺利,特别是,在一个场景引线训练对其它场景的性能几乎没有损失。
17. ECOVNet: An Ensemble of Deep Convolutional Neural Networks Based on EfficientNet to Detect COVID-19 From Chest X-rays [PDF] 返回目录
Nihad Karim Chowdhury, Md. Muhtadir Rahman, Noortaz Rezoana, Muhammad Ashad Kabir
Abstract: This paper proposes an ensemble of deep convolutional neural networks (CNNs) based on EfficientNet, named ECOVNet, to detect COVID-19 using a large chest X-ray data set. First, the open-access large chest X-ray collection is augmented; then ImageNet pre-trained weights for EfficientNet are transferred together with some customized top layers that are fine-tuned, followed by an ensemble of model snapshots to classify chest X-rays corresponding to COVID-19, normal, and pneumonia. The predictions of the model snapshots, which are created during a single training run, are combined through two ensemble strategies, i.e., hard ensemble and soft ensemble, to improve classification performance and generalization in the task of classifying chest X-rays.
摘要:本文提出了一种基于EfficientNet深卷积神经网络(CNN),命名ECOVNet,的合奏使用大胸部X射线数据组,以检测COVID-19。首先,开放接入大胸部X线收集被增大,然后ImageNet预先训练权重EfficientNet已与被训练一些定制微调顶层转移,随后模型快照的分类胸部X合奏射线对应于COVID-19,正常,和肺炎。模型快照,其被一个单一的训练期间产生的预测,通过两个合奏策略,即,硬合奏和软合奏在胸部X光进行分类的相关任务改善分类性能和泛化组合。
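The abstract names two snapshot-combination strategies but does not spell them out. As a hedged sketch only: soft ensembling is commonly taken to mean averaging the snapshots' predicted probabilities, and hard ensembling a majority vote over their argmax predictions. The three classes below (COVID-19 / normal / pneumonia) come from the abstract; everything else is an illustrative assumption.

```python
# Soft vs. hard ensembling of model snapshots (illustrative sketch, not the paper's code).
import numpy as np

def soft_ensemble(snapshot_probs):
    # snapshot_probs: (num_snapshots, num_samples, num_classes) predicted probabilities.
    # Average the probabilities across snapshots, then pick the most likely class.
    return np.mean(snapshot_probs, axis=0).argmax(axis=-1)

def hard_ensemble(snapshot_probs):
    # Each snapshot votes with its argmax; the class with the most votes wins.
    votes = snapshot_probs.argmax(axis=-1)                               # (num_snapshots, num_samples)
    num_classes = snapshot_probs.shape[-1]
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=num_classes), 0, votes)      # (num_classes, num_samples)
    return counts.argmax(axis=0)

# Example: 5 snapshots, 4 test images, 3 classes (COVID-19, normal, pneumonia).
probs = np.random.dirichlet(np.ones(3), size=(5, 4))
print(soft_ensemble(probs), hard_ensemble(probs))
```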
18. How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [PDF] 返回目录
Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
Abstract: We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while multilayer perceptrons (MLPs) do not extrapolate well in simple tasks, Graph Neural Networks (GNNs), structured networks with MLP modules, have some success in more complex tasks. We provide a theoretical explanation and identify conditions under which MLPs and GNNs extrapolate well. We start by showing that ReLU MLPs trained by gradient descent converge quickly to linear functions along any direction from the origin, which suggests ReLU MLPs cannot extrapolate well in most non-linear tasks. On the other hand, ReLU MLPs can provably converge to a linear target function when the training distribution is "diverse" enough. These observations lead to a hypothesis: GNNs can extrapolate well in dynamic programming (DP) tasks if we encode appropriate non-linearity in the architecture and input representation. We provide theoretical and empirical support for the hypothesis. Our theory explains previous extrapolation successes and suggests their limitations: successful extrapolation relies on incorporating task-specific non-linearity, which often requires domain knowledge or extensive model search.
摘要:我们研究网络通过梯度下降外推,即他们学习培训外分布的支持培训的神经如何。以前的作品与神经网络推断报告时混合实证结果:在多层感知器(的MLP)的简单任务做无法推断很好,图表神经网络(GNNS),结构化网络与MLP模块,在更复杂的任务,取得了一些成功。我们提供了理论上的解释,并确定其下的MLP和GNNS推断良好的条件。首先,我们通过展示RELU的MLP迅速由训练有素的梯度下降收敛到线性函数沿着从原点,这表明RELU的MLP不能外推于大多数非线性任务的任何方向。在另一方面,RELU业主有限合伙制可以可证明收敛到一个线性目标函数当训练分布是“多元化”够了。这些观察导致一种假设:如果我们在编码的结构和输入表示适当的非线性GNNS可以推断以及动态规划(DP)的任务。我们提供的假设理论和实证支持。我们的理论解释了先前的推断成功,并建议其局限性:成功推依赖结合特定任务的非线性,这往往需要领域知识或粗放型搜索。
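The claim that ReLU MLPs become almost linear away from the training support is easy to probe on a toy problem (this demo is not from the paper): fit y = x^2 on [-1, 1], then check that the network's second finite difference far from the origin is close to zero, unlike the target's.

```python
# Toy illustration (illustrative assumption, not the paper's experiment) of near-linear
# extrapolation by a trained ReLU MLP.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = x ** 2                                         # non-linear target on the training support

mlp = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    ((mlp(x) - y) ** 2).mean().backward()
    opt.step()

# Far outside [-1, 1] the prediction should be roughly affine in x: its second finite
# difference should be near zero, whereas the target's is 2.0 (step size 1).
far = torch.tensor([[10.0], [11.0], [12.0]])
with torch.no_grad():
    p = mlp(far).squeeze()
print(f"second difference of MLP at x = 10..12: {(p[2] - 2 * p[1] + p[0]).item():.4f} (target: 2.0)")
```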
19. A Gradient Flow Framework For Analyzing Network Pruning [PDF] 返回目录
Ekdeep Singh Lubana, Robert P. Dick
Abstract: Recent network pruning methods focus on pruning models early-on in training. To estimate the impact of removing a parameter, these methods use importance measures that were originally designed for pruning trained models. Despite lacking justification for their use early-on in training, models pruned using such measures result in surprisingly minimal accuracy loss. To better explain this behavior, we develop a general, gradient-flow based framework that relates state-of-the-art importance measures through an order of time-derivative of the norm of model parameters. We use this framework to determine the relationship between pruning measures and evolution of model parameters, establishing several findings related to pruning models early-on in training: (i) magnitude-based pruning removes parameters that contribute least to reduction in loss, resulting in models that converge faster than magnitude-agnostic methods; (ii) loss-preservation based pruning preserves first-order model evolution dynamics and is well-motivated for pruning minimally trained models; and (iii) gradient-norm based pruning affects second-order model evolution dynamics, and increasing gradient norm via pruning can produce poorly performing models. We validate our claims on several VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10 and CIFAR-100. Code available at this https URL.
摘要:最近网络修剪方法集中在早期的修剪模型训练。为了估计移除参数的影响,这些方法使用最初设计用于修剪训练的模型重要措施。尽管缺乏理由,其使用早期的训练,模型中使用这种措施导致令人惊讶的最小精度损失修剪。为了更好地解释这种行为,我们开发了一般,梯度流基础的框架,它通过时间微分模型参数的范数的顺序涉及的国家的最先进的重要性的措施。我们用这个框架来确定修剪措施和模型参数变化之间的关系,建立相关的培训早在修剪模型几个结论:(I)级为基础的修剪,为减少损失作出贡献至少移除了参数,导致模型该收敛比的大小无关的方法快; (二)损失保全基于修剪蜜饯阶模型演化动力学和非常激励的修剪微创训练的模型;和(iii)梯度范数基于修剪影响二阶模型进化动力学,并通过修剪可以产生效果不佳的模型增加梯度范数。我们验证几个VGG-13,MobileNet-V1,和RESNET-56机型培训了CIFAR-10和CIFAR-100我们的要求。代码可在此HTTPS URL。
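The abstract contrasts magnitude-based, loss-preservation based, and gradient-norm based importance measures. The paper defines them within its gradient-flow framework; the sketch below only shows the commonly used per-parameter forms of these three families (magnitude |θ|, SNIP-style |θ·∂L/∂θ|, and a GraSP-style score built from a Hessian-vector product). Function and variable names are illustrative, and the paper's exact definitions may differ.

```python
# Hedged sketch of three families of pruning importance scores (assumed PyTorch).
import torch

def importance_scores(model, loss):
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Hessian-vector product H g via double backprop: d/dtheta (0.5 * ||g||^2) = H g.
    half_grad_norm_sq = 0.5 * sum((g * g).sum() for g in grads)
    hvp = torch.autograd.grad(half_grad_norm_sq, params)

    scores = {"magnitude": [], "loss_preservation": [], "grad_norm": []}
    for p, g, hg in zip(params, grads, hvp):
        scores["magnitude"].append(p.detach().abs())                  # |theta|
        scores["loss_preservation"].append((p * g).detach().abs())    # |theta * dL/dtheta| (SNIP-style)
        scores["grad_norm"].append((p * hg).detach().abs())           # |theta * (H g)|     (GraSP-style)
    return scores
```

Under each criterion, the parameters with the lowest scores would be the pruning candidates.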
20. Learning Graph Normalization for Graph Neural Networks [PDF] 返回目录
Yihao Chen, Xin Tang, Xianbiao Qi, Chun-Guang Li, Rong Xiao
Abstract: Graph Neural Networks (GNNs) have attracted considerable attention and have emerged as a new promising paradigm to process graph-structured data. GNNs are usually stacked into multiple layers, and the node representations in each layer are computed through propagating and aggregating the neighboring node features with respect to the graph. By stacking multiple layers, GNNs are able to capture the long-range dependencies among the data on the graph and thus bring performance improvements. To train a GNN with multiple layers effectively, some normalization techniques (e.g., node-wise normalization, batch-wise normalization) are necessary. However, the normalization techniques for GNNs are highly task-relevant, and different application tasks prefer different normalization techniques, which is hard to know in advance. To tackle this deficiency, in this paper, we propose to learn graph normalization by optimizing a weighted combination of normalization techniques at four different levels, including node-wise normalization, adjacency-wise normalization, graph-wise normalization, and batch-wise normalization, in which the adjacency-wise normalization and the graph-wise normalization are newly proposed in this paper to take into account the local structure and the global structure on the graph, respectively. By learning the optimal weights, we are able to automatically select a single best or a best combination of multiple normalizations for a specific task. We conduct extensive experiments on benchmark datasets for different tasks, including node classification, link prediction, graph classification and graph regression, and confirm that the learned graph normalization leads to competitive results and that the learned weights suggest the appropriate normalization techniques for the specific task. Source code is released at this https URL.
摘要:图形神经网络(GNNS)已经吸引了相当多的关注,并已成为一种新的有前途的范式来处理图形的结构化数据。 GNNS通常堆叠多个层并且各层中的节点表示通过传播并且相对于所述图形聚集相邻节点特征来计算。通过层叠多个层,GNNS能够捕获图形上的数据,并且因此带来的性能改进中长距离的依赖关系。为了有效地训练具有多层的GNN,一些标准化技术(例如,节点逐归一化,分批正常化)是必要的。然而,对于GNNS标准化技术是高度与任务相关的和不同的应用任务更喜欢不同的标准化技术,这是很难提前知道。为了解决这一缺陷,在本文中,我们提出了通过优化的标准化技术的加权组合在四个不同的级别,包括节点明智的正常化,邻接明智的正常化,图表明智的正常化,并分批正常化学习曲线正常化,其中邻接明智的正常化和图形明智正常化本文新近提出分别考虑局部结构和全局结构的图。通过学习的最优权重,我们能够自动选择最佳的单一或多重归一特定任务的最佳组合。我们进行的基准数据集为不同的任务,包括节点分类,链接预测,图形分类和图形回归,并确认了广泛的实验是学习曲线正常化导致的竞争结果和学习的权重提出了具体任务适当的规范化技术。源代码是这里这HTTPS URL释放。
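The core idea of the abstract, combining several normalization operators with learned weights, can be sketched as follows. The operators shown (node-wise, graph-wise, batch-wise) are simplified stand-ins and the adjacency-wise variant would additionally need the graph structure, so this is only an assumption-laden illustration, not the paper's module.

```python
# Hedged sketch of a learned weighted combination of normalization operators.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedGraphNorm(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.batch_norm = nn.BatchNorm1d(dim)
        # One learnable logit per candidate normalizer; softmax turns them into mixture weights.
        self.weights = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        # x: (num_nodes, dim) node features of a single graph. With only one graph in the
        # batch, graph-wise and batch-wise statistics coincide apart from BatchNorm's
        # affine parameters and running statistics.
        node_wise = (x - x.mean(dim=1, keepdim=True)) / (x.std(dim=1, keepdim=True) + self.eps)
        graph_wise = (x - x.mean(dim=0, keepdim=True)) / (x.std(dim=0, keepdim=True) + self.eps)
        batch_wise = self.batch_norm(x)
        w = F.softmax(self.weights, dim=0)
        return w[0] * node_wise + w[1] * graph_wise + w[2] * batch_wise

# Example: 5 nodes with 8-dimensional features.
print(LearnedGraphNorm(8)(torch.randn(5, 8)).shape)
```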
21. Interpreting and Boosting Dropout from a Game-Theoretic View [PDF] 返回目录
Hao Zhang, Sen Li, Yinchao Ma, Mingjie Li, Yichen Xie, Quanshi Zhang
Abstract: This paper aims to understand and improve the utility of the dropout operation from the perspective of game-theoretic interactions. We prove that dropout can suppress the strength of interactions between input variables of deep neural networks (DNNs). The theoretical proof is also verified by various experiments. Furthermore, we find that such interactions are strongly related to the over-fitting problem in deep learning. Thus, the utility of dropout can be regarded as decreasing interactions, thereby alleviating over-fitting. Based on this understanding, we propose an interaction loss to further improve the utility of dropout. Experimental results have shown that the interaction loss can effectively improve the utility of dropout and boost the performance of DNNs.
摘要:本文旨在了解并从博弈论相互作用的角度提高差操作的效用。我们证明了辍学能抑制深层神经网络(DNNs)的输入变量之间相互作用的强度。理论证明也通过各种实验验证。此外,我们发现,这种相互作用是密切相关的深学习过拟合问题。因此,压差的效用可视为交互下降到减轻过拟合的意义。基于这种认识,我们提出了一个互动的损失,进一步提高辍学的效用。实验结果表明,相互作用的损失可以有效地提高辍学的效用,促进DNNs的性能。
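For intuition about what an "interaction between input variables" means here, a common masking-based estimate (a simplification; the paper uses a game-theoretic, Shapley-style definition) is the average of f(S∪{i,j}) − f(S∪{i}) − f(S∪{j}) + f(S) over random contexts S, with unused variables set to a baseline value. The sketch below is illustrative only.

```python
# Hedged sketch of a pairwise interaction estimate between input variables i and j.
import torch

def interaction(f, x, i, j, baseline=None, num_contexts=32):
    # f: model mapping a (1, d) tensor to a scalar; x: (d,) input; i, j: variable indices.
    d = x.numel()
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total = 0.0
    for _ in range(num_contexts):
        keep = torch.rand(d) < 0.5                    # random context S over the other variables
        keep[i], keep[j] = False, False

        def masked(extra):
            m = keep.clone()
            for k in extra:
                m[k] = True
            return torch.where(m, x, baseline).unsqueeze(0)

        total += (f(masked({i, j})) - f(masked({i})) - f(masked({j})) + f(masked(set()))).item()
    return total / num_contexts
```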
22. Eye Movement Feature Classification for Soccer Expertise Identification in Virtual Reality [PDF] 返回目录
Benedikt Hosp, Florian Schultz, Enkelejda Kasneci, Oliver Höner
Abstract: Recent research on expertise assessment of soccer players has emphasized the importance of perceptual skills. Prior work focused either on high experimental control or on natural presentation. To assess the perceptual skills of athletes in an optimized manner, we captured omnidirectional in-field scenes and showed them on virtual reality glasses to 12 expert, 9 intermediate, and 13 novice soccer goalkeepers. All scenes were shown from the same natural goalkeeper perspective and ended after the return pass to the goalkeeper. Based on their responses and gaze behavior, we classified their expertise with common machine learning techniques. This pilot study shows promising results for the objective classification of goalkeepers' expertise based on their gaze behaviour.
摘要:足球运动员的专业评估最新的研究明显的感知能力的重要性。以往的研究要么专注于高实验控制或自然呈现方式。为了评估运动员感知技能,以优化的方式,我们捕获全向在现场的场景,显示出12个专家,9中间体并且从足球虚拟现实眼镜13名新手守门员。其中来自同一自然门将的角度显示和返回后结束所有场景传给门将。根据他们的回答和凝视行为,我们归类它们与普通机器学习技术的专业知识。这项初步研究显示有前途的基于他们的目光行为门将的专业客观的分类结果。
23. Residual Feature Distillation Network for Lightweight Image Super-Resolution [PDF] 返回目录
Jie Liu, Jie Tang, Gangshan Wu
Abstract: Recent advances in single image super-resolution (SISR) explored the power of convolutional neural networks (CNNs) to achieve better performance. Despite the great success of CNN-based methods, it is not easy to apply these methods to edge devices due to the requirement of heavy computation. To solve this problem, various fast and lightweight CNN models have been proposed. The information distillation network is one of the state-of-the-art methods, which adopts the channel splitting operation to extract distilled features. However, it is not clear enough how this operation helps in the design of efficient SISR models. In this paper, we propose the feature distillation connection (FDC) that is functionally equivalent to the channel splitting operation while being more lightweight and flexible. Thanks to FDC, we can rethink the information multi-distillation network (IMDN) and propose a lightweight and accurate SISR model called residual feature distillation network (RFDN). RFDN uses multiple feature distillation connections to learn more discriminative feature representations. We also propose a shallow residual block (SRB) as the main building block of RFDN so that the network can benefit most from residual learning while still being lightweight enough. Extensive experimental results show that the proposed RFDN achieves a better trade-off than the state-of-the-art methods in terms of performance and model complexity. Moreover, we propose an enhanced RFDN (E-RFDN) and won first place in the AIM 2020 efficient super-resolution challenge. Code will be available at this https URL.
摘要:在单个图像超分辨率(SISR)的最新进展探索卷积神经网络(CNN)的功率,以实现更好的性能。尽管基于CNN的方法取得的巨大成功,这是不容易的这些方法边缘设备上应用,由于大量计算的要求。为了解决这个问题,各种快速,轻量级的CNN模型已被提出。信息蒸馏网络是国家的最先进的方法,即采用信道分割操作,以提取蒸馏水的特征之一。然而,这是不够明确此操作在有效SISR模型的设计如何帮助。在本文中,我们提出了特征蒸馏连接(FDC),同时更加轻便灵活在功能上等效于信道分离操作。由于FDC,我们可以重新思考信息的多蒸馏网络(IMDN),并提出了一个轻量级的,准确的SISR模型称为残余特征蒸馏网络(RFDN)。 RFDN使用多个功能蒸馏连接到学习更有辨别力的特征表示。我们还提出了一个浅浅的残余块(SRB)作为RFDN主楼块,使得网络可以从残留的学习,同时仍然重量足够轻受益最大。大量的实验结果表明,该RFDN达到更好的权衡对国家的最先进的方法,在性能和模型复杂度。此外,我们提出了一种增强RFDN(E-RFDN)和AIM 2020高效超分辨率的挑战荣获第一名。代码将可在此HTTPS URL。
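To make the abstract's building blocks concrete, the sketch below shows one possible reading of a shallow residual block (a single convolution plus identity) and a distillation block built from feature distillation connections (1x1 convolutions that set slim "distilled" features aside while the remaining features are refined further). Channel counts, step counts, and layer choices are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a shallow residual block (SRB) and a residual feature distillation block.
import torch
import torch.nn as nn

class ShallowResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.act = nn.LeakyReLU(0.05, inplace=True)

    def forward(self, x):
        return self.act(self.conv(x) + x)      # one conv plus identity: cheap residual learning

class RFDBlock(nn.Module):
    def __init__(self, ch, distilled=None, steps=3):
        super().__init__()
        distilled = distilled or ch // 2
        # Each step "distills" a slim feature map with a 1x1 conv (the FDC) and refines
        # the full feature map with a shallow residual block for the next step.
        self.distill = nn.ModuleList([nn.Conv2d(ch, distilled, 1) for _ in range(steps + 1)])
        self.refine = nn.ModuleList([ShallowResidualBlock(ch) for _ in range(steps)])
        self.fuse = nn.Conv2d(distilled * (steps + 1), ch, 1)

    def forward(self, x):
        feats, h = [], x
        for dist, ref in zip(self.distill[:-1], self.refine):
            feats.append(dist(h))               # keep a distilled copy of the current features
            h = ref(h)                           # refine the rest for the next step
        feats.append(self.distill[-1](h))
        return self.fuse(torch.cat(feats, dim=1)) + x

# Example: a 64-channel feature map.
print(RFDBlock(64)(torch.randn(1, 64, 32, 32)).shape)
```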
24. Adversarial Brain Multiplex Prediction From a Single Network for High-Order Connectional Gender-Specific Brain Mapping [PDF] 返回目录
Ahmed Nebli, Islem Rekik
Abstract: Brain connectivity networks, derived from magnetic resonance imaging (MRI), non-invasively quantify the relationship in function, structure, and morphology between two brain regions of interest (ROIs) and give insights into gender-related connectional differences. However, to the best of our knowledge, studies on gender differences in brain connectivity were limited to investigating pairwise (i.e., low-order) relationships between ROIs, overlooking the complex high-order interconnectedness of the brain as a network. To address this limitation, brain multiplexes have been introduced to model the relationship between at least two different brain networks. However, this inhibits their application to datasets with single brain networks such as functional networks. To fill this gap, we propose the first work on predicting brain multiplexes from a source network to investigate gender differences. Recently, generative adversarial networks (GANs) submerged the field of medical data synthesis. However, although conventional GANs work well on images, they cannot handle brain networks due to their non-Euclidean topological structure. Differently, in this paper, we tap into the nascent field of geometric-GANs (G-GAN) to design a deep multiplex prediction architecture comprising (i) a geometric source to target network translator mimicking a U-Net architecture with skip connections and (ii) a conditional discriminator which classifies predicted target intra-layers by conditioning on the multiplex source intra-layers. Such an architecture simultaneously learns the latent source network representation and the deep non-linear mapping from the source to target multiplex intra-layers. Our experiments on a large dataset demonstrated that predicted multiplexes significantly boost gender classification accuracy compared with source networks and identify both low- and high-order gender-specific multiplex connections.
摘要:脑连接的网络,从磁共振成像(MRI)衍生的,非侵入性地定量目标(投资回报)2个脑区域之间的功能,结构的关系,以及形态和给见解性别相关connectional差异。然而,据我们所知,在脑连通性别差异的研究仅限于研究配对(即低位)关系的ROI,俯瞰着大脑网络的复杂的高阶相互联系。为了解决这个限制,脑复已被引入至少两个不同的大脑网络之间的关系进行建模。然而,这种抑制它们与单脑网络,诸如功能网络的数据集的应用程序。为了填补这一空白,我们提出了从源网络预测脑复研究性别差异的第一项工作。最近,生成对抗网络(甘斯)浸没医疗数据合成领域。然而,尽管传统的甘斯上的图像很好地工作,他们不能处理大脑网络由于其非欧几里德的拓扑结构。不同的是,在本文中,我们打入几何甘斯(G-GAN)的新兴领域设计一个深多路复用预测结构包括(i)一个几何源到目标网络翻译模仿与跳过的连接和U-Net的架构( ⅱ)进行分类调理上复用源帧内层预测的目标帧内层的条件鉴别器。这样的架构同时获知潜源网络表示和来自源的深非线性映射到目标复用帧内层。我们在一个大的数据集的实验表明,预测多路复用显著提高性别分类精度与源网络和识别低和高次性别特异性多重连接相比较。
25. Detection of Iterative Adversarial Attacks via Counter Attack [PDF] 返回目录
Matthias Rottmann, Mathis Peyron, Natasa Krejic, Hanno Gottschalk
Abstract: Deep neural networks (DNNs) have proven to be powerful tools for processing unstructured data. However, for high-dimensional data, like images, they are inherently vulnerable to adversarial attacks. Small, almost invisible perturbations added to the input can be used to fool DNNs. Various attacks, hardening methods and detection methods have been introduced in recent years. Notoriously, Carlini-Wagner (CW) type attacks computed by iterative minimization belong to those that are most difficult to detect. In this work, we demonstrate that such iterative minimization attacks can be used as detectors themselves. Thus, in some sense we show that one can fight fire with fire. This work also outlines a mathematical proof that under certain assumptions this detector provides asymptotically optimal separation of original and attacked images. In numerical experiments, we obtain AUROC values up to 99.73% for our detection method. This distinctly surpasses state-of-the-art detection rates for CW attacks from the literature. We also give numerical evidence that our method is robust against the attacker's choice of the method of attack.
摘要:深层神经网络(DNNs)已被证明是处理非结构化数据的强大工具。然而,对于高维数据,如图像,他们很容易遭受攻击的对抗性。小添加到输入几乎看不见的扰动,可以用来愚弄DNNs。各种攻击,硬化的方法和检测方法已经在近几年相继推出。出了名的,卡烈尼 - 瓦格纳(CW)类型的攻击计算通过迭代最小化属于那些最难以察觉。在这项工作中,我们证明了这种迭代最小化攻击可以通过使用探测器本身。因此,在某种意义上,我们表明,人们可以以火攻火。这项工作还概述了一个数学证明,在一定条件下该探测器提供原件及攻击图像的渐近最优的分离。在数值实验,我们得到AUROC值最多可为我们的检测方法99.73%。这明显超过了技术的检测率从文献CW攻击的状态。我们还给出了数值的证据表明,我们的方法是对攻击者选择的攻击方法的稳健。
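The intuition behind using an attack as a detector is that already-attacked images sit close to the decision boundary, so a second (counter) attack needs only a tiny perturbation to flip their label, whereas clean images need a larger one. A hedged sketch of that scoring idea follows; the `attack` callable stands for any iterative minimum-perturbation attack (e.g., a CW-type attack), and its interface here is an assumption for illustration.

```python
# Hedged sketch of counter-attack detection: score inputs by how little perturbation
# a counter attack needs to change the model's prediction.
import torch

def counter_attack_score(model, attack, x):
    # x: (B, ...) batch of inputs. Small scores suggest the input was already adversarial.
    x_adv = attack(model, x)                       # run the counter attack on the (possibly attacked) input
    return (x_adv - x).flatten(1).norm(dim=1)      # L2 norm of the required perturbation

def is_adversarial(model, attack, x, threshold):
    # The threshold would be calibrated on held-out clean and attacked examples.
    return counter_attack_score(model, attack, x) < threshold
```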
26. Generative Modelling of 3D in-silico Spongiosa with Controllable Micro-Structural Parameters [PDF] 返回目录
Emmanuel Iarussi, Felix Thomsen, Claudio Delrieux
Abstract: Research in vertebral bone micro-structure generally requires costly procedures to obtain physical scans of real bone with a specific pathology under study, since no methods are available yet to generate realistic bone structures in-silico. Here we propose to apply recent advances in generative adversarial networks (GANs) to develop such a method. We adapted style-transfer techniques, which have been largely used in other contexts, in order to transfer style between image pairs while preserving its informational content. In a first step, we trained a volumetric generative model in a progressive manner using a Wasserstein objective and gradient penalty (PWGAN-GP) to create patches of realistic bone structure in-silico. The training set contained 7660 purely spongeous bone samples from twelve human vertebrae (T12 or L1) with an isotropic resolution of 164 µm, scanned with a high resolution peripheral quantitative CT (Scanco XCT). After training, we generated new samples with tailored micro-structure properties by optimizing a vector z in the learned latent space. To solve this optimization problem, we formulated a differentiable goal function that leads to valid samples while balancing the appearance (content) with target 3D properties (style). Properties of the learned latent space effectively matched the data distribution. Furthermore, we were able to simulate the resulting bone structure after deterioration or treatment effects of osteoporosis therapies based only on expected changes of micro-structural parameters. Our method makes it possible to generate a virtually infinite number of patches of realistic bone micro-structure, and thereby can serve the development of bone biomarkers and the simulation of bone therapies in advance.
摘要:研究椎体骨质微观结构通常需要昂贵的程序,以获取真正的骨的物理扫描与正在研究具体的病理,因为没有方法可还没有产生,硅片现实的骨骼结构。在这里我们建议适用于生成对抗网络(甘斯)的最新进展,开发这样的方法。我们改编的风格传输技术,这已经在很大程度上在其他背景图像对之间,而保留其信息内容使用,以转让的风格。在第一步骤中,我们培养以渐进的方式使用瓦瑟斯坦客观和梯度罚分(PWGAN-GP)来创建在计算机芯片现实骨结构的贴片的体积生成模型。训练集包含7660个纯粹海绵状骨样品从12人的椎骨(T12或L1)与164um的各向同性分辨率和高分辨率扫描外周定量CT(SCANCO XCT)。训练结束后,我们通过优化学习潜空间中的向量z产生具有定制的微结构性质的新的样品。为了解决这个优化问题,我们制定了一个微目标函数,导致有效样本,而影响外观(内容)目标3D属性(样式)。学习潜在空间的属性相匹配有效数据分布。此外,我们能够后仅基于微结构参数的预期变化的骨质疏松症治疗的恶化或治疗效果,以模拟所得到的骨骼结构。我们的方法可以生成逼真的骨微结构的补丁几乎有无限多种,从而有可能服务于骨生物标志物的发展,提前模拟骨疗法。
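The latent-space optimization step described in the abstract amounts to gradient descent on z so that the generated patch matches target micro-structural parameters while staying close to a reference appearance. The sketch below is an assumption-laden illustration: `generator`, `properties` (which must be a differentiable estimator of the micro-structural parameters, e.g., bone volume fraction), and the weighting are all placeholders, not the paper's exact goal function.

```python
# Hedged sketch of optimizing a latent vector z for target micro-structural properties.
import torch

def optimize_latent(generator, properties, target_props, reference, z_dim=128,
                    steps=500, lr=0.05, style_weight=1.0, content_weight=0.1):
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        patch = generator(z)                                            # in-silico bone patch
        style_loss = ((properties(patch) - target_props) ** 2).sum()    # match target parameters ("style")
        content_loss = ((patch - reference) ** 2).mean()                # stay close to reference appearance ("content")
        (style_weight * style_loss + content_weight * content_loss).backward()
        opt.step()
    return z.detach()
```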
注:中文为机器翻译结果!封面为论文标题词云图!