Contents
5. Weakly Supervised Semantic Segmentation of Satellite Images for Land Cover Mapping -- Challenges and Opportunities [PDF] Abstract
7. siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection [PDF] Abstract
12. Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos [PDF] Abstract
17. Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images of 10 Cancer Types [PDF] Abstract
21. SYMOG: learning symmetric mixture of Gaussian modes for improved fixed-point quantization [PDF] Abstract
26. Enlarging Discriminative Power by Adding an Extra Class in Unsupervised Domain Adaptation [PDF] Abstract
Abstracts
1. Extracting Semantic Indoor Maps from Occupancy Grids [PDF] Back to contents
Ziyuan Liu, Georg von Wichert
Abstract: The primary challenge for any autonomous system operating in realistic, rather unconstrained scenarios is to manage the complexity and uncertainty of the real world. While it is unclear exactly how humans and other higher animals master these problems, it seems evident that abstraction plays an important role. The use of abstract concepts allows the system behavior to be defined at higher levels. In this paper we focus on the semantic mapping of indoor environments. We propose a method to extract an abstracted floor plan from typical grid maps using Bayesian reasoning. The result of this procedure is a probabilistic generative model of the environment defined over abstract concepts. It is well suited for higher-level reasoning and communication purposes. We demonstrate the effectiveness of the approach using real-world data.
2. Towards a Complete Pipeline for Segmenting Nuclei in Feulgen-Stained Images [PDF] Back to contents
Luiz Antonio Buschetto Macarini, Aldo von Wangenheim, Felipe Perozzo Daltoé, Alexandre Sherlley Casimiro Onofre, Fabiana Botelho de Miranda Onofre, Marcelo Ricardo Stemmer
Abstract: Cervical cancer is the second most common cancer type in women around the world. In some countries, due to non-existent or inadequate screening, it is often detected at late stages, when standard treatment options are often absent or unaffordable. It is a deadly disease that could benefit from early detection approaches. Detection is usually done by cytological exams, which consist of visually inspecting the nuclei in search of morphological alterations. Since the inspection is done by humans, some subjectivity is naturally introduced. Computational methods could be used to reduce this subjectivity, with nuclei segmentation as the first stage of the process. In this context, we present a complete pipeline for the segmentation of nuclei in Feulgen-stained images using Convolutional Neural Networks. Here we show the entire process of segmentation, from the collection of the samples through pre-processing, training the network, post-processing and evaluation of the results. We achieved an overall IoU of 0.78, showing the affordability of the nuclei segmentation approach on Feulgen-stained images. The code is available at this https URL.
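For readers unfamiliar with the IoU (Intersection over Union) score quoted in this abstract, the following is a minimal sketch of how the metric is commonly computed for one pair of binary masks. The function name `mask_iou` and the toy masks are illustrative assumptions; the abstract does not say how the authors aggregate the score over images.

```python
def mask_iou(pred, target):
    """Intersection over Union of two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    intersection = 0
    union = 0
    for row_pred, row_target in zip(pred, target):
        for p, t in zip(row_pred, row_target):
            if p and t:
                intersection += 1  # pixel is foreground in both masks
            if p or t:
                union += 1         # pixel is foreground in at least one mask
    # Two empty masks agree perfectly, so define IoU as 1.0 in that case.
    return intersection / union if union else 1.0


# Toy 3x3 masks: 3 overlapping foreground pixels, 5 in the union.
predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
reference = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]
print(mask_iou(predicted, reference))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they do not overlap at all.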
3. VQA-LOL: Visual Question Answering under the Lens of Logic [PDF] Back to contents
Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
Abstract: Logical connectives and their implications for the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate visual question answering (VQA) through the lens of logical transformation and posit that systems that seek to answer questions about images must be robust to these transformations of the question. If a VQA system is able to answer a question, it should also be able to answer the logical composition of questions. We analyze the performance of state-of-the-art models on the VQA task under these logical operations and show that they have difficulty in correctly answering such questions. We then construct an augmentation of the VQA dataset with questions containing logical operations and retrain the same models to establish a baseline. We further propose a novel methodology to train models to learn negation, conjunction, and disjunction and show improvement in learning logical composition and retaining performance on VQA. We suggest this work as a move towards embedding logical connectives in visual understanding, along with the benefits of robustness and generalizability. Our code and dataset are available online at this https URL
4. When Radiology Report Generation Meets Knowledge Graph [PDF] Back to contents
Yixiao Zhang, Xiaosong Wang, Ziyue Xu, Qihang Yu, Alan Yuille, Daguang Xu
Abstract: In recent years, automatic radiology report generation has been an attractive research problem in computer-aided diagnosis, with the aim of alleviating the workload of doctors. Deep learning techniques for natural image captioning have been successfully adapted to generating radiology reports. However, radiology image reporting differs from the natural image captioning task in two aspects: 1) the accuracy of positive disease keyword mentions is critical in radiology image reporting, in contrast to the equal importance of every single word in a natural image caption; 2) the evaluation of reporting quality should focus more on matching the disease keywords and their associated attributes instead of counting the occurrence of N-grams. Based on these concerns, we propose to utilize a pre-constructed graph embedding module (modeled with a graph convolutional neural network) on multiple disease findings to assist the generation of reports in this work. The incorporation of the knowledge graph allows for dedicated feature learning for each disease finding and for modeling the relationships between them. In addition, we propose a new evaluation metric for radiology image reporting with the assistance of the same composed graph. Experimental results on a publicly accessible dataset of chest radiographs (IU-RR) demonstrate the superior performance of the methods integrated with the proposed graph embedding module compared with previous approaches, using both the conventional evaluation metrics commonly adopted for image captioning and our proposed ones.
5. Weakly Supervised Semantic Segmentation of Satellite Images for Land Cover Mapping -- Challenges and Opportunities [PDF] Back to contents
Michael Schmitt, Jonathan Prexl, Patrick Ebel, Lukas Liebel, Xiao Xiang Zhu
Abstract: Fully automatic large-scale land cover mapping is one of the core challenges addressed by the remote sensing community. Usually, the basis of this task is formed by (supervised) machine learning models. However, in spite of recent growth in the availability of satellite observations, accurate training data remains comparatively scarce. On the other hand, numerous global land cover products exist and can often be accessed free of charge. Unfortunately, these maps are typically of a much lower resolution than modern-day satellite imagery. Besides, they always come with a significant amount of noise, as they cannot be considered ground truth, but are products of previous (semi-)automatic prediction tasks. Therefore, this paper seeks to make a case for the application of weakly supervised learning strategies to get the most out of available data sources and achieve progress in high-resolution large-scale land cover mapping. Challenges and opportunities are discussed based on the SEN12MS dataset, for which some baseline results are also shown. These baselines indicate that there is still a lot of potential for dedicated approaches designed to deal with remote sensing-specific forms of weak supervision.
6. AI Online Filters to Real World Image Recognition [PDF] Back to contents
Hai Xiao, Jin Shang, Mengyuan Huang
Abstract: Deep artificial neural networks trained with labeled data sets are widely used in numerous vision and robotics applications today. In terms of AI, these are called reflex models, referring to the fact that they do not self-evolve or actively adapt to environmental changes. As demand for intelligent robot control expands to many high-level tasks, reinforcement learning and state-based models play an increasingly important role. Herein, in the computer vision and robotics domain, we study a novel approach that adds reinforcement controls onto image recognition reflex models to attain better overall performance, specifically over a wider environment range beyond what is expected of the task reflex models. Following a common infrastructure with environment sensing and AI-based modeling of self-adaptive agents, we implement multiple types of AI control agents. Finally, we provide comparative results of these agents against a baseline, and an insightful analysis of their benefit in improving overall image recognition performance in the real world.
7. siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection [PDF] Back to contents
Irene Cortes, Jorge Beltran, Arturo de la Escalera, Fernando Garcia
Abstract: The rapid development of embedded hardware in autonomous vehicles broadens their computational capabilities, making it possible to mount more complete sensor setups able to handle driving scenarios of higher complexity. As a result, new challenges such as multiple detections of the same object have to be addressed. In this work, a siamese network is integrated into the pipeline of a well-known 3D object detector approach to suppress duplicate proposals coming from different cameras via re-identification. Additionally, associations are exploited to enhance the 3D box regression of the object by aggregating their corresponding LiDAR frustums. The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.
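As background for the "traditional NMS approaches" this abstract compares against, here is a minimal sketch of classic greedy non-maximum suppression on axis-aligned 2D boxes. It is the overlap-based baseline only, not the siamese re-identification method the paper proposes; the function names, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept, higher-scoring
    box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections of one object plus a separate one.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))
```

Pure overlap tests like this presuppose that duplicates land close together in a shared coordinate frame, which is one reason detections of the same object seen from different cameras can call for an appearance-based association step instead.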
8. Three-Stream Fusion Network for First-Person Interaction Recognition [PDF] Back to contents
Ye-Ji Kim, Dong-Gyu Lee, Seong-Whan Lee
Abstract: First-person interaction recognition is a challenging task because of unstable video conditions resulting from the camera wearer's movement. For human interaction recognition from a first-person viewpoint, this paper proposes a three-stream fusion network with two main parts: a three-stream architecture and a three-stream correlation fusion. The three-stream architecture captures the characteristics of the target appearance, target motion, and camera ego-motion. Meanwhile, the three-stream correlation fusion combines the feature maps of the three streams to consider the correlations among the target appearance, target motion and camera ego-motion. The fused feature vector is robust to camera movement and compensates for the noise of the camera ego-motion. Short-term intervals are modeled using the fused feature vector, and a long short-term memory (LSTM) model considers the temporal dynamics of the video. We evaluated the proposed method on two public benchmark datasets to validate the effectiveness of our approach. The experimental results show that the proposed fusion method successfully generated a discriminative feature vector, and our network outperformed all competing activity recognition methods in first-person videos where considerable camera ego-motion occurs.
9. DeFraudNet:End2End Fingerprint Spoof Detection using Patch Level Attention [PDF] Back to contents
B.V.S Anusha, Sayan Banerjee, Subhasis Chaudhuri
Abstract: In recent years, fingerprint recognition systems have made remarkable advancements in the field of biometric security, as they play an important role in personal, national and global security. In spite of all these notable advancements, fingerprint recognition technology is still susceptible to spoof attacks, which can significantly jeopardize user security. Cross-sensor and cross-material spoof detection still poses a challenge, with a myriad of spoof materials emerging every day, compromising sensor interoperability and robustness. This paper proposes a novel method for fingerprint spoof detection using both global and local fingerprint feature descriptors. These descriptors are extracted using DenseNet, which significantly improves cross-sensor, cross-material and cross-dataset performance. A novel patch attention network is used for finding the most discriminative patches and also for network fusion. We evaluate our method on four publicly available datasets: LivDet 2011, 2013, 2015 and 2017. A set of comprehensive experiments is carried out to evaluate cross-sensor, cross-material and cross-dataset performance over these datasets. The proposed approach achieves an average accuracy of 99.52%, 99.16% and 99.72% on LivDet 2017, 2015 and 2011 respectively, outperforming the current state-of-the-art results by 3% and 4% for LivDet 2015 and 2011 respectively.
10. Model-Agnostic Structured Sparsification with Learnable Channel Shuffle [PDF] Back to contents
Xin-Yu Zhang, Kai Zhao, Taihong Xiao, Ming-Ming Cheng, Ming-Hsuan Yang
Abstract: Recent advances in convolutional neural networks (CNNs) usually come at the expense of considerable computational overhead and memory footprint. Network compression aims to alleviate this issue by training compact models with comparable performance. However, existing compression techniques either entail dedicated expert design or compromise with a moderate performance drop. To this end, we propose a model-agnostic structured sparsification method for efficient network compression. The proposed method automatically induces structurally sparse representations of the convolutional weights, thereby facilitating the implementation of the compressed model with the highly-optimized group convolution. We further address the problem of inter-group communication with a learnable channel shuffle mechanism. The proposed approach is model-agnostic and highly compressible with a negligible performance drop. Extensive experimental results and analysis demonstrate that our approach performs favorably against the state-of-the-art network pruning methods. The code will be publicly available after the review process.
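To illustrate the inter-group communication problem this abstract mentions, the sketch below shows the fixed channel shuffle permutation commonly paired with group convolutions: channels are viewed as a (groups, channels per group) grid, transposed, and flattened. This is the conventional hand-designed permutation, shown only as background; the learnable shuffle the paper proposes is not described in the abstract and is not reproduced here. The function name `channel_shuffle` is an assumption.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (fixed, non-learned permutation).

    `channels` is a list of per-channel items laid out group by group.
    Equivalent to reshaping to (groups, per_group), transposing, and
    flattening.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle(list(range(6)), groups=2)
print(shuffled)
```

After the shuffle, each contiguous block of channels fed to the next group convolution contains channels from different input groups, so information can flow between groups.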
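The abstract above relies on two standard building blocks, group convolution and channel shuffle. The following is a minimal sketch of both, not the authors' method: the shuffle shown is the fixed ShuffleNet-style permutation rather than the learnable one the paper proposes, and the function names and layer sizes are hypothetical choices for illustration.

```python
def group_conv_params(c_in, c_out, kernel, groups):
    """Weight count of a 2D convolution split into `groups` independent groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    return (c_in // groups) * (c_out // groups) * kernel * kernel * groups


def channel_shuffle(channels, groups):
    """Fixed shuffle: view channels as (groups, n), transpose, then flatten."""
    if len(channels) % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Hypothetical layer: 64 -> 128 channels with a 3x3 kernel.
dense = group_conv_params(64, 128, 3, groups=1)
grouped = group_conv_params(64, 128, 3, groups=4)
shuffled = channel_shuffle(list(range(8)), groups=2)
print(dense, grouped, shuffled)
```

With 4 groups the layer keeps a quarter of the dense weights, and the shuffle interleaves channels from the two groups so that the next grouped layer sees inputs from both.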
11. Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning
Xiang Wang, Sifei Liu, Huimin Ma, Ming-Hsuan Yang
Abstract: Weakly-supervised semantic segmentation is a challenging task as no pixel-wise label information is provided for training. Recent methods have exploited classification networks to localize objects by selecting regions with strong response. Such response maps provide only sparse information; however, strong pairwise relations exist between pixels in natural images, and these can be used to propagate the sparse map into a much denser one. In this paper, we propose an iterative algorithm to learn such pairwise relations. It consists of two branches: a unary segmentation network, which learns the label probabilities for each pixel, and a pairwise affinity network, which learns an affinity matrix and refines the probability map generated by the unary network. The results refined by the pairwise network are then used as supervision to train the unary network, and the procedure is repeated iteratively to obtain progressively better segmentation. To learn reliable pixel affinity without accurate annotation, we also propose to mine confident regions. We show that iteratively training this framework is equivalent to optimizing an energy function with convergence to a local minimum. Experimental results on the PASCAL VOC 2012 and COCO datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
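The idea of propagating a sparse probability map along pairwise pixel affinities can be illustrated with a small random-walk style sketch. This is an assumption-laden stand-in, not the paper's networks: the affinity matrix is hand-written rather than learned, and `row_normalize` and `propagate` are hypothetical helper names.

```python
def row_normalize(affinity):
    """Scale each row of a non-negative affinity matrix so it sums to 1."""
    return [[value / sum(row) for value in row] for row in affinity]


def propagate(probs, affinity, steps=1):
    """Spread per-pixel label probabilities along pairwise affinities.

    probs[i][k] is the probability of label k at pixel i;
    affinity[i][j] is a (hypothetical) similarity between pixels i and j.
    """
    transition = row_normalize(affinity)
    n_pixels, n_labels = len(probs), len(probs[0])
    for _ in range(steps):
        probs = [
            [sum(transition[i][j] * probs[j][k] for j in range(n_pixels))
             for k in range(n_labels)]
            for i in range(n_pixels)
        ]
    return probs


# Four pixels in a row: 0 and 3 are confident seeds, 1 and 2 are undecided.
seeds = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
affinity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
refined = propagate(seeds, affinity, steps=1)
print(refined)
```

After one step the undecided pixels lean toward the label of the seed they are most similar to, which is the densification effect the abstract describes.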
12. Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos
Subhajit Chaudhury, Daiki Kimura, Phongtharin Vinayavekhin, Asim Munawar, Ryuki Tachibana, Koji Ito, Yuki Inaba, Minoru Matsumoto, Shuji Kidokoro, Hiroki Ozaki
Abstract: Image-based sports analytics enables automatic retrieval of key events in a game to speed up the analytics process for human experts. However, most existing methods focus on structured television broadcast video datasets captured by a straight, fixed camera with minimal variability in the capturing pose. In this paper, we study event detection in sports videos for unstructured environments with arbitrary camera angles. The transition from structured to unstructured video analysis produces multiple challenges that we address in our paper. Specifically, we identify and solve two major problems: unsupervised identification of players in an unstructured setting, and generalization of the trained models to pose variations caused by arbitrary shooting angles. For the first problem, we propose a temporal feature aggregation algorithm that uses person re-identification features to obtain high player retrieval precision by boosting a weak heuristic scoring method. Additionally, we propose a data augmentation technique, based on a multi-modal image translation model, to reduce bias in the appearance of training samples. Experimental evaluations show that our proposed method improves precision for player retrieval from 0.78 to 0.86 for obliquely angled videos. Additionally, we improve the F1 score for rally detection in table tennis videos from 0.79 with global frame-level features to 0.89 with our proposed player-level features. Please see the supplementary video submission at this https URL.
13. Meta Segmentation Network for Ultra-Resolution Medical Images
Tong Wu, Yuan Xie, Yanyun Qu, Bicheng Dai, Shuxin Chen
Abstract: Despite recent progress on semantic segmentation, huge challenges remain in medical ultra-resolution image segmentation. Methods based on a multi-branch structure can strike a good balance between computational burden and segmentation accuracy. However, the fusion structure in these methods must be designed elaborately to achieve desirable results, which leads to model redundancy. In this paper, we propose the Meta Segmentation Network (MSN) to solve this challenging problem. With the help of meta-learning, the fusion module of MSN is simple but effective. MSN can quickly generate the weights of the fusion layers through a simple meta-learner, requiring only a few training samples and epochs to converge. In addition, to avoid learning all branches from scratch, we introduce a weight sharing mechanism that realizes fast knowledge adaptation and shares the weights among multiple branches, resulting in improved performance and a significant reduction in parameters. The experimental results on two challenging ultra-resolution medical datasets, BACH and ISIC, show that MSN achieves the best performance compared with state-of-the-art methods.
14. Feasibility of Video-based Sub-meter Localization on Resource-constrained Platforms
Abm Musa, Jakob Eriksson
Abstract: While the satellite-based Global Positioning System (GPS) is adequate for some outdoor applications, many other applications are held back by its multi-meter positioning errors and poor indoor coverage. In this paper, we study the feasibility of real-time video-based localization on resource-constrained platforms. Before commencing a localization task, a video-based localization system downloads an offline model of a restricted target environment, such as a set of city streets, or an indoor shopping mall. The system is then able to localize the user within the model, using only video as input. To enable such a system to run on resource-constrained embedded systems or smartphones, we (a) propose techniques for efficiently building a 3D model of a surveyed path, through frame selection and efficient feature matching, (b) substantially reduce model size by multiple compression techniques, without sacrificing localization accuracy, (c) propose efficient and concurrent techniques for feature extraction and matching to enable online localization, (d) propose a method with interleaved feature matching and optical flow based tracking to reduce the feature extraction and matching time in online localization. Based on an extensive set of both indoor and outdoor videos, manually annotated with location ground truth, we demonstrate that sub-meter accuracy, at real-time rates, is achievable on smart-phone type platforms, despite challenging video conditions.
15. On-line non-overlapping camera calibration net
Zhao Fangda, Toru Tamaki, Takio Kurita, Bisser Raytchev, Kazufumi Kaneda
Abstract: We propose an easy-to-use non-overlapping camera calibration method. First, successive images are fed to a PoseNet-based network to obtain the ego-motion of the cameras between frames. Next, the poses between cameras are estimated. Instead of using a batch method, we propose an on-line method for inter-camera pose estimation. Furthermore, we implement the entire procedure on a computation graph. Experiments with simulations and the KITTI dataset show the proposed method to be effective in simulation.
16. Universal Domain Adaptation through Self Supervision
Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko
Abstract: Unsupervised domain adaptation methods traditionally assume that all source categories are present in the target domain. In practice, little may be known about the category overlap between the two domains. While some methods address target settings with either partial or open-set categories, they assume that the particular setting is known a priori. We propose a more universally applicable domain adaptation approach that can handle arbitrary category shift, called Domain Adaptative Neighborhood Clustering via Entropy optimization (DANCE). DANCE combines two novel ideas: First, as we cannot fully rely on source categories to learn features discriminative for the target, we propose a novel neighborhood clustering technique to learn the structure of the target domain in a self-supervised way. Second, we use entropy-based feature alignment and rejection to align target features with the source, or reject them as unknown categories based on their entropy. We show through extensive experiments that DANCE outperforms baselines across open-set, open-partial and partial domain adaptation settings.
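Entropy-based rejection, as mentioned in the abstract above, can be sketched in a few lines: a target sample whose predicted class distribution has high entropy is treated as an unknown category. The threshold of half the maximum entropy and the helper names below are assumptions made for illustration, not details confirmed by the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def classify_or_reject(logits, threshold):
    """Return the predicted class index, or "unknown" if entropy is too high."""
    probs = softmax(logits)
    if entropy(probs) > threshold:
        return "unknown"
    return max(range(len(probs)), key=probs.__getitem__)


# Assumed threshold: half of the maximum entropy log(K) for K = 3 source classes.
threshold = math.log(3) / 2
confident = classify_or_reject([8.0, 0.0, 0.0], threshold)
uncertain = classify_or_reject([0.1, 0.0, 0.05], threshold)
print(confident, uncertain)
```

A peaked prediction is kept and aligned with a source class, while a near-uniform one is rejected as unknown.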
17. Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images of 10 Cancer Types
Le Hou, Rajarsi Gupta, John S. Van Arnam, Yuwei Zhang, Kaustubh Sivalenka, Dimitris Samaras, Tahsin M. Kurc, Joel H. Saltz
Abstract: The distribution and appearance of nuclei are essential markers for the diagnosis and study of cancer. Despite the importance of nuclear morphology, there is a lack of large-scale, accurate, publicly accessible nucleus segmentation data. To address this, we developed an analysis pipeline, with a quality control process, that segments nuclei in whole slide tissue images from multiple cancer types. We have generated nucleus segmentation results in 5,060 whole slide tissue images from 10 cancer types in The Cancer Genome Atlas (TCGA). One key component of our work is a multi-level quality control process (WSI-level and image patch-level) that evaluates the quality of our segmentation results. The image patch-level quality control used manual segmentation ground truth data from 1,356 sampled image patches. The datasets we publish in this work consist of roughly 5 billion quality-controlled nuclei from more than 5,060 TCGA WSIs from 10 different TCGA cancer types, and 1,356 manually segmented TCGA image patches from the same 10 cancer types plus 4 additional cancer types.
18. Lake Ice Monitoring with Webcams and Crowd-Sourced Images
Rajanie Prabha, Manu Tom, Mathias Rothermel, Emmanuel Baltsavias, Laura Leal-Taixe, Konrad Schindler
Abstract: Lake ice is a strong climate indicator and has been recognised as part of the Essential Climate Variables (ECV) by the Global Climate Observing System (GCOS). The dynamics of freezing and thawing, and possible shifts of freezing patterns over time, can help in understanding the local and global climate systems. One way to acquire the spatio-temporal information about lake ice formation, independent of clouds, is to analyse webcam images. This paper intends to move towards a universal model for monitoring lake ice with freely available webcam data. We demonstrate good performance, including the ability to generalise across different winters and different lakes, with a state-of-the-art Convolutional Neural Network (CNN) model for semantic image segmentation, Deeplab v3+. Moreover, we design a variant of that model, termed Deep-U-Lab, which predicts sharper, more correct segmentation boundaries. We have tested the model's ability to generalise with data from multiple camera views and two different winters. On average, it achieves intersection-over-union (IoU) values of ~71% across different cameras and ~69% across different winters, greatly outperforming prior work. Going even further, we show that the model even achieves 60% IoU on arbitrary images scraped from photo-sharing web sites. As part of the work, we introduce a new benchmark dataset of webcam images, Photi-LakeIce, from multiple cameras and two different winters, along with pixel-wise ground truth annotations.
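The intersection-over-union (IoU) figures quoted in the abstract above are per-class overlap ratios between a predicted mask and the ground truth. A minimal sketch of the metric follows; the two class names and the tiny 2x3 masks are hypothetical and are not taken from the Photi-LakeIce dataset.

```python
def iou(pred, truth, label):
    """Intersection-over-union of one class between two equally sized label masks."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += (p == label) and (t == label)
            union += (p == label) or (t == label)
    # A class absent from both masks has no defined IoU.
    return inter / union if union else float("nan")


# Hypothetical ground truth and prediction with two classes.
truth = [["ice", "ice", "water"],
         ["ice", "water", "water"]]
pred = [["ice", "ice", "ice"],
        ["ice", "water", "water"]]

iou_ice = iou(pred, truth, "ice")
iou_water = iou(pred, truth, "water")
mean_iou = (iou_ice + iou_water) / 2
print(iou_ice, iou_water, mean_iou)
```

Here one water pixel is mislabeled as ice, so both classes lose overlap: the ice union grows by one pixel and the water intersection shrinks by one.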
19. Fawkes: Protecting Personal Privacy against Unauthorized Deep Learning Models
Shawn Shan, Emily Wenger, Jiayun Zhang, Huiying Li, Haitao Zheng, Ben Y. Zhao
Abstract: Today's proliferation of powerful facial recognition models poses a real threat to personal privacy. As this http URL demonstrated, anyone can canvass the Internet for data and train highly accurate facial recognition models of us without our knowledge. We need tools to protect ourselves from unauthorized facial recognition systems and their numerous potential misuses. Unfortunately, work in related areas is limited in practicality and effectiveness. In this paper, we propose Fawkes, a system that allows individuals to inoculate themselves against unauthorized facial recognition models. Fawkes achieves this by helping users add imperceptible pixel-level changes (we call them "cloaks") to their own photos before publishing them online. When collected by a third-party "tracker" and used to train facial recognition models, these "cloaked" images produce functional models that consistently misidentify the user. We show experimentally that Fawkes provides 95+% protection against user recognition regardless of how trackers train their models. Even when clean, uncloaked images are "leaked" to the tracker and used for training, Fawkes can still maintain an 80+% protection success rate. In fact, we perform real experiments against today's state-of-the-art facial recognition services and achieve 100% success. Finally, we show that Fawkes is robust against a variety of countermeasures that try to detect or disrupt cloaks.
20. Variational Encoder-based Reliable Classification
Chitresh Bhushan, Zhaoyuan Yang, Nurali Virani, Naresh Iyer
Abstract: Machine learning models provide statistically impressive results that might be individually unreliable. To provide reliability, we propose an Epistemic Classifier (EC) that can justify its belief using support from the training dataset as well as the quality of reconstruction. Our approach is based on modified variational auto-encoders that can identify a semantically meaningful low-dimensional space where perceptually similar instances are also close in $\ell_2$-distance. Our results demonstrate improved reliability of predictions and robust identification of samples under adversarial attack, compared to a softmax-based thresholding baseline.
摘要:机器学习模型提供这可能是个别不可靠的统计结果令人印象深刻。为了保证系统的可靠性,我们提出了一个认知分类(EC),可以提供使用从训练数据集的支持,以及重建的质量,它相信的理由。我们的做法是基于改进变自动编码器,它可以识别语义上有意义的低维空间,感觉上类似的例子是接近$ \ $ ell_2太距离d。相比于基于SOFTMAX-阈值的基线我们的结果显示出改进的预测和与敌对攻击样品的鲁棒辨识的可靠性。
21. SYMOG: learning symmetric mixture of Gaussian modes for improved fixed-point quantization [PDF]
Lukas Enderich, Fabian Timm, Wolfram Burgard
Abstract: Deep neural networks (DNNs) have been proven to outperform classical methods on several machine learning benchmarks. However, they have high computational complexity and require powerful processing units. Especially when deployed on embedded systems, model size and inference time must be significantly reduced. We propose SYMOG (symmetric mixture of Gaussian modes), which significantly decreases the complexity of DNNs through low-bit fixed-point quantization. SYMOG is a novel soft quantization method in which the learning task and the quantization are solved simultaneously. During training, the weight distribution changes from a unimodal Gaussian distribution to a symmetric mixture of Gaussians, where each mean value belongs to a particular fixed-point mode. We evaluate our approach with different architectures (LeNet5, VGG7, VGG11, DenseNet) on common benchmark data sets (MNIST, CIFAR-10, CIFAR-100) and compare it with state-of-the-art quantization approaches. We achieve excellent results and outperform the 2-bit state of the art with an error rate of only 5.71% on CIFAR-10 and 27.65% on CIFAR-100.
摘要:深层神经网络(DNNs)已被证明优于几种机器学习经典的基准方法。然而,他们具有很高的计算复杂度,需要强大的处理单元。尤其是在嵌入式系统部署的时候,模型的大小和推理时必须显著减少。我们建议SYMOG(高斯模式对称混合物),其显著降低DNNs的通过低位定点量化的复杂度。 SYMOG是一种新型的软量化方法,使得学习任务和量化被同时解决。期间从单峰高斯分布训练重量分布的变化,以高斯对称混合物,其中,每个平均值属于特定的定点模式。我们评估我们有共同的基准数据集不同的架构(LeNet5,VGG7,VGG11,DenseNet)(MNIST,CIFAR-10,CIFAR-100)的方式,我们与国家的最先进的量化方法进行比较。我们实现了优异的成绩,并优于国家的最先进的2位的性能仅为5.71%上CIFAR-10的错误率和27.65%的CIFAR-100。
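The "fixed-point modes" mentioned in the SYMOG abstract can be pictured with a small hard-quantization sketch. This is not the SYMOG training procedure (the abstract describes a soft quantization learned during training); it only illustrates which values a symmetric low-bit fixed-point grid admits. The function name `quantize_fixed_point` and the parameters `bits` and `frac_bits` are assumptions made for illustration.

```python
def quantize_fixed_point(w, bits=2, frac_bits=1):
    """Round w to the nearest value on a symmetric fixed-point grid.

    Hypothetical helper: the grid has step 2**-frac_bits and integer
    levels in [-(2**(bits-1) - 1), 2**(bits-1) - 1], so it is symmetric
    around zero (for bits=2 the levels are -1, 0, +1).
    """
    step = 2.0 ** (-frac_bits)
    max_level = 2 ** (bits - 1) - 1
    level = round(w / step)                         # nearest integer level
    level = max(-max_level, min(max_level, level))  # clip to the grid
    return level * step


weights = [0.3, -0.9, 0.1]
quantized = [quantize_fixed_point(w) for w in weights]
```

With the default 2-bit setting, every weight ends up on one of three symmetric modes, which is the kind of target distribution the abstract describes the weights converging to.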
22. Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding [PDF]
Yibo Yang, Robert Bamler, Stephan Mandt
Abstract: Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as variational autoencoders, in post-processing. The approach thus separates model design and training from the compression task. Our algorithm generalizes arithmetic coding to the continuous domain, using adaptive discretization accuracy that exploits estimates of posterior uncertainty. A consequence of the "plug and play" nature of our approach is that various rate-distortion trade-offs can be achieved with a single trained model, eliminating the need to train multiple models for different bit rates. Our experimental results demonstrate the importance of taking into account posterior uncertainties, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single machine learning model. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method.
摘要:深贝叶斯潜变量模型已经启用了新的方法来既模型和数据压缩。在这里,我们提出了压缩在深概率模型潜表示,如变自动编码,在后处理的新算法。该方法从而分离压缩任务模型的设计和培训。我们的算法推广算术编码连续域,采用自适应离散精度,它利用后不确定性的估值。我们的做法的“即插即用”特性的结果是,各率失真权衡可以用一个单一的训练模型来实现,无需训练多模型不同的比特率。我们的实验结果表明,考虑到不确定因素后的重要性,并显示在宽范围内只使用一台机器学习模型的比特率的算法性能优于JPEG是图像压缩。贝叶斯神经字的嵌入进一步的实验证明了该方法的通用性。
23. Randomized Smoothing of All Shapes and Sizes [PDF]
Greg Yang, Tony Duan, Edward Hu, Hadi Salman, Ilya Razenshteyn, Jerry Li
Abstract: Randomized smoothing is a recently proposed defense against adversarial attacks that has achieved state-of-the-art provable robustness against $\ell_2$ perturbations. Soon after, a number of works devised new randomized smoothing schemes for other metrics, such as $\ell_1$ or $\ell_\infty$; however, for each geometry, substantial effort was needed to derive new robustness guarantees. This begs the question: can we find a general theory for randomized smoothing? In this work we propose a novel framework for devising and analyzing randomized smoothing schemes, and validate its effectiveness in practice. Our theoretical contributions are as follows: (1) We show that for an appropriate notion of "optimal", the optimal smoothing distributions for any "nice" norm have level sets given by the *Wulff Crystal* of that norm. (2) We propose two novel and complementary methods for deriving provably robust radii for any smoothing distribution. Finally, (3) we show fundamental limits to current randomized smoothing techniques via the theory of *Banach space cotypes*. By combining (1) and (2), we significantly improve the state-of-the-art certified accuracy in $\ell_1$ on standard datasets. On the other hand, using (3), we show that, without more information than label statistics under random input perturbations, randomized smoothing cannot achieve nontrivial certified accuracy against perturbations of $\ell_\infty$-norm $\Omega(1/\sqrt d)$, when the input dimension $d$ is large. We provide code in this http URL.
摘要:随机平滑是对已取得国家的最先进的可证明可以有效抵抗$ \ $ ell_2扰动敌对攻击最近提出的辩护。不久之后,一系列的工程,设计出新的随机平滑方案的其他指标,如$ \ $ ell_1或$ \ ell_ \ infty $;然而,对于每一个几何体,大量的努力,需要推导出新的鲁棒性的保证。这引出了一个问题:我们可以找到随机平滑的一般理论?在这项工作中,我们提出了一个新的框架设计和分析随机平滑方案,并在实践中验证其有效性。我们的理论贡献如下:(1)我们表明,“最优”的适当观念,优化平滑分布的任何“好”的标准必须由*乌尔夫水晶给出水平集*是规范的。 (2)我们提出用于推导可证明坚固的半径为任何平滑分布两种新型和互补的方法。最后,(3)我们将展示通过*的Banach空间cotypes的理论,目前随机平滑技术的根本限制*。通过组合(1)和(2),我们显著改善标准数据集在$ \ $ ell_1的国家的最先进的认证精度。在另一方面,用(3),我们发现,没有比在随机输入扰动标签统计信息的详细信息,随机平滑无法实现对$ \ ell_ \ infty $范数$ \欧米茄的扰动平凡的认证精度(1 / \开方d)$,当输入尺寸$ d $大。我们在这个HTTP URL提供的代码。
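For readers unfamiliar with randomized smoothing, the basic prediction rule can be sketched as a majority vote of a base classifier over noisy copies of the input. The sketch below assumes Gaussian noise (the classical $\ell_2$ setting) and does not compute any certified radius; the paper's framework for other norms and distributions is not reproduced here. The names `smoothed_predict` and `toy_classifier` are hypothetical.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma, n_samples=200, rng=None):
    """Majority vote of base_classifier over Gaussian perturbations of x."""
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical base classifier: class 1 if the first coordinate is positive.
def toy_classifier(x):
    return 1 if x[0] > 0 else 0


prediction = smoothed_predict(toy_classifier, [5.0, 0.0], sigma=0.5)
```

Swapping the Gaussian sampler for another noise distribution is the part of the design that the abstract's theory addresses.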
24. Hierarchical Quantized Autoencoders [PDF]
Will Williams, Sam Ringer, Tom Ash, John Hughes, David MacLeod, Jamie Dougherty
Abstract: Despite progress in training neural networks for lossy image compression, current approaches fail to maintain both perceptual quality and high-level features at very low bitrates. Encouraged by recent success in learning discrete representations with Vector Quantized Variational AutoEncoders (VQ-VAEs), we motivate the use of a hierarchy of VQ-VAEs to attain high factors of compression. We show that the combination of quantization and hierarchical latent structure aids likelihood-based image compression. This leads us to introduce a more probabilistic framing of the VQ-VAE, of which previous work is a limiting case. Our hierarchy produces a Markovian series of latent variables that reconstruct high-quality images which retain semantically meaningful features. These latents can then be further used to generate realistic samples. We provide qualitative and quantitative evaluations of reconstructions and samples on the CelebA and MNIST datasets.
摘要:尽管在训练神经网络的有损图像压缩的进步,目前的做法不能在非常低的比特率,以保持两者感知质量和高层次的特点。通过学习与矢量量化变自动编码(VQ-VAES)离散表示最近成功的鼓舞,我们鼓励使用VQ-VAES的层次达到压缩的高因素。我们表明,量化和层次潜在结构的组合有助于基于可能性的图像压缩。这使我们介绍VQ-VAE,其中以前的工作是一种极限情况的概率更取景。我们的层次产生了马氏一系列潜在变量是重构的高品质的图像保留语义上有意义的功能。然后,这些latents可以进一步用于产生逼真的样品。我们提供的CelebA和MNIST数据集重建和样品的定性和定量评估。
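The quantization step that VQ-VAE-style models rely on can be sketched as a nearest-codeword lookup. This is the standard vector-quantization idea only; the hierarchy and the probabilistic framing introduced in the paper are not shown, and the name `nearest_codeword` and the toy codebook are assumptions.

```python
def nearest_codeword(vector, codebook):
    """Return (index, codeword) of the codebook entry closest in squared L2 distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
index, codeword = nearest_codeword([0.9, 0.8], codebook)
```

Only the integer index needs to be stored or transmitted, which is what makes the discrete latent representation attractive for compression.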
25. Neural Networks on Random Graphs [PDF]
Romuald A. Janik, Aleksandra Nowak
Abstract: We performed a massive evaluation of neural networks with architectures corresponding to random graphs of various types. Apart from the classical random graph families, including random, scale-free and small-world graphs, we introduced a novel and flexible algorithm for directly generating random directed acyclic graphs (DAGs) and studied a class of graphs derived from functional resting-state fMRI networks. A majority of the best-performing networks were indeed in these new families. We also proposed a general procedure for turning a graph into a DAG, as required for a feed-forward neural network. We investigated various structural and numerical properties of the graphs in relation to neural network test accuracy. Since none of the classical numerical graph invariants by itself seems sufficient to single out the best networks, we introduced new numerical characteristics that selected a set of quasi-1-dimensional graphs, which were the majority among the best-performing networks.
摘要:我们进行神经网络的大规模评估对应于不同类型的随机图的架构。除了经典的随机图家庭包括无规,无标度和小世界图形,我们介绍了用于直接产生随机向无环图(DAG)和研究了类从功能的静止状态的fMRI网络导出图形的一种新颖的和灵活的算法。大多数表现最好的网络确实在这些新的家庭。我们还提出了把一个图形转化为前馈神经网络所必需的DAG的一般过程。我们研究了有关神经网络的测试精度图形的各种结构和数值性质。由于没有一个经典的数值图形不变量本身似乎允许挑出最佳的网络中,我们介绍了该选择的一组准一维图形,这是大多数表现最佳的网络之间的新的数字特征。
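The abstract does not spell out how a graph is turned into a DAG. One simple option, assumed here purely for illustration and not necessarily the authors' procedure, is to fix an ordering of the nodes and orient every undirected edge from the lower to the higher index. The function name `orient_as_dag` is hypothetical.

```python
def orient_as_dag(edges):
    """Orient each undirected edge from the lower to the higher node index.

    Every directed edge then strictly increases the node index, so no
    directed cycle can exist. Self-loops are dropped.
    """
    return sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})


undirected = [(2, 0), (0, 1), (1, 2), (3, 1)]
dag_edges = orient_as_dag(undirected)
```

The node ordering then doubles as a valid evaluation order for a feed-forward network built on the graph.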
26. Enlarging Discriminative Power by Adding an Extra Class in Unsupervised Domain Adaptation [PDF]
Hai H. Tran, Sumyeong Ahn, Taeyoung Lee, Yung Yi
Abstract: In this paper, we study the problem of unsupervised domain adaptation, which aims at obtaining a prediction model for the target domain using labeled data from the source domain and unlabeled data from the target domain. There exists an array of recent research based on the idea of extracting features that are not only invariant across both domains but also provide high discriminative power for the target domain. In this paper, we propose an idea for increasing discriminativeness: adding a new, artificial class and training the model on the data together with GAN-generated samples of the new class. The model trained with the new class samples is capable of extracting more discriminative features by repositioning data of the current classes in the target domain and therefore drawing the decision boundaries more effectively. Our idea is highly generic, so it is compatible with many existing methods such as DANN, VADA, and DIRT-T. We conduct various experiments on the standard datasets commonly used for the evaluation of unsupervised domain adaptation and demonstrate that our algorithm achieves SOTA performance in many scenarios.
摘要:在本文中,我们研究了无人监管的领域适应性的问题,其目的是获得一个预测模型使用从目标域源域和未标记数据标记数据目标域。存在最近研究的基础上提取功能,不仅不变量两个领域,但也提供了目标域的高辨别力的想法的数组。在本文中,我们建议赋权discriminativeness的想法:添加一个新的,人造类和数据与新类的GAN-生成的样本一起训练模型。基于新类的样品训练的模型能够提取是通过在目标域中重新定位当前类的数据,并因此更有效地绘制决策边界更有辨别力的特征的。我们的想法是非常通用的,所以它与现有的许多方法,如DANN,VADA,和污垢-T兼容。我们进行了各种实验常用的无监督域适应的评价标准数据,并证明我们的算法实现在很多场景的SOTA性能。
27. Globally optimal point set registration by joint symmetry plane fitting [PDF]
Lan Hu, Haomin Shi, Laurent Kneip
Abstract: The present work proposes a solution to the challenging problem of registering two partial point sets of the same object with very limited overlap. We leverage the fact that most objects found in man-made environments contain a plane of symmetry. By reflecting the points of each set with respect to the plane of symmetry, we can largely increase the overlap between the sets and therefore boost the registration process. However, prior knowledge about the plane of symmetry is generally unavailable or at least very hard to find, especially with limited partial views, and finding this plane could strongly benefit from a prior alignment of the partial point sets. We solve this chicken-and-egg problem by jointly optimizing the relative pose and symmetry plane parameters, and notably do so under global optimality by employing the branch-and-bound (BnB) paradigm. Our results demonstrate a great improvement over the current state-of-the-art in globally optimal point set registration for common objects. We furthermore show an interesting application of our method to dense 3D reconstruction of scenes with repetitive objects.
摘要:本工作提出了一种解决方案,以登记所述同一物体的两个部分点集与非常有限的重叠有挑战性的问题。我们利用一个事实,即在人工环境中最对象包含对称面。通过反映每个组的点相对于对称面,我们可以在很大程度上增加集之间的重叠,并且因此提高了注册过程。然而,关于对称平面先验知识一般是不可用的,或者至少很难找到,特别是有限的局部视图,并发现这种飞机能够从强劲的部分点集的前对准受益。我们通过联合优化的相对姿态和对称平面参数解决这个鸡和蛋的问题,并通过采用分支定界(泡泡堂)范例尤其是这样做的全局最优下。我们的研究结果表明在当前国家的最先进的,共同的对象全局最优的点集注册了很大的改进。我们还表明我们的方法的一个有趣的应用程序,以密集的三维重建与重复对象的场景。
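Reflecting a point with respect to a plane, the operation the abstract uses to increase overlap, is standard geometry: for a plane with unit normal $n$ and offset $d$, the mirror image of $p$ is $p - 2(n \cdot p + d)\,n$. The sketch below implements just that formula; the plane parametrization and the name `reflect_point` are assumptions, and the joint branch-and-bound optimization from the paper is not shown.

```python
import math


def reflect_point(point, normal, offset):
    """Reflect a 3D point across the plane {p : normal . p + offset = 0}."""
    norm = math.sqrt(sum(n * n for n in normal))
    unit = [n / norm for n in normal]
    # Signed distance from the point to the plane.
    dist = sum(u * p for u, p in zip(unit, point)) + offset / norm
    return [p - 2.0 * dist * u for p, u in zip(point, unit)]


# Mirror a point across the plane x = 0 (normal need not be unit length).
mirrored = reflect_point([1.0, 2.0, 3.0], normal=[2.0, 0.0, 0.0], offset=0.0)
```

Applying the function twice returns the original point, which is a quick sanity check for any candidate symmetry plane.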
28. Block Switching: A Stochastic Approach for Deep Learning Security [PDF]
Xiao Wang, Siyue Wang, Pin-Yu Chen, Xue Lin, Peter Chin
Abstract: Recent studies of adversarial attacks have revealed the vulnerability of modern deep learning models. That is, subtly crafted perturbations of the input can make a trained network with high accuracy produce arbitrary incorrect predictions, while remaining imperceptible to the human visual system. In this paper, we introduce Block Switching (BS), a defense strategy against adversarial attacks based on stochasticity. BS replaces a block of model layers with multiple parallel channels, and the active channel is randomly assigned at run time and is hence unpredictable to the adversary. We show empirically that BS leads to a more dispersed input gradient distribution and superior defense effectiveness compared with other stochastic defenses such as stochastic activation pruning (SAP). Compared to other defenses, BS is also characterized by the following features: (i) BS causes a smaller drop in test accuracy; (ii) BS is attack-independent; and (iii) BS is compatible with other defenses and can be used jointly with them.
摘要:对抗性攻击最近的研究已经揭示现代深度学习模式的脆弱性。也就是说,输入的巧妙制作的扰动可以使高精度生产任意不正确的预测训练有素的网络,同时保持察觉不到人类视觉系统。在本文中,我们介绍了块切换(BS),针对基于随机性对抗攻击的防御策略。 BS替换具有多个平行通道的模型层的块,以及有源沟道在运行时间,因此无法预测到对手随机分配。我们表明凭经验该BS导致更分散的输入梯度分布,并与其他随机防御相比优越防御效果如随机活化修剪(SAP)。相比其他防御,BS的特征还在于以下特征:(ⅰ)BS引起较少测试精度降; (二)BS是攻击独立及(iii)BS与其他防御兼容,可以与他人共同使用。
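The mechanism described in the Block Switching abstract, several parallel channels with one picked at random per forward pass, can be sketched in a few lines. The channels below are plain functions standing in for trained sub-networks; the class name `BlockSwitch` and the toy channels are hypothetical and say nothing about how the authors train or combine the channels.

```python
import random


class BlockSwitch:
    """Holds several parallel channels and activates one at random per call."""

    def __init__(self, channels, rng=None):
        self.channels = list(channels)
        self.rng = rng or random.Random()

    def __call__(self, x):
        channel = self.rng.choice(self.channels)  # active channel chosen at run time
        return channel(x)


# Hypothetical stand-ins for trained sub-networks with similar behaviour.
channels = [lambda x: 2.0 * x, lambda x: 2.0 * x + 0.01, lambda x: 2.0 * x - 0.01]
block = BlockSwitch(channels, rng=random.Random(0))
outputs = [block(1.0) for _ in range(5)]
```

Because the caller cannot know which channel will be active, gradients computed through one forward pass need not match the channel used on the next.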
29. LocoGAN -- Locally Convolutional GAN [PDF]
Łukasz Struski, Szymon Knop, Jacek Tabor, Wiktor Daniec, Przemysław Spurek
Abstract: In the paper we construct a fully convolutional GAN model, LocoGAN, whose latent space is given by noise-like images of possibly different resolutions. The learning is local, i.e. we process not the whole noise-like image, but sub-images of a fixed size. As a consequence, LocoGAN can produce images of arbitrary dimensions, e.g. for the LSUN bedroom data set. Another advantage of our approach comes from the fact that we use position channels, which allow the generation of fully periodic (e.g. cylindrical panoramic images) or almost periodic "infinitely long" images (e.g. wallpapers).
摘要:在本文中,我们构建完全卷积GAN模型:LocoGAN,其潜在空间由下式给出噪声状的可能不同的分辨率的图像。学习是本地的,即,我们处理不整类噪声图像,但一个固定尺寸的子图像。因此LocoGAN可以产生任意的尺寸例如图像LSUN卧室的数据集。我们的方法的另一个优势来自于一个事实,我们使用的位置的通道,这使得全周期(例如圆柱形全景图像)的生成或几乎周期,,无限长”的图像(如墙纸)。
30. Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent [PDF]
Pu Zhao, Pin-Yu Chen, Siyue Wang, Xue Lin
Abstract: Despite the great achievements of modern deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among those, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt the first-order gradient descent method, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method to design adversarial attacks, which incorporates the zeroth-order gradient estimation technique catering to the black-box attack scenario and the second-order natural gradient descent to achieve higher query efficiency. The empirical evaluations on image classification datasets demonstrate that ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.
摘要:尽管现代深层神经网络(DNNs)的巨大成就,国家的最先进的DNNs的脆弱性/稳健性提出了在许多应用领域的安全问题要求高可靠性。各种敌对攻击,提出了破坏DNN模型的学习表现。在这些,暗箱敌对攻击方法已经收到由于其实用性和简单的特殊关注。黑盒攻击通常是为了保持隐身和低成本喜欢少的查询。然而,大多数的当前黑箱攻击方法采用一阶梯度下降法,其可以配有某些缺陷,诸如相对慢的收敛和超参数设置高灵敏度。在本文中,我们提出了一个零阶的自然梯度下降(ZO-NGD)方法来设计敌对攻击,其采用了零阶梯度估计技术迎合黑盒攻击场景和第二阶固有梯度下降以实现更高的查询效率。使图像数据集分类的经验评价表明,ZO-NGD可与国家的最先进的攻击方法相比获得显著下模型查询复杂性。
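Zeroth-order gradient estimation means approximating a gradient from function values alone, which is all a black-box attacker can query. A minimal sketch using coordinate-wise central differences follows; this is a generic estimator chosen for illustration, not necessarily the one used in ZO-NGD, and the natural-gradient step is omitted. The names `zeroth_order_gradient` and `black_box_loss` are hypothetical.

```python
def zeroth_order_gradient(f, x, h=1e-4):
    """Estimate the gradient of f at x from function values only (central differences)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))  # two queries per coordinate
    return grad


# Hypothetical black-box objective; an attacker would query a model's loss instead.
def black_box_loss(x):
    return sum(xi * xi for xi in x)


estimate = zeroth_order_gradient(black_box_loss, [1.0, -2.0, 0.5])
```

This estimator costs two queries per input coordinate, which shows why query efficiency is the central concern for high-dimensional image inputs.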
31. CBIR using features derived by Deep Learning [PDF]
Subhadip Maji, Smarajit Bose
Abstract: In a Content Based Image Retrieval (CBIR) system, the task is to retrieve similar images from a large database given a query image. The usual procedure is to extract some useful features from the query image and retrieve images which have a similar set of features. For this purpose, a suitable similarity measure is chosen, and images with high similarity scores are retrieved. Naturally, the choice of these features plays a very important role in the success of this system, and high-level features are required to reduce the semantic gap. In this paper, we propose to use features derived from pre-trained network models from a deep-learning convolution network trained for a large image classification problem. This approach appears to produce vastly superior results for a variety of databases, and it outperforms many contemporary CBIR systems. We analyse the retrieval time of the method, and also propose a pre-clustering of the database based on the above-mentioned features, which yields comparable results in a much shorter time in most cases.
摘要:基于内容的图像检索(CBIR)系统,任务是从给定的查询图像的大型数据库中检索类似的图像。通常的程序是从查询图像中提取一些有用的功能,和检索具有类似的一组特征的图像。为此目的,合适的相似性度量被选择,并以高相似性分数的图像检索。当然这些功能的选择发挥该系统的成功非常重要的作用,而高层次的特点是要求减少语义鸿沟。在本文中,我们建议使用从预先训练网络模型导出的特征,从训练大图像分类问题深学习卷积网络。这种方法似乎产生远远优于结果的各种数据库,它优于许多当代CBIR系统。我们分析了该方法的检索时间,并且还提出了一种基于其产生类似的结果在更短的时间在大多数的情况下,上述特征数据库的预集群。
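The retrieval procedure outlined in the abstract (extract features, score similarity, return the highest-scoring images) can be sketched as follows. Cosine similarity is an assumption here, since the abstract only says that "a suitable similarity measure is chosen", and the feature vectors are made-up stand-ins for the outputs of a pre-trained network. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)


def retrieve(query_features, database, top_k=2):
    """Return the ids of the top_k database images most similar to the query."""
    ranked = sorted(
        database,
        key=lambda image_id: cosine_similarity(query_features, database[image_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical feature vectors; in practice these would come from a pre-trained CNN.
database = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
}
results = retrieve([1.0, 0.1, 0.0], database)
```

This version scans the whole database for every query; the pre-clustering mentioned in the abstract would restrict the scan to the clusters nearest the query.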
Note: the Chinese text is machine-translated.