Contents
2. Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [PDF]
8. The Effects of Skin Lesion Segmentation on the Performance of Dermatoscopic Image Classification [PDF]
9. PV-RCNN: The Top-Performing LiDAR-only Solutions for 3D Detection / 3D Tracking / Domain Adaptation of Waymo Open Dataset Challenges [PDF]
14. Counting from Sky: A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method [PDF]
17. Fast Single-shot Ship Instance Segmentation Based on Polar Template Mask in Remote Sensing Images [PDF]
22. Modality Attention and Sampling Enables Deep Learning with Heterogeneous Marker Combinations in Fluorescence Microscopy [PDF]
26. A Scene-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video [PDF]
27. Learning Representations of Endoscopic Videos to Detect Tool Presence Without Supervision [PDF]
31. Simulation-supervised deep learning for analysing organelles states and behaviour in living cells [PDF]
36. W-Net: Dense Semantic Segmentation of Subcutaneous Tissue in Ultrasound Images by Expanding U-Net to Incorporate Ultrasound RF Waveform Data [PDF]
37. Improving the Segmentation of Scanning Probe Microscope Images using Convolutional Neural Networks [PDF]
Abstracts
1. AllenAct: A Framework for Embodied AI Research [PDF]
Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi
Abstract: The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities. This growth has been facilitated by the creation of a large number of simulated environments (such as AI2-THOR, Habitat and CARLA), tasks (like point navigation, instruction following, and embodied question answering), and associated leaderboards. While this diversity has been beneficial and organic, it has also fragmented the community: a huge amount of effort is required to do something as simple as taking a model trained in one environment and testing it in another. This discourages good science. We introduce AllenAct, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research. AllenAct provides first-class support for a growing collection of embodied environments, tasks and algorithms, provides reproductions of state-of-the-art models and includes extensive documentation, tutorials, start-up code, and pre-trained models. We hope that our framework makes Embodied AI more accessible and encourages new researchers to join this exciting area. The framework can be accessed at: this https URL
2. Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [PDF]
David Novotny, Roman Shapovalov, Andrea Vedaldi
Abstract: We propose the Canonical 3D Deformer Map, a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects. Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings, combining their individual advantages. In particular, it learns to associate each image pixel with a deformation model of the corresponding 3D object point which is canonical, i.e. intrinsic to the identity of the point and shared across objects of the category. The result is a method that, given only sparse 2D supervision at training time, can, at test time, reconstruct the 3D shape and texture of objects from single views, while establishing meaningful dense correspondences between object instances. It also achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
3. On the Reliability of the PNU for Source Camera Identification Tasks [PDF]
Andrea Bruno, Giuseppe Cattaneo, Paola Capasso
Abstract: The PNU (pixel non-uniformity) is an essential and reliable tool for performing source camera identification (SCI) and, over the years, has become a de-facto standard for this task in the forensic field. In this paper, we show that, although strategies exist that aim to cancel, modify, or replace the PNU traces in a digital camera image, it is still possible, through our experimental method, to find residual traces of the noise produced by the sensor used to shoot the photo. Furthermore, we show that it is possible to inject the PNU of a different camera into a target image and trace it back to the source camera, but only under the condition that the new camera is of the same model as the original one used to take the target image. Both cameras must be available to us. For completeness, we carried out two experiments and, rather than using the popular public reference dataset, CASIA TIDE, we preferred to introduce a dataset that does not present any kind of statistical artifacts. A preliminary experiment on a small dataset of smartphones showed that the injection of the PNU from a different device makes it impossible to identify the source camera correctly. For a second experiment, we built a large dataset of images taken with the same DSLR model. We extracted a denoised version of each image, injected each one with the residual noise (RN) of all the cameras in the dataset, and compared all of them with a reference pattern (RP) from each camera. The results of the experiments clearly show that residual traces of the original camera's PNU can be found both in the denoised images and in the injected ones. The combined results of the experiments show that, even though it is possible in theory to remove or replace the PNU in an image, this process can be easily detected and succeeds only under hard conditions, confirming the robustness of the PNU under this type of attack.
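For readers unfamiliar with the underlying mechanics, the PNU pipeline the experiments build on can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the Gaussian denoiser, function names, and the normalized-correlation detector are assumptions standing in for the paper's actual choices.

```python
# Minimal PNU-style source camera identification sketch.
# Assumptions: a Gaussian filter stands in for the paper's denoiser;
# inputs are grayscale float arrays in [0, 1].
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """RN: the image minus its denoised version."""
    return img - gaussian_filter(img, sigma)

def reference_pattern(images: list) -> np.ndarray:
    """RP: average of residuals over many shots from a single camera."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between a residual and a reference pattern."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def identify(query: np.ndarray, patterns: dict) -> str:
    """Attribute the query to the camera whose RP correlates best with its residual."""
    residual = noise_residual(query)
    return max(patterns, key=lambda cam: ncc(residual, patterns[cam]))
```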
4. Person-in-Context Synthesis with Compositional Structural Space [PDF]
Weidong Yin, Ziwei Liu, Leonid Sigal
Abstract: Despite significant progress, controlled generation of complex images with interacting people remains difficult. Existing layout generation methods fall short of synthesizing realistic person instances, while pose-guided generation approaches focus on a single person and assume simple or known backgrounds. To tackle these limitations, we propose a new problem, Persons in Context Synthesis, which aims to synthesize diverse person instance(s) in consistent contexts, with user control over both. The context is specified by the bounding box object layout, which lacks shape information, while the pose of the person(s) is specified by sparsely annotated keypoints. To handle the stark difference in input structures, we propose two separate neural branches to attentively composite the respective (context/person) inputs into a shared "compositional structural space", which encodes shape, location, and appearance information for both context and person structures in a disentangled manner. This structural space is then decoded to the image space using a multi-level feature modulation strategy, and learned in a self-supervised manner from image collections and their corresponding inputs. Extensive experiments on two large-scale datasets (COCO-Stuff [caesar2018cvpr] and Visual Genome [krishna2017visual]) demonstrate that our framework outperforms state-of-the-art methods w.r.t. synthesis quality.
5. Next-Best View Policy for 3D Reconstruction [PDF]
Daryl Peralta, Joel Casimiro, Aldrin Michael Nilles, Justine Aletta Aguilar, Rowel Atienza, Rhandley Cajote
Abstract: Manually selecting viewpoints or using commonly available flight planners, such as a circular path, for large-scale 3D reconstruction with drones often results in incomplete 3D models. Recent works have relied on hand-engineered heuristics, such as information gain, to select the next-best views. In this work, we present a learning-based algorithm called Scan-RL to learn a Next-Best View (NBV) policy. To train and evaluate the agent, we created Houses3K, a dataset of 3D house models. Our experiments show that using Scan-RL, the agent can scan houses with fewer steps and a shorter distance compared to our baseline circular path. Experimental results also demonstrate that a single NBV policy can be used to scan multiple houses, including those that were not seen during training. Scan-RL is available at https://github.com/darylperalta/ScanRL and the Houses3K dataset can be found at this https URL.
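As a point of contrast with the learned policy, the hand-engineered information-gain baseline mentioned above can be sketched greedily. This toy sketch assumes each candidate view comes pre-associated with the set of surface-point IDs it would observe; Scan-RL learns the selection instead of using such a heuristic.

```python
# Toy greedy next-best-view baseline: repeatedly pick the view with the
# largest information gain, i.e. the most not-yet-seen surface points.
def greedy_nbv(views: dict, budget: int) -> list:
    seen = set()
    plan = []
    for _ in range(budget):
        best = max(views, key=lambda v: len(views[v] - seen))
        if not views[best] - seen:
            break  # no remaining view adds coverage; stop early
        plan.append(best)
        seen |= views[best]
    return plan

# Example: three candidate poses around a toy house.
views = {"front": {1, 2, 3}, "back": {3, 4}, "roof": {4, 5, 6}}
print(greedy_nbv(views, budget=3))  # ['front', 'roof']; 'back' adds nothing new
```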
6. Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation [PDF]
Yurui Ren, Ge Li, Shan Liu, Thomas H. Li
Abstract: Pose-guided person image generation and animation aim to transform a source person image to target poses. These tasks require spatial manipulation of source data. However, Convolutional Neural Networks are limited by the lack of ability to spatially transform the inputs. In this paper, we propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level. This framework first estimates global flow fields between sources and targets. Then, corresponding local source feature patches are sampled with content-aware local attention coefficients. We show that our framework can spatially transform the inputs in an efficient manner. Meanwhile, we further model the temporal consistency for the person image animation task to generate coherent videos. The experiment results of both image generation and animation tasks demonstrate the superiority of our model. Besides, additional results of novel view synthesis and face image animation show that our model is applicable to other tasks requiring spatial transformation. The source code of our project is available at this https URL.
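The spatial-transformation step that plain CNNs lack can be illustrated with differentiable flow-based warping. The sketch below assumes a predicted dense flow field of pixel offsets and uses PyTorch's grid_sample; the paper's content-aware local attention and flow estimator are not reproduced here.

```python
# Differentiable warping of source features by a dense flow field (a sketch
# of the spatial-transformation step only).
import torch
import torch.nn.functional as F

def warp_by_flow(source: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """source: (N, C, H, W) feature map; flow: (N, 2, H, W) pixel offsets (dx, dy)."""
    n, _, h, w = source.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).float()   # (H, W, 2) pixel coordinates
    grid = base + flow.permute(0, 2, 3, 1)         # add predicted offsets
    norm = torch.tensor([w - 1.0, h - 1.0])
    grid = 2.0 * grid / norm - 1.0                 # map to [-1, 1] for grid_sample
    return F.grid_sample(source, grid, mode="bilinear", align_corners=True)
```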
7. A Realistic Fish-Habitat Dataset to Evaluate Algorithms for Underwater Visual Analysis [PDF]
Alzayat Saleh, Issam H. Laradji, Dmitry A. Konovalov, Michael Bradley, David Vazquez, Marcus Sheaves
Abstract: Visual analysis of complex fish habitats is an important step towards sustainable fisheries for human consumption and environmental protection. Deep Learning methods have shown great promise for scene analysis when trained on large-scale datasets. However, current datasets for fish analysis tend to focus on the classification task within constrained, plain environments which do not capture the complexity of underwater fish habitats. To address this limitation, we present DeepFish as a benchmark suite with a large-scale dataset to train and test methods for several computer vision tasks. The dataset consists of approximately 40 thousand images collected underwater from 20 habitats in the marine environments of tropical Australia. The dataset originally contained only classification labels. Thus, we collected point-level and segmentation labels to have a more comprehensive fish analysis benchmark. These labels enable models to learn to automatically monitor fish count, identify their locations, and estimate their sizes. Our experiments provide an in-depth analysis of the dataset characteristics, and the performance evaluation of several state-of-the-art approaches based on our benchmark. Although models pre-trained on ImageNet have successfully performed on this benchmark, there is still room for improvement. Therefore, this benchmark serves as a testbed to motivate further development in this challenging domain of underwater computer vision. Code is available at: this https URL
8. The Effects of Skin Lesion Segmentation on the Performance of Dermatoscopic Image Classification [PDF]
Amirreza Mahbod, Philipp Tschandl, Georg Langs, Rupert Ecker, Isabella Ellinger
Abstract: Malignant melanoma (MM) is one of the deadliest types of skin cancer. Analysing dermatoscopic images plays an important role in the early detection of MM and other pigmented skin lesions. Among different computer-based methods, deep learning-based approaches and in particular convolutional neural networks have shown excellent classification and segmentation performances for dermatoscopic skin lesion images. These models can be trained end-to-end without requiring any hand-crafted features. However, the effect of using lesion segmentation information on classification performance has remained an open question. In this study, we explicitly investigated the impact of using skin lesion segmentation masks on the performance of dermatoscopic image classification. To do this, first, we developed a baseline classifier as the reference model without using any segmentation masks. Then, we used either manually or automatically created segmentation masks in both training and test phases in different scenarios and investigated the classification performances. Evaluated on the ISIC 2017 challenge dataset which contained two binary classification tasks (i.e. MM vs. all and seborrheic keratosis (SK) vs. all) and based on the derived area under the receiver operating characteristic curve scores, we observed four main outcomes. Our results show that 1) using segmentation masks did not significantly improve the MM classification performance in any scenario, 2) in one of the scenarios (using segmentation masks for dilated cropping), SK classification performance was significantly improved, 3) removing all background information by the segmentation masks significantly degraded the overall classification performance, and 4) in case of using the appropriate scenario (using segmentation for dilated cropping), there is no significant difference of using manually or automatically created segmentation masks.
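As an illustration of the one scenario that helped (segmentation masks used for dilated cropping), the sketch below assumes "dilated cropping" means enlarging the mask's bounding box by a fixed factor before cropping; the paper's exact enlargement factor is not specified here.

```python
# Dilated cropping around a binary lesion mask (sketch under the stated
# assumption; `factor` controls how much context around the lesion is kept).
import numpy as np

def dilated_crop(image: np.ndarray, mask: np.ndarray, factor: float = 1.5) -> np.ndarray:
    ys, xs = np.nonzero(mask)
    cy, cx = (ys.min() + ys.max()) / 2, (xs.min() + xs.max()) / 2
    hh = (ys.max() - ys.min()) / 2 * factor   # enlarged half-height
    hw = (xs.max() - xs.min()) / 2 * factor   # enlarged half-width
    y0, y1 = max(int(cy - hh), 0), min(int(cy + hh) + 1, image.shape[0])
    x0, x1 = max(int(cx - hw), 0), min(int(cx + hw) + 1, image.shape[1])
    return image[y0:y1, x0:x1]
```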
9. PV-RCNN: The Top-Performing LiDAR-only Solutions for 3D Detection / 3D Tracking / Domain Adaptation of Waymo Open Dataset Challenges [PDF]
Shaoshuai Shi, Chaoxu Guo, Jihan Yang, Hongsheng Li
Abstract: In this technical report, we present the top-performing LiDAR-only solutions for the 3D detection, 3D tracking, and domain adaptation tracks of the Waymo Open Dataset Challenges 2020. Our solutions for the competition are built upon our recently proposed PV-RCNN 3D object detection framework. Several variants of our PV-RCNN are explored, including temporal information incorporation, dynamic voxelization, adaptive training sample selection, classification with RoI features, etc. A simple model ensemble strategy with non-maximum suppression and box voting is adopted to generate the final results. By using only LiDAR point cloud data, our models finally achieve 1st place among all LiDAR-only methods, and 2nd place among all multi-modal methods, on the 3D detection, 3D tracking, and domain adaptation tracks of the Waymo Open Dataset Challenges. Our solutions will be available at this https URL
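Of the ensemble ingredients listed, box voting is the easiest to make concrete. The sketch below is simplified to axis-aligned 2D boxes (the challenge entries operate on 3D boxes): each NMS survivor is refined as the score-weighted mean of all raw boxes that overlap it.

```python
# Simplified box voting after NMS (assumption: axis-aligned 2D boxes
# given as [x1, y1, x2, y2]; real entries use rotated 3D boxes).
import numpy as np

def iou(a: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU of one box `a` against an (M, 4) array of boxes."""
    x1 = np.maximum(a[0], boxes[:, 0]); y1 = np.maximum(a[1], boxes[:, 1])
    x2 = np.minimum(a[2], boxes[:, 2]); y2 = np.minimum(a[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-12)

def box_voting(kept: np.ndarray, all_boxes: np.ndarray,
               all_scores: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Refine each NMS survivor as the score-weighted mean of its neighbours."""
    voted = kept.astype(np.float64).copy()
    for i, box in enumerate(kept):
        mask = iou(box, all_boxes) >= thresh   # each survivor matches itself too
        w = all_scores[mask]
        voted[i] = (all_boxes[mask] * w[:, None]).sum(axis=0) / w.sum()
    return voted
```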
10. Same Same But DifferNet: Semi-Supervised Defect Detection with Normalizing Flows [PDF]
Marco Rudolph, Bastian Wandt, Bodo Rosenhahn
Abstract: The detection of manufacturing errors is crucial in fabrication processes to ensure product quality and safety standards. Since many defects occur very rarely and their characteristics are mostly unknown a priori, their detection is still an open research question. To this end, we propose DifferNet: It leverages the descriptiveness of features extracted by convolutional neural networks to estimate their density using normalizing flows. Normalizing flows are well-suited to deal with low dimensional data distributions. However, they struggle with the high dimensionality of images. Therefore, we employ a multi-scale feature extractor which enables the normalizing flow to assign meaningful likelihoods to the images. Based on these likelihoods we develop a scoring function that indicates defects. Moreover, propagating the score back to the image enables pixel-wise localization. To achieve a high robustness and performance we exploit multiple transformations in training and evaluation. In contrast to most other methods, ours does not require a large number of training samples and performs well with as low as 16 images. We demonstrate the superior performance over existing approaches on the challenging and newly proposed MVTec AD and Magnetic Tile Defects datasets.
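The scoring function derived from the likelihoods can be sketched as follows. Here `flow` is assumed to be any invertible network that returns the latent code and the log-determinant of its Jacobian; DifferNet's multi-scale feature extractor and test-time transformations are omitted.

```python
# Likelihood-based defect scoring with a normalizing flow (sketch under
# the stated assumption about the `flow` callable's return values).
import math
import torch

def anomaly_score(flow, features: torch.Tensor) -> torch.Tensor:
    z, log_det = flow(features)        # z: (N, D), log_det: (N,)
    # log p(x) = log N(z; 0, I) + log|det J| by the change-of-variables formula
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    log_px = log_pz + log_det
    return -log_px                     # high score = low likelihood = likely defect
```

A threshold calibrated on defect-free samples then turns scores into decisions, e.g. flagging an image whose score exceeds the 95th percentile of scores observed on normal training data.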
11. A Dataset and Baselines for Visual Question Answering on Art [PDF]
Noa Garcia, Chentao Ye, Zihua Liu, Qingtao Hu, Mayu Otani, Chenhui Chu, Yuta Nakashima, Teruko Mitamura
Abstract: Answering questions related to art pieces (paintings) is a difficult task, as it implies the understanding of not only the visual information that is shown in the picture, but also the contextual knowledge that is acquired through the study of the history of art. In this work, we introduce our first attempt towards building a new dataset, coined AQUA (Art QUestion Answering). The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset. The QA pairs are cleansed by crowdsourcing workers with respect to their grammatical correctness, answerability, and answers' correctness. Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions. We also present a two-branch model as baseline, where the visual and knowledge questions are handled independently. We extensively compare our baseline model against the state-of-the-art models for question answering, and we provide a comprehensive study about the challenges and potential future directions for visual question answering on art.
12. Distortion-Adaptive Grape Bunch Counting for Omnidirectional Images [PDF]
Ryota Akai, Yuzuko Utsumi, Yuka Miwa, Masakazu Iwamura, Koichi Kise
Abstract: This paper proposes the first object counting method for omnidirectional images. Because conventional object counting methods cannot handle the distortion of omnidirectional images, we propose to process them using stereographic projection, which enables conventional methods to obtain a good approximation of the density function. However, the images obtained by stereographic projection are still distorted. Hence, to manage this distortion, we propose two methods. One is a new data augmentation method designed for the stereographic projection of omnidirectional images. The other is a distortion-adaptive Gaussian kernel that generates a density map ground truth while taking into account the distortion of stereographic projection. Using the counting of grape bunches as a case study, we constructed an original grape-bunch image dataset consisting of omnidirectional images and conducted experiments to evaluate the proposed method. The results show that the proposed method performs better than a direct application of the conventional method, improving mean absolute error by 14.7% and mean squared error by 10.5%.
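Density-map ground truth of the kind the distortion-adaptive kernel produces can be sketched with a per-point Gaussian width. The sketch assumes the distortion enters only through the per-annotation sigma values, which the paper derives from the stereographic projection; how those sigmas are computed is not reproduced here.

```python
# Density-map ground truth with a per-point adaptive Gaussian sigma.
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(shape, points, sigmas) -> np.ndarray:
    """shape: (H, W); points: iterable of (row, col); sigmas: per-point widths."""
    dmap = np.zeros(shape, dtype=np.float64)
    for (r, c), s in zip(points, sigmas):
        delta = np.zeros(shape)
        delta[min(int(r), shape[0] - 1), min(int(c), shape[1] - 1)] = 1.0
        dmap += gaussian_filter(delta, s)   # each kernel integrates to ~1
    return dmap  # dmap.sum() approximates the annotated object count
```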
13. Few-Shot Object Detection via Knowledge Transfer [PDF]
Geonuk Kim, Hong-Gyu Jung, Seong-Whan Lee
Abstract: Conventional methods for object detection usually require substantial amounts of training data and annotated bounding boxes. If there are only a few training data and annotations, the object detectors easily overfit and fail to generalize. This exposes the practical weakness of object detectors. On the other hand, humans can easily master new reasoning rules from only a few demonstrations using previously learned knowledge. In this paper, we introduce a few-shot object detection method via knowledge transfer, which aims to detect objects from a few training examples. Central to our method is prototypical knowledge transfer with an attached meta-learner. The meta-learner takes support-set images that include a few examples of the novel categories and base categories, and predicts prototypes that represent each category as a vector. Then, the prototypes reweight each RoI (Region-of-Interest) feature vector from a query image to remodel the R-CNN predictor heads. To facilitate the remodeling process, we predict the prototypes under a graph structure, which propagates information from the correlated base categories to the novel categories with explicit guidance of prior knowledge representing correlations among categories. Extensive experiments on the PASCAL VOC dataset verify the effectiveness of the proposed method.
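The prototype reweighting step can be made concrete with a small sketch. The meta-learner and the graph-based propagation are omitted; the sigmoid gating below is an assumption about how prototypes modulate RoI channels, not the paper's exact formulation.

```python
# Prototype computation and channel-wise RoI reweighting (sketch under
# the stated assumptions).
import torch

def class_prototypes(support_feats: torch.Tensor,
                     labels: torch.Tensor, n_classes: int) -> torch.Tensor:
    """Prototype = mean embedding of each class's support examples. Returns (C, D)."""
    return torch.stack([support_feats[labels == c].mean(dim=0)
                        for c in range(n_classes)])

def reweight_roi(roi_feat: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Gate one RoI vector (D,) by every class prototype; returns (C, D),
    one reweighted feature per class for the per-class predictor heads."""
    return roi_feat.unsqueeze(0) * torch.sigmoid(prototypes)
```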
14. Counting from Sky: A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method [PDF]
Guangshuai Gao, Qingjie Liu, Yunhong Wang
Abstract: Object counting, whose aim is to estimate the number of objects from a given image, is an important and challenging computation task. Significant efforts have been devoted to addressing this problem and great progress has been achieved, yet counting the number of ground objects from remote sensing images is barely studied. In this paper, we are interested in counting dense objects from remote sensing images. Compared with object counting in a natural scene, this task is challenging in the following factors: large scale variation, complex cluttered background, and orientation arbitrariness. More importantly, the scarcity of data severely limits the development of research in this field. To address these issues, we first construct a large-scale object counting dataset with remote sensing images, which contains four important geographic objects: buildings, crowded ships in harbors, and large and small vehicles in parking lots. We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image. The proposed network consists of three parts, namely an attention module, a scale pyramid module, and a deformable convolution module, to attack the aforementioned challenging factors. Extensive experiments are performed on the proposed dataset and one crowd counting dataset, which demonstrate the challenges of the proposed dataset and the superiority and effectiveness of our method compared with state-of-the-art methods.
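A plausible form of the scale pyramid module is a set of parallel dilated convolutions whose outputs are concatenated; the dilation rates and channel split below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of a scale pyramid block: parallel 3x3 convolutions with
# increasing dilation rates, concatenated along the channel axis.
import torch
from torch import nn

class ScalePyramid(nn.Module):
    def __init__(self, channels: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels // len(rates), kernel_size=3,
                      padding=r, dilation=r)   # padding=r keeps spatial size
            for r in rates])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees a different receptive field; concatenation
        # restores the original channel width.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```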
15. Adaptive WGAN with loss change rate balancing [PDF]
Xu Ouyang, Gady Agam
Abstract: Optimizing the discriminator in Generative Adversarial Networks (GANs) to completion in the inner training loop is computationally prohibitive, and on finite datasets would result in overfitting. To address this, a common update strategy is to alternate between k optimization steps for the discriminator D and one optimization step for the generator G. This strategy is repeated in various GAN algorithms where k is selected empirically. In this paper, we show that this update strategy is not optimal in terms of accuracy and convergence speed, and propose a new update strategy for Wasserstein GANs (WGAN) and other GANs that use the WGAN loss (e.g. WGAN-GP, DeblurGAN, and super-resolution GAN). The proposed update strategy is based on a loss change ratio comparison of G and D. We demonstrate that the proposed strategy improves both convergence speed and accuracy.
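The loss-change-ratio rule can be sketched as a scheduling loop. This is one plausible reading of the strategy, not the paper's exact rule: the comparison direction and the ratio definition may differ, and `g_step`/`d_step` are assumed helpers that run one optimizer step and return the current loss.

```python
# Adaptive update scheduling by relative loss change (sketch under the
# stated assumptions): at each iteration, train the network whose loss
# has been changing more slowly since its last update.
def balanced_gan_training(g_step, d_step, iters: int, eps: float = 1e-8):
    prev_lg = lg = g_step()   # generator loss after its first step
    prev_ld = ld = d_step()   # discriminator loss after its first step
    for _ in range(iters):
        r_g = abs(lg - prev_lg) / (abs(prev_lg) + eps)   # G's loss change ratio
        r_d = abs(ld - prev_ld) / (abs(prev_ld) + eps)   # D's loss change ratio
        if r_d < r_g:                     # D improving more slowly: update D
            prev_ld, ld = ld, d_step()
        else:                             # otherwise update G
            prev_lg, lg = lg, g_step()
```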
16. Color and Edge-Aware Adversarial Image Perturbations [PDF]
Robert Bassett, Mitchell Graves
Abstract: Adversarial perturbation of images, in which a source image is deliberately modified with the intent of causing a classifier to misclassify the image, provides important insight into the robustness of image classifiers. In this work we develop two new methods for constructing adversarial perturbations, both of which are motivated by minimizing human ability to detect changes between the perturbed and source image. The first of these, the Edge-Aware method, reduces the magnitude of perturbations permitted in smooth regions of an image where changes are more easily detected. Our second method, the Color-Aware method, performs the perturbation in a color space which accurately captures human ability to distinguish differences in colors, thus reducing the perceived change. The Color-Aware and Edge-Aware methods can also be implemented simultaneously, resulting in image perturbations which account for both human color perception and sensitivity to changes in homogeneous regions. Though Edge-Aware and Color-Aware modifications exist for many image perturbation techniques, we focus on easily computed perturbations. We empirically demonstrate that the Color-Aware and Edge-Aware perturbations we consider effectively cause misclassification, are less distinguishable to human perception, and are as easy to compute as the most efficient image perturbation techniques. Code and demo available at this https URL
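The Color-Aware idea of perturbing in a perceptually uniform space can be sketched with a CIELAB round trip. The adversarial direction `delta_lab` is assumed to be precomputed (obtaining it requires gradients of the attacked classifier), and the epsilon budget is illustrative.

```python
# Sketch of a color-aware perturbation: take the step in CIELAB, where
# Euclidean distance tracks perceived color difference more closely than
# in RGB, then convert back.
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def perturb_in_lab(rgb: np.ndarray, delta_lab: np.ndarray,
                   epsilon: float = 2.0) -> np.ndarray:
    """rgb: float image in [0, 1]; delta_lab: same-shape direction in Lab space."""
    lab = rgb2lab(rgb)
    step = epsilon * delta_lab / (np.linalg.norm(delta_lab) + 1e-12)
    return np.clip(lab2rgb(lab + step), 0.0, 1.0)
```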
17. Fast Single-shot Ship Instance Segmentation Based on Polar Template Mask in Remote Sensing Images [PDF] [Back to contents]
Zhenhang Huang, Shihao Sun, Ruirui Li
Abstract: Object detection and instance segmentation in remote sensing images is a fundamental and challenging task, due to the complexity of scenes and targets. The latest methods tried to take into account both the efficiency and the accuracy of instance segmentation. In order to improve both of them, in this paper, we propose a single-shot convolutional neural network structure, which is conceptually simple and straightforward, and meanwhile makes up for the low accuracy of single-shot networks. Our method, termed SSS-Net, detects targets based on the location of the object's center and the distances between the center and points sampled on the silhouette at non-uniform angle intervals, thereby achieving a balanced sampling of lines in mask generation. In addition, we propose a non-uniform polar template IoU based on the contour template in polar coordinates. Experiments on both the Airbus Ship Detection Challenge dataset and the ISAIDships dataset show that SSS-Net has strong competitiveness in precision and speed for ship instance segmentation.
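A minimal sketch of the polar mask representation: a mask is a set of ray lengths from the instance center, and two masks can be compared with a polar IoU (sum of minima over sum of maxima). The non-uniform angle schedule shown, denser near the contour's long axis, is one plausible choice, not necessarily the authors' exact scheme.

    import numpy as np

    def polar_iou(d_pred, d_gt):
        """IoU surrogate between two polar masks given as ray lengths
        sampled at the same angles from a shared center."""
        d_pred = np.asarray(d_pred, dtype=float)
        d_gt = np.asarray(d_gt, dtype=float)
        return np.minimum(d_pred, d_gt).sum() / np.maximum(d_pred, d_gt).sum()

    def nonuniform_angles(n_rays=36, sharpness=2.0):
        """Example non-uniform sampling: rays cluster near angles 0 and pi,
        where elongated targets such as ships need finer contour resolution.
        Requires sharpness > 1 so the mapping stays monotonic."""
        t = np.linspace(0.0, 1.0, n_rays, endpoint=False)
        return 2 * np.pi * (t - np.sin(4 * np.pi * t) / (4 * np.pi * sharpness))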
18. Pixel-Face: A Large-Scale, High-Resolution Benchmark for 3D Face Reconstruction [PDF] [Back to contents]
Zhang Yunxuan, Rong Yu, Liu Ziwei, Cheng Cheng
Abstract: 3D face reconstruction is a fundamental task that can facilitate numerous applications such as robust facial analysis and augmented reality. It is also a challenging task due to the lack of high-quality datasets that can fuel current deep learning-based methods. However, existing datasets are limited in quantity, realism and diversity. To circumvent these hurdles, we introduce Pixel-Face, a large-scale, high-resolution and diverse 3D face dataset with massive annotations. Specifically, Pixel-Face contains 855 subjects aged from 18 to 80. Each subject has more than 20 samples with various expressions. Each sample is composed of high-resolution multi-view RGB images and 3D meshes with various expressions. Moreover, we collect precise landmark annotations and 3D registration results for each sample. To demonstrate the advantages of Pixel-Face, we re-parameterize the 3D Morphable Model (3DMM) into Pixel-3DM using the collected data. We show that the obtained Pixel-3DM is better at modeling a wide range of face shapes and expressions. We also carefully benchmark existing 3D face reconstruction methods on our dataset. Moreover, Pixel-Face serves as an effective training source. We observe that the performance of current face reconstruction models significantly improves both on existing benchmarks and on Pixel-Face after being fine-tuned using our newly collected data. Extensive experiments demonstrate the effectiveness of Pixel-3DM and the usefulness of Pixel-Face. The code and data are available at this https URL.
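For reference, Pixel-3DM re-fits the classic linear 3DMM form, in which a face is a mean shape plus weighted shape and expression bases; the sketch below shows this generic form only (the paper's actual bases and dimensions are not assumed).

    import numpy as np

    def morphable_face(mean_shape, shape_basis, expr_basis, alpha, beta):
        """Generic linear 3DMM: vertices = mean + S @ alpha + E @ beta.

        mean_shape: (3N,) stacked vertex coordinates of the mean face;
        shape_basis: (3N, Ks); expr_basis: (3N, Ke); alpha, beta: coefficients.
        """
        v = mean_shape + shape_basis @ alpha + expr_basis @ beta
        return v.reshape(-1, 3)  # (N, 3) vertex positions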
19. All About Knowledge Graphs for Actions [PDF] [Back to contents]
Pallabi Ghosh, Nirat Saini, Larry S. Davis, Abhinav Shrivastava
Abstract: Current action recognition systems require large amounts of training data for recognizing an action. Recent works have explored the paradigm of zero-shot and few-shot learning to learn classifiers for unseen categories or categories with few labels. Following similar paradigms in object recognition, these approaches utilize external sources of knowledge (e.g., knowledge graphs from language domains). However, unlike objects, it is unclear what the best knowledge representation for actions is. In this paper, we intend to gain a better understanding of knowledge graphs (KGs) that can be utilized for zero-shot and few-shot action recognition. In particular, we study three different construction mechanisms for KGs: action embeddings, action-object embeddings, and visual embeddings. We present extensive analysis of the impact of different KGs in different experimental setups. Finally, to enable a systematic study of zero-shot and few-shot approaches, we propose an improved evaluation paradigm based on UCF101, HMDB51, and Charades datasets for knowledge transfer from models trained on Kinetics.
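One plausible instantiation of the "action embeddings" construction mechanism is to connect each action node to its nearest neighbors in embedding space; the sketch below builds such a graph under that assumption (the paper's exact construction may differ).

    import numpy as np

    def knn_knowledge_graph(embeddings, k=5):
        """Connect each node to its k most cosine-similar neighbors.

        embeddings: (n, d) array of action embeddings.
        Returns a dict mapping node index -> list of neighbor indices.
        """
        x = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8)
        sim = x @ x.T
        np.fill_diagonal(sim, -np.inf)  # exclude self-edges
        return {i: list(np.argsort(-sim[i])[:k]) for i in range(len(x))}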
20. Regularized Densely-connected Pyramid Network for Salient Instance Segmentation [PDF] [Back to contents]
Yu-Huan Wu, Yun Liu, Le Zhang, Wang Gao, Ming-Ming Cheng
Abstract: Much of the recent effort on salient object detection (SOD) has been devoted to producing accurate saliency maps without being aware of their instance labels. To this end, we propose a new pipeline for end-to-end salient instance segmentation (SIS) that predicts a class-agnostic mask for each detected salient instance. To make better use of the rich feature hierarchies in deep networks, we propose regularized dense connections, which attentively promote informative features and suppress non-informative ones from all feature pyramids, to enhance the side predictions. A novel multi-level RoIAlign based decoder is introduced as well to adaptively aggregate multi-level features for better mask predictions. Such good strategies can be well-encapsulated into the Mask-RCNN pipeline. Extensive experiments on popular benchmarks demonstrate that our design significantly outperforms existing state-of-the-art competitors by 6.3% (58.6% vs 52.3%) in terms of the AP metric. The code is available at this https URL.
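A rough PyTorch sketch of the gating idea behind the regularized dense connections: each incoming pyramid feature is re-weighted by a learned channel gate before aggregation, promoting informative channels and suppressing uninformative ones. The squeeze-and-excite-style gate and the summation fusion are illustrative assumptions, not the paper's exact blocks.

    import torch.nn as nn

    class GatedDenseFusion(nn.Module):
        """Aggregate same-sized pyramid features with per-channel gates."""

        def __init__(self, channels, n_levels):
            super().__init__()
            self.gates = nn.ModuleList(
                nn.Sequential(nn.AdaptiveAvgPool2d(1),
                              nn.Conv2d(channels, channels, 1),
                              nn.Sigmoid())
                for _ in range(n_levels))

        def forward(self, feats):  # feats: list of (B, C, H, W) tensors
            # Gate each level's channels, then fuse by summation.
            return sum(g(f) * f for g, f in zip(self.gates, feats))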
21. Adversarial Training for Multi-Channel Sign Language Production [PDF] [Back to contents]
Ben Saunders, Necati Cihan Camgoz, Richard Bowden
Abstract: Sign Languages are rich multi-channel languages, requiring articulation of both manual (hands) and non-manual (face and body) features in a precise, intricate manner. Sign Language Production (SLP), the automatic translation from spoken to sign languages, must embody this full sign morphology to be truly understandable by the Deaf community. Previous work has mainly focused on manual feature production, with an under-articulated output caused by regression to the mean. In this paper, we propose an Adversarial Multi-Channel approach to SLP. We frame sign production as a minimax game between a transformer-based Generator and a conditional Discriminator. Our adversarial discriminator evaluates the realism of sign production conditioned on the source text, pushing the generator towards a realistic and articulate output. Additionally, we fully encapsulate sign articulators with the inclusion of non-manual features, producing facial features and mouthing patterns. We evaluate on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset, and report state-of-the-art SLP back-translation performance for manual production. We set new benchmarks for the production of multi-channel sign to underpin future research into realistic SLP.
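A hedged PyTorch sketch of the minimax objective: a conditional discriminator scores pose sequences given the source-text embedding, and the generator is rewarded for fooling it. The network interfaces are hypothetical stand-ins, and in practice such an adversarial term is typically combined with a regression objective.

    import torch
    import torch.nn.functional as F

    def conditional_adv_losses(disc, fake_pose, text_embedding, real_pose):
        """Discriminator and generator losses for text-conditioned production.

        disc(pose, text) is assumed to return a realism logit per sequence.
        """
        d_real = disc(real_pose, text_embedding)
        d_fake = disc(fake_pose.detach(), text_embedding)
        d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                  + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        g_score = disc(fake_pose, text_embedding)
        g_loss = F.binary_cross_entropy_with_logits(g_score, torch.ones_like(g_score))
        return d_loss, g_loss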
22. Modality Attention and Sampling Enables Deep Learning with Heterogeneous Marker Combinations in Fluorescence Microscopy [PDF] [Back to contents]
Alvaro Gomariz, Tiziano Portenier, Patrick M. Helbling, Stephan Isringhausen, Ute Suessbier, César Nombela-Arrieta, Orcun Goksel
Abstract: Fluorescence microscopy allows for a detailed inspection of cells, cellular networks, and anatomical landmarks by staining with a variety of carefully-selected markers visualized as color channels. Quantitative characterization of structures in acquired images often relies on automatic image analysis methods. Despite the success of deep learning methods in other vision applications, their potential for fluorescence image analysis remains underexploited. One reason lies in the considerable workload required to train accurate models, which are normally specific for a given combination of markers, and therefore applicable to a very restricted number of experimental settings. We herein propose Marker Sampling and Excite, a neural network approach with a modality sampling strategy and a novel attention module that together enable (i) flexible training with heterogeneous datasets with combinations of markers and (ii) successful utility of learned models on arbitrary subsets of markers prospectively. We show that our single neural network solution performs comparably to an upper bound scenario where an ensemble of many networks is naïvely trained for each possible marker combination separately. In addition, we demonstrate the feasibility of our framework in high-throughput biological analysis by revising a recent quantitative characterization of bone marrow vasculature in 3D confocal microscopy datasets. Not only can our work substantially ameliorate the use of deep learning in fluorescence microscopy analysis, but it can also be utilized in other fields with incomplete data acquisitions and missing modalities.
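The modality (marker) sampling strategy can be approximated by randomly dropping whole marker channels during training, as in the NumPy sketch below; the drop probability and the guarantee that at least one channel survives are assumptions for illustration.

    import numpy as np

    def sample_marker_subset(image, p_drop=0.5, rng=None):
        """Randomly zero out whole marker channels of a (C, H, W) stack,
        so the network learns to cope with arbitrary marker combinations.
        Returns the masked stack and the boolean keep-vector."""
        rng = rng if rng is not None else np.random.default_rng()
        keep = rng.random(image.shape[0]) > p_drop
        if not keep.any():
            keep[rng.integers(image.shape[0])] = True  # never drop all channels
        return image * keep[:, None, None], keep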
23. Metrics for Exposing the Biases of Content-Style Disentanglement [PDF] [Back to contents]
Xiao Liu, Spyridon Thermos, Gabriele Valvano, Agisilaos Chartsias, Alison O'Neil, Sotirios A. Tsaftaris
Abstract: Recent state-of-the-art semi- and un-supervised solutions for challenging computer vision tasks have used the idea of encoding image content into a spatial tensor and image appearance or "style" into a vector. These decomposed representations take advantage of equivariant properties of network design and improve performance in equivariant tasks, such as image-to-image translation. Most of these methods use the term "disentangled" for their representations and employ model design, learning objectives, and data biases to achieve good model performance. While considerable effort has been made to measure disentanglement in vector representations, currently, metrics that can characterize the degree of disentanglement between the representations and task performance are lacking. In this paper, we propose metrics to measure how (un)correlated, biased, and informative the content and style representations are. In particular, we first identify key design choices and learning constraints on three popular models that employ content-style disentanglement and derive ablated versions. Then, we use our metrics to ascertain the role of each bias. Our experiments reveal a "sweet-spot" between disentanglement, task performance and latent space interpretability. The proposed metrics enable the design of better models and the selection of models that achieve the desired performance and disentanglement. Our metrics library is available at this https URL.
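As one concrete example of such a metric, the sketch below computes the mean absolute cross-correlation between content and style codes over a batch, where lower values indicate less leakage between the two representations; this is a generic instantiation, not necessarily one of the paper's exact metrics.

    import numpy as np

    def content_style_correlation(content, style, eps=1e-8):
        """Mean absolute Pearson cross-correlation between code dimensions.

        content: (n, dc) pooled content codes; style: (n, ds) style vectors.
        """
        c = (content - content.mean(0)) / (content.std(0) + eps)
        s = (style - style.mean(0)) / (style.std(0) + eps)
        corr = c.T @ s / len(c)          # (dc, ds) cross-correlation matrix
        return float(np.abs(corr).mean())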
24. Analyzing Worldwide Social Distancing through Large-Scale Computer Vision [PDF] [Back to contents]
Isha Ghodgaonkar, Subhankar Chakraborty, Vishnu Banna, Shane Allcroft, Mohammed Metwaly, Fischer Bordwell, Kohsuke Kimura, Xinxin Zhao, Abhinav Goel, Caleb Tung, Akhil Chinnakotla, Minghao Xue, Yung-Hsiang Lu, Mark Daniel Ward, Wei Zakharov, David S. Ebert, David M. Barbarash, George K. Thiruvathukal
Abstract: In order to contain the COVID-19 pandemic, countries around the world have introduced social distancing guidelines as public health interventions to reduce the spread of the disease. However, monitoring the efficacy of these guidelines at a large scale (nationwide or worldwide) is difficult. To make matters worse, traditional observational methods such as in-person reporting are dangerous because observers may risk infection. A better solution is to observe activities through network cameras; this approach is scalable and observers can stay in safe locations. This research team has created methods that can discover thousands of network cameras worldwide, retrieve data from the cameras, analyze the data, and report the sizes of crowds as different countries issued and lifted restrictions (also called "lockdown"). We discover 11,140 network cameras that provide real-time data and we present the results across 15 countries. We collect data from these cameras beginning April 2020 at approximately 0.5TB per week. After analyzing 10,424,459 images from still image cameras and frames extracted periodically from video, the data reveals that the residents in some countries exhibited more activity (judged by numbers of people and vehicles) after the restrictions were lifted. In other countries, the amounts of activity showed no obvious changes during the restrictions and after the restrictions were lifted. The data further reveals whether people stay "socially distanced", at least 6 feet apart. This study discerns whether social distancing is being followed in several types of locations and geographical locations worldwide and serves as an early indicator of whether another wave of infections is likely to occur soon.
25. A Federated Approach for Fine-Grained Classification of Fashion Apparel [PDF] [Back to contents]
Tejaswini Mallavarapu, Luke Cranfill, Junggab Son, Eun Hye Kim, Reza M. Parizi, John Morris
Abstract: As online retail services proliferate and are pervasive in modern lives, applications for classifying fashion apparel features from image data are becoming increasingly indispensable. Online retailers, from leading companies to start-ups, can leverage such applications in order to increase profit margin and enhance the consumer experience. Many notable schemes have been proposed to classify fashion items; however, the majority focus on classifying basic-level categories, such as T-shirts, pants, skirts, shoes, bags, and so forth. In contrast to most prior efforts, this paper aims to enable an in-depth classification of fashion item attributes within the same category. Beginning with a single dress, we seek to classify the type of dress hem, the hem length, and the sleeve length. The proposed scheme comprises three major stages: (a) localization of a target item from an input image using semantic segmentation, (b) detection of human key points (e.g., point of shoulder) using a pre-trained CNN and a bounding box, and (c) three phases to classify the attributes using a combination of algorithmic approaches and deep neural networks. The experimental results demonstrate that the proposed scheme is highly effective, with all categories having average precision above 93.02%, and outperforms existing Convolutional Neural Network (CNN)-based schemes.
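A schematic of the three-stage pipeline in Python, with the segmentation, keypoint, and attribute models left as hypothetical callables; it illustrates only how the stages compose, not the actual networks.

    def classify_dress_attributes(image, segment, keypoints, classify):
        """(a) localize the garment, (b) detect keypoints, (c) classify attributes.

        image: (H, W, 3) array; segment(image) -> (H, W) binary mask;
        keypoints(crop) -> landmark coordinates; classify(attr, crop, kps) -> label.
        """
        mask = segment(image)                    # (a) garment localization
        crop = image * mask[..., None]           # keep only the target item
        kps = keypoints(crop)                    # (b) e.g., shoulder points
        return {attr: classify(attr, crop, kps)  # (c) per-attribute heads
                for attr in ("hem_type", "hem_length", "sleeve_length")}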
26. A Scene-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video [PDF] [Back to contents]
Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah
Abstract: Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. The complexity of the task arises from the commonly-agreed definition of an abnormal event, that is, a rarely occurring event that typically depends on the surrounding context. Following the standard formulation of abnormal event detection as outlier detection, we propose a scene-agnostic framework that learns from training videos containing only normal events. Our framework is composed of an object detector, a set of appearance and motion auto-encoders, and a discriminator. Since our framework only looks at object detections, it can be applied to different scenes, provided that abnormal events are defined identically across scenes. This makes our method scene agnostic, as we rely strictly on objects that can cause anomalies, and not on the background. To overcome the lack of abnormal data during training, we propose an adversarial learning strategy for the auto-encoders. We create a scene-agnostic set of out-of-domain adversarial examples, which are correctly reconstructed by the auto-encoders before applying gradient ascent on the adversarial examples. We further utilize the adversarial examples to serve as abnormal examples when training a binary classifier to discriminate between normal and abnormal latent features and reconstructions. Furthermore, to ensure that the auto-encoders focus only on the main object inside each bounding box image, we introduce a branch that learns to segment the main object. We compare our framework with the state-of-the-art methods on three benchmark data sets, using various evaluation metrics. Compared to existing methods, the empirical results indicate that our approach achieves favorable performance on all data sets.
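The gradient-ascent step that turns out-of-domain samples into pseudo-abnormal training examples might look like the PyTorch sketch below: starting from inputs the auto-encoder reconstructs well, ascend the reconstruction error for a few signed-gradient steps. The step count and step size are assumed values.

    import torch

    def make_pseudo_abnormal(autoencoder, x, steps=5, step_size=0.1):
        """Increase reconstruction error by gradient ascent on the input."""
        x_adv = x.clone().detach().requires_grad_(True)
        for _ in range(steps):
            loss = ((autoencoder(x_adv) - x_adv) ** 2).mean()
            loss.backward()
            with torch.no_grad():
                x_adv += step_size * x_adv.grad.sign()  # ascend, not descend
            x_adv.grad.zero_()
        return x_adv.detach()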
27. Learning Representations of Endoscopic Videos to Detect Tool Presence Without Supervision [PDF] [Back to contents]
David Z. Li, Masaru Ishii, Russell H. Taylor, Gregory D. Hager, Ayushi Sinha
Abstract: In this work, we explore whether it is possible to learn representations of endoscopic video frames to perform tasks such as identifying surgical tool presence without supervision. We use a maximum mean discrepancy (MMD) variational autoencoder (VAE) to learn low-dimensional latent representations of endoscopic videos and manipulate these representations to distinguish frames containing tools from those without tools. We use three different methods to manipulate these latent representations in order to predict tool presence in each frame. Our fully unsupervised methods can identify whether endoscopic video frames contain tools with average precision of 71.56, 73.93, and 76.18, respectively, comparable to supervised methods. Our code is available at this https URL
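For reference, a Gaussian-kernel maximum mean discrepancy between two batches of latent codes, the divergence an MMD-VAE penalizes between the aggregate posterior and the prior; the bandwidth is an assumed hyperparameter, and this is the simple biased estimator.

    import torch

    def mmd(x, y, sigma=1.0):
        """Biased Gaussian-kernel MMD between (n, d) and (m, d) batches."""
        def k(a, b):
            return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()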
28. CNN-Based Image Reconstruction Method for Ultrafast Ultrasound Imaging [PDF] [Back to contents]
Dimitris Perdios, Manuel Vonlanthen, Florian Martinez, Marcel Arditi, Jean-Philippe Thiran
Abstract: Ultrafast ultrasound (US) revolutionized biomedical imaging with its capability of acquiring full-view frames at over 1 kHz, unlocking breakthrough modalities such as shear-wave elastography and functional US neuroimaging. Yet, it suffers from strong diffraction artifacts, mainly caused by grating lobes, side lobes, or edge waves. Multiple acquisitions are typically required to obtain a sufficient image quality, at the cost of a reduced frame rate. To answer the increasing demand for high-quality imaging from single-shot acquisitions, we propose a two-step convolutional neural network (CNN)-based image reconstruction method, compatible with real-time imaging. A low-quality estimate is obtained by means of a backprojection-based operation, akin to conventional delay-and-sum beamforming, from which a high-quality image is restored using a residual CNN with multi-scale and multi-channel filtering properties, trained specifically to remove the diffraction artifacts inherent to ultrafast US imaging. To account for both the high dynamic range and the radio frequency property of US images, we introduce the mean signed logarithmic absolute error (MSLAE) as training loss function. Experiments were conducted with a linear transducer array, in single plane wave (PW) imaging. Training was performed on a simulated dataset, crafted to contain a wide diversity of structures and echogenicities. Extensive numerical evaluations demonstrate that the proposed approach can reconstruct images from single PWs with a quality similar to that of gold-standard synthetic aperture imaging, on a dynamic range in excess of 60 dB. In vitro and in vivo experiments show that training performed on simulated data translates well to experimental settings.
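One plausible formulation of the MSLAE loss compares signals after a signed logarithmic compression, so that signed radio-frequency content and low-amplitude detail (high dynamic range) both contribute; the compression constant below is an assumption, and the authors' exact definition may differ.

    import numpy as np

    def mslae(pred, target, eps=1e-3):
        """Mean signed-logarithmic absolute error between RF signals."""
        def slog(x):  # signed log compression, odd-symmetric around zero
            return np.sign(x) * np.log1p(np.abs(x) / eps)
        return float(np.mean(np.abs(slog(pred) - slog(target))))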
29. Bayesian Neural Networks for Uncertainty Estimation of Imaging Biomarkers [PDF] [Back to contents]
J. Senapati, A. Guha Roy, S. Poelsterl, D. Gutmann, S. Gatidis, C. Schlett, A. Peters, F. Bamberg, C. Wachinger
Abstract: Image segmentation enables the extraction of quantitative measures from scans that can serve as imaging biomarkers for diseases. However, segmentation quality can vary substantially across scans, and therefore yield unfaithful estimates in the follow-up statistical analysis of biomarkers. The core problem is that segmentation and biomarker analysis are performed independently. We propose to propagate segmentation uncertainty to the statistical analysis to account for variations in segmentation confidence. To this end, we evaluate four Bayesian neural networks to sample from the posterior distribution and estimate the uncertainty. We then assign confidence measures to the biomarker and propose statistical models for its integration in group analysis and disease classification. Our results for segmenting the liver in patients with diabetes mellitus clearly demonstrate the improvement of integrating biomarker uncertainty in the statistical inference.
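Propagating segmentation uncertainty into a biomarker can be as simple as evaluating the biomarker over posterior segmentation samples and carrying the resulting spread into the statistical model, as in this sketch; sample_segmentation is a hypothetical sampler over the Bayesian network's posterior, and segmented volume stands in for the biomarker.

    import numpy as np

    def biomarker_with_confidence(sample_segmentation, scan, n_samples=20):
        """Mean and spread of a biomarker (here, segmented volume) under
        posterior segmentation samples; the spread becomes the confidence
        measure used in the downstream group analysis."""
        volumes = [sample_segmentation(scan).sum() for _ in range(n_samples)]
        return float(np.mean(volumes)), float(np.std(volumes))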
30. Are Deep Neural Networks "Robust"? [PDF] [Back to contents]
Peter Meer
Abstract: Separating outliers from inliers is the definition of robustness in computer vision. This essay delineates how deep neural networks are different from typical robust estimators. Deep neural networks are not robust by this traditional definition.
31. Simulation-supervised deep learning for analysing organelles states and behaviour in living cells [PDF] [Back to contents]
Arif Ahmed Sekh, Ida S. Opstad, Rohit Agarwal, Asa Birna Birgisdottir, Truls Myrmel, Balpreet Singh Ahluwalia, Krishna Agarwal, Dilip K. Prasad
Abstract: In many real-world scientific problems, generating ground truth (GT) for supervised learning is almost impossible. The causes include limitations imposed by the scientific instrument, the physical phenomenon itself, or the complexity of modeling. Performing artificial intelligence (AI) tasks such as segmentation, tracking, and analytics of small sub-cellular structures such as mitochondria in microscopy videos of living cells is a prime example. The 3D blurring function of the microscope, digital resolution from pixel size, optical resolution due to the character of light, noise characteristics, and complex 3D deformable shapes of mitochondria all contribute to making this problem GT-hard. Manual segmentation of 100s of mitochondria across 1000s of frames, and then across many such videos, is not only herculean but also physically inaccurate because of instrument- and phenomenon-imposed limitations. Unsupervised learning produces less than optimal results, and accuracy is important if inferences relevant to therapy are to be derived. In order to solve this unsurmountable problem, we bring modeling and deep learning to a nexus. We show that accurate physics-based modeling of microscopy data, including all its limitations, can be the solution for generating simulated training datasets for supervised learning. We show here that our simulation-supervised segmentation approach is a great enabler for studying mitochondrial states and behaviour in heart muscle cells, where mitochondria have a significant role to play in the health of the cells. We report an unprecedented mean IoU score of 91% for binary segmentation (19% better than the best performing unsupervised approach) of mitochondria in actual microscopy videos of living cells. We further demonstrate the possibility of performing multi-class classification, tracking, and morphology associated analytics at the scale of individual mitochondrion.
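For reference, the intersection-over-union metric behind the reported 91% mean score, for binary mitochondria masks:

    import numpy as np

    def binary_iou(pred, gt):
        """IoU for boolean masks of equal shape."""
        union = np.logical_or(pred, gt).sum()
        if union == 0:
            return 1.0  # both masks empty: treat as perfect agreement
        return float(np.logical_and(pred, gt).sum()) / float(union)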
32. Soft Tissue Sarcoma Co-Segmentation in Combined MRI and PET/CT Data [PDF] [Back to contents]
Theresa Neubauer, Maria Wimmer, Astrid Berg, David Major, Dimitrios Lenis, Thomas Beyer, Jelena Saponjski, Katja Bühler
Abstract: Tumor segmentation in multimodal medical images has seen a growing trend towards deep learning based methods. Typically, studies dealing with this topic fuse multimodal image data to improve the tumor segmentation contour for a single imaging modality. However, they do not take into account that tumor characteristics are emphasized differently by each modality, which affects the tumor delineation. Thus, the tumor segmentation is modality- and task-dependent. This is especially the case for soft tissue sarcomas, where, due to necrotic tumor tissue, the segmentation differs vastly. Closing this gap, we develop a modality-specific sarcoma segmentation model that utilizes multimodal image data to improve the tumor delineation on each individual modality. We propose a simultaneous co-segmentation method, which enables multimodal feature learning through modality-specific encoder and decoder branches, and the use of resource-efficient densely connected convolutional layers. We further conduct experiments to analyze how different input modalities and encoder-decoder fusion strategies affect the segmentation result. We demonstrate the effectiveness of our approach on public soft tissue sarcoma data, which comprises MRI (T1 and T2 sequence) and PET/CT scans. The results show that our multimodal co-segmentation model provides better modality-specific tumor segmentation than models using only the PET or MRI (T1 and T2) scan as input.
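A rough PyTorch sketch of the co-segmentation layout: one encoder and one decoder per modality, with features fused at a shared bottleneck so each modality-specific mask benefits from the other modalities. The single-convolution blocks and concatenation fusion are illustrative assumptions standing in for the paper's densely connected layers.

    import torch
    import torch.nn as nn

    class CoSegNet(nn.Module):
        """One encoder/decoder branch per modality, fused at the bottleneck."""

        def __init__(self, modalities=("mri_t1", "mri_t2", "pet"), ch=16):
            super().__init__()
            self.enc = nn.ModuleDict(
                {m: nn.Conv2d(1, ch, 3, padding=1) for m in modalities})
            self.dec = nn.ModuleDict(
                {m: nn.Conv2d(ch * len(modalities), 1, 3, padding=1)
                 for m in modalities})

        def forward(self, inputs):  # inputs: {modality: (B, 1, H, W)}, all present
            fused = torch.cat([self.enc[m](inputs[m]) for m in self.enc], dim=1)
            return {m: torch.sigmoid(self.dec[m](fused)) for m in self.dec}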
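A minimal PyTorch sketch of the co-segmentation idea described above: modality-specific encoder branches, a fused feature stream, and one segmentation head per modality. All layer widths and the CoSegNet name are illustrative; the actual model additionally uses densely connected convolutional layers and full encoder-decoder branches with skip connections.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # The paper uses densely connected layers; plain conv blocks here for brevity.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class CoSegNet(nn.Module):
    """Two modality-specific branches with a shared fused feature stream."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc_mri = conv_block(1, ch)    # MRI branch encoder
        self.enc_pet = conv_block(1, ch)    # PET branch encoder
        self.fuse = conv_block(2 * ch, ch)  # fusion of both feature streams
        self.dec_mri = nn.Conv2d(ch, 1, 1)  # per-modality segmentation heads
        self.dec_pet = nn.Conv2d(ch, 1, 1)

    def forward(self, mri, pet):
        f = self.fuse(torch.cat([self.enc_mri(mri), self.enc_pet(pet)], dim=1))
        return torch.sigmoid(self.dec_mri(f)), torch.sigmoid(self.dec_pet(f))

mri, pet = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
seg_mri, seg_pet = CoSegNet()(mri, pet)  # one tumor mask per modality
```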
33. Nonlocal Adaptive Direction-Guided Structure Tensor Total Variation For Image Recovery [PDF] 返回目录
Ezgi Demircan-Tureyen, Mustafa E. Kamasak
Abstract: A common strategy in variational image recovery is to utilize the nonlocal self-similarity (NSS) property when designing energy functionals. One such contribution is nonlocal structure tensor total variation (NLSTV), which lies at the core of this study. This paper is concerned with boosting the NLSTV regularization term through the use of directional priors. More specifically, NLSTV is leveraged so that, at each image point, it gains more sensitivity in the direction that is presumed to have the minimum local variation. The actual difficulty here is capturing this directional information from the corrupted image. To this end, we propose a method that employs anisotropic Gaussian kernels to estimate the directional features later used by our proposed model. The experiments validate that our entire two-stage framework achieves better results than the NLSTV model and two other competing local models, in terms of visual and quantitative evaluation.
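The directional prior rests on anisotropic Gaussian kernels. Below is a plain numpy sketch of how such an oriented kernel bank could be built; the kernel size, sigmas, and the orientation-selection step are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np

def anisotropic_gaussian(size=15, sigma_u=4.0, sigma_v=1.0, theta=0.0):
    """2D Gaussian elongated along direction theta (radians): sigma_u >> sigma_v
    makes the kernel respond smoothly along, and sharply across, that direction."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    u = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    v = -x * np.sin(theta) + y * np.cos(theta)
    k = np.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2))
    return k / k.sum()

# Bank of kernels over candidate orientations; the orientation whose filtered
# response shows the least local variation would serve as the directional prior.
kernels = [anisotropic_gaussian(theta=t)
           for t in np.linspace(0, np.pi, 8, endpoint=False)]
```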
34. DALE : Dark Region-Aware Low-light Image Enhancement [PDF] 返回目录
Dokyeong Kwon, Guisik Kim, Junseok Kwon
Abstract: In this paper, we present a novel low-light image enhancement method called dark region-aware low-light image enhancement (DALE), where dark regions are accurately recognized by the proposed visual attention module and their brightness is intensively enhanced. Our method can estimate the visual attention efficiently using super-pixels, without any complicated process. Thus, the method can preserve the color, tone, and brightness of the original images, and it prevents normally illuminated areas of the images from becoming saturated and distorted. Experimental results show that our method accurately identifies dark regions via the proposed visual attention, and qualitatively and quantitatively outperforms state-of-the-art methods.
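To make the super-pixel idea concrete, here is a hand-crafted sketch using scikit-image's SLIC; note that the paper's attention module is learned, so this intensity-based proxy only illustrates how super-pixels can yield per-region darkness weights.

```python
import numpy as np
from skimage.segmentation import slic

def dark_region_attention(img: np.ndarray, n_segments: int = 200) -> np.ndarray:
    """Per-pixel attention that is high where a super-pixel's mean intensity is low.
    img: float RGB image with values in [0, 1]."""
    labels = slic(img, n_segments=n_segments, start_label=0)
    luminance = img.mean(axis=2)
    attention = np.zeros_like(luminance)
    for s in np.unique(labels):
        mask = labels == s
        attention[mask] = 1.0 - luminance[mask].mean()  # darker segment -> higher weight
    return attention
```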
35. Human Blastocyst Classification after In Vitro Fertilization Using Deep Learning [PDF] 返回目录
Ali Akbar Septiandri, Ade Jamal, Pritta Ameilia Iffanolida, Oki Riayati, Budi Wiweko
Abstract: Embryo quality assessment after in vitro fertilization (IVF) is primarily done visually by embryologists. Variability among assessors, however, remains one of the main causes of the low success rate of IVF. This study aims to develop an automated embryo assessment based on a deep learning model. The study includes a total of 1084 images from 1226 embryos. The images were captured by an inverted microscope at day 3 after fertilization and labelled according to the Veeck criteria, which grade embryos from 1 to 5 based on the size of the blastomeres and the degree of fragmentation. Our deep learning grading results were compared to the grades assigned by trained embryologists to evaluate the model performance. Our best model, a pre-trained ResNet50 fine-tuned on the dataset, achieves 91.79% accuracy. The model presented could be developed into an automated embryo assessment method for point-of-care settings.
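Fine-tuning a pre-trained ResNet50 for a 5-grade classifier is a standard recipe; a torchvision sketch follows, with all hyperparameters illustrative rather than the paper's.

```python
import torch
import torch.nn as nn
from torchvision import models

# Swap the ImageNet head for a 5-way head (Veeck grades 1-5) and fine-tune.
# The weights enum requires torchvision >= 0.13.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 5)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# ...standard training loop over the labelled day-3 embryo images...
```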
36. W-Net: Dense Semantic Segmentation of Subcutaneous Tissue in Ultrasound Images by Expanding U-Net to Incorporate Ultrasound RF Waveform Data [PDF] 返回目录
Gautam Rajendrakumar Gare, Jiayuan Li, Rohan Joshi, Mrunal Prashant Vaze, Rishikesh Magar, Michael Yousefpour, Ricardo Luis Rodriguez, John Micheal Galeotti
Abstract: We present W-Net, a novel Convolutional Neural Network (CNN) framework that employs raw ultrasound waveforms from each A-scan, typically referred to as ultrasound Radio Frequency (RF) data, in addition to the gray ultrasound image, to semantically segment and label tissues. Unlike prior work, we seek to label every pixel in the image, without the use of a background class. To the best of our knowledge, this is also the first deep-learning or CNN approach for segmentation that analyses ultrasound raw RF data along with the gray image. International patent(s) pending [PCT/US20/37519]. We chose subcutaneous tissue (SubQ) segmentation as our initial clinical goal since it has diverse intermixed tissues, is challenging to segment, and is an underrepresented research area. Potential SubQ applications include plastic surgery, adipose stem-cell harvesting, lymphatic monitoring, and possibly detection/treatment of certain types of tumors. A custom dataset of images hand-labeled by an expert clinician and trainees is used for the experimentation, currently labeled into the following categories: skin, fat, fat fascia/stroma, muscle, and muscle fascia. We compared our results with U-Net and Attention U-Net. Our novel W-Net's RF-waveform input and architecture increased mIoU accuracy (averaged across all tissue classes) by 4.5% and 4.9% compared to regular U-Net and Attention U-Net, respectively. We present analysis as to why the muscle fascia and fat fascia/stroma are the most difficult tissues to label. Muscle fascia in particular, the most difficult anatomic class for both humans and AI algorithms to recognize, saw mIoU improvements of 13% and 16% from our W-Net versus U-Net and Attention U-Net, respectively.
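A toy sketch of the dual-stream input idea: one encoder for the gray image and one for per-pixel RF channels, fused before a dense per-pixel head with no background class. W-Net itself expands U-Net, so this flat model, the rf_ch parameter, and all names are simplifications for illustration.

```python
import torch
import torch.nn as nn

class DualInputSeg(nn.Module):
    """Toy two-encoder segmenter: one branch for the gray B-mode image, one for
    per-pixel RF waveform channels, fused before a dense labeling head."""
    def __init__(self, rf_ch=8, n_classes=5, ch=16):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.rf_enc = nn.Sequential(nn.Conv2d(rf_ch, ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(2 * ch, n_classes, 1)  # every pixel gets a tissue label

    def forward(self, img, rf):
        fused = torch.cat([self.img_enc(img), self.rf_enc(rf)], dim=1)
        return self.head(fused)  # logits; argmax over classes, no background class

img, rf = torch.randn(1, 1, 128, 128), torch.randn(1, 8, 128, 128)
logits = DualInputSeg()(img, rf)  # shape (1, 5, 128, 128)
```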
37. Improving the Segmentation of Scanning Probe Microscope Images using Convolutional Neural Networks [PDF] 返回目录
Steff Farley, Jo E.A. Hodgkinson, Oliver M. Gordon, Joanna Turner, Andrea Soltoggio, Philip J. Moriarty, Eugenie Hunsicker
Abstract: A wide range of techniques can be considered for segmentation of images of nanostructured surfaces. Manually segmenting these images is time-consuming and results in a user-dependent segmentation bias, while there is currently no consensus on the best automated segmentation methods for particular techniques, image classes, and samples. Any image segmentation approach must minimise the noise in the images to ensure that accurate and meaningful statistical analysis can be carried out. Here we develop protocols for the segmentation of images of 2D assemblies of gold nanoparticles formed on silicon surfaces via deposition from an organic solvent. The evaporation of the solvent drives far-from-equilibrium self-organisation of the particles, producing a wide variety of nano- and micro-structured patterns. We show that a segmentation strategy using the U-Net convolutional neural network outperforms traditional automated approaches and has particular potential in the processing of images of nanostructured systems.
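For readers unfamiliar with the architecture, here is a one-level PyTorch U-Net showing the encoder-decoder skip connection that makes it effective for dense binary masks; real U-Nets (including, presumably, the one used here) stack several such levels.

```python
import torch
import torch.nn as nn

def block(i, o):
    return nn.Sequential(nn.Conv2d(i, o, 3, padding=1), nn.ReLU(inplace=True),
                         nn.Conv2d(o, o, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """One-level U-Net: encoder, bottleneck, and decoder with a skip connection."""
    def __init__(self):
        super().__init__()
        self.enc = block(1, 16)
        self.down = nn.MaxPool2d(2)
        self.mid = block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)        # 16 skip + 16 upsampled channels
        self.out = nn.Conv2d(16, 1, 1)  # binary particle-vs-substrate mask

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        return torch.sigmoid(self.out(self.dec(torch.cat([e, self.up(m)], dim=1))))

mask = TinyUNet()(torch.randn(1, 1, 64, 64))  # (1, 1, 64, 64) probabilities
```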
38. Adversarially Robust Learning via Entropic Regularization [PDF] 返回目录
Gauri Jagatap, Animesh Basak Chowdhury, Siddharth Garg, Chinmay Hegde
Abstract: In this paper we propose a new family of algorithms for training adversarially robust deep neural networks. We formulate a new loss function that uses an entropic regularization. Our loss function considers the contribution of adversarial samples, which are drawn from a specially designed distribution that assigns high probability to points with high loss in the immediate neighborhood of training samples. Our data-entropy-guided SGD approach is designed to search for adversarially robust valleys of the loss landscape. We observe that our approach generalizes better in terms of robust classification accuracy when compared to state-of-the-art approaches based on projected gradient descent.
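The abstract does not spell out the sampling scheme, but one plausible reading of "a distribution that assigns high probability to points with high loss" is a Gibbs-weighted average over random neighbors, sketched below; the function name and all hyperparameters are assumptions, not the authors' algorithm.

```python
import torch
import torch.nn.functional as F

def entropic_adv_loss(model, x, y, eps=8 / 255, k=5, beta=10.0):
    """Soft version of worst-case training: sample k perturbations inside an
    eps-ball around x, then weight their losses with a Gibbs distribution so
    the highest-loss neighbors dominate (beta -> inf recovers the max)."""
    losses = []
    for _ in range(k):
        delta = torch.empty_like(x).uniform_(-eps, eps)
        losses.append(F.cross_entropy(model(x + delta), y, reduction='none'))
    losses = torch.stack(losses)                    # shape (k, batch)
    weights = F.softmax(beta * losses.detach(), dim=0)
    return (weights * losses).sum(0).mean()
```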
Note: the Chinese text in this digest is machine-translated! The cover image is a word cloud of the paper titles!