
[arXiv papers] Computer Vision and Pattern Recognition 2020-05-07

Contents

1. DenoiSeg: Joint Denoising and Segmentation
2. Generating Memorable Images Based on Human Visual Memory Schemas
3. Manipulated Face Detector: Joint Spatial and Frequency Domain Attention Network
4. GraCIAS: Grassmannian of Corrupted Images for Adversarial Security
5. Automatic Detection and Recognition of Individuals in Patterned Species
6. Automatic Plant Image Identification of Vietnamese species using Deep Learning Models
7. UST: Unifying Spatio-Temporal Context for Trajectory Prediction in Autonomous Driving
8. Design and Development of a Web-based Tool for Inpainting of Dissected Aortae in Angiography Images
9. Probabilistic Color Constancy
10. Fast Geometric Surface based Segmentation of Point Cloud from Lidar Data
11. ProbaNet: Proposal-balanced Network for Object Detection
12. Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds
13. Dual-Sampling Attention Network for Diagnosis of COVID-19 from Community Acquired Pneumonia
14. CONFIG: Controllable Neural Face Image Generation
15. Automated Transcription for Pre-Modern Japanese Kuzushiji Documents by Random Lines Erasure and Curriculum Learning
16. Deep Recurrent Disease Progression Model for Conversion-Time Prediction of Alzheimer's Disease
17. Low-shot Object Detection via Classification Refinement
18. Dependency Aware Filter Pruning
19. Exploiting Inter-Frame Regional Correlation for Efficient Action Recognition
20. CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement
21. Multi-Head Attention with Joint Agent-Map Representation for Trajectory Prediction in Autonomous Driving
22. Partly Supervised Multitask Learning
23. Mimicry: Towards the Reproducibility of GAN Research
24. Temporal Event Segmentation using Attention-based Perceptual Prediction Model for Continual Learning
25. Iris segmentation techniques to recognize the behavior of a vigilant driver
26. Data Augmentation via Mixed Class Interpolation using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery
27. Groupwise Multimodal Image Registration using Joint Total Variation
28. High-Contrast Limited-Angle Reflection Tomography
29. Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction
30. Unsupervised Real-world Low-light Image Enhancement with Decoupled Networks
31. Knee Injury Detection using MRI with Efficiently-Layered Network (ELNet)
32. Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns
33. Multi-task pre-training of deep neural networks
34. Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder
35. Cross-media Structured Common Space for Multimedia Event Extraction
36. A new design of a flying robot, with advanced computer vision techniques to perform self-maintenance of smart grids

Abstracts

1. DenoiSeg: Joint Denoising and Segmentation
  Tim-Oliver Buchholz, Mangal Prakash, Alexander Krull, Florian Jug
Abstract: Microscopy image analysis often requires the segmentation of objects, but training data for this task is typically scarce and hard to obtain. Here we propose DenoiSeg, a new method that can be trained end-to-end on only a few annotated ground-truth segmentations. We achieve this by extending Noise2Void, a self-supervised denoising scheme that can be trained on noisy images alone, to also predict dense 3-class segmentations. Our method succeeds because segmentation can profit from denoising, especially when both are performed jointly within the same network. The network becomes a denoising expert by seeing all available raw data, while co-learning to segment, even if only a few segmentation labels are available. This hypothesis is further supported by our observation that the best segmentation results on high-quality (very low noise) raw data are obtained when moderate amounts of synthetic noise are added. This renders the denoising task non-trivial and unleashes the desired co-learning effect. We believe that DenoiSeg offers a viable way to circumvent the tremendous hunger for high-quality training data and effectively enables few-shot learning of dense segmentations.
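A minimal sketch of how such a joint objective could be computed, assuming a weighted sum `alpha * L_denoise + (1 - alpha) * L_seg`. The Noise2Void term is evaluated only at blind-spot pixels, and the 3-class cross-entropy (background, foreground, border) is added only for annotated images. Pixels are flattened into Python lists and the network itself is omitted. The function names and the default `alpha` are illustrative assumptions, not the authors' code.

```python
import math

def denoise_loss(noisy, denoised, blind_spots):
    # Noise2Void-style loss: compare the prediction with the noisy input only
    # at blind-spot pixels, whose values the network was not allowed to see.
    errs = [(denoised[i] - noisy[i]) ** 2 for i in blind_spots]
    return sum(errs) / len(errs)

def seg_loss(probs, labels):
    # Mean per-pixel cross-entropy; probs[i] is a distribution over
    # (background, foreground, border) and labels[i] is in {0, 1, 2}.
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def joint_loss(noisy, denoised, blind_spots, probs=None, labels=None, alpha=0.5):
    # Every image contributes to denoising; only annotated images
    # (labels is not None) also contribute to segmentation.
    ld = denoise_loss(noisy, denoised, blind_spots)
    if labels is None:
        return alpha * ld
    return alpha * ld + (1 - alpha) * seg_loss(probs, labels)
```

Because unlabeled images still produce a denoising gradient, all raw data is used for training, which is the co-learning effect described above.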

2. Generating Memorable Images Based on Human Visual Memory Schemas
  Cameron Kyle-Davidson, Adrian G. Bors, Karla K. Evans
Abstract: This study proposes Generative Adversarial Networks (GANs) that incorporate a two-dimensional measure of human memorability to generate memorable or non-memorable images of scenes. The memorability of the generated images is evaluated by modelling Visual Memory Schemas (VMS), which correspond to the mental representations that human observers use to encode an image into memory. The VMS model is based on the results of memory experiments conducted on human observers and provides a 2D map of memorability. We impose a memorability constraint on the latent space of a GAN by employing a VMS map prediction model as an auxiliary loss. Using an independent computational measure of memorability, we assess the difference in memorability between images generated to be memorable and those generated to be non-memorable; we also assess the effect of memorability on the realness of the generated images.

3. Manipulated Face Detector: Joint Spatial and Frequency Domain Attention Network
  Zehao Chen, Hua Yang
Abstract: Face manipulation methods have developed rapidly in recent years and can generate high-quality manipulated face images. However, existing detection methods perform poorly on data produced by state-of-the-art manipulation methods, and they lack generalization ability. In this paper, we propose a novel manipulated face detector that combines spatial-domain and frequency-domain features with an attention mechanism. Spatial-domain features are extracted by facial semantic segmentation, and frequency-domain features are extracted by the Discrete Fourier Transform. The proposed model takes features from both domains as input, and we add attention-based layers to the backbone networks to improve their generalization ability. We evaluate the proposed model on several datasets and compare it with other state-of-the-art manipulated face detection methods. The results show that our model performs best on both seen and unseen data.
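The frequency-domain input mentioned above can be illustrated with a naive 2D DFT. This is a sketch only. The log-magnitude normalization `log(1 + |F|)` is an assumed, common preprocessing choice, the function names are hypothetical, and a real pipeline would use an FFT library rather than this O((MN)^2) loop.

```python
import cmath
import math

def dft2(img):
    """Naive 2D discrete Fourier transform of a grayscale image (list of rows)."""
    M, N = len(img), len(img[0])
    out = [[0j] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            s = 0j
            for x in range(M):
                for y in range(N):
                    angle = -2 * math.pi * (u * x / M + v * y / N)
                    s += img[x][y] * cmath.exp(1j * angle)
            out[u][v] = s
    return out

def log_magnitude_spectrum(img):
    # log(1 + |F|) compresses the large dynamic range of the spectrum so it
    # can be fed to a CNN alongside spatial-domain features.
    return [[math.log1p(abs(c)) for c in row] for row in dft2(img)]
```

A constant image puts all of its energy into the DC term `F[0][0]`, while a checkerboard puts it into the highest frequency.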

4. GraCIAS: Grassmannian of Corrupted Images for Adversarial Security
  Ankita Shukla, Pavan Turaga, Saket Anand
Abstract: Defense strategies based on input transformations fall short against strong adversarial attacks. Some successful defenses either increase the randomness of the applied transformations or make the defense computationally intensive, making it substantially more challenging for the attacker. However, this limits the applicability of such defenses as a pre-processing step, much like computationally heavy approaches that rely on retraining and network modifications to achieve robustness to perturbations. In this work, we propose a defense strategy that applies random corruptions to the input image alone, constructs a self-correlation-based subspace, and then applies a projection operation to suppress the adversarial perturbation. Owing to its simplicity, the proposed defense is computationally efficient compared to the state of the art, yet it can withstand large perturbations. Further, we develop proximity relationships between the projection operator of a clean image and that of its adversarially perturbed version, via bounds relating the geodesic distance on the Grassmannian to matrix Frobenius norms. We show empirically that our strategy is complementary to other weak defenses such as JPEG compression and can be seamlessly integrated with them to create a stronger defense. We present extensive experiments on the ImageNet dataset across four models, namely InceptionV3, ResNet50, VGG16, and MobileNet, with the perturbation magnitude set to $\epsilon = 16$. Even without any retraining, unlike state-of-the-art approaches, the proposed strategy achieves an absolute improvement of about 4.5% in defense accuracy on ImageNet.
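A rough sketch of the corrupt-then-project idea, built on several assumptions. The image is a single-channel matrix and the corruption is random pixel dropout. The "self-correlation" is taken to be `C = M^T M` of the corrupted image, and its top-`k` eigenvectors, found by orthogonal iteration, span the subspace onto which the rows of the original image are projected. These choices and all function names are hypothetical and may differ from GraCIAS itself.

```python
import math
import random

def corrupt(img, drop_prob, rng):
    # Random pixel dropout: each pixel is zeroed with probability drop_prob.
    return [[0.0 if rng.random() < drop_prob else p for p in row] for row in img]

def self_correlation(img):
    # C = M^T M (W x W): correlation between image columns.
    W = len(img[0])
    return [[sum(row[i] * row[j] for row in img) for j in range(W)] for i in range(W)]

def top_subspace(C, k, iters=100, seed=0):
    # Orthogonal iteration with Gram-Schmidt: returns k orthonormal vectors
    # approximately spanning the dominant eigenspace of the symmetric matrix C.
    rng = random.Random(seed)
    n = len(C)
    Q = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Z = [[sum(C[i][j] * q[j] for j in range(n)) for i in range(n)] for q in Q]
        Q = []
        for z in Z:
            for q in Q:  # remove components along vectors already accepted
                d = sum(a * b for a, b in zip(z, q))
                z = [a - d * b for a, b in zip(z, q)]
            norm = math.sqrt(sum(a * a for a in z)) or 1.0
            Q.append([a / norm for a in z])
    return Q

def project(img, Q):
    # Project each row onto span(Q): row -> sum over q of <row, q> q.
    out = []
    for row in img:
        coeffs = [sum(a * b for a, b in zip(row, q)) for q in Q]
        out.append([sum(c * q[j] for c, q in zip(coeffs, Q)) for j in range(len(row))])
    return out

def defend(img, k=1, drop_prob=0.1, seed=0):
    rng = random.Random(seed)
    basis = top_subspace(self_correlation(corrupt(img, drop_prob, rng)), k, seed=seed)
    return project(img, basis)
```

Because the subspace is estimated from a randomly corrupted copy, an attacker cannot know the exact projection in advance, and components outside the dominant subspace (where a small perturbation may lie) are discarded.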

5. Automatic Detection and Recognition of Individuals in Patterned Species
  Gullal Singh Cheema, Saket Anand
Abstract: Visual animal biometrics is rapidly gaining popularity because it offers a non-invasive and cost-effective approach for wildlife monitoring applications. Widespread usage of camera traps has led to large volumes of collected images, making manual processing of visual content hard to manage. In this work, we develop a framework for automatic detection and recognition of individuals in different patterned species such as tigers, zebras, and jaguars. Most existing systems rely primarily on manual input to localize the animal, which does not scale well to large datasets. To automate the detection process while retaining robustness to blur, partial occlusion, and illumination and pose variations, we use the recently proposed Faster R-CNN object detection framework to efficiently detect animals in images. We then extract AlexNet features from the animal's flank and train a logistic regression (or linear SVM) classifier to recognize individuals. We primarily test and evaluate our framework on a camera-trap tiger image dataset whose images vary in overall quality, animal pose, scale, and lighting. We also evaluate our recognition system on zebra and jaguar images to show generalization to other patterned species. Our framework gives perfect detection results on camera-trap tiger images and similar or better individual recognition performance compared with state-of-the-art recognition techniques.
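The recognition stage (fixed features followed by a linear classifier) can be sketched as one-vs-rest logistic regression trained with batch gradient descent. The toy three-dimensional vectors in the usage below stand in for AlexNet flank features. The learning rate, epoch count, and function names are assumptions for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_logreg(X, y, lr=0.5, epochs=500):
    # Batch gradient descent on the mean logistic (cross-entropy) loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score((w, b), xi)) - yi
            gw = [g + err * v for g, v in zip(gw, xi)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def train_one_vs_rest(X, ids):
    # One binary classifier per individual animal.
    return {c: train_logreg(X, [1.0 if i == c else 0.0 for i in ids]) for c in set(ids)}

def identify(models, x):
    # Pick the individual whose classifier gives the highest score.
    return max(models, key=lambda c: score(models[c], x))
```

Usage: `identify(train_one_vs_rest(features, animal_ids), new_feature)` returns the predicted individual ID.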

6. Automatic Plant Image Identification of Vietnamese species using Deep Learning Models
  Nguyen Van Hieu, Ngo Le Huy Hien
Abstract: Distinguishing among thousands of plant species in natural ecosystems is difficult, and many efforts have been made to address the issue. In Vietnam, identifying a plant among 12,000 species requires flora-management experts with thorough training and in-depth knowledge. Therefore, with the advance of machine learning, automatic plant identification systems have been proposed to benefit various stakeholders, including botanists, pharmaceutical laboratories, taxonomists, forestry services, and organizations. The concept has attracted the interest of researchers and engineers worldwide in both machine learning and computer vision. In this paper, the Vietnamese plant image dataset was collected from an online encyclopedia of Vietnamese organisms, together with the Encyclopedia of Life, yielding a total of 28,046 environmental images of 109 plant species in Vietnam. We present a comparative evaluation of four deep convolutional feature extraction models: MobileNetV2, VGG16, ResnetV2, and Inception Resnet V2. Each model is evaluated in combination with a Support Vector Machine (SVM) classifier for plant image identification. The proposed models achieve promising recognition rates, and MobileNetV2 attained the highest, 83.9%. This result demonstrates that machine learning models have potential for plant species identification in the natural environment, and future work should develop more accurate systems on larger datasets to meet current application demands.

7. UST: Unifying Spatio-Temporal Context for Trajectory Prediction in Autonomous Driving
  Hao He, Hengchen Dai, Naiyan Wang
Abstract: Trajectory prediction has always been a challenging problem for autonomous driving, since it requires inferring latent intentions from the behaviors and interactions of traffic participants. This problem is intrinsically hard, because each participant may behave differently under different environments and interactions. The key is to effectively model the interlaced influence of both spatial and temporal context. Existing work usually encodes these two types of context separately, which leads to inferior modeling of the scenarios. In this paper, we first propose a unified approach that treats the time and space dimensions equally when modeling spatio-temporal context. The proposed module is simple and can be implemented in a few lines of code. In contrast to existing methods, which rely heavily on recurrent neural networks for temporal context and on hand-crafted structures for spatial context, our method can automatically partition the spatio-temporal space to adapt to the data. Lastly, we test our proposed framework on two recently proposed trajectory prediction datasets, ApolloScape and Argoverse. We show that the proposed method substantially outperforms previous state-of-the-art methods while maintaining its simplicity. These encouraging results further validate the superiority of our approach.
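One generic way to treat time and space equally is to flatten every (agent, time step) pair into a single token and let self-attention relate all tokens to each other. The sketch below uses plain single-head scaled dot-product attention with identity query/key/value projections. It is a swapped-in illustration of the idea, not the UST module, and the helper names are hypothetical.

```python
import math

def flatten_spatiotemporal(traj):
    # traj[a][t] is a feature vector for agent a at time step t. Every
    # (agent, time) pair becomes one token, so no separate recurrent model
    # for time or hand-crafted structure for space is needed.
    return [traj[a][t] for a in range(len(traj)) for t in range(len(traj[a]))]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def self_attention(tokens):
    # Single-head scaled dot-product attention over all tokens.
    d = len(tokens[0])
    out = []
    for q in tokens:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return out
```

Each output token is a convex combination of all tokens, so an agent at one time step can draw information from any other agent at any other time step.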

8. Design and Development of a Web-based Tool for Inpainting of Dissected Aortae in Angiography Images
  Alexander Prutsch, Antonio Pepe, Jan Egger
Abstract: Medical imaging is an important tool for the diagnosis and evaluation of an aortic dissection (AD), a serious condition of the aorta that can lead to a life-threatening aortic rupture. After an aortic dissection is diagnosed, AD patients need lifelong medical monitoring of aortic enlargement and disease progression. Since medical studies lack 'healthy-dissected' image pairs, inpainting techniques offer an alternative way to generate them by performing a virtual regression from dissected aortae to healthy aortae, an indirect way to study the origin of the disease. The proposed inpainting tool combines a neural network, trained on the task of inpainting aortic dissections, with an easy-to-use user interface. To achieve this goal, the inpainting tool has been integrated into the 3D medical image viewer of StudierFenster (this http URL). By designing the tool as a web application, we simplify the use of the neural network and flatten the initial learning curve.

9. Probabilistic Color Constancy
  Firas Laakom, Jenni Raitoharju, Alexandros Iosifidis, Uygar Tuna, Jarno Nikkanen, Moncef Gabbouj
Abstract: In this paper, we propose a novel unsupervised color constancy method, called Probabilistic Color Constancy (PCC). We define a framework for estimating the illumination of a scene by weighting the contribution of different image regions using a graph-based representation of the image. To estimate the weight of each (super-)pixel, we rely on two assumptions: (super-)pixels with similar colors contribute similarly, and darker (super-)pixels contribute less. The resulting system has a single global optimum. The proposed method achieves performance competitive with the state of the art on the INTEL-TAU dataset.
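A simplified stand-in for the two weighting assumptions. Each (super-)pixel starts with a weight equal to its brightness, so darker ones contribute less. The weights are then smoothed over a fully connected graph with a Gaussian color-similarity kernel, so similar colors contribute similarly. The illuminant is the normalized weighted mean color. This is not the PCC optimization itself; `sigma` and the function name are assumptions.

```python
import math

def estimate_illuminant(pixels, sigma=0.1):
    # pixels: list of (r, g, b) (super-)pixel mean colors in [0, 1].
    # Assumption "darker contributes less": start from brightness.
    w0 = [sum(p) / 3 for p in pixels]
    # Assumption "similar colors contribute similarly": average the initial
    # weights over a fully connected graph with a Gaussian color kernel.
    w = []
    for p in pixels:
        k = [math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / sigma ** 2)
             for q in pixels]
        w.append(sum(ki * wi for ki, wi in zip(k, w0)) / sum(k))
    # Weighted mean color, normalized to a unit-length illuminant direction.
    est = [sum(wi * p[c] for wi, p in zip(w, pixels)) for c in range(3)]
    norm = math.sqrt(sum(e * e for e in est))
    return [e / norm for e in est]
```

In this sketch a dim reddish pixel barely shifts the estimate for an otherwise gray scene, which is the intended effect of the second assumption.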

10. Fast Geometric Surface based Segmentation of Point Cloud from Lidar Data
  Aritra Mukherjee, Sourya Dipta Das, Jasorsi Ghosh, Ananda S. Chowdhury, Sanjoy Kumar Saha
Abstract: Mapping the environment has been an important task for robot navigation and Simultaneous Localization And Mapping (SLAM). LIDAR provides a fast and accurate 3D point cloud map of the environment, which helps in map building. However, processing millions of points in the point cloud is computationally expensive. In this paper, we present a methodology that generates segmented surfaces in real time, which can be used to model 3D objects. First, we propose an algorithm for efficient map building from single-shot data of a spinning Lidar. It is based on fast meshing and sub-sampling, and it exploits the physical design and working principle of the spinning Lidar sensor. The generated mesh surfaces are then segmented by estimating surface normals and considering their homogeneity. The segmented surfaces can be used as proposals for predicting geometrically accurate models of objects in the robot's activity environment. The proposed methodology is compared with some popular point cloud segmentation methods to highlight its efficacy in terms of accuracy and speed.
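The normal-based segmentation step can be sketched as region growing on a triangle mesh. Each triangle's unit normal comes from a cross product, and segments grow across shared edges while neighboring normals differ by less than an angle threshold. The mesh representation, the 10-degree threshold, and the function names are assumptions. The paper's meshing and sub-sampling of spinning-Lidar scans are not reproduced.

```python
import math
from collections import defaultdict, deque

def triangle_normal(p0, p1, p2):
    # Unit normal via the cross product of two edges (assumes a non-degenerate triangle).
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

def segment_mesh(points, triangles, max_angle_deg=10.0):
    normals = [triangle_normal(*(points[i] for i in tri)) for tri in triangles]
    # Map every undirected edge to the triangles that contain it.
    edge_to_tris = defaultdict(list)
    for t, (a, b, c) in enumerate(triangles):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_tris[tuple(sorted(e))].append(t)
    cos_thr = math.cos(math.radians(max_angle_deg))
    labels = [-1] * len(triangles)
    seg = 0
    for start in range(len(triangles)):
        if labels[start] != -1:
            continue
        labels[start] = seg
        queue = deque([start])
        while queue:  # breadth-first region growing
            t = queue.popleft()
            a, b, c = triangles[t]
            for e in ((a, b), (b, c), (c, a)):
                for nb in edge_to_tris[tuple(sorted(e))]:
                    # abs() tolerates inconsistent triangle winding.
                    dot = abs(sum(x * y for x, y in zip(normals[t], normals[nb])))
                    if labels[nb] == -1 and dot >= cos_thr:
                        labels[nb] = seg
                        queue.append(nb)
        seg += 1
    return labels
```

In the test mesh below, a floor and a wall that share an edge end up in two separate surface segments.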

11. ProbaNet: Proposal-balanced Network for Object Detection
  Xiang Zhang, Jing Wu, Mingyi Zhou
Abstract: Candidate object proposals generated by object detectors based on convolutional neural networks (CNNs) suffer from an imbalance between easy and hard samples, which can affect overall performance. In this study, we propose a Proposal-balanced Network (ProbaNet) to alleviate this imbalance problem. First, ProbaNet increases the probability of choosing hard samples for training by discarding easy samples through threshold truncation. Second, ProbaNet emphasizes foreground proposals by increasing their weights. To evaluate the effectiveness of ProbaNet, we train models on different benchmarks. The model using ProbaNet achieves a mean Average Precision (mAP) 1.2% higher than the baseline on PASCAL VOC 2007. Furthermore, ProbaNet is compatible with existing two-stage detectors and adds very little computational cost.
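A sketch of the two balancing steps. It assumes that "easy" is decided by thresholding a per-proposal classification loss and that foreground proposals receive a fixed extra weight. The threshold, the weight, and the dictionary layout are hypothetical values chosen for illustration.

```python
def balance_proposals(proposals, easy_threshold=0.05, fg_weight=2.0):
    # proposals: list of dicts with 'loss' (per-proposal classification loss)
    # and 'is_fg' (True for foreground). Easy proposals (low loss) are
    # truncated away, and foreground proposals are up-weighted.
    kept = [p for p in proposals if p["loss"] >= easy_threshold]
    return [(p, fg_weight if p["is_fg"] else 1.0) for p in kept]

def weighted_loss(weighted):
    # Weighted mean loss over the surviving proposals.
    total_w = sum(w for _, w in weighted)
    return sum(p["loss"] * w for p, w in weighted) / total_w if total_w else 0.0
```

Dropping easy proposals shifts the sampled training set toward hard examples without changing the detector architecture, which is consistent with the low extra cost reported above.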

12. Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds
  Li Wang, Dawei Zhao, Tao Wu, Hao Fu, Zhiyu Wang, Liang Xiao, Xin Xu, Bin Dai
Abstract: 3D moving object detection is one of the most critical tasks in dynamic scene analysis. In this paper, we propose a novel Drosophila-inspired 3D moving object detection method using Lidar sensors. Based on the theory of elementary motion detectors, we have developed a motion detector modeled on the shallow visual neural pathway of Drosophila. This detector is sensitive to the movement of objects and suppresses background noise well. Using neural circuits with different connection modes, the approach searches for motion areas in a coarse-to-fine fashion and extracts the point cloud of each motion area to form moving object proposals. An improved 3D object detection network is then applied to the point cloud of each proposal to efficiently generate 3D bounding boxes and object categories. We evaluate the proposed approach on the widely used KITTI benchmark, where it achieves state-of-the-art performance on the motion detection task.
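The classic Hassenstein-Reichardt correlator is one standard formulation of an elementary motion detector. Each arm multiplies a delayed (low-pass filtered) signal from one receptor with the undelayed signal from its neighbor, and the two mirror-symmetric arms are subtracted. The sketch below applies it to two 1D intensity signals. The filter constant is an assumption, and the paper's Lidar-based circuit may differ.

```python
def lowpass(signal, alpha):
    # First-order low-pass filter acting as the detector's delay line.
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

def reichardt(left, right, alpha=0.3):
    # Delayed left * current right minus delayed right * current left:
    # positive output for left-to-right motion, negative for the reverse,
    # and near zero for stimuli that hit both receptors simultaneously.
    dl, dr = lowpass(left, alpha), lowpass(right, alpha)
    return [a * r - b * l for a, r, b, l in zip(dl, right, dr, left)]
```

A static or globally flickering input stimulates both receptors at once and largely cancels, which is one way such a detector can suppress background noise.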

13. Dual-Sampling Attention Network for Diagnosis of COVID-19 from Community Acquired Pneumonia
  Xi Ouyang, Jiayu Huo, Liming Xia, Fei Shan, Jun Liu, Zhanhao Mo, Fuhua Yan, Zhongxiang Ding, Qi Yang, Bin Song, Feng Shi, Huan Yuan, Ying Wei, Xiaohuan Cao, Yaozong Gao, Dijia Wu, Qian Wang, Dinggang Shen
Abstract: The coronavirus disease (COVID-19) is rapidly spreading all over the world and, as of April 9, 2020, has infected more than 1,436,000 people in more than 200 countries and territories. Detecting COVID-19 at an early stage is essential to deliver proper healthcare to patients and to protect the uninfected population. To this end, we develop a dual-sampling attention network to automatically distinguish COVID-19 from community acquired pneumonia (CAP) in chest computed tomography (CT). In particular, we propose a novel online attention module with a 3D convolutional network (CNN) that focuses on the infection regions in the lungs when making diagnostic decisions. Note that the sizes of the infection regions are imbalanced between COVID-19 and CAP, partly due to the fast progression of COVID-19 after symptom onset. Therefore, we develop a dual-sampling strategy to mitigate the imbalanced learning. We evaluate our method on what is, to the best of our knowledge, the largest multi-center CT dataset for COVID-19, collected from 8 hospitals. In the training-validation stage, we collect 2186 CT scans from 1588 patients for 5-fold cross-validation. In the testing stage, we employ another independent large-scale testing dataset of 2796 CT scans from 2057 patients. Results show that our algorithm identifies COVID-19 images with an area under the receiver operating characteristic curve (AUC) of 0.944, accuracy of 87.5%, sensitivity of 86.9%, specificity of 90.1%, and F1-score of 82.0%. With this performance, the proposed algorithm could potentially aid radiologists in distinguishing COVID-19 from CAP, especially in the early stage of the COVID-19 outbreak.
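The metrics reported above follow standard definitions, sketched here for binary labels (1 = COVID-19, 0 = CAP). AUC is computed in its Mann-Whitney form: the fraction of positive/negative pairs ranked correctly, with ties counting one half. Function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def binary_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on COVID-19 cases
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on CAP cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

def roc_auc(y_true, scores):
    # Probability that a random positive scores above a random negative.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

F1 depends on precision, which in turn depends on the class mix, so it can differ noticeably from sensitivity and specificity on the same predictions.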

14. CONFIG: Controllable Neural Face Image Generation
  Marek Kowalski, Stephan J. Garbin, Virginia Estellers, Tadas Baltrušaitis, Matthew Johnson, Jamie Shotton
Abstract: Our ability to sample realistic natural images, particularly faces, has advanced by leaps and bounds in recent years, yet our ability to exert fine-grained control over the generative process has lagged behind. If this new technology is to find practical uses, we need to achieve a level of control over generative networks that, without sacrificing realism, is on par with that seen in computer graphics and character animation. To this end, we propose ConfigNet, a neural face model that allows individual aspects of output images to be controlled in semantically meaningful ways and that is a significant step towards finely controllable neural rendering. ConfigNet is trained on real face images as well as synthetic face renders. Our novel method uses synthetic data to factorize the latent space into elements that correspond to the inputs of a traditional rendering pipeline, separating aspects such as head pose, facial expression, hair style, illumination, and many others that are very hard to annotate in real data. The real images, which are presented to the network without labels, extend the variety of the generated images and encourage realism. Finally, we propose an evaluation criterion that combines an attribute detection network with a user study, and we demonstrate state-of-the-art individual control over attributes in the output images.

15. Automated Transcription for Pre-Modern Japanese Kuzushiji Documents by Random Lines Erasure and Curriculum Learning
  Anh Duc Le
Abstract: Recognizing full pages of Japanese historical documents is a challenging problem due to complex layouts and backgrounds and difficult writing styles, such as cursive and connected characters. Most previous methods divided the recognition process into character segmentation and recognition. However, those methods provide only character bounding boxes and classes, without a text transcription. In this paper, we extend our previous human-inspired recognition system from multiple lines to full pages of Kuzushiji documents. The human-inspired recognition system simulates human eye movement during the reading process. To address the lack of training data, we propose a random text line erasure approach that randomly erases text lines and distorts documents. To address the convergence problem of the recognition system on full-page documents, we employ curriculum learning, which trains the recognition system step by step from the easy level (several text lines of a document) to the difficult level (full-page documents). We tested the step training approach and the random text line erasure approach on the dataset of the Kuzushiji recognition competition on Kaggle. The experimental results demonstrate the effectiveness of our proposed approaches, and they are competitive with those of other participants in the Kuzushiji recognition competition.
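The two training aids can be sketched on a page represented as a list of text lines. Random erasure drops lines; in the real system their pixels would be blanked and their transcriptions removed. The curriculum samples windows of consecutive lines whose length grows with the training stage until it covers the full page. The representation, the linear stage schedule, and the function names are assumptions.

```python
import random

def random_line_erasure(page_lines, erase_prob, rng):
    # page_lines: the text lines of one page, in reading order. Each line is
    # dropped with probability erase_prob; at least one line is always kept,
    # so every call yields a new, still non-empty training page.
    kept = [line for line in page_lines if rng.random() >= erase_prob]
    return kept or page_lines[:1]

def curriculum_sample(page_lines, stage, n_stages, rng):
    # Stage 0 shows only a few consecutive lines; the final stage
    # (stage == n_stages - 1) shows the full page.
    n = max(1, round(len(page_lines) * (stage + 1) / n_stages))
    start = rng.randrange(len(page_lines) - n + 1)
    return page_lines[start:start + n]
```

Usage: call `curriculum_sample` with an increasing `stage` as training progresses, optionally after `random_line_erasure` to augment the page.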

16. Deep Recurrent Disease Progression Model for Conversion-Time Prediction of Alzheimer's Disease
  Wonsik Jung, Eunji Jun, Heung-Il Suk
Abstract: Alzheimer's disease (AD) is known as one of the major causes of dementia and is characterized by slow progression over several years, with no available treatments or medicines. In this regard, there have been efforts to identify the risk of developing AD at the earliest possible time. While much previous work considered cross-sectional analysis, more recent studies have focused on the diagnosis and prognosis of AD with longitudinal or time-series data through disease progression modeling (DPM). Under the same problem setting, we propose a novel computational framework that forecasts the phenotypic measurements of MRI biomarkers and predicts the clinical statuses at multiple future time points. However, time-series data generally contain many unexpected missing observations. To handle this, we define a secondary problem of estimating those missing values and tackle it systematically by taking into account the temporal and multivariate relations inherent in time-series data. Concretely, we propose a deep recurrent network that jointly tackles three problems: (i) missing value imputation, (ii) phenotypic measurement forecasting, and (iii) clinical status prediction for a subject based on his/her longitudinal imaging biomarkers. Notably, the learnable parameters of our network are trained end-to-end with a carefully defined loss function. In our experiments on the TADPOLE challenge cohort, we measured performance on various metrics and compared our method with competing methods in the literature. Exhaustive analyses and ablation studies were also conducted to further confirm the effectiveness of our method.
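A minimal sketch of the missing-value handling. It assumes that missing entries are filled with the model's one-step forecast from the previous visit and that the loss is computed only on observed entries. The `step` callable stands in for the recurrent network; it is hypothetical, as are the other names and the zero-fill at the first visit.

```python
def masked_mse(pred, target, mask):
    # Only observed entries (mask == 1) contribute to the loss.
    num = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    den = sum(mask)
    return num / den if den else 0.0

def impute_sequence(seq, masks, step):
    # seq[t]: biomarker vector at visit t; masks[t][j] = 1 if observed.
    # step(prev) plays the role of the recurrent one-step forecast.
    filled, prev = [], None
    for x, m in zip(seq, masks):
        if prev is None:
            cur = [xi if mi else 0.0 for xi, mi in zip(x, m)]  # assumed zero-fill
        else:
            forecast = step(prev)
            cur = [xi if mi else fi for xi, mi, fi in zip(x, m, forecast)]
        filled.append(cur)
        prev = cur
    return filled
```

Feeding the filled-in vector forward lets imputation, forecasting, and later status prediction share one recurrent pass, in the spirit of the joint formulation above.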

17. Low-shot Object Detection via Classification Refinement
  Yiting Li, Yu Cheng, Lu Liu, Sichao Tian, Haiyue Zhu, Cheng Xiang, Prahlad Vadakkepat, Cheksing Teo, Tongheng Lee
Abstract: This work aims to address the problem of low-shot object detection, where only a few training samples are available for each category. Conventional fully supervised approaches usually suffer a huge performance drop on rare classes where data is insufficient. Our study reveals that classification confidence and localization accuracy are more seriously misaligned on rarely labeled categories, and that the tendency of class-specific parameters to overfit is the crucial cause of this issue. In this paper, we propose a novel low-shot classification correction network (LSCN) that can be plugged into any anchor-based detector to directly enhance detection accuracy on data-rare categories without sacrificing performance on base categories. Specifically, we sample false-positive proposals from a base detector to train a separate classification correction network. During inference, the well-trained correction network removes false positives from the base detector. The proposed correction network is data-efficient yet highly effective, with four carefully designed components: Unified recognition, Global receptive field, Inter-class separation, and Confidence calibration. Experiments show that our proposed method brings significant performance gains to rarely labeled categories and outperforms previous work on COCO and PASCAL VOC by a large margin.
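The training-data step for the correction network (sampling false-positive proposals from a base detector) can be illustrated with a minimal sketch. It assumes boxes given as (x1, y1, x2, y2) and a hypothetical IoU threshold of 0.5 for calling a detection a false positive. The paper's actual sampling rule is not given in the abstract and may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_false_positives(detections: List[Box], ground_truth: List[Box],
                           iou_thresh: float = 0.5) -> List[Box]:
    """Keep detections that overlap no ground-truth box at or above iou_thresh.

    Such boxes could serve as negatives for a separate correction classifier
    (the 0.5 threshold is a hypothetical choice, not taken from the paper).
    """
    return [d for d in detections
            if all(iou(d, g) < iou_thresh for g in ground_truth)]
```

For example, with one ground-truth box (1, 1, 10, 10), the detection (0, 0, 10, 10) has IoU 0.81 and is kept as a true positive. The detection (20, 20, 30, 30) does not overlap at all and is sampled as a false positive.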

18. Dependency Aware Filter Pruning [PDF] Back to contents
  Kai Zhao, Xin-Yu Zhang, Qi Han, Ming-Ming Cheng
Abstract: Convolutional neural networks (CNNs) are typically over-parameterized, incurring considerable computational overhead and memory footprint at inference. Pruning a proportion of unimportant filters is an efficient way to reduce the inference cost, and identifying those unimportant convolutional filters is the key to effective filter pruning. Previous work prunes filters according to either their weight norms or the corresponding batch-norm scaling factors, while neglecting the sequential dependency between adjacent layers. In this paper, we extend norm-based importance estimation by taking the dependency between adjacent layers into consideration. In addition, we propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity. In this way, we can identify unimportant filters and search for the optimal network architecture within a given resource budget in a more principled manner. Comprehensive experimental results demonstrate that the proposed method performs favorably against existing strong baselines on the CIFAR, SVHN, and ImageNet datasets. The training code will be made publicly available after the review process.
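The idea of dependency-aware importance can be sketched in a few lines. A channel with a large batch-norm scale still contributes nothing if the next layer barely reads it. The scoring rule below, |gamma_i| times the L2 norm of the next layer's weights on channel i, is an illustrative assumption rather than the paper's exact formula. Each kernel is collapsed to a scalar, as in a 1x1 convolution. The dynamic regularization controller is not sketched.

```python
import math
from typing import List


def dependency_aware_scores(bn_gamma: List[float],
                            next_weights: List[List[float]]) -> List[float]:
    """Score channel i by |gamma_i| * ||next-layer weights reading channel i||_2.

    next_weights[o][i] is the (scalar, 1x1-conv style) weight from input
    channel i to output filter o of the following layer. Hypothetical rule.
    """
    scores = []
    for i, g in enumerate(bn_gamma):
        col_norm = math.sqrt(sum(row[i] ** 2 for row in next_weights))
        scores.append(abs(g) * col_norm)
    return scores


def channels_to_prune(scores: List[float], ratio: float) -> List[int]:
    """Indices of the lowest-scoring channels; prunes int(ratio * n) of them."""
    n_prune = int(ratio * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:n_prune])
```

With gammas [1.0, 0.1, 0.5] and next-layer weights [[1, 1, 0], [0, 1, 0]], pruning by gamma alone would remove channel 1. The dependency-aware score instead removes channel 2, which the next layer never reads.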

19. Exploiting Inter-Frame Regional Correlation for Efficient Action Recognition [PDF] Back to contents
  Yuecong Xu, Jianfei Yang, Kezhi Mao, Jianxiong Yin, Simon See
Abstract: Temporal feature extraction is an important issue in video-based action recognition. Optical flow is a popular method for extracting temporal features, and it produces excellent performance thanks to its ability to capture pixel-level correlation between consecutive frames. However, this pixel-level correlation comes at the cost of high computational complexity and large storage requirements. In this paper, we propose a novel temporal feature extraction method, named Attentive Correlated Temporal Feature (ACTF), which explores inter-frame correlation within local regions. The proposed ACTF exploits both bilinear and linear correlation between successive frames at the regional level. Our method achieves performance comparable to or better than optical flow-based methods while avoiding optical flow altogether. Experimental results demonstrate that our method achieves state-of-the-art performance of 96.3% on UCF101 and 76.3% on HMDB51.

20. CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement [PDF] Back to contents
  Ho Kei Cheng, Jihoon Chung, Yu-Wing Tai, Chi-Keung Tang
Abstract: State-of-the-art semantic segmentation methods were almost exclusively trained on images within a fixed resolution range. Their segmentations are inaccurate for very high-resolution images, since bicubic upsampling of a low-resolution segmentation does not adequately capture high-resolution details along object boundaries. In this paper, we propose a novel approach to the high-resolution segmentation problem that does not use any high-resolution training data. Our key contribution is the CascadePSP network, which refines and corrects local boundaries whenever possible. Although our network is trained on low-resolution segmentation data, our method is applicable at any resolution, even to very high-resolution images larger than 4K. We present quantitative and qualitative studies on different datasets showing that CascadePSP can reveal pixel-accurate segmentation boundaries using our novel refinement module without any finetuning. Thus, our method can be regarded as class-agnostic. Finally, we demonstrate the application of our model to scene parsing in multi-class segmentation.

21. Multi-Head Attention with Joint Agent-Map Representation for Trajectory Prediction in Autonomous Driving [PDF] Back to contents
  Kaouther Messaoud, Nachiket Deo, Mohan M. Trivedi, Fawzi Nashashibi
Abstract: For autonomous vehicles to navigate in urban environments, the ability to predict the possible future behaviors of surrounding vehicles is essential for increasing safety by avoiding dangerous situations in advance. The behavior anticipation task is mainly based on two tightly linked cues: surrounding agents' recent motions and scene information. The configuration of the agents may reveal which part of the scene is important, while the scene structure determines which existing agents are influential. To better represent this correlation, we deploy multi-head attention on a joint agent and map context. Moreover, to account for the uncertainty of the future, we use an efficient multi-modal probabilistic trajectory prediction model that learns to extract different joint context features and generates diverse possible trajectories accordingly in one forward pass. Results on the publicly available nuScenes dataset show that our model matches the performance of existing methods and generates diverse possible future trajectories compliant with the scene structure. Most importantly, visualizing the attention maps reveals some of the underlying prediction logic of our approach, which increases its interpretability and its reliability for real-world deployment.
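Multi-head attention over a joint context can be sketched generically. In this minimal sketch the query stands in for the target agent's encoding, and the keys and values stand in for a joint set of agent and map tokens. The learned query/key/value and output projections are omitted (identity), so each head simply attends over its own slice of the features. This is a textbook scaled dot-product formulation, not the paper's exact architecture.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(query: Vec, keys: List[Vec], values: List[Vec],
                         num_heads: int) -> Vec:
    """Scaled dot-product attention of one query over context tokens,
    computed independently per head and concatenated.

    Learned projections are omitted (identity) to keep the sketch short.
    """
    dim = len(query)
    assert dim % num_heads == 0, "feature size must split evenly into heads"
    d_head = dim // num_heads
    out: Vec = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q = query[lo:hi]
        # similarity of the query slice to every token's key slice
        scores = [sum(qi * ki for qi, ki in zip(q, k[lo:hi])) / math.sqrt(d_head)
                  for k in keys]
        weights = softmax(scores)
        # weighted sum of the value slices for this head
        for j in range(lo, hi):
            out.append(sum(w * v[j] for w, v in zip(weights, values)))
    return out
```

Each head produces its own attention weights over the same tokens, which is what makes per-head attention maps available for visualization.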

22. Partly Supervised Multitask Learning [PDF] Back to contents
  Abdullah-Al-Zubaer Imran, Chao Huang, Hui Tang, Wei Fan, Yuan Xiao, Dingjun Hao, Zhen Qian, Demetri Terzopoulos
Abstract: Semi-supervised learning has recently attracted attention as an alternative to fully supervised models, which require large pools of labeled data. Moreover, optimizing a model for multiple tasks can provide better generalizability than single-task learning. Leveraging self-supervision and adversarial training, we propose a novel general-purpose semi-supervised, multiple-task model, namely self-supervised, semi-supervised, multitask learning (S$^4$MTL), for accomplishing two important tasks in medical imaging: segmentation and diagnostic classification. Experimental results on chest and spine X-ray datasets suggest that our S$^4$MTL model significantly outperforms semi-supervised single-task, semi/fully-supervised multitask, and fully-supervised single-task models, even with a 50% reduction of class and segmentation labels. We hypothesize that our proposed model can be effective in tackling limited-annotation problems for joint training, not only in medical imaging domains but also in general-purpose vision tasks.

23. Mimicry: Towards the Reproducibility of GAN Research [PDF] Back to contents
  Kwot Sin Lee, Christopher Town
Abstract: Advancing the state of Generative Adversarial Network (GAN) research requires careful and accurate comparisons with existing works. Yet this is often difficult to achieve in practice, because models are frequently implemented with different frameworks and evaluated with different procedures, even when the same metric is used. To mitigate these issues, we introduce Mimicry, a lightweight PyTorch library that provides implementations of popular state-of-the-art GANs and evaluation metrics to closely reproduce reported scores in the literature. We provide comprehensive baseline performances of different GANs on seven widely used datasets by training these GANs under the same conditions and evaluating them across three popular GAN metrics using the same procedures. The library can be found at this https URL.

24. Temporal Event Segmentation using Attention-based Perceptual Prediction Model for Continual Learning [PDF] Back to contents
  Ramy Mounir, Roman Gula, Jörn Theuerkauf, Sudeep Sarkar
Abstract: Temporal event segmentation of a long video into coherent events requires a high-level understanding of activities' temporal features. Researchers have tackled the event segmentation problem with offline training schemes, either by providing full or weak supervision through manually annotated labels or by self-supervised epoch-based training. In this work, we present a continual learning perceptual prediction framework (influenced by cognitive psychology) capable of temporal event segmentation through understanding the underlying representation of objects within individual frames. Our framework also outputs attention maps that effectively localize and track event-causing objects in each frame. The model is tested on a wildlife monitoring dataset in a continual training manner, resulting in an 80% recall rate at a 20% false positive rate for frame-level segmentation. Activity-level testing yielded an 80% activity recall rate with one false activity detection every 50 minutes.
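Perceptual-prediction approaches commonly mark an event boundary where the model's prediction of the next frame suddenly gets worse. A minimal online sketch of that gating step follows. It assumes `errors[t]` is some per-frame prediction error, for example a distance between predicted and observed features. The exponentially weighted mean/variance rule and the parameters `alpha`, `k` and `warmup` are hypothetical choices, not taken from the paper.

```python
from typing import List


def detect_event_boundaries(errors: List[float], alpha: float = 0.1,
                            k: float = 2.0, warmup: int = 5) -> List[int]:
    """Flag frame t as a boundary when its prediction error exceeds an
    exponentially weighted running mean by k running standard deviations.

    Statistics are updated after each decision, so the detector runs online,
    in the spirit of continual learning.
    """
    if not errors:
        return []
    boundaries: List[int] = []
    mean, var = errors[0], 0.0
    for t in range(1, len(errors)):
        e = errors[t]
        if t >= warmup and e > mean + k * var ** 0.5:
            boundaries.append(t)
        # incremental exponentially weighted mean and variance
        diff = e - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return boundaries
```

A flat error signal with a single spike at frame 20 yields the boundary list [20].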

25. Iris segmentation techniques to recognize the behavior of a vigilant driver [PDF] Back to contents
  Abdullatif Baba
Abstract: In this paper, we describe how to recognize different levels of vigilance in vehicle drivers. To avoid the classical problems of crisp logic, we employ a fuzzy logic-based system that depends on two variables to make the final decision. Two iris segmentation techniques are illustrated in detail. A new technique for pupil position detection is also provided, which can correct the detected pupil position in some noisy cases.
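A two-input fuzzy decision of the kind described above can be sketched as a zero-order Sugeno system. The abstract does not name the two variables, so `eye_openness` and `gaze_centering` (both in [0, 1]) are hypothetical. The membership functions and the rule base ("vigilance follows the weaker cue") are also illustrative assumptions.

```python
# Triangular fuzzy sets (feet a, c; peak b); shoulders extend past [0, 1].
SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
# Crisp output level associated with each linguistic term.
LEVEL = {"low": 0.0, "medium": 0.5, "high": 1.0}


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with a < b < c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def vigilance(eye_openness: float, gaze_centering: float) -> float:
    """Zero-order Sugeno inference over two hypothetical inputs in [0, 1].

    Rule 'IF openness is A AND centering is B' fires with strength
    min(mu_A, mu_B) and outputs min(LEVEL[A], LEVEL[B]); the result is the
    firing-strength-weighted average of the rule outputs.
    """
    num = den = 0.0
    for a, pa in SETS.items():
        mu_a = tri(eye_openness, *pa)
        for b, pb in SETS.items():
            w = min(mu_a, tri(gaze_centering, *pb))
            num += w * min(LEVEL[a], LEVEL[b])
            den += w
    return num / den if den > 0 else 0.0
```

Unlike a crisp threshold, the output varies smoothly between 0 (drowsy) and 1 (alert) as either input degrades.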

26. Data Augmentation via Mixed Class Interpolation using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery [PDF] Back to contents
  Hiroshi Sasaki, Chris G. Willcocks, Toby P. Breckon
Abstract: Machine learning driven object detection and classification within non-visible imagery play an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer from the limited quantity and variety of non-visible spectral domain imagery, whereas the high data availability of visible-band imagery readily enables contemporary deep learning driven detection and classification approaches. To address this problem, this paper proposes and evaluates a novel data augmentation approach that leverages the more readily available visible-band imagery via a generative domain transfer model. The model can synthesise large volumes of non-visible domain imagery by image translation from the visible image domain. Furthermore, we show that generating interpolated mixed-class (non-visible domain) image examples via our novel Conditional CycleGAN Mixup Augmentation (C2GMA) methodology can significantly improve the quality of non-visible domain classification tasks that otherwise suffer from limited data availability. Focusing on classification within the Synthetic Aperture Radar (SAR) domain, we evaluate our approach on a variation of the Statoil/C-CORE Iceberg Classifier Challenge dataset, where it achieves 75.4% accuracy, a significant improvement over traditional data augmentation strategies.
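The "mixed class interpolation" in C2GMA builds on the generic mixup idea: a convex combination of two examples and of their one-hot labels. The sketch below shows only that generic step, applied directly to flattened pixels and labels with a Beta(alpha, alpha) mixing weight as in standard mixup. In the paper, mixed-class examples are produced through a conditional CycleGAN, which is not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def mixup(x1: Sequence[float], y1: Sequence[float],
          x2: Sequence[float], y2: Sequence[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Convex combination of two examples and their one-hot labels."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Mixing weight drawn from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)
```

For example, mixing a "ship" image with an "iceberg" image at lam = 0.25 yields the soft label [0.25, 0.75].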

27. Groupwise Multimodal Image Registration using Joint Total Variation [PDF] Back to contents
  Mikael Brudfors, Yaël Balbastre, John Ashburner
Abstract: In medical imaging, it is common practice to acquire a wide range of modalities (MRI, CT, PET, etc.) to highlight different structures or pathologies. As patient movement between scans or scanning sessions is unavoidable, registration is often an essential step before any subsequent image analysis. In this paper, we introduce a cost function based on joint total variation for such multimodal image registration. This cost function has the advantage of enabling principled, groupwise alignment of multiple images, whilst being insensitive to strong intensity non-uniformities. We evaluate our algorithm on rigidly aligning both simulated and real 3D brain scans. This validation shows robustness to strong intensity non-uniformities and low registration errors for CT/PET to MRI alignment. Our implementation is publicly available at this https URL.
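Joint total variation sums, over pixels, the square root of the summed squared gradients of all channels: JTV = sum_x sqrt(sum_c |grad I_c(x)|^2). When edges in different images coincide, their gradients share one square root, so the cost drops. That is why minimizing JTV over transforms favours alignment. A minimal 2-D sketch follows, assuming forward differences with zero difference on the last row and column. Resampling and the transform search are left out.

```python
import math
from typing import List

Image = List[List[float]]


def joint_total_variation(images: List[Image], eps: float = 0.0) -> float:
    """JTV = sum over pixels of sqrt(eps + sum over images of |grad I_c|^2).

    Forward differences; the difference is zero on the last row/column
    (Neumann boundary). All images must share the same grid.
    """
    rows, cols = len(images[0]), len(images[0][0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            sq = eps
            for img in images:
                dx = img[i][j + 1] - img[i][j] if j + 1 < cols else 0.0
                dy = img[i + 1][j] - img[i][j] if i + 1 < rows else 0.0
                sq += dx * dx + dy * dy
            total += math.sqrt(sq)
    return total
```

Take a step edge [0, 0, 1, 1] and a brighter copy [0, 0, 5, 5] on the same grid. The JTV is sqrt(26) ≈ 5.10. Shifting the bright edge one pixel ([0, 5, 5, 5]) raises it to 6.0, even though the two images have very different intensity scales.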

28. High-Contrast Limited-Angle Reflection Tomography [PDF] Back to contents
  Ajinkya Kadu, Hassan Mansour, Petros T. Boufounos
Abstract: Inverse scattering is the process of estimating the spatial distribution of the scattering potential of an object by measuring the scattered wavefields around it. In this paper, we consider limited-angle reflection tomography of high-contrast objects, a setting that commonly occurs in ground-penetrating radar, exploration geophysics, terahertz imaging, ultrasound, and electron microscopy. Unlike conventional transmission tomography, the reflection regime is severely ill-posed, since the measured wavefields contain far less spatial frequency information about the target object. We propose a constrained incremental frequency inversion framework that requires no side information from a background model of the object. Our framework solves a sequence of regularized least-squares subproblems that ensure consistency with the measured scattered wavefield while imposing total-variation and non-negativity constraints. We propose a proximal quasi-Newton method to solve the resulting subproblems and devise an automatic parameter selection routine to determine the constraint of each subproblem. We validate the performance of our approach on synthetic low-resolution phantoms and with a mismatched forward model test on a high-resolution phantom.

29. Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction [PDF] Back to contents
  Toshiaki Koike-Akino, Ye Wang
Abstract: We propose a new concept of rateless auto-encoders (RL-AEs) that enable a flexible latent dimensionality, which can be seamlessly adjusted for varying distortion and dimensionality requirements. In the proposed RL-AEs, instead of a deterministic bottleneck architecture, we use an over-complete representation that is stochastically regularized with weighted dropouts, in a manner analogous to sparse AEs (SAEs). Unlike SAEs, our RL-AEs employ monotonically increasing dropout rates across the latent representation nodes, so that the latent variables become sorted by importance as in principal component analysis (PCA). This is motivated by the rateless property of conventional PCA, where the least important principal components can be discarded to realize variable-rate dimensionality reduction whose distortion degrades gracefully. In contrast, since the latent variables of conventional AEs are equally important for data reconstruction, they cannot simply be discarded to further reduce the dimensionality after the AE model is trained. Our proposed stochastic bottleneck framework enables seamless rate adaptation with high reconstruction performance, without requiring a predetermined latent dimensionality at training. We experimentally demonstrate that the proposed RL-AEs can achieve variable dimensionality reduction with lower distortion than conventional AEs.
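The stochastic bottleneck can be sketched as dropout whose rate grows with the latent index. Leading nodes are almost always present, so they learn the most important information. After training, trailing nodes can simply be zeroed, much like discarding minor principal components. This minimal sketch assumes a linear rate schedule and inverted-dropout rescaling. The paper's actual schedule and weighting may differ, and the encoder and decoder are omitted.

```python
import random
from typing import List


def dropout_rates(latent_dim: int, max_rate: float = 0.9) -> List[float]:
    """Monotonically increasing drop probabilities (assumed linear schedule):
    node 0 is never dropped, the last node with probability max_rate."""
    if latent_dim == 1:
        return [0.0]
    return [max_rate * i / (latent_dim - 1) for i in range(latent_dim)]


def stochastic_bottleneck(z: List[float], rates: List[float],
                          rng: random.Random) -> List[float]:
    """Training-time weighted dropout: zero node i with probability rates[i],
    rescaling survivors by 1 / (1 - rates[i]) (inverted dropout)."""
    out = []
    for zi, p in zip(z, rates):
        keep = rng.random() >= p
        out.append(zi / (1.0 - p) if keep else 0.0)
    return out


def truncate_latent(z: List[float], k: int) -> List[float]:
    """Inference-time rate adaptation: keep the k leading latent variables
    and zero the rest."""
    return [zi if i < k else 0.0 for i, zi in enumerate(z)]
```

Because the truncation point `k` is chosen only at inference time, one trained model can serve several dimensionality budgets.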

30. Unsupervised Real-world Low-light Image Enhancement with Decoupled Networks [PDF] Back to contents
  Wei Xiong, Ding Liu, Xiaohui Shen, Chen Fang, Jiebo Luo
Abstract: Conventional learning-based approaches to low-light image enhancement typically require a large amount of paired training data, which is difficult to acquire in real-world scenarios. Recently, unsupervised models for this task have been explored to eliminate the need for paired data. However, these methods primarily tackle illumination enhancement and usually fail to suppress the noise that is ubiquitous in images taken under real-world low-light conditions. In this paper, we address the real-world low-light image enhancement problem by decoupling it into two sub-tasks: illumination enhancement and noise suppression. We propose to learn a two-stage GAN-based framework that enhances real-world low-light images in a fully unsupervised fashion. In addition to conventional benchmark datasets, we build a new unpaired low-light image enhancement dataset and use it to thoroughly evaluate the performance of our model. Extensive experiments show that our method outperforms state-of-the-art unsupervised image enhancement methods in terms of both illumination enhancement and noise reduction.

31. Knee Injury Detection using MRI with Efficiently-Layered Network (ELNet) [PDF] Back to contents
  Chen-Han Tsai, Nahum Kiryati, Eli Konen, Iris Eshed, Arnaldo Mayer
Abstract: Magnetic Resonance Imaging (MRI) is a widely accepted imaging technique for knee injury analysis. Its ability to capture knee structure in three dimensions makes it the ideal tool for radiologists to locate potential tears in the knee. To better cope with the ever-growing workload of musculoskeletal (MSK) radiologists, automated tools for patient triage are becoming a real need, reducing delays in the reading of pathological cases. In this work, we present the Efficiently-Layered Network (ELNet), a convolutional neural network (CNN) architecture optimized for the task of initial knee MRI diagnosis for triage. Unlike past approaches, we train ELNet from scratch instead of using a transfer-learning approach. The proposed method is validated quantitatively and qualitatively, and compares favorably against the state-of-the-art MRNet while using a single imaging stack (axial or coronal) as input. Additionally, we demonstrate our model's capability to locate tears in the knee despite the absence of localization information during training. Lastly, the proposed model is extremely lightweight (< 1 MB) and therefore easy to train and deploy in real clinical settings.

32. Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns [PDF] Back to contents
  Anirudh Som, Narayanan Krishnamurthi, Matthew Buman, Pavan Turaga
Abstract: Interest in applying deep learning algorithms to different healthcare applications is growing at a steady pace. However, using such algorithms can prove challenging, as they require large amounts of training data that capture different possible variations. This makes them difficult to use in clinical settings, since in most health applications researchers have to work with limited data, and less data can cause a deep learning model to over-fit. In this paper, we ask how we can use data from a different environment and a different use case, with a widely differing data distribution. We exemplify this use case by using single-sensor accelerometer data from healthy subjects performing activities of daily living (ADLs; source dataset) to extract features relevant to multi-sensor accelerometer gait data (target dataset) for Parkinson's disease classification. We pre-train a model on the source dataset and use it as a feature extractor. We show that the features extracted for the target dataset can be used to train an effective classification model. Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron. We explore two different pre-trained source models, trained using different activity groups, and analyze the influence that the choice of pre-trained model has on Parkinson's disease classification.

33. Multi-task pre-training of deep neural networks [PDF] Back to contents
  Romain Mormont, Pierre Geurts, Raphaël Marée
Abstract: In this work, we investigate multi-task learning as a way of pre-training models for classification tasks in digital pathology. It is motivated by the fact that many small and medium-size datasets have been released by the community over the years, whereas there is no large-scale dataset similar to ImageNet in the domain. We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images. Then, we propose a simple architecture and training scheme for creating a transferable model, together with a robust evaluation and selection protocol for evaluating our method. Depending on the target task, we show that our models used as feature extractors either improve significantly over ImageNet pre-trained models or provide comparable performance. Fine-tuning improves performance over feature extraction and is able to compensate for the lack of specificity of ImageNet features, as both pre-training sources yield comparable performance.

34. Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder [PDF] Back to contents
  Guanlin Li, Shuya Ding, Jun Luo, Chang Liu
Abstract: Whereas adversarial training is employed as the main defence strategy against specific adversarial samples, it has limited generalization capability and incurs excessive time complexity. In this paper, we propose an attack-agnostic defence framework to enhance the intrinsic robustness of neural networks without jeopardizing their ability to generalize on clean samples. Our Feature Pyramid Decoder (FPD) framework applies to all block-based convolutional neural networks (CNNs). It implants denoising and image restoration modules into a targeted CNN, and it also constrains the Lipschitz constant of the classification layer. Moreover, we propose a two-phase strategy to train the FPD-enhanced CNN, utilizing ε-neighbourhood noisy images with multi-task and self-supervised learning. Evaluating against a variety of white-box and black-box attacks, we demonstrate that FPD-enhanced CNNs gain sufficient robustness against general adversarial samples on MNIST, SVHN and CALTECH. In addition, if we further conduct adversarial training, the FPD-enhanced CNNs perform better than their non-enhanced versions.
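One common way to bound the Lipschitz constant of a linear classification layer is spectral normalization. Under the L2 norm, the layer's Lipschitz constant equals the largest singular value of its weight matrix. That value is estimated by power iteration, and the weights are rescaled if it exceeds a target. The abstract does not say which mechanism FPD uses, so treat this as an assumed stand-in, not the paper's method.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def spectral_norm(w: Matrix, iters: int = 100, seed: int = 0) -> float:
    """Estimate the largest singular value of w by power iteration on W^T W."""
    rng = random.Random(seed)
    n = len(w[0])
    v = [rng.random() + 0.1 for _ in range(n)]  # positive random start vector
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(n)) for row in w]              # u = W v
        v = [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(n)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    wv = [sum(row[j] * v[j] for j in range(n)) for row in w]
    return math.sqrt(sum(x * x for x in wv))


def constrain_lipschitz(w: Matrix, max_norm: float = 1.0) -> Matrix:
    """Rescale a linear layer so its L2 Lipschitz constant is at most max_norm."""
    sigma = spectral_norm(w)
    if sigma <= max_norm:
        return [row[:] for row in w]
    scale = max_norm / sigma
    return [[x * scale for x in row] for row in w]
```

A bias term does not change the Lipschitz constant of an affine layer, so only the weight matrix is constrained.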

35. Cross-media Structured Common Space for Multimedia Event Extraction [PDF] Back to contents
  Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang
Abstract: We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction, respectively. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.

36. A new design of a flying robot, with advanced computer vision techniques to perform self-maintenance of smart grids [PDF] Back to contents
  Abdullatif Baba
Abstract: In this paper, we present a full design of a flying robot that inspects the state of power grid components and performs the appropriate maintenance procedure for each failure or defect it recognizes. To this end, different types of sensors, including thermal and aerial vision-based systems, are employed in this design. The main features and technical specifications of this robot are presented and discussed in detail. Several essential and advanced computer vision techniques are exploited in this work to take readings and measurements of the robot's surroundings. From each given image, many sub-images containing different electrical components are extracted using a new region proposal approach that relies on the Discrete Wavelet Transform; these sub-images are then classified by a Convolutional Neural Network.
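A DWT-based region proposal can be sketched with a one-level 2-D Haar transform. Component edges produce energy in the detail subbands, so blocks with high detail energy are proposed as candidate sub-images for the classifier. The energy-threshold rule, the block size and the threshold are hypothetical choices. The abstract does not specify the wavelet or the selection rule.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One-level 2-D Haar transform (averaging convention).

    Returns (LL, LH, HL, HH) subbands at half resolution; height and width
    of the input must be even. Subband naming conventions vary.
    """
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 4  # local average
            lh[i // 2][j // 2] = (a + b - c - d) / 4  # top minus bottom
            hl[i // 2][j // 2] = (a - b + c - d) / 4  # left minus right
            hh[i // 2][j // 2] = (a - b - c + d) / 4  # diagonal detail
    return ll, lh, hl, hh


def propose_regions(img: Image, block: int = 4,
                    thresh: float = 1.0) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, height, width) boxes, in image coordinates, for
    half-resolution blocks whose detail-subband energy exceeds thresh."""
    _, lh, hl, hh = haar_dwt2(img)
    rows, cols = len(lh), len(lh[0])
    boxes = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            r_end, c_end = min(r + block, rows), min(c + block, cols)
            energy = sum(lh[i][j] ** 2 + hl[i][j] ** 2 + hh[i][j] ** 2
                         for i in range(r, r_end) for j in range(c, c_end))
            if energy > thresh:
                # map the half-resolution block back to full resolution
                boxes.append((2 * r, 2 * c, 2 * (r_end - r), 2 * (c_end - c)))
    return boxes
```

On a 16x16 image that is dark except for a small bright patch near the top-left corner, only the top-left 8x8 box is proposed.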

Note: The Chinese text is machine-translated.