Table of Contents
2. A Survey On Anti-Spoofing Methods For Face Recognition with RGB Cameras of Generic Consumer Devices [PDF] abstract
4. 3D Object Detection and Pose Estimation of Unseen Objects in Color Images with Local Surface Embeddings [PDF] abstract
7. Single-Image Camera Response Function Using Prediction Consistency and Gradual Refinement [PDF] abstract
14. Clinically Verified Hybrid Deep Learning System for Retinal Ganglion Cells Aware Grading of Glaucomatous Progression [PDF] abstract
16. A Comparative Study on Effects of Original and Pseudo Labels for Weakly Supervised Learning for Car Localization Problem [PDF] abstract
21. Decamouflage: A Framework to Detect Image-Scaling Attacks on Convolutional Neural Networks [PDF] abstract
26. Regularized Compression of MRI Data: Modular Optimization of Joint Reconstruction and Coding [PDF] abstract
31. Spatially-Variant CNN-based Point Spread Function Estimation for Blind Deconvolution and Depth Estimation [PDF] abstract
39. Bone Feature Segmentation in Ultrasound Spine Image with Robustness to Speckle and Regular Occlusion Noise [PDF] abstract
40. 3D Convolutional Sequence to Sequence Model for Vertebral Compression Fractures Identification in CT [PDF] abstract
42. Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations [PDF] abstract
43. pymia: A Python package for data handling and evaluation in deep learning-based medical image analysis [PDF] abstract
Abstracts
1. Deformable DETR: Deformable Transformers for End-to-End Object Detection [PDF] back to contents
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
Abstract: DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10$\times$ fewer training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. Code shall be released.
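The key mechanism is an attention module that, for each query, samples only a handful of points around a reference location instead of attending over the entire feature map. Below is a minimal single-head sketch of that idea, assuming PyTorch; the module layout, shapes and offset parameterization are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of deformable attention: predict a few sampling offsets and
# per-sample weights from each query, then gather features at those points.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttentionSketch(nn.Module):
    def __init__(self, dim=256, n_points=4):
        super().__init__()
        self.offsets = nn.Linear(dim, n_points * 2)   # sampling offsets per query
        self.weights = nn.Linear(dim, n_points)       # attention weight per sample
        self.n_points = n_points

    def forward(self, query, feat, ref):
        # query: (B, Q, C); feat: (B, C, H, W); ref: (B, Q, 2) in [-1, 1] (x, y)
        B, Q, C = query.shape
        off = self.offsets(query).view(B, Q, self.n_points, 2)
        w = self.weights(query).softmax(-1)            # (B, Q, P)
        loc = (ref.unsqueeze(2) + off).clamp(-1, 1)    # sample near the reference
        # grid_sample expects a (B, H_out, W_out, 2) grid; treat (Q, P) as the grid
        sampled = F.grid_sample(feat, loc, align_corners=False)  # (B, C, Q, P)
        return (sampled * w.unsqueeze(1)).sum(-1).transpose(1, 2)  # (B, Q, C)
```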
2. A Survey On Anti-Spoofing Methods For Face Recognition with RGB Cameras of Generic Consumer Devices [PDF] back to contents
Zuheng Ming, Muriel Visani, Muhammad Muzzamil Luqman, Jean-Christophe Burie
Abstract: The widespread deployment of face recognition-based biometric systems has made face Presentation Attack Detection (face anti-spoofing) an increasingly critical issue. This survey thoroughly investigates the face Presentation Attack Detection (PAD) methods that only require RGB cameras of generic consumer devices, over the past two decades. We present an attack scenario-oriented typology of the existing face PAD methods, and we provide a review of over 50 of the most recent face PAD methods and their related issues. We adopt a comprehensive presentation of the methods that have most influenced face PAD, following the proposed typology and in chronological order. By doing so, we depict the main challenges, evolutions and current trends in the field of face PAD, and provide insights on its future research. From an experimental point of view, this survey paper provides a summarized overview of the available public databases and extensive comparative experimental results of different PAD methods.
3. Deep SVBRDF Estimation on Real Materials [PDF] back to contents
Louis-Philippe Asselin, Denis Laurendeau, Jean-François Lalonde
Abstract: Recent work has demonstrated that deep learning approaches can successfully be used to recover accurate estimates of the spatially-varying BRDF (SVBRDF) of a surface from as little as a single image. Closer inspection reveals, however, that most approaches in the literature are trained purely on synthetic data, which, while diverse and realistic, is often not representative of the richness of the real world. In this paper, we show that training such networks exclusively on synthetic data is insufficient to achieve adequate results when tested on real data. Our analysis leverages a new dataset of real materials obtained with a novel portable multi-light capture apparatus. Through an extensive series of experiments and with the use of a novel deep learning architecture, we explore two strategies for improving results on real data: finetuning, and a per-material optimization procedure. We show that adapting network weights to real data is of critical importance, resulting in an approach which significantly outperforms previous methods for SVBRDF estimation on real materials. Dataset and code are available at this https URL
4. 3D Object Detection and Pose Estimation of Unseen Objects in Color Images with Local Surface Embeddings [PDF] back to contents
Giorgia Pitteri, Aurélie Bugeau, Slobodan Ilic, Vincent Lepetit
Abstract: We present an approach for detecting and estimating the 3D poses of objects in images that requires only an untextured CAD model and no training phase for new objects. Our approach combines Deep Learning and 3D geometry: It relies on an embedding of local 3D geometry to match the CAD models to the input images. For points at the surface of objects, this embedding can be computed directly from the CAD model; for image locations, we learn to predict it from the image itself. This establishes correspondences between 3D points on the CAD model and 2D locations of the input images. However, many of these correspondences are ambiguous, as many points may have similar local geometries. We show that we can use Mask-RCNN in a class-agnostic way to detect the new objects without retraining and thus drastically limit the number of possible correspondences. We can then robustly estimate a 3D pose from these discriminative correspondences using a RANSAC-like algorithm. We demonstrate the performance of this approach on the T-LESS dataset, by using a small number of objects to learn the embedding and testing it on the other objects. Our experiments show that our method is on par with or better than previous methods.
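For the final pose step, a robust PnP solver inside a RANSAC loop is the standard tool. Below is a hedged sketch assuming OpenCV, with the embedding-matched 3D-2D correspondences taken as given; the thresholds are illustrative, not the paper's settings.

```python
# Robust pose from 3D-2D correspondences with RANSAC PnP, standing in for the
# paper's RANSAC-like step on discriminative correspondences.
import numpy as np
import cv2

def pose_from_correspondences(pts3d, pts2d, K):
    """pts3d: (N, 3) CAD-model points; pts2d: (N, 2) image points; K: (3, 3)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64),
        K, None,                      # assume no lens distortion
        reprojectionError=3.0,        # pixels; illustrative threshold
        iterationsCount=200)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> 3x3 matrix
    return R, tvec, inliers
```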
5. Are Adaptive Face Recognition Systems still Necessary? Experiments on the APE Dataset [PDF] back to contents
Giulia Orrù, Marco Micheletto, Julian Fierrez, Gian Luca Marcialis
Abstract: In the last five years, deep learning methods, in particular CNNs, have attracted considerable attention in the field of face-based recognition, achieving impressive results. Despite this progress, it is not yet clear precisely to what extent deep features are able to follow all the intra-class variations that the face can present over time. In this paper we investigate the performance improvement of face recognition systems obtained by adopting self-updating strategies for the face templates. For that purpose, we evaluate the performance of a well-known deep-learning face representation, namely, FaceNet, on a dataset that we generated, explicitly conceived to embed intra-class variations of users over a large time span of captures: the APhotoEveryday (APE) dataset. Moreover, we compare these deep features with handcrafted features extracted using the BSIF algorithm. In both cases, we evaluate various template update strategies, in order to detect the most useful for such kind of features. Experimental results show the effectiveness of "optimized" self-update methods with respect to systems without update or with random selection of templates.
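One simple self-update policy of the kind such experiments compare can be sketched as follows, assuming L2-normalized embeddings (e.g., from FaceNet); the acceptance threshold and gallery size are illustrative assumptions, and the paper's actual update strategies may differ.

```python
# Minimal sketch of template self-update: add a probe embedding to the user's
# gallery when it matches confidently enough, so templates adapt over time.
import numpy as np

def self_update(gallery, probe, accept_thr=0.6, max_templates=10):
    """gallery: list of (D,) unit vectors for one user; probe: (D,) unit vector."""
    score = max(float(probe @ t) for t in gallery)   # best cosine similarity
    if score >= accept_thr:                          # confident genuine match:
        gallery.append(probe)                        # adapt the template set
        if len(gallery) > max_templates:             # keep the most recent ones
            gallery.pop(0)
    return score >= accept_thr
```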
6. Semi-Supervised Learning of Multi-Object 3D Scene Representations [PDF] back to contents
Cathrin Elich, Martin R. Oswald, Marc Pollefeys, Jörg Stückler
Abstract: Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose a novel approach for learning multi-object 3D scene representations from images. A recurrent encoder regresses a latent representation of 3D shapes, poses and texture of each object from an input RGB image. The 3D shapes are represented continuously in function-space as signed distance functions (SDF) which we efficiently pre-train from example shapes in a supervised way. By differentiable rendering we then train our model to decompose scenes self-supervised from RGB-D images. Our approach learns to decompose images into the constituent objects of the scene and to infer their shape, pose and texture from a single view. We evaluate the accuracy of our model in inferring the 3D scene layout and demonstrate its generative capabilities.
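The function-space shape representation can be sketched as a small MLP that maps a per-object latent code plus a 3D query point to a signed distance. Layer sizes below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of an SDF shape decoder: latent code + 3D point -> signed
# distance to the object surface (negative inside, positive outside).
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    def __init__(self, latent_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))                 # signed distance to surface

    def forward(self, z, xyz):
        # z: (B, latent_dim) shape code; xyz: (B, N, 3) query points
        z = z.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        return self.net(torch.cat([z, xyz], dim=-1)).squeeze(-1)  # (B, N)
```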
7. Single-Image Camera Response Function Using Prediction Consistency and Gradual Refinement [PDF] back to contents
Aashish Sharma, Robby T. Tan, Loong-Fah Cheong
Abstract: A few methods have been proposed to estimate the CRF from a single image; however, most of them tend to fail in handling general real images. For instance, EdgeCRF, based on patches extracted from colour edges, works effectively only when the presence of noise is insignificant, which is not the case for many real images; and CRFNet, a recent method based on fully supervised deep learning, works only for the CRFs that are in the training data, and hence fails to deal with other possible CRFs beyond the training data. To address these problems, we introduce a non-deep-learning method using prediction consistency and gradual refinement. First, we rely more on the patches of the input image that provide more consistent predictions. If the predictions from a patch are more consistent, it means that the patch is likely to be less affected by noise or any inferior colour combinations, and hence, it can be more reliable for CRF estimation. Second, we employ a gradual refinement scheme in which we start from a simple CRF model to generate a result which is more robust to noise but less accurate, and then we gradually increase the model's complexity to improve the estimation. This is because a simple model, while being less accurate, overfits less to noise than a complex model does. Our experiments confirm that our method outperforms the existing single-image methods for both daytime and nighttime real images.
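The prediction-consistency idea can be sketched as scoring each patch by the spread of repeated CRF estimates and keeping only the most consistent patches; the per-patch predictor and the spread measure below are my assumptions, not the paper's exact formulation.

```python
# Sketch: prefer patches whose repeated CRF estimates agree with each other.
import numpy as np

def select_consistent_patches(patch_predictions, keep=0.3):
    """patch_predictions: (P, K, M) array, K repeated CRF estimates
    (each an M-sample curve) per patch. Low spread = consistent patch."""
    spread = patch_predictions.std(axis=1).mean(axis=1)     # (P,)
    order = np.argsort(spread)                              # most consistent first
    kept = order[: max(1, int(keep * len(order)))]
    return kept, patch_predictions[kept].mean(axis=1)       # averaged estimates
```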
8. Watch, read and lookup: learning to spot signs from multiple supervisors [PDF] back to contents
Liliane Momeni, Gül Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
Abstract: The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. To achieve this sign spotting task, we train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles (readily available translations of the signed content) which provide additional weak-supervision; (3) looking up words (for which no co-articulated labelled examples are available) in visual sign language dictionaries to enable novel sign spotting. These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning. We validate the effectiveness of our approach on low-shot sign spotting benchmarks. In addition, we contribute a machine-readable British Sign Language (BSL) dictionary dataset of isolated signs, BSLDict, to facilitate study of this task. The dataset, models and code are available at our project page.
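The Noise Contrastive Estimation principle the framework builds on can be sketched as an InfoNCE-style loss over embedding similarities; shapes and the temperature below are illustrative, assuming PyTorch.

```python
# Sketch of an InfoNCE-style contrastive loss: pull the query toward its
# positive and away from the sampled negatives.
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, tau=0.07):
    """query, positive: (B, D); negatives: (B, N, D). All L2-normalized."""
    pos = (query * positive).sum(-1, keepdim=True)          # (B, 1)
    neg = torch.einsum('bd,bnd->bn', query, negatives)      # (B, N)
    logits = torch.cat([pos, neg], dim=1) / tau
    # the positive sits at index 0 of every row
    target = torch.zeros(len(query), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, target)
```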
9. Unconstrained Text Detection in Manga [PDF] back to contents
Julián Del Gobbo, Rosana Matuk Herrera
Abstract: The detection and recognition of unconstrained text is an open problem in research. Text in comic books has unusual styles that raise many challenges for text detection. This work aims to identify text characters at a pixel level in a comic genre with highly sophisticated text styles: Japanese manga. To overcome the lack of a manga dataset with individual character level annotations, we create our own. Most of the literature in text detection uses bounding box metrics, which are unsuitable for pixel-level evaluation. Thus, we implemented special metrics to evaluate performance. Using these resources, we designed and evaluated a deep network model, outperforming current methods for text detection in manga in most metrics.
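Pixel-level evaluation of the kind argued for here boils down to precision, recall and F1 over binary text masks; a minimal sketch follows (the paper's exact metrics may differ).

```python
# Pixel-level precision/recall/F1 over binary text masks, as an alternative
# to bounding-box metrics.
import numpy as np

def pixel_metrics(pred_mask, gt_mask):
    """pred_mask, gt_mask: boolean arrays of the same shape (H, W)."""
    tp = np.logical_and(pred_mask, gt_mask).sum()
    precision = tp / max(pred_mask.sum(), 1)
    recall = tp / max(gt_mask.sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return precision, recall, f1
```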
10. UESegNet: Context Aware Unconstrained ROI Segmentation Networks for Ear Biometric [PDF] back to contents
Aman Kamboj, Rajneesh Rani, Aditya Nigam, Ranjeet Ranjan Jha
Abstract: Biometric-based personal authentication systems have seen a strong demand mainly due to the increasing concern in various privacy and security applications. Although the use of each biometric trait is problem dependent, the human ear has been found to have enough discriminating characteristics to allow its use as a strong biometric measure. Locating an ear in a 2D side face image is a challenging task; numerous existing approaches have achieved significant performance, but the majority of studies are based on the constrained environment. However, ear biometrics pose a great level of difficulty in the unconstrained environment, where pose, scale, occlusion, illumination, background clutter etc. vary to a great extent. To address the problem of ear localization in the wild, we have proposed two high-performance region-of-interest (ROI) segmentation models, UESegNet-1 and UESegNet-2, which are fundamentally based on deep convolutional neural networks and primarily use contextual information to localize the ear in the unconstrained environment. Additionally, we have applied state-of-the-art deep learning models, viz. FRCNN (Faster Region Proposal Network) and SSD (Single Shot MultiBox Detector), to the ear localization task. To test the models' generalization, they are evaluated on six different benchmark datasets, viz. IITD, IITK, USTB-DB3, UND-E, UND-J2 and UBEAR, all of which contain challenging images. The performance of the models is compared on the basis of object detection performance measure parameters such as IOU (Intersection Over Union), Accuracy, Precision, Recall, and F1-Score. It has been observed that the proposed models UESegNet-1 and UESegNet-2 outperformed the FRCNN and SSD at higher values of IOU, i.e., an accuracy of 100% is achieved at IOU 0.5 on the majority of the databases.
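The IOU measure used throughout the evaluation, for axis-aligned boxes given as (x1, y1, x2, y2), is straightforward to compute:

```python
# Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)
```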
11. A Human Ear Reconstruction Autoencoder [PDF] back to contents
Hao Sun, Nick Pears, Hang Dai
Abstract: The ear, as an important part of the human head, has received much less attention compared to the human face in the area of computer vision. Inspired by previous work on monocular 3D face reconstruction using an autoencoder structure to achieve self-supervised learning, we aim to utilise such a framework to tackle the 3D ear reconstruction task, where more subtle and difficult curves and features are present on the 2D ear input images. Our Human Ear Reconstruction Autoencoder (HERA) system predicts 3D ear poses and shape parameters for 3D ear meshes, without any supervision of these parameters. To make our approach cover the variance for in-the-wild images, even grayscale images, we propose an in-the-wild ear colour model. The constructed end-to-end self-supervised model is then evaluated with both 2D landmark localisation performance and the appearance of the reconstructed 3D ears.
12. IRX-1D: A Simple Deep Learning Architecture for Remote Sensing Classifications [PDF] back to contents
Mahesh Pal, Akshay, B. Charan Teja
Abstract: We propose a simple deep learning architecture combining elements of Inception, ResNet and Xception networks. Four new datasets were used for classification with both small and large training samples. Results in terms of classification accuracy suggest improved performance by the proposed architecture in comparison to a Bayesian-optimised 2D-CNN with small training samples. Comparison of results using small training samples with the Indian Pines hyperspectral dataset suggests comparable or better performance by the proposed architecture than nine reported works using different deep learning architectures. In spite of achieving high classification accuracy with limited training samples, comparison of the classified images suggests that different land cover classes are assigned to the same area when compared with the classified image provided by the model trained using large training samples, across all datasets.
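A hedged sketch of a block mixing the three named ingredients: Inception-style parallel kernel sizes, Xception-style depthwise-separable 1D convolutions, and a ResNet-style skip connection. Channel counts and kernel sizes are illustrative assumptions, not the IRX-1D specification.

```python
# Sketch of a 1D block combining Inception, ResNet and Xception elements.
import torch
import torch.nn as nn

class IRXBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(                      # depthwise-separable conv, kernel k
                nn.Conv1d(ch, ch, k, padding=k // 2, groups=ch),
                nn.Conv1d(ch, ch, 1))
            for k in (3, 5, 7)])                # Inception-style parallel paths
        self.project = nn.Conv1d(3 * ch, ch, 1)
        self.act = nn.ReLU()

    def forward(self, x):                       # x: (B, ch, L)
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return self.act(x + self.project(y))    # ResNet-style skip connection
```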
13. BGM: Building a Dynamic Guidance Map without Visual Images for Trajectory Prediction [PDF] back to contents
Beihao Xia, Conghao Wong, Heng Li, Shiming Chen, Qinmu Peng, Xinge You
Abstract: Visual images usually contain the informative context of the environment, thereby helping to predict agents' behaviors. However, they hardly impose the dynamic effects on agents' actual behaviors, due to their respectively fixed semantics. To solve this problem, we propose a deterministic model named BGM to construct a guidance map to represent the dynamic semantics, which circumvents the need to use visual images for each agent to reflect the difference of activities in different periods. We first record all agents' activities in the scene within a period close to the current one to construct a guidance map, and then feed it to a Context CNN to obtain their context features. We adopt a Historical Trajectory Encoder to extract the trajectory features and then combine them with the context feature as the input of the social-energy-based trajectory decoder, thus obtaining predictions that meet the social rules. Experiments demonstrate that BGM achieves state-of-the-art prediction accuracy on the two widely used ETH and UCY datasets and handles more complex scenarios.
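The guidance map can be sketched as rasterizing all agents' recent positions into a normalized 2D histogram of activity; the grid size, scene extent and normalization below are my assumptions, assuming NumPy.

```python
# Sketch: accumulate recent agent positions into a 2D activity histogram,
# in the spirit of the guidance map built without visual images.
import numpy as np

def build_guidance_map(trajectories, grid=(100, 100), extent=((0, 1), (0, 1))):
    """trajectories: list of (T, 2) arrays of recent (x, y) positions."""
    gmap = np.zeros(grid, dtype=np.float32)
    (x0, x1), (y0, y1) = extent
    for traj in trajectories:
        gx = ((traj[:, 0] - x0) / (x1 - x0) * (grid[0] - 1)).astype(int)
        gy = ((traj[:, 1] - y0) / (y1 - y0) * (grid[1] - 1)).astype(int)
        valid = (gx >= 0) & (gx < grid[0]) & (gy >= 0) & (gy < grid[1])
        np.add.at(gmap, (gx[valid], gy[valid]), 1.0)   # accumulate visits
    return gmap / max(gmap.max(), 1e-6)                # normalize to [0, 1]
```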
14. Clinically Verified Hybrid Deep Learning System for Retinal Ganglion Cells Aware Grading of Glaucomatous Progression [PDF] back to contents
Hina Raja, Taimur Hassan, Muhammad Usman Akram, Naoufel Werghi
Abstract: Objective: Glaucoma is the second leading cause of blindness worldwide. Glaucomatous progression can be easily monitored by analyzing the degeneration of retinal ganglion cells (RGCs). Many researchers have screened for glaucoma by measuring cup-to-disc ratios from fundus and optical coherence tomography scans. However, this paper presents a novel strategy that pays attention to the RGC atrophy for screening glaucomatous pathologies and grading their severity. Methods: The proposed framework encompasses a hybrid convolutional network that extracts the retinal nerve fiber layer, the ganglion cell with the inner plexiform layer, and the ganglion cell complex regions, thus allowing a quantitative screening of glaucomatous subjects. Furthermore, the severity of glaucoma in screened cases is objectively graded by analyzing the thickness of these regions. Results: The proposed framework is rigorously tested on the publicly available Armed Forces Institute of Ophthalmology (AFIO) dataset, where it achieved an F1 score of 0.9577 for diagnosing glaucoma, a mean dice coefficient score of 0.8697 for extracting the RGC regions, and an accuracy of 0.9117 for grading glaucomatous progression. Furthermore, the performance of the proposed framework is clinically verified against the markings of four expert ophthalmologists, achieving a statistically significant Pearson correlation coefficient of 0.9236. Conclusion: An automated assessment of RGC degeneration yields better glaucomatous screening and grading compared to the state-of-the-art solutions. Significance: An RGC-aware system not only screens for glaucoma but can also grade its severity, and here we present an end-to-end solution that is thoroughly evaluated on a standardized dataset and is clinically validated for analyzing glaucomatous pathologies.
15. Dense Relational Image Captioning via Multi-task Triple-Stream Networks [PDF] back to contents
Dong-Jin Kim, Tae-Hyun Oh, Jinsoo Choi, In So Kweon
Abstract: We introduce dense relational captioning, a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in a visual scene. Relational captioning provides explicit descriptions of each relationship between object combinations. This framework is advantageous in both diversity and amount of information, leading to a comprehensive image understanding based on relationships, e.g., relational proposal generation. For relational understanding between objects, the part-of-speech (POS, i.e., subject-object-predicate categories) can be valuable prior information to guide the causal sequence of words in a caption. We enforce our framework to not only learn to generate captions but also predict the POS of each word. To this end, we propose the multi-task triple-stream network (MTTSNet), which consists of three recurrent units responsible for each POS, trained by jointly predicting the correct captions and POS for each word. In addition, we found that the performance of MTTSNet can be improved by modulating the object embeddings with an explicit relational module. We demonstrate that our proposed model can generate more diverse and richer captions, via extensive experimental analysis on large scale datasets and several metrics. We additionally extend the analysis to an ablation study, and to applications on holistic image captioning, scene graph generation, and retrieval tasks.
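The multi-task objective can be sketched as summing a word-prediction and a POS-prediction cross-entropy; the weighting below is an illustrative assumption, and the triple-stream wiring is omitted.

```python
# Sketch of the joint caption + POS objective, assuming PyTorch.
import torch
import torch.nn.functional as F

def caption_pos_loss(word_logits, pos_logits, word_targets, pos_targets, w_pos=0.5):
    """word_logits: (B, T, V); pos_logits: (B, T, P); targets: (B, T) indices."""
    l_word = F.cross_entropy(word_logits.flatten(0, 1), word_targets.flatten())
    l_pos = F.cross_entropy(pos_logits.flatten(0, 1), pos_targets.flatten())
    return l_word + w_pos * l_pos   # jointly predict each word and its POS tag
```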
16. A Comparative Study on Effects of Original and Pseudo Labels for Weakly Supervised Learning for Car Localization Problem [PDF] back to contents
Cenk Bircanoglu
Abstract: In this study, the effects of different class labels, created as a result of multiple conceptual meanings, on localization using Weakly Supervised Learning are presented on a car dataset. In addition, generated labels are included in the comparison, turning the solution into Unsupervised Learning. This paper investigates multiple setups for car localization in images with approaches other than Supervised Learning. To predict localization labels, Class Activation Mapping (CAM) is implemented, and from its results, bounding boxes are extracted using morphological edge detection. Besides the original class labels, generated class labels are also employed to train CAM, which turns the solution into an Unsupervised Learning example. In the experiments, we first analyze the effects of class labels in Weakly Supervised localization on the Compcars dataset. We then show that the proposed Unsupervised approach outperforms the Weakly Supervised method on this particular dataset by approximately 6%.
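The localization step (thresholding a CAM, applying morphological edge detection, and taking the bounding box of the largest contour) can be sketched with OpenCV; the threshold and kernel below are illustrative assumptions.

```python
# Sketch: CAM -> binary mask -> morphological gradient (edges) -> bounding box.
import numpy as np
import cv2

def cam_to_bbox(cam, thr=0.4):
    """cam: (H, W) class activation map scaled to [0, 1]."""
    mask = (cam >= thr).astype(np.uint8) * 255
    kernel = np.ones((3, 3), np.uint8)
    edges = cv2.morphologyEx(mask, cv2.MORPH_GRADIENT, kernel)  # dilation - erosion
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return x, y, x + w, y + h
```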
17. Age and Gender Prediction From Face Images Using Attentional Convolutional Network [PDF] back to contents
Amirali Abdolrashidi, Mehdi Minaei, Elham Azimi, Shervin Minaee
Abstract: Automatic prediction of age and gender from face images has drawn a lot of attention recently, owing to its wide applications in various facial analysis problems. However, due to the large intra-class variation of face images (such as variation in lighting, pose, scale, occlusion), the existing models are still behind the desired accuracy level, which is necessary for the use of these models in real-world applications. In this work, we propose a deep learning framework, based on an ensemble of attentional and residual convolutional networks, to predict the gender and age group of facial images with a high accuracy rate. Using an attention mechanism enables our model to focus on the important and informative parts of the face, which can help it to make a more accurate prediction. We train our model in a multi-task learning fashion, and augment the feature embedding of the age classifier with the predicted gender, and show that doing so can further increase the accuracy of age prediction. Our model is trained on a popular face age and gender dataset, and achieved promising results. Through visualization of the attention maps of the trained model, we show that our model has learned to become sensitive to the right regions of the face.
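The gender-augmented age head can be sketched as concatenating the predicted gender probabilities to the shared face embedding before age classification; dimensions are illustrative assumptions, assuming PyTorch.

```python
# Sketch of a multi-task head where age prediction is conditioned on gender.
import torch
import torch.nn as nn

class AgeGenderHead(nn.Module):
    def __init__(self, feat_dim=512, n_ages=8):
        super().__init__()
        self.gender = nn.Linear(feat_dim, 2)
        self.age = nn.Linear(feat_dim + 2, n_ages)

    def forward(self, feat):                         # feat: (B, feat_dim)
        g = self.gender(feat).softmax(-1)            # predicted gender probs
        a = self.age(torch.cat([feat, g], dim=-1))   # age conditioned on gender
        return g, a
```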
18. DBLFace: Domain-Based Labels for NIR-VIS Heterogeneous Face Recognition [PDF] back to contents
Ha Le, Ioannis A. Kakadiaris
Abstract: Deep learning-based domain-invariant feature learning methods are advancing in near-infrared and visible (NIR-VIS) heterogeneous face recognition. However, these methods are prone to overfitting due to the large intra-class variation and the lack of NIR images for training. In this paper, we introduce Domain-Based Label Face (DBLFace), a learning approach based on the assumption that a subject is not represented by a single label but by a set of labels. Each label represents images of a specific domain. In particular, a set of two labels per subject, one for the NIR images and one for the VIS images, are used for training a NIR-VIS face recognition model. The classification of images into different domains reduces the intra-class variation and lessens the negative impact of data imbalance in training. To train a network with sets of labels, we introduce a domain-based angular margin loss and a maximum angular loss to maintain the inter-class discrepancy and to enforce the close relationship of labels in a set. Quantitative experiments confirm that DBLFace significantly improves the rank-1 identification rate by 6.7% on the EDGE20 dataset and achieves state-of-the-art performance on the CASIA NIR-VIS 2.0 dataset.
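A hedged sketch of an additive angular margin applied over (subject, domain) classes, in the spirit of the domain-based angular margin loss; the margin and scale values are illustrative assumptions, assuming PyTorch.

```python
# Sketch: ArcFace-style additive angular margin where each class index is a
# (subject, domain) pair, e.g. one NIR label and one VIS label per subject.
import torch
import torch.nn.functional as F

def domain_margin_loss(embeddings, weights, labels, m=0.3, s=32.0):
    """embeddings: (B, D); weights: (C, D), one class per (subject, domain);
    labels: (B,) indices into the (subject, domain) classes."""
    cos = F.normalize(embeddings) @ F.normalize(weights).t()   # (B, C)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, cos.shape[1]).bool()
    logits = torch.where(target, torch.cos(theta + m), cos) * s
    return F.cross_entropy(logits, labels)
```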
19. Generative Autoregressive Ensembles for Satellite Imagery Manipulation Detection [PDF] 返回目录
Daniel Mas Montserrat, János Horváth, S. K. Yarlagadda, Fengqing Zhu, Edward J. Delp
Abstract: Satellite imagery is becoming increasingly accessible due to the growing number of orbiting commercial satellites. Many applications make use of such images: agricultural management, meteorological prediction, damage assessment from natural disasters, and cartography are some examples. Unfortunately, these images can easily be tampered with and modified using image manipulation tools, damaging downstream applications. Because the nature of the manipulation applied to the image is typically unknown, unsupervised methods that do not require prior knowledge of the tampering techniques are preferred. In this paper, we use ensembles of generative autoregressive models to model the distribution of the pixels of the image in order to detect potential manipulations. We evaluate the performance of the presented approach, obtaining accurate localization results compared to previously presented approaches.
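A minimal sketch of how such an ensemble might flag manipulations, assuming a hypothetical per-pixel log_likelihood method on each trained autoregressive model (the method name and scoring are our illustration, not the authors' API):

    import numpy as np

    def anomaly_map(models, image, eps=1e-8):
        # Average per-pixel negative log-likelihood over the ensemble;
        # pixels the models consider unlikely are candidate manipulations.
        nll = np.mean([-m.log_likelihood(image) for m in models], axis=0)
        return (nll - nll.min()) / (nll.max() - nll.min() + eps)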
20. VisualNews: A Large Multi-source News Image Dataset [PDF] 返回目录
Fuxiao Liu, Yinghan Wang, Tianlu Wang, Vicente Ordonez
Abstract: We introduce VisualNews, a large-scale dataset collected from four news agencies, consisting of more than one million news images along with associated news articles, image captions, author information, and other metadata. We also propose VisualNews-Captioner, a model for the task of news image captioning. Unlike the standard image captioning task, news images depict situations where people, locations, and events are of paramount importance. Our proposed method is able to effectively combine visual and textual features to generate captions with richer information such as events and entities. More specifically, we propose an Entity-Aware module along with an Entity-Guide attention layer to encourage more accurate predictions for named entities. Our method achieves new state-of-the-art results on both the GoodNews and VisualNews datasets while having significantly fewer parameters than competing methods. Our larger and more diverse VisualNews dataset further highlights the remaining challenges in news image captioning.
21. Decamouflage: A Framework to Detect Image-Scaling Attacks on Convolutional Neural Networks [PDF] 返回目录
Bedeuro Kim, Alsharif Abuadbba, Yansong Gao, Yifeng Zheng, Muhammad Ejaz Ahmed, Hyoungshick Kim, Surya Nepal
Abstract: As an essential processing step in computer vision applications, image resizing or scaling, more specifically downsampling, has to be applied before feeding a normally large image into a convolutional neural network (CNN) model, because CNN models typically take small fixed-size images as inputs. However, image scaling functions could be adversarially abused to perform a newly revealed attack called the image-scaling attack, which can affect a wide range of computer vision applications built upon image-scaling functions. This work presents an image-scaling attack detection framework, termed Decamouflage. Decamouflage consists of three independent detection methods: (1) rescaling, (2) filtering/pooling, and (3) steganalysis. While each of these three methods is effective standalone, they can work in an ensemble manner not only to improve the detection accuracy but also to harden the detector against potential adaptive attacks. Decamouflage has a pre-determined detection threshold that is generic. More precisely, as we have validated, the threshold determined from one dataset is also applicable to other different datasets. Extensive experiments show that Decamouflage achieves detection accuracy of 99.9% and 99.8% in the white-box (with knowledge of the attack algorithms) and black-box (without knowledge of the attack algorithms) settings, respectively. To corroborate the efficiency of Decamouflage, we have also measured its run-time overhead on a personal PC with an i5 CPU and found that Decamouflage can detect image-scaling attacks in milliseconds. Overall, Decamouflage can accurately detect image-scaling attacks in both white-box and black-box settings with acceptable run-time overhead.
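A minimal sketch of the first (rescaling) detection idea, with illustrative metric and threshold choices of our own: downscale the input to the model's expected size, scale it back up, and flag the image if the round trip diverges too much from the original, since a scaling attack hides content that only emerges after downsampling:

    import numpy as np
    from PIL import Image

    def rescaling_test(path, model_size=(224, 224), threshold=0.05):
        img = Image.open(path).convert('L')
        original = np.asarray(img, dtype=float) / 255.0
        # Down-scale to the CNN input size, then up-scale back.
        small = img.resize(model_size, Image.NEAREST)
        restored = np.asarray(small.resize(img.size, Image.BILINEAR),
                              dtype=float) / 255.0
        mse = np.mean((original - restored) ** 2)
        return mse > threshold   # True -> flag as a potential scaling attack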
22. Deep Tiered Image Segmentation for Detecting Internal Ice Layers in Radar Imagery [PDF] 返回目录
Yuchen Wang, Mingze Xu, John Paden, Lora Koenig, Geoffrey Fox, David Crandall
Abstract: Understanding the structure of the ice at the Earth's poles is important for modeling how global warming will impact polar ice and, in turn, the Earth's climate. Ground-penetrating radar is able to collect observations of the internal structure of snow and ice, but the process of manually labeling these observations with layer boundaries is slow and laborious. Recent work has developed automatic techniques for finding ice-bed boundaries, but finding internal boundaries is much more challenging because the number of layers is unknown and the layers can disappear, reappear, merge, and split. In this paper, we propose a novel deep neural network-based model for solving a general class of tiered segmentation problems. We then apply it to detecting internal layers in polar ice, and evaluate on a large-scale dataset of polar ice radar data with human-labeled annotations as ground truth.
23. Revisiting Batch Normalization for Improving Corruption Robustness [PDF] 返回目录
Philipp Benz, Chaoning Zhang, Adil Karjauv, In So Kweon
Abstract: Modern deep neural networks (DNNs) have demonstrated remarkable success in image recognition tasks when the test dataset and training dataset are from the same distribution. In practical applications, however, this assumption is often not valid, and performance drops when there is a domain shift. For example, the performance of DNNs trained on clean images has been shown to decrease when the test images have common corruptions, limiting their use in performance-sensitive applications. In this work, we interpret corruption robustness as a domain shift problem and propose to rectify batch normalization (BN) statistics for improving model robustness. The shift from the clean domain to the corruption domain can be interpreted as a style shift that is represented by the BN statistics. Straightforwardly, adapting the BN statistics is beneficial for rectifying this style shift. Specifically, we find that simply estimating and adapting the BN statistics on a few (32, for instance) representation samples, without retraining the model, improves the corruption robustness by a large margin on several benchmark datasets with a wide range of model architectures. For example, on ImageNet-C, statistics adaptation improves the top-1 accuracy from 40.2% to 49%. Moreover, we find that this technique can further improve state-of-the-art robust models from 59.0% to 63.5%.
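A minimal sketch of the statistics-adaptation step in PyTorch (our reading of the procedure: re-estimate BN running statistics on a few target-domain batches while leaving all learned weights untouched):

    import torch

    def adapt_bn_statistics(model, target_loader, n_batches=1):
        # Reset BN running stats and re-estimate them with a cumulative
        # moving average; no gradients, no weight updates.
        for m in model.modules():
            if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
                m.reset_running_stats()
                m.momentum = None   # None => cumulative moving average
        model.train()
        with torch.no_grad():
            for i, (x, _) in enumerate(target_loader):
                if i >= n_batches:
                    break
                model(x)
        return model.eval()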
24. Infant-ID: Fingerprints for Global Good [PDF] 返回目录
Joshua J. Engelsma, Debayan Deb, Kai Cao, Anjoo Bhatnagar, Prem S. Sudhish, Anil K. Jain
Abstract: In many of the least developed and developing countries, a multitude of infants continue to suffer and die from vaccine-preventable diseases and malnutrition. Lamentably, the lack of official identification documentation makes it exceedingly difficult to track which infants have been vaccinated and which infants have received nutritional supplements. Answering these questions could prevent infant suffering and premature death around the world. To that end, we propose Infant-Prints, an end-to-end, low-cost infant fingerprint recognition system. Infant-Prints comprises (i) our custom-built, compact, low-cost (85 USD), high-resolution (1,900 ppi), ergonomic fingerprint reader, and (ii) a high-resolution infant fingerprint matcher. To evaluate the efficacy of Infant-Prints, we collected a longitudinal infant fingerprint database captured in 4 different sessions over a 12-month time span (December 2018 to January 2020) from 315 infants at the Saran Ashram Hospital, a charitable hospital in Dayalbagh, Agra, India. Our experimental results demonstrate, for the first time, that Infant-Prints can deliver accurate and reliable recognition (over time) of infants enrolled between the ages of 2-3 months, in time for effective delivery of vaccinations, healthcare, and nutritional supplements (TAR=95.2% @ FAR=1.0% for infants aged 8-16 weeks at enrollment and authenticated 3 months later).
25. Shape, Illumination, and Reflectance from Shading [PDF] 返回目录
Jonathan T. Barron, Jitendra Malik
Abstract: A fundamental problem in computer vision is that of inferring the intrinsic, 3D structure of the world from flat, 2D images of that world. Traditional methods for recovering scene properties such as shape, reflectance, or illumination rely on multiple observations of the same scene to overconstrain the problem. Recovering these same properties from a single image seems almost impossible in comparison -- there are an infinite number of shapes, paint, and lights that exactly reproduce a single image. However, certain explanations are more likely than others: surfaces tend to be smooth, paint tends to be uniform, and illumination tends to be natural. We therefore pose this problem as one of statistical inference, and define an optimization problem that searches for the *most likely* explanation of a single image. Our technique can be viewed as a superset of several classic computer vision problems (shape-from-shading, intrinsic images, color constancy, illumination estimation, etc.) and outperforms all previous solutions to those constituent problems.
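In rough form (our paraphrase, working in the log-intensity domain), the optimization searches for the most likely reflectance R, shape Z, and illumination L that exactly explain the image I:

    \begin{aligned}
    &\underset{R,\,Z,\,L}{\text{maximize}} && P(R)\,P(Z)\,P(L) \\
    &\text{subject to} && I = R + S(Z, L)
    \end{aligned}

where S(Z, L) renders shading from shape and illumination, and the priors encode the smooth-surface, uniform-paint, and natural-illumination preferences mentioned above.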
26. Regularized Compression of MRI Data: Modular Optimization of Joint Reconstruction and Coding [PDF] 返回目录
Veronica Corona, Yehuda Dar, Carola-Bibiane Schönlieb
Abstract: The Magnetic Resonance Imaging (MRI) processing chain starts with a critical acquisition stage that provides raw data for reconstruction of images for medical diagnosis. This flow usually includes a near-lossless data compression stage that enables digital storage and/or transmission in binary formats. In this work we propose a framework for joint optimization of the MRI reconstruction and lossy compression, producing compressed representations of medical images that achieve improved trade-offs between quality and bit-rate. Moreover, we demonstrate that lossy compression can even improve the reconstruction quality compared to settings based on lossless compression. Our method has a modular optimization structure, implemented using the alternating direction method of multipliers (ADMM) technique and the state-of-the-art image compression technique (BPG) as a black-box module iteratively applied. This establishes a medical data compression approach compatible with a lossy compression standard of choice. A main novelty of the proposed algorithm is in the total-variation regularization added to the modular compression process, leading to decompressed images of higher quality without any additional processing at/after the decompression stage. Our experiments show that our regularization-based approach for joint MRI reconstruction and compression often achieves significant PSNR gains between 4 to 9 dB at high bit-rates compared to non-regularized solutions of the joint task. Compared to regularization-based solutions, our optimization method provides PSNR gains between 0.5 to 1 dB at high bit-rates, which is the range of interest for medical image compression.
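A minimal sketch of the modular structure, under our own illustrative assumptions (a generic forward operator A with adjoint At, a few gradient steps standing in for the exact x-update, and a black-box encode/decode function playing the role of the BPG module):

    import numpy as np

    def admm_joint_recon_compress(y, A, At, compress, beta=1.0, iters=30):
        # x: reconstruction, z: its compressed representation,
        # u: scaled dual variable coupling the two.
        x = At(y)
        z = x.copy()
        u = np.zeros_like(x)
        for _ in range(iters):
            # x-update: data fidelity + quadratic coupling,
            # approximated here by a few fixed-step gradient steps.
            for _ in range(10):
                grad = At(A(x) - y) + beta * (x - z + u)
                x = x - 0.1 * grad
            z = compress(x + u)   # z-update: black-box compression as a prox
            u = u + (x - z)       # dual update
        return x, z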
27. A Unified Approach to Interpreting and Boosting Adversarial Transferability [PDF] 返回目录
Xin Wang, Jie Ren, Shuyun Lin, Xiangming Zhu, Yisen Wang, Quanshi Zhang
Abstract: In this paper, we use the interaction inside adversarial perturbations to explain and boost adversarial transferability. We discover and prove the negative correlation between adversarial transferability and the interaction inside adversarial perturbations. The negative correlation is further verified through different DNNs with various inputs. Moreover, this negative correlation can be regarded as a unified perspective to understand current transferability-boosting methods. To this end, we prove that some classic methods of enhancing transferability essentially decrease interactions inside adversarial perturbations. Based on this, we propose to directly penalize interactions during the attacking process, which significantly improves adversarial transferability.
28. Hierarchical Classification of Pulmonary Lesions: A Large-Scale Radio-Pathomics Study [PDF] 返回目录
Jiancheng Yang, Mingze Gao, Kaiming Kuang, Bingbing Ni, Yunlang She, Dong Xie, Chang Chen
Abstract: Diagnosis of pulmonary lesions from computed tomography (CT) is important but challenging for clinical decision making in lung cancer related diseases. Deep learning has achieved great success in the computer aided diagnosis (CADx) area for lung cancer, whereas it suffers from label ambiguity due to the difficulty of radiological diagnosis. Considering that invasive pathological analysis serves as the clinical golden standard of lung cancer diagnosis, in this study, we solve the label ambiguity issue via a large-scale radio-pathomics dataset containing 5,134 radiological CT images with pathologically confirmed labels, including cancers (e.g., invasive/non-invasive adenocarcinoma, squamous carcinoma) and non-cancer diseases (e.g., tuberculosis, hamartoma). This retrospective dataset, named Pulmonary-RadPath, enables development and validation of accurate deep learning systems to predict invasive pathological labels with a non-invasive procedure, i.e., radiological CT scans. A three-level hierarchical classification system for pulmonary lesions is developed, which covers most diseases in cancer-related diagnosis. We explore several techniques for hierarchical classification on this dataset, and propose a Leaky Dense Hierarchy approach with proven effectiveness in experiments. Our study significantly outperforms prior arts in terms of data scale (6x larger), disease comprehensiveness, and hierarchies. The promising results suggest the potential to facilitate precision medicine.
29. Texture-based Presentation Attack Detection for Automatic Speaker Verification [PDF] 返回目录
Lazaro J. Gonzalez-Soler, Jose Patino, Marta Gomez-Barrero, Massimiliano Todisco, Christoph Busch, Nicholas Evans
Abstract: Biometric systems are nowadays employed across a broad range of applications. They provide high security and efficiency and, in many cases, are user friendly. Despite these and other advantages, biometric systems in general, and automatic speaker verification (ASV) systems in particular, can be vulnerable to attack presentations. The most recent ASVSpoof 2019 competition showed that most forms of attacks can be detected reliably with ensemble classifier-based presentation attack detection (PAD) approaches. These, though, depend fundamentally upon the complementarity of systems in the ensemble. With the motivation to increase the generalisability of PAD solutions, this paper reports our exploration of texture descriptors applied to the analysis of speech spectrogram images. In particular, we propose a common Fisher vector feature space based on a generative model. Experimental results show the soundness of our approach: at most, 16 in 100 bona fide presentations are rejected, whereas only one in 100 attack presentations is accepted.
30. Frequency and Spatial domain based Saliency for Pigmented Skin Lesion Segmentation [PDF] 返回目录
Zanobya N. Khan
Abstract: Skin lesion segmentation can be a rather challenging task owing to the presence of artifacts, low contrast between lesion and boundary, color variegation, fuzzy skin lesion borders, and heterogeneous background in dermoscopy images. In this paper, we propose a simple yet effective saliency-based approach, derived in the frequency and spatial domains, to detect pigmented skin lesions. Two color models are utilized for the construction of the saliency maps. We suggest a different metric for each color model to design the map in the spatial domain via color features. The map in the frequency domain is generated from aggregated images. We adopt a separate fusion scheme to combine salient features in their respective domains. Finally, a two-phase saliency integration scheme is devised to combine these maps using pixelwise multiplication. Performance of the proposed method is assessed on the PH2 and ISIC 2016 datasets. The outcome of the experiments suggests that the proposed scheme generates better segmentation results compared to state-of-the-art methods.
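One plausible reading of the final integration step, sketched below with a min-max normalization of our own choosing, is a pixelwise product of the spatial- and frequency-domain maps:

    import numpy as np

    def fuse_saliency(spatial_map, frequency_map, eps=1e-8):
        def norm(m):
            return (m - m.min()) / (m.max() - m.min() + eps)
        # Pixelwise multiplication keeps only regions that both
        # domains agree are salient.
        return norm(norm(spatial_map) * norm(frequency_map))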
31. Spatially-Variant CNN-based Point Spread Function Estimation for Blind Deconvolution and Depth Estimation [PDF] 返回目录
Adrian Shajkofci, Michael Liebling
Abstract: Optical microscopy is an essential tool in biology and medicine. Imaging thin, yet non-flat objects in a single shot (without relying on more sophisticated sectioning setups) remains challenging as the shallow depth of field that comes with high-resolution microscopes leads to unsharp image regions and makes depth localization and quantitative image interpretation difficult. Here, we present a method that improves the resolution of light microscopy images of such objects by locally estimating image distortion while jointly estimating object distance to the focal plane. Specifically, we estimate the parameters of a spatially-variant Point-Spread function (PSF) model using a Convolutional Neural Network (CNN), which does not require instrument- or object-specific calibration. Our method recovers PSF parameters from the image itself with up to a squared Pearson correlation coefficient of 0.99 in ideal conditions, while remaining robust to object rotation, illumination variations, or photon noise. When the recovered PSFs are used with a spatially-variant and regularized Richardson-Lucy deconvolution algorithm, we observed up to 2.1 dB better signal-to-noise ratio compared to other blind deconvolution techniques. Following microscope-specific calibration, we further demonstrate that the recovered PSF model parameters permit estimating surface depth with a precision of 2 micrometers and over an extended range when using engineered PSFs. Our method opens up multiple possibilities for enhancing images of non-flat objects with minimal need for a priori knowledge about the optical setup.
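For reference, the classic (spatially invariant) Richardson-Lucy update that the deconvolution stage builds on can be sketched as follows; the paper's actual algorithm is a spatially-variant, regularized extension of this:

    import numpy as np
    from scipy.signal import fftconvolve

    def richardson_lucy(image, psf, iters=30, eps=1e-7):
        # image: float array in [0, 1]; psf: normalized blur kernel.
        estimate = np.full(image.shape, 0.5)
        psf_mirror = psf[::-1, ::-1]
        for _ in range(iters):
            blurred = fftconvolve(estimate, psf, mode='same')
            ratio = image / (blurred + eps)
            estimate *= fftconvolve(ratio, psf_mirror, mode='same')
        return estimate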
32. Tractography filtering using autoencoders [PDF] 返回目录
Jon Haitz Legarreta, Laurent Petit, François Rheault, Guillaume Theaud, Carl Lemaire, Maxime Descoteaux, Pierre-Marc Jodoin
Abstract: Current brain white matter fiber tracking techniques show a number of problems, including: generating large proportions of streamlines that do not accurately describe the underlying anatomy; extracting streamlines that are not supported by the underlying diffusion signal; and under-representing some fiber populations, among others. In this paper, we describe a novel unsupervised learning method to filter streamlines from diffusion MRI tractography, and hence to obtain more reliable tractograms. We show that a convolutional neural network autoencoder provides a straightforward and elegant way to learn a robust representation of brain streamlines, which can be used to filter undesired samples with a nearest neighbor algorithm. Our method, dubbed FINTA (Filtering in Tractography using Autoencoders), comes with several key advantages: training does not need labeled data, as it uses raw tractograms; it is fast and easily reproducible; and it does not rely on the input diffusion MRI data, and thus does not suffer from domain adaptation issues. We demonstrate the ability of FINTA to discriminate between "plausible" and "implausible" streamlines, as well as to recover individual streamline group instances from a raw tractogram, on both synthetic and real human brain diffusion MRI tractography data, including partial tractograms. Results reveal that FINTA has a superior filtering performance compared to state-of-the-art methods. Together, this work brings forward a new deep learning framework in tractography based on autoencoders, and shows how it can be applied for filtering purposes. It sets the foundations for opening up new prospects towards more accurate and robust tractometry and connectivity diffusion MRI analyses, which may ultimately lead to improved imaging of the white matter anatomy.
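A minimal sketch of the filtering step, assuming a trained encoder callable that maps streamlines to latent codes (the distance threshold and k are illustrative, not the paper's values):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def filter_streamlines(encoder, reference, candidates, k=1, threshold=2.0):
        z_ref = encoder(reference)      # (n_ref, d) latent codes
        z_cand = encoder(candidates)    # (n_cand, d)
        # Keep candidates whose latent neighborhood lies close to
        # the reference (plausible) streamlines.
        nn = NearestNeighbors(n_neighbors=k).fit(z_ref)
        dist, _ = nn.kneighbors(z_cand)
        keep = dist.mean(axis=1) < threshold
        return [s for s, ok in zip(candidates, keep) if ok]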
33. Free annotated data for deep learning in microscopy? A hitchhiker's guide [PDF] 返回目录
Adrian Shajkofci, Michael Liebling
Abstract: In microscopy, the time burden and cost of acquiring and annotating the large datasets that many deep learning models take as a prerequisite often appear to make these methods impractical. Can this requirement for annotated data be relaxed? Is it possible to borrow the knowledge gathered from datasets in other application fields and leverage it for microscopy? Here, we aim to provide an overview of methods that have recently emerged to successfully train learning-based methods in bio-microscopy.
34. A Brief Review of Domain Adaptation [PDF] 返回目录
Abolfazl Farahani, Sahar Voghoei, Khaled Rasheed, Hamid R. Arabnia
Abstract: Classical machine learning assumes that the training and test sets come from the same distributions. Therefore, a model learned from the labeled training data is expected to perform well on the test data. However, this assumption may not always hold in real-world applications, where the training and the test data come from different distributions due to many factors, e.g., collecting the training and test sets from different sources, or having an out-dated training set due to the change of data over time. In this case, there would be a discrepancy across domain distributions, and naively applying the trained model to the new dataset may cause degradation in performance. Domain adaptation is a sub-field within machine learning that aims to cope with these types of problems by aligning the disparity between domains such that the trained model can be generalized to the domain of interest. This paper focuses on unsupervised domain adaptation, where the labels are only available in the source domain. It addresses the categorization of domain adaptation from different viewpoints. Besides, it presents some successful shallow and deep domain adaptation approaches that aim to deal with domain adaptation problems.
35. Synthesising clinically realistic Chest X-rays using Generative Adversarial Networks [PDF] 返回目录
Bradley Segal, David M. Rubin, Grace Rubin, Adam Pantanowitz
Abstract: Chest x-rays are one of the most commonly performed medical investigations globally and are vital to identifying a number of conditions. These images are however protected under patient confidentiality and as such require the removal of identifying information as well as ethical clearance to be released. Generative adversarial networks (GANs) are a branch of deep learning which are capable of producing synthetic samples of a desired distribution. Image generation is one such application, with recent advances enabling the production of high-resolution images, a feature vital to the utility of x-rays given the scale of various pathologies. We apply the Progressive Growing GAN (PGGAN) to the task of chest x-ray generation with the goal of being able to produce images without any ethical concerns that may be used for medical education or in other machine learning work. We evaluate the properties of the generated x-rays with a practicing radiologist and demonstrate that high-quality, realistic images can be produced with global features consistent with pathologies seen in the NIH dataset. Improvements in the reproduction of small-scale details remain for future work. We train a classification model on the NIH images and evaluate the distribution of disease labels across the generated samples. We find that the model is capable of reproducing all the abnormalities in a similar proportion to the source image distribution as labelled by the classifier. We additionally demonstrate that the latent space can be optimised to produce images of a particular class despite unconditional training, with the model producing related features and complications for the class of interest. We also validate the application of the Fréchet Inception Distance (FID) to x-ray images and determine that the PGGAN reproduces x-ray images with an FID of 8.02, which is similar to other high resolution tasks.
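For reference, with real and generated Inception features modeled as Gaussians (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g), the FID used above is

    \mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
                 + \mathrm{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

so lower values indicate generated images whose feature statistics better match the real data.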
36. High Definition image classification in Geoscience using Machine Learning [PDF] 返回目录
Yajun An, Zachary Golden, Tarka Wilcox, Renzhi Cao
Abstract: High Definition (HD) digital photos taken with drones are widely used in the study of Geoscience. However, collected data often include blurry images, and it takes a lot of time and effort to distinguish clear images from blurry ones. In this work, we apply machine learning techniques, such as Support Vector Machines (SVM) and Neural Networks (NN), to classify HD images in Geoscience as clear or blurry, and thereby automate data cleaning in Geoscience. We compare classification results based on features abstracted from several mathematical models. Some of the implementation of our machine learning tool is freely available at: this https URL.
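The abstract does not name the extracted features, so as one plausible stand-in, image sharpness is often summarized by the variance of the Laplacian; the sketch below pairs that single assumed feature with a scikit-learn SVM to show the shape of such a clear-versus-blurry classifier.

    import numpy as np
    from scipy.ndimage import laplace
    from sklearn.svm import SVC

    def sharpness_feature(gray_image: np.ndarray) -> float:
        """Variance of the Laplacian: low for blurry images, high for sharp ones."""
        return float(laplace(gray_image.astype(np.float64)).var())

    def train_blur_classifier(images, labels):
        """images: list of 2-D grayscale arrays; labels: 0 = blurry, 1 = clear."""
        X = np.array([[sharpness_feature(img)] for img in images])
        clf = SVC(kernel="rbf")
        clf.fit(X, labels)
        return clf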
37. Improve Adversarial Robustness via Weight Penalization on Classification Layer [PDF] 返回目录
Cong Xu, Dan Li, Min Yang
Abstract: It is well known that deep neural networks are vulnerable to adversarial attacks. Recent studies show that well-designed classification parts can lead to better robustness. However, there is still much room for improvement along this line. In this paper, we first prove that, from a geometric point of view, the robustness of a neural network is equivalent to an angular-margin condition on the classifier weights. We then explain why a ReLU-type function is not a good choice for activation under this framework. These findings reveal the limitations of the existing approaches and lead us to develop a novel, lightweight, weight-penalized defensive method, which is simple and has good scalability. Empirical results on multiple benchmark datasets demonstrate that our method can effectively improve the robustness of the network without requiring much additional computation, while maintaining high classification precision on clean data.
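Since the paper ties robustness to an angular-margin condition on the classifier weights, one way to encode such a condition (an assumption for illustration, not the authors' exact penalty) is to push down the pairwise cosine similarities between the rows of the final linear layer, as in this PyTorch sketch.

    import torch
    import torch.nn.functional as F

    def angular_weight_penalty(classifier_weight: torch.Tensor) -> torch.Tensor:
        """Penalize small angles between class weight vectors.

        classifier_weight is the (num_classes, feat_dim) matrix of the final
        linear layer; shrinking off-diagonal cosine similarities enlarges the
        angular margin between classes.
        """
        w = F.normalize(classifier_weight, dim=1)  # unit-norm class vectors
        cos = w @ w.t()                            # pairwise cosine matrix
        off_diag = cos - torch.eye(cos.size(0), device=cos.device)
        return off_diag.pow(2).sum()

    # Hypothetical use inside a training step:
    # loss = F.cross_entropy(logits, y) + 0.01 * angular_weight_penalty(model.fc.weight)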
38. ALFWorld: Aligning Text and Embodied Environments for Interactive Learning [PDF] 返回目录
Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht
Abstract: Given a simple request (e.g., Put a washed apple in the kitchen fridge), humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstract, text-based policies in TextWorld (Côté et al., 2018) and then execute goals from the ALFRED benchmark (Shridhar et al., 2020) in a rich visual environment. ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions. In turn, as we demonstrate empirically, this fosters better agent generalization than training only in the visually grounded environment. BUTLER's simple, modular design factors the problem to allow researchers to focus on models for improving every piece of the pipeline (language understanding, planning, navigation, visual scene understanding, and so forth).
39. Bone Feature Segmentation in Ultrasound Spine Image with Robustness to Speckle and Regular Occlusion Noise [PDF] 返回目录
Zixun Huang, Li-Wen Wang, Frank H. F. Leung, Sunetra Banerjee, De Yang, Timothy Lee, Juan Lyu, Sai Ho Ling, Yong-Ping Zheng
Abstract: 3D ultrasound imaging shows great promise for scoliosis diagnosis thanks to its low-cost, radiation-free, and real-time characteristics. The key to assessing scoliosis with ultrasound imaging is to accurately segment the bone area and measure the scoliosis degree based on the symmetry of the bone features. Ultrasound images tend to contain many speckles and regular occlusion noise, making it difficult, tedious, and time-consuming for experts to identify the bony features. In this paper, we propose a robust bone feature segmentation method based on the U-net structure for ultrasound spine Volume Projection Imaging (VPI) images. The proposed segmentation method introduces a total variance loss to reduce the sensitivity of the model to small-scale and regular occlusion noise. Compared with the U-net model, the proposed approach improves the Dice score by 2.3% and the AUC score by 1%, and shows high robustness to speckle and regular occlusion noise.
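The "total variance loss" suggests a smoothness penalty on the predicted probability map; its exact definition in the paper may differ, but a standard isotropic total-variation-style penalty, sketched below in PyTorch, shows the kind of term that damps isolated speckle responses when added to a segmentation loss.

    import torch

    def total_variation_loss(prob_map: torch.Tensor) -> torch.Tensor:
        """Mean absolute difference between neighboring pixels.

        prob_map has shape (batch, 1, H, W); small values mean spatially
        smooth predictions, suppressing isolated speckle activations.
        """
        dh = (prob_map[:, :, 1:, :] - prob_map[:, :, :-1, :]).abs().mean()
        dw = (prob_map[:, :, :, 1:] - prob_map[:, :, :, :-1]).abs().mean()
        return dh + dw

    # Hypothetical use: loss = dice_loss + 0.1 * total_variation_loss(pred)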
40. 3D Convolutional Sequence to Sequence Model for Vertebral Compression Fractures Identification in CT [PDF] 返回目录
David Chettrit, Tomer Meir, Hila Lebel, Mila Orlovsky, Ronen Gordon, Ayelet Akselrod-Ballin, Amir Bar
Abstract: An osteoporosis-related fracture occurs every three seconds worldwide, affecting one in three women and one in five men aged over 50. The early detection of at-risk patients facilitates effective and well-evidenced preventative interventions, reducing the incidence of major osteoporotic fractures. In this study, we present an automatic system for identifying vertebral compression fractures in Computed Tomography images, which are often an undiagnosed precursor to major osteoporosis-related fractures. The system integrates a compact 3D representation of the spine, utilizing a Convolutional Neural Network (CNN) for spinal cord detection and a novel end-to-end sequence-to-sequence 3D architecture. We evaluate several model variants that exploit different representation and classification approaches, and present a framework combining an ensemble of models that achieves state-of-the-art results, validated on a large data set, with a patient-level fracture-identification Area Under the Curve (AUC) of 0.955. The proposed system has the potential to support osteoporosis clinical management, improve treatment pathways, and change the course of one of the most burdensome diseases of our generation.
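The patient-level figure comes from an ensemble of models; the combination rule is not specified here, so as a hedged illustration, averaging per-model fracture probabilities and scoring with ROC-AUC might look like:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def ensemble_auc(model_probs: np.ndarray, labels: np.ndarray) -> float:
        """model_probs: (n_models, n_patients) fracture probabilities;
        labels: (n_patients,) binary ground truth.

        Mean-over-models is one common ensembling rule; the AUC then
        measures patient-level fracture identification.
        """
        ensembled = model_probs.mean(axis=0)
        return float(roc_auc_score(labels, ensembled))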
41. A Critique of Self-Expressive Deep Subspace Clustering [PDF] 返回目录
Benjamin D. Haeffele, Chong You, René Vidal
Abstract: Subspace clustering is an unsupervised clustering technique designed to cluster data supported on a union of linear subspaces, with each subspace defining a cluster of dimension lower than the ambient space. Many existing formulations of this problem exploit the self-expressive property of linear subspaces, whereby any point within a subspace can be represented as a linear combination of other points within the subspace. To extend this approach to data supported on a union of non-linear manifolds, numerous studies have proposed learning an appropriate kernel embedding of the original data using a neural network, regularized by a self-expressive loss function on the embedded data to encourage a union-of-linear-subspaces prior in the embedded space. Here we show that there are a number of potential flaws with this approach which have not been adequately addressed in prior work. In particular, we show the model formulation is often ill-posed in multiple ways, which can lead to a degenerate embedding of the data that need not correspond to a union of subspaces at all. We validate our theoretical results experimentally and additionally repeat prior experiments reported in the literature, concluding that a significant portion of the previously claimed performance benefits can be attributed to an ad-hoc post-processing step rather than the clustering model.
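For reference, the self-expressive objective under critique is usually written as min_C ||Z - ZC||_F^2 + lambda * ||C|| with diag(C) = 0, where Z holds the embedded points as columns; this NumPy sketch simply evaluates that standard formulation (not the paper's experiments) for a given coefficient matrix.

    import numpy as np

    def self_expressive_loss(Z: np.ndarray, C: np.ndarray, lam: float = 1.0) -> float:
        """||Z - Z C||_F^2 + lam * ||C||_1 with the diagonal of C zeroed.

        Z: (d, n) matrix of n embedded points; C: (n, n) coefficients.
        Each column of C expresses one point as a combination of the others.
        """
        C = C - np.diag(np.diag(C))  # enforce diag(C) = 0
        recon = np.linalg.norm(Z - Z @ C, "fro") ** 2
        sparsity = np.abs(C).sum()
        return float(recon + lam * sparsity)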
42. Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations [PDF] 返回目录
Wanrong Zhu, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang
Abstract: A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings. To do this, it is critical to ensure that our evaluation protocols are correct and our benchmarks are reliable. In this work, we design a set of experiments to understand an important but often ignored problem in visually grounded language generation: given that humans have different utilities and visual attention, how will the sample variance in multi-reference datasets affect the models' performance? Empirically, we study several multi-reference datasets and corresponding vision-and-language tasks. We show that it is of paramount importance to report variance in experiments; that human-generated references can vary drastically across datasets/tasks, revealing the nature of each task; and that, metric-wise, CIDEr shows systematically larger variances than other metrics. Our per-instance-reference evaluations shed light on the design of reliable datasets in the future.
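Measuring this sample variance reduces to scoring a generated sentence against each human reference separately rather than pooling them; the hedged sketch below uses a stand-in sentence_metric (the paper studies metrics such as CIDEr) and reports the per-instance mean and standard deviation across references.

    import statistics

    def per_instance_reference_stats(hypothesis, references, sentence_metric):
        """Score one generated sentence against each reference individually.

        sentence_metric(hyp, ref) -> float is a stand-in for CIDEr, BLEU, etc.
        A large standard deviation signals high sample variance in the references.
        """
        scores = [sentence_metric(hypothesis, ref) for ref in references]
        return statistics.mean(scores), statistics.stdev(scores)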
43. pymia: A Python package for data handling and evaluation in deep learning-based medical image analysis [PDF] 返回目录
Alain Jungo, Olivier Scheidegger, Mauricio Reyes, Fabian Balsiger
Abstract: Background and Objective: Deep learning enables tremendous progress in medical image analysis. One driving force of this progress is open-source frameworks like TensorFlow and PyTorch. However, these frameworks rarely address issues specific to the domain of medical image analysis, such as 3-D data handling and distance metrics for evaluation. pymia, an open-source Python package, tries to address these issues by providing flexible data handling and evaluation independent of the deep learning framework. Methods: The pymia package provides data handling and evaluation functionalities. The data handling allows flexible medical image handling in every commonly used format (e.g., 2-D, 2.5-D, and 3-D; full- or patch-wise). Even data beyond images, such as demographics or clinical reports, can easily be integrated into deep learning pipelines. The evaluation allows stand-alone result calculation and reporting, as well as performance monitoring during training, using a large number of domain-specific metrics for segmentation, reconstruction, and regression. Results: The pymia package is highly flexible, allows for fast prototyping, and reduces the burden of implementing data handling routines and evaluation methods. While data handling and evaluation are independent of the deep learning framework used, they can easily be integrated into TensorFlow and PyTorch pipelines. The developed package was successfully used in a variety of research projects for segmentation, reconstruction, and regression. Conclusions: The pymia package fills the gap of current deep learning frameworks regarding data handling and evaluation in medical image analysis. It is available at this https URL and can be installed directly from the Python Package Index using pip install pymia.
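As a rough sketch of the evaluation workflow described above (module and class names mirror pymia's documented examples but should be verified against the current documentation), computing a segmentation metric might look like:

    # Hedged sketch; treat the exact names as assumptions to check in the docs.
    import pymia.evaluation.evaluator as eval_
    import pymia.evaluation.metric as metric

    metrics = [metric.DiceCoefficient()]      # one of the domain-specific metrics
    labels = {1: 'BONE'}                      # label id -> structure name
    evaluator = eval_.SegmentationEvaluator(metrics, labels)
    # prediction and ground_truth would be label images (e.g., SimpleITK):
    # evaluator.evaluate(prediction, ground_truth, 'subject_01')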
Note: the cover image is a word cloud of the paper titles.