Abstracts

1. Using Player's Body-Orientation to Model Pass Feasibility in Soccer
Adrià Arbués-Sangüesa, Adrián Martín, Javier Fernández, Coloma Ballester, Gloria Haro
Abstract: Given a monocular video of a soccer match, this paper presents a computational model to estimate the most feasible pass at any given time. The method leverages offensive players' orientation (plus their location) and opponents' spatial configuration to compute the feasibility of pass events between players of the same team. Orientation data is gathered from body pose estimations that are properly projected onto the 2D game field; moreover, a geometrical solution is provided, through the definition of a feasibility measure, to determine which players are better oriented towards each other. After analyzing more than 6000 pass events, results show that, by including orientation as a feasibility measure, a robust computational model can be built, reaching more than 0.7 Top-3 accuracy. Finally, the combination of the orientation feasibility measure with the recently introduced Expected Possession Value metric is studied; promising results are obtained, thus showing that existing models can be refined by using orientation as a key feature. These models could help both coaches and analysts to have a better understanding of the game and to improve the players' decision-making process.
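The Top-3 accuracy quoted in this abstract can be read as the fraction of pass events whose actual receiver is among the three candidates the model scores highest. The following is a minimal sketch of that metric only, not of the paper's feasibility model; the player ids and scores are made-up illustration data.

```python
def top_k_accuracy(scores_per_event, true_receivers, k=3):
    """Fraction of events whose true receiver is among the k best-scored candidates."""
    hits = 0
    for scores, truth in zip(scores_per_event, true_receivers):
        # Rank candidate receivers from most to least feasible.
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_receivers)


# Hypothetical feasibility scores for two pass events (player id -> score).
events = [
    {"p2": 0.9, "p3": 0.4, "p4": 0.7, "p5": 0.1},
    {"p2": 0.2, "p3": 0.8, "p4": 0.6, "p5": 0.5},
]
receivers = ["p4", "p2"]  # hypothetical ground-truth receivers
accuracy = top_k_accuracy(events, receivers, k=3)
```

In this toy data the first event counts as a hit and the second as a miss, because "p2" is ranked fourth in the second event.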

2. A Transductive Approach for Video Object Segmentation
Zhang Yizhuo, Wu Zhirong, Peng Houwen, Lin Stephen
Abstract: Semi-supervised video object segmentation aims to separate a target object from a video sequence, given the mask in the first frame. Most prevailing methods utilize information from additional modules trained in other domains, such as optical flow and instance segmentation, and as a result they do not compete with other methods on common ground. To address this issue, we propose a simple yet strong transductive method, in which additional modules, datasets, and dedicated architectural designs are not needed. Our method takes a label propagation approach where pixel labels are passed forward based on feature similarity in an embedding space. Unlike other propagation methods, ours diffuses temporal information in a holistic manner that takes long-term object appearance into account. In addition, our method requires little additional computational overhead and runs at about 37 fps. Our single model with a vanilla ResNet50 backbone achieves an overall score of 72.3 on the DAVIS 2017 validation set and 63.1 on the test set. This simple yet high-performing and efficient method can serve as a solid baseline that facilitates future research. Code and models are available at this https URL.
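The idea of passing pixel labels forward by feature similarity can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' method: it uses cosine similarity with a softmax over labelled reference pixels, and the function name, the temperature value and the two-dimensional features are all hypothetical. The holistic long-term diffusion described in the abstract is not reproduced here.

```python
import math


def propagate_labels(ref_feats, ref_labels, query_feats, num_classes, temperature=0.1):
    """Soft-assign a class distribution to each query pixel from labelled reference pixels."""

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    refs = [normalize(f) for f in ref_feats]
    result = []
    for q in query_feats:
        qn = normalize(q)
        # Cosine similarity to every reference pixel, sharpened by the temperature.
        sims = [sum(a * b for a, b in zip(qn, r)) / temperature for r in refs]
        peak = max(sims)
        weights = [math.exp(s - peak) for s in sims]  # numerically stable softmax
        total = sum(weights)
        probs = [0.0] * num_classes
        for w, label in zip(weights, ref_labels):
            probs[label] += w / total
        result.append(probs)
    return result


# Two labelled reference pixels (class 0 and class 1) and one unlabelled query pixel.
probs = propagate_labels([[1.0, 0.0], [0.0, 1.0]], [0, 1], [[0.9, 0.1]], num_classes=2)
```

The query embedding lies close to the class-0 reference, so almost all of the probability mass goes to class 0.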

3. Bias in Multimodal AI: Testbed for Fair Automatic Recruitment
Alejandro Peña, Ignacio Serna, Aythami Morales, Julian Fierrez
Abstract: The presence of decision-making algorithms in society is rapidly increasing, while concerns are arising about their transparency and the possibility of these algorithms becoming new sources of discrimination. In fact, many relevant automated systems have been shown to make decisions based on sensitive information or to discriminate against certain social groups (e.g. certain biometric systems for person recognition). With the aim of studying how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data, we propose a fictitious automated recruitment testbed: FairCVtest. We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases. FairCVtest shows the capacity of the Artificial Intelligence (AI) behind such a recruitment tool to extract sensitive information from unstructured data and to exploit it, in combination with data biases, in undesirable (unfair) ways. Finally, we present a list of recent works developing techniques capable of removing sensitive information from the decision-making process of deep learning architectures. We have used one of these algorithms (SensitiveNets) to experiment with discrimination-aware learning for the elimination of sensitive information in our multimodal AI framework. Our methodology and results show how to generate fairer AI-based tools in general, and fairer automated recruitment systems in particular.

4. A recurrent cycle consistency loss for progressive face-to-face synthesis
Enrique Sanchez, Michel Valstar
Abstract: This paper addresses a major flaw of the cycle consistency loss when used to preserve the input appearance in the face-to-face synthesis domain. In particular, we show that the images generated by a network trained using this loss conceal a noise that hinders their use for further tasks. To overcome this limitation, we propose a "recurrent cycle consistency loss" which for different sequences of target attributes minimises the distance between the output images, independent of any intermediate step. We empirically validate not only that our loss enables the re-use of generated images, but that it also improves their quality. In addition, we propose the very first network that covers the task of unconstrained landmark-guided face-to-face synthesis. Contrary to previous works, our proposed approach enables the transfer of a particular set of input features to a large span of poses and expressions, whereby the target landmarks become the ground-truth points. We then evaluate the consistency of our proposed approach to synthesise faces at the target landmarks. To the best of our knowledge, we are the first to propose a loss to overcome the limitation of the cycle consistency loss, and the first to propose an "in-the-wild" landmark guided synthesis approach. Code and models for this paper can be found in this https URL

5. A Novel CNN-based Method for Accurate Ship Detection in HR Optical Remote Sensing Images via Rotated Bounding Box
Linhao Li, Zhiqiang Zhou, Bo Wang, Lingjuan Miao, Hua Zong
Abstract: Currently, reliable and accurate ship detection in optical remote sensing images is still challenging. Even the state-of-the-art convolutional neural network (CNN) based methods cannot obtain very satisfactory results. To more accurately locate the ships in diverse orientations, some recent methods conduct the detection via the rotated bounding box. However, it further increases the difficulty of detection, because an additional variable of ship orientation must be accurately predicted in the algorithm. In this paper, a novel CNN-based ship detection method is proposed, by overcoming some common deficiencies of current CNN-based methods in ship detection. Specifically, to generate rotated region proposals, current methods have to predefine multi-oriented anchors, and predict all unknown variables together in one regression process, limiting the quality of overall prediction. By contrast, we are able to predict the orientation and other variables independently, and yet more effectively, with a novel dual-branch regression network, based on the observation that the ship targets are nearly rotation-invariant in remote sensing images. Next, a shape-adaptive pooling method is proposed, to overcome the limitation of typical regular ROI-pooling in extracting the features of the ships with various aspect ratios. Furthermore, we propose to incorporate multilevel features via the spatially-variant adaptive pooling. This novel approach, called multilevel adaptive pooling, leads to a compact feature representation more qualified for the simultaneous ship classification and localization. Finally, detailed ablation study performed on the proposed approaches is provided, along with some useful insights. Experimental results demonstrate the great superiority of the proposed method in ship detection.

6. Fully Convolutional Online Tracking
Yutao Cui, Cheng Jiang, Limin Wang, Gangshan Wu
Abstract: Discriminative training has turned out to be effective for robust tracking. However, while online learning can be applied straightforwardly to the classification branch, it remains challenging to adapt it to the regression branch due to its complex design. In this paper, we present the first fully convolutional online tracking framework (FCOT), with a focus on enabling online learning for both classification and regression branches. Our key contribution is to introduce an anchor-free box regression branch, which unifies the whole tracking pipeline into a simpler fully convolutional network. This unified framework greatly decreases the complexity of the tracking system and allows for more efficient training and inference. In addition, thanks to its simplicity, we are able to design a regression model generator (RMG) to perform online optimization of the regression branch, making the whole tracking pipeline more effective in handling target deformation during the tracking procedure. The proposed FCOT sets new state-of-the-art results on five benchmarks including GOT-10k, LaSOT, TrackingNet, UAV123 and NFS, and performs on par with the state-of-the-art trackers on OTB100, with a high running speed of 53 FPS. The code and models will be made available at this https URL.

7. DeeSCo: Deep heterogeneous ensemble with Stochastic Combinatory loss for gaze estimation
Edouard Yvinec, Arnaud Dapogny, Kévin Bailly
Abstract: From medical research to gaming applications, gaze estimation is becoming a valuable tool. While there exist a number of hardware-based solutions, recent deep learning-based approaches, coupled with the availability of large-scale databases, have made it possible to provide a precise gaze estimate using only consumer sensors. However, there remain a number of questions regarding the problem formulation, architectural choices and learning paradigms for designing gaze estimation systems in order to bridge the gap between geometry-based systems involving specific hardware and approaches using consumer sensors only. In this paper, we introduce a deep, end-to-end trainable ensemble of heatmap-based weak predictors for 2D/3D gaze estimation. We show that, through heterogeneous architectural design of these weak predictors, we can improve the decorrelation between them to design more robust deep ensemble models. Furthermore, we propose a stochastic combinatory loss that consists in randomly sampling combinations of weak predictors at train time. This makes it possible to train better individual weak predictors, with lower correlation between them. This, in turn, significantly enhances the performance of the deep ensemble. We show that our Deep heterogeneous ensemble with Stochastic Combinatory loss (DeeSCo) outperforms state-of-the-art approaches for 2D/3D gaze estimation on multiple datasets.

8. Seeing Red: PPG Biometrics Using Smartphone Cameras
Giulio Lovisotto, Henry Turner, Simon Eberz, Ivan Martinovic
Abstract: In this paper, we propose a system that enables photoplethysmogram (PPG)-based authentication by using a smartphone camera. PPG signals are obtained by recording a video from the camera as users are resting their finger on top of the camera lens. The signals can be extracted based on subtle changes in the video that are due to changes in the light reflection properties of the skin as the blood flows through the finger. We collect a dataset of PPG measurements from a set of 15 users over the course of 6-11 sessions per user using an iPhone X for the measurements. We design an authentication pipeline that leverages the uniqueness of each individual's cardiovascular system, identifying a set of distinctive features from each heartbeat. We conduct a set of experiments to evaluate the recognition performance of the PPG biometric trait, including cross-session scenarios which have been disregarded in previous work. We found that when aggregating sufficient samples for the decision we achieve an EER as low as 8%, but that the performance greatly decreases in the cross-session scenario, with an average EER of 20%.
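The equal error rate (EER) reported in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below shows one common way to approximate it by sweeping a threshold over the observed scores; it is not taken from the paper, and the score lists are invented for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a decision threshold over all observed scores."""
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        # A sample is accepted when its score is at least the threshold.
        frr = sum(g < threshold for g in genuine) / len(genuine)      # genuine users rejected
        far = sum(i >= threshold for i in impostor) / len(impostor)   # impostors accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical similarity scores (higher means "more likely the enrolled user").
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these scores, a threshold of 0.6 rejects one of four genuine samples and accepts one of four impostor samples, so both error rates meet at 25%.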

9. Contextual Pyramid Attention Network for Building Segmentation in Aerial Imagery
Clint Sebastian, Raffaele Imbriaco, Egor Bondarev, Peter H.N. de With
Abstract: Building extraction from aerial images has several applications in problems such as urban planning, change detection, and disaster management. With the increasing availability of data, semantic segmentation of remote sensing imagery with Convolutional Neural Networks (CNNs) has improved significantly in recent years. However, convolutions operate in local neighborhoods and fail to capture non-local features that are essential in semantic understanding of aerial images. In this work, we propose to improve the segmentation of buildings of different sizes by capturing long-range dependencies using contextual pyramid attention (CPA). The pathways process the input at multiple scales efficiently and combine them in a weighted manner, similar to an ensemble model. The proposed method obtains state-of-the-art performance on the Inria Aerial Image Labelling Dataset with minimal computation costs. Without any post-processing, our method improves on current state-of-the-art methods by 1.8 points and on existing baselines by 12.6 points on the Intersection over Union (IoU) metric. Code and models will be made publicly available.
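For readers unfamiliar with the Intersection over Union (IoU) metric used to report these gains, here is a minimal sketch for binary building masks. The masks are tiny made-up examples, and the convention of returning 1.0 when both masks are empty is an assumption of this sketch.

```python
def intersection_over_union(pred, target):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0   # pixel is building in both masks
            union += 1 if (p or t) else 0    # pixel is building in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = intersection_over_union(predicted, reference)
```

Here two pixels are shared and four pixels are covered by either mask, giving an IoU of 0.5.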

10. Code-Aligned Autoencoders for Unsupervised Change Detection in Multimodal Remote Sensing Images
Luigi T.Luppino, Mads A. Hansen, Michael Kampffmeyer, Filippo M. Bianchi, Gabriele Moser, Robert Jenssen, Stian N. Anfinsen
Abstract: Image translation with convolutional autoencoders has recently been used as an approach to multimodal change detection in bitemporal satellite images. A main challenge is the alignment of the code spaces by reducing the contribution of change pixels to the learning of the translation function. Many existing approaches train the networks by exploiting supervised information of the change areas, which, however, is not always available. We propose to extract relational pixel information captured by domain-specific affinity matrices at the input and use this to enforce alignment of the code spaces and reduce the impact of change pixels on the learning objective. A change prior is derived in an unsupervised fashion from pixel pair affinities that are comparable across domains. To achieve code space alignment, we enforce that pixels with similar affinity relations in the input domains should also be correlated in code space. We demonstrate the utility of this procedure in combination with cycle consistency. The proposed approach is compared with state-of-the-art deep learning algorithms. Experiments conducted on four real datasets show the effectiveness of our methodology.

11. Visual Descriptor Learning from Monocular Video
Umashankar Deekshith, Nishit Gajjar, Max Schwarz, Sven Behnke
Abstract: Correspondence estimation is one of the most widely researched and yet only partially solved area of computer vision with many applications in tracking, mapping, recognition of objects and environment. In this paper, we propose a novel way to estimate dense correspondence on an RGB image where visual descriptors are learned from video examples by training a fully convolutional network. Most deep learning methods solve this by training the network with a large set of expensive labeled data or perform labeling through strong 3D generative models using RGB-D videos. Our method learns from RGB videos using contrastive loss, where relative labeling is estimated from optical flow. We demonstrate the functionality in a quantitative analysis on rendered videos, where ground truth information is available. Not only does the method perform well on test data with the same background, it also generalizes to situations with a new background. The descriptors learned are unique and the representations determined by the network are global. We further show the applicability of the method to real-world videos.

12. Self-Supervised training for blind multi-frame video denoising
Dewil Valéry, Arias Pablo, Facciolo Gabriele, Anger Jérémy, Davy Axel, Ehret Thibaud
Abstract: We propose a self-supervised approach for training multi-frame video denoising networks. These networks predict frame t from a window of frames around t. Our self-supervised approach benefits from the video temporal consistency by penalizing a loss between the predicted frame t and a neighboring target frame, which are aligned using an optical flow. We use the proposed strategy for online internal learning, where a pre-trained network is fine-tuned to denoise a new unknown noise type from a single video. After a few frames, the proposed fine-tuning reaches and sometimes surpasses the performance of a state-of-the-art network trained with supervision. In addition, for a wide range of noise types, it can be applied blindly without knowing the noise distribution. We demonstrate this by showing results on blind denoising of different synthetic and realistic noises.

13. Explaining Regression Based Neural Network Model
Mégane Millan, Catherine Achard
Abstract: Several methods have been proposed to explain Deep Neural Networks (DNNs). However, to our knowledge, only classification networks have been studied to try to determine which input dimensions motivated the decision. Furthermore, as there is no ground truth for this problem, results are only assessed qualitatively with regard to what would be meaningful for a human. In this work, we design an experimental setting where the ground truth can be established: we generate ideal signals and disrupted signals with errors, and train a neural network that determines the quality of the signals. This quality is simply a score based on the distance between the disrupted signals and the corresponding ideal signal. We then try to find out how the network estimated this score and hope to find the time-steps and dimensions of the signal where errors are present. This experimental setting enables us to compare several methods for network explanation and to propose a new method, named AGRA for Accurate Gradient, based on several trainings that decrease the noise present in most state-of-the-art results. Comparative results show that the proposed method outperforms state-of-the-art methods for locating time-steps where errors occur in the signal.

14. Combining Visible Light and Infrared Imaging for Efficient Detection of Respiratory Infections such as COVID-19 on Portable Device
Zheng Jiang, Menghan Hu, Lei Fan, Yaling Pan, Wei Tang, Guangtao Zhai, Yong Lu
Abstract: Coronavirus Disease 2019 (COVID-19) has become a serious global epidemic in the past few months and caused huge loss to human society worldwide. For such a large-scale epidemic, early detection and isolation of potential virus carriers is essential to curb the spread of the epidemic. Recent studies have shown that one important feature of COVID-19 is the abnormal respiratory status caused by viral infections. During the epidemic, many people tend to wear masks to reduce the risk of getting sick. Therefore, in this paper, we propose a portable non-contact method to screen the health condition of people wearing masks through analysis of the respiratory characteristics. The device mainly consists of a FLIR One thermal camera and an Android phone. This may help identify potential COVID-19 patients under practical scenarios such as pre-inspection in schools and hospitals. In this work, we perform the health screening through the combination of the RGB and thermal videos obtained from the dual-mode camera and a deep learning architecture. We first accomplish a respiratory data capture technique for people wearing masks by using face recognition. Then, a bidirectional GRU neural network with attention mechanism is applied to the respiratory data to obtain the health screening result. The results of validation experiments show that our model can identify respiratory health status with an accuracy of 83.7% on the real-world dataset. The abnormal respiratory data and part of the normal respiratory data are collected from Ruijin Hospital Affiliated to The Shanghai Jiao Tong University Medical School. Other normal respiratory data are obtained from healthy people around our researchers. This work demonstrates that the proposed portable and intelligent health screening device can be used as a pre-scan method for respiratory infections, which may help fight the current COVID-19 epidemic.

15. Continuous learning of face attribute synthesis
Xin Ning, Shaohui Xu, Xiaoli Dong, Weijun Li, Fangzhe Nan, Yuanzhou Yao
Abstract: The generative adversarial network (GAN) exhibits great superiority in the face attribute synthesis task. However, existing methods have very limited effects on the expansion of new attributes. To overcome the limitations of a single network in new attribute synthesis, a continuous learning method for face attribute synthesis is proposed in this work. First, the feature vector of the input image is extracted and attribute direction regression is performed in the feature space to obtain the axes of different attributes. The feature vector is then linearly guided along the axis so that images with target attributes can be synthesized by the decoder. Finally, to make the network capable of continuous learning, the orthogonal direction modification module is used to extend the newly-added attributes. Experimental results show that the proposed method can endow a single network with the ability to learn attributes continuously, and, as compared to those produced by the current state-of-the-art methods, the synthetic attributes have higher accuracy.
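The step in which a feature vector is "linearly guided along the axis" can be pictured as adding a scaled unit direction to the encoded features before decoding. The sketch below shows only that arithmetic; the function name, the two-dimensional features and the strength parameter are hypothetical, and neither the attribute direction regression nor the orthogonal direction modification module from the abstract is modelled.

```python
import math


def move_along_axis(feature, axis, strength):
    """Shift a feature vector by `strength` units along the (normalized) attribute axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    return [f + strength * u for f, u in zip(feature, unit)]


# Hypothetical 2-D feature and attribute axis; a real encoder would produce far more dimensions.
edited = move_along_axis([1.0, 2.0], [0.0, 2.0], strength=3.0)
```

A decoder would then synthesize the image from the edited vector; a larger strength corresponds to a more pronounced attribute under this linear assumption.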

16. A Practical Blockchain Framework using Image Hashing for Image Authentication
Cameron White, Manoranjan Paul, Subrata Chakraborty
Abstract: Blockchain is a relatively new technology that can be seen as a decentralised database. Blockchain systems heavily rely on cryptographic hash functions to store their data, which makes it difficult to tamper with any data stored in the system. A topic that was researched along with blockchain is image authentication. Image authentication focuses on investigating and maintaining the integrity of images. As a blockchain system can be useful for maintaining data integrity, image authentication has the potential to be enhanced by blockchain. There are many techniques that can be used to authenticate images; the technique investigated by this work is image hashing. Image hashing is a technique used to calculate how similar two different images are. This is done by converting the images into hashes and then comparing them using a distance formula. To investigate the topic, an experiment involving a simulated blockchain was created. The blockchain acted as a database for images. This blockchain was made up of devices which contained their own unique image hashing algorithms. The blockchain was tested by creating modified copies of the images contained in the database, and then submitting them to the blockchain to see if it will return the original image. Through this experiment it was discovered that it is plausible to create an image authentication system using blockchain and image hashing. However, the design proposed by this work requires refinement, as it appears to struggle in some situations. This work shows that blockchain can be a suitable approach for authenticating images, particularly via image hashing. Other observations include that using multiple image hash algorithms at the same time can increase performance in some cases, as well as that each type of test done to the blockchain has its own unique pattern to its data.
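The image hashing procedure described here (convert images to hashes, then compare them with a distance formula) can be sketched with an average hash and the Hamming distance. This is one common choice and an assumption of the sketch; the abstract does not say which hashing algorithms the devices used. For brevity the sketch hashes a tiny grayscale grid directly and skips the usual downscaling step.

```python
def average_hash(gray):
    """Bit list: 1 where a pixel is brighter than the image mean, else 0."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Made-up 4x4 grayscale image, a uniformly brightened copy, and an inverted copy.
original = [[10, 200, 10, 200],
            [200, 10, 200, 10],
            [10, 200, 10, 200],
            [200, 10, 200, 10]]
brightened = [[min(p + 10, 255) for p in row] for row in original]
inverted = [[255 - p for p in row] for row in original]

near = hamming_distance(average_hash(original), average_hash(brightened))
far = hamming_distance(average_hash(original), average_hash(inverted))
```

The brightened copy hashes identically to the original (distance 0), while the inverted image differs in every bit (distance 16), which is the kind of signal an authentication lookup could threshold on.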

17. Intuitive, Interactive Beard and Hair Synthesis with Generative Models
Kyle Olszewski, Duygu Ceylan, Jun Xing, Jose Echevarria, Zhili Chen, Weikai Chen, Hao Li
Abstract: We present an interactive approach to synthesizing realistic variations in facial hair in images, ranging from subtle edits to existing hair to the addition of complex and challenging hair in images of clean-shaven subjects. To circumvent the tedious and computationally expensive tasks of modeling, rendering and compositing the 3D geometry of the target hairstyle using the traditional graphics pipeline, we employ a neural network pipeline that synthesizes realistic and detailed images of facial hair directly in the target image in under one second. The synthesis is controlled by simple and sparse guide strokes from the user defining the general structural and color properties of the target hairstyle. We qualitatively and quantitatively evaluate our chosen method compared to several alternative approaches. We show compelling interactive editing results with a prototype user interface that allows novice users to progressively refine the generated image to match their desired hairstyle, and demonstrate that our approach also allows for flexible and high-fidelity scalp hair synthesis.

18. Bounding boxes for weakly supervised segmentation: Global constraints get close to full supervision
Hoel Kervadec, Jose Dolz, Shanshan Wang, Eric Granger, Ismail Ben Ayed
Abstract: We propose a novel weakly supervised learning segmentation based on several global constraints derived from box annotations. In particular, we adapt a classical tightness prior to a deep learning setting by imposing a set of constraints on the network outputs. Such a powerful topological prior prevents solutions from excessive shrinking by enforcing any horizontal or vertical line within the bounding box to contain at least one pixel of the foreground region. Furthermore, we integrate our deep tightness prior with a global background emptiness constraint, guiding training with information outside the bounding box. We demonstrate experimentally that such a global constraint is much more powerful than standard cross-entropy for the background class. Our optimization problem is challenging as it takes the form of a large set of inequality constraints on the outputs of deep networks. We solve it with a sequence of unconstrained losses based on a recent powerful extension of the log-barrier method, which is well-known in the context of interior-point methods. This accommodates standard stochastic gradient descent (SGD) for training deep networks, while avoiding computationally expensive and unstable Lagrangian dual steps and projections. Extensive experiments over two different public data sets and applications (prostate and brain lesions) demonstrate that the synergy between our global tightness and emptiness priors yields very competitive performance, approaching full supervision and significantly outperforming DeepCut. Furthermore, our approach removes the need for computationally expensive proposal generation. Our code is shared anonymously.
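The tightness prior stated in this abstract says that every horizontal or vertical line crossing the bounding box must contain at least one foreground pixel. The sketch below checks that property on a hard binary mask. It is an illustration only: the paper imposes the prior as inequality constraints on soft network outputs and optimizes them with a log-barrier extension, which is not reproduced here, and the inclusive `(top, left, bottom, right)` box layout is an assumption.

```python
def satisfies_tightness(mask, box):
    """True if every row and column segment inside the box has a foreground pixel.

    `mask` is a nested list of 0/1 values; `box` is (top, left, bottom, right), inclusive.
    """
    top, left, bottom, right = box
    rows_ok = all(
        any(mask[r][c] for c in range(left, right + 1))
        for r in range(top, bottom + 1)
    )
    cols_ok = all(
        any(mask[r][c] for r in range(top, bottom + 1))
        for c in range(left, right + 1)
    )
    return rows_ok and cols_ok


box = (1, 1, 2, 2)
tight_mask = [[0, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0]]
shrunk_mask = [[0, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]]
tight_ok = satisfies_tightness(tight_mask, box)
shrunk_ok = satisfies_tightness(shrunk_mask, box)
```

The second mask has shrunk away from the bottom and right edges of the box, so one row and one column contain no foreground and the check fails.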
摘要：本文提出了一种弱监督基于从盒标注的数种全局约束的学习分割。特别是，我们之前的深度学习环境通过在网络上输出施加一组约束利用一个经典的密封性。过度收缩如此强大的拓扑之前防止溶液通过强制执行边界框内的任何水平或垂直线来含有，至少，前景区域的一个像素。此外，我们整合我们深密封性之前具有全球背景的空虚约束，指导与边框外界信息的培训。我们通过实验证明，这样一个全球性的约束是比标准交叉熵为背景类更强大。我们的优化问题是一个挑战，因为它需要一大套深网络的输出不等式约束的形式。我们的具有基于最近的强大的扩展对数障碍法，在内部点方法的背景下这是众所周知的不受约束的损失顺序解决。这样可以适应用于训练深网络，同时避免计算上昂贵且不稳定的拉格朗日双步骤和突起标准随机梯度下降（SGD）。在两个不同的公共数据集和应用（前列腺癌和脑损伤）大量的实验证明，我们的全球密封性和空虚先验之间的协同作用产生非常有竞争力的性能，接近满监督显著跑赢DeepCut。此外，我们的方法消除了对昂贵的计算方案生成的需要。我们的代码是匿名共享。
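
Illustrative sketch (not the authors' code): the tightness prior and the emptiness constraint described in the abstract can be checked on a binary prediction as shown below. The function names, the list-of-lists mask format, and the inclusive `(top, left, bottom, right)` box convention are assumptions made for this example. The paper enforces the constraints during training through a log-barrier extension, which is not reproduced here.

```python
def tightness_violations(mask, box):
    """Count rows and columns crossing the box that contain no foreground pixel.

    mask: 2-D list of 0/1 predictions (hypothetical format).
    box:  (top, left, bottom, right), all inclusive (hypothetical convention).
    """
    top, left, bottom, right = box
    empty_rows = sum(
        1 for r in range(top, bottom + 1)
        if not any(mask[r][c] for c in range(left, right + 1))
    )
    empty_cols = sum(
        1 for c in range(left, right + 1)
        if not any(mask[r][c] for r in range(top, bottom + 1))
    )
    return empty_rows + empty_cols


def outside_foreground(mask, box):
    """Count foreground pixels outside the box; the emptiness constraint wants 0."""
    top, left, bottom, right = box
    return sum(
        v
        for r, row in enumerate(mask)
        for c, v in enumerate(row)
        if not (top <= r <= bottom and left <= c <= right)
    )
```

A prediction that fills the box yields zero violations, while a prediction that has shrunk to a single pixel leaves some rows and columns of the box empty and is penalised.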

19. RoboTHOR: An Open Simulation-to-Real Embodied AI Platform [PDF] 返回目录
Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi
Abstract: Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undeniably played a prevailing role in the evolution of modern computer vision. We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems. Recently, various synthetic environments have been introduced to facilitate research in embodied AI. Notwithstanding this progress, the crucial question of how well models trained in simulation generalize to reality has remained largely unanswered. The creation of a comparable ecosystem for simulation-to-real embodied AI presents many challenges: (1) the inherently interactive nature of the problem, (2) the need for tight alignments between real and simulated worlds, (3) the difficulty of replicating physical conditions for repeatable experiments, and (4) the associated cost. In this paper, we introduce RoboTHOR to democratize research in interactive and embodied visual AI. RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world. As a first benchmark, our experiments show that there is a significant gap in the performance of models trained in simulation between testing in simulation and testing in the carefully constructed physical analogs. We hope that RoboTHOR will spur the next stage of evolution in embodied computer vision. RoboTHOR can be accessed at the following link: this https URL
摘要：视觉识别的生态系统（例如ImageNet，帕斯卡，COCO）在现代计算机视觉的发展不可否认地发挥了普遍的作用。我们认为，互动性和视觉体现了AI之前，这些生态系统的到来达到类似于视觉识别发展的阶段。近来，各种合成环境已经出台，以促进体现人工智能研究。尽管有此进步，如何模型模拟推广到现实训练的关键问题仍然悬而未决大部分。可比的生态系统进行仿真到真实的创作体现AI提出了许多挑战：（1）问题的内在互动性，（2）需要真实和模拟世界之间的紧密对齐，（3）复制的难度可重复的实验中，（4）和相关联的成本的物理条件。在本文中，我们介绍RoboTHOR在交互和视觉体现AI民主化研究。 RoboTHOR提供带物理同行配对系统地探索和克服模拟到真实转移的挑战模拟环境的框架，一个平台，让世界各地的研究人员可以远程测试他们的模型体现在物理世界。作为第一个标杆，我们的实验证明存在的模拟训练的时候，他们在两个模拟和他们精心构建的物理类似物被测试车型的性能之间的差距显著。我们希望RoboTHOR将推动进化的下一阶段中体现的计算机视觉。此HTTPS URL：RoboTHOR可以通过以下链接进行访问

20. Cascaded Structure Tensor Framework for Robust Identification of Heavily Occluded Baggage Items from X-ray Scans [PDF] 返回目录
Taimur Hassan, Samet Akcay, Mohammed Bennamoun, Salman Khan, Naoufel Werghi
Abstract: In the last two decades, baggage scanning has globally become one of the prime aviation security concerns. Manual screening of baggage items is tedious, error-prone, and compromises privacy. Hence, many researchers have developed X-ray imagery-based autonomous systems to address these shortcomings. This paper presents a cascaded structure tensor framework that can automatically extract and recognize suspicious items in heavily occluded and cluttered baggage. The proposed framework is unique, as it intelligently extracts each object by iteratively picking contour-based transitional information from different orientations and uses only a single feed-forward convolutional neural network for the recognition. The proposed framework has been rigorously evaluated using a total of 1,067,381 X-ray scans from the publicly available GDXray and SIXray datasets, where it outperformed the state-of-the-art solutions by achieving a mean average precision score of 0.9343 on GDXray and 0.9595 on SIXray for recognizing highly cluttered and overlapping suspicious items. Furthermore, the proposed framework achieves 4.76% better run-time performance than the existing solutions based on publicly available object detectors.
摘要：在过去的二十年里，行李扫描已成为全球的主要航空安全问题之一。行李物品的人工筛选繁琐，容易出错，和妥协的隐私。因此，许多研究人员已经开发的X射线图像为基础的自治系统来克服这些缺点。本文提出了一种级联结构张量的框架，可以自动提取和识别严重堵塞和混乱的行李可疑物品。所提出的架构是独特的，因为它智能地通过迭代地从拾取只不同的方向和用途识别单前馈卷积神经网络基于轮廓的过渡信息来提取每个对象。拟议的框架已经总共使用了从公开GDXray和SIXray数据集1067381 X射线扫描得到严格的评估它通过实现对GDXray均值平均精确度得分为0.9343和0.9595的SIXray跑赢国家的最先进的解决方案用于识别高度混乱和重叠的可疑物品。此外，相比于根据公开的对象检测器的现有解决方案提议的框架计算达到4.76 \％优越的运行时性能

21. Line Art Correlation Matching Network for Automatic Animation Colorization [PDF] 返回目录
Zhang Qian, Wang Bo, Wen Wei, Li Hai, Liu Jun Hui
Abstract: Automatic animation line art colorization is a challenging computer vision problem, since line art is highly sparse and abstracted information and there is a strict requirement for color and style consistency between frames. Recently, many GAN (Generative Adversarial Network) based image-to-image transfer methods for single line art colorization have emerged. They can generate perceptually appealing results conditioned on line art. However, these methods cannot be adapted to the task of animation colorization because they do not consider in-between frame consistency. Existing methods simply input the previous colored frame as a reference to color the next line art, which misleads the colorization due to the spatial misalignment between the previous colored frame and the next line art, especially at positions where apparent changes happen. To address these challenges, we design a matching model called CM (correlation matching) to align the colored reference in a learnable way, and integrate the model into a U-Net structure generator in a coarse-to-fine manner. Extensive evaluations show that the CM model can effectively improve in-between consistency and generation quality, especially when the motion is intense and diverse.
摘要：自动动画艺术线条彩色化是一个具有挑战性的计算机视觉问题，因为线条艺术是一个非常稀疏，抽象信息和存在的帧之间的颜色和风格的一致性严格的要求。最近，很多GAN（剖成对抗性网络）基于图像到图像的单个线路技术着色的转移方法的已经出现。他们可以产生感知吸引人结果为条件的线条艺术。然而，这些方法不能采用动画着色的任务，因为缺乏考虑框架的一致性之间的-的。由于以前的彩色帧的空间偏移并且在其中明显变化发生的位置的下一行领域尤其现有的方法简单地输入前一个彩色帧作为参考着色下一行技术，这将误导彩色化。为了应对这些挑战，我们设计了一种被称为CM（共rrelation匹配）匹配模型的对齐在可学习方式的彩色参考和模型整合到一个由粗到细地的U型网状结构产生。扩展评价表明，CM模型可有效地提高在它们之间的一致性和expecially产生质量当运动激烈多样。

22. On Box-Cox Transformation for Image Normality and Pattern Classification [PDF] 返回目录
Abbas Cheddad
Abstract: A unique member of the power transformation family is known as the Box-Cox transformation. It can be seen as a mathematical operation that finds the optimum lambda ($\lambda$) value maximizing the log-likelihood function, in order to transform data to a normal distribution and to reduce heteroscedasticity. In data analytics, a normality assumption underlies a variety of statistical test models. This technique, however, is best known in statistical analysis for handling one-dimensional data. This paper examines the utility of such a tool as a pre-processing step to transform two-dimensional data, namely digital images, and studies its effect. Moreover, to reduce time complexity, it suffices to estimate the parameter lambda in real time for large two-dimensional matrices by merely considering their probability density function as a statistical inference of the underlying data distribution. We compare the effect of this light-weight Box-Cox transformation with well-established state-of-the-art low-light image enhancement techniques. We also demonstrate the effectiveness of our approach on several test-bed data sets, both for generic improvement of the visual appearance of images and for improving the performance of a colour pattern classification algorithm as an example application. Results with and without the proposed approach are compared using state-of-the-art transfer/deep learning methods, which are discussed in the Appendix. To the best of our knowledge, this is the first time that the Box-Cox transformation is extended to digital images by exploiting histogram transformation.
摘要：变电家族的唯一成员是被称为Box-Cox转换。后者可以被看作是一个数学运算，导致发现最佳拉姆达（{\拉姆达}）值最大化对数似然函数的数据变换到正常分布，并减少异方差。在数据分析，假设其正态分布underlies各种统计测试模型。这种技术，但是，在统计分析中最有名的要处理一维数据。这里，围绕这种工具的作为预处理步骤的效用本文绕转变换的二维数据，即，数字图像，并研究其效果。此外，为了减少时间复杂度，它足以通过仅仅考虑到它们的概率密度函数作为基础数据分布的统计推断来估计实时为大的二维矩阵的参数拉姆达。我们比较这重量轻Box-Cox变换的效果与成熟国家的最先进的低光图像增强技术。我们还通过几个试验台数据集图像的视觉外观仿制改进和改善色彩模式分类算法的性能作为一个应用实例证明了该方法的有效性。结果具有和不具有所提出的方法，使用的是哪个都在附录中所讨论的国家的本领域转移/深度学习比较。据我们所知，这是第一次，Box-Cox转换是通过利用直方图转型扩展到数字图像。
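
Illustrative sketch (not the paper's histogram-based estimator): the standard Box-Cox transform and a grid search for the lambda that maximizes the normal profile log-likelihood can be written as follows. The function names and the grid are assumptions made for this example. Inputs must be strictly positive, so pixel intensities would need an offset such as +1 before use.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def log_likelihood(values, lam):
    """Profile log-likelihood of lambda under a normal model (up to a constant)."""
    y = box_cox(values, lam)
    n = len(y)
    mean = sum(y) / n
    var = sum((t - mean) ** 2 for t in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values, grid):
    """Return the grid value with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: log_likelihood(values, lam))
```

For data whose logarithm is symmetric around zero, the search over a grid containing 0 selects the log transform (lambda = 0), which matches the intuition that such data are made symmetric by taking logarithms.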

23. BabyAI++: Towards Grounded-Language Learning beyond Memorization [PDF] 返回目录
Tianshi Cao, Jingkang Wang, Yining Zhang, Sivabalan Manivasagam
Abstract: Despite success in many real-world tasks (e.g., robotics), reinforcement learning (RL) agents still learn from tabula rasa when facing new and dynamic scenarios. By contrast, humans can offload this burden through textual descriptions. Although recent works have shown the benefits of instructive texts in goal-conditioned RL, few have studied whether descriptive texts help agents to generalize across dynamic environments. To promote research in this direction, we introduce a new platform, BabyAI++, to generate various dynamic environments along with corresponding descriptive texts. Moreover, we benchmark several baselines inherited from the instruction following setting and develop a novel approach towards visually-grounded language learning on our platform. Extensive experiments show strong evidence that using descriptive texts improves the generalization of RL agents across environments with varied dynamics.
摘要：尽管在许多现实世界的任务（例如，机器人），强化学习（RL）面临着新的和动态的场景时，代理人仍白板学习的成功。相比之下，人类可以通过卸载文字描述这个包袱。虽然最近的工作表明，在目标空调RL指导性文本的好处，很少有研究描述文本帮助代理商是否要在动态环境一概而论。为了促进研究在这个方向上，我们引入一个新的平台，BabyAI ++，与相应的描述文本以及产生各种动态的环境。此外，我们的基准数的基线从以下设置指令继承和发展对我们的平台上视觉接地语言学习的新方法。大量的实验证明了强有力的证据，使用说明文字提高RL剂的不同环境中的推广与变化动态。

24. Residual-driven Fuzzy C-Means Clustering for Image Segmentation [PDF] 返回目录
Cong Wang, Witold Pedrycz, ZhiWu Li, MengChu Zhou
Abstract: Due to its inferior characteristics, the direct use of an observed (noisy) image gives rise to poor segmentation results. Intuitively, using its noise-free counterpart can favorably impact image segmentation. Hence, the accurate estimation of the residual between observed and noise-free images is an important task. To do so, we elaborate on residual-driven Fuzzy C-Means (FCM) for image segmentation, which is the first approach that realizes accurate residual estimation and lets the noise-free image participate in clustering. We propose a residual-driven FCM framework by integrating into FCM a residual-related fidelity term derived from the distribution of different types of noise. Built on this framework, we present a weighted $\ell_{2}$-norm fidelity term by weighting the mixed noise distribution, thus resulting in a universal residual-driven FCM algorithm in the presence of mixed or unknown noise. In addition, with the constraint of spatial information, the residual estimation becomes more reliable than one that considers only the observed image itself. Supporting experiments on synthetic, medical, and real-world images are conducted. The results demonstrate the superior effectiveness and efficiency of the proposed algorithm over existing FCM-related algorithms.
摘要：由于其特性较差，观察到的（吵）图像的直接利用产生不良的分割结果。直观地说，利用其无噪声图像可以有利地影响图像分割。因此，观察到的和无噪声的图像之间的残差的准确的估计是一项重要任务。要做到这一点，我们要说明的剩余驱动的模糊\ EMPH {C} -Means（FCM）的图像分割，这是实现精确的剩余估计和引线无噪声图像参与集群的第一种方法。我们通过集成到FCM从不同类型的噪声的分布而残余的相关保真项建议的剩余驱动FCM框架。通过加权混合噪声分布，从而导致在混合的或未知的噪声的存在的通用残余驱动FCM算法规范保真项 - 建立在该框架中，我们提出了一个加权$ \ ell_ {2} $。此外，随着空间信息的约束，残留估计变得比仅考虑的观察图像本身更可靠。支持上合成，医疗和真实世界的影像实验进行。结果证明，该算法在现有FCM相关算法的优异效果和效率。
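
Illustrative sketch (plain FCM only, not the residual-driven variant proposed in the paper): the classical Fuzzy C-Means iteration that the method builds on alternates between a weighted-mean centre update and a distance-based membership update. The function name, the one-dimensional input, and the random initialisation are assumptions made for this example.

```python
import random


def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Classical FCM on 1-D data; returns (centers, memberships)."""
    rng = random.Random(seed)
    n = len(values)
    # Random initial memberships, normalised so each point's memberships sum to 1.
    u = [[rng.random() + 1e-6 for _ in range(n)] for _ in range(n_clusters)]
    for j in range(n):
        total = sum(u[i][j] for i in range(n_clusters))
        for i in range(n_clusters):
            u[i][j] /= total
    centers = [0.0] * n_clusters
    for _ in range(n_iter):
        # Centre update: weighted mean of the data with weights u^m.
        for i in range(n_clusters):
            w = [u[i][j] ** m for j in range(n)]
            centers[i] = sum(wj * x for wj, x in zip(w, values)) / sum(w)
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        for j, x in enumerate(values):
            inv = [max(abs(x - c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            for i in range(n_clusters):
                u[i][j] = inv[i] / total
    return centers, u
```

On noise-free, well-separated intensities the centres settle on the two groups; the abstract's point is that noisy observations degrade this behaviour, which motivates estimating the residual.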

25. Image Segmentation Using Hybrid Representations [PDF] 返回目录
Alakh Desai, Ruchi Chauhan, Jayanthi Sivaswamy
Abstract: This work explores a hybrid approach to segmentation as an alternative to a purely data-driven approach. We introduce an end-to-end U-Net-based network called DU-Net, which uses additional frequency-preserving features, namely the Scattering Coefficients (SC), for medical image segmentation. SC are translation invariant and Lipschitz continuous to deformations, which helps DU-Net outperform other conventional CNN counterparts on four datasets and two segmentation tasks: Optic Disc and Optic Cup in color fundus images and fetal Head in ultrasound images. The proposed method shows remarkable improvement over the basic U-Net, with performance competitive with state-of-the-art methods. The results indicate that it is possible to use a lighter network trained with fewer images (without any augmentation) to attain good segmentation results.
摘要：本工作探讨作为一种替代的混合方法来分割，以纯粹的数据驱动的方法。我们引入端至端的U净基于网络称为DU-网，其使用附加的频率保持特征，即散射系数（SC），用于医学图像分割。 SC是平移不变性与李氏持续到帮助DU-Net的性能超过四个数据集和两个分割任务等常规CNN同行变形：视盘和视杯彩色眼底图像和胎头超声图像。该方法显示了显着的改善了基本的掌中与性能国家的最先进的方法，有竞争力的。结果表明，它是可以使用具有较少的图像（没有任何增强）培养了打火机网络获得良好的分割结果。

26. Fully Automated Myocardial Strain Estimation from CMR Tagged Images using a Deep Learning Framework in the UK Biobank [PDF] 返回目录
Edward Ferdian, Avan Suinesiaputra, Kenneth Fung, Nay Aung, Elena Lukaschuk, Ahmet Barutcu, Edd Maclean, Jose Paiva, Stefan K. Piechnik, Stefan Neubauer, Steffen E Petersen, Alistair A. Young
Abstract: Purpose: To demonstrate the feasibility and performance of a fully automated deep learning framework to estimate myocardial strain from short-axis cardiac magnetic resonance tagged images. Methods and Materials: In this retrospective cross-sectional study, 4508 cases from the UK Biobank were split randomly into 3244 training cases, 812 validation cases, and 452 test cases. Ground truth myocardial landmarks were defined and tracked by manual initialization and correction of deformable image registration using previously validated software with five readers. The fully automatic framework consisted of 1) a convolutional neural network (CNN) for localization, and 2) a combination of a recurrent neural network (RNN) and a CNN to detect and track the myocardial landmarks through the image sequence for each slice. Radial and circumferential strains were then calculated from the motion of the landmarks and averaged on a slice basis. Results: Within the test set, myocardial end-systolic circumferential Green strain errors were -0.001 +/- 0.025, -0.001 +/- 0.021, and 0.004 +/- 0.035 in basal, mid, and apical slices respectively (mean +/- std. dev. of differences between predicted and manual strain). The framework reproduced significant reductions in circumferential strain in diabetics, hypertensives, and participants with previous heart attack. Typical processing time was ~260 frames (~13 slices) per second on an NVIDIA Tesla K40 with 12GB RAM, compared with 6-8 minutes per slice for the manual analysis. Conclusions: The fully automated RNNCNN framework for analysis of myocardial strain enabled unbiased strain evaluation in a high-throughput workflow, with similar ability to distinguish impairment due to diabetes, hypertension, and previous heart attack.
摘要：目的：为了证明一个完全自动化的深度学习架构的可行性和性能，以从估计心肌应变短轴心脏磁共振标记图像。方法和材料：在该回顾性横断面研究，从英国生物银行4508案件随机分成训练3244和812的验证的情况下，和452的测试用例。地面实况心肌标志被定义和使用先前验证的软件有五个读者追踪由变形图像配准的手动初始化和纠正。全自动框架包括了用于定位1）的卷积神经网络（CNN），和2）回归神经网络（RNN）和CNN的组合来检测，并通过针对每个切片图像序列追踪心肌的地标。径向和周向应变然后从标志的运动计算出的并平均切片的基础上。结果：在测试集合，心肌收缩末期周格林应变误差分别为-0.001 0.025 +/-，-0.001 0.021 +/-，以及在基底，中部0.004 +/- 0.035和心尖切片（均值+ / STD的预测和手动应变之间的差异。dev的）。该框架转载糖尿病患者，高血压患者圆周应变显著降低，与以前的心脏发作的参与者。典型的处理时间为每秒〜260帧（13〜切片）上的NVIDIA特斯拉K40与12GB的RAM，每个切片6-8分钟为手动分析比较。结论：心肌劳损的分析完全自动化的RNNCNN框架启用公正的应变评价高通量的工作流程，以区分功能障碍是由于糖尿病，高血压类似的能力，和以前的心脏发作。
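
Illustrative sketch (not the authors' pipeline): a circumferential Green strain can be computed from tracked landmarks with the one-dimensional definition E = 0.5 * ((L / L0)^2 - 1), where L0 and L are contour lengths at the reference and current frames. Measuring the length of the whole closed landmark contour, and the function name itself, are assumptions made for this example; the abstract does not specify how segment lengths are taken.

```python
import math


def circumferential_green_strain(ref_points, cur_points):
    """Green strain E = 0.5 * ((L / L0)**2 - 1) for a closed landmark contour.

    ref_points, cur_points: lists of (x, y) landmarks in the same order
    (hypothetical input format).
    """
    def perimeter(points):
        return sum(
            math.dist(points[k], points[(k + 1) % len(points)])
            for k in range(len(points))
        )

    stretch = perimeter(cur_points) / perimeter(ref_points)
    return 0.5 * (stretch ** 2 - 1.0)
```

A contour that shortens by 10% between end-diastole and end-systole gives a strain of 0.5 * (0.81 - 1) = -0.095; negative values indicate circumferential shortening.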

27. JCS: An Explainable COVID-19 Diagnosis System by Joint Classification and Segmentation [PDF] 返回目录
Yu-Huan Wu, Shang-Hua Gao, Jie Mei, Jun Xu, Deng-Ping Fan, Chao-Wei Zhao, Ming-Ming Cheng
Abstract: Recently, the novel coronavirus 2019 (COVID-19) has caused a pandemic in over 200 countries, affecting billions of people. To control the infection, the first and key step is to identify and separate the infected people. But due to the lack of Reverse Transcription Polymerase Chain Reaction (RT-PCR) tests, it is essential to discover suspected COVID-19 patients via CT scan analysis by radiologists. However, CT scan analysis is usually time-consuming, requiring at least 15 minutes per case. In this paper, we develop a novel Joint Classification and Segmentation (JCS) system to perform real-time and explainable COVID-19 diagnosis. To train our JCS system, we construct a large-scale COVID-19 Classification and Segmentation (COVID-CS) dataset, with 144,167 CT images of 400 COVID-19 patients and 350 uninfected cases. 3,855 CT images of 200 patients are annotated with fine-grained pixel-level labels, lesion counts, infected areas and locations, benefiting various diagnosis aspects. Extensive experiments demonstrate that the proposed JCS diagnosis system is very efficient for COVID-19 classification and segmentation. It obtains an average sensitivity of 95.0% and a specificity of 93.0% on the classification test set, and a 78.3% Dice score on the segmentation test set, of our COVID-CS dataset. The online demo of our JCS diagnosis system will be available soon.
摘要：近日，新型冠状病毒2019（COVID-19）已经引起大流行的疾病超过200个国家，数十亿影响人类。为了控制感染，第一和关键的步骤是识别和区分感染者。但由于缺乏反转录聚合酶链反应（RT-PCR）测试，有必要探索放射科医生通过CT扫描分析COVID，19例疑似。然而，CT扫描分析通常是费时的，需要每情况下至少15分钟。在本文中，我们开发了一种新的联合分类和分割（JCS）系统执行实时和可解释COVID-19的诊断。要培养我们的JCS系统，我们构建了一个大规模COVID-19分类和分割（COVID-CS）的数据集，400 COVID-19的病人144167幅CT图像和350感染病例。 200名患者3855个CT图像标注有细粒像素级别的标签，损害数，疫区和位置，有利于各种诊断方面。大量的实验表明，所提出JCS诊断系统是非常有效的用于COVID-19分类和分割。它获得了COVID-CS数据集的95.0％的分割测试组的平均灵敏度和93.0％的分类测试集的特异性和78.3％骰子得分。我们的JCS诊断系统的在线演示也将很快面市。
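
Illustrative sketch: the three metrics quoted in the abstract follow their standard definitions, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and Dice = 2|A ∩ B| / (|A| + |B|). The function names and the flat 0/1 list inputs are assumptions made for this example and do not reflect the authors' evaluation code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels given as 0/1 lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp)


def dice_score(mask_a, mask_b):
    """Dice overlap between two flattened binary masks (0/1 lists)."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    size = sum(mask_a) + sum(mask_b)
    # Two empty masks are treated as a perfect match (a convention chosen here).
    return 2.0 * intersection / size if size else 1.0
```

Sensitivity and specificity are computed per case on the classification labels, whereas Dice is computed per pixel on the segmentation masks.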

28. Extending Unsupervised Neural Image Compression With Supervised Multitask Learning [PDF] 返回目录
David Tellez, Diederik Hoppener, Cornelis Verhoef, Dirk Grunhagen, Pieter Nierop, Michal Drozdzal, Jeroen van der Laak, Francesco Ciompi
Abstract: We focus on the problem of training convolutional neural networks on gigapixel histopathology images to predict image-level targets. For this purpose, we extend Neural Image Compression (NIC), an image compression framework that reduces the dimensionality of these images using an encoder network trained in an unsupervised manner. We propose to train this encoder using supervised multitask learning (MTL) instead. We applied the proposed MTL NIC to two histopathology datasets and three tasks. First, we obtained state-of-the-art results in the Tumor Proliferation Assessment Challenge of 2016 (TUPAC16). Second, we successfully classified histopathological growth patterns in images with colorectal liver metastasis (CLM). Third, we predicted patient risk of death by learning directly from overall survival in the same CLM data. Our experimental results suggest that the representations learned by the MTL objective are: (1) highly specific, due to the supervised training signal, and (2) transferable, since the same features perform well across different tasks. Additionally, we trained multiple encoders with different training objectives, e.g. unsupervised training and variants of MTL, and observed a positive correlation between the number of tasks in MTL and the system performance on the TUPAC16 dataset.
摘要：我们专注于对千兆像素图像病理训练卷积神经网络预测图像层次目标的问题。为了这个目的，我们扩展神经图像压缩（NIC），图像压缩框架，降低了使用unsupervisedly训练的编码器网络这些图像的维数。我们建议使用培训监督多任务学习（MTL），而不是这个编码器。我们应用所提出的MTL NIC两个病理数据集和三个任务。首先，我们在肿瘤2016（TUPAC16）的增殖评估挑战获得国家的先进成果。其次，我们成功地与大肠癌肝转移（CLM）的图像分类组织病理学的增长模式。第三，我们在同一CLM数据直接从总生存期预测学习死亡的病人的风险。我们的实验结果表明，由MTL目标学到的表示是：（1）具有高度特异性，由于指导训练信号，和（2）转让的，因为同样的功能在不同的任务表现良好。此外，我们训练有素的多个编码器具有不同的培养目标，例如MTL的无监督和变体，并观察到在MTL任务的数量和在数据集TUPAC16系统性能之间的正相关。

29. 4DFlowNet: Super-Resolution 4D Flow MRI using Deep Learning and Computational Fluid Dynamics [PDF] 返回目录
Edward Ferdian, Avan Suinesiaputra, David Dubowitz, Debbie Zhao, Alan Wang, Brett Cowan, Alistair Young
Abstract: 4D-flow magnetic resonance imaging (MRI) is an emerging imaging technique where spatiotemporal 3D blood velocity can be captured with full volumetric coverage in a single non-invasive examination. This enables qualitative and quantitative analysis of hemodynamic flow parameters of the heart and great vessels. An increase in the image resolution would provide more accuracy and allow better assessment of the blood flow, especially for patients with abnormal flows. However, this must be balanced against increasing imaging time. The recent success of deep learning in generating super-resolution images shows promise for implementation in medical images. We utilized computational fluid dynamics simulations to generate fluid flow simulations and represent them as synthetic 4D flow MRI data. We built our training dataset to mimic actual 4D flow MRI data with its corresponding noise distribution. Our novel 4DFlowNet network was trained on this synthetic 4D flow data and was capable of producing noise-free super-resolution 4D flow phase images with an upsampling factor of 2. We also tested 4DFlowNet on actual 4D flow MR images of a phantom and of normal volunteers, and demonstrated results comparable with the actual flow rate measurements, giving an absolute relative error of 0.6 to 5.8% and 1.1 to 3.8% in the phantom data and normal volunteer data, respectively.
摘要：4D-流动磁共振成像（MRI）是一种新兴的成像技术，其中时空三维血流速度可以在一个单一的无创伤性检查全体积覆盖被捕获。这使心脏和大血管血流动力学流参数的定性和定量分析。在图像分辨率的增加会提供更多的准确性，让血液流动的更好的评估，尤其是对患者的异常流动。然而，这必须随着成像时间进行平衡。深度学习中产生的超高分辨率图像显示了最近成功承诺在医学图像实现。我们利用计算流体动力学模拟，以产生流体流动模拟和它们表示为合成4D流MRI数据。我们建立了我们的训练数据集与它对应的噪声分布模拟实际4D流MRI数据。我们的新颖4DFlowNet网络进行训练有关此合成4D流数据，并能够在具有2上采样因子产生无噪声超分辨率4D流相位图像我们还测试了在4DFlowNet的假想的实际4D流动的MR图像和正常志愿者数据，并表现出与所述实际流率测量值分别给予0.6的绝对相对误差至5.8％及1.1至3.8％，在幻象数据和正常志愿者的数据，比较的结果。

30. MXR-U-Nets for Real Time Hyperspectral Reconstruction [PDF] 返回目录
Atmadeep Banerjee, Akash Palrecha
Abstract: In recent times, CNNs have made significant contributions to applications in image generation, super-resolution and style transfer. In this paper, we build upon the work of Howard and Gugger, He et al. and Misra, D. and propose a CNN architecture that accurately reconstructs hyperspectral images from their RGB counterparts. We also propose a much shallower version of our best model with a 10% relative memory footprint and 3x faster inference, thus enabling real-time video applications while still experiencing only about a 0.5% decrease in performance.
摘要：近来，细胞神经网络做出了在图像生成，超分辨率和风格的传输应用显著的贡献。在本文中，我们建立在霍华德和Gugger，他等人的作品。和米斯拉，D.，提出了一种CNN架构，准确地重建来自他们的RGB对应的高光谱图像。我们也建议我们最好的模型的浅得多版采用了10％的相对内存占用和更快的3倍推理，从而实现实时视频应用程序，同时仍只经历有关的性能下降0.5％。

31. ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos [PDF] 返回目录
Guillaume Vaudaux-Ruth, Adrien Chan-Hon-Tong, Catherine Achard
Abstract: Summarizing video content is an important task in many applications. This task can be defined as the computation of the ordered list of actions present in a video. Such a list could be extracted using action detection algorithms. However, it is not necessary to determine the temporal boundaries of actions to know that they exist. Moreover, localizing precise boundaries usually requires dense video analysis to be effective. In this work, we propose to directly compute this ordered list by sparsely browsing the video and selecting one frame per action instance, a task known as action spotting in the literature. To do this, we propose ActionSpotter, a spotting algorithm that takes advantage of Deep Reinforcement Learning to efficiently spot actions while adapting its video browsing speed, without additional supervision. Experiments performed on the THUMOS14 and ActivityNet datasets show that our framework outperforms state-of-the-art detection methods. In particular, the spotting mean Average Precision on THUMOS14 is significantly improved from 59.7% to 65.6% while skipping 23% of the video.
摘要：总结视频内容在许多应用中的一项重要任务。作为行动的有序列表的计算呈现在视频这个任务可以被定义。这样的列表可以用动作检测算法来提取。然而，没有必要确定行动的时间界限，知道他们的存在。此外，定位精确边界通常需要密集的视频分析是有效的。在这项工作中，我们建议由稀疏浏览视频并选择每个操作实例一帧，任务被称为文学行动斑点直接计算此有序列表。要做到这一点，我们提出ActionSpotter，一个斑点的算法，利用深强化学习的有效当场行动，同时调整其视频的浏览速度，无需额外的监管。实验数据集上进行THUMOS14和ActivityNet表明，艺术的检测方法我们的框架性能优于状态。特别是，THUMOS14的点滴平均平均精度显著的59.7％，而跳过视频的23％提高到65.6％。

32. Roommate Compatibility Detection Through Machine Learning Techniques [PDF] 返回目录
Mansha Lamba, Raunak Goswami, Mr. Vinay, Mohit Lamba
Abstract: Our objective is to develop an artificially intelligent system that checks the compatibility between roommates of the same or different sex sharing a common area of residence. A few key factors determine one person's compatibility with another: interpersonal behaviour, situational awareness, and communication skills. We are trying to build a system that evaluates users on these key factors not via a pen-and-paper test but through a highly engaging set of questions and answers. These scores are then used as input to our machine learning algorithm, which is based on previous trends, to produce the percentage probability of a user being compatible with another user. With a growing population, it is always a challenge for organisations and educational institutions to make their students and employees more productive, and in such cases a person's social environment comes into play. A person may be a genius, but as long as he is not able to work well with his peers there will always be room for more productive performance. It is a well-established fact that humans are and have always been social animals, and this has helped in creating communities of like-minded people. Many times, even when a large number of people are employed to do a particular task, the result may not be as expected because people may not be compatible in working with one another. This ultimately creates performance gaps, hinders organisational success and in many cases causes a loss of precious resources. Our intent is not to remove the non-compatible people from the picture but to find a compatible match for each person elsewhere, which will not only save resources but also enable their effective use. We intend to do this through the use of various machine learning classification techniques.
摘要：我们的目标是开发一种人工智能系统，它的目的是检查相同或不同性别的共享居住的公共区域的室友之间的兼容性。有确定与其他人一个的兼容性的几个关键因素。人际行为，态势感知能力，沟通能力。在这里，我们试图建立一个系统，在不通过笔纸测试，但通过极具吸引力的一系列问题和回答这些关键因素评估板用户。因此，使用这些分数作为输入到我们的机器学习算法是基于先前的趋势，拿出的用户百分比概率是与另一个用户兼容。随着人口的不断增长总会有一款适合组织和教育机构，使学生和他们的员工更多，更高效，在这种情况下，一个人的社会环境发挥作用的一个挑战。一个人可能是个天才，但只要他是不能够与他的同龄人以及工作总是会有更多的生产性能的机会。这是一个公认的事实是人类是，一直都是社会动物，这有助于创造志同道合的人的社区。很多时候，即使有中采用做特定任务的结果可能不会像预期的那样的人可能不是彼此兼容工作的人的大型无。这在年底创建绩效差距，阻碍了企业的成功在许多情况下宝贵资源的流失。我们的目的并不是要移除图片中的不兼容的人，而是找出其他地方的人的完美兼容的比赛，不仅节约资源也将使资源的有效利用。通过使用各种机器学习分类技术，我们打算这样做。

33. Exploration of Indoor Environments Predicting the Layout of Partially Observed Rooms [PDF] 返回目录
Matteo Luperto, Luca Fochetta, Francesco Amigoni
Abstract: We consider exploration tasks in which an autonomous mobile robot incrementally builds maps of initially unknown indoor environments. In such tasks, the robot makes a sequence of decisions on where to move next that are usually based on knowledge about the observed parts of the environment. In this paper, we present an approach that exploits a prediction of the geometric structure of the unknown parts of an environment to improve exploration performance. In particular, we leverage an existing method that reconstructs the layout of an environment starting from a partial grid map and that predicts the shape of partially observed rooms on the basis of geometric features representing the regularities of the indoor environment. As an original contribution, we then employ the predicted layout to estimate the amount of new area the robot would observe from candidate locations, in order to inform the selection of the next best location and to stop the exploration early when no further relevant area is expected to be discovered. Experimental activities show that our approach is able to effectively predict the layout of partially observed rooms and to use such knowledge to speed up the exploration.
摘要：我们认为，其中自主移动机器人逐步建立最初不知道室内环境的地图探索任务。在这些任务中，机器人使得在哪里下次移动决定的顺序，通常是基于对环境的观察到的部分知识。在本文中，我们提出利用环境的未知部分的几何结构的预测，以提高勘探性能的方法。特别是，我们利用的是从重构的局部栅格地图开始的环境的布局的现有方法和预测的表示室内环境的规律几何特征的基础上，部分地观察到的房间的形状。然后，我们原来采用的预测布局估计，以通知的下一个最佳位置的选择和提前停止勘探新领域的机器人将从候选位置观察量时，预计没有进一步的相关区域被发现。实验活动表明，我们的做法是能够有效地预测的部分观察室的布局和利用这些知识来加快探索。

34. Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations [PDF] 返回目录
Yu-Syuan Xu, Shou-Yao Roy Tseng, Yu Tseng, Hsien-Kai Kuo, Yi-Min Tsai
Abstract: Deep Convolutional Neural Networks (CNNs) have achieved remarkable results on Single Image Super-Resolution (SISR). Rather than considering only a single degradation, recent studies also include multiple degrading effects to better reflect real-world cases. However, most of these works assume a fixed combination of degrading effects, or even train an individual network for different combinations. A more practical approach is to train a single network for wide-ranging and variational degradations. To fulfill this requirement, this paper proposes a unified network to accommodate the variations from inter-image (cross-image variations) and intra-image (spatial variations). Different from the existing works, we incorporate dynamic convolution, which is a far more flexible alternative for handling different variations. In SISR with a non-blind setting, our Unified Dynamic Convolutional Network for Variational Degradations (UDVD) is evaluated on both synthetic and real images with an extensive set of variations. The qualitative results demonstrate the effectiveness of UDVD over various existing works. Extensive experiments show that our UDVD achieves favorable or comparable performance on both synthetic and real images.
摘要：深卷积神经网络（细胞神经网络）对单幅图像超分辨率（SISR）取得了显着成效。尽管只考虑单一退化，最近的研究还包括更好地多个劣化效应反映真实世界的情况。然而，大部分工程承担的降级效应的固定组合，或者甚至训练的个体网络不同的组合。取而代之的是，一个更实际的方法是培养单个网络用于广泛而变退化。为了满足该要求，提出了一种统一的网络，以适应从帧间图像（交叉图像的变化）和图像内（空间变化）的变化。从现有的作品不同的是，我们将动态卷积这是一种更为灵活的选择来处理不同的变化。在SISR与非盲设置，我们的统一动态卷积网络的变分的降解实验（UDVD）是在两者的合成，并具有广泛集合变化的真实图像进行评价。定性结果显示UDVD超过现有各类工程的成效。大量的实验表明，我们的UDVD实现在人工和真实图像良好或相当的性能。

35. Light Weight Residual Dense Attention Net for Spectral Reconstruction from RGB Images [PDF] 返回目录
D.Sabari Nathan, K.Uma, D Synthiya Vinothini, B. Sathya Bama, S. M. Md Mansoor Roomi
Abstract: Hyperspectral Imaging is the acquisition of spectral and spatial information of a particular scene. Capturing such information with a specialized hyperspectral camera remains costly. Reconstructing such information from the RGB image offers a better solution for both classification and object recognition tasks. This work proposes a novel lightweight network with very few parameters (about 233,059), based on a residual dense model with an attention mechanism, to obtain this solution. This network uses a Coordination Convolutional Block to get the spatial information. The weights from this block are shared by two independent feature extraction mechanisms, one performing dense feature extraction and the other multiscale hierarchical feature extraction. Finally, the features from both feature extraction mechanisms are globally fused to produce the 31 spectral bands. The network is trained with the NTIRE 2020 challenge dataset and achieves an MRAE metric value of 0.0457 with less computational complexity.
摘要：高光谱成像是采集的特定场景的光谱和空间信息。捕获从专门的高光谱相机这样的信息仍然是昂贵的。从RGB图像重建等信息实现了分类和目标识别任务，更好的解决方案。这项工作提出了一种新的光网络重量具有非常少的约基于残余致密模型注意机制以获得该溶液233059个参数的参数号。此网络使用卷积协调座获得的空间信息。从该块中的权重是由两个独立的特征提取机制，一个由密集特征提取和另一个由多尺度分层特征提取共享。最后，由所述特征抽取机制两个功能进行全局稠合以产生31个的光谱带。该网络进行训练NTIRE 2020挑战数据集，从而达到0.0457以较少的计算复杂性MRAE度量值。

36. Analysis of Scoliosis From Spinal X-Ray Images [PDF] 返回目录
Abdullah-Al-Zubaer Imran, Chao Huang, Hui Tang, Wei Fan, Kenneth M.C. Cheung, Michael To, Zhen Qian, Demetri Terzopoulos
Abstract: Scoliosis is a congenital disease in which the spine is deformed from its normal shape. Measurement of scoliosis requires labeling and identification of vertebrae in the spine. Spine radiographs are the most cost-effective and accessible modality for imaging the spine. Reliable and accurate vertebrae segmentation in spine radiographs is crucial in image-guided spinal assessment, disease diagnosis, and treatment planning. Conventional assessments rely on tedious and time-consuming manual measurement, which is subject to inter-observer variability. A fully automatic method that can accurately identify and segment the associated vertebrae is unavailable in the literature. Leveraging a carefully-adjusted U-Net model with progressive side outputs, we propose an end-to-end segmentation model that provides a fully automatic and reliable segmentation of the vertebrae associated with scoliosis measurement. Our experimental results from a set of anterior-posterior spine X-Ray images indicate that our model, which achieves an average Dice score of 0.993, promises to be an effective tool in the identification and labeling of spinal vertebrae, eventually helping doctors in the reliable estimation of scoliosis. Moreover, estimation of Cobb angles from the segmented vertebrae further demonstrates the effectiveness of our model.
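The Dice score reported above measures the overlap between a predicted mask and a ground-truth mask. The following is a minimal sketch on small binary masks given as nested lists; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not the authors' evaluation code.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary 2D masks."""
    flat_pred = [bool(v) for row in pred_mask for v in row]
    flat_true = [bool(v) for row in true_mask for v in row]
    if len(flat_pred) != len(flat_true):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_true))
    total = sum(flat_pred) + sum(flat_true)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    print(dice_score(predicted, ground_truth))
```

A score of 1.0 indicates perfect overlap, so the 0.993 average quoted in the abstract corresponds to near-complete agreement with the reference segmentation.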

37. Effect of Input Noise Dimension in GANs
Manisha Padala, Debojit Das, Sujit Gujar
Abstract: Generative Adversarial Networks (GANs) are by far the most successful generative models. Learning the transformation that maps low-dimensional input noise to the data distribution forms the foundation of GANs. Although they have been applied in various domains, they are prone to challenges such as mode collapse and unstable training. To overcome these challenges, researchers have proposed novel loss functions, architectures, and optimization methods. In our work, unlike previous approaches, we focus on the input noise and its role in generation. We aim to study, quantitatively and qualitatively, the effect of the dimension of the input noise on the performance of GANs. For quantitative evaluation, Fréchet Inception Distance (FID) and Inception Score (IS) are typically used as performance measures on image data-sets. We compare the FID and IS values for DCGAN and WGAN-GP. We use three different image data-sets, each with a different level of complexity. Through our experiments, we show that the right dimension of input noise for optimal results depends on the data-set and architecture used. We also observe that state-of-the-art performance measures do not provide enough useful insight. Hence we conclude that further theoretical analysis is needed to understand the relationship between the low-dimensional distribution and the generated images. We also require better performance measures.

38. Mosaic Super-resolution via Sequential Feature Pyramid Networks
Mehrdad Shoeiby, Mohammad Ali Armin, Sadegh Aliakbarian, Saeed Anwar, Lars Petersson
Abstract: Advances in the design of multi-spectral cameras have led to great interest in a wide range of applications, from astronomy to autonomous driving. However, such cameras inherently suffer from a trade-off between spatial and spectral resolution. In this paper, we propose to address this limitation by introducing a novel method to carry out super-resolution on raw mosaic images, multi-spectral or RGB Bayer, captured by modern real-time single-shot mosaic sensors. To this end, we design a deep super-resolution architecture that benefits from a sequential feature pyramid along the depth of the network. This is achieved by using a convolutional LSTM (ConvLSTM) to learn the inter-dependencies between features at different receptive fields. Additionally, by investigating the effect of different attention mechanisms in our framework, we show that a ConvLSTM-inspired module is able to provide superior attention in our context. Our extensive experiments and analyses provide evidence that our approach yields significant super-resolution quality, outperforming current state-of-the-art mosaic super-resolution methods on both Bayer and multi-spectral images. Additionally, to the best of our knowledge, our method is the first specialized method to super-resolve mosaic images, whether multi-spectral or Bayer.

39. Melanoma Detection using Adversarial Training and Deep Transfer Learning
Hasib Zunair, A. Ben Hamza
Abstract: Skin lesion datasets consist predominantly of normal samples with only a small percentage of abnormal ones, giving rise to the class imbalance problem. Also, skin lesion images are largely similar in overall appearance owing to the low inter-class variability. In this paper, we propose a two-stage framework for automatic classification of skin lesion images using adversarial training and transfer learning toward melanoma detection. In the first stage, we leverage the inter-class variation of the data distribution for the task of conditional image synthesis by learning the inter-class mapping and synthesizing under-represented class samples from the over-represented ones using unpaired image-to-image translation. In the second stage, we train a deep convolutional neural network for skin lesion classification using the original training set combined with the newly synthesized under-represented class samples. The classifier is trained by minimizing the focal loss function, which helps the model learn from hard examples while down-weighting the easy ones. Experiments conducted on a dermatology image benchmark demonstrate the superiority of our proposed approach over several standard baseline methods, achieving significant performance improvements. Interestingly, we show through feature visualization and analysis that our method leads to context-based lesion assessment that can reach the level of an expert dermatologist.
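To illustrate how the focal loss mentioned above down-weights easy examples, here is a minimal sketch of its binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; the abstract does not state the settings the authors used.

```python
import math


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one binary prediction.

    prob  : predicted probability of the positive class
    label : ground-truth class, 0 or 1
    gamma : focusing parameter (assumed default); 0 removes the focusing term
    alpha : weight for the positive class (assumed default)
    """
    # p_t is the probability the model assigns to the true class.
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0 - eps)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    print("easy positive:", binary_focal_loss(0.9, 1))
    print("hard positive:", binary_focal_loss(0.1, 1))
```

With `gamma=0` the expression reduces to a class-weighted cross-entropy, which makes the effect of the focusing term easy to compare.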


[arXiv papers] Computer Vision and Pattern Recognition 2020-04-16
