Abstracts

1. Seismic Shot Gather Noise Localization Using a Multi-Scale Feature-Fusion-Based Neural Network
Antonio José G. Busson, Sérgio Colcher, Ruy Luiz Milidiú, Bruno Pereira Dias, André Bulcão
Abstract: Deep learning-based models, such as convolutional neural networks, have advanced various areas of computer vision. However, this technology is rarely applied to the seismic shot-gather noise localization problem. This letter investigates the effectiveness of a multi-scale feature-fusion-based network for seismic shot-gather noise localization. Herein, we describe the following: (1) the construction of a real-world dataset for seismic noise localization based on 6,500 seismograms; (2) a multi-scale feature-fusion-based detector that uses MobileNet combined with a Feature Pyramid Network as the backbone; and (3) the Single Shot MultiBox Detector for box classification/regression. Additionally, we propose the use of the Focal Loss function, which improves the detector's prediction accuracy. The proposed detector achieves an AP@0.5 of 78.67% in our empirical evaluation.
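The focal loss mentioned above down-weights easy examples so that training concentrates on hard ones. Below is a minimal, framework-free sketch of the standard binary form FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The function names are illustrative. The values α = 0.25 and γ = 2 are common defaults from the original focal loss paper, not necessarily the settings used for this detector.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one predicted positive-class probability p and a label y in {0, 1}.

    FL(p_t) = -alpha_t * (1 - p_t) ** gamma * log(p_t),
    where p_t = p for positives and 1 - p for negatives.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def binary_cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy, for comparison."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)


# An easy background box (confident, correct) vs. a hard noise box (missed).
easy = binary_focal_loss(0.05, 0)
hard = binary_focal_loss(0.10, 1)
print(f"easy: {easy:.6f} (BCE {binary_cross_entropy(0.05, 0):.6f})")
print(f"hard: {hard:.6f} (BCE {binary_cross_entropy(0.10, 1):.6f})")
```

With γ = 0 and α = 0.5, the expression reduces to half the ordinary cross-entropy. This makes the modulating factor (1 - p_t)^γ the only difference in how easy and hard examples are weighted.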

2. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation
Zhaohui Zheng, Ping Wang, Dongwei Ren, Wei Liu, Rongguang Ye, Qinghua Hu, Wangmeng Zuo
Abstract: Deep learning-based object detection and instance segmentation have achieved unprecedented progress. In this paper, we propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS), leading to notable gains in average precision (AP) and average recall (AR) without sacrificing inference efficiency. In particular, we consider three geometric factors, i.e., overlap area, normalized central point distance and aspect ratio, which are crucial for measuring bounding box regression in object detection and instance segmentation. The three geometric factors are then incorporated into CIoU loss to better distinguish difficult regression cases. Training deep models with CIoU loss yields consistent AP and AR improvements over the widely adopted $\ell_n$-norm loss and IoU-based loss. Furthermore, we propose Cluster-NMS, in which NMS during inference is performed by implicitly clustering detected boxes and usually requires fewer iterations. Cluster-NMS is very efficient owing to its pure GPU implementation, and geometric factors can be incorporated into it to improve both AP and AR. In the experiments, CIoU loss and Cluster-NMS have been applied to state-of-the-art instance segmentation models (e.g., YOLACT) and object detection models (e.g., YOLO v3, SSD and Faster R-CNN). Taking YOLACT on MS COCO as an example, our method achieves performance gains of +1.7 AP and +6.2 AR$_{100}$ for object detection, and +0.9 AP and +3.5 AR$_{100}$ for instance segmentation, running at 27.1 FPS on one NVIDIA GTX 1080Ti GPU. All the source code and trained models are available at this https URL
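The sketch below makes the three geometric factors concrete by computing the CIoU term for a single pair of boxes. It takes IoU (overlap area), subtracts the squared center distance normalized by the squared diagonal of the smallest enclosing box, and then subtracts an aspect-ratio penalty αv. It assumes boxes in (x1, y1, x2, y2) format with positive width and height. The function names are hypothetical, and the authors' released code remains the reference implementation.

```python
import math


def ciou(box_a, box_b):
    """Complete-IoU between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Factor 1: overlap area, expressed as IoU.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    iou = inter / (wa * ha + wb * hb - inter)

    # Factor 2: squared distance between centers, normalized by the squared
    # diagonal c^2 of the smallest box enclosing both boxes.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2

    # Factor 3: aspect-ratio consistency v, weighted by alpha = v / ((1 - IoU) + v).
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return iou - rho2 / c2 - alpha * v


def ciou_loss(pred, target):
    return 1.0 - ciou(pred, target)


# Two non-overlapping pairs: plain IoU loss is 1 for both, CIoU loss is not.
near = ciou_loss((0, 0, 1, 1), (2, 0, 3, 1))
far = ciou_loss((0, 0, 1, 1), (5, 0, 6, 1))
print(near, far)
```

The usage lines illustrate why the distance term helps. An IoU-based loss gives no gradient signal between two disjoint boxes, whereas the CIoU loss grows as the predicted box moves farther from the target.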

3. NH-HAZE: An Image Dehazing Benchmark with Non-Homogeneous Hazy and Haze-Free Images
Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte
Abstract: Image dehazing is an ill-posed problem that has been extensively studied in recent years. Objectively evaluating the performance of dehazing methods remains a major obstacle due to the lack of a reference dataset. While synthetic datasets have shown important limitations, the few realistic datasets introduced recently assume homogeneous haze over the entire scene. Since in many real cases haze is not uniformly distributed, we introduce NH-HAZE, a non-homogeneous realistic dataset with pairs of real hazy and corresponding haze-free images. This is the first non-homogeneous image dehazing dataset, and it contains 55 outdoor scenes. The non-homogeneous haze was introduced into the scenes using a professional haze generator that imitates the real conditions of hazy scenes. Additionally, this work presents an objective assessment of several state-of-the-art single image dehazing methods evaluated on the NH-HAZE dataset.

4. Heidelberg Colorectal Data Set for Surgical Data Science in the Sensor Operating Room
Lena Maier-Hein, Martin Wagner, Tobias Ross, Annika Reinke, Sebastian Bodenstedt, Peter M. Full, Hellena Hempe, Diana Mindroc-Filimon, Patrick Scholz, Thuy Nuong Tran, Pierangela Bruno, Anna Kisilenko, Benjamin Müller, Tornike Davitashvili, Manuela Capek, Minu Tizabi, Matthias Eisenmann, Tim J. Adler, Janek Gröhl, Melanie Schellenberg, Silvia Seidlitz, T. Y. Emmy Lai, Veith Roethlingshoefer, Fabian Both, Sebastian Bittel, Marc Mengler, Martin Apitz, Stefanie Speidel, Hannes G. Kenngott, Beat P. Müller-Stich
Abstract: Image-based tracking of medical instruments is an integral part of many surgical data science applications. Previous research has addressed the tasks of detecting, segmenting and tracking medical instruments based on laparoscopic video data. However, the proposed methods still tend to fail when applied to challenging images and do not generalize well to data they have not been trained on. This paper introduces the Heidelberg Colorectal (HeiCo) data set, the first publicly available data set enabling comprehensive benchmarking of medical instrument detection and segmentation algorithms, with a specific emphasis on the robustness and generalization capabilities of the methods. Our data set comprises 30 laparoscopic videos and corresponding sensor data from medical devices in the operating room for three different types of laparoscopic surgery. Annotations include surgical phase labels for all frames in the videos as well as instance-wise segmentation masks for surgical instruments in more than 10,000 individual frames. The data has successfully been used to organize international competitions within the scope of the Endoscopic Vision Challenges (EndoVis) 2017 and 2019.

5. Text Recognition in the Wild: A Survey
Xiaoxue Chen, Lianwen Jin, Yuanzhi Zhu, Canjie Luo, Tianwei Wang
Abstract: The history of text can be traced back thousands of years. Rich and precise semantic information carried by text is important in a wide range of vision-based application scenarios. Therefore, text recognition in natural scenes has been an active research field in computer vision and pattern recognition. In recent years, with the rise and development of deep learning, numerous methods have shown promise in terms of innovation, practicality, and efficiency. This paper aims to (1) summarize the fundamental problems and the state of the art associated with scene text recognition; (2) introduce new insights and ideas; (3) provide a comprehensive review of publicly available resources; and (4) point out directions for future work. In summary, this literature review attempts to present a complete picture of the field of scene text recognition. It provides a comprehensive reference for people entering this field and could help inspire future research. Related resources are available at our GitHub repository: this https URL.

6. NTIRE 2020 Challenge on NonHomogeneous Dehazing
Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, Radu Timofte, Jing Liu, Haiyan Wu, Yuan Xie, Yanyun Qu, Lizhuang Ma, Ziling Huang, Qili Deng, Ju-Chin Chao, Tsung-Shan Yang, Peng-Wen Chen, Po-Min Hsu, Tzu-Yi Liao, Chung-En Sun, Pei-Yuan Wu, Jeonghyeok Do, Jongmin Park, Munchurl Kim, Kareem Metwaly, Xuelu Li, Tiantong Guo, Vishal Monga, Mingzhao Yu, Venkateswararao Cherukuri, Shiue-Yuan Chuang, Tsung-Nan Lin, David Lee, Jerome Chang, Zhan-Han Wang, Yu-Bang Chang, Chang-Hong Lin, Yu Dong, Hongyu Zhou, Xiangzhen Kong, Sourya Dipta Das, Saikat Dutta, Xuan Zhao, Bing Ouyang, Dennis Estrada, Meiqi Wang, Tianqi Su, Siyi Chen, Bangyong Sun, Vincent Whannou de Dravo, Zhe Yu, Pratik Narang, Aryan Mehra, Navaneeth Raghunath, Murari Mandal
Abstract: This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy images). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze-free and nonhomogeneous hazy images recorded outdoors. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images. The nonhomogeneous haze was produced using a professional haze generator that imitates the real conditions of hazy scenes. In total, 168 participants registered for the challenge, and 27 teams competed in the final testing phase. The proposed solutions gauge the state of the art in image dehazing.

7. Kunster -- AR Art Video Maker -- Real time video neural style transfer on mobile devices
Wojciech Dudzik, Damian Kosowski
Abstract: Neural style transfer is a well-known branch of deep learning research, with many interesting works and two major drawbacks: most works in the field are hard for non-expert users to use, and they require substantial hardware resources. In this work, we present a solution to both of these problems. We have applied neural style transfer to real-time video (over 25 frames per second) in a form capable of running on mobile devices. We also investigate prior work on achieving temporal coherence and present the idea of fine-tuning already trained models to achieve stable video. Moreover, we analyze the impact of common deep neural network architectures on the performance of mobile devices with regard to the number of layers and filters. In the experiment section, we present the results of our work on iOS devices and discuss the problems present in current Android devices as well as future possibilities. Finally, we present qualitative stylization results and quantitative performance results tested on the iPhone 11 Pro and iPhone 6s. The presented work is incorporated in the Kunster - AR Art Video Maker application, available in Apple's App Store.

8. Semantic Signatures for Large-scale Visual Localization
Li Weng, Valerie Gouet-Brunet, Bahman Soheilian
Abstract: Visual localization is a useful alternative to standard localization techniques. It works by using cameras: in a typical scenario, features are extracted from captured images and compared with geo-referenced databases, and location information is then inferred from the matching results. Conventional schemes mainly use low-level visual features. These approaches offer good accuracy but suffer from scalability issues. To assist localization in large urban areas, this work explores a different path by utilizing high-level semantic information. We find that object information in a street view can facilitate localization. A novel descriptor scheme called "semantic signature" is proposed to summarize this information. A semantic signature consists of the type and angle information of visible objects at a spatial location. Several metrics and protocols are proposed for signature comparison and retrieval; they illustrate different trade-offs between accuracy and complexity. Extensive simulation results confirm the potential of the proposed scheme in large-scale applications. This paper is an extended version of a conference paper presented at CBMI'18, adding a more efficient retrieval protocol and additional experimental results.

9. Self-Supervised Human Depth Estimation from Monocular Videos
Feitong Tan, Hao Zhu, Zhaopeng Cui, Siyu Zhu, Marc Pollefeys, Ping Tan
Abstract: Previous methods for estimating detailed human depth often require supervised training with 'ground truth' depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which simplifies training data collection and improves the generalization of the learned network. Self-supervised learning is achieved by minimizing a photo-consistency loss, evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To solve for this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning of the shape details. Experiments demonstrate that our method generalizes better and performs much better on in-the-wild data.

10. DMCP: Differentiable Markov Channel Pruning for Neural Networks
Shaopeng Guo, Yujie Wang, Quanquan Li, Junjie Yan
Abstract: Recent works imply that channel pruning can be regarded as searching for the optimal sub-structure within unpruned networks. However, existing works based on this observation require training and evaluating a large number of structures, which limits their application. In this paper, we propose a novel differentiable method for channel pruning, named Differentiable Markov Channel Pruning (DMCP), to efficiently search for the optimal sub-structure. Our method is differentiable and can be directly optimized by gradient descent with respect to the standard task loss and a budget regularization (e.g., a FLOPs constraint). In DMCP, we model channel pruning as a Markov process, in which each state represents retaining the corresponding channel during pruning, and transitions between states denote the pruning process. In the end, our method is able to implicitly select the proper number of channels in each layer through the Markov process with optimized transitions. To validate the effectiveness of our method, we perform extensive experiments on ImageNet with ResNet and MobileNetV2. Results show that our method achieves consistent improvements over state-of-the-art pruning methods across various FLOPs settings. The code is available at this https URL
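The sketch below shows one way a Markov chain over channels can yield a differentiable channel count. As an illustrative assumption, channels are ordered, channel k can only be kept if channel k-1 is kept, the first channel is always kept, and each transition probability is a sigmoid of a learnable logit. The marginal keep probability is then the running product of transition probabilities. Its sum is the expected layer width, which a FLOPs budget term could penalize. Plain floats are used here; in practice these would be autograd tensors so that gradients reach the logits. All names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def marginal_keep_probs(logits):
    """P(channel k is kept) for a chain where channel k needs channel k-1.

    The first channel is always kept (an assumption of this sketch); each
    logit parameterizes P(keep channel k | channel k-1 kept).
    """
    probs = [1.0]
    for logit in logits:
        probs.append(probs[-1] * sigmoid(logit))
    return probs


def expected_channels(logits):
    """Expected number of retained channels: the sum of the marginals."""
    return sum(marginal_keep_probs(logits))


def expected_conv_flops(in_logits, out_logits, kernel, h, w):
    """Expected multiply-accumulates of a kernel x kernel conv on an h x w map."""
    return expected_channels(in_logits) * expected_channels(out_logits) * kernel * kernel * h * w


print(marginal_keep_probs([0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

Because a channel can only survive if all earlier channels survive, the marginals are non-increasing. Thresholding them therefore selects a contiguous prefix of channels, which is one way to read "implicitly select the proper number of channels in each layer".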

11. Regression Forest-Based Atlas Localization and Direction Specific Atlas Generation for Pancreas Segmentation
Masahiro Oda, Natsuki Shimizu, Ken'ichi Karasawa, Yukitaka Nimura, Takayuki Kitasaka, Kazunari Misawa, Michitaka Fujiwara, Daniel Rueckert, Kensaku Mori
Abstract: This paper proposes a fully automated atlas-based pancreas segmentation method for CT volumes that uses atlas localization by a regression forest and atlas generation using blood vessel information. Previous probabilistic atlas-based pancreas segmentation methods cannot deal well with the spatial variations commonly found in the pancreas. Also, shape variations are not represented by an averaged atlas. We propose a fully automated pancreas segmentation method that deals with both types of variation mentioned above. The position and size of the pancreas are estimated using a regression forest technique. After localization, a patient-specific probabilistic atlas is generated based on a new image similarity that reflects blood vessel position and direction information around the pancreas. We segment the pancreas using the EM algorithm with the atlas as a prior, followed by graph cut. In an evaluation using 147 CT volumes, the Jaccard index and the Dice overlap of the proposed method were 62.1% and 75.1%, respectively. Although all of the segmentation processes are automated, the segmentation results were superior to those of other state-of-the-art methods in terms of Dice overlap.
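For reference, the two overlap scores reported above can be computed as follows. For illustration, masks are represented as sets of voxel coordinates. For a single volume, the two scores are linked by D = 2J/(1 + J). This map is concave, so if both reported figures are per-volume averages, the mean Dice can be at most 2J̄/(1 + J̄) ≈ 0.766 for J̄ = 0.621. That bound is consistent with the reported 75.1%.

```python
def jaccard(pred, target):
    """Jaccard index |A & B| / |A | B| for masks given as sets of voxel coordinates."""
    union = pred | target
    return len(pred & target) / len(union) if union else 1.0  # two empty masks agree


def dice(pred, target):
    """Dice overlap 2|A & B| / (|A| + |B|)."""
    total = len(pred) + len(target)
    return 2 * len(pred & target) / total if total else 1.0


def dice_from_jaccard(j):
    """Per-case conversion D = 2J / (1 + J)."""
    return 2 * j / (1 + j)


# Toy 2D masks: 2 shared voxels, 4 voxels in the union.
a = {(0, 0), (0, 1), (1, 0)}
b = {(0, 1), (1, 0), (1, 1)}
print(jaccard(a, b), dice(a, b), dice_from_jaccard(jaccard(a, b)))
```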

12. Scene Text Image Super-Resolution in the Wild
Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai
Abstract: Low-resolution text images are often seen in natural scenes, such as documents captured by mobile phones. Recognizing low-resolution text images is challenging because they lose detailed content information, leading to poor recognition accuracy. An intuitive solution is to introduce super-resolution (SR) techniques as pre-processing. However, previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images (e.g., bicubic down-sampling), which are simple and not suitable for real low-resolution text recognition. To this end, we propose a real scene text SR dataset, termed TextZoom. It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild. It is more authentic and challenging than synthetic data, as shown in Fig. 1. We argue that improving recognition accuracy is the ultimate goal of scene text SR. For this purpose, a new Text Super-Resolution Network, termed TSRN, with three novel modules is developed. (1) A sequential residual block is proposed to extract the sequential information of the text images. (2) A boundary-aware loss is designed to sharpen the character boundaries. (3) A central alignment module is proposed to relieve the misalignment problem in TextZoom. Extensive experiments on TextZoom demonstrate that our TSRN improves the recognition accuracy of CRNN by over 13%, and that of ASTER and MORAN by nearly 9.0%, compared to synthetic SR data. Furthermore, our TSRN clearly outperforms 7 state-of-the-art SR methods in boosting the recognition accuracy of LR images in TextZoom. For example, it outperforms LapSRN by over 5% and 8% in the recognition accuracy of ASTER and CRNN, respectively. Our results suggest that low-resolution text recognition in the wild is far from solved, so more research effort is needed.

13. Wavelet Integrated CNNs for Noise-Robust Image Classification
Qiufu Li, Linlin Shen, Sheng Guo, Zhihui Lai
Abstract: Convolutional Neural Networks (CNNs) are generally prone to noise interference, i.e., small image noise can cause drastic changes in the output. To suppress the effect of noise on the final prediction, we enhance CNNs by replacing max-pooling, strided convolution, and average pooling with the Discrete Wavelet Transform (DWT). We present general DWT and Inverse DWT (IDWT) layers applicable to various wavelets, such as Haar, Daubechies, and Cohen, and design wavelet integrated CNNs (WaveCNets) using these layers for image classification. In WaveCNets, feature maps are decomposed into low-frequency and high-frequency components during down-sampling. The low-frequency component stores the main information, including the basic object structures, and is transmitted to subsequent layers to extract robust high-level features. The high-frequency components, which contain most of the data noise, are dropped during inference to improve the noise robustness of WaveCNets. Our experimental results on ImageNet and ImageNet-C (the noisy version of ImageNet) show that WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise robustness than their vanilla versions.
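The sketch below illustrates the idea with the orthonormal Haar wavelet on a single-channel feature map, in plain Python for clarity. Real WaveCNets layers operate on batched tensors and support other wavelets. Each 2 × 2 block maps to one low-frequency coefficient (LL) and three detail coefficients. Down-sampling keeps only LL, whose value here is twice the block average. The detail-band comments follow one common convention, and the function names are hypothetical.

```python
def haar_dwt2(x):
    """Single-level 2D Haar DWT of an H x W list of lists (H and W even).

    Returns (LL, LH, HL, HH), each H/2 x W/2.
    """
    h, w = len(x), len(x[0])
    bands = [[[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r, s = i // 2, j // 2
            bands[0][r][s] = (a + b + c + d) / 2  # LL: low-pass in both directions
            bands[1][r][s] = (a - b + c - d) / 2  # detail: change across columns
            bands[2][r][s] = (a + b - c - d) / 2  # detail: change across rows
            bands[3][r][s] = (a - b - c + d) / 2  # diagonal detail
    return tuple(bands)


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution map from the four bands."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for s in range(w):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            x[2 * r][2 * s] = (p + q + u + v) / 2
            x[2 * r][2 * s + 1] = (p - q + u - v) / 2
            x[2 * r + 1][2 * s] = (p + q - u - v) / 2
            x[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2
    return x


def wavelet_downsample(x):
    """Down-sampling in the spirit of WaveCNets: keep LL, drop the detail bands."""
    return haar_dwt2(x)[0]


img = [[1.0, 2.0, 3.0, 4.0],
       [5.0, 6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0, 12.0],
       [13.0, 14.0, 15.0, 16.0]]
print(wavelet_downsample(img))  # [[7.0, 11.0], [23.0, 27.0]]
```

Because the transform is invertible, nothing is lost until the detail bands are discarded. This is where the abstract locates most of the noise.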

14. Deep Learning based Person Re-identification
Nirbhay Kumar Tagore, Ayushman Singh, Sumanth Manche, Pratik Chattopadhyay
Abstract: Automated person re-identification in a multi-camera surveillance setup is very important for effectively tracking and monitoring crowd movement. In recent years, a few deep learning based re-identification approaches have been developed that are quite accurate but time-intensive, and hence not very suitable for practical purposes. In this paper, we propose an efficient hierarchical re-identification approach in which color histogram based comparison is first employed to find the closest matches in the gallery set, and deep feature based comparison is then carried out using a Siamese network. Reducing the search space after the first level of matching helps achieve a fast response time and also improves the prediction accuracy of the Siamese network by eliminating vastly dissimilar elements. A silhouette part-based feature extraction scheme is adopted at each level of the hierarchy to preserve the relative locations of the different body structures and make the appearance descriptors more discriminative. The proposed approach has been evaluated on five public data sets and on a new data set captured by our team in our laboratory. Results reveal that it outperforms most state-of-the-art approaches in terms of overall accuracy.
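The two-level matching described above can be sketched as follows. Several choices here are illustrative assumptions: a joint RGB histogram compared by histogram intersection, the shortlist size `k`, and the `deep_distance` callable, which stands in for a Siamese network's embedding distance. The silhouette part-based features are omitted.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of an iterable of (r, g, b) tuples in 0..255."""
    def idx(v):
        return min(v * bins // 256, bins - 1)

    hist = [0.0] * (bins ** 3)
    n = 0
    for r, g, b in pixels:
        hist[idx(r) * bins * bins + idx(g) * bins + idx(b)] += 1
        n += 1
    return [v / n for v in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms (1 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def hierarchical_rank(query, gallery, deep_distance, k=2, bins=4):
    """Level 1: cheap color-histogram shortlist of size k.
    Level 2: re-rank only the shortlist with the expensive deep distance."""
    qh = color_histogram(query["pixels"], bins)
    shortlist = sorted(
        gallery,
        key=lambda g: histogram_intersection(qh, color_histogram(g["pixels"], bins)),
        reverse=True,
    )[:k]
    return sorted(shortlist, key=lambda g: deep_distance(query, g))


# Hypothetical toy data: "emb" plays the role of a learned embedding.
red, redish, blue = [(250, 10, 10)] * 10, [(240, 20, 20)] * 10, [(10, 10, 250)] * 10
query = {"pixels": red, "emb": 0.0}
gallery = [
    {"id": "a", "pixels": redish, "emb": 0.9},
    {"id": "b", "pixels": red, "emb": 0.1},
    {"id": "c", "pixels": blue, "emb": 0.0},
]
ranked = hierarchical_rank(query, gallery, lambda q, g: abs(q["emb"] - g["emb"]))
print([g["id"] for g in ranked])
```

In the toy run, candidate "c" never reaches the deep comparison, even though its embedding happens to match. This is the search-space reduction the abstract credits for both speed and fewer confusions with vastly dissimilar gallery entries.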

15. Multi-Target Deep Learning for Algal Detection and Classification
Peisheng Qian, Ziyuan Zhao, Haobing Liu, Yingcai Wang, Yu Peng, Sheng Hu, Jing Zhang, Yue Deng, Zeng Zeng
Abstract: Water quality has a direct impact on industry, agriculture, and public health. Algae species are common indicators of water quality, because algal communities are sensitive to changes in their habitats and thus provide valuable knowledge about variations in water quality. However, water quality analysis requires professional inspection for algal detection and classification under microscopes, which is very time-consuming and tedious. In this paper, we propose a novel multi-target deep learning framework for algal detection and classification. Extensive experiments were carried out on a large-scale, colored microscopic algal dataset. Experimental results demonstrate that the proposed method achieves promising performance on algal detection, class identification and genus identification.

16. Hierarchical Predictive Coding Models in a Deep-Learning Framework
Matin Hosseini, Anthony Maida
Abstract: Bayesian predictive coding is a putative neuromorphic method for acquiring higher-level neural representations that account for sensory input. Although it originated in the neuroscience community, there are also efforts in the machine learning community to study these models. This paper reviews some of the better-known models. Our review analyzes module connectivity and patterns of information transfer, seeking general principles used across the models. We also survey some recent attempts to cast these models within a deep learning framework. A defining feature of Bayesian predictive coding is that it uses top-down, reconstructive mechanisms to predict incoming sensory inputs or their lower-level representations. Discrepancies between the predicted and the actual inputs, known as prediction errors, then drive further learning that refines and improves the predictive accuracy of the learned higher-level representations. Predictive coding models intended to describe computations in the neocortex emerged before the development of deep learning and used a communication structure between modules that we name the Rao-Ballard (RB) protocol. This protocol was derived from a Bayesian generative model with some rather strong statistical assumptions. The RB protocol provides a rubric for assessing the fidelity of deep learning models that claim to implement predictive coding.
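As a toy illustration of the top-down/bottom-up loop described above, the sketch below infers latent causes r for an input x under a single linear generative layer x ≈ U r. The prediction U r flows down, and the prediction error e = x - U r flows back up through Uᵀ. The estimate r then follows the gradient of the squared error plus a small Gaussian prior. The sketch covers inference only, with a fixed U; Rao-Ballard-style models also learn U from the same errors and stack several layers. All names and constants are hypothetical.

```python
def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def infer_representation(x, U, steps=200, lr=0.1, prior=0.01):
    """Gradient descent on 0.5 * ||x - U r||^2 + 0.5 * prior * ||r||^2.

    Returns the final r and the squared prediction error recorded at each step.
    """
    Ut = transpose(U)
    r = [0.0] * len(U[0])
    errors = []
    for _ in range(steps):
        pred = matvec(U, r)                        # top-down prediction
        e = [xi - pi for xi, pi in zip(x, pred)]   # prediction error
        errors.append(sum(ei * ei for ei in e))
        grad = matvec(Ut, e)                       # error sent back up through U^T
        r = [ri + lr * (gi - prior * ri) for ri, gi in zip(r, grad)]
    return r, errors


# Toy generative weights; x was produced by the latent causes r = [1, 2].
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, 2.0, 3.0]
r, errors = infer_representation(x, U)
print(r, errors[0], errors[-1])
```

The small prior pulls r slightly toward zero, so the recovered causes come out close to, but not exactly, [1, 2].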

17. Deeply Supervised Active Learning for Finger Bones Segmentation
Ziyuan Zhao, Xiaoyan Yang, Bharadwaj Veeravalli, Zeng Zeng
Abstract: Segmentation is a prerequisite, yet challenging, task for medical image analysis. In this paper, we introduce a novel deeply supervised active learning approach for finger bone segmentation. The proposed architecture is fine-tuned in an iterative and incremental learning manner. In each step, the deep supervision mechanism guides the learning process of the hidden layers and selects the samples to be labeled. Extensive experiments demonstrate that our method achieves competitive segmentation results using fewer labeled samples than full annotation.

18. End-to-End Domain Adaptive Attention Network for Cross-Domain Person Re-Identification
Amena Khatun, Simon Denman, Sridha Sridharan, Clinton Fookes
Abstract: Person re-identification (re-ID) remains challenging in real-world scenarios, as it requires a trained network to generalise to totally unseen target data in the presence of variations across domains. Recently, generative adversarial models have been widely adopted to enhance the diversity of training data. These approaches, however, often fail to generalise to other domains, as existing generative person re-identification models have a disconnect between the generative component and the discriminative feature learning stage. To address the ongoing challenges regarding model generalisation, we propose an end-to-end domain adaptive attention network that jointly translates images between domains and learns discriminative re-id features in a single framework. To address the domain gap challenge, we introduce an attention module for image translation from source to target domains that does not affect the identity of the person. More specifically, attention is directed to the background instead of the entire image of the person, ensuring that the identifying characteristics of the subject are preserved. The proposed joint learning network yields a significant performance improvement over state-of-the-art methods on several benchmark datasets.

19. Deep Learning Framework for Detecting Ground Deformation in the Built Environment using Satellite InSAR data
Nantheera Anantrasirichai, Juliet Biggs, Krisztina Kelevitz, Zahra Sadeghi, Tim Wright, James Thompson, Alin Achim, David Bull
Abstract: The large volumes of Sentinel-1 data produced over Europe are being used to develop pan-national ground motion services. However, simple analysis techniques such as thresholding cannot reliably detect and classify complex deformation signals, which makes providing usable information to a broad range of non-expert stakeholders a challenge. Here we explore the applicability of deep learning approaches by adapting a pre-trained convolutional neural network (CNN) to detect deformation in a national-scale velocity field. For our proof of concept, we focus on the UK, where previously identified deformation is associated with coal mining, groundwater withdrawal, landslides and tunnelling. The sparsity of measurement points and the presence of spike noise make this a challenging application for deep learning networks, which rely on spatial convolutions over images. Moreover, insufficient ground truth data exist to construct a balanced training data set, and the deformation signals are slower and more localised than in previous applications. We propose three enhancement methods to tackle these problems: i) spatial interpolation with modified matrix completion, ii) a synthetic training dataset based on the characteristics of the real UK velocity map, and iii) enhanced over-wrapping techniques. Using velocity maps spanning 2015-2019, our framework detects several areas of coal mining subsidence, uplift due to dewatering, slate quarries, landslides and tunnel engineering works. The results demonstrate the potential applicability of the proposed framework to the development of automated ground motion analysis systems.

20. Hierarchical Attention Network for Action Segmentation
Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes
Abstract: The temporal segmentation of events is an essential task and a precursor to the automatic recognition of human actions in video. Several attempts have been made to capture salient frame-level aspects through attention. However, these methods cannot effectively map the temporal relationships between frames, because they capture only a limited span of temporal dependencies. To this end, we propose a complete end-to-end supervised learning approach that better learns relationships between actions over time, thus improving overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales to form embeddings at the frame level and the segment level, and performs fine-grained action segmentation. The result is a simple, lightweight, yet extremely effective architecture for segmenting continuous video streams, with applications in multiple domains. We evaluate our system on multiple challenging public benchmark datasets, including MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets, and it achieves state-of-the-art performance. The evaluated datasets cover numerous video capture settings, including static overhead camera views and dynamic, egocentric head-mounted camera views. This demonstrates the direct applicability of the proposed framework in a variety of settings.

21. What comprises a good talking-head video generation?: A Survey and Benchmark
Lele Chen, Guofeng Cui, Ziyi Kou, Haitian Zheng, Chenliang Xu
Abstract: Over the years, performance evaluation has become essential in computer vision, enabling tangible progress in many sub-fields. While talking-head video generation has become an emerging research topic, existing evaluations on this topic have many limitations. For example, most approaches use human subjects (e.g., via Amazon MTurk) to evaluate their research claims directly. This subjective evaluation is cumbersome and unreproducible, and it may impede the progress of new research. In this work, we present a carefully designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies. For evaluation, we either propose new metrics or select the most appropriate existing ones. These metrics assess what we consider the desired properties of a good talking-head video: identity preservation, lip synchronization, high video quality, and natural, spontaneous motion. By conducting a thoughtful analysis of several state-of-the-art talking-head generation approaches, we aim to uncover the merits and drawbacks of current methods and point out promising directions for future work. All the evaluation code is available at: this https URL.

22. Recognizing Exercises and Counting Repetitions in Real Time
Talal Alatiah, Chen Chen
Abstract: Artificial intelligence technology has become essential in a variety of industries, including the fitness industry. Human pose estimation has been one of the important research topics in computer vision over the last few years. In this project, pose estimation and deep learning techniques are combined to analyze performance and report feedback on the repetitions of performed exercises in real time. Applying machine learning technology in the fitness industry could help judges count the repetitions of any exercise during weightlifting or CrossFit competitions.
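The abstract does not describe how repetitions are counted. The sketch below assumes 2-D keypoints from some pose estimator and a simple hysteresis rule on a single joint angle. The `low`/`high` thresholds and both function names are hypothetical, and this is not presented as the authors' method.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by 2-D keypoints a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    cos_t = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_t))

def count_reps(angles, low=70.0, high=160.0):
    """Count repetitions with hysteresis: one rep = extended -> flexed -> extended.
    The thresholds are hypothetical and would depend on the exercise."""
    reps = 0
    flexed = False
    for angle in angles:
        if not flexed and angle < low:
            flexed = True          # reached the bottom of the movement
        elif flexed and angle > high:
            flexed = False         # returned to full extension: rep complete
            reps += 1
    return reps
```

Using two thresholds rather than one cut-off keeps jitter around a single angle value from being counted as extra repetitions.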

23. A Dynamical Perspective on Point Cloud Registration
Heng Yang
Abstract: We provide a dynamical perspective on the classical problem of 3D point cloud registration with correspondences. A point cloud is considered as a rigid body consisting of particles. The problem of registering two point clouds is formulated as a dynamical system. In it, the dynamic model point cloud translates and rotates in a viscous environment towards the static scene point cloud. The motion is driven by forces and torques induced by virtual springs placed between each pair of corresponding points. We first show that the potential energy of the system recovers the objective function of maximum likelihood estimation. We then adopt Lyapunov analysis, particularly the invariant set theorem, to analyze the rigid body dynamics. We show that the system globally asymptotically tends towards the set of equilibrium points, in which the globally optimal registration solution lies. We conjecture that, besides the globally optimal equilibrium point, the system has either three or infinitely many "spurious" equilibrium points, and that these spurious equilibria are all locally unstable. The case of three spurious equilibria corresponds to a generic shape of the point cloud, while infinitely many spurious equilibria arise when the point cloud exhibits symmetry. Therefore, simulating the dynamics with random perturbations is guaranteed to obtain the globally optimal registration solution. Numerical experiments support our analysis and conjecture.
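The following is a minimal 2-D sketch of the spring-and-viscosity picture, not the paper's 3-D formulation. Several choices are assumptions: unit spring stiffness, an overdamped response in which velocity is proportional to force and torque, forward-Euler integration, and the step size and noise level. The spring potential $\frac{1}{2}\sum_i \lVert q_i - x_i \rVert^2$ is the least-squares objective, which is the maximum likelihood objective under isotropic Gaussian noise.

```python
import math
import random

def dynamical_registration(model, scene, steps=5000, dt=0.01, damping=1.0,
                           noise=1e-6, seed=0):
    """2-D sketch: the model cloud moves as a rigid body in a viscous medium,
    pulled toward the scene by unit-stiffness springs between correspondences.
    Returns (theta, (tx, ty)) such that scene ~ R(theta) @ model + t."""
    rng = random.Random(seed)
    n = len(model)
    # Express the model in its body frame (relative to its centroid).
    mx = sum(p[0] for p in model) / n
    my = sum(p[1] for p in model) / n
    body = [(x - mx, y - my) for x, y in model]
    inertia = sum(x * x + y * y for x, y in body)
    cx, cy, theta = mx, my, 0.0  # start from the model's initial pose
    for _ in range(steps):
        c, s = math.cos(theta), math.sin(theta)
        fx = fy = torque = 0.0
        for (bx, by), (qx, qy) in zip(body, scene):
            rx, ry = c * bx - s * by, s * bx + c * by   # lever arm in world frame
            dx, dy = qx - (cx + rx), qy - (cy + ry)     # spring force on the point
            fx += dx
            fy += dy
            torque += rx * dy - ry * dx                 # 2-D cross product r x f
        # Overdamped (viscous) motion: velocity proportional to force / torque.
        cx += dt * fx / (damping * n)
        cy += dt * fy / (damping * n)
        theta += dt * torque / (damping * inertia)
        # A tiny random kick keeps the body from resting on an unstable equilibrium.
        theta += noise * rng.uniform(-1.0, 1.0)
    c, s = math.cos(theta), math.sin(theta)
    # Translation mapping the original model centroid to the final centroid.
    tx = cx - (c * mx - s * my)
    ty = cy - (s * mx + c * my)
    return theta, (tx, ty)
```

In this noise-free 2-D toy, the torque reduces to a multiple of $\sin(\theta^\ast - \theta)$. The only spurious equilibrium is therefore the pose rotated by $\pi$, which is unstable, so a tiny random kick is enough to leave it.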

24. NTIRE 2020 Challenge on Image Demoireing: Methods and Results
Shanxin Yuan, Radu Timofte, Ales Leonardis, Gregory Slabaugh, Xiaotong Luo, Jiangtao Zhang, Yanyun Qu, Ming Hong, Yuan Xie, Cuihua Li, Dejia Xu, Yihao Chu, Qingyan Sun, Shuai Liu, Ziyao Zong, Nan Nan, Chenghua Li, Sangmin Kim, Hyungjoon Nam, Jisu Kim, Jechang Jeong, Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, Junwoo Lee, Bolun Zheng, Xiaohong Liu, Linhui Dai, Jun Chen, Xi Cheng, Zhenyong Fu, Jian Yang, Chul Lee, An Gia Vien, Hyunkook Park, Sabari Nathan, M.Parisa Beham, S Mohamed Mansoor Roomi, Florian Lemarchand, Maxime Pelcat, Erwan Nogues, Densen Puthussery, Hrishikesh P S, Jiji C V, Ashish Sinha, Xuan Zhao
Abstract: This paper reviews the Challenge on Image Demoireing that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2020. Demoireing is the difficult task of removing moire patterns from an image to reveal the underlying clean image. The challenge was divided into two tracks. Track 1 targeted the single-image demoireing problem, which seeks to remove moire patterns from a single image. Track 2 focused on the burst demoireing problem. There, a set of degraded moire images of the same scene was provided as input, with the goal of producing a single demoired image as output. The methods were ranked by fidelity, measured as the peak signal-to-noise ratio (PSNR) between the ground-truth clean images and the restored images produced by the participants' methods. The tracks had 142 and 99 registered participants, respectively, with a total of 14 and 6 submissions in the final testing stage. The entries span the current state of the art in single-image and burst-image demoireing.
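PSNR, the ranking metric named above, is $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. The minimal sketch below handles single-channel images stored as nested lists and assumes 8-bit data (MAX = 255). The challenge's exact evaluation protocol, such as color handling and value range, is not given in the abstract.

```python
import math

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    se, count = 0.0, 0
    for row_ref, row_out in zip(reference, restored):
        for a, b in zip(row_ref, row_out):
            se += (a - b) ** 2
            count += 1
    mse = se / count
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)
```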

25. Deep Learning for Image-based Automatic Dial Meter Reading: Dataset and Baselines
Gabriel Salomon, Rayson Laroca, David Menotti
Abstract: Smart meters enable remote and automatic reading of electricity, water and gas consumption, and are being widely deployed in developed countries. Nonetheless, a huge number of non-smart meters are still in operation. Image-based Automatic Meter Reading (AMR) focuses on reading this type of meter. We estimate that the Energy Company of Paraná (Copel), in Brazil, performs more than 850,000 readings of dial meters per month. Those meters are the focus of this work. Our main contributions are: (i) a public real-world dial meter dataset (shared upon request) called UFPR-ADMR; (ii) a deep learning-based recognition baseline on the proposed dataset; and (iii) a detailed error analysis of the main issues present in AMR for dial meters. To the best of our knowledge, this is the first work to introduce deep learning approaches to multi-dial meter reading and to perform experiments on unconstrained images. We achieved a 100.0% F1-score in the dial detection stage with both Faster R-CNN and YOLO, while the recognition rates reached 93.6% for dials and 75.25% for meters using Faster R-CNN (ResNeXt-101).

26. Scale-Equalizing Pyramid Convolution for Object Detection
Xinjiang Wang, Shilong Zhang, Zhuoran Yu, Litong Feng, Wayne Zhang
Abstract: The feature pyramid has been an efficient method for extracting features at different scales. Development of this method mainly focuses on aggregating contextual information at different levels, while seldom touching the inter-level correlation in the feature pyramid. Early computer vision methods extracted scale-invariant features by locating feature extrema in both the spatial and scale dimensions. Inspired by this, this study proposes a convolution across pyramid levels, termed pyramid convolution, which is a modified 3-D convolution. Stacked pyramid convolutions directly extract 3-D (scale and spatial) features and outperform other meticulously designed feature fusion modules. From the viewpoint of 3-D convolution, an integrated batch normalization that collects statistics from the whole feature pyramid is naturally inserted after the pyramid convolution. Furthermore, we show that the naive pyramid convolution, together with the design of the RetinaNet head, is actually best suited to extracting features from a Gaussian pyramid, whose properties a feature pyramid can hardly satisfy. To alleviate this discrepancy, we build a scale-equalizing pyramid convolution (SEPC) that aligns the shared pyramid convolution kernel only at high-level feature maps. The SEPC module is computationally efficient and compatible with the head design of most single-stage object detectors. It brings significant performance improvements ($>4$ AP increase on the MS-COCO2017 dataset) to state-of-the-art one-stage object detectors. A light version of SEPC also achieves a $\sim 3.5$ AP gain with only around a 7% increase in inference time. The pyramid convolution also functions well as a stand-alone module in two-stage object detectors and improves performance by $\sim 2$ AP. The source code can be found at this https URL.
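The following is a rough 1-D illustration of a convolution that runs across pyramid levels as well as across space. It rests on several assumptions: each level is half the length of the next finer one, the finer neighbour is handled with a stride-2 convolution, and the coarser neighbour with nearest-neighbour upsampling. The same three 3-tap kernels are shared by every level. These choices and all names are illustrative only. The paper's 2-D module, integrated batch normalization and scale-equalizing variant are not reproduced.

```python
def conv1d(x, kernel, stride=1):
    """3-tap 1-D convolution (cross-correlation) with zero padding of 1."""
    out = []
    for j in range(0, len(x), stride):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = j + k - 1
            if 0 <= i < len(x):
                acc += w * x[i]
        out.append(acc)
    return out

def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]

def pyramid_conv(levels, w_finer, w_same, w_coarser):
    """Convolve across scale and space for 1-D levels ordered finest first.
    Assumes len(levels[l]) == 2 * len(levels[l + 1]); kernels are shared."""
    outputs = []
    for l, x in enumerate(levels):
        y = conv1d(x, w_same)
        if l > 0:  # finer neighbour: stride-2 convolution matches resolution
            finer = conv1d(levels[l - 1], w_finer, stride=2)
            y = [a + b for a, b in zip(y, finer)]
        if l + 1 < len(levels):  # coarser neighbour: convolve, then upsample
            coarser = upsample2(conv1d(levels[l + 1], w_coarser))
            y = [a + b for a, b in zip(y, coarser)]
        outputs.append(y)
    return outputs
```

With identity kernels `[0, 1, 0]`, each output position simply sums the co-located values from its own level and its two neighbouring levels. This makes the "3-D kernel over scale" structure easy to inspect.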

27. Deep Learning on Point Clouds for False Positive Reduction at Nodule Detection in Chest CT Scans
Ivan Drokin, Elena Ericheva
Abstract: The paper focuses on a novel approach for false-positive reduction (FPR) of nodule candidates in computer-aided detection (CADe) systems after the suspicious-lesion proposal stage. Unlike common solutions in medical image analysis, the proposed approach treats input data not as a 2D or 3D image but as a point cloud, and uses deep learning models for point clouds. We found that point cloud models have four advantages over traditional 3D CNNs. They require less memory, they are faster in both training and inference, and they achieve better performance. They also impose no restrictions on the size of the input image, and thereby on the size of the nodule candidate. We propose an algorithm for transforming 3D CT scan data into a point cloud. In some cases, the volume of the nodule candidate can be much smaller than the surrounding context, for example, when the nodule has a subpleural localization. Therefore, we developed an algorithm for sampling points from a point cloud constructed from a 3D image of the candidate region. The algorithm is guaranteed to capture both context and candidate information as part of the point cloud of the nodule candidate. An experiment that creates a dataset for the FPR task from the open LIDC-IDRI database was carefully designed, set up and described in detail. Data augmentation was applied both to avoid overfitting and as an upsampling method. Experiments were conducted with PointNet, PointNet++ and DGCNN. We show that the proposed approach outperforms baseline 3D CNN models, achieving an FROC of 85.98 versus 77.26 for the baseline models.
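The abstract does not spell out the volume-to-point-cloud transform or the sampling algorithm. The minimal sketch below shows the general idea under several hypothetical choices: an intensity threshold for keeping voxels, a spherical candidate region, and fixed per-region sample counts. It shows how a sampler can guarantee that both candidate and context points appear in the cloud even when the nodule is tiny.

```python
import random

def volume_to_points(volume, threshold):
    """Convert a 3-D intensity volume (nested lists indexed z, y, x) into points
    (z, y, x, intensity), keeping voxels at or above a hypothetical threshold."""
    return [(z, y, x, v)
            for z, plane in enumerate(volume)
            for y, row in enumerate(plane)
            for x, v in enumerate(row)
            if v >= threshold]

def sample_candidate_and_context(points, center, radius, n_candidate, n_context,
                                 seed=0):
    """Sample a fixed-size cloud that always contains points from the candidate
    (within `radius` of `center`) as well as from its surrounding context."""
    rng = random.Random(seed)

    def near(p):
        return sum((a - b) ** 2 for a, b in zip(p[:3], center)) <= radius ** 2

    candidate = [p for p in points if near(p)]
    context = [p for p in points if not near(p)]
    # Sample with replacement so that even a tiny candidate fills its quota.
    picked = [rng.choice(candidate) for _ in range(n_candidate)] if candidate else []
    picked += [rng.choice(context) for _ in range(n_context)] if context else []
    return picked
```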

28. Visualisation and knowledge discovery from interpretable models
Sreejita Ghosh, Peter Tino, Kerstin Bunte
Abstract: An increasing number of sectors that affect human lives are using Machine Learning (ML) tools. Hence the need to understand their working mechanisms and evaluate their fairness in decision-making is becoming paramount, ushering in the era of Explainable AI (XAI). In this contribution we introduce a few intrinsically interpretable models that can also deal with missing values and extract knowledge from the dataset and about the problem. These models can also visualise the classifier and its decision boundaries: they are the angle-based variants of Learning Vector Quantization. We demonstrate the algorithms on a synthetic dataset and a real-world one (the heart disease dataset from the UCI repository). The newly developed classifiers helped in investigating the complexities of the UCI dataset as a multiclass problem. When the dataset was treated as a binary classification problem, the performance of the developed classifiers was comparable to that reported in the literature for this dataset, with the added value of interpretability.

29. Efficient Exact Verification of Binarized Neural Networks
Kai Jia, Martin Rinard
Abstract: We present a new system, EEV, for verifying binarized neural networks (BNNs). We formulate BNN verification as a Boolean satisfiability (SAT) problem with reified cardinality constraints of the form $y = (x_1 + \cdots + x_n \le b)$, where $x_i$ and $y$ are Boolean variables, possibly negated, and $b$ is an integer constant. We also identify two properties, specifically balanced weight sparsity and lower cardinality bounds, that reduce the verification complexity of BNNs. EEV has two components. The first is a SAT solver enhanced to handle reified cardinality constraints natively. The second is a set of novel training strategies that reduce verification complexity by delivering networks with improved sparsity properties and cardinality bounds. We demonstrate the effectiveness of EEV by presenting the first exact verification results for $\ell_{\infty}$-bounded adversarial robustness of nontrivial convolutional BNNs on the MNIST and CIFAR10 datasets. Our results also show that, depending on the dataset and network architecture, our techniques verify BNNs between ten and ten thousand times faster than the best previous exact verification techniques for either binarized or real-valued networks.
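To see where constraints of this form come from, consider one binarized neuron $y = [\sum_i w_i x_i + \mathrm{bias} \ge 0]$ with $w_i, x_i \in \{-1, +1\}$. Encode $x_i = 2v_i - 1$ with Boolean $v_i$, and let $\ell_i = v_i$ if $w_i = +1$ and $\ell_i = \neg v_i$ otherwise. Then $w_i x_i = 2\ell_i - 1$, so the neuron fires iff $\sum_i \ell_i \ge k$ with $k = \lceil (n - \mathrm{bias})/2 \rceil$, that is, iff $\sum_i \neg\ell_i \le n - k$. The sketch below builds and evaluates that constraint. It uses hypothetical helper names, assumes that the sign of 0 is +1, and is not EEV's encoder or solver.

```python
import math

def neuron_to_cardinality(weights, bias):
    """Rewrite y = [sum_i w_i * x_i + bias >= 0], with w_i, x_i in {-1, +1},
    as a reified cardinality constraint y = (sum of literals <= b).
    Boolean v_i encodes x_i = 2 * v_i - 1. Returns (literals, b), where each
    literal is (index, negated)."""
    n = len(weights)
    # With l_i = v_i if w_i == +1 else (not v_i), w_i * x_i == 2 * l_i - 1,
    # so the pre-activation equals 2 * sum(l_i) - n + bias.
    k = math.ceil((n - bias) / 2)  # y holds iff sum(l_i) >= k
    # sum(l_i) >= k  <=>  sum(not l_i) <= n - k, which is the "<= b" form.
    literals = [(i, w == 1) for i, w in enumerate(weights)]  # encodes not l_i
    return literals, n - k

def eval_cardinality(literals, b, v):
    """Evaluate y = (sum of literals <= b) for a Boolean assignment v."""
    total = sum((not v[i]) if negated else v[i] for i, negated in literals)
    return total <= b
```

For small $n$, the rewrite can be checked by enumerating all $2^n$ inputs and comparing against the direct neuron evaluation.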

30. Noisy Differentiable Architecture Search
Xiangxiang Chu, Bo Zhang, Xudong Li
Abstract: Simplicity is the ultimate sophistication. Differentiable Architecture Search (DARTS) has now become one of the mainstream paradigms of neural architecture search. However, several disturbing factors in its optimization process make its results unstable and hard to reproduce. FairDARTS points out that skip connections natively have an unfair advantage in exclusive competition, which is the primary cause of dramatic performance collapse. While FairDARTS turns the unfair competition into a collaborative one, we instead impede this unfair advantage by injecting unbiased random noise into the output of skip operations. In effect, the optimizer should perceive this difficulty at each training step and refrain from overshooting on skip connections. In the long run, however, it still converges to the right solution area, since no bias is added to the gradient. We name this novel approach NoisyDARTS. Our experiments on CIFAR-10 and ImageNet attest that it can effectively break the skip connection's unfair advantage and yield better performance. It generates a series of models that achieve state-of-the-art results on both datasets.
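The minimal scalar sketch below assumes a DARTS-style mixed operation, in which candidate operations are combined with softmax weights. Zero-mean Gaussian noise is added to the skip connection's output only during search. The Gaussian choice, the operation set, the noise level and the function names are all hypothetical. Because the noise has zero mean, the expected mixed output is unchanged.

```python
import math
import random

def softmax(alphas):
    m = max(alphas)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas, ops, skip_index, noise_std, rng, training=True):
    """DARTS-style mixed operation on a scalar input. During training,
    zero-mean Gaussian noise is added to the skip connection's output only."""
    weights = softmax(alphas)
    out = 0.0
    for k, (w, op) in enumerate(zip(weights, ops)):
        y = op(x)
        if training and k == skip_index:
            y += rng.gauss(0.0, noise_std)  # unbiased: E[noise] = 0
        out += w * y
    return out
```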

31. How Can CNNs Use Image Position for Segmentation?
Rito Murase, Masanori Suganuma, Takayuki Okatani
Abstract: Convolution is an equivariant operation, and image position does not affect its result. A recent study shows that the zero-padding employed in the convolutional layers of CNNs provides position information to the CNNs. The study further claims that this position information enables accurate inference for several tasks, such as object recognition and segmentation. However, there is a technical issue with the design of the study's experiments, and thus the correctness of the claim is yet to be verified. Moreover, absolute image position may not be essential for the segmentation of natural images, in which target objects can appear at any image position. In this study, we investigate how positional information is, and can be, utilized for segmentation tasks. Toward this end, we consider positional encoding (PE), which adds channels embedding image position to the input images, and compare PE with several padding methods. Given this property of natural images, we choose medical image segmentation tasks, in which absolute position appears to be relatively important. There, the same organs (of different patients) are captured in similar sizes and positions. We draw a mixed conclusion from the experimental results: positional encoding certainly works in some cases, but absolute image position may not be as important for segmentation tasks as one might think.
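The abstract does not specify the encoding. One simple option, assumed here and similar in spirit to CoordConv, appends two channels holding normalized row and column coordinates to a channel-first image. The function name is hypothetical.

```python
def add_position_channels(image):
    """Append two channels holding normalized row and column coordinates in
    [-1, 1] to a channel-first image given as nested lists [C][H][W]."""
    height, width = len(image[0]), len(image[0][0])

    def norm(i, size):
        return 2.0 * i / (size - 1) - 1.0 if size > 1 else 0.0

    rows = [[norm(y, height)] * width for y in range(height)]
    cols = [[norm(x, width) for x in range(width)] for _ in range(height)]
    return image + [rows, cols]  # original channels are kept unchanged
```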

32. AIBench: Scenario-distilling AI Benchmarking
Wanling Gao, Fei Tang, Jianfeng Zhan, Xu Wen, Lei Wang, Zheng Cao, Chuanxin Lan, Chunjie Luo, Zihan Jiang
Abstract: Real-world application scenarios such as modern Internet services consist of a diversity of AI and non-AI modules with very long and complex execution paths. Using component or micro AI benchmarks alone can lead to error-prone conclusions. This paper proposes a scenario-distilling AI benchmarking methodology. Instead of using real-world applications, we propose permutations of essential AI and non-AI tasks as a scenario-distilling benchmark. We consider scenario-distilling benchmarks, component benchmarks and micro benchmarks as three indispensable parts of a benchmark suite. Together with seventeen industry partners, we identify nine important real-world application scenarios. We design and implement a highly extensible, configurable, and flexible benchmark framework. On the basis of this framework, we propose a guideline for building scenario-distilling benchmarks and present two Internet service AI benchmarks. The preliminary evaluation shows the advantage of scenario-distilling AI benchmarking over using component or micro AI benchmarks alone. The specifications, source code, testbed, and results are publicly available from the web site this http URL.

33. Lifted Regression/Reconstruction Networks
Rasmus Kjær Høier, Christopher Zach
Abstract: In this work we propose lifted regression/reconstruction networks (LRRNs), which combine lifted neural networks with a guaranteed Lipschitz continuity property for the output layer. Lifted neural networks explicitly optimize an energy model to infer the unit activations. Therefore, in contrast to standard feed-forward neural networks, they allow bidirectional feedback between layers. So far, lifted neural networks have been modelled around standard feed-forward architectures. We propose to take further advantage of the feedback property by letting the layers simultaneously perform regression and reconstruction. The resulting lifted network architecture allows control over the desired amount of Lipschitz continuity, which is an important feature for obtaining adversarially robust regression and classification methods. We analyse and numerically demonstrate applications for unsupervised and supervised learning.

34. NTIRE 2020 Challenge on Spectral Reconstruction from an RGB Image
Boaz Arad, Radu Timofte, Ohad Ben-Shahar, Yi-Tun Lin, Graham Finlayson, Shai Givati, others
Abstract: This paper reviews the second challenge on spectral reconstruction from RGB images, i.e., the recovery of whole-scene hyperspectral (HS) information from a 3-channel RGB image. As in the previous challenge, two tracks were provided. (i) In the "Clean" track, HS images are estimated from noise-free RGBs, which are themselves calculated numerically from the ground-truth HS images and supplied spectral sensitivity functions. (ii) The "Real World" track simulates capture by an uncalibrated and unknown camera, and HS images are recovered from noisy JPEG-compressed RGB images. A new, larger-than-ever natural hyperspectral image dataset is presented, containing a total of 510 HS images. The Clean and Real World tracks had 103 and 78 registered participants, respectively, with 14 teams competing in the final testing phase. A description of the proposed methods, alongside their challenge scores and an extensive evaluation of the top-performing methods, is also provided. Together, these gauge the state of the art in spectral reconstruction from an RGB image.
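In the "Clean" track, the RGB inputs are computed from hyperspectral images and spectral sensitivity functions: each RGB channel is a sensitivity-weighted sum over spectral bands. The minimal sketch below shows that projection. The toy sensitivity curves are purely hypothetical. The challenge's actual curves, any normalization, and the noise and JPEG compression of the "Real World" track are not modelled.

```python
def hyperspectral_to_rgb(cube, sensitivities):
    """Project a hyperspectral cube [H][W][B] to RGB, where
    sensitivities[c][band] is the spectral sensitivity of channel c (R, G, B)."""
    return [[[sum(s * v for s, v in zip(curve, pixel)) for curve in sensitivities]
             for pixel in row]
            for row in cube]
```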

35. Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan
Xiaofeng Zhu, Bin Song, Feng Shi, Yanbo Chen, Rongyao Hu, Jiangzhang Gan, Wenhai Zhang, Man Li, Liye Wang, Yaozong Gao, Fei Shan, Dinggang Shen
Abstract: With the rapid worldwide spread of coronavirus disease (COVID-19), it is of great importance to diagnose COVID-19 early and to predict when patients might progress to the severe stage. Such predictions help in designing effective treatment plans and reducing clinicians' workloads. In this study, we propose a joint classification and regression method. It determines whether a patient will develop severe symptoms later and, if so, predicts how long the patient will take to convert to the severe stage. To do this, the proposed method takes two weights into account. 1) A weight for each sample reduces the influence of outliers and addresses the problem of imbalanced classification. 2) A weight for each feature, learned via a sparsity regularization term, removes redundant features of the high-dimensional data and learns the information shared between the classification and regression tasks. To our knowledge, this study is the first work to predict both disease progression and conversion time. This could help clinicians deal with potentially severe cases in time, or even save patients' lives. Experimental analysis was conducted on a real dataset of 422 chest computed tomography (CT) scans from two hospitals. Of these, 52 cases converted to severe after an average of 5.64 days, and 34 cases were severe at admission. Results show that our method achieves the best classification (e.g., 85.91% accuracy) and regression (e.g., a correlation coefficient of 0.462) performance among all compared methods. Moreover, our proposed method yields 76.97% accuracy for predicting severe cases, a correlation coefficient of 0.524, and a difference of 0.55 days in the predicted conversion time.

36. WSMN: An optimized multipurpose blind watermarking in Shearlet domain using MLP and NSGA-II
Behrouz Bolourian Haghighi, Amir Hossein Taherinia, Ahad Harati, Modjtaba Rouhani
Abstract: Digital watermarking is an important issue in information security for preventing the misuse of images in multimedia networks. Although cryptography can prevent access by unauthorized persons, it cannot simultaneously be used for copyright protection or content authentication while preserving image integrity. Hence, this paper presents an optimized multipurpose blind watermarking scheme in the Shearlet domain with the help of smart algorithms, including MLP and NSGA-II. In this method, four copies of the robust copyright logo are embedded in the approximation coefficients of the Shearlet transform using an effective quantization technique. Furthermore, an embedded random sequence serving as a semi-fragile authentication mark is effectively extracted from the detail coefficients by the neural network. An effective optimization algorithm selects the optimum embedding thresholds and the texture of blocks is also distinguished, so imperceptibility and robustness are preserved. The experimental results show the scheme's superiority over other state-of-the-art schemes in watermarked image quality and in robustness against hybrid attacks. The average PSNR and SSIM of the dual-watermarked images are 38 dB and 0.95, respectively. Besides, the scheme can effectively extract the copyright logo and locate forgery regions under severe attacks with satisfactory accuracy.
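The abstract does not name its quantization technique. For orientation only, the sketch below shows generic dither-modulation quantization index modulation (QIM), a standard way to embed one bit per coefficient. It is not the paper's scheme. The step size `delta` is hypothetical, and the Shearlet transform and the MLP/NSGA-II threshold optimization are omitted.

```python
def qim_embed(coeff, bit, delta):
    """Quantization index modulation: snap a coefficient onto the lattice for
    `bit` (lattice 0 at multiples of delta, lattice 1 offset by delta / 2)."""
    offset = bit * delta / 2.0
    return delta * round((coeff - offset) / delta) + offset

def qim_extract(coeff, delta):
    """Decode the bit whose lattice lies closest to the (possibly attacked) coefficient."""
    d0 = abs(coeff - qim_embed(coeff, 0, delta))
    d1 = abs(coeff - qim_embed(coeff, 1, delta))
    return 0 if d0 <= d1 else 1
```

A larger `delta` tolerates stronger perturbations, since any change smaller than `delta / 4` still decodes correctly. The cost is more embedding distortion, up to `delta / 2`. This is the kind of imperceptibility/robustness trade-off that optimized embedding thresholds would presumably balance.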

37. Vid2Curve: Simultaneously Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video
Peng Wang, Lingjie Liu, Nenglun Chen, Hung-Kuo Chu, Christian Theobalt, Wenping Wang
Abstract: Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world. It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods, because thin structures often lack distinct point features and have severe self-occlusion. We propose the first approach that simultaneously estimates camera motion and reconstructs, in high quality, the geometry of complex 3D thin structures from a color video captured by a handheld camera. Specifically, we present a new curve-based approach that estimates accurate camera poses by establishing correspondences between featureless thin foreground objects in consecutive video frames. It does not require visual texture in the background scene to lock onto. Enabled by this effective curve-based camera pose estimation strategy, we develop an iterative optimization method with tailored measures for geometry, topology and self-occlusion handling to reconstruct 3D thin structures. Extensive validation on a variety of thin structures shows that our method achieves accurate camera pose estimation and faithful reconstruction of 3D thin structures with complex shape and topology. It does so at a level not attained by other existing reconstruction methods.

38. Scoring Root Necrosis in Cassava Using Semantic Segmentation [PDF]
Jeremy Francis Tusubira, Benjamin Akera, Solomon Nsumba, Joyce Nakatumba-Nabende, Ernest Mwebaze
Abstract: Cassava, a major food crop in many parts of Africa, has been severely affected by Cassava Brown Streak Disease (CBSD). The disease affects the tuberous roots, with symptoms that include a yellow/brown, dry, corky necrosis within the starch-bearing tissues. Cassava breeders currently rely on visual inspection to score root necrosis on a qualitative scale, which is quite subjective. In this paper, we present an approach that automates root necrosis scoring using deep convolutional neural networks with semantic segmentation. Our experiments show that the UNet model performs this task with high accuracy, achieving a mean Intersection over Union (IoU) of 0.90 on the test set. The method provides a quantitative measure for necrosis scoring on root cross-sections. It segments the cross-sections and classifies their pixels as necrotized or non-necrotized without any additional feature engineering.
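Mean IoU, the metric reported above, averages the intersection-over-union of each class. A minimal sketch for the two-class case follows. It works on flattened pixel labels and assumes the encoding 0 = non-necrotized, 1 = necrotized. The paper may aggregate differently (for example, per image), so treat this as illustrative.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int = 2) -> float:
    """Average IoU over classes; a class absent from both pred and target is skipped."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    target = [0, 0, 1, 1, 1, 0]  # ground-truth pixel labels (flattened cross-section)
    pred = [0, 1, 1, 1, 0, 0]    # model output for the same pixels
    print(mean_iou(pred, target))  # 0.5
```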

39. DramaQA: Character-Centered Video Story Understanding with Hierarchical QA [PDF]
Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Seungchan Lee, Minsu Lee, Byoung-Tak Zhang
Abstract: Despite recent progress in computer vision and natural language processing, video understanding remains hard to achieve because of the intrinsic difficulty of the stories told in videos. Moreover, there is no theoretical metric for evaluating the degree of video understanding. In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for a comprehensive understanding of video stories. DramaQA focuses on two perspectives: 1) hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence; and 2) character-centered video annotations to model the local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and contains 16,191 QA pairs from 23,928 video clips of various lengths, with each QA pair belonging to one of four difficulty levels. We provide 217,308 annotated images with rich character-centered annotations, including visual bounding boxes, behaviors, and emotions of the main characters, as well as coreference-resolved scripts. Additionally, we provide analyses of the dataset and a Dual Matching Multistream model that effectively learns character-centered representations of video to answer questions about it. We plan to release our dataset and model publicly for research purposes and expect that our work will provide a new perspective on video story understanding research.

40. A Review of Computer Vision Methods in Network Security [PDF]
Jiawei Zhao, Rahat Masood, Suranga Seneviratne
Abstract: Network security has become more important than ever, as highlighted by the eye-opening numbers of data breaches, attacks on critical infrastructure, and malware/ransomware/cryptojacker attacks reported almost every day. We increasingly rely on networked infrastructure, and with the advent of IoT, billions of devices will be connected to the internet, giving attackers more opportunities to exploit. Traditional machine learning methods have frequently been used in the context of network security. However, such methods are mostly based on statistical features extracted from sources such as binaries, emails, and packet flows. On the other hand, recent years have witnessed phenomenal growth in computer vision, driven mainly by advances in convolutional neural networks. At first glance, it is not obvious how computer vision methods relate to network security. Nonetheless, a significant body of work has shown how computer vision methods can be applied in network security to detect attacks or build security solutions. In this paper, we provide a comprehensive survey of such work under three topics: i) phishing attempt detection, ii) malware detection, and iii) traffic anomaly detection. Next, we review a set of commercial products for which public information is available and explore how computer vision methods are effectively used in those products. Finally, we discuss existing research gaps and future research directions. We focus especially on how the network security research community and the industry can leverage the exponential growth of computer vision methods to build much more secure networked systems.

41. Encoding in the Dark Grand Challenge: An Overview [PDF]
Nantheera Anantrasirichai, Fan Zhang, Alexandra Malyugina, Paul Hill, Angeliki Katsenou
Abstract: A large part of the video content we consume from video providers belongs to genres featuring low-light aesthetics. Low-light sequences have special characteristics, such as spatio-temporally varying acquisition noise and light flickering, that make the encoding process challenging. To deal with the spatio-temporally incoherent noise, higher bitrates are used to achieve high objective quality. Additionally, quality assessment metrics and methods have not been designed, trained, or tested for this type of content. This has inspired us to stimulate research in this area by proposing a Grand Challenge on encoding low-light video sequences. In this paper, we present an overview of the proposed challenge. We also test state-of-the-art methods that will serve as benchmark methods when the participants' deliverables are assessed. From this exploration, our results show that VVC already achieves high performance compared with simply denoising the video source prior to encoding. Moreover, the quality of the video streams can be further improved by employing a post-processing image enhancement method.

42. Knowledge Enhanced Neural Fashion Trend Forecasting [PDF]
Yunshan Ma, Yujuan Ding, Xun Yang, Lizi Liao, Wai Keung Wong, Tat-Seng Chua
Abstract: Fashion trend forecasting is a crucial task for both academia and industry. Although some efforts have been devoted to this challenging task, they studied only a limited set of fashion elements with highly seasonal or simple patterns, which can hardly reveal real fashion trends. Towards insightful fashion trend forecasting, this work investigates fine-grained fashion element trends for specific user groups. We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram, with extracted time-series fashion element records and user information. Furthermore, we propose a Knowledge Enhanced Recurrent Network model (KERN) to effectively model the time-series data of fashion elements with rather complex patterns. It takes advantage of the capability of deep recurrent neural networks to model time-series data. Moreover, it leverages internal and external knowledge in the fashion domain that affects the time-series patterns of fashion element trends. Incorporating such domain knowledge further enhances the deep learning model's ability to capture the patterns of specific fashion elements and predict future trends. Extensive experiments demonstrate that the proposed KERN model can effectively capture the complicated patterns of objective fashion elements and therefore produces better fashion trend forecasts.

43. Multi-view data capture using edge-synchronised mobiles [PDF]
Matteo Bortolon, Paul Chippendale, Stefano Messelodi, Fabio Poiesi
Abstract: Multi-view data capture permits free-viewpoint video (FVV) content creation. To this end, several users must capture video streams of the same object/scene from different viewpoints, calibrated in both time and pose. New-generation network architectures (e.g. 5G) promise lower latency and larger bandwidth connections supported by powerful edge computing, properties that seem ideal for reliable FVV capture. We have explored this possibility, aiming to remove the need for bespoke synchronisation hardware when capturing a scene from multiple viewpoints, so that off-the-shelf mobiles can be used instead. We propose a novel and scalable data capture architecture that exploits edge resources to synchronise and harvest frame captures. We have designed an edge computing unit that supervises the relaying of timing triggers to and from multiple mobiles, in addition to synchronising frame harvesting. We empirically show the benefits of our edge computing unit by analysing latencies. We also compare the quality of 3D reconstruction outputs against an alternative and popular centralised solution based on Unity3D.

44. Adaptive Feature Selection Guided Deep Forest for COVID-19 Classification with Chest CT [PDF]
Liang Sun, Zhanhao Mo, Fuhua Yan, Liming Xia, Fei Shan, Zhongxiang Ding, Wei Shao, Feng Shi, Huan Yuan, Huiting Jiang, Dijia Wu, Ying Wei, Yaozong Gao, Wanchun Gao, He Sui, Daoqiang Zhang, Dinggang Shen
Abstract: Chest computed tomography (CT) has become an effective tool to assist the diagnosis of coronavirus disease 2019 (COVID-19). Given the worldwide outbreak of COVID-19, computer-aided diagnosis for CT-based COVID-19 classification could largely alleviate the burden on clinicians. In this paper, we propose an Adaptive Feature Selection guided Deep Forest (AFS-DF) for COVID-19 classification based on chest CT images. Specifically, we first extract location-specific features from CT images. Then, to capture high-level representations of these features from relatively small-scale data, we use a deep forest model. Moreover, we propose a feature selection method based on the trained deep forest model to reduce feature redundancy. This selection can be adaptively incorporated into the COVID-19 classification model. We evaluated the proposed AFS-DF on a COVID-19 dataset with 1495 COVID-19 patients and 1027 patients with community-acquired pneumonia (CAP). The accuracy (ACC), sensitivity (SEN), specificity (SPE), and AUC achieved by our method are 91.79%, 93.05%, 89.95%, and 96.35%, respectively. Experimental results on the COVID-19 dataset suggest that the proposed AFS-DF achieves superior performance in COVID-19 vs. CAP classification compared with 4 widely used machine learning methods.
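The ACC, SEN, and SPE figures above follow the usual confusion-matrix definitions. The sketch below assumes COVID-19 is the positive class (label 1) and CAP is the negative class (label 0); the labels are toy values, not the paper's data. AUC is not covered because it needs continuous scores rather than hard labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on positives), and specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
    }


if __name__ == "__main__":
    # 1 = COVID-19, 0 = CAP (assumed encoding); toy labels for illustration
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

With one missed COVID-19 case and one false alarm, sensitivity (3/4) and specificity (5/6) differ even though accuracy is 0.8. This is why the abstract reports all three.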

45. Subdomain Adaptation with Manifolds Discrepancy Alignment [PDF]
Pengfei Wei, Yiping Ke, Xinghua Qu, Tze-Yun Leong
Abstract: Reducing domain divergence is a key step in transfer learning. Existing works focus on minimizing the global domain divergence. However, two domains may consist of several shared subdomains and differ from each other within each subdomain. In this paper, we take the local divergence of subdomains into account during transfer. Specifically, we propose to use low-dimensional manifolds to represent subdomains and to align the local data distribution discrepancy in each manifold across domains. A Manifold Maximum Mean Discrepancy (M3D) is developed to measure the local distribution discrepancy in each manifold. We then propose a general framework, called Transfer with Manifolds Discrepancy Alignment (TMDA), that couples the discovery of data manifolds with the minimization of M3D. We instantiate TMDA in the subspace learning case, considering both linear and nonlinear mappings, and also in the deep learning framework. Extensive experimental studies demonstrate that TMDA is a promising method for various transfer learning tasks.
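M3D is defined over learned manifolds, and the abstract does not spell it out. The sketch below shows only the ordinary Maximum Mean Discrepancy it builds on, using the biased estimator with a Gaussian kernel. It also includes a hypothetical `local_discrepancy` helper that averages MMD² over subdomains supplied as pre-grouped samples. The kernel bandwidth `sigma` and the grouping are assumptions. This is not the authors' M3D.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased squared MMD: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def local_discrepancy(src: Dict[str, List[Vector]], tgt: Dict[str, List[Vector]],
                      sigma: float = 1.0) -> float:
    """Hypothetical helper: average MMD^2 over subdomains present in both domains."""
    shared = src.keys() & tgt.keys()
    return sum(mmd2(src[k], tgt[k], sigma) for k in shared) / len(shared)


if __name__ == "__main__":
    # Two subdomains whose locations are swapped between source and target.
    src = {"a": [[0.0], [0.2]], "b": [[3.0], [3.2]]}
    tgt = {"a": [[3.0], [3.2]], "b": [[0.0], [0.2]]}
    pooled_src = src["a"] + src["b"]
    pooled_tgt = tgt["a"] + tgt["b"]
    print(mmd2(pooled_src, pooled_tgt), local_discrepancy(src, tgt))
```

The toy example shows why a subdomain view can matter. The pooled (global) MMD is essentially zero because both domains contain the same points, yet every subdomain is badly misaligned.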

46. Collective Loss Function for Positive and Unlabeled Learning [PDF]
Chenhao Xie, Qiao Cheng, Jiaqing Liang, Lihan Chen, Yanghua Xiao
Abstract: People learn to discriminate between classes without explicit exposure to negative examples. In contrast, traditional machine learning algorithms often rely on negative examples; otherwise, the model is prone to collapse into always-true predictions. Therefore, it is crucial to design a learning objective that leads the model to converge and to make unbiased predictions without explicit negative signals. In this paper, we propose a Collective loss function to learn from only Positive and Unlabeled data (cPU). We theoretically derive the loss function from the PU learning setting. We perform extensive experiments on benchmark and real-world datasets. The results show that cPU consistently outperforms current state-of-the-art PU learning methods.
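The abstract does not give the cPU loss itself. For orientation, the sketch below shows a different, well-known PU baseline: the non-negative PU risk estimator (nnPU, Kiryo et al., 2017) with the sigmoid loss. It estimates the negative-class risk from unlabeled data and the class prior π, which is assumed to be known here. It illustrates why learning without explicit negatives need not reward always-true predictions. It is not the authors' method, and the scores and prior below are made up.

```python
import math
from typing import Sequence


def sigmoid_loss(score: float, label: int) -> float:
    """l(z, y) = 1 / (1 + exp(y * z)), computed stably; near 0 when sign(z) agrees with y."""
    t = label * score
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def nnpu_risk(pos_scores: Sequence[float], unl_scores: Sequence[float], prior: float) -> float:
    """Non-negative PU risk: pi * R_p^+ + max(0, R_u^- - pi * R_p^-)."""
    r_p_pos = sum(sigmoid_loss(s, +1) for s in pos_scores) / len(pos_scores)
    r_p_neg = sum(sigmoid_loss(s, -1) for s in pos_scores) / len(pos_scores)
    r_u_neg = sum(sigmoid_loss(s, -1) for s in unl_scores) / len(unl_scores)
    # Negative-class risk is estimated from unlabeled data and clipped at zero.
    return prior * r_p_pos + max(0.0, r_u_neg - prior * r_p_neg)


if __name__ == "__main__":
    prior = 0.3  # assumed fraction of positives among the unlabeled data
    always_true = nnpu_risk([10.0] * 4, [10.0] * 10, prior)
    sensible = nnpu_risk([5.0] * 4, [5.0] * 3 + [-5.0] * 7, prior)
    print(always_true, sensible)
```

A classifier that labels everything positive gets a risk of about 1 - π (here roughly 0.7). A classifier that marks about π of the unlabeled data as positive scores far lower.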

47. Diagnosis of Coronavirus Disease 2019 (COVID-19) with Structured Latent Multi-View Representation Learning [PDF]
Hengyuan Kang, Liming Xia, Fuhua Yan, Zhibin Wan, Feng Shi, Huan Yuan, Huiting Jiang, Dijia Wu, He Sui, Changqing Zhang, Dinggang Shen
Abstract: Recently, the outbreak of coronavirus disease 2019 (COVID-19) has spread rapidly across the world. Because of the large number of affected patients and the heavy workload of doctors, computer-aided diagnosis with machine learning algorithms is urgently needed. Such diagnosis could greatly reduce clinicians' efforts and accelerate the diagnosis process. Chest computed tomography (CT) has been recognized as an informative tool for diagnosing the disease. In this study, we propose to diagnose COVID-19 using a series of features extracted from CT images. To fully exploit multiple features describing CT images from different views, we learn a unified latent representation. This representation can completely encode information from the different aspects of the features and is endowed with a promising class structure for separability. Specifically, completeness is guaranteed by a group of backward neural networks (one for each type of feature). Meanwhile, class labels are used to enforce a representation that is compact within COVID-19 and within community-acquired pneumonia (CAP), and a large margin is guaranteed between the two types of pneumonia. In this way, our model largely avoids the overfitting that arises when high-dimensional features are projected directly into classes. Extensive experimental results show that the proposed method outperforms all compared methods, and its performance remains rather stable when the amount of training data varies.

48. Towards Frequency-Based Explanation for Robust CNN [PDF]
Zifan Wang, Yilin Yang, Ankit Shrivastava, Varun Rawal, Zihao Ding
Abstract: Current explanation techniques for making Convolutional Neural Networks (CNNs) transparent mainly focus on building connections between human-understandable input features and the model's predictions. They overlook an alternative representation of the input: its decomposition into frequency components. In this work, we analyze the connection between the distribution of frequency components in the input dataset and the reasoning process the model learns from the data. We further provide a quantitative analysis of the contribution of different frequency components to the model's predictions. We show that the model's vulnerability to tiny distortions results from its reliance on high-frequency features, which are the features targeted by adversarial (black-box and white-box) attackers. We further show that if the model develops a stronger association between low-frequency components and true labels, it is more robust. This explains why adversarially trained models are more robust against tiny distortions.
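The abstract does not fix how inputs are split into frequency components. One common choice, assumed here, is a radial low-pass mask on the centred 2-D FFT, with the high-frequency part taken as the residual so that the two parts sum back to the image. The function name and the `radius` parameter are illustrative.

```python
import numpy as np


def split_frequencies(image: np.ndarray, radius: float):
    """Return (low, high) parts of a 2-D image, using a circular mask of the given
    radius around the zero frequency of the shifted spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    rows, cols = image.shape
    yy, xx = np.ogrid[:rows, :cols]
    cy, cx = rows // 2, cols // 2  # position of the zero frequency after fftshift
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = image - low  # residual holds everything outside the mask
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    low, high = split_frequencies(img, radius=4)
    print(np.allclose(low + high, img), float(np.abs(high).mean()))
```

A frequency-based probe in the spirit of the paper could feed `low` and `high` to a trained classifier separately and compare its predictions. That experiment design is our assumption, not the authors' protocol.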

49. Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting [PDF]
Po-Yao Huang, Junjie Hu, Xiaojun Chang, Alexander Hauptmann
Abstract: Unsupervised machine translation (MT) has recently achieved impressive results using only monolingual corpora. However, it is still challenging to associate source and target sentences in the latent space. Since people who speak different languages biologically share similar visual systems, achieving better alignment through visual content is promising yet under-explored in unsupervised multimodal MT (MMT). In this paper, we investigate how to use visual content for disambiguation and for promoting latent space alignment in unsupervised MMT. Our model employs multimodal back-translation and features pseudo visual pivoting. In pseudo visual pivoting, we learn a shared multilingual visual-semantic embedding space and incorporate visually pivoted captioning as additional weak supervision. Experimental results on the widely used Multi30K dataset show that the proposed model significantly improves over state-of-the-art methods and generalizes well when images are not available at test time.

50. Diagnosing the Environment Bias in Vision-and-Language Navigation [PDF]
Yubo Zhang, Hao Tan, Mohit Bansal
Abstract: Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations. These step-by-step navigational instructions are crucial when the agent navigates new environments about which it has no prior knowledge. Most recent works that study VLN observe a significant performance drop when agents are tested on unseen environments (i.e., environments not used in training). This indicates that neural agent models are highly biased towards the training environments. Although this issue is considered one of the major challenges in VLN research, it is still under-studied and needs a clearer explanation. In this work, we design novel diagnostic experiments via environment re-splitting and feature replacement to look into possible reasons for this environment bias. We observe that the language and the underlying navigational graph are not responsible. Instead, the low-level visual appearance conveyed by ResNet features directly affects the agent model and contributes to this environment bias. Based on this observation, we explore several kinds of semantic representations that contain less low-level visual information, so that an agent trained with these features generalizes better to unseen test environments. Without modifying the baseline agent model or its training method, our explored semantic features significantly reduce the performance gap between seen and unseen environments on multiple datasets (i.e., R2R, R4R, and CVDN). They also achieve unseen results competitive with previous state-of-the-art models. Our code and features are available at: this https URL

51. Line Artefact Quantification in Lung Ultrasound Images of COVID-19 Patients via Non-Convex Regularisation [PDF]
Oktay Karakuş, Nantheera Anantrasirichai, Amazigh Aguersif, Stein Silva, Adrian Basarab, Alin Achim
Abstract: In this paper, we present a novel method for quantifying line artefacts in lung ultrasound (LUS) images of COVID-19 patients. We formulate this as a non-convex regularisation problem involving a sparsity-enforcing, Cauchy-based penalty function and the inverse Radon transform. We employ a simple local-maxima detection technique in the Radon transform domain, associated with known clinical definitions of line artefacts. Despite being non-convex, the proposed method has guaranteed convergence via a proximal splitting algorithm and accurately identifies both horizontal and vertical line artefacts in LUS images. To reduce the number of false and missed detections, our method includes a two-stage validation mechanism, performed in both the Radon and image domains. We compare the proposed method with the current state-of-the-art B-line identification method and show a considerable performance gain, with 87% of B-lines correctly detected in LUS images of nine COVID-19 patients. In addition, owing to its fast convergence, which takes around 12 seconds per frame, the proposed method is readily applicable to processing LUS image sequences.

52. CovidCTNet: An Open-Source Deep Learning Approach to Identify Covid-19 Using CT Image [PDF]
Tahereh Javaheri, Morteza Homayounfar, Zohreh Amoozgar, Reza Reiazi, Fatemeh Homayounieh, Engy Abbas, Azadeh Laali, Amir Reza Radmard, Mohammad Hadi Gharib, Seyed Ali Javad Mousavi, Omid Ghaemi, Rosa Babaei, Hadi Karimi Mobin, Mehdi Hosseinzadeh, Rana Jahanban-Esfahlan, Khaled Seidi, Mannudeep K. Kalra, Guanglan Zhang, L.T. Chitkushev, Benjamin Haibe-Kains, Reza Malekzadeh, Reza Rawassizadeh
Abstract: Coronavirus disease 2019 (Covid-19) is highly contagious, with limited treatment options. Early and accurate diagnosis of Covid-19 is crucial in reducing the spread of the disease and its associated mortality. Currently, detection by reverse transcriptase polymerase chain reaction (RT-PCR) is the gold standard for outpatient and inpatient detection of Covid-19. RT-PCR is a rapid method; however, its detection accuracy is only ~70-75%. Another approved strategy is computed tomography (CT) imaging. CT imaging has a much higher sensitivity of ~80-98%, but a similar accuracy of 70%. To enhance the accuracy of CT imaging detection, we developed an open-source set of algorithms called CovidCTNet. It successfully differentiates Covid-19 from community-acquired pneumonia (CAP) and other lung diseases. CovidCTNet increases the accuracy of CT imaging detection to 90%, compared with 70% for radiologists. The model is designed to work with heterogeneous and small sample sizes, independent of the CT imaging hardware. To facilitate the detection of Covid-19 globally and assist radiologists and physicians in the screening process, we are releasing all algorithms and parametric details in an open-source format. Open-source sharing of CovidCTNet enables developers to rapidly improve and optimize services while preserving user privacy and data ownership.



【arXiv Papers】 Computer Vision and Pattern Recognition 2020-05-08

Contents

Abstracts