Contents
11. Semantics through Time: Semi-supervised Segmentation of Aerial Videos with Iterative Label Propagation [PDF] Abstract
12. Joint Scene and Object Tracking for Cost-Effective Augmented Reality Assisted Patient Positioning in Radiation Therapy [PDF] Abstract
14. MetaBox+: A new Region Based Active Learning Method for Semantic Segmentation using Priority Maps [PDF] Abstract
19. Painting Outside as Inside: Edge Guided Image Outpainting via Bidirectional Rearrangement with Step-By-Step Learning [PDF] Abstract
22. A Review of Vegetation Encroachment Detection in Power Transmission Lines using Optical Sensing Satellite Imagery [PDF] Abstract
23. Revisiting Batch Normalization for Training Low-latency Deep Spiking Neural Networks from Scratch [PDF] Abstract
25. MetaDetect: Uncertainty Quantification and Prediction Quality Estimates for Object Detection [PDF] Abstract
36. Generalized Two-Dimensional Quaternion Principal Component Analysis with Weighting for Color Image Recognition [PDF] Abstract
37. MDReg-Net: Multi-resolution diffeomorphic image registration using fully convolutional networks with deep self-supervision [PDF] Abstract
38. LEGAN: Disentangled Manipulation of Directional Lighting and Facial Expressions by Leveraging Human Perceptual Judgements [PDF] Abstract
40. MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network [PDF] Abstract
41. Early Bird: Loop Closures from Opposing Viewpoints for Perceptually-Aliased Indoor Environments [PDF] Abstract
43. Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment [PDF] Abstract
44. A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition [PDF] Abstract
48. Bounding Boxes Are All We Need: Street View Image Classification via Context Encoding of Detected Buildings [PDF] Abstract
51. Generating the Cloud Motion Winds Field from Satellite Cloud Imagery Using Deep Learning Approach [PDF] Abstract
53. UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks Compression and Acceleration [PDF] Abstract
57. A Deep Genetic Programming based Methodology for Art Media Classification Robust to Adversarial Perturbations [PDF] Abstract
58. Stuttering Speech Disfluency Prediction using Explainable Attribution Vectors of Facial Muscle Movements [PDF] Abstract
59. Video Saliency Detection with Domain Adaption using Hierarchical Gradient Reversal Layers [PDF] Abstract
62. Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images [PDF] Abstract
63. Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views [PDF] Abstract
66. MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand Pose Synthesis [PDF] Abstract
67. Semantics-Guided Clustering with Deep Progressive Learning for Semi-Supervised Person Re-identification [PDF] Abstract
71. Machine Learning and Computer Vision Techniques to Predict Thermal Properties of Particulate Composites [PDF] Abstract
72. A Comparative Study of Existing and New Deep Learning Methods for Detecting Knee Injuries using the MRNet Dataset [PDF] Abstract
73. Unsupervised Region-based Anomaly Detection in Brain MRI with Adversarial Image Inpainting [PDF] Abstract
75. Quantifying Statistical Significance of Neural Network Representation-Driven Hypotheses by Selective Inference [PDF] Abstract
76. DCT-SNN: Using DCT to Distribute Spatial Information over Time for Learning Low-Latency Spiking Neural Networks [PDF] Abstract
83. KiU-Net: Overcomplete Convolutional Architectures for Biomedical Image and Volumetric Segmentation [PDF] Abstract
85. AIFNet: Automatic Vascular Function Estimation for Perfusion Analysis Using Deep Learning [PDF] Abstract
88. Towards Cross-modality Medical Image Segmentation with Online Mutual Knowledge Distillation [PDF] Abstract
89. Async-RED: A Provably Convergent Asynchronous Block Parallel Stochastic Method using Deep Denoising Priors [PDF] Abstract
90. autoTICI: Automatic Brain Tissue Reperfusion Scoring on 2D DSA Images of Acute Ischemic Stroke Patients [PDF] Abstract
91. Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation [PDF] Abstract
Abstracts
1. Mind the Pad -- CNNs can Develop Blind Spots [PDF] Back to Contents
Bilal Alsallakh, Narine Kokhlikyan, Vivek Miglani, Jun Yuan, Orion Reblitz-Richardson
Abstract: We show how feature maps in convolutional networks are susceptible to spatial bias. Due to a combination of architectural choices, the activation at certain locations is systematically elevated or weakened. The major source of this bias is the padding mechanism. Depending on several aspects of convolution arithmetic, this mechanism can apply the padding unevenly, leading to asymmetries in the learned weights. We demonstrate how such bias can be detrimental to certain tasks such as small object detection: the activation is suppressed if the stimulus lies in the impacted area, leading to blind spots and misdetection. We propose solutions to mitigate spatial bias and demonstrate how they can improve model accuracy.
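To make the described padding asymmetry concrete, here is a minimal PyTorch sketch (not the authors' code): with a constant input, an all-ones kernel, stride 2 and an even input size, the zero padding is consumed only on the top/left border, so corner activations differ.

```python
import torch
import torch.nn as nn

# Minimal illustration of uneven padding: kernel 3, stride 2, padding 1 on an
# 8x8 input pads only the top/left side, so opposite corners see different
# numbers of valid input pixels.
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1, bias=False)
nn.init.constant_(conv.weight, 1.0)      # all-ones kernel
x = torch.ones(1, 1, 8, 8)               # constant input
with torch.no_grad():
    y = conv(x)[0, 0]
print(y[0, 0].item(), y[-1, -1].item())  # 4.0 vs. 9.0 -> spatial bias
```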
2. Ego-Motion Alignment from Face Detections for Collaborative Augmented Reality [PDF] Back to Contents
Branislav Micusik, Georgios Evangelidis
Abstract: Sharing virtual content among multiple smart glasses wearers is an essential feature of a seamless Collaborative Augmented Reality experience. To enable the sharing, local coordinate systems of the underlying 6D ego-pose trackers, running independently on each set of glasses, have to be spatially and temporally aligned with respect to each other. In this paper, we propose a novel lightweight solution for this problem, which is referred to as ego-motion alignment. We show that detecting each other's face or glasses together with tracker ego-poses sufficiently conditions the problem to spatially relate local coordinate systems. Importantly, the detected glasses can serve as reliable anchors to bring sufficient accuracy for the targeted practical use. The proposed idea allows us to abandon the traditional visual localization step with fiducial markers or scene points as anchors. A novel closed-form minimal solver that solves a Quadratic Eigenvalue Problem is derived, and its refinement with Gaussian Belief Propagation is introduced. Experiments validate the presented approach and show its high practical potential.
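The minimal solver itself is in the paper; as generic background only, a Quadratic Eigenvalue Problem (lam^2 M + lam C + K) x = 0 is commonly reduced to a standard eigenvalue problem of twice the size via companion linearization, as in this sketch (assuming M is invertible):

```python
import numpy as np

# Generic companion linearization of a QEP, not the paper's solver:
# (lam^2*M + lam*C + K) x = 0  becomes  A v = lam v  with v = [x; lam*x].
def solve_qep(M, C, K):
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-Minv @ K,        -Minv @ C]])
    eigvals, eigvecs = np.linalg.eig(A)
    return eigvals, eigvecs[:n]          # eigenvalues lam and vectors x

M = np.eye(2)
C = np.diag([0.1, 0.2])
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
lams, xs = solve_qep(M, C, K)
```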
3. TrueImage: A Machine Learning Algorithm to Improve the Quality of Telehealth Photos [PDF] Back to Contents
Kailas Vodrahalli, Roxana Daneshjou, Roberto A Novoa, Albert Chiou, Justin M Ko, James Zou
Abstract: Telehealth is an increasingly critical component of the health care ecosystem, especially due to the COVID-19 pandemic. Rapid adoption of telehealth has exposed limitations in the existing infrastructure. In this paper, we study and highlight photo quality as a major challenge in the telehealth workflow. We focus on teledermatology, where photo quality is particularly important; the framework proposed here can be generalized to other health domains. For telemedicine, dermatologists request that patients submit images of their lesions for assessment. However, these images are often of insufficient quality to make a clinical diagnosis since patients do not have experience taking clinical photos. A clinician has to manually triage poor quality images and request new images to be submitted, leading to wasted time for both the clinician and the patient. We propose an automated image assessment machine learning pipeline, TrueImage, to detect poor quality dermatology photos and to guide patients in taking better photos. Our experiments indicate that TrueImage can reject 50% of the sub-par quality images, while retaining 80% of good quality images patients send in, despite heterogeneity and limitations in the training data. These promising results suggest that our solution is feasible and can improve the quality of teledermatology care.
4. Non-anchor-based vehicle detection for traffic surveillance using bounding ellipses [PDF] Back to Contents
Byeonghyeop Yu, Johyun Shin, Gyeongjun Kim, Seungbin Roh, Keemin Sohn
Abstract: Cameras for traffic surveillance are usually pole-mounted and produce images that reflect a bird's-eye view. Vehicles in such images, in general, assume an elliptical shape. A bounding box for the vehicles usually includes a large empty space when the vehicle orientation is not parallel to the edges of the box. To circumvent this problem, the present study applied bounding ellipses to a non-anchor-based, single-shot detection model (CenterNet). Since this model does not depend on anchor boxes, non-max suppression (NMS), which requires computing the intersection over union (IOU) between predicted bounding boxes, is unnecessary for inference. The SpotNet model, which extends CenterNet by adding a segmentation head, was also tested with bounding ellipses. Two other anchor-based, single-shot detection models (YOLO4 and SSD) were chosen as references for comparison. The model performance was compared based on a local dataset that was doubly annotated with bounding boxes and ellipses. As a result, the performance of the two models with bounding ellipses exceeded that of the reference models with bounding boxes. When the backbone of the ellipse models was pretrained on an open dataset (UA-DETRAC), the performance was further enhanced. The data augmentation schemes developed for YOLO4 also improved the performance of the proposed models. As a result, the best mAP score of a CenterNet with bounding ellipses exceeds 0.9.
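As a back-of-the-envelope illustration (mine, not from the paper) of why ellipses waste less space than boxes:

```python
import math

# An axis-aligned box of size w x h has area w*h; the inscribed ellipse
# covers only pi/4 (about 78.5%) of it, and the gap grows further for
# rotated, elongated vehicles, motivating ellipse targets.
w, h = 4.0, 2.0
print(math.pi / 4 * w * h, w * h)    # 6.28... vs. 8.0
```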
5. Probabilistic 3D surface reconstruction from sparse MRI information [PDF] Back to Contents
Katarína Tóthová, Sarah Parisot, Matthew Lee, Esther Puyol-Antón, Andrew King, Marc Pollefeys, Ender Konukoglu
Abstract: Surface reconstruction from magnetic resonance (MR) imaging data is indispensable in medical image analysis and clinical research. A reliable and effective reconstruction tool should: be fast in predicting accurate, well-localised and high-resolution models; evaluate prediction uncertainty; and work with as little input data as possible. Current state-of-the-art (SOTA) deep learning 3D reconstruction methods, however, often only produce shapes of limited variability positioned in a canonical position, or lack uncertainty evaluation. In this paper, we present a novel probabilistic deep learning approach for concurrent 3D surface reconstruction from sparse 2D MR image data and aleatoric uncertainty prediction. Our method is capable of reconstructing large surface meshes from three quasi-orthogonal MR imaging slices from limited training sets whilst modelling the location of each mesh vertex through a Gaussian distribution. Prior shape information is encoded using a built-in linear principal component analysis (PCA) model. Extensive experiments on cardiac MR data show that our probabilistic approach successfully assesses prediction uncertainty while at the same time qualitatively and quantitatively outperforming SOTA methods in shape prediction. Compared to SOTA, we are capable of properly localising and orientating the prediction via the use of a spatially aware neural network.
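A hedged sketch of the built-in linear PCA shape model the abstract mentions; the names and shapes below are assumptions for illustration, not the authors' API:

```python
import numpy as np

# Linear PCA shape model: a mesh with V vertices is decoded as
# mean + components @ coeffs, reshaped to (V, 3) vertex coordinates.
def decode_shape(mean, components, coeffs):
    # mean: (3V,), components: (3V, k), coeffs: (k,)
    return (mean + components @ coeffs).reshape(-1, 3)

V, k = 1000, 16
verts = decode_shape(np.zeros(3 * V), np.random.randn(3 * V, k), np.zeros(k))
```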
6. BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer [PDF] Back to Contents
Or Patashnik, Dov Danon, Hao Zhang, Daniel Cohen-Or
Abstract: State-of-the-art image-to-image translation methods tend to struggle in an imbalanced domain setting, where one image domain lacks richness and diversity. We introduce a new unsupervised translation network, BalaGAN, specifically designed to tackle the domain imbalance problem. We leverage the latent modalities of the richer domain to turn the image-to-image translation problem, between two imbalanced domains, into a balanced, multi-class, and conditional translation problem, more resembling the style transfer setting. Specifically, we analyze the source domain and learn a decomposition of it into a set of latent modes or classes, without any supervision. This leaves us with a multitude of balanced cross-domain translation tasks, between all pairs of classes, including the target domain. During inference, the trained network takes as input a source image, as well as a reference or style image from one of the modes as a condition, and produces an image which resembles the source on the pixel-wise level, but shares the same mode as the reference. We show that employing modalities within the dataset improves the quality of the translated images, and that BalaGAN outperforms strong baselines of both unconditioned and style-transfer-based image-to-image translation methods, in terms of image quality and diversity.
7. A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning [PDF] Back to Contents
Ruchika Chavhan, Biplab Banerjee, Xiao Xiang Zhu, Subhasis Chaudhuri
Abstract: We deal with the problem of generating textual captions from optical remote sensing (RS) images using the notion of deep reinforcement learning. Due to the high inter-class similarity in reference sentences describing remote sensing data, jointly encoding the sentences and images encourages prediction of captions that are semantically more precise than the ground truth in many cases. To this end, we introduce an Actor Dual-Critic training strategy where a second critic model is deployed in the form of an encoder-decoder RNN to encode the latent information corresponding to the original and generated captions. While all actor-critic methods use an actor to predict sentences for an image and a critic to provide rewards, our proposed encoder-decoder RNN guarantees high-level comprehension of images by sentence-to-image translation. We observe that the proposed model generates sentences on the test data highly similar to the ground truth and is successful in generating even better captions in many critical cases. Extensive experiments on the benchmark Remote Sensing Image Captioning Dataset (RSICD) and the UCM-captions dataset confirm the superiority of the proposed approach in comparison to the previous state-of-the-art, where we obtain sharp gains in both the ROUGE-L and CIDEr measures.
8. EqCo: Equivalent Rules for Self-supervised Contrastive Learning [PDF] Back to Contents
Benjin Zhu, Junqiang Huang, Zeming Li, Xiangyu Zhang, Jian Sun
Abstract: In this paper, we propose a method, named EqCo (Equivalent Rules for Contrastive Learning), to make self-supervised learning irrelevant to the number of negative samples in the contrastive learning framework. Inspired by the infomax principle, we point out that the margin term in the contrastive loss needs to be adaptively scaled according to the number of negative pairs in order to keep the mutual information bound and gradient magnitude steady. EqCo bridges the performance gap among a wide range of negative sample sizes, so that, for the first time, we can perform self-supervised contrastive training using only a few negative pairs (e.g. smaller than 256 per query) on large-scale vision tasks like ImageNet, with little accuracy drop. This is in sharp contrast to the widely used large-batch training or memory-bank mechanisms in current practice. Equipped with EqCo, our simplified MoCo (SiMo) achieves accuracy comparable to MoCo v2 on ImageNet (linear evaluation protocol) while involving only 16 negative pairs per query instead of 65536, suggesting that large quantities of negative samples might not be a critical factor in contrastive learning frameworks.
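One plausible instantiation of the adaptive rule the abstract describes, sketched below: the negative term of InfoNCE is rescaled by alpha/K so that training with K negatives behaves like a reference setting with alpha negatives. Consult the paper for the exact EqCo formulation.

```python
import torch

# Hedged sketch: InfoNCE with the negative sum rescaled by alpha/K.
# q: (B, d) queries, k_pos: (B, d) positive keys, k_neg: (K, d) negatives.
def eqco_infonce(q, k_pos, k_neg, tau=0.1, alpha=256):
    K = k_neg.shape[0]
    pos = (q * k_pos).sum(-1) / tau                      # (B,) positive logits
    neg = q @ k_neg.t() / tau                            # (B, K) negative logits
    denom = pos.exp() + (alpha / K) * neg.exp().sum(-1)  # rescaled negatives
    return (-pos + denom.log()).mean()
```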
9. Automatic label correction based on CCESD [PDF] Back to Contents
Jiawei Liu, Qiang Wang, Huijie Fan, Yandong Tang
Abstract: In the computer-aided diagnosis of cervical precancerous lesions, accurate cell segmentation is essential. For a cervical cell image with multi-cell overlap (n>3), a blurry and noisy background, and low contrast, it is difficult for a professional doctor to obtain an ultra-high-precision labeled image. On the other hand, it is possible for the annotator to draw the outline of the cell as accurately as possible. However, if the label edge position is inaccurate, the accuracy of the trained model will decrease, and this will have a great impact on the accuracy of model evaluation. We designed an automatic label correction algorithm based on gradient guidance, which can compensate for poor edge position accuracy and for differences between annotators during manual labeling. At the same time, an open cervical cell edge segmentation dataset (CCESD) with higher labeling accuracy was constructed. We also use deep learning models to generate baseline performance on CCESD. Training multiple models with the corrected labeling data improves average precision (AP) by 7% compared to the original labeling data. The implementation is available at this https URL.
10. Best Buddies Registration for Point Clouds [PDF] Back to Contents
Amnon Drory, Tal Shomer, Shai Avidan, Raja Giryes
Abstract: We propose new, and robust, loss functions for the point cloud registration problem. Our loss functions are inspired by the Best Buddies Similarity (BBS) measure that counts the number of mutual nearest neighbors between two point sets. This measure has been shown to be robust to outliers and missing data in the case of template matching for images. We present several algorithms, collectively named Best Buddy Registration (BBR), where each algorithm consists of optimizing one of these loss functions with Adam gradient descent. The loss functions differ in several ways, including the distance function used (point-to-point vs. point-to-plane), and how the BBS measure is combined with the actual distances between pairs of points. Experiments on various data sets, both synthetic and real, demonstrate the effectiveness of the BBR algorithms, showing that they are quite robust to noise, outliers, and distractors, and cope well with extremely sparse point clouds. One variant, BBR-F, achieves state-of-the-art accuracy in the registration of automotive lidar scans taken up to several seconds apart, from the KITTI and Apollo-Southbay datasets.
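The underlying Best Buddies count is simple to state; here is a minimal sketch (the paper's loss functions build on this measure, e.g., in differentiable form):

```python
import numpy as np

# Count mutual nearest neighbours between point sets P (n, 3) and Q (m, 3):
# (p_i, q_j) are "best buddies" when each is the other's nearest neighbour.
def best_buddies_count(P, Q):
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # (n, m)
    nn_pq = d.argmin(axis=1)    # nearest Q index for each P point
    nn_qp = d.argmin(axis=0)    # nearest P index for each Q point
    return int(np.sum(nn_qp[nn_pq] == np.arange(len(P))))
```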
11. Semantics through Time: Semi-supervised Segmentation of Aerial Videos with Iterative Label Propagation [PDF] Back to Contents
Alina Marcu, Vlad Licaret, Dragos Costea, Marius Leordeanu
Abstract: Semantic segmentation is a crucial task for robot navigation and safety. However, current supervised methods require a large amount of pixelwise annotations to yield accurate results. Labeling is a tedious and time consuming process that has hampered progress in low altitude UAV applications. This paper makes an important step towards automatic annotation by introducing SegProp, a novel iterative flow-based method, with a direct connection to spectral clustering in space and time, to propagate the semantic labels to frames that lack human annotations. The labels are further used in semi-supervised learning scenarios. Motivated by the lack of a large video aerial dataset, we also introduce Ruralscapes, a new dataset with high resolution (4K) images and manually-annotated dense labels every 50 frames - the largest of its kind, to the best of our knowledge. Our novel SegProp automatically annotates the remaining unlabeled 98% of frames with an accuracy exceeding 90% (F-measure), significantly outperforming other state-of-the-art label propagation methods. Moreover, when integrating other methods as modules inside SegProp's iterative label propagation loop, we achieve a significant boost over the baseline labels. Finally, we test SegProp in a full semi-supervised setting: we train several state-of-the-art deep neural networks on the SegProp-automatically-labeled training frames and test them on completely novel videos. We convincingly demonstrate, every time, a significant improvement over the supervised scenario.
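The basic operation behind flow-based label propagation is warping labels along optical flow; a minimal nearest-neighbour sketch follows (SegProp's iterative voting and its connection to spectral clustering are in the paper):

```python
import numpy as np

# Backward-warp a label map along optical flow (flow[..., 0] = x, [..., 1] = y):
# each target pixel takes the label at its flow-displaced source location.
def warp_labels(labels, flow):
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip((ys - flow[..., 1]).round().astype(int), 0, h - 1)
    src_x = np.clip((xs - flow[..., 0]).round().astype(int), 0, w - 1)
    return labels[src_y, src_x]
```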
12. Joint Scene and Object Tracking for Cost-Effective Augmented Reality Assisted Patient Positioning in Radiation Therapy [PDF] Back to Contents
Hamid Sarmadi, Rafael Muñoz-Salinas, M. Álvaro Berbís, Antonio Luna, R. Medina-Carnicer
Abstract: Background and Objective: The research done in the field of Augmented Reality (AR) for patient positioning in radiation therapy is scarce. We propose an efficient and cost-effective algorithm for tracking the scene and the patient to interactively assist the patient's positioning process by providing visual feedback to the operator. Methods: We have taken advantage of the marker mapper algorithm combined with other steps including generalized ICP to track the patient. We track the environment using the UcoSLAM algorithm. The alignment between the 3D reference model and body marker map is calculated employing our efficient body reconstruction algorithm. Results: Our quantitative evaluation shows that we were able to achieve an average rotational error of 1.77 deg and a translational error of 7.28 mm. Our algorithm performed with an average frame rate of 19 fps. Furthermore, the qualitative results demonstrate the usefulness of our algorithm in patient positioning on different human subjects. Conclusion: Since our algorithm achieves a relatively high frame rate and accuracy without the usage of a dedicated GPU, employing a regular laptop, it is a very cost-effective AR-based patient positioning method.
13. Joint Pruning & Quantization for Extremely Sparse Neural Networks [PDF] Back to Contents
Po-Hsiang Yu, Sih-Sian Wu, Jan P. Klopp, Liang-Gee Chen, Shao-Yi Chien
Abstract: We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity for quantized networks to enable implementation on low cost and low power accelerator hardware. In a practical scenario, there are particularly many applications for dense prediction tasks, hence we choose stereo depth estimation as target. We propose a two stage pruning and quantization pipeline and introduce a Taylor Score alongside a new fine-tuning mode to achieve extreme sparsity without sacrificing performance. Our evaluation does not only show that pruning and quantization should be investigated jointly, but also shows that almost 99% of memory demand can be cut while hardware costs can be reduced up to 99.9%. In addition, to compare with other works, we demonstrate that our pruning stage alone beats the state-of-the-art when applied to ResNet on CIFAR10 and ImageNet.
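For reference, a common first-order Taylor importance criterion is sketched below; the paper's exact Taylor Score may differ in normalization or grouping:

```python
import torch

# First-order Taylor importance: the loss change from zeroing a weight is
# approximated by |w * dL/dw|. Call after loss.backward() has filled grads.
def taylor_scores(model):
    return {name: (p * p.grad).abs()
            for name, p in model.named_parameters() if p.grad is not None}
```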
14. MetaBox+: A new Region Based Active Learning Method for Semantic Segmentation using Priority Maps [PDF] Back to Contents
Pascal Colling, Lutz Roese-Koerner, Hanno Gottschalk, Matthias Rottmann
Abstract: We present a novel region-based active learning method for semantic image segmentation, called MetaBox+. For acquisition, we train a meta regression model to estimate the segment-wise Intersection over Union (IoU) of each predicted segment of unlabeled images. This can be understood as an estimation of segment-wise prediction quality. Queried regions are supposed to minimize two competing targets, i.e., low predicted IoU values / segmentation quality and low estimated annotation costs. For estimating the latter, we propose a simple but practical method for annotation cost estimation. We compare our method to entropy-based methods, where we consider the entropy as the uncertainty of the prediction. The comparison and analysis of the results provide insights into annotation costs as well as the robustness and variance of the methods. Numerical experiments conducted with two different networks on the Cityscapes dataset clearly demonstrate a reduction of annotation effort compared to random acquisition. Notably, using MetaBox+ we achieve 95% of the mean Intersection over Union (mIoU) obtained when training with the full dataset, with only 10.47% / 32.01% annotation effort for the two networks, respectively.
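The target of the meta regression model, segment-wise IoU, is computed as in this minimal sketch:

```python
import numpy as np

# IoU of a predicted segment mask against ground truth; this is the
# quantity the meta regression model learns to estimate without labels.
def segment_iou(pred_mask, gt_mask):
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter) / union if union else 0.0
```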
15. Monocular Rotational Odometry with Incremental Rotation Averaging and Loop Closure [PDF] Back to Contents
Chee-Kheng Chng, Alvaro Parra, Tat-Jun Chin, Yasir Latif
Abstract: Estimating absolute camera orientations is essential for attitude estimation tasks. An established approach is to first carry out visual odometry (VO) or visual SLAM (V-SLAM), and retrieve the camera orientations (3 DOF) from the camera poses (6 DOF) estimated by VO or V-SLAM. One drawback of this approach, besides the redundancy in estimating full 6 DOF camera poses, is the dependency on estimating a map (3D scene points) jointly with the 6 DOF poses due to the basic constraint on structure-and-motion. To simplify the task of absolute orientation estimation, we formulate the monocular rotational odometry problem and devise a fast algorithm to accurately estimate camera orientations with 2D-2D feature matches alone. Underpinning our system is a new incremental rotation averaging method for fast, constant-time iterative updating. Furthermore, our system maintains a view-graph that 1) allows solving loop closure to remove camera orientation drift, and 2) can be used to warm start a V-SLAM system. We conduct extensive quantitative experiments on real-world datasets to demonstrate the accuracy of our incremental camera orientation solver. Finally, we showcase the benefit of our algorithm to V-SLAM: 1) solving the known rotation problem to estimate the trajectory of the camera and the surrounding map, and 2) enabling V-SLAM systems to track pure rotational motions.
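As background for rotation averaging (the paper's incremental, view-graph version is more elaborate), the L2-chordal mean of a set of rotations is the projection of their arithmetic mean onto SO(3):

```python
import numpy as np

# Chordal mean of 3x3 rotation matrices via SVD projection onto SO(3).
def chordal_mean(rotations):
    U, _, Vt = np.linalg.svd(sum(rotations) / len(rotations))
    R = U @ Vt
    if np.linalg.det(R) < 0:             # enforce a proper rotation (det +1)
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R
```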
16. AE-Netv2: Optimization of Image Fusion Efficiency and Network Architecture [PDF] Back to Contents
Aiqing Fang, Xinbo Zhao, Jiaqi Yang, Beibei Qin, Yanning Zhang
Abstract: Existing image fusion methods pay little research attention to image fusion efficiency and network architecture. However, the efficiency and accuracy of image fusion have an important impact in practical applications. To solve this problem, we propose an efficient autonomous evolution image fusion method, dubbed AE-Netv2. Different from other image fusion methods based on deep learning, AE-Netv2 is inspired by the cognitive mechanism of the human brain. Firstly, we discuss the influence of different network architectures on image fusion quality and fusion efficiency, which provides a reference for the design of image fusion architectures. Secondly, we explore the influence of the pooling layer on the image fusion task and propose an image fusion method with a pooling layer. Finally, we explore the commonness and characteristics of different image fusion tasks, which provides a research basis for further research on the continuous learning characteristics of the human brain in the field of image fusion. Comprehensive experiments demonstrate the superiority of AE-Netv2 compared with state-of-the-art methods in different fusion tasks at a real-time speed of 100+ FPS on a GTX 2070. Among all tested methods based on deep learning, AE-Netv2 has the fastest speed, the smallest model size and the best robustness.
17. Depth-wise layering of 3d images using dense depth maps: a threshold based approach [PDF] 返回目录
Seyedsaeid Mirkamali, P. Nagabhushan
Abstract: Image segmentation has long been a basic problem in computer vision. Depth-wise layering is a kind of segmentation that slices an image in a depth-wise sequence, unlike conventional image segmentation problems that deal with surface-wise decomposition. The proposed depth-wise layering technique uses a single depth image of a static scene to slice it into multiple layers. The technique employs a thresholding approach to segment the rows of the dense depth map into smaller partitions, called line-segments in this paper. It then uses a line-segment labelling method to identify the number of objects and layers of the scene independently. The final stage is to link the objects of the scene to their respective object-layers. We evaluate the efficiency of the proposed technique by applying it to many images along with their dense depth maps. The experiments show promising layering results.
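Because the method is described procedurally, a toy version of the row-thresholding step is easy to write down. The jump threshold, the number of depth bins, and the segment representation below are placeholder choices of mine, not values from the paper.

```python
import numpy as np

def layer_depth_map(depth, jump_thresh=0.1, n_layers=5):
    """Split each row of a dense depth map into line-segments wherever
    the depth jumps by more than jump_thresh, then bucket segments into
    layers by their mean depth (toy illustration)."""
    segments = []                                  # (row, start, end, mean depth)
    for r, row in enumerate(depth):
        breaks = np.where(np.abs(np.diff(row)) > jump_thresh)[0] + 1
        for seg in np.split(np.arange(row.size), breaks):
            segments.append((r, seg[0], seg[-1], row[seg].mean()))
    means = np.array([s[3] for s in segments])
    edges = np.histogram_bin_edges(means, bins=n_layers)[1:-1]
    layer_of = np.digitize(means, edges)           # layer id per segment
    return segments, layer_of
```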
18. Class-Wise Difficulty-Balanced Loss for Solving Class-Imbalance [PDF] 返回目录
Saptarshi Sinha, Hiroki Ohashi, Katsuyuki Nakamura
Abstract: Class-imbalance is one of the major challenges in real world datasets, where a few classes (called majority classes) comprise many more data samples than the rest (called minority classes). Learning deep neural networks using such datasets leads to performances that are typically biased towards the majority classes. Most of the prior works try to solve class-imbalance by assigning more weights to the minority classes in various manners (e.g., data re-sampling, cost-sensitive learning). However, we argue that the number of available training data may not always be a good clue to determine the weighting strategy, because some of the minority classes might be sufficiently represented even by a small number of training data. Overweighting samples of such classes can lead to a drop in the model's overall performance. We claim that the 'difficulty' of a class as perceived by the model is more important in determining the weighting. In this light, we propose a novel loss function named Class-wise Difficulty-Balanced loss, or CDB loss, which dynamically distributes weights to each sample according to the difficulty of the class that the sample belongs to. Note that the assigned weights dynamically change, as the 'difficulty' for the model may change with the learning progress. Extensive experiments are conducted on both image (artificially induced class-imbalanced MNIST, long-tailed CIFAR and ImageNet-LT) and video (EGTEA) datasets. The results show that CDB loss consistently outperforms recently proposed loss functions on class-imbalanced datasets irrespective of the data type (i.e., video or image).
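A hedged sketch of what a class-wise difficulty-balanced weighting could look like follows. The difficulty definition (one minus held-out accuracy), the tau exponent, and the normalization are my assumptions based on the abstract, not the authors' code.

```python
import torch
import torch.nn.functional as F

def cdb_loss(logits, targets, class_difficulty, tau=1.0):
    """Cross-entropy with per-class weights that grow with the class's
    current 'difficulty' (recomputed as training progresses)."""
    w = class_difficulty.clamp(min=1e-6) ** tau   # harder class -> larger weight
    w = w / w.sum() * w.numel()                   # keep the average weight near 1
    return F.cross_entropy(logits, targets, weight=w)

# example: 10 classes, difficulty taken as 1 - held-out per-class accuracy
acc = torch.tensor([0.9, 0.8, 0.95, 0.5, 0.3, 0.7, 0.6, 0.2, 0.85, 0.4])
loss = cdb_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)), 1.0 - acc)
```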
19. Painting Outside as Inside: Edge Guided Image Outpainting via Bidirectional Rearrangement with Step-By-Step Learning [PDF] 返回目录
Kyunghun Kim, Yeohun Yun, Keon-Woo Kang, Kyungbo Gong, Siyeong Lee, Suk-Ju Kang
Abstract: Image outpainting is a very intriguing problem, as the outside of a given image can be continuously filled in by considering the image as context. This task has two main challenges. The first is to maintain the spatial consistency between the contents of the generated regions and the original input. The second is to generate a high-quality large image from a small amount of adjacent information. Conventional image outpainting methods generate inconsistent, blurry, and repeated pixels. To alleviate the difficulty of the outpainting problem, we propose a novel image outpainting method using bidirectional boundary region rearrangement. We rearrange the image to benefit from the image inpainting task by reflecting more directional information. The bidirectional boundary region rearrangement enables the generation of the missing region using bidirectional information, similar to the image inpainting task, thereby producing higher quality than conventional methods that use unidirectional information. Moreover, we use an edge map generator that takes images as original input with structural information and hallucinates the edges of unknown regions to generate the image. Our proposed method is compared with other state-of-the-art outpainting and inpainting methods both qualitatively and quantitatively. We further compared and evaluated them using BRISQUE, one of the no-reference image quality assessment (IQA) metrics, to evaluate the naturalness of the output. The experimental results demonstrate that our method outperforms other methods and generates new images with 360° panoramic characteristics.
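One way to picture the bidirectional boundary region rearrangement is a horizontal roll that brings the image's two outer edges together around the empty band, so the region to synthesize is flanked by context on both sides, like inpainting. The snippet below is only my illustration of that geometric idea, not the paper's pipeline.

```python
import numpy as np

def rearrange_for_outpainting(img, out_width):
    """Place the image on a wider canvas and roll it so the empty band
    ends up in the middle, with the image's right edge on one side of
    the hole and its left edge on the other (illustration only)."""
    h, w, c = img.shape
    canvas = np.zeros((h, w + out_width, c), dtype=img.dtype)
    canvas[:, :w] = img
    return np.roll(canvas, -(w // 2), axis=1)
```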
20. Long-tailed Recognition by Routing Diverse Distribution-Aware Experts [PDF] 返回目录
Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu, Stella X. Yu
Abstract: Natural data are often long-tail distributed over semantic classes. Existing recognition methods tend to focus on tail performance gain, often at the expense of head performance loss from increased classifier variance. The low tail performance manifests itself in large inter-class confusion and high classifier variance. We aim to reduce both the bias and the variance of a long-tailed classifier by RoutIng Diverse Experts (RIDE). It has three components: 1) a shared architecture for multiple classifiers (experts); 2) a distribution-aware diversity loss that encourages more diverse decisions for classes with fewer training instances; and 3) an expert routing module that dynamically assigns more ambiguous instances to additional experts. With on-par computational complexity, RIDE significantly outperforms the state-of-the-art methods by 5% to 7% on all the benchmarks including CIFAR100-LT, ImageNet-LT and iNaturalist. RIDE is also a universal framework that can be applied to different backbone networks and integrated into various long-tailed algorithms and training mechanisms for consistent performance gains.
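A rough sketch of the shared-trunk, multi-expert layout with a diversity term follows. The KL-to-ensemble-mean form of the diversity loss and all sizes are my guesses from the abstract, not the RIDE implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExpertHead(nn.Module):
    """Shared features feed several classifier heads ('experts')."""
    def __init__(self, feat_dim, num_classes, num_experts=3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_experts))

    def forward(self, feats):
        return [expert(feats) for expert in self.experts]

def diversity_aware_loss(expert_logits, targets, lam=0.2):
    """Per-expert cross-entropy minus a term rewarding disagreement:
    each expert is pushed away from the ensemble-mean distribution."""
    ce = sum(F.cross_entropy(l, targets) for l in expert_logits)
    mean_p = torch.stack([F.softmax(l, dim=-1) for l in expert_logits]).mean(0)
    div = sum(F.kl_div(F.log_softmax(l, dim=-1), mean_p, reduction='batchmean')
              for l in expert_logits)
    return ce - lam * div
```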
21. MetaPhys: Unsupervised Few-Shot Adaptation for Non-Contact Physiological Measurement [PDF] 返回目录
Xin Liu, Ziheng Jiang, Josh Fromm, Xuhai Xu, Shwetak Patel, Daniel McDuff
Abstract: There are large individual differences in physiological processes, making designing personalized health sensing algorithms challenging. Existing machine learning systems struggle to generalize well to unseen subjects or contexts, especially in video-based physiological measurement. Although fine-tuning for a user might address this issue, it is difficult to collect large sets of training data for specific individuals because supervised algorithms require medical-grade sensors for generating the training target. Therefore, learning personalized or customized models from a small number of unlabeled samples is very attractive as it would allow fast calibrations. In this paper, we present a novel unsupervised meta-learning approach called MetaPhys for learning personalized cardiac signals from 18 seconds of unlabeled video data. We evaluate our proposed approach on two benchmark datasets and demonstrate superior performance in cross-dataset evaluation with substantial reductions (42% to 44%) in errors compared with state-of-the-art approaches. Visualization of attention maps and ablation experiments reveal how the model adapts to each subject and why our proposed approach leads to these improvements. We also demonstrate that our proposed method significantly helps reduce the bias across skin types.
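Test-time adaptation of the kind described could look like the few-gradient-steps loop below. The self-supervised loss is a stand-in (the paper derives pseudo-targets from a conventional, non-learning rPPG method); the step count and learning rate are arbitrary.

```python
import copy
import torch

def metaphys_like_adapt(meta_model, self_supervised_loss, clip,
                        inner_lr=1e-3, steps=3):
    """Few-shot, label-free adaptation: copy the meta-trained model and
    take a few gradient steps on one short unlabeled clip."""
    model = copy.deepcopy(meta_model)        # adapt a per-subject copy
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(steps):
        loss = self_supervised_loss(model(clip), clip)  # pseudo-target loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```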
22. A Review of Vegetation Encroachment Detection in Power Transmission Lines using Optical Sensing Satellite Imagery [PDF] 返回目录
Fathi Mahdi Elsiddig Haroun, Siti Noratiqah Mohamad Deros, Norashidah Md Din
Abstract: Vegetation encroachment on power transmission lines can cause outages, which may result in severe economic impact on power utility companies as well as consumers. Vegetation detection and monitoring along the power line corridor right-of-way (ROW) are implemented to protect power transmission lines from vegetation penetration. Various methods have been used to monitor vegetation penetration; however, most of them are too expensive and time consuming. Satellite images can play a major role in vegetation monitoring, because they can cover large spatial areas at relatively low cost. In this paper, the current techniques used to detect vegetation encroachment from satellite images are reviewed and categorized into four groups: vegetation index based methods, object-based detection methods, stereo matching based methods, and other current techniques. However, the current methods usually depend on manually setting several threshold values and parameters, which makes the detection process very static. Machine learning (ML) and deep learning (DL) algorithms can provide very high accuracy with flexibility in the detection process. Hence, in addition to reviewing the current techniques for vegetation penetration monitoring in power transmission, the potential of using machine-learning-based algorithms is also discussed.
23. Revisiting Batch Normalization for Training Low-latency Deep Spiking Neural Networks from Scratch [PDF] 返回目录
Youngeun Kim, Priyadarshini Panda
Abstract: Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning owing to sparse, asynchronous and binary event (or spike) driven processing, which can yield huge energy efficiency benefits on neuromorphic hardware. Most existing approaches to create SNNs either convert the weights from pre-trained Artificial Neural Networks (ANNs) or directly train SNNs with surrogate gradient backpropagation. Each approach presents its pros and cons. The ANN-to-SNN conversion method requires at least hundreds of time-steps for inference to yield competitive accuracy, which in turn reduces the energy savings. Training SNNs with surrogate gradients from scratch reduces the latency or total number of time-steps, but the training becomes slow/problematic and has convergence issues. Thus, the latter approach of training SNNs has been limited to shallow networks on simple datasets. To address this training issue in SNNs, we revisit batch normalization and propose a temporal Batch Normalization Through Time (BNTT) technique. Most prior SNN works have disregarded batch normalization, deeming it ineffective for training temporal SNNs. Different from previous works, our proposed BNTT decouples the parameters in a BNTT layer along the time axis to capture the temporal dynamics of spikes. The temporally evolving learnable parameters in BNTT allow a neuron to control its spike rate through different time-steps, enabling low-latency and low-energy training from scratch. We conduct experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and event-driven DVS-CIFAR10 datasets. BNTT allows us to train deep SNN architectures from scratch, for the first time, on complex datasets with just 25-30 time-steps. We also propose an early exit algorithm using the distribution of parameters in BNTT to reduce the latency at inference, which further improves the energy-efficiency.
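The time-indexed batch normalization can be sketched very compactly; keeping one full BatchNorm2d per time-step is my simplification of BNTT's time-dependent learnable parameters.

```python
import torch.nn as nn

class BNTT(nn.Module):
    """One batch norm (with its own learnable scale/shift and running
    statistics) per simulation time-step, so normalization can follow
    the temporal dynamics of the spikes."""
    def __init__(self, num_features, timesteps):
        super().__init__()
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(num_features) for _ in range(timesteps))

    def forward(self, x, t):
        return self.bns[t](x)   # select the norm belonging to time-step t
```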
24. Attention Guided Semantic Relationship Parsing for Visual Question Answering [PDF] 返回目录
Moshiur Farazi, Salman Khan, Nick Barnes
Abstract: Humans explain inter-object relationships with semantic labels that demonstrate a high-level understanding required to perform complex Vision-Language tasks such as Visual Question Answering (VQA). However, existing VQA models represent relationships as a combination of object-level visual features which constrain a model to express interactions between objects in a single domain, while the model is trying to solve a multi-modal task. In this paper, we propose a general purpose semantic relationship parser which generates a semantic feature vector for each subject-predicate-object triplet in an image, and a Mutual and Self Attention (MSA) mechanism that learns to identify relationship triplets that are important to answer the given question. To motivate the significance of semantic relationships, we show an oracle setting with ground-truth relationship triplets, where our model achieves a ~25% accuracy gain over the closest state-of-the-art model on the challenging GQA dataset. Further, with our semantic parser, we show that our model outperforms other comparable approaches on VQA and GQA datasets.
25. MetaDetect: Uncertainty Quantification and Prediction Quality Estimates for Object Detection [PDF] 返回目录
Marius Schubert, Karsten Kahl, Matthias Rottmann
Abstract: In object detection with deep neural networks, the box-wise objectness score tends to be overconfident, sometimes even indicating high confidence in presence of inaccurate predictions. Hence, the reliability of the prediction and therefore reliable uncertainties are of highest interest. In this work, we present a post processing method that for any given neural network provides predictive uncertainty estimates and quality estimates. These estimates are learned by a post processing model that receives as input a hand-crafted set of transparent metrics in form of a structured dataset. Therefrom, we learn two tasks for predicted bounding boxes. We discriminate between true positives ($\mathit{IoU}\geq0.5$) and false positives ($\mathit{IoU} < 0.5$) which we term meta classification, and we predict $\mathit{IoU}$ values directly which we term meta regression. The probabilities of the meta classification model aim at learning the probabilities of success and failure and therefore provide a modelled predictive uncertainty estimate. On the other hand, meta regression gives rise to a quality estimate. In numerical experiments, we use the publicly available YOLOv3 network and the Faster-RCNN network and evaluate meta classification and regression performance on the Kitti, Pascal VOC and COCO datasets. We demonstrate that our metrics are indeed well correlated with the $\mathit{IoU}$. For meta classification we obtain classification accuracies of up to 98.92% and AUROCs of up to 99.93%. For meta regression we obtain an $R^2$ value of up to 91.78%. These results yield significant improvements compared to other network's objectness score and other baseline approaches. Therefore, we obtain more reliable uncertainty and quality estimates which is particularly interesting in the absence of ground truth.
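Since the post processing model consumes a structured dataset of per-box metrics, a minimal stand-in looks like ordinary scikit-learn fitting. The feature set and the models below are placeholders, not the paper's exact metrics or regressors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((1000, 8))      # stand-in per-box metrics (score, geometry, ...)
iou = rng.random(1000)         # stand-in true IoU of each predicted box

meta_cls = LogisticRegression().fit(X, iou >= 0.5)  # TP vs FP: meta classification
meta_reg = LinearRegression().fit(X, iou)           # IoU value: meta regression

p_tp = meta_cls.predict_proba(X)[:, 1]   # modelled predictive uncertainty
iou_hat = meta_reg.predict(X)            # prediction quality estimate
```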
26. Generating Gameplay-Relevant Art Assets with Transfer Learning [PDF] 返回目录
Adrian Gonzalez, Matthew Guzdial, Felix Ramos
Abstract: In game development, designing compelling visual assets that convey gameplay-relevant features requires time and experience. Recent image generation methods that create high-quality content could reduce development costs, but these approaches do not consider game mechanics. We propose a Convolutional Variational Autoencoder (CVAE) system to modify and generate new game visuals based on their gameplay relevance. We test this approach with Pokémon sprites and Pokémon type information, since types are one of the game's core mechanics and they directly impact the game's visuals. Our experimental results indicate that adopting a transfer learning approach can help to improve visual quality and stability over unseen data.
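A minimal conditional-VAE skeleton conditioned on a type vector is sketched below; the fully-connected layout, the 18-dimensional condition (one slot per Pokémon type), and all sizes are my own toy choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Sprite and one-hot type vector are encoded together, and the
    decoder is conditioned the same way, so sampling with a new type
    edits gameplay-relevant appearance."""
    def __init__(self, img_dim=64 * 64 * 3, cond_dim=18, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim + cond_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + cond_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, c], dim=1)), mu, logvar
```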
27. Learning Complete 3D Morphable Face Models from Images and Videos [PDF] 返回目录
Mallikarjun B R, Ayush Tewari, Hans-Peter Seidel, Mohamed Elgharib, Christian Theobalt
Abstract: Most 3D face reconstruction methods rely on 3D morphable models, which disentangle the space of facial deformations into identity geometry, expressions and skin reflectance. These models are typically learned from a limited number of 3D scans and thus do not generalize well across different identities and expressions. We present the first approach to learn complete 3D models of face identity geometry, albedo and expression just from images and videos. The virtually endless collection of such data, in combination with our self-supervised learning-based approach, allows for learning face models that generalize beyond the span of existing approaches. Our network design and loss functions ensure a disentangled parameterization of not only identity and albedo, but also, for the first time, an expression basis. Our method also allows for in-the-wild monocular reconstruction at test time. We show that our learned models better generalize and lead to higher quality image-based reconstructions than existing approaches.
28. Multi-Resolution Fusion and Multi-scale Input Priors Based Crowd Counting [PDF] 返回目录
Usman Sajid, Wenchi Ma, Guanghui Wang
Abstract: Crowd counting in still images is a challenging problem in practice due to huge crowd-density variations, large perspective changes, severe occlusion, and variable lighting conditions. The state-of-the-art patch rescaling module (PRM) based approaches prove to be very effective in improving the crowd counting performance. However, the PRM module requires an additional and compromising crowd-density classification process. To address these issues and challenges, the paper proposes a new multi-resolution fusion based end-to-end crowd counting network. It employs three deep-layer based columns/branches, each catering to a respective crowd-density scale. These columns regularly fuse (share) information with each other. The network is divided into three phases, with each phase containing one or more columns. Three input priors are introduced to serve as an efficient and effective alternative to the PRM module, without requiring any additional classification operations. Along with the final crowd count regression head, the network also contains three auxiliary crowd estimation regression heads, which are strategically placed at each phase end to boost the overall performance. Comprehensive experiments on three benchmark datasets demonstrate that the proposed approach outperforms all the state-of-the-art models under the RMSE evaluation metric. The proposed approach also has better generalization capability, with the best results during the cross-dataset experiments.
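The column-plus-auxiliary-head layout can be sketched as follows; the layer sizes, the pooled fusion, and the interpolation factors are invented for illustration and are not the proposed network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResCount(nn.Module):
    """Three columns see the image at different resolutions, their
    features are fused, and each column also emits an auxiliary count
    to stabilise training (toy stand-in for the described design)."""
    def __init__(self):
        super().__init__()
        self.cols = nn.ModuleList(nn.Conv2d(3, 16, 3, padding=1) for _ in range(3))
        self.aux = nn.ModuleList(nn.Linear(16, 1) for _ in range(3))
        self.head = nn.Linear(16, 1)

    def forward(self, img):
        feats, aux_counts = [], []
        for i, col in enumerate(self.cols):
            x = F.interpolate(img, scale_factor=1 / 2 ** i) if i else img
            f = col(x).mean(dim=(2, 3))          # globally pooled features
            feats.append(f)
            aux_counts.append(self.aux[i](f))    # auxiliary regression head
        fused = torch.stack(feats).mean(0)       # cross-column fusion (simplified)
        return self.head(fused), aux_counts
```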
29. Supporting large-scale image recognition with out-of-domain samples [PDF] 返回目录
Christof Henkel, Philipp Singer
Abstract: This article presents an efficient end-to-end method to perform instance-level recognition for the task of labeling and ranking landmark images. In a first step, we embed images in a high dimensional feature space using convolutional neural networks trained with an additive angular margin loss and classify images using visual similarity. We then efficiently re-rank predictions and filter noise utilizing similarity to out-of-domain images. Using this approach we achieved 1st place in the 2020 edition of the Google Landmark Recognition challenge.
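The additive angular margin loss named here is the standard "ArcFace" formulation, which fits in a few lines; the scale s and margin m below are common defaults, not necessarily the authors' settings.

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weight, labels, s=30.0, m=0.5):
    """Additive angular margin logits: add the margin m to the angle
    between each feature and its true class centre, then rescale."""
    emb = F.normalize(embeddings)                 # unit-norm features
    w = F.normalize(weight)                       # unit-norm class centres
    cos = emb @ w.t()
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = torch.cos(theta + m)                 # margin on the true class only
    onehot = F.one_hot(labels, w.size(0)).float()
    return s * (onehot * target + (1 - onehot) * cos)

# train with F.cross_entropy(arcface_logits(emb, W, labels), labels)
```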
30. Unknown Presentation Attack Detection against Rational Attackers [PDF] 返回目录
Ali Khodabakhsh
Abstract: Despite the impressive progress in the field of presentation attack detection and multimedia forensics over the last decade, these systems are still vulnerable to attacks in real-life settings. Some of the challenges for existing solutions are the detection of unknown attacks, the ability to perform in adversarial settings, few-shot learning, and explainability. In this study, these limitations are approached by reliance on a game-theoretic view for modeling the interactions between the attacker and the detector. Consequently, a new optimization criterion is proposed and a set of requirements are defined for improving the performance of these systems in real-life settings. Furthermore, a novel detection technique is proposed using generator-based feature sets that are not biased towards any specific attack species. To further optimize the performance on known attacks, a new loss function coined categorical margin maximization loss (C-marmax) is proposed which gradually improves the performance against the most powerful attack. The proposed approach provides a more balanced performance across known and unknown attacks and achieves state-of-the-art performance in known and unknown attack detection cases against rational attackers. Lastly, the few-shot learning potential of the proposed approach is studied as well as its ability to provide pixel-level explainability.
31. The FaceChannelS: Strike of the Sequences for the AffWild 2 Challenge [PDF] 返回目录
Pablo Barros, Alessandra Sciutti
Abstract: Predicting affective information from human faces has become a popular task across much of the machine learning community in recent years. The development of immense and dense deep neural networks was backed by the availability of numerous labeled datasets. These models, most of the time, present state-of-the-art results on such benchmarks, but are very difficult to adapt to other scenarios. In this paper, we present one more chapter of benchmarking different versions of the FaceChannel neural network: we demonstrate how our little model can predict affective information from facial expressions on the novel AffWild2 dataset.
32. Holistic static and animated 3D scene generation from diverse text descriptions [PDF] 返回目录
Faria Huq, Anindya Iqbal, Nafees Ahmed
Abstract: We propose a framework for holistic static and animated 3D scene generation from diverse text descriptions. Prior works on scene generation rely on static rule-based entity extraction from natural language descriptions. However, this limits the usability of a practical solution. To overcome this limitation, we use a state-of-the-art architecture, TransformerXL. Instead of rule-based extraction, our framework leverages its rich contextual encoding, which allows us to process a larger, more diverse range of possible natural language descriptions. We empirically show how our proposed mechanism generalizes even to novel combinations of object-features during inference. We also show how our framework can jointly generate static and animated 3D scenes efficiently. We modify CLEVR to generate a large, scalable dataset - Integrated static and animated 3D scene (Iscene). Data preparation code and the pre-trained model are available at this https URL.
33. Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [PDF] 返回目录
Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E. Gonzalez, Marcus Rohrbach, Trevor Darrell
Abstract: The goal of continual learning (CL) is to learn a sequence of tasks without suffering from the phenomenon of catastrophic forgetting. Previous work has shown that leveraging memory in the form of a replay buffer can reduce performance degradation on prior tasks. We hypothesize that forgetting can be further reduced when the model is encouraged to remember the \textit{evidence} for previously made decisions. As a first step towards exploring this hypothesis, we propose a simple novel training paradigm, called Remembering for the Right Reasons (RRR), that additionally stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions by encouraging its explanations to remain consistent with those used to make decisions at training time. Without this constraint, there is a drift in explanations and increase in forgetting as conventional continual learning algorithms learn new tasks. We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting, and more importantly, improved model explanations. We have evaluated our approach in the standard and few-shot settings and observed a consistent improvement across various CL approaches using different architectures and techniques to generate model explanations and demonstrated our approach showing a promising connection between explainability and continual learning. Our code is available at this https URL.
34. A Study for Universal Adversarial Attacks on Texture Recognition [PDF] 返回目录
Yingpeng Deng, Lina J. Karam
Abstract: Given the outstanding progress that convolutional neural networks (CNNs) have made on natural image classification and object recognition problems, it has been shown that deep learning methods can achieve very good recognition performance on many texture datasets. However, while CNNs for natural image classification/object recognition tasks have been revealed to be highly vulnerable to various types of adversarial attack methods, the robustness of deep learning methods for texture recognition has yet to be examined. In this paper, we show that there exist small image-agnostic/universal perturbations that can fool deep learning models with fooling rates of more than 80\% on all tested texture datasets. The perturbations computed using various attack methods on the tested datasets are generally quasi-imperceptible, containing structured patterns with low, middle and high frequency components.
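The fooling-rate metric quoted here has a standard definition: the fraction of test images whose predicted label changes when one fixed, image-agnostic perturbation is added. A small evaluation sketch (the perturbation budget `eps` is an illustrative assumption):

```python
import torch

@torch.no_grad()
def fooling_rate(model, loader, v, eps=10 / 255):
    """Fraction of test images whose prediction flips under one fixed,
    image-agnostic perturbation v, clipped to an L_inf budget eps."""
    v = v.clamp(-eps, eps)
    fooled = total = 0
    for x, _ in loader:
        clean = model(x).argmax(dim=1)
        perturbed = model((x + v).clamp(0.0, 1.0)).argmax(dim=1)
        fooled += (clean != perturbed).sum().item()
        total += x.size(0)
    return fooled / total
```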
35. A New Mask R-CNN Based Method for Improved Landslide Detection [PDF] 返回目录
Silvia Liberata Ullo, Amrita Mohan, Alessandro Sebastianelli, Shaik Ejaz Ahamed, Basant Kumar, Ramji Dwivedi, G. R. Sinha
Abstract: This paper presents a novel method of landslide detection that exploits the Mask R-CNN capability of identifying an object layout via pixel-based segmentation, together with transfer learning used to train the proposed model. A data set of 160 elements containing landslide and non-landslide images is created. The proposed method consists of three steps: (i) augmenting the training image samples to increase the volume of training data, (ii) fine-tuning with limited image samples, and (iii) evaluating the algorithm's performance in terms of precision, recall and F1 measure on the considered landslide images, adopting ResNet-50 and ResNet-101 as backbone models. The experimental results are quite encouraging: the proposed method achieves a precision of 1.00, recall of 0.93 and F1 measure of 0.97 when ResNet-101 is used as the backbone model, with a low number of landslide photographs used as training samples. The proposed algorithm can be potentially useful for land use planners and policy makers in hilly areas where intermittent slope deformations necessitate landslide detection as a prerequisite for planning.
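As a quick consistency check on the reported scores: F1 is the harmonic mean of precision and recall, so the rounded values P = 1.00 and R = 0.93 land near, though not exactly on, the reported 0.97, presumably because the underlying precision and recall were themselves rounded.

```python
p, r = 1.00, 0.93
f1 = 2 * p * r / (p + r)   # harmonic mean of precision and recall
print(round(f1, 4))        # 0.9637, i.e. about 0.96-0.97 after rounding
```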
36. Generalized Two-Dimensional Quaternion Principal Component Analysis with Weighting for Color Image Recognition [PDF] 返回目录
Zhi-Gang Jia, Zi-Jin Qiu, Mei-Xiang Zhao
Abstract: A generalized two-dimensional quaternion principal component analysis (G2DQPCA) approach with weighting is presented for color image analysis. As a general framework for 2DQPCA, G2DQPCA is flexible enough to accommodate different constraints or requirements by imposing $L_{p}$ norms on both the constraint function and the objective function. The gradient operator of quaternion vector functions is redefined via the structure-preserving gradient operator of real vector functions. Under the minorization-maximization (MM) framework, an iterative algorithm is developed to obtain the optimal closed-form solution of G2DQPCA. The projection vectors generated by the deflating scheme are required to be orthogonal to each other. A weighting matrix is defined to magnify the effect of the main features. The weighted projection bases keep the face recognition accuracy unchanged, or varying within a tight range, as the number of features increases. Numerical results on real face databases validate that the newly proposed method performs better than state-of-the-art algorithms.
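The minorization-maximization (MM) framework invoked here follows a standard template worth spelling out: each iteration maximizes a surrogate that minorizes the objective at the current iterate, which guarantees monotone ascent. The specific surrogate for the $L_p$ objective is the paper's contribution and is not reproduced here.

```latex
g(\mathbf{w}\mid\mathbf{w}^{(t)}) \le f(\mathbf{w}) \;\;\forall\,\mathbf{w},
\qquad
g(\mathbf{w}^{(t)}\mid\mathbf{w}^{(t)}) = f(\mathbf{w}^{(t)}),
\qquad
\mathbf{w}^{(t+1)} = \operatorname*{arg\,max}_{\mathbf{w}}\, g(\mathbf{w}\mid\mathbf{w}^{(t)}),
```

so that $f(\mathbf{w}^{(t+1)}) \ge g(\mathbf{w}^{(t+1)}\mid\mathbf{w}^{(t)}) \ge g(\mathbf{w}^{(t)}\mid\mathbf{w}^{(t)}) = f(\mathbf{w}^{(t)})$.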
37. MDReg-Net: Multi-resolution diffeomorphic image registration using fully convolutional networks with deep self-supervision [PDF] 返回目录
Hongming Li, Yong Fan
Abstract: We present a diffeomorphic image registration algorithm to learn spatial transformations between pairs of images to be registered using fully convolutional networks (FCNs) under a self-supervised learning setting. The network is trained to estimate diffeomorphic spatial transformations between pairs of images by maximizing an image-wise similarity metric between fixed and warped moving images, similar to conventional image registration algorithms. It is implemented in a multi-resolution image registration framework to optimize and learn spatial transformations at different image resolutions jointly and incrementally with deep self-supervision in order to better handle large deformation between images. A spatial Gaussian smoothing kernel is integrated with the FCNs to yield sufficiently smooth deformation fields to achieve diffeomorphic image registration. Particularly, spatial transformations learned at coarser resolutions are utilized to warp the moving image, which is subsequently used for learning incremental transformations at finer resolutions. This procedure proceeds recursively to the full image resolution and the accumulated transformations serve as the final transformation to warp the moving image at the finest resolution. Experimental results for registering high resolution 3D structural brain magnetic resonance (MR) images have demonstrated that image registration networks trained by our method obtain robust, diffeomorphic image registration results within seconds with improved accuracy compared with state-of-the-art image registration algorithms.
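A schematic PyTorch rendering of the coarse-to-fine scheme: each level's network predicts a residual displacement on downsampled inputs, residuals are upsampled and accumulated, and the moving image is re-warped. The level factors, pooling and bilinear upsampling are illustrative stand-ins for the paper's FCNs and Gaussian smoothing.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Warp img with a dense displacement field given in pixels."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=img.device, dtype=img.dtype),
                            torch.arange(W, device=img.device, dtype=img.dtype),
                            indexing='ij')
    gx = 2 * (xs + flow[:, 0]) / (W - 1) - 1       # normalize to [-1, 1]
    gy = 2 * (ys + flow[:, 1]) / (H - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)           # (B, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def coarse_to_fine(nets, fixed, moving, levels=(4, 2, 1)):
    """nets[i] predicts a residual displacement at downsampling factor
    levels[i], coarsest first; each residual is upsampled to full size,
    accumulated, and the moving image re-warped. (The paper additionally
    smooths each predicted field with a Gaussian kernel to keep the
    overall transformation diffeomorphic.)"""
    B, _, H, W = fixed.shape
    flow = torch.zeros(B, 2, H, W, device=fixed.device, dtype=fixed.dtype)
    for net, s in zip(nets, levels):
        warped = warp(moving, flow)
        f_lo = F.avg_pool2d(fixed, s) if s > 1 else fixed
        m_lo = F.avg_pool2d(warped, s) if s > 1 else warped
        res = net(torch.cat([f_lo, m_lo], dim=1))  # (B, 2, H/s, W/s)
        res = F.interpolate(res, size=(H, W), mode='bilinear',
                            align_corners=True) * s  # back to pixel units
        flow = flow + res
    return warp(moving, flow), flow
```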
38. LEGAN: Disentangled Manipulation of Directional Lighting and Facial Expressions by Leveraging Human Perceptual Judgements [PDF] 返回目录
Sandipan Banerjee, Ajjen Joshi, Prashant Mahajan, Sneha Bhattacharya, Survi Kyal, Taniya Mishra
Abstract: Building facial analysis systems that generalize to extreme variations in lighting and facial expressions is a challenging problem that can potentially be alleviated using natural-looking synthetic data. Towards that, we propose LEGAN, a novel synthesis framework that leverages perceptual quality judgments for jointly manipulating lighting and expressions in face images, without requiring paired training data. LEGAN disentangles the lighting and expression subspaces and performs transformations in the feature space before upscaling to the desired output image. The fidelity of the synthetic image is further refined by integrating a perceptual quality estimation model into the LEGAN framework as an auxiliary discriminator. The quality estimation model is learned from face images rendered using multiple synthesis methods and their crowd-sourced naturalness ratings using a margin-based regression loss. Using objective metrics like FID and LPIPS, LEGAN is shown to generate higher quality face images when compared with popular GAN models like pix2pix, CycleGAN and StarGAN for lighting and expression synthesis. We also conduct a perceptual study using images synthesized by LEGAN and other GAN models, trained with and without the quality based auxiliary discriminator, and show the correlation between our quality estimation and visual fidelity. Finally, we demonstrate the effectiveness of LEGAN as training data augmenter for expression recognition and face verification tasks.
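The auxiliary-discriminator idea reduces, at the generator, to adding a quality term to the usual adversarial loss. A hedged sketch (the loss forms and weight `lam_q` are illustrative, not LEGAN's exact formulation):

```python
def generator_loss(adv_disc, quality_net, fake, lam_q=0.1):
    """Adversarial term plus auxiliary quality term: quality_net (learned
    from crowd-sourced naturalness ratings) scores the synthesized face,
    and the generator is pushed toward higher predicted naturalness."""
    adv = -adv_disc(fake).mean()          # generic non-saturating GAN term
    quality = -quality_net(fake).mean()   # higher score = more natural
    return adv + lam_q * quality
```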
39. 3D Orientation Field Transform [PDF] 返回目录
Wai-Tsun Yeung, Xiaohao Cai, Zizhen Liang, Byung-Ho Kang
Abstract: The two-dimensional (2D) orientation field transform has been proved to be effective at enhancing 2D contours and curves in images by means of top-down processing. It has, however, no counterpart in three-dimensional (3D) images, due to the extremely complicated orientation in 3D compared to 2D. Practically and theoretically, the demand for and interest in 3D can only increase. In this work, we modularise the concept and generalise it to 3D curves. Different modular combinations are found to enhance curves to different extents and with different sensitivity to the packing of the 3D curves. In principle, the proposed 3D orientation field transform can naturally tackle any dimensions. As a special case, it is also ideal for 2D images, with a simpler methodology than the previous 2D orientation field transform. The proposed method is demonstrated with several transmission electron microscopy tomograms, ranging from 2D curve enhancement to, more importantly and interestingly, 3D ones.
40. MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network [PDF] 返回目录
Yi Wei, Zhe Gan, Wenbo Li, Siwei Lyu, Ming-Ching Chang, Lei Zhang, Jianfeng Gao, Pengchuan Zhang
Abstract: We present Mask-guided Generative Adversarial Network (MagGAN) for high-resolution face attribute editing, in which semantic facial masks from a pre-trained face parser are used to guide the fine-grained image editing process. With the introduction of a mask-guided reconstruction loss, MagGAN learns to only edit the facial parts that are relevant to the desired attribute changes, while preserving the attribute-irrelevant regions (e.g., hat, scarf for modification `To Bald'). Further, a novel mask-guided conditioning strategy is introduced to incorporate the influence region of each attribute change into the generator. In addition, a multi-level patch-wise discriminator structure is proposed to scale our model for high-resolution ($1024 \times 1024$) face editing. Experiments on the CelebA benchmark show that the proposed method significantly outperforms prior state-of-the-art approaches in terms of both image quality and editing performance.
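One way to read the mask-guided reconstruction loss: penalize changes only on attribute-irrelevant pixels, as identified by the face parser. A hedged sketch (MagGAN's exact weighting may differ):

```python
def mask_guided_recon_loss(x, x_edit, attr_mask):
    """L1 reconstruction restricted to attribute-irrelevant regions:
    attr_mask is 1 where the requested edit may act (e.g. hair for
    `To Bald') and 0 elsewhere, so hats and scarves must be preserved."""
    keep = 1.0 - attr_mask                   # pixels the edit must not touch
    return (keep * (x_edit - x).abs()).sum() / keep.sum().clamp(min=1.0)
```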
41. Early Bird: Loop Closures from Opposing Viewpoints for Perceptually-Aliased Indoor Environments [PDF] 返回目录
Satyajit Tourani, Dhagash Desai, Udit Singh Parihar, Sourav Garg, Ravi Kiran Sarvadevabhatla, K. Madhava Krishna
Abstract: Significant advances have been made recently in Visual Place Recognition (VPR), feature correspondence, and localization due to the proliferation of deep-learning-based methods. However, existing approaches tend to address, partially or fully, only one of two key challenges: viewpoint change and perceptual aliasing. In this paper, we present novel research that simultaneously addresses both challenges by combining deep-learned features with geometric transformations based on reasonable domain assumptions about navigation on a ground plane, whilst also removing the requirement for specialized hardware setups (e.g. lighting, downwards-facing cameras). In particular, our integration of VPR with SLAM, by leveraging the robustness of deep-learned features and our homography-based extreme viewpoint invariance, significantly boosts the performance of the VPR, feature correspondence, and pose graph submodules of the SLAM pipeline. For the first time, we demonstrate a localization system capable of state-of-the-art performance despite perceptual aliasing and extreme 180-degree-rotated viewpoint change in a range of real-world and simulated experiments. Our system is able to achieve early loop closures that prevent significant drift in SLAM trajectories. We also extensively compare several deep architectures for VPR and descriptor matching, and show that superior place recognition and descriptor matching across opposing views result in a similar performance gain in back-end pose graph optimization.
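The homography-based geometric check at the heart of the viewpoint-invariance claim can be sketched with standard OpenCV primitives; the brute-force matcher, RANSAC threshold and inlier cutoff below are illustrative choices, not the paper's exact pipeline.

```python
import cv2
import numpy as np

def verify_loop_closure(kp1, des1, kp2, des2, min_inliers=30):
    """Geometric check for a loop-closure candidate: match descriptors,
    fit a homography with RANSAC (reasonable under the ground-plane
    assumption), and accept if enough matches are inliers."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 4:                 # findHomography needs >= 4 points
        return False
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H is not None and int(inlier_mask.sum()) >= min_inliers
```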
42. Adversarial and Natural Perturbations for General Robustness [PDF] 返回目录
Sadaf Gulshad, Jan Hendrik Metzen, Arnold Smeulders
Abstract: In this paper we aim to explore the general robustness of neural network classifiers by utilizing adversarial as well as natural perturbations. Different from previous works, which mainly focus on studying the robustness of neural networks against adversarial perturbations, we also evaluate their robustness on natural perturbations before and after robustification. After standardizing the comparison between adversarial and natural perturbations, we demonstrate that although adversarial training improves the performance of networks against adversarial perturbations, it leads to a drop in performance on naturally perturbed samples as well as on clean samples. In contrast, natural perturbations like elastic deformations, occlusions and waves not only improve the performance against natural perturbations, but also lead to improved performance against adversarial perturbations. Additionally, they do not degrade accuracy on clean images.
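For concreteness, here is what one of the natural perturbations might look like in code: a random square occlusion applied per image, with robustness read off as accuracy on perturbed versus clean inputs. Patch size and placement are illustrative, not the paper's protocol.

```python
import torch

def occlude(x, size=8):
    """Zero out one random square patch per image, a simple stand-in for
    the occlusion-style natural perturbations studied here."""
    x = x.clone()
    B, _, H, W = x.shape
    for i in range(B):
        top = torch.randint(0, H - size + 1, (1,)).item()
        left = torch.randint(0, W - size + 1, (1,)).item()
        x[i, :, top:top + size, left:left + size] = 0.0
    return x

# Robustness is then read off as accuracy on occlude(x) versus accuracy on x.
```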
43. Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment [PDF] 返回目录
Jiafei Duan, Samson Yu, Hui Li Tan, Cheston Tan
Abstract: The problem of task planning for artificial agents remains largely unsolved. While there has been increasing interest in data-driven approaches to the study of task planning for artificial agents, a significant remaining bottleneck is the dearth of large-scale comprehensive task-based datasets. In this paper, we present ActioNet, an interactive end-to-end platform for the collection and augmentation of task-based datasets in a 3D environment. Using ActioNet, we collected a large-scale comprehensive task-based dataset comprising over 3000 hierarchical task structures and videos. Using the hierarchical task structures, the videos are further augmented across 50 different scenes to give over 150,000 videos. To our knowledge, ActioNet is the first interactive end-to-end platform for such task-based dataset generation, and the accompanying dataset is the largest task-based dataset of such comprehensive nature. The ActioNet platform and dataset will be made available to facilitate research in hierarchical task planning.
44. A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition [PDF] 返回目录
Ayush Srivastava, Oshin Dutta, Prathosh AP, Sumeet Agarwal, Jigyasa Gupta
Abstract: In the last few years, compression of deep neural networks has become an important strand of machine learning and computer vision research. Deep models require sizeable computational complexity and storage when used, for instance, for Human Action Recognition (HAR) from videos, making them unsuitable for deployment on edge devices. In this paper, we address this issue and propose a method to effectively compress Recurrent Neural Networks (RNNs) such as Gated Recurrent Units (GRUs) and Long Short-Term Memory units (LSTMs) that are used for HAR. We use a Variational Information Bottleneck (VIB) theory-based pruning approach to limit the information flow through the sequential cells of RNNs to a small subset. Further, we combine our pruning method with a specific group-lasso regularization technique that significantly improves compression. The proposed techniques reduce model parameters and memory footprint from latent representations, with little or no reduction in validation accuracy, while increasing the inference speed several-fold. We perform experiments on three widely used Action Recognition datasets, viz. UCF11, HMDB51, and UCF101, to validate our approach. It is shown that our method achieves over 70 times greater compression than the nearest competitor, with comparable accuracy, on the task of action recognition on UCF11.
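A sketch of the VIB-style pruning idea: a stochastic multiplicative gate on each hidden dimension, with a KL penalty that pushes unneeded dimensions toward zero so they can be pruned. The gate parameterization and prior below are generic illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class VIBGate(nn.Module):
    """Stochastic multiplicative gate on a hidden vector: the KL penalty
    pressures per-dimension gates toward zero, and near-zero dimensions
    are pruned away after training."""
    def __init__(self, dim):
        super().__init__()
        self.mu = nn.Parameter(torch.ones(dim))
        self.log_sigma = nn.Parameter(torch.full((dim,), -3.0))

    def forward(self, h):
        z = self.mu + torch.randn_like(self.mu) * self.log_sigma.exp()
        return h * z                       # reparameterized gating

    def kl(self):
        # KL( N(mu, sigma^2) || N(0, I) ) summed over gate dimensions
        s2 = (2 * self.log_sigma).exp()
        return 0.5 * (self.mu ** 2 + s2 - 2 * self.log_sigma - 1).sum()

# total_loss = task_loss + beta * sum(g.kl() for g in gates); beta trades
# accuracy against compression, and dims with tiny |mu| are pruned.
```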
45. End-to-End Training of CNN Ensembles for Person Re-Identification [PDF] 返回目录
Ayse Serbetci, Yusuf Sinan Akgul
Abstract: We propose an end-to-end ensemble method for person re-identification (ReID) to address the problem of overfitting in discriminative models. These models are known to converge easily, but they are biased to the training data in general and may produce a high model variance, which is known as overfitting. The ReID task is more prone to this problem due to the large discrepancy between training and test distributions. To address this problem, our proposed ensemble learning framework produces several diverse and accurate base learners in a single DenseNet. Since most of the costly dense blocks are shared, our method is computationally efficient, which makes it favorable compared to the conventional ensemble models. Experiments on several benchmark datasets demonstrate that our method achieves state-of-the-art results. Noticeable performance improvements, especially on relatively small datasets, indicate that the proposed method deals with the overfitting problem effectively.
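The cost argument can be made concrete with a shared-trunk ensemble: the expensive dense blocks are computed once and only the light heads differ. In the paper most, not all, blocks are shared and the heads are trained to be diverse; the split point and head design below are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.models import densenet121

class SharedTrunkEnsemble(nn.Module):
    """Several classifier heads over one shared DenseNet trunk: the costly
    dense blocks run once per image, so ensembling is nearly free at
    inference. Logits of the base learners are averaged."""
    def __init__(self, num_ids, num_heads=4):
        super().__init__()
        self.trunk = densenet121(weights=None).features
        self.heads = nn.ModuleList(
            nn.Linear(1024, num_ids) for _ in range(num_heads))

    def forward(self, x):
        feat = self.trunk(x).mean(dim=(2, 3))    # global average pooling
        logits = [head(feat) for head in self.heads]
        return torch.stack(logits).mean(dim=0)
```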
46. Gaussian Vector: An Efficient Solution for Facial Landmark Detection [PDF] 返回目录
Yilin Xiong, Zijian Zhou, Yuhao Dou, Zhizhong Su
Abstract: Significant progress has been made in facial landmark detection with the development of Convolutional Neural Networks. The widely used algorithms can be classified into coordinate regression methods and heatmap-based methods. However, the former lose spatial information, resulting in poor performance, while the latter suffer from large output sizes or high post-processing complexity. This paper proposes a new solution, Gaussian Vector, to preserve the spatial information as well as to reduce the output size and simplify the post-processing. Our method provides novel vector supervision and introduces a Band Pooling Module to convert the heatmap into a pair of vectors for each landmark. This is a simple and effective plug-and-play component. Moreover, a Beyond Box Strategy is proposed to handle landmarks outside the face bounding box. We evaluate our method on 300W, COFW, WFLW and JD-landmark. The results significantly surpass previous works, demonstrating the effectiveness of our approach.
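A small NumPy sketch of the vector supervision idea: collapse the 2D heatmap into an x-vector and a y-vector, and regress each onto a 1D Gaussian centered at the ground-truth coordinate. Plain averaging stands in for the paper's Band Pooling Module.

```python
import numpy as np

def gaussian_vector_target(size, coord, sigma=2.0):
    """1D Gaussian supervision vector centered at a landmark coordinate."""
    xs = np.arange(size)
    return np.exp(-0.5 * ((xs - coord) / sigma) ** 2)

def band_pool(heatmap):
    """Collapse a 2D heatmap into an (x-vector, y-vector) pair by averaging
    rows and columns; the landmark is read out as each vector's argmax."""
    vx = heatmap.mean(axis=0)   # pool over rows    -> vector over x
    vy = heatmap.mean(axis=1)   # pool over columns -> vector over y
    return vx, vy

# Training regresses (vx, vy) onto gaussian_vector_target vectors built from
# the ground-truth (x, y); inference takes the argmax of each vector.
```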
47. A simulation environment for drone cinematography [PDF] 返回目录
Fan Zhang, David Hall, Tao Xu, Stephen Boyle, David Bull
Abstract: In this paper, we present a workflow for the simulation of drone operations exploiting realistic background environments constructed within Unreal Engine 4 (UE4). Methods for environmental image capture, 3D reconstruction (photogrammetry) and the creation of foreground assets are presented along with a flexible and user-friendly simulation interface. Given the geographical location of the selected area and the camera parameters employed, the scanning strategy and its associated flight parameters are first determined for image capture. Source imagery can be extracted from virtual globe software or obtained through aerial photography of the scene (e.g. using drones). The latter case is clearly more time consuming but can provide enhanced detail, particularly where coverage of virtual globe software is limited. The captured images are then used to generate 3D background environment models employing photogrammetry software. The reconstructed 3D models are then imported into the simulation interface as background environment assets together with appropriate foreground object models as a basis for shot planning and rehearsal. The tool supports both free-flight and parameterisable standard shot types along with programmable scenarios associated with foreground assets and event dynamics. It also supports the exporting of flight plans. Camera shots can also be designed to provide suitable coverage of any landmarks which need to appear in-shot. This simulation tool will contribute to enhanced productivity, improved safety (awareness and mitigations for crowds and buildings), improved confidence of operators and directors and ultimately enhanced quality of viewer experience.
48. Bounding Boxes Are All We Need: Street View Image Classification via Context Encoding of Detected Buildings [PDF] 返回目录
Kun Zhao, Yongkun Liu, Siyuan Hao, Shaoxing Lu, Hongbin Liu, Lijian Zhou
Abstract: Street view images have been increasingly used in tasks like urban land use classification and urban functional zone portrayal. Street view image classification is difficult because class labels such as "commercial area" are concepts at a higher abstraction level compared to general visual tasks. Therefore, classification models using only visual features often fail to achieve satisfactory performance. We believe that the efficient representation of significant objects and their context relations in street view images is the key to solving this problem. In this paper, a novel approach based on a detector-encoder-classifier framework is proposed. Different from common image-level end-to-end models, our approach does not use visual features of the whole image directly. Instead, the proposed framework obtains the bounding boxes of buildings in street view images from a detector. Their contextual information, such as building classes and positions, is then encoded into metadata and finally classified by a recurrent neural network (RNN). To verify our approach, we built a dataset of 19,070 street view images and 38,857 buildings based on the BIC_GSV dataset, through a combination of automatic label acquisition and expert annotation. The dataset can be used not only for street view image classification aimed at urban land use analysis, but also for multi-class building detection. Experiments show that the proposed approach achieves a 12.65% improvement in macro-precision and 12% in macro-recall over models based on end-to-end convolutional neural networks (CNNs). Our code and dataset are available at this https URL
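A stand-in for the encoder-classifier half of the pipeline: each detected building becomes a token built from a class embedding plus projected box geometry, and a GRU classifies the sequence. All dimensions and the token format are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BuildingContextClassifier(nn.Module):
    """Encode each detected building as (class embedding + box geometry)
    and classify the token sequence with a GRU."""
    def __init__(self, num_building_classes=8, num_scene_classes=4, d=64):
        super().__init__()
        self.cls_embed = nn.Embedding(num_building_classes, d)
        self.box_proj = nn.Linear(4, d)          # normalized (x, y, w, h)
        self.rnn = nn.GRU(2 * d, d, batch_first=True)
        self.out = nn.Linear(d, num_scene_classes)

    def forward(self, building_cls, boxes):
        # building_cls: (B, N) labels from the detector; boxes: (B, N, 4)
        tokens = torch.cat([self.cls_embed(building_cls),
                            self.box_proj(boxes)], dim=-1)
        _, h_last = self.rnn(tokens)             # h_last: (1, B, d)
        return self.out(h_last.squeeze(0))       # scene-class logits
```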
49. Deep Convolutional Neural Network Based Facial Expression Recognition in the Wild [PDF] 返回目录
Hafiq Anas, Bacha Rehman, Wee Hong Ong
Abstract: This paper describes the proposed methodology, data used and the results of our participation in the ChallengeTrack 2 (Expr Challenge Track) of the Affective Behavior Analysis in-the-wild (ABAW) Competition 2020. In this competition, we have used a proposed deep convolutional neural network (CNN) model to perform automatic facial expression recognition (AFER) on the given dataset. Our proposed model has achieved an accuracy of 50.77% and an F1 score of 29.16% on the validation set.
50. Unsupervised Shadow Removal Using Target Consistency Generative Adversarial Network [PDF] 返回目录
Chao Tan, Xin Feng
Abstract: Unsupervised shadow removal aims to learn a non-linear function that maps the original image from the shadow domain to the non-shadow domain in the absence of paired shadow and non-shadow data. In this paper, we develop a simple yet efficient target-consistency generative adversarial network (TC-GAN) for the shadow removal task in an unsupervised manner. In contrast to the bidirectional mapping in cycle-consistency GAN based methods for shadow removal, TC-GAN tries to learn a one-sided mapping that translates shadow images into shadow-free ones. With the proposed target-consistency constraint, the correlations between shadow images and the output shadow-free images are strictly confined. Extensive comparison experiments show that TC-GAN outperforms state-of-the-art unsupervised shadow removal methods by 14.9% in terms of FID and 31.5% in terms of KID. It is rather remarkable that TC-GAN achieves performance comparable to supervised shadow removal methods.
51. Generating the Cloud Motion Winds Field from Satellite Cloud Imagery Using Deep Learning Approach [PDF] 返回目录
Chao Tan
Abstract: Cloud motion winds (CMW) are routinely derived by tracking features in sequential geostationary satellite infrared cloud imagery. In this paper, we explore a data-driven deep learning approach to cloud motion winds: unlike conventional hand-crafted feature tracking and correlation matching algorithms, we use a deep learning model to automatically learn motion feature representations and directly output the cloud motion winds field. In addition, we propose a novel large-scale cloud motion winds dataset (CMWD) for training deep learning models. We also attempt to predict the cloud motion winds field in a fixed region from a single cloud image, which is impossible with traditional algorithms. The experimental results demonstrate that our algorithm can predict the cloud motion winds field efficiently, even with a single cloud image as input.
52. TCLNet: Learning to Locate Typhoon Center Using Deep Neural Network [PDF] 返回目录
Chao Tan
Abstract: Typhoon center location plays an important role in typhoon intensity analysis and typhoon path prediction. Conventional typhoon center location algorithms mostly rely on digital image processing and mathematical morphology operations, which achieve limited performance. In this paper, we propose TCLNet, an efficient fully convolutional end-to-end deep neural network that automatically locates the typhoon center. We design the network structure carefully so that TCLNet achieves remarkable performance on top of its lightweight architecture. In addition, we present a brand new large-scale typhoon center location dataset (TCLD) so that TCLNet can be trained in a supervised manner. Furthermore, we propose a novel TCL+ piecewise loss function to further improve the performance of TCLNet. Extensive experimental results and comparisons demonstrate the performance of our model: TCLNet achieves a 14.4% increase in accuracy with a 92.7% reduction in parameters compared with SOTA deep learning based typhoon center location methods.
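The definition of the TCL+ piecewise loss is not given in the abstract. For illustration only, the sketch below shows the generic shape such a piecewise regression loss typically takes (here, the well-known smooth-L1 form: quadratic near zero error, linear beyond a breakpoint), applied to typhoon-center coordinates:

```python
import torch

def piecewise_center_loss(pred, target, beta=1.0):
    """Illustrative piecewise regression loss (smooth-L1 form).

    This is NOT the paper's TCL+ loss, whose definition the abstract
    omits; it only shows the generic pattern a piecewise loss follows:
    quadratic for |e| < beta, linear for larger errors.
    """
    e = (pred - target).abs()
    return torch.where(e < beta, 0.5 * e ** 2 / beta, e - 0.5 * beta).mean()

pred = torch.tensor([[120.5, 88.0]])    # predicted typhoon center (x, y)
gt = torch.tensor([[118.0, 90.0]])      # annotated center
print(piecewise_center_loss(pred, gt))
```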
53. UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks Compression and Acceleration [PDF] 返回目录
Jingfei Chang, Yang Lu, Ping Xue, Xing Wei, Zhen Wei
Abstract: To apply deep CNNs to mobile terminals and portable devices, many scholars have recently worked on compressing and accelerating deep convolutional neural networks. In this vein, we propose a novel uniform channel pruning (UCP) method for pruning deep CNNs, in which modified squeeze-and-excitation blocks (MSEB) measure the importance of the channels in the convolutional layers. Unimportant channels, including the convolutional kernels related to them, are pruned directly, which greatly reduces the storage cost and the number of calculations. There are two types of residual blocks in ResNet. For ResNet with bottlenecks, we use the same pruning method as for a traditional CNN to trim the 3x3 convolutional layer in the middle of each block. For ResNet with basic residual blocks, we propose an approach that prunes all residual blocks in the same stage consistently, ensuring that the compact network structure is dimensionally correct. Considering that the network loses considerable information after pruning, and that the larger the pruning amplitude, the more information is lost, we do not fine-tune but instead retrain from scratch to restore the accuracy of the network after pruning. Finally, we verify our method on CIFAR-10, CIFAR-100 and ILSVRC-2012 for image classification. The results indicate that when the pruning rate is small, the performance of the compact network after retraining from scratch is better than that of the original network. Even when the pruning amplitude is large, the accuracy is maintained or decreases only slightly. On CIFAR-100, when reducing the parameters and FLOPs by up to 82% and 62% respectively, the accuracy of VGG-19 even improved by 0.54% after retraining.
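As a sketch of the pruning mechanics (with a standard SE block standing in for the paper's MSEB, and a hypothetical mean-based threshold as the uniform criterion), channel importance can be read off the squeeze-and-excitation scaling factors and used to drop channels together with their kernels:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation block; its channel-wise scaling
    factors serve here as importance scores. The paper's MSEB is a
    modified variant whose details the abstract does not give."""
    def __init__(self, c, r=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(),
                                nn.Linear(c // r, c), nn.Sigmoid())
    def forward(self, x):
        return self.fc(x.mean(dim=(2, 3)))   # squeeze -> excitation weights

conv = nn.Conv2d(3, 32, 3, padding=1)
se = SEBlock(32)

# Average the SE scaling factors over a (toy) calibration batch ...
with torch.no_grad():
    scores = se(conv(torch.randn(8, 3, 32, 32))).mean(dim=0)   # (32,)

# ... and keep only channels whose importance clears a uniform threshold
# (the mean is an assumed stand-in for the paper's criterion).
keep = scores > scores.mean()
pruned = nn.Conv2d(3, int(keep.sum()), 3, padding=1)
pruned.weight.data = conv.weight.data[keep]   # copy surviving kernels
pruned.bias.data = conv.bias.data[keep]
print(f"kept {int(keep.sum())}/32 channels")
```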
54. 3D-Aided Data Augmentation for Robust Face Understanding [PDF] 返回目录
Yifan Xing, Yuanjun Xiong, Wei Xia
Abstract: Data augmentation has been highly effective in narrowing the data gap and reducing the cost of human annotation, especially for tasks where ground truth labels are difficult and expensive to acquire. In face recognition, large pose and illumination variation of face images has been a key factor in performance degradation. However, human annotation for the various face understanding tasks, including face landmark localization, face attribute classification and face recognition, is highly costly to acquire under these challenging scenarios. Therefore, it is desirable to perform data augmentation for these cases. But simple 2D data augmentation techniques in the image domain cannot satisfy the requirements of these challenging cases. As such, 3D face modeling, in particular single-image 3D face modeling, stands as a feasible solution for these challenging conditions beyond 2D-based data augmentation. To this end, we propose a method that produces realistic 3D-augmented images from multiple viewpoints under different illumination conditions through 3D face modeling, each associated with geometrically accurate face landmarks, attributes and identity information. Experiments demonstrate that the proposed 3D data augmentation method significantly improves the performance and robustness of various face understanding tasks while achieving state-of-the-art results on multiple benchmarks.
55. Consensus Clustering with Unsupervised Representation Learning [PDF] 返回目录
Jayanth Reddy Regatti, Aniket Anand Deshmukh, Eren Manavoglu, Urun Dogan
Abstract: Recent advances in deep clustering and unsupervised representation learning are based on the idea that different views of an input image (generated through data augmentation techniques) must either be close in the representation space or have similar cluster assignments. In this work, we leverage this idea together with ensemble learning to perform clustering and representation learning. Ensemble learning is widely used in the supervised learning setting but has not yet been practical in deep clustering. Previous works on ensemble learning for clustering neither operate on the feature space nor learn features. We propose a novel ensemble learning algorithm dubbed Consensus Clustering with Unsupervised Representation Learning (ConCURL), which learns representations by creating a consensus over multiple clustering outputs. Specifically, we generate a cluster ensemble using random transformations on the embedding space, and define a consensus loss function that measures the disagreement among the constituents of the ensemble. Thus, diverse ensembles minimize this loss function in a synergistic way, which leads to better representations that work with all cluster ensemble constituents. Our proposed method ConCURL is easy to implement and integrate into any representation learning or deep clustering block. ConCURL outperforms all state-of-the-art methods on various computer vision datasets. Specifically, we beat the closest state-of-the-art method by 5.9 percent on the ImageNet-10 dataset and by 18 percent on the ImageNet-Dogs dataset in terms of clustering accuracy. We further shed some light on the under-studied overfitting issue in clustering and show that our method does not overfit as much as existing methods, thereby generalizing better to new data samples.
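A simplified sketch of the consensus idea, assuming (as one plausible instantiation) that disagreement is measured via cross-entropy between each ensemble member's soft assignment and the ensemble mean; the paper's exact disagreement measure may differ:

```python
import torch
import torch.nn.functional as F

def consensus_loss(z, projections, centroids, tau=0.5):
    """Illustrative consensus loss over a cluster ensemble.

    Each random projection of the embedding space yields its own soft
    cluster assignment; disagreement is measured (here) as cross-entropy
    between each member and the ensemble average. This is an assumed
    form, not necessarily the paper's.
    """
    probs = []
    for P in projections:
        logits = -torch.cdist(z @ P, centroids) / tau   # distance -> similarity
        probs.append(F.softmax(logits, dim=1))
    mean_p = torch.stack(probs).mean(dim=0)             # ensemble consensus
    return -sum((mean_p.detach() * p.clamp_min(1e-8).log()).sum(1).mean()
                for p in probs) / len(probs)

z = torch.randn(16, 128, requires_grad=True)            # embeddings
projections = [torch.randn(128, 64) for _ in range(4)]  # random transforms
centroids = torch.randn(10, 64)                         # 10 clusters
consensus_loss(z, projections, centroids).backward()
```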
56. Nonconvex Regularization for Network Slimming:Compressing CNNs Even More [PDF] 返回目录
Kevin Bui, Fredrick Park, Shuai Zhang, Yingyong Qi, Jack Xin
Abstract: In the last decade, convolutional neural networks (CNNs) have evolved to become the dominant models for various computer vision tasks, but they cannot be deployed in low-memory devices due to their high memory requirements and computational cost. One popular, straightforward approach to compressing CNNs is network slimming, which imposes an $\ell_1$ penalty on the channel-associated scaling factors in the batch normalization layers during training. In this way, channels with low scaling factors are identified as insignificant and are pruned from the model. In this paper, we propose replacing the $\ell_1$ penalty with the $\ell_p$ and transformed $\ell_1$ (T$\ell_1$) penalties, since these nonconvex penalties outperform $\ell_1$ in yielding sparser satisfactory solutions in various compressed sensing problems. In our numerical experiments, we demonstrate network slimming with $\ell_p$ and T$\ell_1$ penalties on VGGNet and DenseNet trained on CIFAR 10/100. The results demonstrate that the nonconvex penalties compress CNNs better than $\ell_1$. In addition, T$\ell_1$ preserves model accuracy after channel pruning, and $\ell_{1/2, 3/4}$ yield compressed models with accuracies similar to $\ell_1$ after retraining.
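The penalties themselves are easy to state. A minimal sketch of network slimming's regularizer on batch-norm scaling factors, with the $\ell_1$, $\ell_p$, and transformed $\ell_1$ variants (T$\ell_1$ with parameter $a$: $(a+1)|x|/(a+|x|)$); the weighting constant and toy network are placeholders:

```python
import torch
import torch.nn as nn

def slimming_penalty(model, kind="tl1", a=1.0, p=0.5):
    """Sparsity penalty on batch-norm scaling factors, as in network
    slimming, with the nonconvex replacements the paper studies:
      l1 : |x|
      lp : |x|^p                 (0 < p < 1)
      tl1: (a+1)|x| / (a+|x|)    (transformed l1)
    """
    reg = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            g = m.weight.abs()
            if kind == "l1":
                reg = reg + g.sum()
            elif kind == "lp":
                reg = reg + (g + 1e-8).pow(p).sum()
            else:  # transformed l1
                reg = reg + ((a + 1) * g / (a + g)).sum()
    return reg

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
# Penalty is simply added to the task loss during training.
loss = net(torch.randn(4, 3, 8, 8)).mean() + 1e-4 * slimming_penalty(net)
loss.backward()
```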
57. A Deep Genetic Programming based Methodology for Art Media Classification Robust to Adversarial Perturbations [PDF] 返回目录
Gustavo Olague, Gerardo Ibarra-Vazquez, Mariana Chan-Ley, Cesar Puente, Carlos Soubervielle-Montalvo, Axel Martinez
Abstract: Art media classification is a current research area that has attracted attention due to the complex extraction and analysis of features of high-value art pieces. The perception of these attributes cannot be subjective: humans sometimes follow a biased interpretation of artworks, whereas automated observation must ensure trustworthiness. Machine learning has outperformed other approaches in many areas through its learned, automatic feature extraction from images in place of hand-designed feature detectors. However, a major concern about its reliability has drawn attention: with small perturbations made intentionally in the input image (an adversarial attack), its prediction can be completely changed. Accordingly, we foresee two ways of approaching the situation: (1) solve the problem of adversarial attacks within current neural network methodologies, or (2) propose a different approach that can challenge deep learning without suffering the effects of adversarial attacks. The first has not been solved yet, and adversarial attacks have become even more complex to defend against. Therefore, this work presents a Deep Genetic Programming method, called Brain Programming, that competes with deep learning, and studies the transferability of adversarial attacks using two artwork databases made by art experts. The results show that the Brain Programming method preserves its performance under these perturbations, in comparison with AlexNet, making it robust to such attacks and competitive with the performance of deep learning.
58. Stuttering Speech Disfluency Prediction using Explainable Attribution Vectors of Facial Muscle Movements [PDF] 返回目录
Arun Das, Jeffrey Mock, Henry Chacon, Farzan Irani, Edward Golob, Peyman Najafirad
Abstract: Speech disorders such as stuttering disrupt the normal fluency of speech through involuntary repetitions, prolongations, and blocking of sounds and syllables. In addition to these disruptions to speech fluency, most adults who stutter (AWS) also experience numerous observable secondary behaviors before, during, and after a stuttering moment, often involving the facial muscles. Recent studies have explored automatic detection of stuttering using artificial intelligence (AI) based algorithms on respiratory rate, audio, and similar signals recorded during speech utterances. However, most methods require controlled environments and/or invasive wearable sensors, and are unable to explain why a decision (fluent vs. stuttered) was made. We hypothesize that pre-speech facial activity in AWS, which can be captured non-invasively, contains enough information to accurately classify the upcoming utterance as either fluent or stuttered. Towards this end, this paper proposes a novel explainable AI (XAI) assisted convolutional neural network (CNN) classifier that predicts near-future stuttering by learning temporal facial muscle movement patterns of AWS, and explains the important facial muscles and actions involved. Statistical analyses reveal a significantly high prevalence of cheek muscles (p<0.005) and lip muscles (p<0.005) in predicting stuttering, indicating behavior conducive to arousal and anticipation to speak. The temporal study of these upper and lower facial muscles may facilitate early detection of stuttering, promote automated assessment of stuttering, and find application in behavioral therapies by providing automatic non-invasive feedback in real time.
59. Video Saliency Detection with Domain Adaption using Hierarchical Gradient Reversal Layers [PDF] 返回目录
Giovanni Bellitto, Federica Proietto Salanitri, Simone Palazzo, Francesco Rundo, Daniela Giordano, Concetto Spampinato
Abstract: In this work, we propose a 3D fully convolutional architecture for video saliency detection that employs multi-head supervision on intermediate maps (referred to as conspicuity maps) generated using features extracted at different abstraction levels. More specifically, the model employs a single encoder, and features extracted at different levels are passed to multiple decoders aimed at predicting multiple saliency instances that are finally combined to obtain the final output saliency maps. We also combine the hierarchical features extracted from the model's encoder with a domain adaptation approach based on gradient reversal at multiple scales, in order to improve generalization on datasets for which no annotations are provided during training. The results of our experiments on standard benchmarks, namely DHF1K, Hollywood2 and UCF Sports, show that the proposed model outperforms state-of-the-art methods on most metrics for supervised saliency prediction. Moreover, when tested in an unsupervised setting, it obtains performance comparable to that of supervised state-of-the-art methods.
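The gradient reversal layer used in the domain-adaptation branch is a standard construction (identity in the forward pass, negated and scaled gradient in the backward pass); a compact PyTorch version, with the paper's multi-scale hierarchical placement omitted:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; multiplies the gradient by -lambda on the way
    back, so features are trained to *fool* the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Features flow through unchanged; gradients from the domain head flip sign.
feat = torch.randn(4, 256, requires_grad=True)
grad_reverse(feat, lam=0.5).sum().backward()
print(feat.grad[0, :3])   # each entry is -0.5 instead of +1.0
```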
60. Artificial Intelligence Enabled Traffic Monitoring System [PDF] 返回目录
Vishal Mandal, Abdul Rashid Mussah, Peng Jin, Yaw Adu-Gyamfi
Abstract: Manual traffic surveillance can be a daunting task, as Traffic Management Centers operate a myriad of cameras installed over a network. Injecting some level of automation could help lighten the workload of human operators performing manual surveillance and facilitate proactive decisions that would reduce the impact of incidents and recurring congestion on roadways. This article presents a novel approach to automatically monitoring real-time traffic footage using deep convolutional neural networks and a stand-alone graphical user interface. The authors describe the results of research conducted in the process of developing models that serve as an integrated framework for an artificial-intelligence-enabled traffic monitoring system. The proposed system deploys several state-of-the-art deep learning algorithms to automate different traffic monitoring needs. Taking advantage of a large database of annotated video surveillance data, deep-learning-based models are trained to detect queues, track stationary vehicles, and tabulate vehicle counts. A pixel-level segmentation approach is applied to detect traffic queues and predict severity. Real-time object detection algorithms coupled with different tracking systems are deployed to automatically detect stranded vehicles as well as perform vehicular counts. At each stage of development, interesting experimental results are presented to demonstrate the effectiveness of the proposed system. Overall, the results demonstrate that the proposed framework performs satisfactorily under varied conditions without being immensely impacted by environmental hazards such as blurry camera views, low illumination, rain, or snow.
61. Leveraging Tacit Information Embedded in CNN Layers for Visual Tracking [PDF] 返回目录
Kourosh Meshgi, Maryam Sadat Mirzaei, Shigeyuki Oba
Abstract: Different layers in CNNs provide not only different levels of abstraction for describing the objects in the input but also encode various implicit information about them. The activation patterns of different features contain valuable information about the stream of incoming images: spatial relations, temporal patterns, and the co-occurrence of spatial and spatiotemporal (ST) features. Studies in the visual tracking literature have, so far, utilized only one of the CNN layers, a pre-fixed combination of them, or an ensemble of trackers built upon individual layers. In this study, we employ an adaptive combination of several CNN layers in a single DCF tracker to address variations in the target's appearance, and propose the use of style statistics on both spatial and temporal properties of the target, directly extracted from CNN layers, for visual tracking. Experiments demonstrate that using this additional implicit information from CNNs significantly improves the performance of the tracker. The results demonstrate the effectiveness of using style similarity and activation consistency regularization in improving localization and scale accuracy.
62. Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images [PDF] 返回目录
John B. Sigman, Gregory P. Spell, Kevin J Liang, Lawrence Carin
Abstract: Recently, progress has been made in the supervised training of convolutional object detectors (e.g., Faster R-CNN) for threat recognition in carry-on luggage using X-ray images. This is part of the Transportation Security Administration's (TSA's) mission to protect air travelers in the United States. While more training data with threats may reliably improve performance for this class of deep algorithm, such data is expensive to stage in realistic contexts. By contrast, data from the real world can be collected quickly and at minimal cost. In this paper, we present a semi-supervised approach for threat recognition which we call Background Adaptive Faster R-CNN. This approach is a training method for two-stage object detectors that uses domain adaptation methods from the field of deep learning. The data sources described above constitute two "domains": a hand-collected domain of images with threats, and a real-world domain of images assumed to contain no threats. Two domain discriminators, one for discriminating object proposals and one for image features, are adversarially trained to prevent the encoding of domain-specific information. Without this penalty, a convolutional neural network (CNN) can learn to identify domains based on superficial characteristics and minimize a supervised loss function without improving its ability to recognize objects. For the hand-collected data, only object proposals and image features from backgrounds are used. The losses for these domain-adaptive discriminators are added to the Faster R-CNN losses of images from both domains. This can reduce threat detection false alarm rates by matching the statistics of features extracted from hand-collected backgrounds to real-world data. Performance improvements are demonstrated on two independently collected datasets of labeled threats.
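A minimal sketch of the two domain discriminators, with toy stand-in heads (the feature dimensions and architectures here are assumptions); adversarial training would insert a gradient reversal layer before each head, as in the previous sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical heads: one critic for backbone features, one for proposals.
img_disc = nn.Sequential(nn.Conv2d(256, 64, 1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(64, 1))
prop_disc = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1))

def domain_losses(img_feat, prop_feat, is_source):
    """Domain-classification losses for image features and object
    proposals. is_source=1 for hand-collected (threat) images, 0 for
    real-world ones; per the paper, only background proposals are fed
    from the hand-collected domain. Gradient reversal omitted here."""
    t_img = torch.full((img_feat.size(0), 1), float(is_source))
    t_prop = torch.full((prop_feat.size(0), 1), float(is_source))
    return (F.binary_cross_entropy_with_logits(img_disc(img_feat), t_img) +
            F.binary_cross_entropy_with_logits(prop_disc(prop_feat), t_prop))

loss = domain_losses(torch.randn(2, 256, 32, 32),   # backbone feature map
                     torch.randn(64, 1024),          # pooled proposal features
                     is_source=1)
loss.backward()
```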
63. Semantic MapNet: Building Allocentric SemanticMaps and Representations from Egocentric Views [PDF] 返回目录
Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra
Abstract: We study the task of semantic mapping - specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map ("what is where?") from egocentric observations of an RGB-D camera with known pose (via localization sensors). Towards this goal, we present SemanticMapNet (SMNet), which consists of: (1) an Egocentric Visual Encoder that encodes each egocentric RGB-D frame, (2) a Feature Projector that projects egocentric features to appropriate locations on a floor plan, (3) a Spatial Memory Tensor of size floor-plan length x width x feature-dims that learns to accumulate projected egocentric features, and (4) a Map Decoder that uses the memory tensor to produce semantic top-down maps. SMNet combines the strengths of (known) projective camera geometry and neural representation learning. On the task of semantic mapping in the Matterport3D dataset, SMNet significantly outperforms competitive baselines by 4.01-16.81% (absolute) on mean-IoU and 3.81-19.69% (absolute) on Boundary-F1 metrics. Moreover, we show how to use the neural episodic memories and spatio-semantic allocentric representations built by SMNet for subsequent tasks in the same space - navigating to objects seen during the tour ("Find chair") or answering questions about the space ("How many chairs did you see in the house?").
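The Feature Projector step can be sketched with standard pinhole geometry: back-project each pixel using the depth and inverse intrinsics, move it to world coordinates with the known pose, and scatter its feature vector into a top-down grid cell. A simplified stand-in (real SMNet accumulates into a learned spatial memory rather than overwriting):

```python
import torch

def project_to_floorplan(feat, depth, K_inv, cam_to_world,
                         grid_size=128, cell=0.1):
    """Scatter per-pixel features onto a top-down map using known pose.

    feat:  (C, H, W) egocentric features   depth: (H, W) in meters
    K_inv: (3, 3) inverse intrinsics       cam_to_world: (4, 4) pose
    """
    C, H, W = feat.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)
    cam = (pix @ K_inv.T) * depth.unsqueeze(-1)                    # back-project
    cam_h = torch.cat([cam, torch.ones(H, W, 1)], dim=-1)          # homogeneous
    world = (cam_h.reshape(-1, 4) @ cam_to_world.T)[:, :3]         # (H*W, 3)

    gx = (world[:, 0] / cell).long().clamp(0, grid_size - 1)       # x -> column
    gz = (world[:, 2] / cell).long().clamp(0, grid_size - 1)       # z -> row
    topdown = torch.zeros(C, grid_size, grid_size)
    topdown[:, gz, gx] = feat.reshape(C, -1)     # last-write-wins scatter
    return topdown

m = project_to_floorplan(torch.randn(8, 60, 80), torch.rand(60, 80) * 5,
                         torch.eye(3), torch.eye(4))
print(m.shape)   # torch.Size([8, 128, 128])
```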
64. Adaptive Neural Layer for Globally Filtered Segmentation [PDF] 返回目录
Viktor Shipitsin, Iaroslav Bespalov, Dmitry V. Dylov
Abstract: This study is motivated by typical images taken during ultrasonic examinations in the clinic. Their grainy appearance, low resolution, and poor contrast demand the eye of a highly qualified expert to discern targets and to spot pathologies. Training a segmentation model on such data is frequently accompanied by excessive pre-processing and image adjustments, with localization error accumulating due to digital post-filtering artifacts and annotation uncertainty. Each patient case generally requires an individually tuned frequency filter to obtain optimal image contrast and segmentation quality. Thus, we aspired to invent an adaptive global frequency-filtering neural layer that "learns" an optimal frequency filter for each image together with the weights of the segmentation network itself. Specifically, our model receives the source image in the spatial domain, automatically selects the necessary frequencies in the frequency domain, and passes the inverse-transformed image to the convolutional neural network for concurrent segmentation. In our experiments, such "learnable" filters boosted typical U-Net segmentation performance by 10% and made the training of other popular models (DenseNet and ResNet) almost twice as fast. This trait holds both for two public datasets of ultrasonic images (breast and nerves) and for natural images (Caltech birds).
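A minimal sketch of such a layer, assuming the simplest parameterization (one learnable gain per frequency, trained jointly with the downstream segmentation network); the paper's exact parameterization is not given in the abstract:

```python
import torch
import torch.nn as nn

class FrequencyFilterLayer(nn.Module):
    """Global frequency filtering with a learnable mask: FFT the image,
    reweight every frequency with a trainable gate, inverse-FFT, and
    pass the result on to the segmentation CNN."""
    def __init__(self, h, w):
        super().__init__()
        self.mask = nn.Parameter(torch.ones(h, w))   # learned per-frequency gain

    def forward(self, x):                  # x: (N, C, H, W), real-valued
        spec = torch.fft.fft2(x)           # to frequency domain
        spec = spec * self.mask            # learnable global filter
        return torch.fft.ifft2(spec).real  # back to spatial domain

layer = FrequencyFilterLayer(64, 64)
out = layer(torch.randn(2, 1, 64, 64))    # e.g. a grayscale ultrasound patch
out.mean().backward()                      # the mask trains with the network
print(layer.mask.grad.shape)               # torch.Size([64, 64])
```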
65. Machine learning approach to force reconstruction in photoelastic materials [PDF] 返回目录
Renat Sergazinov, Miroslav Kramar
Abstract: Photoelastic techniques have a long tradition in both qualitative and quantitative analysis of the stresses in granular materials. Over the last two decades, computational methods for reconstructing forces between particles from their photoelastic response have been developed by many different experimental teams. Unfortunately, all of these methods are computationally expensive. This limits their use for processing extensive data sets that capture the time evolution of granular ensembles consisting of a large number of particles. In this paper, we present a novel approach to this problem that leverages the power of convolutional neural networks to recognize complex spatial patterns. The main drawback of using neural networks is that training them usually requires a large labeled data set, which is hard to obtain experimentally. We show that this problem can be successfully circumvented by pretraining the networks on a large synthetic data set and then fine-tuning them on much smaller experimental data sets. Due to our current lack of experimental data, we demonstrate the potential of our method by changing the size of the considered particles, which alters the exhibited photoelastic patterns more than typical experimental errors do.
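The synthetic-pretrain-then-fine-tune recipe is generic enough to sketch; the loss, optimizer, and epoch counts below are placeholders rather than the authors' settings:

    import torch

    def pretrain_then_finetune(model, synthetic_loader, experimental_loader,
                               lr_pre=1e-3, lr_ft=1e-4, epochs=(50, 10)):
        """Pretrain on abundant synthetic photoelastic images, then
        fine-tune on the much smaller experimental set (a sketch)."""
        loss_fn = torch.nn.MSELoss()               # regress inter-particle forces
        for lr, loader, n_epochs in [(lr_pre, synthetic_loader, epochs[0]),
                                     (lr_ft, experimental_loader, epochs[1])]:
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            for _ in range(n_epochs):
                for images, forces in loader:
                    opt.zero_grad()
                    loss_fn(model(images), forces).backward()
                    opt.step()
        return model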
66. MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand Pose Synthesis [PDF] 返回目录
Zhenyu Wu, Duc Hoang, Shih-Yao Lin, Yusheng Xie, Liangjian Chen, Yen-Yu Lin, Zhangyang Wang, Wei Fan
Abstract: Estimating the 3D hand pose from a monocular RGB image is important but challenging. A solution is training on large-scale RGB hand images with accurate 3D hand keypoint annotations. However, it is too expensive in practice. Instead, we have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images under the guidance of 3D pose information. We propose a 3D-aware multi-modal guided hand generative network (MM-Hand), together with a novel geometry-based curriculum learning strategy. Our extensive experimental results demonstrate that the 3D-annotated images generated by MM-Hand qualitatively and quantitatively outperform existing options. Moreover, the augmented data can consistently improve the quantitative performance of the state-of-the-art 3D hand pose estimators on two benchmark datasets. The code will be available at this https URL.
67. Semantics-Guided Clustering with Deep Progressive Learning for Semi-Supervised Person Re-identification [PDF] 返回目录
Chih-Ting Liu, Yu-Jhe Li, Shao-Yi Chien, Yu-Chiang Frank Wang
Abstract: Person re-identification (re-ID) requires one to match images of the same person across camera views. As a more challenging task, semi-supervised re-ID tackles the setting in which only some of the identities in the training data are fully labeled, while the remaining ones are unlabeled. Assuming that such labeled and unlabeled training data share disjoint identity labels, we propose a novel framework of Semantics-Guided Clustering with Deep Progressive Learning (SGC-DPL) to jointly exploit the above data. By advancing the proposed Semantics-Guided Affinity Propagation (SG-AP), we are able to assign pseudo-labels to selected unlabeled data in a progressive fashion, under semantic guidance from the labeled ones. As a result, our approach is able to augment the labeled training data in the semi-supervised setting. Our experiments on two large-scale person re-ID benchmarks demonstrate the superiority of our SGC-DPL over state-of-the-art methods across different degrees of supervision. By extension, the generalization ability of our SGC-DPL is also verified in other tasks, such as vehicle re-ID and image retrieval, in the semi-supervised setting.
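One round of the pseudo-labeling step can be sketched with off-the-shelf affinity propagation; the paper's SG-AP adds semantic guidance from the labeled identities, which is omitted here:

    import numpy as np
    from sklearn.cluster import AffinityPropagation

    def pseudo_label_round(feats_labeled, ids_labeled, feats_unlabeled):
        """Cluster labeled and unlabeled features together; propagate an
        identity to unlabeled members of clusters dominated by exactly one
        labeled identity (a simplified stand-in for SG-AP)."""
        X = np.vstack([feats_labeled, feats_unlabeled])
        clusters = AffinityPropagation().fit_predict(X)
        n_l = len(ids_labeled)
        pseudo = {}
        for c in np.unique(clusters):
            members = np.where(clusters == c)[0]
            known = {ids_labeled[i] for i in members if i < n_l}
            if len(known) == 1:                     # one identity in the cluster
                identity = known.pop()
                for i in members:
                    if i >= n_l:
                        pseudo[i - n_l] = identity  # pseudo-label this sample
        return pseudo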
68. Medical Imaging and Computational Image Analysis in COVID-19 Diagnosis: A Review [PDF] 返回目录
Shahabedin Nabavi, Azar Ejmalian, Mohsen Ebrahimi Moghaddam, Ahmad Ali Abin, Alejandro F. Frangi, Mohammad Mohammadi, Hamidreza Saligheh Rad
Abstract: Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. The disease presents with symptoms such as shortness of breath, fever, dry cough, and chronic fatigue, amongst others. In some cases the symptoms become severe enough to cause the patient's death. The disease may be asymptomatic in some patients in the early stages, which can lead to increased transmission to others. Many studies have tried to use medical imaging for early diagnosis of COVID-19. This study reviews papers on automatic methods for medical image analysis and diagnosis of COVID-19. For this purpose, PubMed, Google Scholar, arXiv and medRxiv were searched for related studies published by the end of April 2020, and the essential points of the collected studies were summarised. The contribution of this study is four-fold: 1) to serve as a tutorial of the field for both clinicians and technologists, 2) to comprehensively review the characteristics of COVID-19 as presented in medical images, 3) to examine automated artificial-intelligence-based approaches for COVID-19 diagnosis with respect to their accuracy and the methods used, 4) to express the research limitations in this field and the methods used to overcome them. COVID-19 produces signs in medical images that can be used for early diagnosis of the disease, even in asymptomatic patients. Automated machine-learning-based methods can diagnose the disease from medical images with high accuracy and reduce the time, cost, and error of the diagnostic procedure. It is recommended to collect bulk imaging data from patients in the shortest possible time to improve the performance of COVID-19 automated diagnostic methods.
69. i-DenseNets [PDF] 返回目录
Yura Perugachi-Diaz, Jakub M. Tomczak, Sandjai Bhulai
Abstract: We introduce Invertible Dense Networks (i-DenseNets), a more parameter-efficient alternative to Residual Flows. The method relies on an analysis of the Lipschitz continuity of the concatenation in DenseNets, where we enforce the invertibility of the network by satisfying the Lipschitz constraint. Additionally, we extend this method by proposing a learnable concatenation, which not only improves the model performance but also indicates the importance of the concatenated representation. We demonstrate the performance of i-DenseNets and Residual Flows on toy, MNIST, and CIFAR10 data. Both variants of i-DenseNets outperform Residual Flows, evaluated in negative log-likelihood, on all considered datasets under an equal parameter budget.
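The learnable concatenation admits a compact sketch; constraining the two weights to the simplex is one simple way to keep the block's Lipschitz bound controlled and is an assumption, not the paper's exact parameterization:

    import torch
    import torch.nn as nn

    class LearnableConcat(nn.Module):
        """Concatenate eta1*x with eta2*f(x), with eta learned on the
        simplex so the concatenation of 1-Lipschitz signals stays
        Lipschitz-bounded (a sketch)."""
        def __init__(self, f):
            super().__init__()
            self.f = f                               # Lipschitz-constrained subnet
            self.logits = nn.Parameter(torch.zeros(2))

        def forward(self, x):
            eta = torch.softmax(self.logits, dim=0)  # eta1 + eta2 = 1
            return torch.cat([eta[0] * x, eta[1] * self.f(x)], dim=1)

After training, the learned eta values indicate how much the model relies on each concatenated representation.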
70. Automatic Deep Learning System for COVID-19 Infection Quantification in chest CT [PDF] 返回目录
Omar Ibrahim Alirr
Abstract: Coronavirus disease spread globally and quickly infected millions of people, putting high pressure on health-system facilities. PCR screening is the adopted diagnostic testing method for COVID-19 detection. However, PCR is criticized for its low sensitivity, and it is a time-consuming, complicated manual process. CT imaging has proved able to detect the disease even in asymptomatic patients, which makes it a trustworthy alternative to PCR. In addition, the appearance of COVID-19 infections in CT slices offers high potential to support disease-evolution monitoring using automated infection-segmentation methods. However, COVID-19 infection areas exhibit high variation in size, shape, contrast, and intensity homogeneity, which poses a big challenge to the segmentation process. To address these challenges, this paper proposes an automatic deep learning system for COVID-19 infection-area segmentation. The system includes several steps to enhance the appearance of infection areas in the CT slices so that they can be learned efficiently by the deep network. The system first prepares the region of interest by segmenting the lung organ, which then undergoes edge-enhancing diffusion filtering (EED) to improve the contrast and intensity homogeneity of infection areas. The proposed FCN is implemented using a U-Net architecture with a modified residual block with a concatenation skip connection. The block improves the learning of gradient values by forwarding infection-area features through the network. To demonstrate the generalization and effectiveness of the proposed system, it is trained and tested on many 2D CT slices extracted from diverse datasets from different sources. The proposed system is evaluated using different measures and achieves Dice overlap scores of 0.961 and 0.780 for lung and infection-area segmentation, respectively.
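One reading of the "modified residual block with a concatenation skip connection" can be sketched as follows; the channel counts and the 1x1 fusion convolution are assumptions:

    import torch
    import torch.nn as nn

    class ConcatResidualBlock(nn.Module):
        """Residual block whose skip path is concatenated rather than
        added, then fused back to c_out channels by a 1x1 convolution, so
        the raw infection-area features are forwarded intact (a sketch)."""
        def __init__(self, c_in, c_out):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1),
                nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, 3, padding=1),
                nn.BatchNorm2d(c_out))
            self.fuse = nn.Conv2d(c_in + c_out, c_out, kernel_size=1)

        def forward(self, x):
            return torch.relu(self.fuse(torch.cat([x, self.body(x)], dim=1)))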
71. Machine Learning and Computer Vision Techniques to Predict Thermal Properties of Particulate Composites [PDF] 返回目录
Fazlolah Mohaghegh, Jayathi Murthy
Abstract: Accurate thermal analysis of composites and porous media requires detailed characterization of local thermal properties at small scale. For some important applications such as lithium-ion batteries, changes in the properties during operation make the analysis even more challenging, necessitating rapid characterization. We propose a new method to characterize the thermal properties of particulate composites based on actual micro-images. Our computer-vision-based approach constructs 3D images from stacks of 2D SEM images and then extracts several representative elemental volumes (REVs) from the reconstructed images at random places, which yields a range of geometrical features across REVs. A deep learning algorithm based on convolutional neural nets takes the shape of the geometry and outputs the effective conductivity of the REV. The network is trained with two methods: the first uses a coarser grid with the average conductivity values from the fine grid, together with the effective conductivity obtained from the DNS solution of the fine grid; the other uses conductivity values on cross sections from each REV in different directions. The results of training based on averaging show that using a coarser grid in the network does not have a meaningful effect on the network error; however, it decreases the training time by up to three orders of magnitude. We show that one general network can make accurate predictions using different types of electrode images, representing differences in geometry and constituents. Moreover, training based on averaging is more accurate than training based on cross sections. A study of the robustness of the machine learning technique in predicting thermal percolation shows the prediction error is almost half that of predictions based on the volume fraction.
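The REV-extraction step is straightforward to sketch in NumPy; the random placement mirrors the description above, while the REV size is a free parameter:

    import numpy as np

    def sample_revs(volume, rev_shape, n, rng=np.random.default_rng()):
        """Cut n representative elemental volumes (REVs) at random
        locations out of a reconstructed 3D image stack (a sketch)."""
        revs = []
        for _ in range(n):
            z, y, x = (rng.integers(0, volume.shape[i] - rev_shape[i] + 1)
                       for i in range(3))
            revs.append(volume[z:z + rev_shape[0],
                               y:y + rev_shape[1],
                               x:x + rev_shape[2]])
        return np.stack(revs)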
72. A Comparative Study of Existing and New Deep Learning Methods for Detecting Knee Injuries using the MRNet Dataset [PDF] 返回目录
David Azcona, Kevin McGuinness, Alan F. Smeaton
Abstract: This work presents a comparative study of existing and new techniques to detect knee injuries by leveraging Stanford's MRNet Dataset. All approaches are based on deep learning and we explore the comparative performances of transfer learning and a deep residual network trained from scratch. We also exploit some characteristics of Magnetic Resonance Imaging (MRI) data by, for example, using a fixed number of slices or 2D images from each of the axial, coronal and sagittal planes as well as combining the three planes into one multi-plane network. Overall we achieved a performance of 93.4% AUC on the validation data by using the more recent deep learning architectures and data augmentation strategies. More flexible architectures are also proposed that might help with the development and training of models that process MRIs. We found that transfer learning and a carefully tuned data augmentation strategy were the crucial factors in determining best performance.
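A minimal sketch of the multi-plane idea, assuming one ImageNet-pretrained backbone per plane and 3-channel plane inputs (how the fixed number of slices is mapped to channels is left out):

    import torch
    import torch.nn as nn
    from torchvision import models

    class MultiPlaneNet(nn.Module):
        """One pretrained feature extractor per MRI plane, fused by a
        linear head into a single injury logit (a sketch)."""
        def __init__(self):
            super().__init__()
            def backbone():
                m = models.resnet18(pretrained=True)   # transfer learning
                m.fc = nn.Identity()                   # keep 512-d features
                return m
            self.axial, self.coronal, self.sagittal = (backbone(),
                                                       backbone(), backbone())
            self.head = nn.Linear(3 * 512, 1)

        def forward(self, ax, co, sa):
            feats = torch.cat([self.axial(ax), self.coronal(co),
                               self.sagittal(sa)], dim=1)
            return self.head(feats)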
73. Unsupervised Region-based Anomaly Detection in Brain MRI with Adversarial Image Inpainting [PDF] 返回目录
Bao Nguyen, Adam Feldman, Sarath Bethapudi, Andrew Jennings, Chris G. Willcocks
Abstract: Medical segmentation is performed to determine the bounds of regions of interest (ROI) prior to surgery. By allowing the study of growth, structure, and behaviour of the ROI in the planning phase, critical information can be obtained, increasing the likelihood of a successful operation. Usually, segmentations are performed manually or via machine learning methods trained on manual annotations. In contrast, this paper proposes a fully automatic, unsupervised inpainting-based brain tumour segmentation system for T1-weighted MRI. First, a deep convolutional neural network (DCNN) is trained to reconstruct missing healthy brain regions. Then, upon application, anomalous regions are determined by identifying areas of highest reconstruction loss. Finally, superpixel segmentation is performed to segment those regions. We show the proposed system is able to segment various sized and abstract tumours and achieves a mean and standard deviation Dice score of 0.771 and 0.176, respectively.
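The region-scoring step after inpainting can be sketched directly; the superpixel count and the error quantile are placeholders:

    import numpy as np
    from skimage.segmentation import slic

    def anomaly_mask(image, reconstruction, n_segments=400, q=0.95):
        """Score superpixels by mean reconstruction error and keep the
        highest-error ones as the anomaly segmentation (a sketch for
        single-channel MRI; channel_axis needs skimage >= 0.19)."""
        err = (image - reconstruction) ** 2
        segments = slic(image, n_segments=n_segments, channel_axis=None)
        ids = np.unique(segments)
        scores = np.array([err[segments == s].mean() for s in ids])
        keep = ids[scores > np.quantile(scores, q)]
        return np.isin(segments, keep)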
74. Test-time Unsupervised Domain Adaptation [PDF] 返回目录
Thomas Varsavsky, Mauricio Orbes-Arteaga, Carole H. Sudre, Mark S. Graham, Parashkev Nachev, M. Jorge Cardoso
Abstract: Convolutional neural networks trained on publicly available medical imaging datasets (source domain) rarely generalise to different scanners or acquisition protocols (target domain). This motivates the active field of domain adaptation. While some approaches to the problem require labeled data from the target domain, others adopt an unsupervised approach to domain adaptation (UDA). Evaluating UDA methods consists of measuring the model's ability to generalise to unseen data in the target domain. In this work, we argue that this is not as useful as adapting to the test set directly. We therefore propose an evaluation framework where we perform test-time UDA on each subject separately. We show that models adapted to a specific target subject from the target domain outperform a domain adaptation method which has seen more data of the target domain but not this specific target subject. This result supports the thesis that unsupervised domain adaptation should be used at test time, even if only using a single target-domain subject.
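The per-subject evaluation protocol itself is simple to sketch; `adapt` stands for any unsupervised adaptation routine, which the abstract does not pin down:

    import copy

    def evaluate_test_time_uda(model, subjects, adapt, metric):
        """For each test subject, adapt a fresh copy of the model on that
        subject's unlabeled images alone, then score it; adaptation never
        leaks across subjects (a sketch of the protocol)."""
        scores = {}
        for subject_id, (images, labels) in subjects.items():
            m = copy.deepcopy(model)    # start from the source-trained model
            m = adapt(m, images)        # unsupervised: labels are never shown
            scores[subject_id] = metric(m(images), labels)
        return scores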
75. Quantifying Statistical Significance of Neural Network Representation-Driven Hypotheses by Selective Inference [PDF] 返回目录
Vo Nguyen Le Duy, Shogo Iwazaki, Ichiro Takeuchi
Abstract: In the past few years, various approaches have been developed to explain and interpret deep neural network (DNN) representations, but it has been pointed out that these representations are sometimes unstable and not reproducible. In this paper, we interpret these representations as hypotheses driven by DNN (called DNN-driven hypotheses) and propose a method to quantify the reliability of these hypotheses in a statistical hypothesis testing framework. To this end, we introduce the Selective Inference (SI) framework, which has received much attention in the past few years as a new statistical inference framework for data-driven hypotheses. The basic idea of SI is to make conditional inferences on the selected hypotheses under the condition that they are selected. In order to use the SI framework for DNN representations, we develop a new SI algorithm based on the homotopy method which enables us to derive the exact (non-asymptotic) conditional sampling distribution of the DNN-driven hypotheses. We conduct experiments on both synthetic and real-world datasets, through which we offer evidence that our proposed method can successfully control the false positive rate, has decent performance in terms of computational efficiency, and provides good results in practical applications.
76. DCT-SNN: Using DCT to Distribute Spatial Information over Time for Learning Low-Latency Spiking Neural Networks [PDF] 返回目录
Isha Garg, Sayeed Shafayet Chowdhury, Kaushik Roy
Abstract: Spiking Neural Networks (SNNs) offer a promising alternative to traditional deep learning frameworks, since they provide higher computational efficiency due to event-driven information processing. SNNs distribute the analog values of pixel intensities into binary spikes over time. However, the most widely used input coding schemes, such as Poisson based rate-coding, do not leverage the additional temporal learning capability of SNNs effectively. Moreover, these SNNs suffer from high inference latency which is a major bottleneck to their deployment. To overcome this, we propose a scalable time-based encoding scheme that utilizes the Discrete Cosine Transform (DCT) to reduce the number of timesteps required for inference. DCT decomposes an image into a weighted sum of sinusoidal basis images. At each time step, the Hadamard product of the DCT coefficients and a single frequency base, taken in order, is given to an accumulator that generates spikes upon crossing a threshold. We use the proposed scheme to learn DCT-SNN, a low-latency deep SNN with leaky-integrate-and-fire neurons, trained using surrogate gradient descent based backpropagation. We achieve top-1 accuracy of 89.94%, 68.3% and 52.43% on CIFAR-10, CIFAR-100 and TinyImageNet, respectively using VGG architectures. Notably, DCT-SNN performs inference with 2-14X reduced latency compared to other state-of-the-art SNNs, while achieving comparable accuracy to their standard deep learning counterparts. The dimension of the transform allows us to control the number of timesteps required for inference. Additionally, we can trade-off accuracy with latency in a principled manner by dropping the highest frequency components during inference.
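A rough NumPy reading of the encoding, with simplified frequency ordering and unnormalized basis images (both assumptions):

    import numpy as np
    from scipy.fft import dctn

    def dct_temporal_encode(img, n_steps, theta=1.0):
        """Feed one DCT frequency per timestep into an integrate-and-fire
        accumulator that spikes on crossing the threshold theta (sketch)."""
        H, W = img.shape
        coeffs = dctn(img, norm='ortho')             # 2D DCT coefficients
        order = sorted(((u, v) for u in range(H) for v in range(W)),
                       key=lambda uv: uv[0] + uv[1])[:n_steps]
        acc = np.zeros((H, W))
        spikes = np.zeros((n_steps, H, W), dtype=np.uint8)
        for t, (u, v) in enumerate(order):
            basis = np.outer(np.cos(np.pi * (np.arange(H) + 0.5) * u / H),
                             np.cos(np.pi * (np.arange(W) + 0.5) * v / W))
            acc += coeffs[u, v] * basis              # this frequency's share
            fired = acc > theta
            spikes[t][fired] = 1
            acc[fired] -= theta                      # soft reset
        return spikes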
77. Towards Generalized and Distributed Privacy-Preserving Representation Learning [PDF] 返回目录
Sheikh Shams Azam, Taejin Kim, Seyyedali Hosseinalipour, Christopher Brinton, Carlee Joe-Wong, Saurabh Bagchi
Abstract: We study the problem of learning data representations that are private yet informative, i.e., providing information about intended "ally" targets while obfuscating sensitive "adversary" attributes. We propose a novel framework, Exclusion-Inclusion Generative Adversarial Network (EIGAN), that generalizes existing adversarial privacy-preserving representation learning (PPRL) approaches to generate data encodings that account for multiple possibly overlapping ally and adversary targets. Preserving privacy is even more difficult when the data is collected across multiple distributed nodes, which for privacy reasons may not wish to share their data even for PPRL training. Thus, learning such data representations at each node in a distributed manner (i.e., without transmitting source data) is of particular importance. This motivates us to develop D-EIGAN, the first distributed PPRL method, based on federated learning with fractional parameter sharing to account for communication resource limitations. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and consider the impact of dependencies among ally and adversary tasks on the encoder performance. Our experiments on real-world and synthetic datasets demonstrate the advantages of EIGAN encodings in terms of accuracy, robustness, and scalability; in particular, we show that EIGAN outperforms the previous state-of-the-art by a significant accuracy margin (47% improvement). The experiments further reveal that D-EIGAN's performance is consistent with EIGAN under different node data distributions and is resilient to communication constraints.
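The shape of the encoder objective with multiple allies and adversaries can be sketched as follows; the choice of cross-entropy and a single trade-off weight lam are assumptions:

    import torch.nn.functional as F

    def encoder_loss(z, ally_heads, ally_labels, adv_heads, adv_labels,
                     lam=1.0):
        """EIGAN-style objective for the encoder output z: help every ally
        head, hurt every adversary head. The adversary heads are trained
        separately to minimize their own loss (a sketch)."""
        ally = sum(F.cross_entropy(h(z), y)
                   for h, y in zip(ally_heads, ally_labels))
        adv = sum(F.cross_entropy(h(z), s)
                  for h, s in zip(adv_heads, adv_labels))
        return ally - lam * adv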
78. Photon-Driven Neural Path Guiding [PDF] 返回目录
Shilin Zhu, Zexiang Xu, Tiancheng Sun, Alexandr Kuznetsov, Mark Meyer, Henrik Wann Jensen, Hao Su, Ravi Ramamoorthi
Abstract: Although Monte Carlo path tracing is a simple and effective algorithm to synthesize photo-realistic images, it is often very slow to converge to noise-free results when involving complex global illumination. One of the most successful variance-reduction techniques is path guiding, which can learn better distributions for importance sampling to reduce pixel noise. However, previous methods require a large number of path samples to achieve reliable path guiding. We present a novel neural path guiding approach that can reconstruct high-quality sampling distributions for path guiding from a sparse set of samples, using an offline trained neural network. We leverage photons traced from light sources as the input for sampling density reconstruction, which is highly effective for challenging scenes with strong global illumination. To fully make use of our deep neural network, we partition the scene space into an adaptive hierarchical grid, in which we apply our network to reconstruct high-quality sampling distributions for any local region in the scene. This allows for highly efficient path guiding for any path bounce at any location in path tracing. We demonstrate that our photon-driven neural path guiding method can generalize well on diverse challenging testing scenes that are not seen in training. Our approach achieves significantly better rendering results of testing scenes than previous state-of-the-art path guiding methods.
79. OLALA: Object-Level Active Learning Based Layout Annotation [PDF] 返回目录
Zejiang Shen, Jian Zhao, Melissa Dell, Yaoliang Yu, Weining Li
Abstract: In layout object detection problems, the ground-truth datasets are constructed by annotating object instances individually. Yet active learning for object detection is typically conducted at the image level, not at the object level. Because objects appear with different frequencies across images, image-level active learning may be subject to over-exposure to common objects. This reduces the efficiency of human labeling. This work introduces an Object-Level Active Learning based Layout Annotation framework, OLALA, which includes an object scoring method and a prediction correction algorithm. The object scoring method estimates the object prediction informativeness considering both the object category and the location. It selects only the most ambiguous object prediction regions within an image for users to label, optimizing the use of the annotation budget. For the unselected model predictions, we propose a correction algorithm to rectify two types of possible errors with minor supervision from ground-truths. The human annotated and model predicted objects are then merged as new image annotations for training the object detection models. In simulated labeling experiments, we show that OLALA helps to create the dataset more efficiently and report strong accuracy improvements of the trained models compared to image-level active learning baselines. The code is available at this https URL.
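Object-level selection reduces to scoring individual predicted boxes and spending the budget on the most ambiguous ones; entropy scoring is an assumption here (the paper's score also weighs location informativeness):

    import numpy as np

    def select_objects(predicted_boxes, budget):
        """Rank predicted boxes by class-probability entropy and return
        the most ambiguous ones for human labeling (a sketch)."""
        def entropy(p):
            p = np.clip(np.asarray(p), 1e-8, 1.0)
            return float(-(p * np.log(p)).sum())
        ranked = sorted(predicted_boxes,
                        key=lambda b: -entropy(b["class_probs"]))
        return ranked[:budget]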
80. Learning Manifold Implicitly via Explicit Heat-Kernel Learning [PDF] 返回目录
Yufan Zhou, Changyou Chen, Jinhui Xu
Abstract: Manifold learning is a fundamental problem in machine learning with numerous applications. Most of the existing methods directly learn the low-dimensional embedding of the data in some high-dimensional space, and usually lack the flexibility of being directly applicable to down-stream applications. In this paper, we propose the concept of implicit manifold learning, where manifold information is implicitly obtained by learning the associated heat kernel. A heat kernel is the solution of the corresponding heat equation, which describes how "heat" transfers on the manifold, thus containing ample geometric information of the manifold. We provide both practical algorithm and theoretical analysis of our framework. The learned heat kernel can be applied to various kernel-based machine learning models, including deep generative models (DGM) for data generation and Stein Variational Gradient Descent for Bayesian inference. Extensive experiments show that our framework can achieve state-of-the-art results compared to existing methods for the two tasks.
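For reference, the heat kernel in question is the fundamental solution of the heat equation on the manifold:

    ∂k_t(x, y)/∂t = Δ_x k_t(x, y),    lim_{t→0} k_t(x, ·) = δ_x,

i.e. k_t = e^{tΔ} as an operator, which is why learning k_t implicitly captures the manifold's geometry.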
81. Lipschitz Bounded Equilibrium Networks [PDF] 返回目录
Max Revay, Ruigang Wang, Ian R. Manchester
Abstract: This paper introduces new parameterizations of equilibrium neural networks, i.e. networks defined by implicit equations. This model class includes standard multilayer and residual networks as special cases. The new parameterization admits a Lipschitz bound during training via unconstrained optimization: no projections or barrier functions are required. Lipschitz bounds are a common proxy for robustness and appear in many generalization bounds. Furthermore, compared to previous works we show well-posedness (existence of solutions) under less restrictive conditions on the network weights and more natural assumptions on the activation functions: that they are monotone and slope restricted. These results are proved by establishing novel connections with convex optimization, operator splitting on non-Euclidean spaces, and contracting neural ODEs. In image classification experiments we show that the Lipschitz bounds are very accurate and improve robustness to adversarial attacks.
82. Surface Agnostic Metrics for Cortical Volume Segmentation and Regression [PDF] 返回目录
Samuel Budd, Prachi Patkee, Ana Baburamani, Mary Rutherford, Emma C. Robinson, Bernhard Kainz
Abstract: The cerebral cortex performs higher-order brain functions and is thus implicated in a range of cognitive disorders. Current analysis of cortical variation is typically performed by fitting surface mesh models to inner and outer cortical boundaries and investigating metrics such as surface area and cortical curvature or thickness. These, however, take a long time to run, and are sensitive to motion and image and surface resolution, which can prohibit their use in clinical settings. In this paper, we instead propose a machine learning solution, training a novel architecture to predict cortical thickness and curvature metrics from T2 MRI images, while additionally returning metrics of prediction uncertainty. Our proposed model is tested on a clinical cohort (Down Syndrome) for which surface-based modelling often fails. Results suggest that deep convolutional neural networks are a viable option to predict cortical metrics across a range of brain development stages and pathologies.
摘要:大脑皮层执行高阶脑功能,因而与一系列认知障碍相关。目前对皮层变异的分析通常是将表面网格模型拟合到皮层内外边界,再考察表面积、皮层曲率或厚度等指标。然而这些方法运行耗时,并且对运动以及图像和表面分辨率敏感,这可能妨碍其在临床环境中的使用。本文转而提出一种机器学习方案,训练一种新的网络架构直接从 T2 MRI 图像预测皮层厚度和曲率指标,同时额外给出预测不确定性的度量。我们在一个基于表面的建模经常失败的临床队列(唐氏综合征)上测试了所提模型。结果表明,深度卷积神经网络是在各个脑发育阶段和多种病理情况下预测皮层指标的一种可行选择。
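The abstract does not specify how the uncertainty metrics are produced; one common way for a regression CNN to return them is Monte Carlo dropout, sketched below with a toy, hypothetical architecture (not the authors'):

```python
# Hypothetical sketch: regress a cortical metric with an MC-dropout uncertainty estimate.
import torch
import torch.nn as nn

class MetricRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Dropout2d(0.2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, 1),          # -> scalar metric (e.g. thickness)
        )

    def forward(self, x):
        return self.net(x)

model = MetricRegressor()
model.train()                                        # keep dropout active at test time
x = torch.randn(1, 1, 128, 128)                      # toy T2 slice
samples = torch.stack([model(x) for _ in range(20)])
print(samples.mean().item(), samples.std().item())   # prediction and its uncertainty
```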
83. KiU-Net: Overcomplete Convolutional Architectures for Biomedical Image and Volumetric Segmentation [PDF] 返回目录
Jeya Maria Jose Valanarasu, Vishwanath A. Sindagi, Ilker Hacihaliloglu, Vishal M. Patel
Abstract: Most methods for medical image segmentation use U-Net or its variants as they have been successful in most of the applications. After a detailed analysis of these "traditional" encoder-decoder based approaches, we observed that they perform poorly in detecting smaller structures and are unable to segment boundary regions precisely. This issue can be attributed to the increase in receptive field size as we go deeper into the encoder. The extra focus on learning high level features causes the U-Net based approaches to learn less information about low-level features which are crucial for detecting small structures. To overcome this issue, we propose using an overcomplete convolutional architecture where we project our input image into a higher dimension such that we constrain the receptive field from increasing in the deep layers of the network. We design a new architecture for image segmentation- KiU-Net which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features. Furthermore, we also propose KiU-Net 3D which is a 3D convolutional architecture for volumetric segmentation. We perform a detailed study of KiU-Net by performing experiments on five different datasets covering various image modalities like ultrasound (US), magnetic resonance imaging (MRI), computed tomography (CT), microscopic and fundus images. The proposed method achieves a better performance as compared to all the recent methods with an additional benefit of fewer parameters and faster convergence. Additionally, we also demonstrate that the extensions of KiU-Net based on residual blocks and dense blocks result in further performance improvements. The implementation of KiU-Net can be found here: this https URL
摘要:大多数医学图像分割方法使用 U-Net 或其变体,因为它们在大多数应用中都很成功。在对这些“传统”编码器-解码器方法进行详细分析后,我们观察到它们在检测较小结构时表现不佳,且无法精确分割边界区域。该问题可归因于随着编码器层数加深,感受野不断增大:对高层特征学习的过度侧重,使基于 U-Net 的方法难以充分学习对检测小结构至关重要的低层特征。为克服这一问题,我们提出使用过完备卷积架构,将输入图像投影到更高的维度,从而约束感受野在网络深层不再增大。我们设计了一种新的图像分割架构 KiU-Net,它包含两个分支:(1)过完备卷积网络 Kite-Net,学习捕获输入的精细细节和准确边缘;(2)U-Net,学习高层特征。此外,我们还提出用于体数据分割的三维卷积架构 KiU-Net 3D。我们在涵盖超声(US)、磁共振成像(MRI)、计算机断层扫描(CT)、显微和眼底图像等多种成像模态的五个数据集上对 KiU-Net 进行了详细研究。与近期各种方法相比,所提方法取得了更好的性能,并具有参数更少、收敛更快的额外优势。我们还证明,基于残差块和密集块的 KiU-Net 扩展可带来进一步的性能提升。KiU-Net 的实现见:此 https URL
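The core two-branch idea can be sketched compactly: one branch projects the input up (overcomplete, limiting receptive-field growth), the other down (U-Net-like), and their features are fused. Layer sizes below are illustrative, not the paper's actual architecture:

```python
# Compact sketch of the two-branch idea in KiU-Net.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyKiUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=2):
        super().__init__()
        self.kite_enc = nn.Conv2d(in_ch, 8, 3, padding=1)   # overcomplete branch
        self.kite_dec = nn.Conv2d(8, 8, 3, padding=1)
        self.u_enc = nn.Conv2d(in_ch, 8, 3, padding=1)      # U-Net-like branch
        self.u_dec = nn.Conv2d(8, 8, 3, padding=1)
        self.head = nn.Conv2d(16, out_ch, 1)

    def forward(self, x):
        k = F.interpolate(F.relu(self.kite_enc(x)), scale_factor=2)    # project *up*
        k = F.interpolate(F.relu(self.kite_dec(k)), scale_factor=0.5)  # back to input size
        u = F.max_pool2d(F.relu(self.u_enc(x)), 2)                     # project *down*
        u = F.interpolate(F.relu(self.u_dec(u)), scale_factor=2)
        return self.head(torch.cat([k, u], dim=1))                     # fuse both branches

print(TinyKiUNet()(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 2, 64, 64])
```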
84. AFN: Attentional Feedback Network based 3D Terrain Super-Resolution [PDF] 返回目录
Ashish Kubade, Diptiben Patel, Avinash Sharma, K. S. Rajan
Abstract: Terrain, representing features of an earth surface, plays a crucial role in many applications such as simulations, route planning, analysis of surface dynamics, computer graphics-based games, entertainment, films, to name a few. With recent advancements in digital technology, these applications demand the presence of high-resolution details in the terrain. In this paper, we propose a novel fully convolutional neural network-based super-resolution architecture to increase the resolution of low-resolution Digital Elevation Model (LRDEM) with the help of information extracted from the corresponding aerial image as a complementary modality. We perform the super-resolution of LRDEM using an attention-based feedback mechanism named 'Attentional Feedback Network' (AFN), which selectively fuses the information from LRDEM and aerial image to enhance and infuse the high-frequency features and to produce the terrain realistically. We compare the proposed architecture with existing state-of-the-art DEM super-resolution methods and show that the proposed architecture outperforms enhancing the resolution of input LRDEM accurately and in a realistic manner.
摘要:地形表征地球表面的特征,在模拟、路径规划、地表动力学分析、基于计算机图形学的游戏、娱乐和电影等诸多应用中都起着至关重要的作用。随着数字技术的最新进展,这些应用要求地形具有高分辨率的细节。本文提出一种新颖的全卷积神经网络超分辨率架构,借助从相应航拍图像(作为互补模态)中提取的信息来提升低分辨率数字高程模型(LRDEM)的分辨率。我们使用名为“注意力反馈网络”(AFN)的基于注意力的反馈机制对 LRDEM 进行超分辨率:该机制有选择地融合来自 LRDEM 和航拍图像的信息,以增强并注入高频特征,从而逼真地生成地形。我们将所提架构与现有最先进的 DEM 超分辨率方法进行了比较,结果表明所提架构在准确而逼真地提升输入 LRDEM 分辨率方面优于现有方法。
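As a rough illustration of selectively fusing the two modalities, the toy module below gates aerial-image features with a learned per-pixel attention map before adding a high-frequency residual to the upsampled DEM; AFN's actual feedback architecture is considerably more elaborate:

```python
# Toy attention-gated fusion of DEM and aerial-image features (illustrative only).
import torch
import torch.nn as nn

class AttnFuse(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.dem = nn.Conv2d(1, ch, 3, padding=1)     # features from upsampled LRDEM
        self.img = nn.Conv2d(3, ch, 3, padding=1)     # features from aerial RGB image
        self.attn = nn.Conv2d(2 * ch, ch, 1)          # per-pixel gate over image features
        self.out = nn.Conv2d(ch, 1, 3, padding=1)     # predicted high-frequency residual

    def forward(self, dem, img):
        d, a = torch.relu(self.dem(dem)), torch.relu(self.img(img))
        gate = torch.sigmoid(self.attn(torch.cat([d, a], 1)))
        return dem + self.out(d + gate * a)           # residual detail added to DEM

up = nn.functional.interpolate(torch.randn(1, 1, 32, 32), scale_factor=2)
print(AttnFuse()(up, torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 1, 64, 64])
```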
85. AIFNet: Automatic Vascular Function Estimation for Perfusion Analysis Using Deep Learning [PDF] 返回目录
Ezequiel de la Rosa, Diana M. Sima, Bjoern Menze, Jan S. Kirschke, David Robben
Abstract: Perfusion imaging is crucial in acute ischemic stroke for quantifying the salvageable penumbra and irreversibly damaged core lesions. As such, it helps clinicians to decide on the optimal reperfusion treatment. In perfusion CT imaging, deconvolution methods are used to obtain clinically interpretable perfusion parameters that allow identifying brain tissue abnormalities. Deconvolution methods require the selection of two reference vascular functions as inputs to the model: the arterial input function (AIF) and the venous output function, with the AIF as the most critical model input. When manually performed, the vascular function selection is time demanding, suffers from poor reproducibility and is subject to the professionals' experience. This leads to potentially unreliable quantification of the penumbra and core lesions and, hence, might harm the treatment decision process. In this work we automatize the perfusion analysis with AIFNet, a fully automatic and end-to-end trainable deep learning approach for estimating the vascular functions. Unlike previous methods using clustering or segmentation techniques to select vascular voxels, AIFNet is directly optimized at the vascular function estimation, which allows to better recognise the time-curve profiles. Validation on the public ISLES18 stroke database shows that AIFNet reaches inter-rater performance for the vascular function estimation and, subsequently, for the parameter maps and core lesion quantification obtained through deconvolution. We conclude that AIFNet has potential for clinical transfer and could be incorporated in perfusion deconvolution software.
摘要:灌注成像对于量化急性缺血性卒中中可挽救的半暗带和不可逆损伤的核心病灶至关重要,因而有助于临床医生决定最佳的再灌注治疗方案。在灌注 CT 成像中,去卷积方法被用于获得可供临床解读的灌注参数,从而识别脑组织异常。去卷积方法需要选择两个参考血管功能作为模型输入:动脉输入函数(AIF)和静脉输出函数,其中 AIF 是最关键的模型输入。人工选择血管功能耗时、可重复性差,且依赖专业人员的经验;这会导致半暗带和核心病灶的量化结果可能不可靠,进而可能影响治疗决策。在这项工作中,我们用 AIFNet 实现灌注分析的自动化,它是一种用于估计血管功能的全自动、端到端可训练的深度学习方法。与以往使用聚类或分割技术选择血管体素的方法不同,AIFNet 直接针对血管功能估计进行优化,从而能更好地识别时间曲线轮廓。在公开的 ISLES18 卒中数据库上的验证表明,AIFNet 在血管功能估计上达到了评估者间水平,并进而在通过去卷积得到的参数图和核心病灶量化上达到同等水平。我们的结论是,AIFNet 具有临床转化潜力,可被整合进灌注去卷积软件中。
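For context on where the estimated AIF is consumed: perfusion parameters come from deconvolving tissue curves with the AIF. Below is standard truncated-SVD deconvolution on toy curves, assuming the AIF is already given (e.g., by AIFNet); the network itself is not reproduced here:

```python
# Truncated-SVD deconvolution of a tissue curve given an AIF (classical step).
import numpy as np

def svd_deconvolve(tissue, aif, dt=1.0, lam=0.2):
    n = len(aif)
    A = dt * np.array([[aif[i - j] if i >= j else 0.0 for j in range(n)]
                       for i in range(n)])            # lower-triangular conv matrix
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.where(s > lam * s.max(), 1.0 / s, 0.0)  # truncate small singular values
    return Vt.T @ (s_inv * (U.T @ tissue))             # residue function; CBF ~ its max

t = np.arange(40.0)
aif = np.exp(-(t - 8) ** 2 / 8); aif /= aif.sum()      # toy arterial input function
true_r = np.exp(-t / 6.0)                              # toy residue function
tissue = np.convolve(aif, true_r)[:40]                 # simulated tissue curve
print(svd_deconvolve(tissue, aif).max())               # CBF estimate (up to scale)
```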
86. Collaborative Tracking and Capture of Aerial Object using UAVs [PDF] 返回目录
Lima Agnel Tony, Shuvrangshu Jana, Varun V P, Vidyadhara B V, Mohitvishnu S Gadde, Abhishek Kashyap, Rahul Ravichandran, Debasish Ghose
Abstract: This work details the problem of aerial target capture using multiple UAVs. This problem is motivated from the challenge 1 of Mohammed Bin Zayed International Robotic Challenge 2020. The UAVs utilise visual feedback to autonomously detect target, approach it and capture without disturbing the vehicle which carries the target. Multi-UAV collaboration improves the efficiency of the system and increases the chance of capturing the ball robustly in short span of time. In this paper, the proposed architecture is validated through simulation in ROS-Gazebo environment and is further implemented on hardware.
摘要:本文详细阐述了使用多架无人机捕获空中目标的问题。该问题源自 Mohammed Bin Zayed 国际机器人挑战赛 2020 的挑战 1。无人机利用视觉反馈自主检测并接近目标,并在不干扰携带目标的飞行器的前提下完成捕获。多无人机协作提高了系统效率,并增加了在短时间内稳健抓住目标球的机会。本文在 ROS-Gazebo 环境中通过仿真验证了所提架构,并进一步在硬件上加以实现。
87. Facial gesture interfaces for expression and communication [PDF] 返回目录
Michael J. Lyons
Abstract: Considerable effort has been devoted to the automatic extraction of information about action of the face from image sequences. Within the context of human-computer interaction (HCI) we may distinguish systems that allow expression from those which aim at recognition. Most of the work in facial action processing has been directed at automatically recognizing affect from facial actions. By contrast, facial gesture interfaces, which respond to deliberate facial actions, have received comparatively little attention. This paper reviews several projects on vision-based interfaces that rely on facial action for intentional HCI. Applications to several domains are introduced, including text entry, artistic and musical expression and assistive technology for motor-impaired users.
摘要:研究者已投入大量精力从图像序列中自动提取面部动作信息。在人机交互(HCI)的语境下,我们可以区分以表达为目的的系统和以识别为目的的系统。面部动作处理的大部分工作都致力于从面部动作中自动识别情感;相比之下,响应用户有意面部动作的面部手势界面受到的关注相对较少。本文回顾了若干依靠面部动作实现有意人机交互的基于视觉的界面项目,并介绍了它们在多个领域的应用,包括文本输入、艺术与音乐表达,以及面向运动障碍用户的辅助技术。
88. Towards Cross-modality Medical Image Segmentation with Online Mutual Knowledge Distillation [PDF] 返回目录
Kang Li, Lequan Yu, Shujun Wang, Pheng-Ann Heng
Abstract: The success of deep convolutional neural networks is partially attributed to the massive amount of annotated training data. However, in practice, medical data annotations are usually expensive and time-consuming to be obtained. Considering multi-modality data with the same anatomic structures are widely available in clinic routine, in this paper, we aim to exploit the prior knowledge (e.g., shape priors) learned from one modality (aka., assistant modality) to improve the segmentation performance on another modality (aka., target modality) to make up annotation scarcity. To alleviate the learning difficulties caused by modality-specific appearance discrepancy, we first present an Image Alignment Module (IAM) to narrow the appearance gap between assistant and target modality data.We then propose a novel Mutual Knowledge Distillation (MKD) scheme to thoroughly exploit the modality-shared knowledge to facilitate the target-modality segmentation. To be specific, we formulate our framework as an integration of two individual segmentors. Each segmentor not only explicitly extracts one modality knowledge from corresponding annotations, but also implicitly explores another modality knowledge from its counterpart in mutual-guided manner. The ensemble of two segmentors would further integrate the knowledge from both modalities and generate reliable segmentation results on target modality. Experimental results on the public multi-class cardiac segmentation data, i.e., MMWHS 2017, show that our method achieves large improvements on CT segmentation by utilizing additional MRI data and outperforms other state-of-the-art multi-modality learning methods.
摘要:深度卷积神经网络的成功部分归功于海量的标注训练数据。然而在实践中,医学数据的标注通常昂贵且耗时。考虑到具有相同解剖结构的多模态数据在临床常规中广泛可得,本文旨在利用从一种模态(称为辅助模态)学到的先验知识(如形状先验)来提升另一种模态(称为目标模态)上的分割性能,以弥补标注稀缺。为缓解模态特有外观差异带来的学习困难,我们首先提出图像对齐模块(IAM)来缩小辅助模态与目标模态数据之间的外观差距,然后提出一种新颖的相互知识蒸馏(MKD)方案来充分利用模态共享知识,从而促进目标模态分割。具体而言,我们将框架构造为两个独立分割器的集成:每个分割器不仅从相应标注中显式提取本模态知识,还以相互引导的方式从对方隐式地探索另一模态的知识。两个分割器的集成进一步整合了两种模态的知识,并在目标模态上产生可靠的分割结果。在公开的多类心脏分割数据集(MMWHS 2017)上的实验结果表明,我们的方法通过利用额外的 MRI 数据在 CT 分割上取得了大幅提升,并优于其他最先进的多模态学习方法。
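The mutual-guidance objective can be sketched as each segmentor minimizing its own supervised loss plus a KL term toward the other's softened predictions. The weighting and temperature below are placeholders, and the IAM module is omitted:

```python
# Sketch of a mutual-distillation loss between two segmentors (one per modality).
import torch
import torch.nn.functional as F

def mutual_kd_loss(logits_a, logits_b, labels_a, labels_b, T=2.0, alpha=0.5):
    ce = F.cross_entropy(logits_a, labels_a) + F.cross_entropy(logits_b, labels_b)
    kd_ab = F.kl_div(F.log_softmax(logits_a / T, dim=1),
                     F.softmax(logits_b.detach() / T, dim=1), reduction="batchmean")
    kd_ba = F.kl_div(F.log_softmax(logits_b / T, dim=1),
                     F.softmax(logits_a.detach() / T, dim=1), reduction="batchmean")
    return ce + alpha * (T * T) * (kd_ab + kd_ba)   # supervised + mutual soft guidance

logits_ct = torch.randn(2, 4, 64, 64, requires_grad=True)   # segmentor on CT
logits_mr = torch.randn(2, 4, 64, 64, requires_grad=True)   # segmentor on MRI
labels = torch.randint(0, 4, (2, 64, 64))
print(mutual_kd_loss(logits_ct, logits_mr, labels, labels).item())
```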
89. Async-RED: A Provably Convergent Asynchronous Block Parallel Stochastic Method using Deep Denoising Priors [PDF] 返回目录
Yu Sun, Jiaming Liu, Yiran Sun, Brendt Wohlberg, Ulugbek S. Kamilov
Abstract: Regularization by denoising (RED) is a recently developed framework for solving inverse problems by integrating advanced denoisers as image priors. Recent work has shown its state-of-the-art performance when combined with pre-trained deep denoisers. However, current RED algorithms are inadequate for parallel processing on multicore systems. We address this issue by proposing a new asynchronous RED (ASYNC-RED) algorithm that enables asynchronous parallel processing of data, making it significantly faster than its serial counterparts for large-scale inverse problems. The computational complexity of ASYNC-RED is further reduced by using a random subset of measurements at every iteration. We present complete theoretical analysis of the algorithm by establishing its convergence under explicit assumptions on the data-fidelity and the denoiser. We validate ASYNC-RED on image recovery using pre-trained deep denoisers as priors.
摘要:去噪正则化(RED)是近年提出的框架,通过将先进的去噪器作为图像先验来求解逆问题。最近的工作表明,与预训练的深度去噪器结合时,RED 可达到最先进的性能。然而,现有的 RED 算法不适合在多核系统上并行处理。我们提出一种新的异步 RED(ASYNC-RED)算法来解决这一问题:它支持数据的异步并行处理,使其在大规模逆问题上显著快于串行版本。通过在每次迭代中使用测量值的随机子集,ASYNC-RED 的计算复杂度得到进一步降低。我们给出了该算法完整的理论分析,在对数据保真项和去噪器的明确假设下建立了其收敛性。我们使用预训练深度去噪器作为先验,在图像恢复任务上验证了 ASYNC-RED。
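The update that ASYNC-RED parallelizes can be written down serially: x ← x − γ(∇f(x) + τ(x − D(x))), with the data-fidelity gradient estimated from a random subset of measurements. The sketch below uses a toy smoother in place of a pre-trained deep denoiser:

```python
# Serial RED iteration with a random measurement subset (the step ASYNC-RED runs
# asynchronously over blocks); D is a toy moving-average smoother, not a deep denoiser.
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 256
A = rng.standard_normal((m, n)) / np.sqrt(m)        # measurement operator, A^T A ~ I
x_true = np.sign(rng.standard_normal(n))
y = A @ x_true + 0.01 * rng.standard_normal(m)

def denoise(x):
    return np.convolve(x, np.ones(3) / 3, mode="same")

x, gamma, tau = np.zeros(n), 0.2, 0.1
for _ in range(300):
    idx = rng.choice(m, size=64, replace=False)              # random measurement subset
    grad = (m / len(idx)) * A[idx].T @ (A[idx] @ x - y[idx]) # unbiased gradient estimate
    x = x - gamma * (grad + tau * (x - denoise(x)))          # RED update
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))   # small relative error
```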
90. autoTICI: Automatic Brain Tissue Reperfusion Scoring on 2D DSA Images of Acute Ischemic Stroke Patients [PDF] 返回目录
Ruisheng Su, Sandra A.P. Cornelissen, Matthijs van der Sluijs, Adriaan C.G.M. van Es, Wim H. van Zwam, Diederik W.J. Dippel, Geert Lycklama, Pieter Jan van Doormaal, Wiro J. Niessen, Aad van der Lugt, Theo van Walsum
Abstract: Thrombolysis in Cerebral Infarction (TICI) score is an important metric for reperfusion therapy assessment in acute ischemic stroke. It is commonly used as a technical outcome measure after endovascular treatment (EVT). Existing TICI scores are defined in coarse ordinal grades based on visual inspection, leading to inter- and intra-observer variations. In this work, we present autoTICI, an automatic and quantitative TICI scoring method. First, each digital subtraction angiography (DSA) sequence is separated into four phases (non-contrast, arterial, parenchymal and venous phase) using a multi-path convolutional neural network (CNN), which exploits spatio-temporal features. The network also incorporates sequence level label dependencies in the form of a state-transition matrix. Next, a minimum intensity map (MINIP) is computed using the motion corrected arterial and parenchymal frames. On the MINIP image, vessel, perfusion and background pixels are segmented. Finally, we quantify the autoTICI score as the ratio of reperfused pixels after EVT. On a routinely acquired multi-center dataset, the proposed autoTICI shows good correlation with the extended TICI (eTICI) reference with an average area under the curve (AUC) score of 0.81. The AUC score is 0.90 with respect to the dichotomized eTICI. In terms of clinical outcome prediction, we demonstrate that autoTICI is overall comparable to eTICI.
摘要:脑梗死溶栓(TICI)评分是急性缺血性卒中再灌注治疗评估的重要指标,常用作血管内治疗(EVT)后的技术结局度量。现有 TICI 评分基于目视检查,以粗糙的有序等级定义,导致观察者间和观察者内的差异。在这项工作中,我们提出 autoTICI,一种自动、定量的 TICI 评分方法。首先,利用挖掘时空特征的多路径卷积神经网络(CNN)将每个数字减影血管造影(DSA)序列划分为四个阶段(非造影期、动脉期、实质期和静脉期);该网络还以状态转移矩阵的形式引入序列级标签依赖。接着,利用经运动校正的动脉期和实质期帧计算最小强度投影图(MINIP),并在 MINIP 图像上分割血管、灌注和背景像素。最后,我们将 autoTICI 评分量化为 EVT 后再灌注像素的比例。在一个常规采集的多中心数据集上,所提 autoTICI 与扩展 TICI(eTICI)参考标准显示出良好的相关性,平均曲线下面积(AUC)为 0.81;相对于二分化的 eTICI,AUC 为 0.90。在临床结局预测方面,我们证明 autoTICI 总体上可与 eTICI 相媲美。
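The final quantities are easy to illustrate: a minimum-intensity projection over the (motion-corrected) frames, then the fraction of reperfused pixels. The threshold and mask below are simple stand-ins for the paper's learned phase classification and segmentation:

```python
# Toy sketch of the MINIP and the autoTICI-style reperfusion ratio.
import numpy as np

rng = np.random.default_rng(0)
frames = rng.uniform(0.5, 1.0, size=(20, 128, 128))   # DSA frames (contrast = dark)
frames[:, 40:80, 40:80] -= 0.4                         # a perfused region darkens

minip = frames.min(axis=0)                             # minimum intensity projection
tissue_mask = np.ones_like(minip, dtype=bool)          # stand-in downstream territory
perfused = minip < 0.4                                 # threshold stand-in for segmentation
score = (perfused & tissue_mask).sum() / tissue_mask.sum()
print(f"reperfused fraction: {score:.2f}")             # ratio of reperfused pixels
```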
91. Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation [PDF] 返回目录
Madhu Vankadari, Sourav Garg, Anima Majumder, Swagat Kumar, Ardhendu Behera
Abstract: In this paper, we look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images which is a difficult task that has not been addressed adequately in the literature. The state-of-the-art day-time depth estimation methods fail miserably when tested with night-time images due to a large domain shift between them. The usual photo metric losses used for training these networks may not work for night-time images due to the absence of uniform lighting which is commonly present in day-time images, making it a difficult problem to solve. We propose to solve this problem by posing it as a domain adaptation problem where a network trained with day-time images is adapted to work for night-time images. Specifically, an encoder is trained to generate features from night-time images that are indistinguishable from those obtained from day-time images by using a PatchGAN-based adversarial discriminative learning method. Unlike the existing methods that directly adapt depth prediction (network output), we propose to adapt feature maps obtained from the encoder network so that a pre-trained day-time depth decoder can be directly used for predicting depth from these adapted features. Hence, the resulting method is termed as "Adversarial Domain Feature Adaptation (ADFA)" and its efficacy is demonstrated through experimentation on the challenging Oxford night driving dataset. Also, The modular encoder-decoder architecture for the proposed ADFA method allows us to use the encoder module as a feature extractor which can be used in many other applications. One such application is demonstrated where the features obtained from our adapted encoder network are shown to outperform other state-of-the-art methods in a visual place recognition problem, thereby, further establishing the usefulness and effectiveness of the proposed approach.
摘要:本文研究从不受约束的 RGB 单目夜间图像估计逐像素深度图的问题,这是一项困难的任务,文献中尚未得到充分解决。由于昼夜图像之间存在巨大的域偏移,最先进的白天深度估计方法在夜间图像上表现极差。训练这些网络常用的光度损失对夜间图像可能失效,因为夜间缺乏白天图像中普遍存在的均匀光照,这使该问题难以求解。我们提出将其视为域自适应问题来解决:将一个用白天图像训练的网络适配到夜间图像。具体而言,使用基于 PatchGAN 的对抗判别学习方法训练编码器,使其从夜间图像生成与白天图像特征无法区分的特征。与直接适配深度预测(网络输出)的现有方法不同,我们提出适配编码器网络输出的特征图,使预训练的白天深度解码器可以直接利用这些适配后的特征来预测深度。因此,所得方法称为“对抗域特征自适应(ADFA)”,其有效性通过在具有挑战性的 Oxford 夜间驾驶数据集上的实验得到验证。此外,所提 ADFA 方法的模块化编码器-解码器架构允许我们将编码器模块用作特征提取器,用于许多其他应用。我们展示了一个这样的应用:从适配后的编码器网络获得的特征在视觉地点识别问题上优于其他最先进方法,进一步证明了所提方法的实用性和有效性。
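A minimal sketch of the adversarial feature-adaptation step: a discriminator learns to separate day features from night features while the night encoder learns to fool it, so a frozen day-time decoder can consume the adapted features. All modules below are toy stand-ins (the paper uses a PatchGAN discriminator):

```python
# One adversarial adaptation step on feature maps (toy modules).
import torch
import torch.nn as nn

enc_night = nn.Conv2d(3, 8, 3, padding=1)            # trainable night-time encoder
enc_day = nn.Conv2d(3, 8, 3, padding=1)              # day-time encoder, kept frozen
for p in enc_day.parameters():
    p.requires_grad_(False)
disc = nn.Conv2d(8, 1, 4, stride=2)                  # toy patch-level discriminator
bce = nn.BCEWithLogitsLoss()
opt_e = torch.optim.Adam(enc_night.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

day, night = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)

# discriminator step: day features are "real", night features are "fake"
f_day, f_night = enc_day(day), enc_night(night).detach()
p_day, p_night = disc(f_day), disc(f_night)
d_loss = bce(p_day, torch.ones_like(p_day)) + bce(p_night, torch.zeros_like(p_night))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# encoder step: push night features to be classified as day
p_adv = disc(enc_night(night))
g_loss = bce(p_adv, torch.ones_like(p_adv))
opt_e.zero_grad(); g_loss.backward(); opt_e.step()
print(f"d_loss={d_loss.item():.3f}  g_loss={g_loss.item():.3f}")
```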
92. COVID-19 Classification of X-ray Images Using Deep Neural Networks [PDF] 返回目录
Elisha Goldstein, Daphna Keidar, Daniel Yaron, Yair Shachar, Ayelet Blass, Leonid Charbinsky, Israel Aharony, Liza Lifshitz, Dimitri Lumelsky, Ziv Neeman, Matti Mizrachi, Majd Hajouj, Nethanel Eizenbach, Eyal Sela, Chedva S Weiss, Philip Levin, Ofer Benjaminov, Gil N Bachar, Shlomit Tamir, Yael Rapson, Dror Suhami, Amiel A Dror, Naama R Bogot, Ahuva Grubstein, Nogah Shabshin, Yishai M Elyada, Yonina C Eldar
Abstract: In the midst of the coronavirus disease 2019 (COVID-19) outbreak, chest X-ray (CXR) imaging is playing an important role in the diagnosis and monitoring of patients with COVID-19. Machine learning solutions have been shown to be useful for X-ray analysis and classification in a range of medical contexts. The purpose of this study is to create and evaluate a machine learning model for diagnosis of COVID-19, and to provide a tool for searching for similar patients according to their X-ray scans. In this retrospective study, a classifier was built using a pre-trained deep learning model (ReNet50) and enhanced by data augmentation and lung segmentation to detect COVID-19 in frontal CXR images collected between January 2018 and July 2020 in four hospitals in Israel. A nearest-neighbors algorithm was implemented based on the network results that identifies the images most similar to a given image. The model was evaluated using accuracy, sensitivity, area under the curve (AUC) of receiver operating characteristic (ROC) curve and of the precision-recall (P-R) curve. The dataset sourced for this study includes 2362 CXRs, balanced for positive and negative COVID-19, from 1384 patients (63 +/- 18 years, 552 men). Our model achieved 89.7% (314/350) accuracy and 87.1% (156/179) sensitivity in classification of COVID-19 on a test dataset comprising 15% (350 of 2326) of the original data, with AUC of ROC 0.95 and AUC of the P-R curve 0.94. For each image we retrieve images with the most similar DNN-based image embeddings; these can be used to compare with previous cases.
摘要:在 2019 冠状病毒病(COVID-19)疫情期间,胸部 X 射线(CXR)成像在 COVID-19 患者的诊断和监测中发挥着重要作用。机器学习方案已被证明在一系列医学场景的 X 射线分析和分类中十分有用。本研究旨在建立并评估用于 COVID-19 诊断的机器学习模型,并提供根据 X 射线扫描检索相似患者的工具。在这项回顾性研究中,我们基于预训练深度学习模型(ReNet50)构建分类器,并通过数据增强和肺部分割加以增强,用于检测 2018 年 1 月至 2020 年 7 月间在以色列四家医院采集的正位 CXR 图像中的 COVID-19。我们还基于网络结果实现了最近邻算法,用于找出与给定图像最相似的图像。模型使用准确率、灵敏度、受试者工作特征(ROC)曲线下面积(AUC)以及精确率-召回率(P-R)曲线下面积进行评估。本研究的数据集包含来自 1384 名患者(63 +/- 18 岁,552 名男性)的 2362 张 CXR,COVID-19 阳性与阴性样本均衡。在占原始数据 15%(2326 张中的 350 张)的测试集上,我们的模型在 COVID-19 分类中取得 89.7%(314/350)的准确率和 87.1%(156/179)的灵敏度,ROC 的 AUC 为 0.95,P-R 曲线的 AUC 为 0.94。对每张图像,我们检索基于 DNN 的图像嵌入最相似的图像,可用于与既往病例比较。
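A generic transfer-learning skeleton of the kind described (pre-trained ResNet50, two-way head, embeddings reused for nearest-neighbour retrieval); lung segmentation, augmentation, and real data handling are omitted:

```python
# Fine-tune a pre-trained ResNet50 for 2-way CXR classification (schematic).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)          # COVID-19: negative/positive

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)                        # batch of preprocessed CXRs
y = torch.randint(0, 2, (4,))
loss = criterion(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()

# embeddings for nearest-neighbour retrieval of similar patients
feat = nn.Sequential(*list(model.children())[:-1])(x).flatten(1)
print(loss.item(), feat.shape)                         # feat: (4, 2048)
```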
93. WeMix: How to Better Utilize Data Augmentation [PDF] 返回目录
Yi Xu, Asaf Noy, Ming Lin, Qi Qian, Hao Li, Rong Jin
Abstract: Data augmentation is a widely used training trick in deep learning to improve the network generalization ability. Despite many encouraging results, several recent studies did point out limitations of the conventional data augmentation scheme in certain scenarios, calling for a better theoretical understanding of data augmentation. In this work, we develop a comprehensive analysis that reveals pros and cons of data augmentation. The main limitation of data augmentation arises from the data bias, i.e. the augmented data distribution can be quite different from the original one. This data bias leads to a suboptimal performance of existing data augmentation methods. To this end, we develop two novel algorithms, termed "AugDrop" and "MixLoss", to correct the data bias in the data augmentation. Our theoretical analysis shows that both algorithms are guaranteed to improve the effect of data augmentation through the bias correction, which is further validated by our empirical studies. Finally, we propose a generic algorithm "WeMix" by combining AugDrop and MixLoss, whose effectiveness is observed from extensive empirical evaluations.
摘要:数据增强是深度学习中广泛使用的训练技巧,用于提高网络的泛化能力。尽管结果大多令人鼓舞,近期若干研究确实指出了传统数据增强方案在某些场景下的局限性,呼吁对数据增强建立更好的理论理解。在这项工作中,我们进行了全面分析,揭示了数据增强的利弊。数据增强的主要局限来自数据偏差,即增强后的数据分布可能与原始分布差异很大。这种数据偏差导致现有数据增强方法的性能次优。为此,我们提出两种新算法,称为“AugDrop”和“MixLoss”,用于纠正数据增强中的数据偏差。理论分析表明,两种算法都能保证通过偏差校正改善数据增强的效果,我们的实证研究进一步验证了这一点。最后,我们将 AugDrop 与 MixLoss 结合,提出通用算法“WeMix”,其有效性在大量实证评估中得到了验证。
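The abstract does not define MixLoss precisely; as background, here is the standard mixup-style objective that such bias-corrected schemes start from, training on convex combinations of inputs with correspondingly weighted labels:

```python
# Standard mixup-style loss (background, not necessarily the paper's MixLoss).
import torch
import torch.nn as nn

def mixed_loss(model, x1, y1, x2, y2, alpha=0.4):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x1 + (1 - lam) * x2                   # augmented (mixed) input
    logits = model(x_mix)
    ce = nn.functional.cross_entropy
    return lam * ce(logits, y1) + (1 - lam) * ce(logits, y2)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x1, x2 = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
y1, y2 = torch.randint(0, 10, (8,)), torch.randint(0, 10, (8,))
print(mixed_loss(model, x1, y1, x2, y2).item())
```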
94. CorrAttack: Black-box Adversarial Attack with Structured Search [PDF] 返回目录
Zhichao Huang, Yaowei Huang, Tong Zhang
Abstract: We present a new method for score-based adversarial attack, where the attacker queries the loss-oracle of the target model. Our method employs a parameterized search space with a structure that captures the relationship of the gradient of the loss function. We show that searching over the structured space can be approximated by a time-varying contextual bandits problem, where the attacker takes feature of the associated arm to make modifications of the input, and receives an immediate reward as the reduction of the loss function. The time-varying contextual bandits problem can then be solved by a Bayesian optimization procedure, which can take advantage of the features of the structured action space. The experiments on ImageNet and the Google Cloud Vision API demonstrate that the proposed method achieves the state of the art success rates and query efficiencies for both undefended and defended models.
摘要:我们提出一种新的基于得分的对抗攻击方法,其中攻击者查询目标模型的损失预言机。我们的方法采用带结构的参数化搜索空间,该结构刻画了损失函数梯度之间的关系。我们证明,在该结构化空间上的搜索可以近似为一个时变上下文老虎机问题:攻击者利用所选臂的特征对输入进行修改,并以损失函数的下降量作为即时奖励。该时变上下文老虎机问题随后可通过贝叶斯优化过程求解,该过程能够利用结构化动作空间的特征。在 ImageNet 和 Google Cloud Vision API 上的实验表明,所提方法在无防御和有防御模型上均取得了最先进的成功率和查询效率。
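Stripped to its skeleton, a score-based attack queries the loss oracle and keeps perturbations that increase it. The random coordinate search below is a heavily simplified stand-in; CorrAttack replaces it with a structured, Bayesian-optimized contextual-bandit choice of actions:

```python
# Bare-bones score-based black-box attack loop (toy loss oracle).
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(32)

def loss_oracle(x):                       # toy target model's loss on input x
    return float(np.tanh(w @ x))

x = rng.standard_normal(32)
best = loss_oracle(x)
eps, budget = 0.05, 200
for _ in range(budget):                   # one query per candidate perturbation
    i = rng.integers(32)                  # pick a coordinate ("arm") to modify
    cand = x.copy(); cand[i] += eps * rng.choice([-1.0, 1.0])
    if (val := loss_oracle(cand)) > best: # keep only loss-increasing changes
        x, best = cand, val
print(best)
```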
95. Deep Expectation-Maximization for Semi-Supervised Lung Cancer Screening [PDF] 返回目录
Sumeet Menon, David Chapman, Phuong Nguyen, Yelena Yesha, Michael Morris, Babak Saboury
Abstract: We present a semi-supervised algorithm for lung cancer screening in which a 3D Convolutional Neural Network (CNN) is trained using the Expectation-Maximization (EM) meta-algorithm. Semi-supervised learning allows a smaller labelled data-set to be combined with an unlabeled data-set in order to provide a larger and more diverse training sample. EM allows the algorithm to simultaneously calculate a maximum likelihood estimate of the CNN training coefficients along with the labels for the unlabeled training set which are defined as a latent variable space. We evaluate the model performance of the Semi-Supervised EM algorithm for CNNs through cross-domain training of the Kaggle Data Science Bowl 2017 (Kaggle17) data-set with the National Lung Screening Trial (NLST) data-set. Our results show that the Semi-Supervised EM algorithm greatly improves the classification accuracy of the cross-domain lung cancer screening, although results are lower than a fully supervised approach with the advantage of additional labelled data from the unsupervised sample. As such, we demonstrate that Semi-Supervised EM is a valuable technique to improve the accuracy of lung cancer screening models using 3D CNNs.
摘要:我们提出一种用于肺癌筛查的半监督算法,其中使用期望最大化(EM)元算法训练三维卷积神经网络(CNN)。半监督学习允许将较小的有标注数据集与无标注数据集结合,从而提供更大、更多样的训练样本。EM 使算法能够同时计算 CNN 训练系数的最大似然估计,以及被定义为隐变量空间的无标注训练集的标签。我们通过将 Kaggle Data Science Bowl 2017(Kaggle17)数据集与 National Lung Screening Trial(NLST)数据集进行跨域训练,评估了面向 CNN 的半监督 EM 算法的模型性能。结果表明,半监督 EM 算法大幅提高了跨域肺癌筛查的分类准确率,尽管其结果仍低于将无监督样本也加以标注后的全监督方法。由此我们证明,半监督 EM 是提高基于三维 CNN 的肺癌筛查模型准确率的一种有价值的技术。
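The EM loop itself is compact: the E-step imputes labels for the unlabeled set from the current model, and the M-step refits on labeled plus imputed data. A logistic classifier stands in for the 3D CNN to keep the sketch self-contained:

```python
# Schematic EM-style pseudo-labeling loop (logistic model as a CNN stand-in).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = rng.standard_normal((50, 10)); y_lab = (X_lab[:, 0] > 0).astype(int)
X_unl = rng.standard_normal((500, 10))                 # unlabeled "scans"

model = LogisticRegression().fit(X_lab, y_lab)
for _ in range(5):                                     # EM iterations
    y_pseudo = model.predict(X_unl)                    # E-step: impute latent labels
    X_all = np.vstack([X_lab, X_unl])
    y_all = np.concatenate([y_lab, y_pseudo])
    model = LogisticRegression().fit(X_all, y_all)     # M-step: maximum-likelihood refit
print(model.score(X_lab, y_lab))
```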
注:中文为机器翻译结果!封面为论文标题词云图!