Abstracts

1. Online Spatiotemporal Action Detection and Prediction via Causal Representations
Gurkirt Singh
Abstract: In this thesis, we focus on video action understanding problems from an online and real-time processing point of view. We start by converting the traditional offline spatiotemporal action detection pipeline into an online spatiotemporal action tube detection system. An action tube is a set of bounding boxes connected over time, which bounds an action instance in space and time. Next, we explore the future prediction capabilities of such detection methods by extending an existing action tube into the future by regression. Later, we seek to establish that online/causal representations can achieve performance similar to that of offline three-dimensional (3D) convolutional neural networks (CNNs) on various tasks, including action recognition, temporal action segmentation and early prediction.
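To make the "bounding boxes connected over time" definition concrete, here is a minimal sketch of online (causal) tube building. It is not the thesis's linking algorithm: it assumes a simple greedy rule in which each frame's detections are attached to the tube whose last box overlaps them most by IoU, using only past frames. The function names, the `(x1, y1, x2, y2)` box layout, and the 0.3 IoU threshold are illustrative assumptions, and tubes are never terminated in this sketch.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_frame(tubes: List[List[Box]], detections: List[Box],
               threshold: float = 0.3) -> None:
    """Extend tubes in place with one frame's detections (causal: no lookahead)."""
    unused = list(detections)
    for tube in tubes:
        if not unused:
            break
        best = max(unused, key=lambda d: iou(tube[-1], d))
        if iou(tube[-1], best) >= threshold:
            tube.append(best)
            unused.remove(best)
    # Detections that matched no existing tube start new tubes.
    tubes.extend([d] for d in unused)
```

Because `link_frame` only ever looks at the last box of each tube, it can run frame by frame as video arrives, which is the property an online detector needs.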

2. Reinforced Axial Refinement Network for Monocular 3D Object Detection
Lijie Liu, Chufan Wu, Jiwen Lu, Lingxi Xie, Jie Zhou, Qi Tian
Abstract: Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image. This is an ill-posed problem whose major difficulty lies in the information lost by depth-agnostic cameras. Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of drawing effective samples in 3D space is relatively small. To improve sampling efficiency, we propose to start with an initial prediction and refine it gradually towards the ground truth, changing only one 3D parameter in each step. This requires designing a policy that receives a reward only after several steps, so we adopt reinforcement learning to optimize it. The proposed framework, Reinforced Axial Refinement Network (RAR-Net), serves as a post-processing stage that can be freely integrated into existing monocular 3D detection methods, and it improves performance on the KITTI dataset at a small extra computational cost.
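The "one 3D parameter changed in each step" idea can be sketched as a discrete action space in which every action nudges a single box parameter up or down. In RAR-Net a reinforcement-learned policy picks the action without seeing the ground truth; the sketch below swaps that policy for plain greedy hill climbing on an oracle `score` function, purely to show the axial action space. The 7-parameter box layout, the step size, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical 7-DoF box layout: center (x, y, z), size (w, h, l), heading angle.
PARAMS = ("x", "y", "z", "w", "h", "l", "yaw")
Action = Tuple[int, float]


def axial_actions(step: float) -> List[Action]:
    """Every action nudges exactly one parameter up or down by `step`."""
    return [(i, sign * step) for i in range(len(PARAMS)) for sign in (1.0, -1.0)]


def apply_action(box: Sequence[float], action: Action) -> List[float]:
    index, delta = action
    out = list(box)
    out[index] += delta
    return out


def refine(box: Sequence[float], score: Callable[[Sequence[float]], float],
           step: float = 0.1, max_steps: int = 20) -> List[float]:
    """Greedy stand-in for a learned policy: one axial change per step."""
    current = list(box)
    for _ in range(max_steps):
        best = max(axial_actions(step), key=lambda a: score(apply_action(current, a)))
        candidate = apply_action(current, best)
        if score(candidate) <= score(current):
            break  # no single-axis move improves the score
        current = candidate
    return current
```

The reward in the paper arrives only after several such moves, which is why a learned policy replaces the per-step oracle used here.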

3. RecSal: Deep Recursive Supervision for Visual Saliency Prediction
Sandeep Mishra, Oindrila Saha
Abstract: State-of-the-art saliency prediction methods build on model architectures or loss functions while training to generate a single target saliency map. However, publicly available saliency prediction datasets can be used to create more information for each stimulus than just a final aggregate saliency map. When used in a biologically inspired fashion, this information can contribute to better prediction performance without resorting to models with a huge number of parameters. In this light, we propose to extract and use the statistics of (a) region-specific saliency and (b) the temporal order of fixations, to provide additional context to our network. We show that extra supervision using spatially or temporally sequenced fixations leads to better saliency prediction performance. Further, we design novel architectures for utilizing this extra information and show that they achieve superior performance over a base model that lacks extra supervision. We show that our best method outperforms previous state-of-the-art methods with 50-80% fewer parameters. We also show that, unlike prior methods, our models perform consistently well across all evaluation metrics.

4. RESA: Recurrent Feature-Shift Aggregator for Lane Detection
Tu Zheng, Hao Fang, Yi Zhang, Wenjian Tang, Zheng Yang, Haifeng Liu, Deng Cai
Abstract: Lane detection is one of the most important tasks in self-driving. Due to various complex scenarios (e.g., severe occlusion and ambiguous lanes) and the sparse supervisory signals inherent in lane annotations, lane detection remains challenging. Thus, it is difficult for an ordinary convolutional neural network (CNN) trained on general scenes to capture subtle lane features from raw images. In this paper, we present a novel module named REcurrent Feature-Shift Aggregator (RESA) to enrich lane features after preliminary feature extraction with an ordinary CNN. RESA takes advantage of strong shape priors of lanes and captures spatial relationships of pixels across rows and columns. It recurrently shifts slices of the feature map in vertical and horizontal directions, enabling each pixel to gather global information. With slice-by-slice information propagation, RESA can infer lanes accurately in challenging scenarios with weak appearance cues. Moreover, we propose a Bilateral Up-Sampling Decoder that combines coarse-grained and fine-detailed features in the up-sampling stage and meticulously recovers the low-resolution feature map into a pixel-wise prediction. Our method achieves state-of-the-art results on two popular lane detection benchmarks (CULane and TuSimple). The code will be made publicly available.
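A rough sketch of the recurrent feature shift, under heavy simplifying assumptions: a single-channel H x W feature map, a fixed scale `alpha` standing in for RESA's learned convolution, and one shift sense per axis (the described module shifts in both vertical and horizontal directions). At iteration k every row receives a ReLU'd copy of the row `stride` positions away, with `stride = H // 2**(K - k)`, so later iterations reach farther and each pixel eventually sees the whole map. Function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def shift_aggregate_rows(x: Grid, iterations: int, alpha: float = 1.0) -> Grid:
    """Recurrently add ReLU'd row slices taken `stride` rows away."""
    h = len(x)
    for k in range(iterations):
        stride = max(1, h // 2 ** (iterations - k))
        # All rows are updated in parallel from the previous iteration's map.
        x = [[v + alpha * relu(u) for v, u in zip(x[i], x[(i + stride) % h])]
             for i in range(h)]
    return x


def transpose(x: Grid) -> Grid:
    return [list(col) for col in zip(*x)]


def resa_like(x: Grid, iterations: int = 2, alpha: float = 0.5) -> Grid:
    """Vertical pass over rows, then horizontal pass over columns."""
    x = shift_aggregate_rows(x, iterations, alpha)
    return transpose(shift_aggregate_rows(transpose(x), iterations, alpha))
```

Starting from a 4 x 4 map with a single non-zero pixel, two iterations per axis already make every position non-zero, which illustrates how slice-by-slice propagation spreads information globally.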

5. Extracting full-field subpixel structural displacements from videos via deep learning
Lele Luan, Ming L. Wang, Yongchao Yang, Hao Sun
Abstract: This paper develops a deep learning framework based on convolutional neural networks (CNNs) that enables real-time extraction of full-field subpixel structural displacements from videos. In particular, two new CNN architectures are designed and trained on a dataset generated by the phase-based motion extraction method from a single lab-recorded high-speed video of a dynamic structure. As displacement is only reliable in regions with sufficient texture contrast, the sparsity of the motion field induced by the texture mask is accounted for in the network architecture design and the loss function definition. Results show that, with the supervision of full and sparse motion fields, the trained network is capable of identifying the pixels with sufficient texture contrast as well as their subpixel motions. The trained networks are tested on various videos of other structures to extract full-field motion (e.g., displacement time histories); the results indicate that the trained networks generalize well enough to accurately extract full-field subtle displacements for pixels with sufficient texture contrast.

6. Initial Classifier Weights Replay for Memoryless Class Incremental Learning
Eden Belouadah, Adrian Popescu, Ioannis Kanellos
Abstract: Incremental Learning (IL) is useful when artificial systems need to deal with streams of data and do not have access to all data at all times. The most challenging setting requires the deep model to keep a constant complexity and to be updated incrementally without access to even a bounded memory of past data. Consequently, the representations of past classes are strongly affected by catastrophic forgetting. To mitigate this negative effect, an adapted fine-tuning that includes knowledge distillation is usually deployed. We propose a different approach based on a vanilla fine-tuning backbone. It leverages the initial classifier weights, which provide a strong representation of past classes because they are trained with all class data. However, the magnitude of classifiers learned in different states varies, and normalization is needed for a fair handling of all classes. Normalization is performed by standardizing the initial classifier weights, which are assumed to be normally distributed. In addition, prediction scores are calibrated using state-level statistics to further improve classification fairness. We conduct a thorough evaluation with four public datasets in a memoryless incremental learning setting. Results show that our method outperforms existing techniques by a large margin on large-scale datasets.
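The standardization step can be illustrated with a small sketch. It assumes each class's initial weight vector is z-scored with its own mean and population standard deviation, so classes learned in different incremental states end up on a common scale; whether the paper standardizes per class vector or per state is not stated in the abstract, and the dictionary layout and function names are hypothetical. The state-level score calibration is not shown.

```python
import statistics
from typing import Dict, List


def standardize(weights: List[float]) -> List[float]:
    """Z-score one classifier weight vector: (w - mean) / std."""
    mu = statistics.fmean(weights)
    sigma = statistics.pstdev(weights)
    if sigma == 0.0:
        return [0.0 for _ in weights]
    return [(w - mu) / sigma for w in weights]


def replay_initial_weights(initial: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Map each class to its standardized initial weights."""
    return {cls: standardize(w) for cls, w in initial.items()}
```

Two classes whose raw weights differ by a factor of 100 in magnitude, such as `[0.1, 0.2, 0.3]` and `[10.0, 20.0, 30.0]`, map to the same standardized vector, so neither dominates the arg-max at prediction time.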

7. Learning to Localize Actions from Moments
Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
Abstract: With the knowledge of action moments (i.e., trimmed video clips that each contain an action instance), humans can routinely localize an action temporally in an untrimmed video. Nevertheless, most practical methods still require all training videos to be labeled with temporal annotations (action category and temporal boundary) and develop models in a fully supervised manner, despite the expensive labeling effort and their inapplicability to new categories. In this paper, we introduce a new transfer-learning design that learns action localization for a large set of action categories using only action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes. Specifically, we present Action Herald Networks (AherNet), which integrate this design into a one-stage action localization framework. Technically, a weight transfer function is uniquely devised to build the transformation between classification of action moments or foreground video segments and action localization in synthetic contextual moments or untrimmed videos. The context of each moment is learned through an adversarial mechanism that differentiates the generated features from those of the background in untrimmed videos. Extensive experiments are conducted on learning both across the splits of ActivityNet v1.3 and from THUMOS14 to ActivityNet v1.3. AherNet demonstrates superiority even compared with most fully supervised action localization methods. More remarkably, we train AherNet to localize actions from 600 categories by leveraging action moments in Kinetics-600 and temporal annotations from 200 classes in ActivityNet v1.3. Source code and data are available at this https URL.

8. Adversarial Patch Camouflage against Aerial Detection
Ajaya Adhikari, Richard den Hollander, Ioannis Tolios, Michael van Bekkum, Anneloes Bal, Stijn Hendriks, Maarten Kruithof, Dennis Gross, Nils Jansen, Guillermo Pérez, Kit Buurman, Stephan Raaijmakers
Abstract: Detection of military assets on the ground can be performed by applying deep learning-based object detectors to drone surveillance footage. The traditional way of hiding military assets from sight is camouflage, for example by using camouflage nets. However, large assets like planes or vessels are difficult to conceal by means of traditional camouflage nets. An alternative type of camouflage is to directly mislead automatic object detectors. Recently, it has been observed that small adversarial changes applied to images of the object can make deep learning-based detectors produce erroneous output. In particular, adversarial attacks have been successfully demonstrated to prevent person detection in images, requiring a patch with a specific pattern held up in front of the person, thereby essentially camouflaging the person from the detector. Research into this type of patch attack is still limited, and several questions related to the optimal patch configuration remain open. This work makes two contributions. First, we apply patch-based adversarial attacks to the use case of unmanned aerial surveillance, where the patch is laid on top of large military assets, camouflaging them from automatic detectors running over the imagery. The patch can prevent automatic detection of the whole object while covering only a small part of it. Second, we perform several experiments with different patch configurations, varying their size, position, number, and saliency. Our results show that adversarial patch attacks form a realistic alternative to traditional camouflage activities and should therefore be considered in the automated analysis of aerial surveillance imagery.

9. Radar+RGB Attentive Fusion for Robust Object Detection in Autonomous Vehicles
Ritu Yadav, Axel Vierling, Karsten Berns
Abstract: This paper presents two variants of an architecture, referred to as RANet and BIRANet. The proposed architecture uses radar signal data along with RGB camera images to form a robust detection network that works efficiently even in variable lighting and weather conditions such as rain, dust, and fog. First, radar information is fused in the feature extractor network. Second, radar points are used to generate guided anchors. Third, a method is proposed to improve region proposal network targets. BIRANet yields 72.3/75.3% average AP/AR on the nuScenes dataset, which is better than the performance of our base network, Faster R-CNN with Feature Pyramid Network (FFPN). RANet gives 69.6/71.9% average AP/AR on the same dataset, which is reasonably acceptable performance. Both BIRANet and RANet are also shown to be robust to noise.

10. Continuous Color Transfer
Chunzhi Gu, Xuequan Lu, Chao Zhang
Abstract: Color transfer, which plays a key role in image editing, has recently attracted noticeable attention. It remains challenging due to various issues such as time-consuming manual adjustments and the need for prior segmentation. In this paper, we propose to model color transfer under a probabilistic framework and cast it as a parameter estimation problem. In particular, we relate the transferred image to the example image under a Gaussian Mixture Model (GMM) and regard the transferred image colors as the GMM centroids. We employ the Expectation-Maximization (EM) algorithm (E-step and M-step) for optimization. To better preserve gradient information, we introduce a Laplacian-based regularization term into the objective function at the M-step, which is solved by deriving a gradient descent algorithm. Given a source image and an example image as input, our method generates continuous color transfer results as the number of EM iterations increases. Various experiments show that our approach generally outperforms other competitive color transfer methods, both visually and quantitatively.
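A simplified sketch of the EM formulation: the transferred colors are GMM centroids (initialized from the source image), and the example image's colors are the observed data. It assumes isotropic Gaussians with a fixed variance `sigma2` and equal mixture weights, and it omits the Laplacian regularization term, so the M-step reduces to a closed-form weighted mean rather than the gradient-descent solve described above. Names and parameter values are hypothetical. Each call to `em_step` yields one intermediate result, which is how increasing EM iterations produce a continuous transfer.

```python
import math
from typing import List, Tuple

Color = Tuple[float, float, float]


def sq_dist(a: Color, b: Color) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def em_step(centroids: List[Color], example: List[Color], sigma2: float) -> List[Color]:
    """One EM iteration: centroids = transferred colors, data = example-image colors."""
    # E-step: posterior of each centroid for each example color (equal priors).
    resp = []
    for x in example:
        logits = [-sq_dist(x, y) / (2.0 * sigma2) for y in centroids]
        top = max(logits)  # subtract the max for numerical stability
        w = [math.exp(l - top) for l in logits]
        total = sum(w)
        resp.append([v / total for v in w])
    # M-step: move each centroid to the responsibility-weighted mean of the data.
    updated = []
    for m, y in enumerate(centroids):
        mass = sum(r[m] for r in resp)
        if mass < 1e-12:
            updated.append(y)  # no support: keep the previous color
            continue
        updated.append(tuple(sum(r[m] * x[c] for r, x in zip(resp, example)) / mass
                             for c in range(3)))
    return updated
```

With a dark and a light source color and an example image containing dark-red and bright-magenta pixels, repeated steps pull the dark centroid toward the red cluster and the light one toward the magenta cluster; the regularizer in the paper exists to stop this from flattening image gradients.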

11. Galaxy Morphology Classification using EfficientNet Architectures
Shreyas Kalvankar, Hrushikesh Pandit, Pranav Parwate
Abstract: We study the use of EfficientNets for galaxy morphology classification. We explore using EfficientNets to predict the vote fractions of the 79,975 test images from the Galaxy Zoo 2 challenge on Kaggle. We evaluate this model using the standard competition metric, i.e., the RMSE score, and rank among the top 3 on the public leaderboard with a public score of 0.07765. We propose a fine-tuned architecture using EfficientNetB5 to classify galaxies into seven classes: completely round smooth, in-between smooth, cigar-shaped smooth, lenticular, barred spiral, unbarred spiral, and irregular. This network, along with other popular convolutional networks, is used to classify 29,941 galaxy images. Different metrics such as accuracy, recall, precision, and F1 score are used to evaluate the performance of the model, together with a comparative study of other state-of-the-art convolutional models to determine which one performs best. We obtain an accuracy of 93.7% on our classification model with an F1 score of 0.8857. EfficientNets can be applied to large-scale galaxy classification in future optical space surveys that will provide large amounts of data, such as the Large Synoptic Survey Telescope.
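The two evaluation quantities mentioned above can be written down directly. This sketch assumes the vote fractions for all images and questions have been flattened into one list before computing RMSE, and it shows the basic (binary) F1 as the harmonic mean of precision and recall; how the reported multi-class F1 of 0.8857 was averaged is not stated in the abstract. Function names are illustrative.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error over all predicted vote fractions."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and equally long")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because RMSE is on the same scale as the fractions themselves, a leaderboard score of 0.07765 means predictions are off by roughly 0.08 per fraction in a root-mean-square sense.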

12. iLGaCo: Incremental Learning of Gait Covariate Factors
Zihao Mu, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Yan-ran Li, Shiqi Yu
Abstract: Gait is a popular biometric pattern used to identify people by their way of walking. Traditionally, deep-learning-based gait recognition approaches are trained using the whole training dataset. As a result, if new data (classes, viewpoints, walking conditions, etc.) need to be included, the model must be retrained with both old and new data samples. In this paper, we propose iLGaCo, the first incremental learning approach for covariate factors in gait recognition, in which the deep model can be updated with new information without retraining it from scratch on the whole dataset. Instead, our approach performs a shorter training process with the new data and a small subset of previous samples. This way, our model learns new information while retaining previous knowledge. We evaluate iLGaCo on the CASIA-B dataset in two incremental ways: adding new viewpoints and adding new walking conditions. In both cases, our results are close to those of the classical 'training-from-scratch' approach, with a marginal drop in accuracy ranging from 0.2% to 1.2%, which shows the efficacy of our approach. In addition, compared with other incremental learning methods such as LwF and iCaRL, iLGaCo shows a significant improvement in accuracy, between 6% and 15% depending on the experiment.
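The "new data plus a small subset of previous samples" recipe can be sketched as building a rehearsal training set. The abstract does not say how the subset is chosen; this sketch assumes uniform random sampling of a fixed number of sequences per subject, with a seeded RNG for reproducibility. The `(subject_id, sequence_path)` record type, the per-subject budget, and the function name are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (subject_id, sequence_path): hypothetical record type


def incremental_training_set(new_data: List[Sample], old_data: List[Sample],
                             memory_per_subject: int = 2,
                             seed: int = 0) -> List[Sample]:
    """New samples plus a small, per-subject random subset of earlier samples."""
    rng = random.Random(seed)
    by_subject: Dict[str, List[Sample]] = {}
    for sample in old_data:
        by_subject.setdefault(sample[0], []).append(sample)
    replay: List[Sample] = []
    for subject in sorted(by_subject):
        items = by_subject[subject]
        replay.extend(rng.sample(items, min(memory_per_subject, len(items))))
    return new_data + replay
```

Keeping a few sequences per subject bounds the replay cost while still exposing the model to every previously seen identity, which is what lets the shorter training run retain old knowledge.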

13. Deep Probabilistic Feature-metric Tracking
Binbin Xu, Andrew J. Davison, Stefan Leutenegger
Abstract: Dense image alignment from RGB-D images remains a critical issue for real-world applications, especially under challenging lighting conditions and in wide-baseline settings. In this paper, we propose a new framework to learn a pixel-wise deep feature map and a deep feature-metric uncertainty map predicted by a Convolutional Neural Network (CNN), which together formulate a deep probabilistic feature-metric residual of the two-view constraint that can be minimised using Gauss-Newton in a coarse-to-fine optimisation framework. Furthermore, our network predicts a deep initial pose for faster and more reliable convergence. The optimisation steps are differentiable and unrolled to train in an end-to-end fashion. Due to its probabilistic nature, our approach can easily be coupled with other residuals; we show a combination with ICP. Experimental results demonstrate state-of-the-art performance on the TUM RGB-D dataset and a 3D rigid object tracking dataset. We further demonstrate our method's robustness and convergence qualitatively.
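The core numerical idea, minimizing uncertainty-weighted residuals with Gauss-Newton, can be shown on a much smaller problem. This sketch swaps the paper's dense feature-metric residual over pixels for 2D point correspondences and swaps the 6-DoF pose for a planar rigid transform (theta, tx, ty); each residual is divided by a per-point uncertainty sigma, so points with large predicted uncertainty barely influence the solution. All names are hypothetical, and there is no coarse-to-fine pyramid here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [v] for row, v in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def gauss_newton_se2(src: Sequence[Point], dst: Sequence[Point],
                     sigma: Sequence[float], iters: int = 10) -> List[float]:
    """Estimate (theta, tx, ty) minimizing sum ||(R p + t - q) / sigma||^2."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iters):
        h = [[0.0] * 3 for _ in range(3)]  # accumulates J^T W J
        g = [0.0] * 3                      # accumulates J^T W r
        c, s = math.cos(theta), math.sin(theta)
        for (px, py), (qx, qy), sg in zip(src, dst, sigma):
            w = 1.0 / (sg * sg)  # confident points (small sigma) weigh more
            rx = c * px - s * py + tx - qx
            ry = s * px + c * py + ty - qy
            jx = [-s * px - c * py, 1.0, 0.0]  # d rx / d(theta, tx, ty)
            jy = [c * px - s * py, 0.0, 1.0]   # d ry / d(theta, tx, ty)
            for i in range(3):
                g[i] += w * (jx[i] * rx + jy[i] * ry)
                for j in range(3):
                    h[i][j] += w * (jx[i] * jx[j] + jy[i] * jy[j])
        delta = solve3(h, [-v for v in g])  # normal equations: H delta = -g
        theta, tx, ty = theta + delta[0], tx + delta[1], ty + delta[2]
    return [theta, tx, ty]
```

If one correspondence is corrupted but assigned a large sigma, the recovered pose stays essentially unchanged, which is the practical payoff of learning an uncertainty map alongside the features.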

14. Receptive Multi-granularity Representation for Person Re-Identification
Guanshuo Wang, Yufeng Yuan, Jiwei Li, Shiming Ge, Xi Zhou
Abstract: A key to person re-identification is achieving consistent local details for discriminative representation across variable environments. Current stripe-based feature learning approaches have delivered impressive accuracy, but they do not strike a proper trade-off between diversity, locality, and robustness, and they easily suffer from part-level semantic inconsistency caused by the conflict between rigid partitioning and misalignment. This paper proposes a receptive multi-granularity learning approach to facilitate stripe-based feature learning. This approach performs local partitioning on intermediate representations to operate on receptive region ranges, rather than on input images or output features as current approaches do, and thus can enhance the representation of locality while retaining proper local association. To this end, the local partitions are adaptively pooled using significance-balanced activations for uniform stripes. Random shifting augmentation is further introduced to increase the variance of the regions where persons appear within bounding boxes, easing misalignment. With a two-branch network architecture, discriminative identity representations at different scales can be learned. In this way, our model can provide a more comprehensive and efficient feature representation without larger model storage costs. Extensive intra-dataset and cross-dataset evaluations demonstrate the effectiveness of the proposed approach. In particular, our approach achieves state-of-the-art accuracy of 96.2% Rank-1 and 90.0% mAP on the challenging Market-1501 benchmark.

15. Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Wei Liu, Yun-hui Liu
Abstract: This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, and the spatial location and dominant color of the largest color diversity along the temporal axis. A neural network is then built and trained to yield these statistical summaries given the video frames as inputs. To alleviate the learning difficulty, we employ several spatial partitioning patterns to encode rough spatial locations instead of exact spatial Cartesian coordinates. Our approach is inspired by the observation that the human visual system is sensitive to rapidly changing contents in the visual field and needs only impressions of rough spatial locations to understand visual contents. To validate the effectiveness of the proposed approach, we conduct extensive experiments with several 3D backbone networks, i.e., C3D, 3D-ResNet, and R(2+1)D. The results show that our approach outperforms existing approaches across the three backbone networks on various downstream video analytic tasks, including action recognition, video retrieval, dynamic scene recognition, and action similarity labeling. The source code is publicly available at this https URL.
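Encoding a "rough spatial location" with a partitioning pattern can be sketched as turning a per-pixel magnitude map (for example, motion magnitude accumulated over a clip) into the index of the grid block with the largest total. The label then serves as a classification target instead of exact coordinates. This sketch assumes a single 4 x 4 grid pattern and sum aggregation; the paper uses several partitioning patterns, and the function name is hypothetical.

```python
from typing import List


def dominant_block(mag: List[List[float]], rows: int = 4, cols: int = 4) -> int:
    """Return the row-major index of the grid block with the largest total magnitude."""
    h, w = len(mag), len(mag[0])
    totals = [0.0] * (rows * cols)
    for y in range(h):
        for x in range(w):
            by = y * rows // h  # block row, always in [0, rows)
            bx = x * cols // w  # block column, always in [0, cols)
            totals[by * cols + bx] += mag[y][x]
    return max(range(len(totals)), key=totals.__getitem__)
```

Predicting one of 16 block labels is a much easier target than regressing pixel coordinates, which matches the stated motivation of alleviating the learning difficulty.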

16. Evaluating Single Image Dehazing Methods Under Realistic Sunlight Haze
Zahra Anvari, Vassilis Athitsos
Abstract: Haze can drastically degrade visibility and image quality, thus degrading the performance of computer vision tasks such as object detection. Single image dehazing is a challenging, ill-posed problem, despite being widely studied. Most existing methods assume that haze has a uniform/homogeneous distribution and a single color, i.e., a grayish white similar to smoke, whereas in reality haze can be distributed non-uniformly with different patterns and colors. In this paper, we focus on haze created by sunlight, as it is one of the most prevalent types of haze in the wild. Sunlight can generate non-uniformly distributed haze with drastic density changes due to sun rays, as well as a spectrum of haze colors due to sunlight color changes during the day. This presents a new challenge to image dehazing methods, and it must be addressed for these methods to be practical. To quantify the challenges and assess the performance of these methods, we present a sunlight haze benchmark dataset, Sun-Haze, containing 107 hazy images with different types of sunlight-created haze of varying intensity and color. We evaluate a representative set of state-of-the-art image dehazing methods on this benchmark dataset in terms of standard metrics such as PSNR, SSIM, CIEDE2000, PI, and NIQE. This uncovers the limitations of current methods and questions their underlying assumptions as well as their practicality.
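Of the listed metrics, PSNR is the simplest to state: 10 * log10(MAX^2 / MSE) against a haze-free reference image. The sketch below assumes 8-bit intensities (`max_value = 255`) and images flattened into equal-length sequences; it covers PSNR only, not SSIM, CIEDE2000, or the no-reference metrics PI and NIQE.

```python
import math
from typing import Sequence


def psnr(ref: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images."""
    if len(ref) != len(test) or not ref:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR needs a clean reference, it can only be reported where a haze-free counterpart exists; no-reference metrics such as NIQE fill that gap for real hazy photos.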

17. Introducing Representations of Facial Affect in Automated Multimodal Deception Detection
Leena Mathur, Maja J Matarić
Abstract: Automated deception detection systems can enhance health, justice, and security in society by helping humans detect deceivers in high-stakes situations across medical and legal domains, among others. This paper presents a novel analysis of the discriminative power of dimensional representations of facial affect for automated deception detection, along with interpretable features from visual, vocal, and verbal modalities. We used a video dataset of people communicating truthfully or deceptively in real-world, high-stakes courtroom situations. We leveraged recent advances in in-the-wild automated emotion recognition by implementing a state-of-the-art deep neural network trained on the Aff-Wild database to extract continuous representations of facial valence and facial arousal from speakers. We experimented with unimodal Support Vector Machines (SVM) and SVM-based multimodal fusion methods to identify effective features, modalities, and modeling approaches for detecting deception. Unimodal models trained on facial affect achieved an AUC of 80%, and facial affect contributed to the highest-performing multimodal approach (adaptive boosting), which achieved an AUC of 91% when tested on speakers who were not part of the training sets. This approach achieved a higher AUC than existing automated machine learning approaches that used interpretable visual, vocal, and verbal features, but not facial affect, to detect deception in this dataset. Across all videos, deceptive and truthful speakers exhibited significant differences in facial valence and facial arousal, contributing computational support to existing psychological theories on affect and deception. The demonstrated importance of facial affect in our models informs and motivates the future development of automated, affect-aware machine learning approaches for modeling and detecting deception and other social behaviors in the wild.
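The reported AUC values (80% and 91%) can be read as the probability that a randomly chosen deceptive clip receives a higher score than a randomly chosen truthful one. The sketch below computes AUC directly from that pairwise definition, counting ties as one half; it assumes label 1 marks the positive (deceptive) class. It is quadratic in the number of examples, which is fine for a courtroom-sized dataset.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For a fair estimate, the scores should come from models evaluated on held-out speakers, as the abstract describes, so that identity cues cannot inflate the ranking.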

18. VarifocalNet: An IoU-aware Dense Object Detector
Haoyang Zhang, Ying Wang, Feras Dayoub, Niko Sünderhauf
Abstract: Accurately ranking a huge number of candidate detections is key to a high-performance dense object detector. While prior work uses the classification score, or a combination of it and an IoU-based localization score, as the ranking basis, neither can reliably represent the rank, which harms detection performance. In this paper, we propose to learn IoU-aware classification scores (IACS) that simultaneously represent object presence confidence and localization accuracy, producing a more accurate ranking of detections in dense object detectors. In particular, we design a new loss function, named Varifocal Loss, for training a dense object detector to predict the IACS, and a new efficient star-shaped bounding box feature representation for estimating the IACS and refining coarse bounding boxes. Combining these two new components with a bounding box refinement branch, we build a new dense object detector on the FCOS architecture, which we call VarifocalNet, or VFNet for short. Extensive experiments on the MS COCO benchmark show that our VFNet consistently surpasses the strong baseline by ~2.0 AP with different backbones, and our best model, with Res2Net-101-DCN, reaches a single-model single-scale AP of 51.3 on COCO test-dev, achieving state-of-the-art performance among various object detectors. Code is available at this https URL.
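
The abstract does not spell out Varifocal Loss. The sketch below is our reading of it as an asymmetric binary cross-entropy on the IACS target q (the IoU with the ground truth for positives, 0 for negatives): positives are weighted by q, while negatives are down-weighted focal-style by p**gamma. The defaults alpha=0.75 and gamma=2.0 are assumptions, not values stated above.

```python
import math


def varifocal_loss(p: float, q: float, alpha: float = 0.75,
                   gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Loss for one predicted IACS p in (0, 1) against its target q.

    q is the IoU between the predicted box and its ground truth for a
    positive sample, and 0 for a negative sample.
    """
    p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
    if q > 0:
        # Positive: BCE against the soft IoU target, scaled by q so that
        # well-localized positives contribute more to training.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative: focal-style down-weighting of easy negatives by p**gamma.
    return -alpha * (p ** gamma) * math.log(1.0 - p)
```

For a positive with q = 0.7 the loss is lowest at p = 0.7, so the classifier is pushed to output the localization quality itself rather than a hard 1.0, which is what lets the score double as a ranking signal.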

19. Sentence Guided Temporal Modulation for Dynamic Video Thumbnail Generation
Mrigank Rochan, Mahesh Kumar Krishna Reddy, Yang Wang
Abstract: We consider the problem of sentence-specified dynamic video thumbnail generation. Given an input video and a user query sentence, the goal is to generate a video thumbnail that not only previews the video content but also semantically corresponds to the sentence. In this paper, we propose a sentence guided temporal modulation (SGTM) mechanism that uses the sentence embedding to modulate the normalized temporal activations of the video thumbnail generation network. Unlike the existing state-of-the-art method, which uses recurrent architectures, we propose a non-recurrent framework that is simple and allows much more parallelization. Extensive experiments and analysis on a large-scale dataset demonstrate the effectiveness of our framework.
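
One way to read "modulating the normalized temporal activations with the sentence embedding" is a FiLM-style conditional normalization: normalize each channel over time, then scale and shift it with vectors predicted from the sentence. The matrices `w_gamma` and `w_beta` are hypothetical stand-ins for learned layers, and this is a sketch under that assumption, not the authors' implementation. Every time step is handled in one vectorized operation, so nothing here is recurrent.

```python
import numpy as np


def sentence_guided_modulation(video_feats, sent_emb, w_gamma, w_beta, eps=1e-5):
    """FiLM-style temporal modulation (a sketch, not the paper's code).

    video_feats: (T, C) per-clip activations; sent_emb: (D,) sentence embedding;
    w_gamma, w_beta: (D, C) hypothetical learned projections.
    """
    # Normalize every channel over the temporal axis.
    mean = video_feats.mean(axis=0, keepdims=True)
    std = video_feats.std(axis=0, keepdims=True)
    normed = (video_feats - mean) / (std + eps)
    # The sentence decides how much each channel is scaled and shifted.
    gamma = sent_emb @ w_gamma  # (C,)
    beta = sent_emb @ w_beta    # (C,)
    return gamma * normed + beta
```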

20. DeepFacePencil: Creating Face Images from Freehand Sketches
Yuhang Li, Xuejin Chen, Binxin Yang, Zihan Chen, Zhihua Cheng, Zheng-Jun Zha
Abstract: In this paper, we explore the task of generating photo-realistic face images from hand-drawn sketches. Existing image-to-image translation methods require a large-scale dataset of paired sketches and images for supervision. They typically use synthesized edge maps of face images as training data. However, these synthesized edge maps align strictly with the edges of the corresponding face images, which limits their ability to generalize to real hand-drawn sketches with vast stroke diversity. To address this problem, we propose DeepFacePencil, an effective tool that generates photo-realistic face images from hand-drawn sketches, based on a novel dual-generator image translation network used during training. A novel spatial attention pooling (SAP) module is designed to adaptively handle spatially varying stroke distortions, supporting various stroke styles and different levels of detail. We conduct extensive experiments, and the results demonstrate the superiority of our model over existing methods in both image quality and generalization to hand-drawn sketches.

21. Shape Defense
Ali Borji
Abstract: Humans rely heavily on shape information to recognize objects. Conversely, convolutional neural networks (CNNs) are more biased towards texture. This is perhaps the main reason why CNNs are vulnerable to adversarial examples. Here, we explore how shape bias can be incorporated into CNNs to improve their robustness. Two algorithms are proposed, based on the observation that edges are invariant to moderate imperceptible perturbations. In the first, a classifier is adversarially trained on images with the edge map as an additional channel. At inference time, the edge map is recomputed and concatenated to the image. In the second algorithm, a conditional GAN is trained to translate edge maps, from clean and/or perturbed images, into clean images. Inference is then performed on the image generated from the input's edge map. Extensive experiments on 10 datasets demonstrate the effectiveness of the proposed algorithms against FGSM and $\ell_\infty$ PGD-40 attacks. Further, we show that a) edge information can also benefit other adversarial training methods, and b) CNNs trained on edge-augmented inputs are more robust than CNNs trained solely on RGB images against natural image corruptions such as motion blur, impulse noise, and JPEG compression. From a broader perspective, our study suggests that CNNs do not adequately account for image structures that are crucial for robustness. Code is available at this https URL.
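
A sketch of the first algorithm's input pipeline, assuming a normalized finite-difference gradient magnitude as the edge detector (the abstract does not name one): the edge map is computed from the image and stacked as a fourth channel. Adversarial training of the classifier on these 4-channel inputs is not shown.

```python
import numpy as np


def edge_map(img: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) floats in [0, 1] -> (H, W) edge strength in [0, 1]."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)  # finite differences along rows and columns
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def add_edge_channel(img: np.ndarray) -> np.ndarray:
    """Stack the edge map as a 4th channel -> (H, W, 4) classifier input."""
    return np.concatenate([img, edge_map(img)[..., None]], axis=2)
```

At test time the same call is applied to the incoming, possibly perturbed, image (for example `add_edge_channel(x_test)`, with `x_test` a hypothetical input), so the edge channel is always recomputed from what the model actually sees.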

22. An Integrated Approach to Produce Robust Models with High Efficiency
Zhijian Li, Bao Wang, Jack Xin
Abstract: Deep Neural Networks (DNNs) need to be both efficient and robust for practical use. Quantization and structure simplification are promising ways to adapt DNNs to mobile devices, and adversarial training is the most popular method for making DNNs robust. In this work, we try to obtain both properties by applying a convergent relaxation quantization algorithm, Binary-Relax (BR), to a robust adversarially trained model, ResNets Ensemble via Feynman-Kac Formalism (EnResNet). We also discover that high-precision quantization, such as ternary (TNN) and 4-bit, produces sparse DNNs. However, this sparsity is unstructured under adversarial training. To address the problems that adversarial training jeopardizes DNNs' accuracy on clean images and the structure of their sparsity, we design a trade-off loss function that helps DNNs preserve their natural accuracy and improves channel sparsity. With our trade-off loss function, we achieve both goals with no reduction of resistance under weak attacks and a very minor reduction of resistance under strong attacks. By combining quantized EnResNet with the trade-off loss function, we provide robust models that are highly efficient.
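
Binary-Relax is only named above. The sketch below shows the general idea of a relaxed quantization step as we understand it: the weights used in the forward pass are a weighted average of the float weights and their quantized projection, and the weight `lam` on the projection grows during training until the result is fully quantized. The ternary projection with threshold 0.7 * mean(|w|) is an assumption borrowed from ternary weight networks; weights inside the threshold become exact zeros, which is one concrete way such quantization yields the unstructured sparsity the abstract mentions.

```python
import numpy as np


def ternary_projection(w: np.ndarray) -> np.ndarray:
    """Project weights onto {-a, 0, +a} (TWN-style threshold, an assumption)."""
    delta = 0.7 * np.abs(w).mean()
    mask = np.abs(w) > delta            # weights below delta become exact zeros
    a = np.abs(w[mask]).mean() if mask.any() else 0.0
    return a * np.sign(w) * mask


def relaxed_quantize(w_float: np.ndarray, lam: float) -> np.ndarray:
    """Weighted average of float weights and their ternary projection.

    lam = 0 keeps the float weights; as lam grows the result approaches fully
    ternary weights, so lam is increased over training (schedule not shown).
    """
    return (lam * ternary_projection(w_float) + w_float) / (lam + 1.0)
```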

23. Deep Volumetric Universal Lesion Detection using Light-Weight Pseudo 3D Convolution and Surface Point Regression
Jinzheng Cai, Ke Yan, Chi-Tung Cheng, Jing Xiao, Chien-Hung Liao, Le Lu, Adam P. Harrison
Abstract: Identifying, measuring, and reporting lesions accurately and comprehensively from patient CT scans are important yet time-consuming procedures for physicians. Computer-aided lesion/significant-findings detection techniques are at the core of medical imaging but remain very challenging due to the tremendous variability of lesion appearance, location, and size distribution in 3D imaging. In this work, we propose a novel deep, anchor-free, one-stage volumetric universal lesion detection (VULD) framework that incorporates (1) pseudo 3D convolution (P3DC) operators to recycle the architectural configurations and pre-trained weights of off-the-shelf 2D networks, especially ones with large capacities to cope with data variance, and (2) a new surface point regression (SPR) method to effectively regress the 3D lesion spatial extents by pinpointing representative key points on lesion surfaces. Experimental validations are first conducted on the public large-scale NIH DeepLesion dataset, where our proposed method delivers new state-of-the-art quantitative performance. We also test VULD on our in-house dataset for liver tumor detection. VULD generalizes well to both large-scale and small-sized tumor datasets in CT imaging.

24. A Compact Deep Architecture for Real-time Saliency Prediction
Samad Zabihi, Hamed Rezazadegan Tavakoli, Ali Borji
Abstract: Saliency computation models aim to imitate the attention mechanism of the human visual system. Applying deep neural networks to saliency prediction has led to drastic improvements over the last few years. However, deep models have a large number of parameters, which makes them less suitable for real-time applications. Here we propose a compact yet fast model for real-time saliency prediction. Our proposed model consists of a modified U-Net architecture, a novel fully connected layer, and central difference convolutional layers. The modified U-Net architecture promotes compactness and efficiency. The novel fully connected layer facilitates the implicit capture of location-dependent information. Using central difference convolutional layers at different scales enables the capture of more robust and biologically motivated features. We compare our model with state-of-the-art saliency models using traditional saliency scores as well as our newly devised scheme. Experimental results on four challenging saliency benchmark datasets demonstrate the effectiveness of our approach in striking a balance between accuracy and speed. Our model runs in real time, which makes it appealing for edge devices and video processing.
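
Central difference convolution comes from earlier work and is not defined in this abstract. As we understand it, it mixes a vanilla convolution with one applied to local differences x(p0 + pn) - x(p0), weighted by theta; the two terms collapse into a vanilla convolution minus theta * x(p0) * sum(w), which the single-channel NumPy sketch below uses. The default theta=0.7 is an assumption.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Plain 2D cross-correlation with zero padding (odd-sized kernel)."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding keeps the output size
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += w[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def central_difference_conv2d(x: np.ndarray, w: np.ndarray, theta: float = 0.7) -> np.ndarray:
    """theta=0 gives a vanilla convolution; theta=1 gives pure central differences."""
    return conv2d_same(x, w) - theta * x * w.sum()
```

With theta=1 a uniform region produces zero response away from the borders, so the layer reacts to local intensity changes rather than absolute intensity, which is the usual motivation for mixing it with a vanilla convolution.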

25. Finding Action Tubes with a Sparse-to-Dense Framework
Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Limin Wang, Shugong Xu
Abstract: The task of spatial-temporal action detection has attracted increasing attention among researchers. Existing dominant methods solve this problem by relying on short-term information and dense serial-wise detection on each individual frame or clip. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. There are two key characteristics in this framework: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network; (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21, and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy makes our framework about 7.6 times more efficient than the nearest competitor.

26. Learn by Observation: Imitation Learning for Drone Patrolling from Videos of A Human Navigator
Yue Fan, Shilei Chu, Wei Zhang, Ran Song, Yibin Li
Abstract: We present an imitation learning method for autonomous drone patrolling based only on raw videos. Different from previous methods, we propose to let the drone learn to patrol in the air by observing and imitating how a human navigator does it on the ground. The observation process enables automatic collection and annotation of data using inter-frame geometric consistency, resulting in less manual effort and high accuracy. A newly designed neural network is then trained on the annotated data to predict appropriate directions and translations so that the drone patrols in a lane-keeping manner, as humans do. Our method allows the drone to fly at a high altitude with a broad view and low risk. It can also detect all accessible directions at crossroads and further integrate available user instructions with autonomous patrolling control commands. Extensive experiments demonstrate the accuracy of the proposed imitation learning process as well as the reliability of the holistic system for autonomous drone navigation. The code, datasets, and video demonstrations are available at this https URL

27. Transfer Learning-based Road Damage Detection for Multiple Countries
Deeksha Arya, Hiroya Maeda, Sanjay Kumar Ghosh, Durga Toshniwal, Alexander Mraz, Takehiro Kashiyama, Yoshihide Sekimoto
Abstract: Many municipalities and road authorities seek to implement automated evaluation of road damage. However, they often lack the technology, know-how, and funds to afford state-of-the-art equipment for data collection and analysis of road damage. Although some countries, like Japan, have developed less expensive and readily available smartphone-based methods for automatic road condition monitoring, other countries still struggle to find efficient solutions. This work makes the following contributions in this context. First, it assesses the usability of the Japanese model for other countries. Second, it proposes a large-scale heterogeneous road damage dataset comprising 26,620 images collected from multiple countries using smartphones. Third, we propose generalized models capable of detecting and classifying road damage in more than one country. Lastly, we provide recommendations for readers, local agencies, and municipalities of other countries for the case in which another country publishes its data and model for automatic road damage detection and classification. Our dataset is available at (this https URL).

28. An automatic framework to study the tissue micro-environment of renal glomeruli in differently stained consecutive digital whole slide images
Odyssee Merveille, Thomas Lampert, Jessica Schmitz, Germain Forestier, Friedrich Feuerhake, Cédric Wemmert
Abstract: Objective: This article presents an automatic image processing framework to extract quantitative high-level information describing the micro-environment of glomeruli in consecutive whole slide images (WSIs) processed with different staining modalities, from patients with chronic kidney rejection after kidney transplantation. Methods: This three-step framework consists of: 1) cell and anatomical structure segmentation based on colour deconvolution and deep learning; 2) fusion of information from different stainings using a newly developed registration algorithm; and 3) feature extraction. Results: Each step of the framework is validated independently, both quantitatively and qualitatively, by pathologists. An illustration of the different types of features that can be extracted is presented. Conclusion: The proposed generic framework allows for the analysis of the micro-environment surrounding large structures that can be segmented (either manually or automatically). It is independent of the segmentation approach and is therefore applicable to a variety of biomedical research questions. Significance: Chronic tissue remodelling processes after kidney transplantation can result in interstitial fibrosis and tubular atrophy (IFTA) and glomerulosclerosis. This pipeline provides tools to quantitatively analyse, in the same spatial context, information from different consecutive WSIs, and helps researchers understand the complex underlying mechanisms leading to IFTA and glomerulosclerosis.
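
The segmentation step above is based on colour deconvolution. A minimal NumPy sketch of classic Ruifrok-Johnston colour deconvolution follows: convert RGB to optical density with the Beer-Lambert law, then solve for per-stain amounts with the inverse of a stain matrix. The hematoxylin/eosin/DAB vectors are a widely used example and an assumption here; the kidney slides described above use several stainings, each of which would need its own vectors. scikit-image's `skimage.color.rgb2hed` is a library version of the same idea.

```python
import numpy as np

# Example stain optical-density vectors (rows: hematoxylin, eosin, DAB).
# Other stainings need their own vectors, ideally estimated from the slides.
STAINS = np.array([[0.65, 0.70, 0.29],
                   [0.07, 0.99, 0.11],
                   [0.27, 0.57, 0.78]])
STAINS = STAINS / np.linalg.norm(STAINS, axis=1, keepdims=True)


def colour_deconvolution(rgb: np.ndarray, stains: np.ndarray = STAINS,
                         background: float = 255.0) -> np.ndarray:
    """rgb: (H, W, 3) intensities up to `background` -> (H, W, 3) stain amounts."""
    # Optical density; clipping at 1 avoids log(0) for saturated dark pixels.
    od = -np.log10(np.clip(rgb, 1.0, background) / background)
    # Beer-Lambert mixing: od = conc @ stains, hence conc = od @ inv(stains).
    return od @ np.linalg.inv(stains)
```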

29. Dual Attention GANs for Semantic Image Synthesis
Hao Tang, Song Bai, Nicu Sebe
Abstract: In this paper, we focus on the semantic image synthesis task, which aims at translating semantic label maps into photo-realistic images. Existing methods lack effective semantic constraints to preserve the semantic information and ignore the structural correlations in both the spatial and channel dimensions, leading to unsatisfactory, blurry, and artifact-prone results. To address these limitations, we propose a novel Dual Attention GAN (DAGAN) to synthesize photo-realistic and semantically consistent images with fine details from the input layouts without imposing extra training overhead or modifying the network architectures of existing methods. We also propose two novel modules, i.e., a position-wise Spatial Attention Module (SAM) and a scale-wise Channel Attention Module (CAM), to capture semantic structure attention in the spatial and channel dimensions, respectively. Specifically, SAM selectively correlates the pixels at each position through a spatial attention map, so that pixels with the same semantic label are related to each other regardless of their spatial distances. Meanwhile, CAM selectively emphasizes the scale-wise features at each channel through a channel attention map, which integrates associated features among all channel maps regardless of their scales. We finally sum the outputs of SAM and CAM to further improve the feature representation. Extensive experiments on four challenging datasets show that DAGAN achieves remarkably better results than state-of-the-art methods while using fewer model parameters. The source code and trained models are available at this https URL.
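
A minimal NumPy sketch of the two attention branches, assuming DANet-style attention with identity query/key/value projections and no learned scaling; DAGAN's actual modules presumably add learned layers. The spatial branch lets each position take a softmax-weighted average over all positions, so distant pixels with similar features reinforce each other; the channel branch does the same across channel maps; the two outputs are summed, as described above.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(feat: np.ndarray) -> np.ndarray:
    """feat: (C, H, W). Each position aggregates features from all positions."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    attn = softmax(f.T @ f, axis=-1)  # (N, N) position-to-position weights
    return (f @ attn.T).reshape(c, h, w)


def channel_attention(feat: np.ndarray) -> np.ndarray:
    """Each channel aggregates all channel maps, weighted by channel similarity."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    attn = softmax(f @ f.T, axis=-1)  # (C, C) channel-to-channel weights
    return (attn @ f).reshape(c, h, w)


def dual_attention(feat: np.ndarray) -> np.ndarray:
    """Sum of both branches, each wrapped in a residual connection."""
    return (feat + spatial_attention(feat)) + (feat + channel_attention(feat))
```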

30. Adaptive Exploitation of Pre-trained Deep Convolutional Neural Networks for Robust Visual Tracking
Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei
Abstract: Owing to automatic feature extraction via multi-layer nonlinear transformations, deep learning-based visual trackers have recently achieved great success in challenging tracking scenarios. Although many of these trackers use feature maps from pre-trained convolutional neural networks (CNNs), the effects of selecting different models and exploiting various combinations of their feature maps have not yet been compared comprehensively. To the best of our knowledge, all of these methods use a fixed number of convolutional feature maps without considering the scene attributes (e.g., occlusion, deformation, and fast motion) that might occur during tracking. This paper proposes adaptive discriminative correlation filter (DCF)-based methods that can exploit CNN models with different topologies. First, the paper provides a comprehensive analysis of four commonly used CNN models to determine the best feature maps of each model. Second, using the analysis results as attribute dictionaries, adaptive exploitation of deep features is proposed to improve the accuracy and robustness of visual trackers with respect to video characteristics. Third, the generalization of the proposed method is validated on various tracking datasets as well as on CNN models with similar architectures. Finally, extensive experimental results demonstrate the effectiveness of the proposed adaptive method compared with state-of-the-art visual tracking methods.
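
The trackers above build on discriminative correlation filters. A minimal single-channel DCF in the MOSSE-style closed form is sketched below on raw pixel values; a CNN-based tracker would apply the same solve per feature channel, and the paper's adaptive choice of feature maps is not reproduced. `lam` is an assumed ridge regularizer. Training fits a filter whose correlation with the target patch yields a Gaussian peak; detection takes the argmax of the response on a new patch.

```python
import numpy as np


def gaussian_label(h: int, w: int, cy: int, cx: int, sigma: float = 2.0) -> np.ndarray:
    """Desired correlation output: a Gaussian peak at the target centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))


def train_dcf(patch: np.ndarray, label: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Closed-form ridge-regression filter in the Fourier domain (conjugate form)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return G * np.conj(F) / (F * np.conj(F) + lam)


def detect(filter_hat: np.ndarray, patch: np.ndarray) -> tuple:
    """Correlation response on a new patch; its argmax is the target location."""
    response = np.real(np.fft.ifft2(filter_hat * np.fft.fft2(patch)))
    return tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
```

Because the filter is solved element-wise in the Fourier domain, training and detection each cost only a few FFTs, which is why DCF trackers can afford expensive CNN features as input.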

31. Lymph Node Gross Tumor Volume Detection in Oncology Imaging via Relationship Learning Using Graph Neural Network
Chun-Hung Chao, Zhuotun Zhu, Dazhou Guo, Ke Yan, Tsung-Ying Ho, Jinzheng Cai, Adam P. Harrison, Xianghua Ye, Jing Xiao, Alan Yuille, Min Sun, Le Lu, Dakai Jin
Abstract: Determining the spread of the lymph node gross tumor volume (GTV$_{LN}$) is essential in defining the respective resection or irradiation regions for the downstream workflows of surgical resection and radiotherapy for many cancers. Different from the more common enlarged lymph nodes (LNs), GTV$_{LN}$ also includes smaller ones if they are associated with high positron emission tomography signals and/or any metastasis signs in CT. This is a daunting task. In this work, we propose a unified LN appearance and inter-LN relationship learning framework to detect the true GTV$_{LN}$. This is motivated by the prior clinical knowledge that LNs form a connected lymphatic system and that the spread of cancer cells among LNs often follows certain pathways. Specifically, we first use a 3D convolutional neural network with ROI pooling to extract the instance-wise appearance features of each GTV$_{LN}$ candidate. Next, we introduce a graph neural network to further model the inter-LN relationships, where the global LN-tumor spatial priors are included in the learning process. This leads to an end-to-end trainable network that detects GTV$_{LN}$ by classification. We operate our model on a set of GTV$_{LN}$ candidates generated by a preliminary first-stage method, which has a sensitivity of $>85\%$ at the cost of a high false positive (FP) rate ($>15$ FPs per patient). We validate our approach on a radiotherapy dataset with 142 paired PET/RTCT scans covering the chest and upper abdominal body parts. The proposed method significantly improves on the state-of-the-art (SOTA) LN classification method, by $5.5\%$ in F1 score and by $13.1\%$ in averaged sensitivity at $2, 3, 4, 6$ FPs per patient.
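
A sketch of one mean-aggregation message-passing step over lymph node candidates, under stated assumptions: candidates are linked when their centres lie within a hypothetical distance threshold, and each node's feature vector would concatenate its CNN appearance features with, for example, its offset from the primary tumor as a spatial prior. The weights `w_self` and `w_neigh` stand in for learned parameters; the paper's actual graph construction and layer type are not given in the abstract.

```python
import numpy as np


def build_adjacency(centers: np.ndarray, radius: float) -> np.ndarray:
    """Link candidates whose 3D centres (N, 3) are closer than `radius`."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    adj = (d < radius).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops; the node's own term is separate
    return adj


def gnn_layer(h: np.ndarray, adj: np.ndarray, w_self: np.ndarray,
              w_neigh: np.ndarray) -> np.ndarray:
    """One mean-aggregation message-passing step followed by ReLU.

    h: (N, F) node features; w_self, w_neigh: (F, F_out) stand-in weights.
    """
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ h) / np.maximum(deg, 1.0)  # isolated nodes receive zeros
    return np.maximum(0.0, h @ w_self + neigh_mean @ w_neigh)
```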

32. AKHCRNet: Bengali Handwritten Character Recognition Using Deep Learning
Akash Roy
Abstract: I propose a state-of-the-art deep neural architecture for handwritten character recognition of Bengali alphabets, compound alphabets, and numerical digits that achieves a state-of-the-art accuracy of 96.8% in just 11 epochs. Similar work has been done before by Chatterjee, Dutta, et al. (2019), but they achieved 96.12% accuracy in about 47 epochs. The deep neural architecture used in that paper was fairly large, given that it included the weights of the ResNet-50 model, a 50-layer residual network. The proposed model achieves higher accuracy than any previous work, in fewer epochs. ResNet-50 is a good model trained on the ImageNet dataset, but I propose an HCR network that is trained from scratch on Bengali characters, without "Ensemble Learning", and that can outperform previous architectures.

33. Driving Through Ghosts: Behavioral Cloning with False Positives
Andreas Bühler, Adrien Gaidon, Andrei Cramariuc, Rares Ambrus, Guy Rosman, Wolfram Burgard
Abstract: Safe autonomous driving requires robust detection of other traffic participants. However, robust does not mean perfect, and safe systems typically minimize missed detections at the expense of a higher false positive rate. This results in conservative yet potentially dangerous behavior, such as avoiding imaginary obstacles. In the context of behavioral cloning, perceptual errors at training time can lead to learning difficulties or wrong policies, as expert demonstrations might be inconsistent with the perceived world state. In this work, we propose a behavioral cloning approach that can safely leverage imperfect perception without being conservative. Our core contribution is a novel representation of perceptual uncertainty for learning to plan. We propose a new probabilistic bird's-eye-view semantic grid to encode the noisy output of object perception systems. We then leverage expert demonstrations to learn an imitative driving policy using this probabilistic representation. Using the CARLA simulator, we show that our approach can safely overcome critical false positives that would otherwise lead to catastrophic failures or conservative behavior.
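
The abstract does not say how detections are rasterized into the probabilistic grid. One simple choice, assumed here, is a noisy-OR: each detection with confidence p marks the cells it covers, and a cell's occupancy probability is 1 - Π(1 - p_i) over the detections covering it. Unlike a binary grid, a low-confidence ghost detection then contributes only a weak signal that a learned policy can discount. Boxes are axis-aligned cell ranges for brevity.

```python
def probabilistic_bev_grid(detections, height, width):
    """Rasterize detections into per-cell occupancy probabilities.

    detections: iterable of (row0, col0, row1, col1, confidence) in cell units,
    covering the half-open ranges [row0, row1) x [col0, col1).
    Returns a height x width list of lists (noisy-OR over overlapping boxes).
    """
    free = [[1.0] * width for _ in range(height)]  # P(no detected object in cell)
    for r0, c0, r1, c1, conf in detections:
        for r in range(max(r0, 0), min(r1, height)):
            for c in range(max(c0, 0), min(c1, width)):
                free[r][c] *= 1.0 - conf
    return [[1.0 - f for f in row] for row in free]
```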

34. Patch-based Brain Age Estimation from MR Images
Kyriaki-Margarita Bintsi, Vasileios Baltatzis, Arinbjörn Kolbeinsson, Alexander Hammers, Daniel Rueckert
Abstract: Brain age estimation from Magnetic Resonance Images (MRI) derives the difference between a subject's biological brain age and their chronological age. This is a potential biomarker for neurodegeneration, e.g., as part of Alzheimer's disease. Early detection of neurodegeneration manifesting as a higher brain age can potentially facilitate better medical care and planning for affected individuals. Many studies have addressed the prediction of chronological age from brain MRI using machine learning, and specifically deep learning, techniques. Unlike most studies, which use the whole brain volume, in this study we develop a new deep learning approach that uses 3D patches of the brain together with convolutional neural networks (CNNs) to build a localised brain age estimator. In this way, we can visualize the regions that play the most important role in estimating brain age, leading to more anatomically driven and interpretable results and confirming relevant literature suggesting that the ventricles and the hippocampus are the most informative areas. In addition, we leverage this knowledge to improve the overall performance of age estimation by combining the results of different patches using an ensemble method, such as averaging or linear regression. The network is trained on the UK Biobank dataset, and the method achieves state-of-the-art results, with a mean absolute error of 2.46 years for purely regional estimates and, for an ensemble of patches, 2.13 years before bias correction and 1.96 years after bias correction.
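
Averaging is the simplest of the ensembles mentioned above. The sketch below averages per-patch predictions per subject, scores them with mean absolute error (MAE), and applies one common linear bias correction: fit predicted = a * age + b on held-out data, then invert that fit. The correction procedure is an assumption; the abstract only reports MAE before and after bias correction.

```python
from statistics import mean


def ensemble_average(patch_predictions):
    """patch_predictions: one list of per-patch age estimates per subject."""
    return [mean(preds) for preds in patch_predictions]


def mean_absolute_error(predicted, true_ages):
    return mean(abs(p - t) for p, t in zip(predicted, true_ages))


def fit_bias_correction(predicted, true_ages):
    """Least-squares fit of predicted = a * age + b; returns a corrector p -> age."""
    ma, mp = mean(true_ages), mean(predicted)
    cov = sum((t - ma) * (p - mp) for t, p in zip(true_ages, predicted))
    var = sum((t - ma) ** 2 for t in true_ages)
    a = cov / var
    b = mp - a * ma
    return lambda p: (p - b) / a
```

Regression to the mean typically makes models overestimate young subjects and underestimate old ones (a < 1); inverting the fitted line removes that age-dependent trend, which is why bias correction lowers the reported MAE.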

35. Zero-Shot Learning from Adversarial Feature Residual to Compact Visual Feature
Bo Liu, Qiulei Dong, Zhanyi Hu
Abstract: Many recent zero-shot learning (ZSL) methods focus on learning discriminative object features in an embedding feature space; however, the distributions of the unseen-class features learned by these methods tend to partly overlap, resulting in inaccurate object recognition. To address this problem, we propose a novel adversarial network that synthesizes compact semantic visual features for ZSL, consisting of a residual generator, a prototype predictor, and a discriminator. The residual generator produces a visual feature residual, which is combined with a visual prototype predicted by the prototype predictor to synthesize the visual feature. The discriminator distinguishes the synthetic visual features from the real ones extracted from an existing categorization CNN. Since the generated residuals are generally numerically much smaller than the distances among all the prototypes, the distributions of the unseen-class features synthesized by the proposed network overlap less. In addition, considering that the visual features from categorization CNNs are generally inconsistent with their semantic features, a simple feature selection strategy is introduced to extract more compact semantic visual features. Extensive experimental results on six benchmark datasets demonstrate that our method outperforms existing state-of-the-art methods by 1.2-13.2% in most cases.

36. Puzzle-AE: Novelty Detection in Images through Solving Puzzles
Mohammadreza Salehi, Ainaz Eftekhar, Niousha Sadjadi, Mohammad Hossein Rohban, Hamid R. Rabiee
Abstract: Autoencoders (AEs) have proved to be an effective framework for novelty detection. However, they typically do not show promising results on other kinds of real-world datasets that exhibit high intra-class variation, such as CIFAR-10. AEs are generally unable to learn a latent space that solely captures the common features of the normal class, resulting in both high false positive and high false negative rates because they model features that are irrelevant to the normal class. Recently, self-supervised learning has shown great promise in representation learning. To this end, we propose a new AE framework that is trained by solving puzzles on randomly permuted image patches. Based on this framework, we achieve competitive or superior results compared to SOTA anomaly detection methods on various toy and real-world datasets. Unlike many competitors in this field, the proposed framework is stable, runs in real time, is more general and agnostic to the choice of model hyper-parameters, works effectively with small sample sizes, and does not require unprincipled early stopping.
摘要：自动编码器（AE）已被证明是一种有效的新颖性检测框架。然而，在CIFAR-10等类内差异较大的其他真实世界数据集上，它们通常难以取得理想的结果。AE通常无法学到仅捕获正常类共同特征的潜在空间，由于对与正常类无关的特征进行了建模，导致假阳性率和假阴性率都较高。近年来，自监督学习在表示学习中展现出巨大潜力。为此，我们提出了一种新的AE框架，通过求解由随机打乱的图像块构成的拼图来进行训练。基于该框架，我们在多个玩具数据集和真实世界数据集上取得了与最先进（SOTA）异常检测方法相当或更优的结果。与该领域的许多方法不同，所提出的框架训练稳定、具有实时性能、更加通用且对模型超参数的选择不敏感，能够在小样本设置下有效工作，并且不需要缺乏依据的提前停止。
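
To make the puzzle pretext task concrete, here is a minimal Python sketch of how scrambled/original training pairs could be built. It assumes a square grayscale image stored as a list of rows whose side length is divisible by the grid size; the helper names (`split_patches`, `join_patches`, `make_puzzle`, `solve_puzzle`) are hypothetical and not taken from the Puzzle-AE code. The autoencoder itself is omitted: it would take the scrambled image as input and be trained to output the original.

```python
import random


def split_patches(img, grid):
    """Split a square image (list of rows) into grid x grid patches, row-major."""
    p = len(img) // grid  # patch side; assumes len(img) is divisible by grid
    patches = []
    for gi in range(grid):
        for gj in range(grid):
            rows = img[gi * p:(gi + 1) * p]
            patches.append([row[gj * p:(gj + 1) * p] for row in rows])
    return patches


def join_patches(patches, grid):
    """Inverse of split_patches: reassemble row-major patches into one image."""
    p = len(patches[0])
    img = []
    for gi in range(grid):
        for r in range(p):
            row = []
            for gj in range(grid):
                row.extend(patches[gi * grid + gj][r])
            img.append(row)
    return img


def make_puzzle(img, grid, rng):
    """Shuffle the patches; return the scrambled image and the permutation used.
    Position i of the scrambled image holds original patch perm[i]."""
    patches = split_patches(img, grid)
    perm = list(range(len(patches)))
    rng.shuffle(perm)
    return join_patches([patches[k] for k in perm], grid), perm


def solve_puzzle(scrambled, perm, grid):
    """Put every patch back in its original slot (the AE's reconstruction target)."""
    patches = split_patches(scrambled, grid)
    restored = [None] * len(patches)
    for i, k in enumerate(perm):
        restored[k] = patches[i]
    return join_patches(restored, grid)
```

Here `solve_puzzle` only checks that the shuffle is invertible given the permutation; the network has to learn that inverse from pixels alone, which is the self-supervised signal the abstract refers to.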

37. Adaptive Local Structure Consistency based Heterogeneous Remote Sensing Change Detection
Lin Lei, Yuli Sun, Gangyao Kuang
Abstract: Change detection in heterogeneous remote sensing images is an important and challenging topic in remote sensing, particularly for emergencies resulting from natural disasters. Because heterogeneous sensors use different imaging mechanisms, it is difficult to compare the images directly. To address this challenge, in this letter we explore an unsupervised change detection method based on adaptive local structure consistency (ALSC) between heterogeneous images. It constructs an adaptive graph representing the local structure of each patch in one image domain and then projects this graph into the other image domain to measure the change level. This local structure consistency exploits the fact that heterogeneous images share the same structure information for the same ground object, which is imaging-modality invariant. To avoid leakage between heterogeneous data, the pixelwise change image is calculated in a single image domain by graph projection. Experimental results demonstrate the effectiveness of the proposed ALSC-based change detection method in comparison with several state-of-the-art methods.
摘要：异质遥感图像的变化检测是遥感领域中应对自然灾害引发的紧急情况的一个重要且具有挑战性的课题。由于异质传感器的成像机制不同，难以直接比较这些图像。为应对这一挑战，本文提出了一种基于异质图像间自适应局部结构一致性（ALSC）的无监督变化检测方法：在一个图像域中为每个图像块构建表示其局部结构的自适应图，然后将该图投影到另一个图像域以度量变化程度。这种局部结构一致性利用了如下事实：对于同一地物，异质图像共享相同的结构信息，且该信息与成像模态无关。为避免异质数据之间的信息泄漏，逐像素变化图通过图投影在同一图像域中计算。与若干最先进方法的比较实验结果证明了所提出的基于ALSC的变化检测方法的有效性。
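
The structure-consistency idea can be illustrated with a toy sketch: build a nearest-neighbour graph in one domain, then read the same graph in the other domain, where distances are computed only among post-event values, so the two modalities are never compared directly. This is a simplification under stated assumptions: it works on single pixel values with a fixed number of neighbours `k`, whereas the letter uses patches and an adaptive graph, and `knn_graph` and `structure_change_scores` are hypothetical names.

```python
def knn_graph(values, k):
    """For each index i, the k indices j != i whose values are closest to values[i]."""
    n = len(values)
    graph = []
    for i in range(n):
        others = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        graph.append([j for _, j in others[:k]])
    return graph


def structure_change_scores(x, y, k):
    """Build the neighbour graph on the pre-event image x, project it onto the
    post-event image y, and score how far each y[i] lies from its x-neighbours
    as seen in y. Distances are only ever taken within one domain."""
    graph = knn_graph(x, k)
    return [sum(abs(y[i] - y[j]) for j in nbrs) / k for i, nbrs in enumerate(graph)]
```

For example, with `x = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2]` and a post-event image that follows an inverted intensity mapping (roughly `y = 10 - 2x`) except that pixel 2 has switched to the other land type, `y = [8.0, 7.8, 0.0, 0.0, -0.2, -0.4]`, pixel 2 receives the highest score while its unchanged neighbours are only partly affected.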

38. VR-Caps: A Virtual Environment for Capsule Endoscopy
Kagan Incetan, Ibrahim Omer Celik, Abdulhamid Obeid, Guliz Irem Gokceler, Kutsev Bengisu Ozyoruk, Yasin Almalioglu, Richard J. Chen, Faisal Mahmood, Hunter Gilbert, Nicholas J. Durr, Mehmet Turan
Abstract: Current capsule endoscopes and next-generation robotic capsules for the diagnosis and treatment of gastrointestinal diseases are complex cyber-physical platforms that must orchestrate complex software and hardware functions. The desired tasks for these systems include visual localization, depth estimation, 3D mapping, disease detection and segmentation, automated navigation, active control, path realization, and optional therapeutic modules such as targeted drug delivery and biopsy sampling. Data-driven algorithms promise to enable many advanced functionalities for capsule endoscopes, but real-world data are challenging to obtain. Physically realistic simulations that provide synthetic data have emerged as a solution for developing data-driven algorithms. In this work, we present a comprehensive simulation platform for capsule endoscopy operations and introduce VR-Caps, a virtual active capsule environment that simulates a range of normal and abnormal tissue conditions (e.g., inflated, dry, wet) and varied organ types, capsule endoscope designs (e.g., mono, stereo, dual, and 360° camera), and the type, number, strength, and placement of the internal and external magnetic sources that enable active locomotion. VR-Caps makes it possible to develop, optimize, and test medical imaging and analysis software for current and next-generation endoscopic capsule systems, either independently or jointly. To validate this approach, we train state-of-the-art deep neural networks on simulated data from VR-Caps to accomplish various medical image analysis tasks and evaluate the performance of these models on real medical data. The results demonstrate the usefulness and effectiveness of the proposed virtual platform for developing algorithms for fractional coverage quantification, camera trajectory estimation, 3D map reconstruction, and disease classification.
摘要：当前用于诊断和治疗胃肠道疾病的胶囊内窥镜和下一代机器人胶囊是复杂的信息物理平台，必须协调复杂的软件和硬件功能。这些系统需要完成的任务包括视觉定位、深度估计、三维建图、疾病检测与分割、自动导航、主动控制、路径实现，以及靶向给药和活检取样等可选治疗模块。数据驱动算法有望为胶囊内窥镜实现许多先进功能，但真实世界的数据难以获取。提供合成数据的物理逼真仿真已成为开发数据驱动算法的一种解决方案。在这项工作中，我们提出了一个面向胶囊内窥镜操作的综合仿真平台，并介绍了VR-Caps：一个虚拟主动胶囊环境，可模拟多种正常和异常组织状况（例如充气、干燥、湿润等）和不同器官类型、胶囊内窥镜设计（例如单目、立体、双摄像头和360°摄像头），以及实现主动运动的内部和外部磁源的类型、数量、强度和布置。VR-Caps使得针对当前和下一代胶囊内窥镜系统独立或联合地开发、优化和测试医学成像与分析软件成为可能。为验证这一方法，我们使用VR-Caps的仿真数据训练最先进的深度神经网络来完成多种医学图像分析任务，并在真实医疗数据上评估这些模型的性能。结果表明，所提出的虚拟平台在开发覆盖比例量化、相机轨迹估计、三维地图重建和疾病分类算法方面是有用且有效的。

39. On segmentation of pectoralis muscle in digital mammograms by means of deep learning
Hossein Soleimani, Oleg V. Michailovich
Abstract: Computer-aided diagnosis (CAD) has long been an integral part of the radiological management of breast disease, facilitating a number of important clinical applications, including quantitative assessment of breast density and early detection of malignancies based on X-ray mammography. Common to such applications is the need to automatically discriminate between breast tissue and adjacent anatomy, the latter being predominantly represented by the pectoralis major (or pectoral muscle). Especially in mammograms acquired in the mediolateral oblique (MLO) view, the muscle is easily confused with some elements of breast anatomy because of their morphological and photometric similarity. As a result, automatic detection and segmentation of the pectoral muscle in MLO mammograms remains a challenging task for which innovative approaches are still needed and actively sought. To address this problem, the present paper introduces a two-step segmentation strategy based on the combined use of data-driven prediction (deep learning) and graph-based image processing. In particular, the proposed method employs a convolutional neural network (CNN) designed to predict the location of the breast-pectoral boundary at different levels of spatial resolution. The predictions are then used by the second stage of the algorithm, in which the desired boundary is recovered as the solution to a shortest-path problem on a specially designed graph. The proposed algorithm has been tested on three different datasets (i.e., MIAS, CBIS-DDSM, and InBreast) using a range of quantitative metrics. The results of comparative analysis show considerable improvement over the state of the art, while offering the possibility of model-free and fully automatic processing.
摘要：计算机辅助诊断（CAD）早已成为乳腺疾病放射学管理的重要组成部分，支撑着许多重要的临床应用，包括基于X射线乳腺摄影的乳腺密度定量评估和恶性肿瘤早期检测。这些应用的共同需求是自动区分乳腺组织与邻近解剖结构，而后者主要是胸大肌（或称胸肌）。尤其是在内外侧斜位（MLO）视图采集的乳腺X线照片中，由于形态和光度上的相似性，胸肌很容易与乳腺解剖结构的某些部分混淆。因此，MLO乳腺X线照片中胸肌的自动检测与分割仍是一项具有挑战性的任务，仍需要并在不断探索创新方法。为解决这一问题，本文提出了一种两步分割策略，结合使用数据驱动预测（深度学习）和基于图的图像处理。具体而言，所提方法采用一个卷积神经网络（CNN），在不同空间分辨率级别上预测乳腺与胸肌边界的位置。随后，算法的第二阶段利用这些预测，将所需边界作为在特殊设计的图上求解最短路径问题的解来恢复。该算法已在三个不同的数据集（即MIAS、CBIS-DDSM和InBreast）上使用一系列定量指标进行了测试。对比分析结果表明，该方法相较于最先进方法有显著改进，同时可实现无需模型、全自动的处理。
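
The second stage, recovering the boundary as a shortest path, can be sketched with dynamic programming on a row-layered graph: each image row contributes one node per column, and edges join columns that differ by at most one between consecutive rows. In this sketch the cost map is assumed to be something like one minus the CNN's boundary probability; the abstract does not specify the paper's actual graph construction or edge weights, so `min_cost_boundary` is a hypothetical stand-in rather than the authors' algorithm.

```python
def min_cost_boundary(cost):
    """Minimum-cost top-to-bottom path through a 2-D cost map (list of rows).

    The path takes one column per row and may shift by at most one column
    between consecutive rows. Returns the chosen column index for every row.
    """
    rows, cols = len(cost), len(cost[0])
    acc = [list(cost[0])]   # acc[r][c]: cheapest cost of a path ending at (r, c)
    back = [[0] * cols]     # back[r][c]: column used in row r - 1 on that path
    for r in range(1, rows):
        acc_row, back_row = [], []
        for c in range(cols):
            # predecessors: columns c - 1, c, c + 1 that lie inside the image
            candidates = range(max(0, c - 1), min(cols, c + 2))
            prev = min(candidates, key=lambda p: acc[r - 1][p])
            acc_row.append(acc[r - 1][prev] + cost[r][c])
            back_row.append(prev)
        acc.append(acc_row)
        back.append(back_row)
    # trace back from the cheapest end point in the last row
    c = min(range(cols), key=lambda p: acc[-1][p])
    path = [c]
    for r in range(rows - 1, 0, -1):
        c = back[r][c]
        path.append(c)
    return path[::-1]
```

On the 4 x 4 cost map `[[9, 1, 9, 9], [9, 9, 1, 9], [9, 9, 1, 9], [9, 1, 9, 9]]` the function follows the cheap cells and returns `[1, 2, 2, 1]`. The one-column step limit is what keeps the recovered contour connected.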

40. Self-Organized Operational Neural Networks for Severe Image Restoration Problems
Junaid Malik, Serkan Kiranyaz, Moncef Gabbouj
Abstract: Discriminative learning based on convolutional neural networks (CNNs) aims to perform image restoration by learning from training examples of noisy-clean image pairs. It has become the go-to methodology for tackling image restoration and has outperformed the traditional non-local class of methods. However, the top-performing networks are generally composed of many convolutional layers and hundreds of neurons, with trainable parameters in excess of several million. We claim that this is due to the inherently linear nature of convolution-based transformation, which is inadequate for handling severe restoration problems. Recently, a non-linear generalization of CNNs, called operational neural networks (ONNs), has been shown to outperform CNNs on AWGN denoising. However, its formulation is burdened by a fixed collection of well-known nonlinear operators and an exhaustive search for the best possible configuration for a given architecture, whose efficacy is further limited by a fixed output layer operator assignment. In this study, we leverage Taylor series-based function approximation to propose a self-organizing variant of ONNs, Self-ONNs, for image restoration, which synthesizes novel nodal transformations on-the-fly as part of the learning process, thus eliminating the need for redundant training runs for operator search. In addition, it enables a finer level of operator heterogeneity by diversifying individual connections of the receptive fields and weights. We perform a series of extensive ablation experiments across three severe image restoration tasks. Even when a strict equivalence of learnable parameters is imposed, Self-ONNs surpass CNNs by a considerable margin across all problems, improving the generalization performance by up to 3 dB in terms of PSNR.
摘要：基于卷积神经网络（CNN）的判别学习旨在通过从含噪-干净图像对训练样本中学习来进行图像复原。它已成为解决图像复原问题的首选方法，并且优于传统的非局部类方法。然而，性能最好的网络通常由许多卷积层和数百个神经元组成，可训练参数超过数百万。我们认为，这是由于基于卷积的变换固有的线性性质，使其不足以处理严重的复原问题。最近，CNN的一种非线性推广，即操作神经网络（ONN），已被证明在AWGN去噪任务上优于CNN。然而，其构造受限于一组固定的常见非线性算子，且需要穷举搜索来为给定架构找到最佳配置，而固定的输出层算子分配又进一步限制了其效果。在本研究中，我们利用基于泰勒级数的函数逼近，提出了ONN的一种自组织变体Self-ONN用于图像复原，它在学习过程中即时合成新颖的节点变换，从而无需为算子搜索进行冗余的训练。此外，通过使感受野和权重的各个连接多样化，它还实现了更精细的算子异质性。我们在三项严重图像复原任务上进行了一系列广泛的消融实验。即使在严格保持可学习参数数量相同的条件下，Self-ONN在所有问题上都以可观的优势超越CNN，泛化性能在PSNR方面最多提升3 dB。
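
Self-ONNs replace the fixed linear kernel with a learned Taylor-series (polynomial) approximation of the nodal operator. A minimal 1-D sketch, assuming a series expanded around zero with the constant term folded into the bias, so each input sample contributes `w_1 x + w_2 x^2 + ... + w_Q x^Q`. The function name and this exact parametrization are illustrative assumptions, not the authors' implementation; a practical layer would probably also bound its inputs (e.g. with tanh) before raising them to powers.

```python
def self_onn_conv1d(signal, weights, bias=0.0):
    """1-D 'valid' correlation with a polynomial (truncated Taylor) nodal operator.

    weights[q][k] multiplies signal[i + k] ** (q + 1), so Q = len(weights) is the
    order of the series. With Q = 1 this is an ordinary convolution layer.
    """
    order = len(weights)
    width = len(weights[0])
    out = []
    for i in range(len(signal) - width + 1):
        acc = bias
        for q in range(order):
            for k in range(width):
                acc += weights[q][k] * signal[i + k] ** (q + 1)
        out.append(acc)
    return out
```

With `weights = [[0.5, -1.0]]` the layer behaves like a plain 2-tap convolution; adding a second row such as `[1.0, 0.0]` adds a learned quadratic term. A layer of order `Q` therefore carries `Q` times as many kernel weights as its convolutional counterpart.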

41. Background Splitting: Finding Rare Classes in a Sea of Background
Ravi Teja Mullapudi, Fait Poms, William R. Mark, Deva Ramanan, Kayvon Fatahalian
Abstract: We focus on the real-world problem of training accurate deep models for image classification of a small number of rare categories. In these scenarios, almost all images in the dataset belong to the background category (>95% of the dataset is background). We demonstrate that neither standard fine-tuning approaches nor state-of-the-art approaches for training on imbalanced datasets produce accurate deep models in the presence of this extreme imbalance. Our key observation is that the extreme imbalance due to the background category can be drastically reduced by leveraging visual knowledge from an existing pre-trained model. Specifically, the background category is "split" into smaller and more coherent pseudo-categories during training using a pre-trained model. We incorporate background splitting into an image classification model by adding an auxiliary loss that learns to mimic the predictions of the existing pre-trained image classification model. Note that this process is automatic and requires no additional manual labels. The auxiliary loss regularizes the feature representation of the shared network trunk by requiring it to discriminate between previously homogeneous background instances, and it reduces overfitting to the small number of rare-category positives. We also show that background splitting can be combined with other background imbalance methods to further improve performance. We evaluate our method on a modified version of the iNaturalist dataset in which only a small subset of rare category labels is available during training (all other images are labeled as background). By jointly learning to recognize ImageNet categories and selected iNaturalist categories, our approach yields performance that is 42.3 mAP points higher than a fine-tuning baseline when 99.98% of the data is background, and 8.3 mAP points higher than SotA baselines when 98.30% of the data is background.
摘要：我们关注一个现实问题：为少数稀有类别的图像分类训练准确的深度模型。在这些场景中，数据集中几乎所有图像都属于背景类别（超过95%的数据为背景）。我们证明，在这种极端不平衡的情况下，无论是标准的微调方法，还是针对不平衡数据集训练的最先进方法，都无法得到准确的深度模型。我们的关键观察是：利用现有预训练模型中的视觉知识，可以大幅减轻由背景类别引起的极端不平衡。具体而言，在训练过程中借助预训练模型将背景类别“拆分”为更小、更一致的伪类别。我们通过添加一个辅助损失将背景拆分融入图像分类模型，该损失学习模仿现有预训练图像分类模型的预测。请注意，这一过程是自动的，不需要额外的人工标注。辅助损失要求共享网络主干区分原本同质的背景实例，从而对其特征表示进行正则化，并减少对少量稀有类别正样本的过拟合。我们还表明，背景拆分可以与其他处理背景不平衡的方法结合，以进一步提升性能。我们在iNaturalist数据集的一个修改版本上评估了该方法，其中训练期间只有一小部分稀有类别标签可用（其余所有图像都被标记为背景）。通过联合学习识别ImageNet类别和选定的iNaturalist类别，当99.98%的数据为背景时，我们方法的性能比微调基线高42.3个mAP点；当98.30%的数据为背景时，比最先进基线高8.3个mAP点。
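
The snippet below only illustrates the counting argument behind "splitting": if background images are bucketed by a pre-trained classifier's predicted label, no single class dominates the label distribution any more. The paper itself uses an auxiliary loss that mimics the pre-trained model's predictions rather than hard relabeling, and the helper names and data here are hypothetical.

```python
from collections import Counter


def split_background(labels, pretrained_preds, background="background"):
    """Relabel each background example with the class a pre-trained model
    predicts for it (prefixed with 'bg:'); rare-class labels are kept."""
    return [f"bg:{p}" if y == background else y for y, p in zip(labels, pretrained_preds)]


def largest_class_share(labels):
    """Fraction of the dataset taken up by the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)
```

With 2 rare positives among 100 images and background predictions spread evenly over 49 hypothetical pre-trained classes (`[f"c{i % 49}" for i in range(100)]`), the largest class share drops from 0.98 before splitting to 0.02 after.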

42. Using Artificial Intelligence for Particle Track Identification in CLAS12 Detector
Gagik Gavalian, Polykarpos Thomadakis, Angelos Angelopoulos, Veronique Ziegler, Nikos Chrisochoides
Abstract: In this article, we describe the development of machine learning models that assist the CLAS12 tracking algorithm by identifying the best track candidates among the combinatorial track candidates formed from hits in the drift chambers. Several types of machine learning models were tested, including Convolutional Neural Networks (CNN), Multi-Layer Perceptrons (MLP), and Extremely Randomized Trees (ERT). The final implementation was based on an MLP network and provided an accuracy of >99%. AI-assisted tracking was implemented in the CLAS12 reconstruction workflow and provided a sixfold code speedup.
摘要：在本文中，我们介绍了机器学习模型的开发，这些模型从由漂移室击中信息构成的组合径迹候选中识别最佳径迹候选，以辅助CLAS12径迹重建算法。我们测试了多种类型的机器学习模型，包括卷积神经网络（CNN）、多层感知机（MLP）和极端随机树（ERT）。最终实现基于MLP网络，准确率超过99%。人工智能辅助的径迹重建已集成到CLAS12重建工作流程中，使代码提速6倍。

43. Learning to Balance Specificity and Invariance for In and Out of Domain Generalization
Prithvijit Chattopadhyay, Yogesh Balaji, Judy Hoffman
Abstract: We introduce Domain-specific Masks for Generalization, a model for improving both in-domain and out-of-domain generalization performance. For domain generalization, the goal is to learn from a set of source domains to produce a single model that will best generalize to an unseen target domain. As such, many prior approaches focus on learning representations that persist across all source domains, assuming that these domain-agnostic representations will generalize well. However, individual domains often contain unique characteristics that, when leveraged, can significantly aid in-domain recognition performance. To produce a model that best generalizes to both seen and unseen domains, we propose learning domain-specific masks. The masks are encouraged to learn a balance of domain-invariant and domain-specific features, thus enabling a model that can benefit from the predictive power of specialized features while retaining the universal applicability of domain-invariant features. We demonstrate competitive performance compared to naive baselines and state-of-the-art methods on both PACS and DomainNet.
摘要：我们提出了用于泛化的领域特定掩码（Domain-specific Masks for Generalization），这是一种同时提升域内和域外泛化性能的模型。对于领域泛化，目标是从一组源域中学习，得到一个能够最好地泛化到未见目标域的单一模型。因此，许多现有方法专注于学习在所有源域中都保持不变的表示，并假设这些与领域无关的表示能够很好地泛化。然而，各个领域往往包含独特的特征，加以利用可以显著提升域内识别性能。为了得到一个在已见和未见领域上都具有最佳泛化能力的模型，我们提出学习领域特定的掩码。这些掩码被鼓励在领域不变特征与领域特定特征之间取得平衡，从而使模型既能受益于专门特征的预测能力，又能保留领域不变特征的普适性。我们在PACS和DomainNet上与朴素基线和最先进方法进行了比较，展示了具有竞争力的性能。

44. A Multisite, Report-Based, Centralized Infrastructure for Feedback and Monitoring of Radiology AI/ML Development and Clinical Deployment
Menashe Benjamin, Guy Engelhard, Alex Aisen, Yinon Aradi, Elad Benjamin
Abstract: An infrastructure for the multisite, geographically distributed creation and collection of diverse, high-quality, curated and labeled radiology image data is crucial for the successful automated development, deployment, monitoring, and continuous improvement of Artificial Intelligence (AI)/Machine Learning (ML) solutions in the real world. An interactive radiology reporting approach that integrates image viewing, dictation, natural language processing (NLP), and the creation of hyperlinks between image findings and the report provides localized labels during routine interpretation. These images and labels can be captured and centralized in a cloud-based system. This method provides a practical and efficient mechanism for monitoring algorithm performance. It also supplies feedback for the iterative development and quality improvement of new and existing algorithmic models. Both feedback and monitoring are achieved without burdening the radiologist. The method addresses proposed regulatory requirements for post-marketing surveillance and external data. Comprehensive multisite data collection helps reduce bias. Resource requirements are greatly reduced compared to dedicated retrospective expert labeling.
摘要：一个用于多站点、地理分布式地创建和收集多样化、高质量、经过整理和标注的放射影像数据的基础设施，对于在现实世界中成功实现人工智能（AI）/机器学习（ML）解决方案的自动化开发、部署、监测和持续改进至关重要。一种交互式放射报告方法集成了图像浏览、听写、自然语言处理（NLP）以及在图像发现与报告之间创建超链接，可在常规阅片过程中提供局部化标签。这些图像和标签可以被采集并集中存储在基于云的系统中。该方法提供了一种实用且高效的算法性能监测机制，还能为新的和现有的算法模型的迭代开发和质量改进提供反馈。反馈和监测都在不增加放射科医生负担的情况下完成。该方法满足了拟议中针对上市后监督和外部数据的监管要求。全面的多站点数据采集有助于减少偏差。与专门的回顾性专家标注相比，资源需求大大降低。

45. Plug-and-Play Image Restoration with Deep Denoiser Prior
Kai Zhang, Yawei Li, Wangmeng Zuo, Lei Zhang, Luc Van Gool, Radu Timofte
Abstract: Recent works on plug-and-play image restoration have shown that a denoiser can implicitly serve as the image prior for model-based methods to solve many inverse problems. This property brings considerable advantages to plug-and-play image restoration (e.g., combining the flexibility of model-based methods with the effectiveness of learning-based methods) when the denoiser is discriminatively learned via a deep convolutional neural network (CNN) with large modeling capacity. However, while deeper and larger CNN models are rapidly gaining popularity, the performance of existing plug-and-play image restoration is held back by the lack of a suitable denoiser prior. To push the limits of plug-and-play image restoration, we set up a benchmark deep denoiser prior by training a highly flexible and effective CNN denoiser. We then plug the deep denoiser prior as a modular part into an iterative algorithm based on half quadratic splitting to solve various image restoration problems. We also provide a thorough analysis of parameter setting, intermediate results, and empirical convergence to better understand the working mechanism. Experimental results on three representative image restoration tasks, namely deblurring, super-resolution, and demosaicing, demonstrate that the proposed plug-and-play image restoration with a deep denoiser prior not only significantly outperforms other state-of-the-art model-based methods but also achieves competitive or even superior performance against state-of-the-art learning-based methods. The source code is available at this https URL.
摘要：近期关于即插即用图像复原的研究表明，去噪器可以隐式地充当基于模型方法中的图像先验，以求解许多逆问题。当去噪器通过具有强大建模能力的深度卷积神经网络（CNN）进行判别式学习时，这一性质为即插即用图像复原带来了显著优势（例如，结合基于模型方法的灵活性与基于学习方法的有效性）。然而，尽管更深、更大的CNN模型正迅速普及，现有的即插即用图像复原方法却因缺乏合适的去噪器先验而性能受限。为了突破即插即用图像复原的极限，我们通过训练一个高度灵活且有效的CNN去噪器，建立了一个基准深度去噪器先验。然后，我们将该深度去噪器先验作为模块插入基于半二次分裂的迭代算法中，以求解各种图像复原问题。同时，我们对参数设置、中间结果和经验收敛性进行了深入分析，以更好地理解其工作机制。在去模糊、超分辨率和去马赛克三项代表性图像复原任务上的实验结果表明，所提出的基于深度去噪器先验的即插即用图像复原方法不仅显著优于其他最先进的基于模型的方法，而且与最先进的基于学习的方法相比也具有竞争力甚至更优的性能。源代码可在此https URL获取。
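
The half quadratic splitting loop into which the denoiser is plugged alternates a data-consistency step and a denoising step. The sketch below shows that structure on a toy 1-D inpainting problem `y = mask * x`, where the data step has a per-pixel closed form. Everything here is an assumption for illustration: a 3-tap moving average (`smooth3`) stands in for the trained CNN denoiser, `mu` is held fixed instead of following a schedule, and the noise-level input that the paper's denoiser receives is omitted.

```python
def smooth3(signal):
    """Stand-in 'denoiser': 3-tap moving average with edge replication."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(signal))]


def pnp_hqs_inpaint(y, mask, denoiser, mu=0.5, iters=50):
    """Plug-and-play half quadratic splitting for y = mask * x (mask entries 0 or 1).

    x-step: per-pixel minimiser of mask_i * (y_i - x_i)**2 + mu * (x_i - z_i)**2,
            i.e. x_i = (mask_i * y_i + mu * z_i) / (mask_i + mu)
    z-step: z = denoiser(x); a trained CNN denoiser would be plugged in here.
    """
    z = [float(yi) if mi else 0.0 for yi, mi in zip(y, mask)]  # crude initialisation
    x = z
    for _ in range(iters):
        x = [(mi * yi + mu * zi) / (mi + mu) for yi, mi, zi in zip(y, mask, z)]
        z = denoiser(x)
    return x
```

Swapping `smooth3` for any callable with the same signature changes the prior without touching the solver, which is the modularity the abstract describes as plug-and-play. On a linear ramp `0..7` with samples 3 and 4 masked out, the loop fills the gap with values close to 3 and 4.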

46. Unpaired Learning of Deep Image Denoising
Xiaohe Wu, Ming Liu, Yue Cao, Dongwei Ren, Wangmeng Zuo
Abstract: We investigate the task of learning blind image denoising networks from an unpaired set of clean and noisy images. This problem setting is generally practical and valuable, since it is feasible to collect unpaired noisy and clean images in most real-world applications. We further assume that the noise can be signal dependent but is spatially uncorrelated. To facilitate unpaired learning of a denoising network, this paper presents a two-stage scheme that incorporates self-supervised learning and knowledge distillation. For self-supervised learning, we suggest a dilated blind-spot network (D-BSN) that learns denoising solely from real noisy images. Owing to the spatial independence of the noise, we adopt a network of stacked 1x1 convolution layers to estimate the noise level map for each image. Both the D-BSN and the image-specific noise model (CNN_est) can be jointly trained by maximizing the constrained log-likelihood. Given the output of the D-BSN and the estimated noise level map, denoising performance can be further improved based on Bayes' rule. As for knowledge distillation, we first apply the learned noise models to clean images to synthesize a paired set of training images, and use the real noisy images and their corresponding denoising results from the first stage to form another paired set. The final denoising model can then be distilled by training an existing denoising network on these two paired sets. Experiments show that our unpaired learning method performs favorably on both synthetic noisy images and real-world noisy photographs in terms of quantitative and qualitative evaluation.
摘要：我们研究从不成对的干净图像和含噪图像集合中学习盲图像去噪网络的任务。考虑到在大多数现实应用中收集不成对的含噪图像和干净图像是可行的，这种问题设置通常具有实用性和价值。我们进一步假设噪声可以与信号相关，但在空间上不相关。为了便于去噪网络的不成对学习，本文结合自监督学习和知识蒸馏，提出了一种两阶段方案。在自监督学习阶段，我们提出了一种扩张盲点网络（D-BSN），仅从真实含噪图像中学习去噪。由于噪声在空间上相互独立，我们采用由1x1卷积层堆叠而成的网络来估计每幅图像的噪声水平图。D-BSN和图像特定的噪声模型（CNN_est）可以通过最大化约束对数似然进行联合训练。给定D-BSN的输出和估计的噪声水平图，可以基于贝叶斯规则进一步提升去噪性能。在知识蒸馏阶段，我们首先将学到的噪声模型应用于干净图像，合成一组成对的训练图像；并将真实含噪图像与第一阶段得到的相应去噪结果组成另一组成对数据。然后，利用这两组成对数据训练一个现有的去噪网络，即可蒸馏得到最终的去噪模型。实验表明，无论在定量还是定性评估方面，我们的不成对学习方法在合成含噪图像和真实世界含噪照片上都表现良好。
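
The Bayes'-rule refinement mentioned above can be illustrated under a simplifying assumption that the abstract does not state: if the D-BSN output is treated as a Gaussian prior on the clean pixel with mean $\mu$ and variance $\sigma_\mu^2$, and the noise is Gaussian with the estimated variance $\sigma_n^2$, the posterior mean for an observation $y$ is the precision-weighted average $\hat{x} = (\sigma_n^2 \mu + \sigma_\mu^2 y)/(\sigma_n^2 + \sigma_\mu^2)$. The paper's exact derivation, including where the prior variance comes from, may differ; `gaussian_fuse` is a hypothetical name.

```python
def gaussian_fuse(y, prior_mean, prior_var, noise_var):
    """Per-pixel posterior mean of the clean value x given y = x + n, assuming
    x ~ N(prior_mean, prior_var) and n ~ N(0, noise_var) are independent.
    At least one of the two variances must be positive for each pixel."""
    return [
        (nv * m + pv * obs) / (nv + pv)
        for obs, m, pv, nv in zip(y, prior_mean, prior_var, noise_var)
    ]
```

The two limits behave as expected: with zero estimated noise the observation is returned unchanged, and with zero prior variance the D-BSN estimate wins; equal variances give the midpoint.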

47. Switchable Deep Beamformer
Shujaat Khan, Jaeyoung Huh, Jong Chul Ye
Abstract: Recent proposals of deep beamformers using deep neural networks have attracted significant attention as computationally efficient alternatives to adaptive and compressive beamformers. Moreover, deep beamformers are versatile in that image post-processing algorithms can be combined with the beamforming. Unfortunately, with current technology, a separate beamformer must be trained and stored for each application, demanding significant scanner resources. To address this problem, we propose a switchable deep beamformer that can produce various types of output, such as DAS, speckle removal, and deconvolution, using a single network with a simple switch. In particular, the switch is implemented through Adaptive Instance Normalization (AdaIN) layers, so that different outputs can be generated by merely changing the AdaIN code. Experimental results using B-mode focused ultrasound confirm the flexibility and efficacy of the proposed method for various applications.
摘要：近来利用深度神经网络构建深度波束形成器的方案，作为自适应波束形成器和压缩感知波束形成器的高效计算替代方案，受到了广泛关注。此外，深度波束形成器用途广泛，可以将图像后处理算法与波束形成结合在一起。遗憾的是，在当前技术中，每种应用都需要训练并存储一个单独的波束形成器，占用大量扫描仪资源。为了解决这一问题，我们提出了一种可切换的深度波束形成器，只需使用带有简单开关的单个网络，即可生成DAS、斑点去除、反卷积等多种类型的输出。具体而言，该开关通过自适应实例归一化（AdaIN）层实现，只需改变AdaIN代码即可生成不同的输出。使用B模式聚焦超声的实验结果证实了所提方法在多种应用中的灵活性和有效性。
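
AdaIN normalizes each channel to zero mean and unit variance and then applies an affine scale and shift; making that scale and shift depend on a switch code is what lets one network produce several output types. A minimal single-channel sketch follows. The `SWITCH_CODES` table and its numbers are hypothetical placeholders (in a trained network the AdaIN codes would be learned), not values from the paper.

```python
import math


def adain(features, gamma, beta, eps=1e-5):
    """Adaptive instance normalisation of one channel (a flat list of activations)."""
    n = len(features)
    mean = sum(features) / n
    var = sum((f - mean) ** 2 for f in features) / n
    std = math.sqrt(var + eps)
    return [gamma * (f - mean) / std + beta for f in features]


# Hypothetical switch table: one (gamma, beta) pair per output type.
# These numbers are placeholders; in a trained network they would be learned.
SWITCH_CODES = {"das": (1.0, 0.0), "despeckle": (0.5, 0.1), "deconv": (2.0, -0.2)}


def switchable_layer(features, mode):
    """Same normalisation for every mode; only the AdaIN code changes."""
    gamma, beta = SWITCH_CODES[mode]
    return adain(features, gamma, beta)
```

Because the convolutional weights are shared and only the small `(gamma, beta)` code differs per mode, adding an output type costs a few parameters per channel rather than a whole extra network.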

48. Evaluating Knowledge Transfer In Neural Network for Medical Images
Sina Akbarian, Laleh Seyyed-Kalantari, Farzad Khalvati, Elham Dolatabadi
Abstract: Deep learning and knowledge transfer techniques have permeated the field of medical imaging and are considered key approaches for revolutionizing diagnostic imaging practices. However, challenges remain for the successful integration of deep learning into medical imaging tasks due to a lack of large annotated imaging datasets. To address this issue, we propose a teacher-student learning framework that transfers knowledge from a carefully pre-trained convolutional neural network (CNN) teacher to a student CNN as a way of improving diagnostic tasks in a small-data regime. In this study, we explore the performance of knowledge transfer in the medical imaging setting through a series of experiments. We investigate the proposed network's performance when the student network is trained on a small dataset (the target dataset) as well as when the teacher's and student's domains are distinct. We also examine the proposed network's effect on the convergence and regularization of the student network during training. The performance of the CNN models is evaluated on three medical imaging datasets: Diabetic Retinopathy, CheXpert, and ChestX-ray8. Our results indicate that the teacher-student learning framework outperforms transfer learning for small imaging datasets. In particular, the teacher-student learning framework improves the area under the ROC curve (AUC) of the CNN model by 4% on a small sample of CheXpert (n=5k) and by 9% on ChestX-ray8 (n=5.6k). In addition to small training data sizes, we also demonstrate a clear advantage of the teacher-student learning framework over other knowledge transfer techniques, such as transfer learning, for cross-domain knowledge transfer in the medical imaging setting. We observe that the teacher-student network holds great promise not only to improve diagnostic performance but also to reduce overfitting when the dataset is small.
摘要：深度学习和知识迁移技术已渗透到医学影像领域，被视为革新诊断成像实践的关键方法。然而，由于缺乏大规模标注的影像数据，将深度学习成功整合到医学影像任务中仍面临挑战。为了解决这一问题，我们提出了一种师生学习框架，将知识从精心预训练的卷积神经网络（CNN）教师迁移到学生CNN，以此提升小数据情况下的诊断任务性能。在本研究中，我们通过一系列实验探索了知识迁移在医学影像场景中的表现。我们研究了学生网络在小数据集（目标数据集）上训练时，以及教师与学生所处领域不同时所提网络的性能。我们还考察了所提网络在训练过程中对学生网络收敛性和正则化的影响。我们在三个医学影像数据集上评估了CNN模型的性能，包括糖尿病视网膜病变、CheXpert和ChestX-ray8。结果表明，对于小型影像数据集，师生学习框架优于迁移学习。特别地，师生学习框架使CNN模型在CheXpert小样本（n=5k）上的ROC曲线下面积（AUC）提高了4%，在ChestX-ray8（n=5.6k）上提高了9%。除了小训练数据规模外，我们还表明，与迁移学习等其他知识迁移技术相比，师生学习框架在医学影像场景的跨领域知识迁移中具有明显优势。我们观察到，师生网络不仅有望提升诊断性能，还能在数据集较小时减少过拟合。
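
A common way to implement the teacher-student transfer described above is Hinton-style knowledge distillation, in which the student matches the teacher's temperature-softened outputs in addition to the true label. The abstract does not give the exact loss, so the sketch below is an assumption; it also uses a single-label softmax, whereas multi-label chest X-ray tasks such as CheXpert would typically use per-class sigmoids. `temperature` and `alpha` are illustrative defaults.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Knowledge-distillation loss for one example:
    alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * hard + (1 - alpha) * temperature ** 2 * kl
```

The `T^2` factor keeps the gradient scale of the soft term roughly independent of the temperature. When the student already reproduces the teacher's logits, the KL term vanishes and only the hard-label cross-entropy remains.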

49. Structured Graph Learning for Clustering and Semi-supervised Classification
Zhao Kang, Chong Peng, Qiang Cheng, Xinwang Liu, Xi Peng, Zenglin Xu, Ling Tian
Abstract: Over the last decade, graphs have become increasingly popular for modeling structures and interactions in a wide variety of problems. Graph-based clustering and semi-supervised classification techniques have shown impressive performance. This paper proposes a graph learning framework that preserves both the local and global structure of data. Specifically, our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure. Furthermore, most existing graph-based methods conduct clustering and semi-supervised classification on a graph learned from the original data matrix, which does not have an explicit cluster structure, so they might not achieve optimal performance. By imposing a rank constraint, the learned graph will have exactly $c$ connected components if there are $c$ clusters or classes. As a byproduct, graph learning and label inference are jointly and iteratively implemented in a principled way. Theoretically, we show that our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions. Extensive experiments on clustering and semi-supervised classification demonstrate that the proposed method outperforms other state-of-the-art methods.
摘要：在过去十年中，图在各类问题的结构和交互建模中日益流行。基于图的聚类和半监督分类技术已展现出令人瞩目的性能。本文提出了一种同时保留数据局部结构和全局结构的图学习框架。具体而言，我们的方法利用样本的自表达性来捕获全局结构，并利用自适应近邻方法来保持局部结构。此外，现有的大多数基于图的方法在从原始数据矩阵学到的图上进行聚类和半监督分类，而该图不具有显式的簇结构，因此可能无法达到最优性能。通过引入秩约束，若存在$c$个簇或类别，所得到的图将恰好有$c$个连通分量。由此，图学习和标签推断以一种有原则的方式联合、迭代地进行。在理论上，我们证明了在一定条件下，我们的模型等价于核k均值和k均值方法的组合。在聚类和半监督分类上的大量实验表明，所提方法优于其他最先进方法。
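
The rank constraint rests on a standard spectral fact: for a graph with non-negative symmetric weights $S$, the number of connected components equals the multiplicity of the eigenvalue 0 of the Laplacian $L = D - S$, so the graph has exactly $c$ components if and only if $\mathrm{rank}(L) = n - c$. The sketch below checks this equivalence on a small graph, computing the rank exactly over rationals and counting components independently with union-find. It only verifies the fact; it does not implement the paper's optimization, and all function names are hypothetical.

```python
from fractions import Fraction


def laplacian(weights):
    """L = D - S for a symmetric, non-negative similarity matrix S."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0) - weights[i][j] for j in range(n)]
            for i in range(n)]


def matrix_rank(matrix):
    """Exact rank via Gaussian elimination over rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def count_components(weights):
    """Number of connected components, via union-find on edges with weight > 0."""
    n = len(weights)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if weights[i][j] > 0:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})
```

For a 5-node graph made of a 3-node chain and a separate 2-node pair, both `count_components(S)` and `len(S) - matrix_rank(laplacian(S))` give 2; connecting the two pieces with one extra edge makes both equal 1.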

50. Integrative Object and Pose to Task Detection for an Augmented-Reality-based Human Assistance System using Neural Networks
Linh Kästner, Leon Eversberg, Marina Mursa, Jens Lambrecht
Abstract: As industry becomes increasingly automated and digitized, processes are becoming more complex. Augmented Reality (AR) has shown considerable potential for assisting workers with complex tasks by enhancing user understanding and experience with spatial information. However, the acceptance and integration of AR into industrial processes is still limited due to the lack of established methods and tedious integration efforts. Meanwhile, deep neural networks have achieved remarkable results in computer vision tasks and hold great promise for enriching Augmented Reality applications. In this paper, we propose an Augmented-Reality-based human assistance system that helps workers with complex manual tasks and incorporates deep neural networks for computer vision tasks. More specifically, we combine Augmented Reality with object and action detectors to make workflows more intuitive and flexible. To evaluate our system in terms of user acceptance and efficiency, we conducted several user studies. We found a significant reduction in time to task completion for untrained workers, as well as a decrease in error rate. Furthermore, we investigated the users' learning curve with our assistance system.
摘要：随着工业日益自动化和数字化，生产流程正变得越来越复杂。增强现实（AR）通过空间信息增强用户的理解和体验，在协助工人完成复杂任务方面展现出相当大的潜力。然而，由于缺乏成熟的方法且集成工作繁琐，AR在工业流程中的接受度和集成程度仍然有限。与此同时，深度神经网络在计算机视觉任务中取得了显著成果，在丰富增强现实应用方面前景广阔。在本文中，我们提出了一种基于增强现实的人员辅助系统，结合用于计算机视觉任务的深度神经网络，协助工人完成复杂的手工任务。更具体地说，我们将增强现实与物体检测器和动作检测器相结合，使工作流程更加直观和灵活。为了从用户接受度和效率方面评估我们的系统，我们开展了多项用户研究。我们发现，未经培训的工人完成任务的时间显著缩短，错误率也有所下降。此外，我们还研究了用户使用该辅助系统时的学习曲线。

51. Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur
Abstract: We construct an experimental setup in which changing the scale of initialization strongly impacts the implicit regularization induced by SGD, interpolating from good generalization performance to completely memorizing the training set while making little progress on the test set. Moreover, we find that the extent and manner in which generalization ability is affected depend on the activation and loss functions used, with $\sin$ activation being the most extreme. In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function. Our empirical investigation reveals that increasing the scale of initialization could cause the representations and gradients to become increasingly misaligned across examples in the same class. We further demonstrate that a similar misalignment phenomenon occurs in other scenarios affecting generalization performance, such as changes to the architecture or data distribution.
摘要：我们构建了一个实验设置，在其中改变初始化的尺度会强烈影响SGD所诱导的隐式正则化，使模型从具有良好的泛化性能逐渐过渡到完全记忆训练集、而在测试集上几乎没有进展。此外，我们发现泛化能力受影响的程度和方式取决于所使用的激活函数和损失函数，其中$\sin$激活的影响最为极端。对于齐次的ReLU激活，我们表明这种行为可归因于损失函数。我们的实证研究表明，增大初始化尺度可能导致同一类别中不同样本的表示和梯度越来越不一致。我们进一步证明，类似的不一致现象也出现在其他影响泛化性能的情形中，例如改变网络架构或数据分布。
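
A toy illustration of the representation misalignment described above, not the paper's experiment: in a random `sin` feature layer whose weights are drawn from N(0, scale^2), two nearby inputs keep nearly parallel features at a small scale but end up with nearly orthogonal features at a large scale. The width, the two inputs, and the seed are arbitrary choices made for this sketch.

```python
import math
import random


def sin_features(x, weights):
    """Hidden layer h_i = sin(w_i . x) for each weight row w_i."""
    return [math.sin(sum(wk * xk for wk, xk in zip(w, x))) for w in weights]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


def representation_alignment(scale, width=2000, seed=0):
    """Cosine similarity between the sin-features of two nearby 2-D inputs
    when first-layer weights are drawn from N(0, scale**2)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, scale) for _ in range(2)] for _ in range(width)]
    x1 = [1.0, 0.0]
    x2 = [math.cos(0.3), math.sin(0.3)]  # input cosine similarity: cos(0.3), about 0.955
    return cosine(sin_features(x1, weights), sin_features(x2, weights))
```

At `scale=0.01` the layer is effectively linear and the similarity stays near 0.955; at `scale=100.0` the pre-activations of the two inputs differ by tens of radians, so their `sin` features are close to uncorrelated.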

52. An evolutionary perspective on the design of neuromorphic shape filters
Ernest Greene
Abstract: A substantial amount of time and energy has been invested in developing machine vision using connectionist (neural network) principles. Most of that work has been inspired by theories advanced by neuroscientists and behaviorists about how cortical systems store stimulus information. Those theories call for information flow through connections among several neuron populations, with the initial connections being random (or at least non-functional). The strength or location of the connections is then modified through training trials to achieve an effective output, such as the ability to identify an object. Those theories ignore the fact that animals that have no cortex, e.g., fish, can demonstrate visual skills that outpace the best neural network models. Neural circuits that allow for immediately effective vision and quick learning have been preprogrammed by hundreds of millions of years of evolution, and the visual skills are available shortly after hatching. Cortical systems may provide advanced image processing, but most likely use design principles that had already been proven effective in simpler systems. The present article provides a brief overview of retinal and cortical mechanisms for registering shape information, in the hope that it might contribute to the design of shape-encoding circuits that more closely match the mechanisms of biological vision.
摘要：人们已投入大量时间和精力，利用联结主义（神经网络）原理开发机器视觉。其中大部分工作受到神经科学家和行为主义者提出的关于皮层系统如何存储刺激信息的理论的启发。这些理论要求信息通过若干神经元群体之间的连接流动，且初始连接是随机的（或至少是无功能的）。然后通过训练试验修改连接的强度或位置，以获得有效的输出，例如识别物体的能力。这些理论忽视了一个事实：没有皮层的动物（例如鱼类）所展现的视觉技能可以胜过最好的神经网络模型。能够实现即时有效视觉和快速学习的神经回路已由数亿年的进化预先编程，这些视觉技能在孵化后不久即可使用。皮层系统可能提供更高级的图像处理，但很可能使用的是已在更简单系统中被证明有效的设计原则。本文简要概述了视网膜和皮层中记录形状信息的机制，希望有助于设计更贴近生物视觉机制的形状编码电路。

53. Deep Hypergraph U-Net for Brain Graph Embedding and Classification
Mert Lostar, Islem Rekik
Abstract: Background. Network neuroscience examines the brain as a complex system represented by a network (or connectome), providing deeper insights into brain morphology and function and allowing the identification of atypical brain connectivity alterations, which can be used as diagnostic markers of neurological disorders. Existing Methods. Graph embedding methods, which map data samples (e.g., brain networks) into a low-dimensional space, have been widely used to explore the relationships between samples for classification or prediction tasks. However, the majority of these works are based on modeling the pairwise relationships between samples and fail to capture their higher-order relationships. New Method. In this paper, inspired by the nascent field of geometric deep learning, we propose Hypergraph U-Net (HUNet), a novel data embedding framework that leverages the hypergraph structure to learn low-dimensional embeddings of data samples while capturing their high-order relationships. Specifically, we generalize the U-Net architecture, which naturally operates on graphs, to hypergraphs by improving local feature aggregation and preserving the high-order relationships present in the data. Results. We tested our method on small-scale and large-scale heterogeneous brain connectomic datasets, including morphological and functional brain networks of autistic and demented patients, respectively. Conclusion. Our HUNet outperformed state-of-the-art geometric graph and hypergraph data embedding techniques with a gain of 4-14% in classification accuracy, demonstrating both scalability and generalizability. HUNet code is available at this https URL.

54. MDCN: Multi-scale Dense Cross Network for Image Super-Resolution [PDF]
Juncheng Li, Faming Fang, Jiaqian Li, Kangfu Mei, Guixu Zhang
Abstract: Convolutional neural networks have proven to be of great benefit for single-image super-resolution (SISR). However, previous works do not make full use of multi-scale features and ignore the inter-scale correlation between different upsampling factors, resulting in sub-optimal performance. Instead of blindly increasing the depth of the network, we focus on mining image features and learning the inter-scale correlation between different upsampling factors. To achieve this, we propose a Multi-scale Dense Cross Network (MDCN), which achieves strong performance with fewer parameters and less execution time. MDCN consists of multi-scale dense cross blocks (MDCBs), a hierarchical feature distillation block (HFDB), and a dynamic reconstruction block (DRB). MDCB aims to detect multi-scale features and maximize the flow of image features across different scales, HFDB adaptively recalibrates channel-wise feature responses to achieve feature distillation, and DRB aims to reconstruct SR images with different upsampling factors in a single model. Notably, all these modules can run independently, which means they can be selectively plugged into any CNN model to improve its performance. Extensive experiments show that MDCN achieves competitive results in SISR, especially in the reconstruction task with multiple upsampling factors. The code will be provided at this https URL.
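
HFDB is described as adaptively recalibrating channel-wise feature responses. That wording matches squeeze-and-excitation-style gating, so the sketch below assumes that design: global average pooling per channel, a bottleneck layer with ReLU, a sigmoid gate, then per-channel scaling. It uses plain lists instead of a deep learning framework; the weight matrices `w1` and `w2` are hypothetical inputs, biases are omitted, and this is not MDCN's actual HFDB.

```python
import math

def se_recalibrate(fmap, w1, w2):
    """Squeeze-and-excitation style channel reweighting (assumed design, no biases).

    fmap: list of C channels, each an H x W list of lists.
    w1:   (C // r) x C weights of the bottleneck layer (hypothetical).
    w2:   C x (C // r) weights of the expansion layer (hypothetical).
    Returns (recalibrated feature map, per-channel gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: bottleneck + ReLU, then expansion + sigmoid -> gates in (0, 1).
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every activation of channel c by its gate.
    scaled = [[[g * x for x in r] for r in ch] for g, ch in zip(gates, fmap)]
    return scaled, gates
```

In a trained network `w1` and `w2` are learned, so channels that carry more useful information receive gates closer to 1 and the rest are attenuated.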

55. Longitudinal Image Registration with Temporal-order and Subject-specificity Discrimination [PDF]
Qianye Yang, Yunguan Fu, Francesco Giganti, Nooshin Ghavami, Qingchao Chen, J. Alison Noble, Tom Vercauteren, Dean Barratt, Yipeng Hu
Abstract: Morphological analysis of longitudinal MR images plays a key role in monitoring disease progression in prostate cancer patients who are placed under an active surveillance program. In this paper, we describe a learning-based image registration algorithm that quantifies changes in regions of interest between a pair of images from the same patient, acquired at two different time points. Using intensity-based similarity and gland segmentation as weak supervision, the registration networks trained on population data significantly lowered the target registration errors (TREs) on holdout patient data, compared with both the errors before registration and those of an iterative registration algorithm. Furthermore, this work provides a quantitative analysis of several longitudinal-data-sampling strategies and, in turn, we propose a novel regularisation method based on the maximum mean discrepancy between differently sampled training image pairs. Based on 216 3D MR images from 86 patients, we report a mean TRE of 5.6 mm and show statistically significant differences between the different training data sampling strategies.
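
For readers unfamiliar with the maximum mean discrepancy: it compares two sample sets X and Y through mean kernel similarities, MMD^2 = mean k(x, x') + mean k(y, y') - 2 mean k(x, y), which is zero when the sets come from the same distribution. The sketch computes this biased estimate with a Gaussian kernel. The kernel, the bandwidth `sigma`, and what the samples represent (for example, features of differently sampled image pairs) are assumptions, since the abstract does not say how the regulariser is applied.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)); sigma is an assumed bandwidth."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between xs and ys."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

Used as a regulariser, a term like `mmd2` is added to the training loss so that the two groups of samples are pushed towards matching distributions.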

56. Improved anomaly detection by training an autoencoder with skip connections on images corrupted with Stain-shaped noise [PDF]
Anne-Sophie Collin, Christophe De Vleeschouwer
Abstract: In industrial vision, the anomaly detection problem can be addressed with an autoencoder trained to map an arbitrary image, i.e., with or without any defect, to a clean image, i.e., without any defect. In this approach, anomaly detection conventionally relies on the reconstruction residual or, alternatively, on the reconstruction uncertainty. To improve the sharpness of the reconstruction, we consider an autoencoder architecture with skip connections. In the common scenario where only clean images are available for training, we propose to corrupt them with a synthetic noise model to prevent the network from converging towards the identity mapping, and we introduce an original Stain noise model for that purpose. We show that this model favors the reconstruction of clean images from arbitrary real-world images, regardless of the actual appearance of the defects. In addition to demonstrating the relevance of our approach, our validation provides the first consistent assessment of reconstruction-based methods by comparing their performance on the MVTec AD dataset, for both pixel-wise and image-wise anomaly detection.
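
Reconstruction-residual scoring, as mentioned in the abstract, can be sketched as follows: the pixel-wise anomaly map is the absolute difference between the input and its autoencoder reconstruction, and an image-level score is derived from that map. Taking the maximum residual as the image score and comparing it with a fixed `threshold` are assumptions (one common choice); the autoencoder itself and the Stain noise model are not shown.

```python
def residual_anomaly_map(image, reconstruction):
    """Pixel-wise anomaly map: absolute reconstruction residual |x - x_hat|."""
    return [[abs(p - q) for p, q in zip(row_in, row_out)]
            for row_in, row_out in zip(image, reconstruction)]

def image_anomaly_score(anomaly_map):
    """Image-level score: the largest pixel residual (an assumed, common choice)."""
    return max(max(row) for row in anomaly_map)

def is_anomalous(image, reconstruction, threshold):
    """Flag the image when its score exceeds a hypothetical, user-chosen threshold."""
    return image_anomaly_score(residual_anomaly_map(image, reconstruction)) > threshold
```

This only works if the autoencoder reconstructs a clean version of a defective input; if it learned the identity mapping, the residual would be near zero everywhere, which is exactly what the synthetic corruption during training is meant to prevent.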

57. Unpaired Deep Learning for Accelerated MRI using Optimal Transport Driven CycleGAN [PDF]
Gyutaek Oh, Byeongsu Sim, Hyungjin Chung, Leonard Sunwoo, Jong Chul Ye
Abstract: Recently, deep learning approaches for accelerated MRI have been extensively studied thanks to their high reconstruction performance at significantly reduced runtime complexity. These neural networks are usually trained in a supervised manner, so matched pairs of subsampled and fully sampled k-space data are required. Unfortunately, it is often difficult to acquire matched fully sampled k-space data, since acquiring fully sampled k-space data requires a long scan time and often leads to changes in the acquisition protocol. Therefore, unpaired deep learning without matched label data has become a very important research topic. In this paper, we propose an unpaired deep learning approach using an optimal transport driven cycle-consistent generative adversarial network (OT-cycleGAN) that employs a single pair of generator and discriminator. The proposed OT-cycleGAN architecture is rigorously derived from a dual formulation of the optimal transport problem using a specially designed penalized least squares cost. The experimental results show that our method can reconstruct high-resolution MR images from accelerated k-space data for both single- and multiple-coil acquisitions, without requiring matched reference data.
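
To make "accelerated k-space data" concrete, the sketch below uses a 1D signal as a stand-in for one image line. It takes a DFT, keeps only some k-space lines (every `accel`-th line plus a few low-frequency lines), and applies a zero-filled inverse DFT. The missing lines produce aliasing artifacts, which is what a reconstruction network such as OT-cycleGAN is trained to remove. The mask pattern and the parameter names are illustrative assumptions; real scanners sample 2D or 3D k-space.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for k-space acquisition)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def undersampling_mask(n, accel, n_center):
    """Assumed Cartesian pattern: every accel-th line plus n_center low-frequency lines."""
    mask = [k % accel == 0 for k in range(n)]
    # In an unshifted DFT the low frequencies sit at both ends of the array.
    for k in range(-(n_center // 2), n_center - n_center // 2):
        mask[k % n] = True
    return mask

def zero_filled_recon(signal, mask):
    """Keep only the sampled k-space lines, set the rest to zero, and invert."""
    k_space = dft(signal)
    sampled = [v if m else 0 for v, m in zip(k_space, mask)]
    return [v.real for v in idft(sampled)]
```

With a full mask the signal is recovered exactly; with an `accel=4` mask the zero-filled result deviates from the original, and that gap is the learning target.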

58. Path Planning Followed by Kinodynamic Smoothing for Multirotor Aerial Vehicles (MAVs) [PDF]
Geesara Kulathunga, Dmitry Devitt, Roman Fedorenko, Sergei Savin, Alexandr Klimchik
Abstract: We explore path planning followed by kinodynamic smoothing while ensuring the dynamic feasibility of the vehicle for MAVs. We have chosen a geometry-based motion planning technique, "RRT*", for this purpose. In the proposed technique, we modified the original RRT* by introducing an adaptive search space and a steering function, which help to increase the consistency of the planner. Moreover, we propose multiple RRT*, which generates a set of desired paths from which the optimal path is selected. Kinodynamic smoothing is then applied, resulting in a path that is both dynamically feasible and obstacle-free. Thereafter, a B-spline-based trajectory is generated to maneuver the vehicle autonomously in unknown environments. Finally, we have tested the proposed technique in various simulated environments.
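
The abstract does not give the B-spline parameters. Assuming a uniform cubic B-spline whose control points are, for example, the smoothed waypoints, a trajectory point can be evaluated with the standard cubic basis functions as below. A B-spline does not in general pass through its control points; each point stays in the convex hull of its four local control points, which is one reason B-splines are convenient for smooth, bounded trajectories. The function names are hypothetical.

```python
def cubic_bspline_point(ctrl, s):
    """Evaluate a uniform cubic B-spline at global parameter s in [0, len(ctrl) - 3]."""
    n_seg = len(ctrl) - 3
    if n_seg < 1 or not 0.0 <= s <= n_seg:
        raise ValueError("need >= 4 control points and 0 <= s <= len(ctrl) - 3")
    i = min(int(s), n_seg - 1)  # segment index; s == n_seg falls in the last segment
    u = s - i                   # local parameter in [0, 1]
    # Uniform cubic B-spline basis: non-negative and summing to 1 for every u.
    b = [(1 - u) ** 3 / 6,
         (3 * u ** 3 - 6 * u ** 2 + 4) / 6,
         (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6,
         u ** 3 / 6]
    dim = len(ctrl[0])
    return tuple(sum(b[j] * ctrl[i + j][d] for j in range(4)) for d in range(dim))

def sample_trajectory(ctrl, n_samples):
    """Sample n_samples (>= 2) evenly spaced parameter values along the whole spline."""
    n_seg = len(ctrl) - 3
    return [cubic_bspline_point(ctrl, n_seg * k / (n_samples - 1))
            for k in range(n_samples)]
```

For four collinear, equally spaced control points (0,0) to (3,3), the curve runs along the same line from (1,1) to (2,2), illustrating that the spline approximates rather than interpolates its control points.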

59. Ultra Lightweight Image Super-Resolution with Multi-Attention Layers [PDF]
Abdul Muqeet, Jiwon Hwang, Subin Yang, Jung Heum Kang, Yongwoo Kim, Sung-Ho Bae
Abstract: Lightweight image super-resolution (SR) networks are of the utmost significance for real-world applications. Several deep-learning-based SR methods achieve remarkable performance, but their memory and computational costs hinder practical use. To tackle this problem, we propose a Multi-Attentive Feature Fusion Super-Resolution Network (MAFFSRN). MAFFSRN consists of proposed feature fusion groups (FFGs) that serve as feature extraction blocks. Each FFG contains a stack of proposed multi-attention blocks (MABs) that are combined in a novel feature fusion structure. Further, the MAB, with a cost-efficient attention mechanism (CEA), helps us refine and extract features using multiple attention mechanisms. Comprehensive experiments show the superiority of our model over existing state-of-the-art methods. We participated in the AIM 2020 efficient SR challenge with our MAFFSRN model and won 1st, 3rd, and 4th places in memory usage, floating-point operations (FLOPs), and number of parameters, respectively.

60. ChildBot: Multi-Robot Perception and Interaction with Children [PDF]
Niki Efthymiou, Panagiotis P. Filntisis, Petros Koutras, Antigoni Tsiami, Jack Hadfield, Gerasimos Potamianos, Petros Maragos
Abstract: In this paper we present an integrated robotic system capable of participating in and performing a wide range of educational and entertainment tasks in collaboration with one or more children. The system, called ChildBot, features multimodal perception modules and multiple robotic agents that monitor the interaction environment, and can robustly coordinate complex Child-Robot Interaction use cases. To validate the effectiveness of the system and its integrated modules, we conducted multiple experiments with a total of 52 children. Our results show improved perception capabilities compared with the earlier works on which ChildBot is based. In addition, we conducted a preliminary user experience study employing several educational/entertainment tasks, which yields encouraging results regarding the technical validity of our system and initial insights into the user experience with it.

61. Variable Star Classification Using Multi-View Metric Learning [PDF]
K. B. Johnston, S.M. Caballero-Nieves, V. Petit, A.M. Peter, R. Haber
Abstract: Our multi-view metric learning framework enables robust characterization of star categories by directly learning to discriminate in a multi-faceted feature space, thus eliminating the need to combine feature representations before fitting the machine learning model. We also demonstrate how to extend standard multi-view learning, which employs multiple vectorized views, to the matrix-variate case, which allows novel variable star signature representations. The performance of our proposed methods is evaluated on the UCR Starlight and LINEAR datasets. Both the vector and matrix-variate versions of our multi-view learning framework perform favorably, demonstrating the ability to discriminate variable star categories.
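
The abstract does not give the framework's objective. One generic way to discriminate in a multi-view feature space without concatenating the views first is to learn one Mahalanobis metric per view and combine the per-view distances. The sketch below assumes that form (M_v = L_v^T L_v, so each metric is positive semidefinite, summed with view weights) plus a nearest-prototype classifier. The view weights, the matrices `Ls`, and the class labels are hypothetical, and the matrix-variate extension is not covered.

```python
def mahalanobis_sq(x, y, L):
    """(x - y)^T L^T L (x - y), computed as ||L (x - y)||^2 so that M = L^T L is PSD."""
    diff = [a - b for a, b in zip(x, y)]
    proj = [sum(l * d for l, d in zip(row, diff)) for row in L]
    return sum(p * p for p in proj)

def multiview_distance(x_views, y_views, Ls, view_weights):
    """Assumed combination: weighted sum of per-view learned distances."""
    return sum(w * mahalanobis_sq(x, y, L)
               for x, y, L, w in zip(x_views, y_views, Ls, view_weights))

def nearest_class(query_views, prototypes, Ls, view_weights):
    """Assign the label of the closest class prototype under the multi-view metric."""
    return min(prototypes,
               key=lambda label: multiview_distance(query_views, prototypes[label],
                                                    Ls, view_weights))
```

In a learned setting, each `L_v` would be fitted so that stars of the same category are close and different categories are far apart in every view, and the views never need to be merged into one vector.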

Note: the Chinese abstracts are machine translations. The cover image is a word cloud of the paper titles.


[arXiv papers] Computer Vision and Pattern Recognition 2020-09-01

Contents

Abstracts