Table of Contents
1. Looking Enhances Listening: Recovering Missing Speech Using Images [PDF]
2. Pre-Training for Query Rewriting in A Spoken Language Understanding System [PDF]
3. Sentiment Analysis Using Averaged Weighted Word Vector Features [PDF]
4. Sparse and Structured Visual Attention [PDF]
5. Unsupervised Separation of Native and Loanwords for Malayalam and Telugu [PDF]
6. Comparison of Turkish Word Representations Trained on Different Morphological Forms [PDF]
7. Keyphrase Extraction with Span-based Feature Representations [PDF]
8. Exploiting the Matching Information in the Support Set for Few Shot Event Classification [PDF]
9. What Would You Ask the Machine Learning Model? Identification of User Needs for Model Explanations Based on Human-Model Conversations [PDF]
10. Image-to-Image Translation with Text Guidance [PDF]
11. Deep compositional robotic planners that follow natural language commands [PDF]
12. A Combined Stochastic and Physical Framework for Modeling Indoor 5G Millimeter Wave Propagation [PDF]
Abstracts
1. Looking Enhances Listening: Recovering Missing Speech Using Images [PDF]
Tejas Srinivasan, Ramon Sanabria, Florian Metze
Abstract: Speech is understood better by using visual context; for this reason, there have been many attempts to use images to adapt automatic speech recognition (ASR) systems. Current work, however, has shown that visually adapted ASR models only use images as a regularization signal, while completely ignoring their semantic content. In this paper, we present a set of experiments where we show the utility of the visual modality under noisy conditions. Our results show that multimodal ASR models can recover words which are masked in the input acoustic signal, by grounding their transcriptions in the visual representations. We observe that integrating visual context can result in up to 35% relative improvement in masked word recovery. These results demonstrate that end-to-end multimodal ASR systems can become more robust to noise by leveraging the visual context.
2. Pre-Training for Query Rewriting in A Spoken Language Understanding System [PDF]
Zheng Chen, Xing Fan, Yuan Ling, Lambert Mathias, Chenlei Guo
Abstract: Query rewriting (QR) is an increasingly important technique to reduce customer friction caused by errors in a spoken language understanding pipeline, where the errors originate from various sources such as speech recognition errors, language understanding errors or entity resolution errors. In this work, we first propose a neural-retrieval based approach for query rewriting. Then, inspired by the wide success of pre-trained contextual language embeddings, and also as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train query embeddings on historical user conversation data with a voice assistant. In addition, we propose to use the NLU hypotheses generated by the language understanding system to augment the pre-training. Our experiments show that pre-training provides rich prior information and helps the QR task achieve strong performance. We also show that joint pre-training with NLU hypotheses has further benefit. Finally, after pre-training, we find that a small set of rewrite pairs is enough to fine-tune the QR model to outperform a strong baseline obtained by full training on all QR training data.
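A minimal sketch of the retrieval step described above, assuming the query and rewrite-candidate embeddings have already been produced by some pre-trained encoder (the function and variable names here are placeholders, not from the paper): the incoming utterance is embedded and the closest historical rewrite is returned by cosine similarity.

```python
import numpy as np

def retrieve_rewrite(query_vec, candidate_vecs, candidates, top_k=1):
    """Return the top-k rewrite candidates closest to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                                   # cosine similarity to every candidate
    best = np.argsort(-scores)[:top_k]
    return [(candidates[i], float(scores[i])) for i in best]

# Toy usage with random stand-in "pre-trained" embeddings.
rng = np.random.default_rng(0)
candidates = ["play jazz music", "play chess", "call mom"]
candidate_vecs = rng.normal(size=(3, 16))
query_vec = candidate_vecs[0] + 0.05 * rng.normal(size=16)   # noisy query
print(retrieve_rewrite(query_vec, candidate_vecs, candidates))
```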
3. Sentiment Analysis Using Averaged Weighted Word Vector Features [PDF]
Ali Erkan, Tunga Gungor
Abstract: People use the world wide web heavily to share their experience with entities such as products, services, or travel destinations. Texts that provide online feedback in the form of reviews and comments are essential for making consumer decisions. These comments create a valuable source that may be used to measure satisfaction related to products or services. Sentiment analysis is the task of identifying opinions expressed in such text fragments. In this work, we develop two methods that combine different types of word vectors to learn and estimate the polarity of reviews. We develop average review vectors from word vectors and add weights to these review vectors using word frequencies in positive and negative sensitivity-tagged reviews. We applied the methods to several datasets from different domains that are used as standard benchmarks for sentiment analysis. We ensemble the techniques with each other and with existing methods, and we make a comparison with the approaches in the literature. The results show that our approaches outperform the state-of-the-art success rates.
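A minimal sketch of the averaged weighted word vector idea, under assumptions of ours: each review is represented as an average of its word vectors, weighted by how skewed each word's frequency is toward positive versus negative tagged reviews (the exact weighting scheme in the paper may differ).

```python
import numpy as np

def review_vector(tokens, word_vecs, pos_freq, neg_freq):
    """Weighted average of word vectors; weight ~ polarity skew of the word."""
    vecs, weights = [], []
    for w in tokens:
        if w not in word_vecs:
            continue
        p, n = pos_freq.get(w, 0) + 1, neg_freq.get(w, 0) + 1   # add-one smoothing
        weights.append(abs(np.log(p / n)))                      # assumed weighting scheme
        vecs.append(word_vecs[w])
    if not vecs:
        return None
    return np.average(np.stack(vecs), axis=0, weights=np.array(weights) + 1e-8)

# Toy usage with 4-dimensional placeholder vectors and made-up frequencies.
word_vecs = {"great": np.ones(4), "awful": -np.ones(4), "movie": np.zeros(4)}
pos_freq, neg_freq = {"great": 90, "awful": 5}, {"great": 10, "awful": 80}
print(review_vector(["great", "movie"], word_vecs, pos_freq, neg_freq))
```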
4. Sparse and Structured Visual Attention [PDF]
Pedro Henrique Martins, Vlad Niculae, Zita Marinho, André Martins
Abstract: Visual attention mechanisms are widely used in multimodal tasks, such as image captioning and visual question answering (VQA). One drawback of softmax-based attention mechanisms is that they assign probability mass to all image regions, regardless of their adjacency structure and of their relevance to the text. In this paper, to better link the image structure with the text, we replace the traditional softmax attention mechanism with two alternative sparsity-promoting transformations: sparsemax, which is able to select the relevant regions only (assigning zero weight to the rest), and a newly proposed Total-Variation Sparse Attention (TVmax), which further encourages the joint selection of adjacent spatial locations. Experiments in image captioning and VQA, using both LSTM and Transformer architectures, show gains in terms of human-rated caption quality, attention relevance, and VQA accuracy, with improved interpretability.
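Sparsemax itself is a previously published transformation (Martins and Astudillo, 2016), so it can be illustrated concretely. The sketch below is a plain NumPy version of the projection, showing how low-scoring regions receive exactly zero attention weight, unlike softmax; TVmax is not shown.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of a score vector onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum            # coordinates kept in the support
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z          # threshold
    return np.maximum(z - tau, 0.0)

scores = np.array([2.0, 1.0, 0.1, -1.0])
print(sparsemax(scores))                            # sparse: low scores get exactly zero
print(np.exp(scores) / np.exp(scores).sum())        # softmax: every region stays positive
```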
5. Unsupervised Separation of Native and Loanwords for Malayalam and Telugu [PDF]
Sridhama Prakhya, Deepak P
Abstract: Quite often, words from one language are adopted within a different language without translation; these words appear in transliterated form in text written in the latter language. This phenomenon is particularly widespread within Indian languages where many words are loaned from English. In this paper, we address the task of identifying loanwords automatically and in an unsupervised manner, from large datasets of words from agglutinative Dravidian languages. We target two specific languages from the Dravidian family, viz., Malayalam and Telugu. Based on familiarity with the languages, we outline an observation that native words in both these languages tend to be characterized by a much more versatile stem - stem being a shorthand to denote the subword sequence formed by the first few characters of the word - than words that are loaned from other languages. We harness this observation to build an objective function and an iterative optimization formulation to optimize for it, yielding a scoring of each word's nativeness in the process. Through an extensive empirical analysis over real-world datasets from both Malayalam and Telugu, we illustrate the effectiveness of our method in quantifying nativeness effectively over available baselines for the task.
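The sketch below is only a loose illustration of the stem intuition, not the paper's objective function or iterative optimization: it treats the first few characters of a word as its stem and uses the number of distinct continuations observed for that stem as a crude nativeness proxy. The word list and stem length are made up for the example.

```python
from collections import defaultdict

def nativeness_scores(words, stem_len=3):
    """Score each word by the versatility of its stem (distinct continuations seen).

    Proxy only: a stem that combines with many different suffixes in the corpus
    is treated as more likely native; loanword stems tend to recur with few forms.
    """
    continuations = defaultdict(set)
    for w in words:
        continuations[w[:stem_len]].add(w[stem_len:])
    max_v = max(len(s) for s in continuations.values())
    return {w: len(continuations[w[:stem_len]]) / max_v for w in words}

corpus = ["maram", "marathil", "marangal", "marathinu", "computer", "computeril"]
for w, s in nativeness_scores(corpus).items():
    print(f"{w:12s} {s:.2f}")    # versatile native-like stems score higher than the loanword
```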
6. Comparison of Turkish Word Representations Trained on Different Morphological Forms [PDF]
Gökhan Güler, A. Cüneyd Tantuğ
Abstract: Increased popularity of different text representations has also brought many improvements in Natural Language Processing (NLP) tasks. Without need of supervised data, embeddings trained on large corpora provide us with meaningful relations to be used on different NLP tasks. Even though training these vectors is relatively easy with recent methods, information gained from the data heavily depends on the structure of the corpus language. Since the popularly researched languages have a similar morphological structure, problems occurring for morphologically rich languages are mainly disregarded in studies. For morphologically rich languages, context-free word vectors ignore the morphological structure of languages. In this study, we prepared texts in morphologically different forms in a morphologically rich language, Turkish, and compared the results on different intrinsic and extrinsic tasks. To see the effect of morphological structure, we trained a word2vec model on texts in which lemmas and suffixes are treated differently. We also trained the subword model fastText and compared the embeddings on word analogy, text classification, sentiment analysis, and language model tasks.
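A small sketch of how such a comparison could be set up with gensim, assuming the surface-form and lemmatized corpora have been prepared beforehand (the two toy corpora below are placeholders): word2vec is trained on each variant, and fastText adds character n-gram subwords, which lets it embed unseen inflected forms.

```python
from gensim.models import Word2Vec, FastText

# Placeholder corpora: the same Turkish sentences in surface and lemmatized form.
surface = [["evlerimizde", "kitap", "okuyoruz"], ["evde", "kitaplar", "var"]]
lemmas  = [["ev", "kitap", "oku"], ["ev", "kitap", "var"]]

w2v_surface = Word2Vec(sentences=surface, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
w2v_lemma   = Word2Vec(sentences=lemmas,  vector_size=50, window=2, min_count=1, sg=1, epochs=50)
ft_surface  = FastText(sentences=surface, vector_size=50, window=2, min_count=1, min_n=3, max_n=6, epochs=50)

# Both word2vec variants are queried on their own vocabulary for intrinsic tests.
print(w2v_lemma.wv.similarity("ev", "kitap"))

# fastText can embed an unseen inflected form via character n-grams;
# plain word2vec on surface forms cannot.
print("kitaplarda" in w2v_surface.wv.key_to_index)       # False: out of vocabulary
print(ft_surface.wv.most_similar("kitaplarda", topn=2))
```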
7. Keyphrase Extraction with Span-based Feature Representations [PDF]
Funan Mu, Zhenting Yu, LiFeng Wang, Yequan Wang, Qingyu Yin, Yibo Sun, Liqun Liu, Teng Ma, Jing Tang, Xing Zhou
Abstract: Keyphrases are capable of providing semantic metadata characterizing documents and producing an overview of the content of a document. Since keyphrase extraction is able to facilitate the management, categorization, and retrieval of information, it has received much attention in recent years. There are three approaches to address keyphrase extraction: (i) the traditional two-step ranking method, (ii) sequence labeling and (iii) generation using neural networks. The two-step ranking approach is based on feature engineering, which is labor intensive and domain dependent. Sequence labeling is not able to tackle overlapping phrases. Generation methods (i.e., sequence-to-sequence neural network models) overcome those shortcomings, so they have been widely studied and achieve state-of-the-art performance. However, generation methods cannot utilize context information effectively. In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens. In this way, our model obtains a representation for each keyphrase and further learns to capture the interaction between keyphrases in one document to get better ranking results. In addition, with the help of tokens, our model is able to extract overlapped keyphrases. Experimental results on the benchmark datasets show that our proposed model outperforms the existing methods by a large margin.
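A hedged PyTorch sketch of span-based scoring in general, not the paper's exact architecture: candidate spans up to a maximum length are enumerated, each span is represented by its start state, end state and mean pool, and a small feed-forward layer scores it as a keyphrase. Overlapping spans are scored independently, which is what allows overlapped keyphrases.

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    """Scores every span up to max_len as a keyphrase candidate (illustrative only)."""
    def __init__(self, hidden=64, max_len=4):
        super().__init__()
        self.max_len = max_len
        self.ffn = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, token_states):                 # token_states: (seq_len, hidden)
        spans, feats = [], []
        n = token_states.size(0)
        for i in range(n):
            for j in range(i, min(i + self.max_len, n)):
                rep = torch.cat([token_states[i], token_states[j],
                                 token_states[i:j + 1].mean(dim=0)])
                spans.append((i, j))
                feats.append(rep)
        scores = self.ffn(torch.stack(feats)).squeeze(-1)    # one score per candidate span
        return spans, torch.sigmoid(scores)

tokens = torch.randn(6, 64)                  # stand-in for contextual encoder output
spans, probs = SpanScorer()(tokens)
print(spans[int(probs.argmax())], probs.max().item())
```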
8. Exploiting the Matching Information in the Support Set for Few Shot Event Classification [PDF]
Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen
Abstract: The existing event classification (EC) work primarily focuses on the traditional supervised learning setting, in which models are unable to extract event mentions of new/unseen event types. Few-shot learning has not been investigated in this area, although it enables EC models to extend their operation to unobserved event types. To fill in this gap, in this work we investigate event classification under the few-shot learning setting. We propose a novel training method for this problem that extensively exploits the support set during the training process of a few-shot learning model. In particular, in addition to matching the query example with those in the support set for training, we seek to further match the examples within the support set themselves. This method provides more training signals for the models and can be applied to every metric-learning-based few-shot learning method. Our extensive experiments on two benchmark EC datasets show that the proposed method can improve the best reported few-shot learning models by up to 10% in accuracy for event classification.
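A rough sketch of the idea in a prototypical-network style, with assumed details (loss form, mixing weight) that are not taken from the paper: the usual query-to-support matching loss is kept, and an auxiliary loss additionally matches the support examples against the class prototypes built from the support set itself.

```python
import torch
import torch.nn.functional as F

def few_shot_losses(support, support_labels, query, query_labels):
    """Query-to-prototype loss plus an auxiliary support-to-support matching loss (sketch)."""
    classes = support_labels.unique()
    protos = torch.stack([support[support_labels == c].mean(dim=0) for c in classes])

    # Main loss: classify queries by distance to class prototypes.
    q_logits = -torch.cdist(query, protos)
    main = F.cross_entropy(q_logits, query_labels)

    # Auxiliary loss: each support example must also match its own class prototype,
    # extracting extra training signal from the support set itself.
    s_logits = -torch.cdist(support, protos)
    aux = F.cross_entropy(s_logits, support_labels)
    return main + 0.1 * aux                      # mixing weight is an arbitrary choice here

support = torch.randn(6, 32)                     # 3 classes x 2 shots of toy features
support_labels = torch.tensor([0, 0, 1, 1, 2, 2])
query, query_labels = torch.randn(3, 32), torch.tensor([0, 1, 2])
print(few_shot_losses(support, support_labels, query, query_labels))
```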
9. What Would You Ask the Machine Learning Model? Identification of User Needs for Model Explanations Based on Human-Model Conversations [PDF]
Michał Kuźba, Przemysław Biecek
Abstract: Recently we see a rising number of methods in the field of eXplainable Artificial Intelligence. To our surprise, their development is driven by model developers rather than a study of needs for human end users. To answer the question "What would a human operator like to ask the ML model?" we propose a conversational system explaining decisions of the predictive model. In this experiment, we implement a chatbot called dr_ant and train a model predicting survival odds on Titanic. People can talk to dr_ant about the model to understand the rationale behind its predictions. Having collected a corpus of 1000+ dialogues, we analyse the most common types of questions that users would like to ask. To our knowledge, it is the first study of needs for human operators in the context of conversations with an ML model. It is also a first study which uses a conversational system for interactive exploration of a predictive model trained on tabular data.
10. Image-to-Image Translation with Text Guidance [PDF]
Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz
Abstract: The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, which allows text descriptions to determine the visual attributes of synthetic images. We propose four key components: (1) the implementation of part-of-speech tagging to filter out non-semantic words in the given description, (2) the adoption of an affine combination module to effectively fuse different modality text and image features, (3) a novel refined multi-stage architecture to strengthen the differential ability of discriminators and the rectification ability of generators, and (4) a new structure loss to further improve discriminators to better distinguish real and synthetic images. Extensive experiments on the COCO dataset demonstrate that our method has a superior performance on both visual realism and semantic consistency with given descriptions.
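The affine combination module is only named in the abstract, so the sketch below is an assumption: a generic FiLM-style modulation in which the text vector produces a channel-wise scale and shift applied to the image feature map. The class and parameter names are ours, not the paper's.

```python
import torch
import torch.nn as nn

class AffineTextImageFusion(nn.Module):
    """Text-conditioned channel-wise affine modulation of image features (illustrative)."""
    def __init__(self, text_dim=256, channels=128):
        super().__init__()
        self.to_gamma = nn.Linear(text_dim, channels)
        self.to_beta = nn.Linear(text_dim, channels)

    def forward(self, image_feat, text_vec):         # (B, C, H, W), (B, text_dim)
        gamma = self.to_gamma(text_vec).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(text_vec).unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * image_feat + beta       # identity mapping when gamma, beta = 0

fusion = AffineTextImageFusion()
out = fusion(torch.randn(2, 128, 16, 16), torch.randn(2, 256))
print(out.shape)    # torch.Size([2, 128, 16, 16])
```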
11. Deep compositional robotic planners that follow natural language commands [PDF]
Yen-Ling Kuo, Boris Katz, Andrei Barbu
Abstract: We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects. Our approach combines a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes, with a sampling-based planner, RRT. A recurrent hierarchical deep network controls how the planner explores the environment, determines when a planned path is likely to achieve a goal, and estimates the confidence of each move to trade off exploitation and exploration between the network and the planner. Planners are designed to have near-optimal behavior when information about the task is missing, while networks learn to exploit observations which are available from the environment, making the two naturally complementary. Combining the two enables generalization to new maps, new kinds of obstacles, and more complex sentences that do not occur in the training set. Little data is required to train the model despite it jointly acquiring a CNN that extracts features from the environment as it learns the meanings of words. The model provides a level of interpretability through the use of attention maps allowing users to see its reasoning steps despite being an end-to-end model. This end-to-end model allows robots to learn to follow natural language commands in challenging continuous environments.
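RRT is a standard sampling-based planner, so the sketch below shows a minimal 2D version of that component alone; obstacle checking and the language-conditioned deep network that the paper adds on top are omitted.

```python
import math
import random

def rrt(start, goal, step=0.5, iters=2000, goal_tol=0.5, bounds=(0.0, 10.0)):
    """Minimal 2D RRT: grow a tree toward random samples until the goal is reached."""
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        sample = (random.uniform(*bounds), random.uniform(*bounds))
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d, ny + step * (sample[1] - ny) / d)
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:          # reconstruct the path back to start
            path, j = [new], len(nodes) - 1
            while parent[j] is not None:
                j = parent[j]
                path.append(nodes[j])
            return path[::-1]
    return None

print(rrt(start=(0.0, 0.0), goal=(9.0, 9.0)))
```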
12. A Combined Stochastic and Physical Framework for Modeling Indoor 5G Millimeter Wave Propagation [PDF]
Georges Nassif, Catherine Gloaguen, Philippe Martins
Abstract: Indoor coverage is a major challenge for 5G millimeter waves (mmWaves). In this paper, we address this problem through a novel theoretical framework that combines stochastic indoor environment modeling with advanced physical propagation simulation. This approach is particularly adapted to investigate indoor-to-indoor 5G mmWave propagation. Its system implementation, so-called iGeoStat, generates parameterized typical environments that account for the indoor spatial variations, then simulates radio propagation based on the physical interaction between electromagnetic waves and material properties. This framework is not dedicated to a particular environment, material, frequency or use case and aims to statistically understand the influence of indoor environment parameters on mmWave propagation properties, especially coverage and path loss. Its implementation raises numerous computational challenges that we solve by formulating an adapted link budget and designing new memory optimization algorithms. The first simulation results for two major 5G applications are validated with measurement data and show the efficiency of iGeoStat to simulate multiple diffusion in realistic environments, within a reasonable amount of time and memory resources. Generated output maps confirm that diffusion has a critical impact on indoor mmWave propagation and that proper physical modeling is of the utmost importance to generate relevant propagation models.
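This is not iGeoStat; it is only the standard log-distance path-loss model often used in link budgets, shown to make the "path loss" quantity in the abstract concrete. The path-loss exponent and reference distance below are illustrative values, not results from the paper.

```python
import math

def path_loss_db(d_m, f_ghz=60.0, n=2.2, d0_m=1.0):
    """Log-distance path loss: PL(d) = FSPL(d0, f) + 10 * n * log10(d / d0)."""
    # Free-space path loss at the reference distance d0 (d in metres, f in Hz).
    fspl_d0 = 20 * math.log10(d0_m) + 20 * math.log10(f_ghz * 1e9) - 147.55
    return fspl_d0 + 10 * n * math.log10(d_m / d0_m)

for d in (1, 5, 10, 20):
    print(f"{d:>3} m: {path_loss_db(d):.1f} dB")
```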