
[arXiv Papers] Computation and Language 2020-02-21

Contents

1. Measuring Social Biases in Grounded Vision and Language Embeddings [PDF] Abstract
2. How Much Knowledge Can You Pack Into the Parameters of a Language Model? [PDF] Abstract
3. REALM: Retrieval-Augmented Language Model Pre-Training [PDF] Abstract
4. Application of Pre-training Models in Named Entity Recognition [PDF] Abstract
5. Identifying physical health comorbidities in a cohort of individuals with severe mental illness: An application of SemEHR [PDF] Abstract
6. Compositional Neural Machine Translation by Removing the Lexicon from Syntax [PDF] Abstract
7. MA-DST: Multi-Attention Based Scalable Dialog State Tracking [PDF] Abstract
8. The Fluidity of Concept Representations in Human Brain Signals [PDF] Abstract
9. Contextual Lensing of Universal Sentence Representations [PDF] Abstract
10. Guiding attention in Sequence-to-sequence models for Dialogue Act prediction [PDF] Abstract
11. Balancing Cost and Benefit with Tied-Multi Transformers [PDF] Abstract
12. FrameAxis: Characterizing Framing Bias and Intensity with Word Embedding [PDF] Abstract
13. Federated pretraining and fine tuning of BERT using clinical notes from multiple silos [PDF] Abstract
14. Wavesplit: End-to-End Speech Separation by Speaker Clustering [PDF] Abstract
15. Imputer: Sequence Modelling via Imputation and Dynamic Programming [PDF] Abstract
16. Multi-Agent Reinforcement Learning as a Computational Tool for Language Evolution Research: Historical Context and Future Challenges [PDF] Abstract
17. How To Avoid Being Eaten By a Grue: Exploration Strategies for Text-Adventure Agents [PDF] Abstract
18. Interactive Natural Language-based Person Search [PDF] Abstract

Abstracts

1. Measuring Social Biases in Grounded Vision and Language Embeddings [PDF] Back to Contents
  Candace Ross, Boris Katz, Andrei Barbu
Abstract: We generalize the notion of social biases from language embeddings to grounded vision and language embeddings. Biases are present in grounded embeddings, and indeed seem to be equally or more significant than for ungrounded embeddings. This is despite the fact that vision and language can suffer from different biases, which one might hope could attenuate the biases in both. Multiple ways exist to generalize metrics measuring bias in word embeddings to this new setting. We introduce the space of generalizations (Grounded-WEAT and Grounded-SEAT) and demonstrate that three generalizations answer different yet important questions about how biases, language, and vision interact. These metrics are used on a new dataset, the first for grounded bias, created by extending standard linguistic bias benchmarks with 10,228 images from COCO, Conceptual Captions, and Google Images. Dataset construction is challenging because vision datasets are themselves very biased. The presence of these biases in systems will begin to have real-world consequences as they are deployed, making carefully measuring bias and then mitigating it critical to building a fair society.
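
Grounded-WEAT and Grounded-SEAT generalize the standard word-embedding bias statistics to the grounded setting. As a point of reference, the sketch below computes the ungrounded WEAT effect size that these metrics build on; the embeddings here are random placeholders, and in the grounded case they would instead come from a vision-and-language model.

```python
# Illustrative sketch of the WEAT effect size that Grounded-WEAT generalizes.
# Embeddings are random placeholders, not real word or image vectors.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Target sets X, Y (e.g. career vs. family words), attribute sets A, B
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)

rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(5, 300)) for _ in range(4))
print(weat_effect_size(X, Y, A, B))
```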

2. How Much Knowledge Can You Pack Into the Parameters of a Language Model? [PDF] Back to Contents
  Adam Roberts, Colin Raffel, Noam Shazeer
Abstract: It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries. In this short paper, we measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge. We show that this approach scales surprisingly well with model size and outperforms models that explicitly look up knowledge on the open-domain variants of Natural Questions and WebQuestions.
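
The abstract describes fine-tuning a pre-trained model to map a question directly to an answer, with no retrieved context. A minimal closed-book fine-tuning step might look like the sketch below; the checkpoint name, learning rate, and example QA pair are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: closed-book QA fine-tuning (question -> answer, no context).
# Checkpoint and hyperparameters are placeholders.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("question: who wrote the origin of species?", return_tensors="pt")
labels = tokenizer("charles darwin", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(**inputs, labels=labels).loss   # one illustrative gradient step
loss.backward()
optimizer.step()

# At inference time the model must answer from its parameters alone.
print(tokenizer.decode(model.generate(**inputs)[0], skip_special_tokens=True))
```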

3. REALM: Retrieval-Augmented Language Model Pre-Training [PDF] Back to Contents
  Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
Abstract: Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
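
The core quantity the abstract describes can be written as a marginal likelihood over retrieved documents, p(y|x) = Σ_z p(z|x) p(y|x,z), where p(z|x) is a softmax over dense relevance scores. The toy sketch below makes that structure concrete; the encoders and the reader likelihood are random stand-ins, not REALM's actual networks.

```python
# Toy sketch of retrieval-augmented marginal likelihood:
#   p(y|x) = sum_z p(z|x) * p(y|x, z)
# Everything below is a random stand-in for the real retriever and reader.
import numpy as np

rng = np.random.default_rng(0)
num_docs, dim = 8, 32

doc_embeds = rng.normal(size=(num_docs, dim))        # stand-in document encoder
query_embed = rng.normal(size=dim)                   # stand-in query encoder

scores = doc_embeds @ query_embed                    # relevance score per document
p_z_given_x = np.exp(scores - scores.max())
p_z_given_x /= p_z_given_x.sum()                     # retrieval distribution p(z|x)

p_y_given_xz = rng.uniform(0.01, 0.99, size=num_docs)  # stand-in reader p(y|x,z)

log_p_y_given_x = np.log(np.sum(p_z_given_x * p_y_given_xz))
# Backpropagating through this quantity trains retriever and reader jointly.
print(log_p_y_given_x)
```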

4. Application of Pre-training Models in Named Entity Recognition [PDF] Back to Contents
  Yu Wang, Yining Sun, Zuchang Ma, Lisheng Gao, Yang Xu, Ting Sun
Abstract: Named Entity Recognition (NER) is a fundamental Natural Language Processing (NLP) task to extract entities from unstructured data. The previous methods for NER were based on machine learning or deep learning. Recently, pre-training models have significantly improved performance on multiple NLP tasks. In this paper, firstly, we introduce the architecture and pre-training tasks of four common pre-training models: BERT, ERNIE, ERNIE2.0-tiny, and RoBERTa. Then, we apply these pre-training models to a NER task by fine-tuning, and compare the effects of the different model architecture and pre-training tasks on the NER task. The experiment results showed that RoBERTa achieved state-of-the-art results on the MSRA-2006 dataset.
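
Applying a pre-training model to NER by fine-tuning amounts to token classification on top of the encoder. The sketch below shows that setup with the Hugging Face transformers API; the checkpoint, tag set, and dummy gold labels are placeholders rather than the paper's exact configuration.

```python
# Hedged sketch of fine-tuning a pre-trained encoder for NER (token classification).
# Checkpoint, tag set, and the toy example are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(labels)
)

enc = tokenizer("小明住在合肥", return_tensors="pt")
gold = torch.zeros(enc.input_ids.shape, dtype=torch.long)  # dummy all-"O" tags

loss = model(**enc, labels=gold).loss
loss.backward()   # one illustrative step; a real run loops over MSRA-style data
```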

5. Identifying physical health comorbidities in a cohort of individuals with severe mental illness: An application of SemEHR [PDF] Back to Contents
  Rebecca Bendayan, Honghan Wu, Zeljko Kraljevic, Robert Stewart, Tom Searle, Jaya Chaturvedi, Jayati Das-Munshi, Zina Ibrahim, Aurelie Mascio, Angus Roberts, Daniel Bean, Richard Dobson
Abstract: Multimorbidity research in mental health services requires data on physical health conditions, which are traditionally limited in mental health care electronic health records. In this study, we aimed to extract data on physical health conditions from clinical notes using SemEHR. Data were extracted from the Clinical Record Interactive Search (CRIS) system at the South London and Maudsley Biomedical Research Centre (SLaM BRC), and the cohort consisted of all individuals who had received a primary or secondary diagnosis of severe mental illness between 2007 and 2018. Three pairs of annotators annotated 2403 documents with an average Cohen's Kappa of 0.757. Results show that NLP performance varies across different disease areas (F1 0.601 - 0.954), suggesting that the language patterns or terminologies of different condition groups entail different technical challenges for the same NLP task.

6. Compositional Neural Machine Translation by Removing the Lexicon from Syntax [PDF] Back to Contents
  Tristan Thrush
Abstract: The meaning of a natural language utterance is largely determined from its syntax and words. Additionally, there is evidence that humans process an utterance by separating knowledge about the lexicon from syntax knowledge. Theories from semantics and neuroscience claim that complete word meanings are not encoded in the representation of syntax. In this paper, we propose neural units that can enforce this constraint over an LSTM encoder and decoder. We demonstrate that our model achieves competitive performance across a variety of domains including semantic parsing, syntactic parsing, and English to Mandarin Chinese translation. In these cases, our model outperforms the standard LSTM encoder and decoder architecture on many or all of our metrics. To demonstrate that our model achieves the desired separation between the lexicon and syntax, we analyze its weights and explore its behavior when different neural modules are damaged. When damaged, we find that the model displays the knowledge distortions that aphasics are evidenced to have.

7. MA-DST: Multi-Attention Based Scalable Dialog State Tracking [PDF] Back to Contents
  Adarsh Kumar, Peter Ku, Anuj Kumar Goyal, Angeliki Metallinou, Dilek Hakkani-Tur
Abstract: Task oriented dialog agents provide a natural language interface for users to complete their goal. Dialog State Tracking (DST), which is often a core component of these systems, tracks the system's understanding of the user's goal throughout the conversation. To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics and understand the dialog context, including long-range cross-domain references. We introduce a novel architecture for this task to encode the conversation history and slot semantics more robustly by using attention mechanisms at multiple granularities. In particular, we use cross-attention to model relationships between the context and slots at different semantic levels and self-attention to resolve cross-domain coreferences. In addition, our proposed architecture does not rely on knowing the domain ontologies beforehand and can also be used in a zero-shot setting for new domains or unseen slot values. Our model improves the joint goal accuracy by 5% (absolute) in the full-data setting and by up to 2% (absolute) in the zero-shot setting over the present state-of-the-art on the MultiWoZ 2.1 dataset.
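
The cross-attention the abstract refers to lets slot representations attend over the encoded conversation history. A minimal scaled dot-product version is sketched below with random tensors; the real model applies attention at multiple granularities and adds self-attention for cross-domain coreference.

```python
# Minimal sketch of context-to-slot cross-attention with random placeholder tensors.
import torch
import torch.nn.functional as F

dim, ctx_len, num_slots = 64, 20, 5
context = torch.randn(ctx_len, dim)    # encoded conversation history
slots = torch.randn(num_slots, dim)    # slot semantic embeddings

attn = F.softmax(slots @ context.T / dim ** 0.5, dim=-1)  # (num_slots, ctx_len)
slot_in_context = attn @ context                          # context-aware slot vectors
print(slot_in_context.shape)                              # torch.Size([5, 64])
```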

8. The Fluidity of Concept Representations in Human Brain Signals [PDF] Back to Contents
  Eva Hendrikx, Lisa Beinborn
Abstract: Cognitive theories of human language processing often distinguish between concrete and abstract concepts. In this work, we analyze the discriminability of concrete and abstract concepts in fMRI data using a range of analysis methods. We find that the distinction can be decoded from the signal with an accuracy significantly above chance, but it is not found to be a relevant structuring factor in clustering and relational analyses. From our detailed comparison, we obtain the impression that human concept representations are more fluid than dichotomous categories can capture. We argue that fluid concept representations lead to more realistic models of human language processing because they better capture the ambiguity and underspecification present in natural language use.

9. Contextual Lensing of Universal Sentence Representations [PDF] Back to Contents
  Jamie Kiros
Abstract: What makes a universal sentence encoder universal? The notion of a generic encoder of text appears to be at odds with the inherent contextualization and non-permanence of language use in a dynamic world. However, mapping sentences into generic fixed-length vectors for downstream similarity and retrieval tasks has been fruitful, particularly for multilingual applications. How do we manage this dilemma? In this work we propose Contextual Lensing, a methodology for inducing context-oriented universal sentence vectors. We break the construction of universal sentence vectors into a core, variable length, sentence matrix representation equipped with an adaptable `lens' from which fixed-length vectors can be induced as a function of the lens context. We show that it is possible to focus notions of language similarity into a small number of lens parameters given a core universal matrix representation. For example, we demonstrate the ability to encode translation similarity of sentences across several languages into a single weight matrix, even when the core encoder has not seen parallel data.
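
One natural reading of the "lens" is a small set of parameters that pools the variable-length sentence matrix into a fixed-length vector, for example via attention-style weighting. The sketch below follows that reading; the pooling form is an assumption, and the core encoder is replaced by a random matrix.

```python
# Hedged sketch of lens-style pooling: a tiny parameter vector induces a
# fixed-length sentence vector from a variable-length sentence matrix.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
sentence_matrix = rng.normal(size=(12, 128))   # stand-in core representation

lens = rng.normal(size=128)                    # small, context-specific lens
weights = softmax(sentence_matrix @ lens)      # lens-dependent pooling weights
sentence_vector = weights @ sentence_matrix    # fixed-length output
print(sentence_vector.shape)                   # (128,)
```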

10. Guiding attention in Sequence-to-sequence models for Dialogue Act prediction [PDF] Back to Contents
  Pierre Colombo, Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, Chloe Clavel
Abstract: The task of predicting dialog acts (DA) based on conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires a precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to improve the modelling of tag sequentiality. Seq2seq models are known to learn complex global dependencies while currently proposed approaches using linear conditional random fields (CRF) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification using: a hierarchical encoder, a novel guided attention mechanism and beam search applied to both training and inference. Compared to the state of the art our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy score of 85% on SwDA, and state-of-the-art accuracy score of 91.6% on MRDA.

11. Balancing Cost and Benefit with Tied-Multi Transformers [PDF] Back to Contents
  Raj Dabre, Raphael Rubino, Atsushi Fujita
Abstract: We propose and evaluate a novel procedure for training multiple Transformers with tied parameters which compresses multiple models into one enabling the dynamic choice of the number of encoder and decoder layers during decoding. In sequence-to-sequence modeling, typically, the output of the last layer of the N-layer encoder is fed to the M-layer decoder, and the output of the last decoder layer is used to compute loss. Instead, our method computes a single loss consisting of NxM losses, where each loss is computed from the output of one of the M decoder layers connected to one of the N encoder layers. Such a model subsumes NxM models with different number of encoder and decoder layers, and can be used for decoding with fewer than the maximum number of encoder and decoder layers. We then propose a mechanism to choose a priori the number of encoder and decoder layers for faster decoding, and also explore recurrent stacking of layers and knowledge distillation for model compression. We present a cost-benefit analysis of applying the proposed approaches for neural machine translation and show that they reduce decoding costs while preserving translation quality.
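
The training objective described here sums one sub-loss per pair of encoder depth n and decoder depth m, for N x M sub-losses in total. The schematic sketch below mirrors that structure (with a toy usage at the end); the layer and loss functions are stand-ins, and in the real model all sub-models share a single set of tied parameters.

```python
# Schematic sketch of the NxM tied-multi loss: one sub-loss per
# (encoder depth n, decoder depth m) pair. Layers and loss_fn are stand-ins.
def tied_multi_loss(src, tgt, encoder_layers, decoder_layers, loss_fn):
    total = 0.0
    enc_state = src
    for enc_layer in encoder_layers:               # encoder depth n = 1..N
        enc_state = enc_layer(enc_state)
        dec_state = tgt
        for dec_layer in decoder_layers:           # decoder depth m = 1..M
            dec_state = dec_layer(dec_state, enc_state)
            total += loss_fn(dec_state, tgt)       # loss of the (n, m) sub-model
    return total

# Toy usage with scalar "layers" (2 encoder and 2 decoder layers -> 4 sub-losses).
enc_layers = [lambda x: x + 1, lambda x: x * 2]
dec_layers = [lambda y, h: y + h, lambda y, h: y - h]
print(tied_multi_loss(3.0, 1.0, enc_layers, dec_layers, lambda p, t: abs(p - t)))
```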

12. FrameAxis: Characterizing Framing Bias and Intensity with Word Embedding [PDF] Back to Contents
  Haewoon Kwak, Jisun An, Yong-Yeol Ahn
Abstract: We propose FrameAxis, a method of characterizing the framing of a given text by identifying the most relevant semantic axes ("microframes") defined by antonym word pairs. In contrast to the traditional framing analysis, which has been constrained by a small number of manually annotated general frames, our unsupervised approach provides much more detailed insights, by considering a host of semantic axes. Our method is capable of quantitatively teasing out framing bias -- how biased a text is in each microframe -- and framing intensity -- how much each microframe is used -- from the text, offering a nuanced characterization of framing. We evaluate our approach using SemEval datasets as well as three other datasets and human evaluations, demonstrating that FrameAxis can reliably characterize documents with relevant microframes. Our method may allow scalable and nuanced computational analyses of framing across disciplines.
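
A microframe is defined by an antonym pair, so its axis can be taken as the difference between the two word embeddings; framing bias then measures which pole a text leans toward, and framing intensity how strongly the axis is used at all. The sketch below is one simple instantiation of those ideas; the exact weighting and baseline corrections used in the paper may differ.

```python
# Hedged sketch of a FrameAxis-style microframe computed from word embeddings.
# Embeddings are random placeholders; the paper's exact weighting may differ.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["safe", "dangerous", "border", "family"]}

axis = emb["dangerous"] - emb["safe"]            # microframe axis: safe <-> dangerous
doc_words = ["border", "family"]
contrib = [cosine(emb[w], axis) for w in doc_words]

framing_bias = np.mean(contrib)                  # which pole the document leans toward
framing_intensity = np.mean(np.square(contrib))  # how strongly the axis is engaged
print(framing_bias, framing_intensity)
```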

13. Federated pretraining and fine tuning of BERT using clinical notes from multiple silos [PDF] Back to Contents
  Dianbo Liu, Tim Miller
Abstract: Large scale contextual representation models, such as BERT, have significantly advanced natural language processing (NLP) in recent years. However, in certain areas such as healthcare, accessing diverse large scale text data from multiple institutions is extremely challenging due to privacy and regulatory reasons. In this article, we show that it is possible to both pretrain and fine-tune BERT models in a federated manner using clinical texts from different silos without moving the data.
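
Training across silos without moving the notes typically relies on federated averaging: each silo updates a local copy of the model on its own text, and only the weights are aggregated. The sketch below shows that building block; it is not necessarily the paper's exact protocol, and local_train is a stub for the silo-local MLM pretraining or fine-tuning step.

```python
# Hedged sketch of one federated averaging (FedAvg) round over model weights.
# local_train is a stub; data never leaves its silo, only weights are shared.
import copy
import torch

def federated_round(global_model, silo_loaders, local_train):
    silo_states = []
    for loader in silo_loaders:
        local_model = copy.deepcopy(global_model)
        local_train(local_model, loader)            # silo-local pretraining / fine-tuning
        silo_states.append(local_model.state_dict())

    # Average every parameter across silos and load it back into the global model.
    avg_state = {
        name: torch.stack([s[name].float() for s in silo_states]).mean(dim=0)
        for name in silo_states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model
```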

14. Wavesplit: End-to-End Speech Separation by Speaker Clustering [PDF] Back to Contents
  Neil Zeghidour, David Grangier
Abstract: We introduce Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers and clusters representations of each speaker and then estimates each source signal conditioned on the inferred representations. The model is trained on the raw waveform to jointly perform the two tasks. Our model infers a set of speaker representations through clustering, which addresses the fundamental permutation problem of speech separation. Moreover, the sequence-wide speaker representations provide a more robust separation of long, challenging sequences, compared to previous approaches. We show that Wavesplit outperforms the previous state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as in noisy (WHAM!) and reverberated (WHAMR!) conditions. As an additional contribution, we further improve our model by introducing online data augmentation for separation.

15. Imputer: Sequence Modelling via Imputation and Dynamic Programming [PDF] Back to Contents
  William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly
Abstract: This paper presents the Imputer, a neural sequence model that generates output sequences iteratively via imputations. The Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the number of input or output tokens. The Imputer can be trained to approximately marginalize over all possible alignments between the input and output sequences, and all possible generation orders. We present a tractable dynamic programming training algorithm, which yields a lower bound on the log marginal likelihood. When applied to end-to-end speech recognition, the Imputer outperforms prior non-autoregressive models and achieves competitive results to autoregressive models. On LibriSpeech test-other, the Imputer achieves 11.1 WER, outperforming CTC at 13.0 WER and seq2seq at 12.5 WER.
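
The constant-step property comes from filling in a fixed fraction of the (initially all-masked) output at every iteration. The toy decode loop below illustrates that idea; the random block schedule and the stand-in predictor are assumptions, not the alignment-aware dynamic programming used in the paper.

```python
# Toy sketch of constant-step imputation decoding: start from an all-masked
# output and fill roughly length/num_steps positions per step.
# predict() is a stand-in for the real model; the schedule is an assumption.
import random

MASK = "_"

def imputer_decode(length, num_steps, predict):
    output = [MASK] * length
    per_step = -(-length // num_steps)              # ceil(length / num_steps)
    for _ in range(num_steps):
        masked = [i for i, t in enumerate(output) if t == MASK]
        for i in random.sample(masked, min(per_step, len(masked))):
            output[i] = predict(output, i)          # condition on current partial output
    return output

print(imputer_decode(length=8, num_steps=4, predict=lambda out, i: str(i)))
```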

16. Multi-Agent Reinforcement Learning as a Computational Tool for Language Evolution Research: Historical Context and Future Challenges [PDF] Back to Contents
  Clément Moulin-Frier, Pierre-Yves Oudeyer
Abstract: Computational models of emergent communication in agent populations are currently gaining interest in the machine learning community due to recent advances in Multi-Agent Reinforcement Learning (MARL). Current contributions are however still relatively disconnected from the earlier theoretical and computational literature aiming at understanding how language might have emerged from a prelinguistic substance. The goal of this paper is to position recent MARL contributions within the historical context of language evolution research, as well as to extract from this theoretical and computational background a few challenges for future research.

17. How To Avoid Being Eaten By a Grue: Exploration Strategies for Text-Adventure Agents [PDF] Back to Contents
  Prithviraj Ammanabrolu, Ethan Tien, Zhaochen Luo, Mark O. Riedl
Abstract: Text-based games -- in which an agent interacts with the world through textual natural language -- present us with the problem of combinatorially-sized action-spaces. Most current reinforcement learning algorithms are not capable of effectively handling such a large number of possible actions per turn. Poor sample efficiency, consequently, results in agents that are unable to pass bottleneck states, where they are unable to proceed because they do not see the right action sequence to pass the bottleneck enough times to be sufficiently reinforced. Building on prior work using knowledge graphs in reinforcement learning, we introduce two new game state exploration strategies. We compare our exploration strategies against strong baselines on the classic text-adventure game, Zork1, where prior agents have been unable to get past a bottleneck where the agent is eaten by a Grue.

18. Interactive Natural Language-based Person Search [PDF] Back to Contents
  Vikram Shree, Wei-Lun Chao, Mark Campbell
Abstract: In this work, we consider the problem of searching people in an unconstrained environment, with natural language descriptions. Specifically, we study how to systematically design an algorithm to effectively acquire descriptions from humans. An algorithm is proposed by adapting models, used for visual and language understanding, to search a person of interest (POI) in a principled way, achieving promising results without the need to re-design another complicated model. We then investigate an iterative question-answering (QA) strategy that enables robots to request additional information about the POI's appearance from the user. To this end, we introduce a greedy algorithm to rank questions in terms of their significance, and equip the algorithm with the capability to dynamically adjust the length of human-robot interaction according to the model's uncertainty. Our approach is validated not only on benchmark datasets but on a mobile robot, moving in a dynamic and crowded environment.
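
The interaction loop the abstract outlines is greedy: at each turn, ask the remaining question with the highest significance, and stop early once the model is confident. The sketch below shows that control flow; the entropy-based stopping rule and the significance function are assumptions for illustration, since the abstract does not spell them out.

```python
# Hedged sketch of greedy question selection with uncertainty-based early stopping.
# significance() and update_beliefs() are caller-supplied stubs; the entropy
# threshold is an illustrative assumption.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def interact(questions, beliefs, significance, update_beliefs,
             max_turns=5, confidence_threshold=0.5):
    asked = []
    for _ in range(max_turns):
        if entropy(beliefs) < confidence_threshold:     # confident enough: stop asking
            break
        remaining = [q for q in questions if q not in asked]
        if not remaining:
            break
        question = max(remaining, key=lambda q: significance(q, beliefs))
        asked.append(question)
        beliefs = update_beliefs(beliefs, question)     # fold in the user's answer
    return beliefs, asked
```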
