
[arXiv Papers] Computation and Language 2020-05-12

Contents

1. SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model [PDF] Abstract
2. Multidirectional Associative Optimization of Function-Specific Word Representations [PDF] Abstract
3. A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering [PDF] Abstract
4. Reinforced Rewards Framework for Text Style Transfer [PDF] Abstract
5. Toward Better Storylines with Sentence-Level Language Models [PDF] Abstract
6. A Self-Training Method for Machine Reading Comprehension with Soft Evidence Extraction [PDF] Abstract
7. Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain [PDF] Abstract
8. A Deep Learning Approach for Automatic Detection of Fake News [PDF] Abstract
9. Towards logical negation for compositional distributional semantics [PDF] Abstract
10. Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation [PDF] Abstract
11. A SentiWordNet Strategy for Curriculum Learning in Sentiment Analysis [PDF] Abstract
12. Towards Robustifying NLI Models Against Lexical Dataset Biases [PDF] Abstract
13. CTC-synchronous Training for Monotonic Attention Model [PDF] Abstract
14. Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning [PDF] Abstract
15. From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [PDF] Abstract
16. How Context Affects Language Models' Factual Predictions [PDF] Abstract
17. Posterior Control of Blackbox Generation [PDF] Abstract
18. Article citation study: Context enhanced citation sentiment detection [PDF] Abstract
19. What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context [PDF] Abstract
20. Finding Universal Grammatical Relations in Multilingual BERT [PDF] Abstract
21. Empowering Active Learning to Jointly Optimize System and User Demands [PDF] Abstract
22. The Structured Weighted Violations MIRA [PDF] Abstract
23. Generating Pertinent and Diversified Comments with Topic-aware Pointer-Generator Networks [PDF] Abstract
24. Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation [PDF] Abstract
25. It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations [PDF] Abstract
26. Diversifying Dialogue Generation with Non-Conversational Text [PDF] Abstract
27. Generalizing Outside the Training Set: When Can Neural Networks Learn Identity Effects? [PDF] Abstract
28. LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation [PDF] Abstract
29. Probing Linguistic Systematicity [PDF] Abstract
30. Temporal Common Sense Acquisition with Minimal Supervision [PDF] Abstract
31. Adversarial Learning for Supervised and Semi-supervised Relation Extraction in Biomedical Literature [PDF] Abstract
32. ConvoKit: A Toolkit for the Analysis of Conversations [PDF] Abstract
33. Balancing Objectives in Counseling Conversations: Advancing Forwards or Looking Backwards [PDF] Abstract
34. Text-Based Ideal Points [PDF] Abstract
35. The Safari of Update Structures: Visiting the Lens and Quantum Enclosures [PDF] Abstract
36. Commonsense Evidence Generation and Injection in Reading Comprehension [PDF] Abstract
37. End-To-End Speech Synthesis Applied to Brazilian Portuguese [PDF] Abstract
38. Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition [PDF] Abstract
39. The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [PDF] Abstract
40. Knowledge Graph semantic enhancement of input data for improving AI [PDF] Abstract
41. Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis [PDF] Abstract
42. BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps [PDF] Abstract
43. Transformer-Based Language Models for Similar Text Retrieval and Ranking [PDF] Abstract
44. Incremental Learning for End-to-End Automatic Speech Recognition [PDF] Abstract

Abstracts

1. SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model [PDF] Back to contents
  Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Jianfeng Gao
Abstract: This paper presents a new method SOLOIST, which uses transfer learning to efficiently build task-oriented dialog systems at scale. We parameterize a dialog system using a Transformer-based auto-regressive language model, which subsumes different dialog modules (e.g., state tracker, dialog policy, response generator) into a single neural model. We pre-train, on large heterogeneous dialog corpora, a large-scale Transformer model which can generate dialog responses grounded in user goals and real-world knowledge for task completion. The pre-trained model can be efficiently adapted to accomplish a new dialog task with a handful of task-specific dialogs via machine teaching. Our experiments demonstrate that (i) SOLOIST creates new state-of-the-art results on two well-known benchmarks, CamRest and MultiWOZ, (ii) in the few-shot learning setting, the dialog systems developed by SOLOIST significantly outperform those by existing methods, and (iii) the use of machine teaching substantially reduces the labeling cost. We will release our code and pre-trained models for reproducible research.

2. Multidirectional Associative Optimization of Function-Specific Word Representations [PDF] Back to contents
  Daniela Gerz, Ivan Vulić, Marek Rei, Roi Reichart, Anna Korhonen
Abstract: We present a neural framework for learning associations between interrelated groups of words such as the ones found in Subject-Verb-Object (SVO) structures. Our model induces a joint function-specific word vector space, where vectors of e.g. plausible SVO compositions lie close together. The model retains information about word group membership even in the joint space, and can thereby effectively be applied to a number of tasks reasoning over the SVO structure. We show the robustness and versatility of the proposed framework by reporting state-of-the-art results on the tasks of estimating selectional preference and event similarity. The results indicate that the combinations of representations learned with our task-independent model outperform task-specific architectures from prior work, while reducing the number of parameters by up to 95%.

3. A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering [PDF] Back to contents
  Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme
Abstract: Legislation can be viewed as a body of prescriptive rules expressed in natural language. The application of legislation to facts of a case we refer to as statutory reasoning, where those facts are also expressed in natural language. Computational statutory reasoning is distinct from most existing work in machine reading, in that much of the information needed for deciding a case is declared exactly once (a law), while the information needed in much of machine reading tends to be learned through distributional language statistics. To investigate the performance of natural language understanding approaches on statutory reasoning, we introduce a dataset, together with a legal-domain text corpus. Straightforward application of machine reading models exhibits low out-of-the-box performance on our questions, whether or not they have been fine-tuned to the legal domain. We contrast this with a hand-constructed Prolog-based system, designed to fully solve the task. These experiments support a discussion of the challenges facing statutory reasoning moving forward, which we argue is an interesting real-world task that can motivate the development of models able to utilize prescriptive rules specified in natural language.

4. Reinforced Rewards Framework for Text Style Transfer [PDF] Back to contents
  Abhilasha Sancheti, Kundan Krishna, Balaji Vasan Srinivasan, Anandhavelu Natarajan
Abstract: Style transfer deals with the algorithms to transfer the stylistic properties of a piece of text into that of another while ensuring that the core content is preserved. There has been a lot of interest in the field of text style transfer due to its wide application to tailored text generation. Existing works evaluate the style transfer models based on content preservation and transfer strength. In this work, we propose a reinforcement learning based framework that directly rewards the framework on these target metrics yielding a better transfer of the target style. We show the improved performance of our proposed framework based on automatic and human evaluation on three independent tasks: wherein we transfer the style of text from formal to informal, high excitement to low excitement, modern English to Shakespearean English, and vice-versa in all the three cases. Improved performance of the proposed framework over existing state-of-the-art frameworks indicates the viability of the approach.

5. Toward Better Storylines with Sentence-Level Language Models [PDF] Back to contents
  Daphne Ippolito, David Grangier, Douglas Eck, Chris Callison-Burch
Abstract: We propose a sentence-level language model which selects the next sentence in a story from a finite set of fluent alternatives. Since it does not need to model fluency, the sentence-level language model can focus on longer range dependencies, which are crucial for multi-sentence coherence. Rather than dealing with individual words, our method treats the story so far as a list of pre-trained sentence embeddings and predicts an embedding for the next sentence, which is more efficient than predicting word embeddings. Notably this allows us to consider a large number of candidates for the next sentence during training. We demonstrate the effectiveness of our approach with state-of-the-art accuracy on the unsupervised Story Cloze task and with promising results on larger-scale next sentence prediction tasks.
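
To make the selection step concrete, here is a minimal Python sketch of ranking a finite set of fluent candidates against the story so far. The encoder is a random stand-in for the pre-trained sentence embeddings the paper uses, and cosine scoring stands in for the learned scoring model.

    import numpy as np

    def encode(sentence):
        # Stand-in for a pre-trained sentence encoder; the paper operates on
        # fixed pre-trained sentence embeddings, not random vectors.
        rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
        return rng.standard_normal(128)

    def pick_next_sentence(context_sentences, candidates):
        # Score each fluent candidate against the story-so-far and pick the best.
        # The paper learns this scorer over embeddings; cosine is a placeholder.
        ctx = np.mean([encode(s) for s in context_sentences], axis=0)
        def score(c):
            e = encode(c)
            return ctx @ e / (np.linalg.norm(ctx) * np.linalg.norm(e))
        return max(candidates, key=score)

    story = ["Mia found an old map in the attic.",
             "She traced the route down to the river."]
    options = ["The boat was already waiting at the dock.",
               "Quarterly earnings rose by four percent."]
    print(pick_next_sentence(story, options))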

6. A Self-Training Method for Machine Reading Comprehension with Soft Evidence Extraction [PDF] Back to contents
  Yilin Niu, Fangkai Jiao, Mantong Zhou, Ting Yao, Jingfang Xu, Minlie Huang
Abstract: Neural models have achieved great success on machine reading comprehension (MRC), many of which typically consist of two components: an evidence extractor and an answer predictor. The former seeks the most relevant information from a reference text, while the latter is to locate or generate answers from the extracted evidence. Despite the importance of evidence labels for training the evidence extractor, they are not cheaply accessible, particularly in many non-extractive MRC tasks such as YES/NO question answering and multi-choice MRC. To address this problem, we present a Self-Training method (STM), which supervises the evidence extractor with auto-generated evidence labels in an iterative process. At each iteration, a base MRC model is trained with golden answers and noisy evidence labels. The trained model will predict pseudo evidence labels as extra supervision in the next iteration. We evaluate STM on seven datasets over three MRC tasks. Experimental results demonstrate the improvement on existing MRC models, and we also analyze how and why such a self-training method works in MRC.

7. Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain [PDF] Back to contents
  Mohammad Amin Samadi, Mohammad Sadegh Akhondzadeh, Sayed Jalal Zahabi, Mohammad Hossein Manshaei, Zeinab Maleki, Payman Adibi
Abstract: Word embeddings have found their way into a wide range of natural language processing tasks including those in the biomedical domain. While these vector representations successfully capture semantic and syntactic word relations, hidden patterns and trends in the data, they fail to offer interpretability. Interpretability is a key means to justification which is an integral part when it comes to biomedical applications. We present an inclusive study on interpretability of word embeddings in the medical domain, focusing on the role of sparse methods. Qualitative and quantitative measurements and metrics for interpretability of word vector representations are provided. For the quantitative evaluation, we introduce an extensive categorized dataset that can be used to quantify interpretability based on category theory. Intrinsic and extrinsic evaluation of the studied methods are also presented. As for the latter, we propose datasets which can be utilized for effective extrinsic evaluation of word vectors in the biomedical domain. Based on our experiments, it is seen that sparse word vectors show far more interpretability while preserving the performance of their original vectors in downstream tasks.
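
Sparse coding over a learned dictionary is one standard route to sparse word vectors of the kind evaluated here; the sketch below uses scikit-learn on random placeholder embeddings, and whether this matches the specific sparse methods studied in the paper is an assumption.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    # Random placeholder for dense word embeddings (rows = words).
    rng = np.random.default_rng(0)
    dense = rng.standard_normal((200, 50))

    # Re-express each vector over an overcomplete dictionary with an L1 penalty,
    # so every word activates only a few (hopefully interpretable) dimensions.
    coder = DictionaryLearning(n_components=100, alpha=1.0, max_iter=50,
                               transform_algorithm="lasso_lars", transform_alpha=1.0,
                               random_state=0)
    sparse = coder.fit_transform(dense)

    active = (np.abs(sparse) > 1e-8).sum(axis=1).mean()
    print(f"average active dimensions per word: {active:.1f} of {sparse.shape[1]}")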

8. A Deep Learning Approach for Automatic Detection of Fake News [PDF] Back to contents
  Tanik Saikh, Arkadipta De, Asif Ekbal, Pushpak Bhattacharyya
Abstract: Fake news detection is a very prominent and essential task in the field of journalism. This challenging problem is seen so far in the field of politics, but it could be even more challenging when it is to be determined in the multi-domain platform. In this paper, we propose two effective models based on deep learning for solving fake news detection problem in online news contents of multiple domains. We evaluate our techniques on the two recently released datasets, namely FakeNews AMT and Celebrity for fake news detection. The proposed systems yield encouraging performance, outperforming the current handcrafted feature engineering based state-of-the-art system with a significant margin of 3.08% and 9.3% by the two models, respectively. In order to exploit the datasets, available for the related tasks, we perform cross-domain analysis (i.e. model trained on FakeNews AMT and tested on Celebrity and vice versa) to explore the applicability of our systems across the domains.

9. Towards logical negation for compositional distributional semantics [PDF] Back to contents
  Martha Lewis
Abstract: The categorical compositional distributional model of meaning gives the composition of words into phrases and sentences pride of place. However, it has so far lacked a model of logical negation. This paper gives some steps towards providing this operator, modelling it as a version of projection onto the subspace orthogonal to a word. We give a small demonstration of the operator's performance in a sentence entailment task.
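
The projection described above can be stated directly in a few lines; in this sketch the vectors are random placeholders for trained embeddings, and the negation operator removes the component of a vector along the word being negated.

    import numpy as np

    def negate(vector, word_vector):
        # Project onto the subspace orthogonal to word_vector:
        # v_neg = v - (v . w_hat) * w_hat
        w_hat = word_vector / np.linalg.norm(word_vector)
        return vector - (vector @ w_hat) * w_hat

    rng = np.random.default_rng(0)
    sentence_vec = rng.standard_normal(50)  # placeholder sentence embedding
    word_vec = rng.standard_normal(50)      # placeholder embedding of the negated word

    negated = negate(sentence_vec, word_vec)
    print(np.allclose(negated @ word_vec, 0.0))  # True: no component along the word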

10. Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation [PDF] Back to contents
  Aditya Siddhant, Ankur Bapna, Yuan Cao, Orhan Firat, Mia Chen, Sneha Kudugunta, Naveen Arivazhagan, Yonghui Wu
Abstract: Over the last few years two promising research directions in low-resource neural machine translation (NMT) have emerged. The first focuses on utilizing high-resource languages to improve the quality of low-resource languages via multilingual NMT. The second direction employs monolingual data with self-supervision to pre-train translation models, followed by fine-tuning on small amounts of supervised data. In this work, we join these two lines of research and demonstrate the efficacy of monolingual data with self-supervision in multilingual NMT. We offer three major results: (i) Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models. (ii) Self-supervision improves zero-shot translation quality in multilingual models. (iii) Leveraging monolingual data with self-supervision provides a viable path towards adding new languages to multilingual models, getting up to 33 BLEU on ro-en translation without any parallel data or back-translation.
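
One way to picture the self-supervised part is a denoising pair built from a single monolingual sentence, in the spirit of MASS/BART-style span masking. The mask token, span length, and example sentence below are illustrative assumptions; the sketch only shows how such training pairs can be constructed.

    import random

    def make_denoising_pair(sentence, mask_token="[MASK]", span_frac=0.5,
                            rng=random.Random(0)):
        # Corrupt a monolingual sentence by masking a contiguous span; training
        # the multilingual NMT model to restore the span lets it learn from
        # monolingual text alongside the parallel data.
        tokens = sentence.split()
        span_len = max(1, int(len(tokens) * span_frac))
        start = rng.randint(0, len(tokens) - span_len)
        target = tokens[start:start + span_len]
        source = tokens[:start] + [mask_token] * span_len + tokens[start + span_len:]
        return " ".join(source), " ".join(target)

    src, tgt = make_denoising_pair("acest articol descrie o metoda de traducere automata")
    print("source:", src)
    print("target:", tgt)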

11. A SentiWordNet Strategy for Curriculum Learning in Sentiment Analysis [PDF] Back to contents
  Vijjini Anvesh Rao, Kaveri Anuranjana, Radhika Mamidi
Abstract: Curriculum Learning (CL) is the idea that learning on a training set sequenced or ordered in a manner where samples range from easy to difficult, results in an increment in performance over otherwise random ordering. The idea parallels cognitive science's theory of how human brains learn, and that learning a difficult task can be made easier by phrasing it as a sequence of easy to difficult tasks. This idea has gained a lot of traction in machine learning and image processing for a while and recently in Natural Language Processing (NLP). In this paper, we apply the ideas of curriculum learning, driven by SentiWordNet in a sentiment analysis setting. In this setting, given a text segment, our aim is to extract its sentiment or polarity. SentiWordNet is a lexical resource with sentiment polarity annotations. By comparing performance with other curriculum strategies and with no curriculum, the effectiveness of the proposed strategy is presented. Convolutional, Recurrence, and Attention-based architectures are employed to assess this improvement. The models are evaluated on a standard sentiment dataset, Stanford Sentiment Treebank.
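
As an illustration of the general idea, the sketch below orders training sentences by one plausible SentiWordNet-based difficulty score (weak or ambiguous polarity treated as harder); the exact scoring and ordering strategy used in the paper may differ. It assumes NLTK with the wordnet and sentiwordnet data downloaded.

    from nltk.corpus import sentiwordnet as swn
    # Requires: nltk.download('wordnet'); nltk.download('sentiwordnet')

    def polarity_strength(word):
        # Average |pos - neg| over the word's SentiWordNet synsets (0 if unknown).
        synsets = list(swn.senti_synsets(word))
        if not synsets:
            return 0.0
        return sum(abs(s.pos_score() - s.neg_score()) for s in synsets) / len(synsets)

    def difficulty(sentence):
        # Strong average polarity -> easy; weak or ambiguous polarity -> hard.
        words = sentence.lower().split()
        return -sum(polarity_strength(w) for w in words) / max(len(words), 1)

    train = ["an excellent , wonderful film", "the plot exists", "a dreadful bore"]
    curriculum = sorted(train, key=difficulty)  # easy-to-hard ordering
    print(curriculum)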

12. Towards Robustifying NLI Models Against Lexical Dataset Biases [PDF] Back to contents
  Xiang Zhou, Mohit Bansal
Abstract: While deep learning models are making fast progress on the task of Natural Language Inference, recent studies have also shown that these models achieve high accuracy by exploiting several dataset biases, and without deep understanding of the language semantics. Using contradiction-word bias and word-overlapping bias as our two bias examples, this paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases. First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method. Next, we also compare two ways of directly debiasing the model without knowing what the dataset biases are in advance. The first approach aims to remove the label bias at the embedding level. The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features by forcing orthogonality between these two sub-models. We performed evaluations on new balanced datasets extracted from the original MNLI dataset as well as the NLI stress tests, and show that the orthogonality approach is better at debiasing the model while maintaining competitive overall accuracy. Our code and data are available at: this https URL
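
A sketch of the core ingredient of the model-level idea: a penalty that discourages the main model's representation from aligning with the bag-of-words sub-model's representation. Squared cosine similarity is only one simple way to encourage orthogonality and is an assumption here, as is the premise that each sub-model exposes a feature vector per example; the paper's full objective also includes the usual classification losses.

    import numpy as np

    def orthogonality_penalty(main_features, bow_features, eps=1e-8):
        # Squared cosine similarity between the two sub-models' representations;
        # adding this to the task loss pushes the main model away from features
        # that merely mirror the lexical-bias-prone bag-of-words sub-model.
        cos = (main_features @ bow_features) / (
            np.linalg.norm(main_features) * np.linalg.norm(bow_features) + eps)
        return cos ** 2

    rng = np.random.default_rng(0)
    print(orthogonality_penalty(rng.standard_normal(256), rng.standard_normal(256)))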

13. CTC-synchronous Training for Monotonic Attention Model [PDF] Back to contents
  Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
Abstract: Monotonic chunkwise attention (MoChA) has been studied for the online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework. In contrast to connectionist temporal classification (CTC), backward probabilities cannot be leveraged in the alignment marginalization process during training due to left-to-right dependency in the decoder. This results in the error propagation of alignments to subsequent token generation. To address this problem, we propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments. Reference CTC alignments are extracted from a CTC branch sharing the same encoder. The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments. Experimental evaluations of the TEDLIUM release-2 and Librispeech corpora show that the proposed method significantly improves recognition, especially for long utterances. We also show that CTC-ST can bring out the full potential of SpecAugment for MoChA.

14. Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning [PDF] Back to contents
  Longteng Guo, Jing Liu, Xinxin Zhu, Xingjian He, Jie Jiang, Hanqing Lu
Abstract: Most image captioning models are autoregressive, i.e. they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up the inference time by generating all words in parallel. Typically, these models use the word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to consider the sentence-level consistency, thus resulting in inferior generation quality of these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system where positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. Besides, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on MSCOCO image captioning benchmark show that our NAIC model achieves a performance comparable to state-of-the-art autoregressive models, while brings 13.9x decoding speedup.

15. From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [PDF] Back to contents
  Shen Gao, Xiuying Chen, Zhaochun Ren, Dongyan Zhao, Rui Yan
Abstract: Text summarization is the research area aiming at creating a short and condensed version of the original document, which conveys the main idea of the document in a few words. This research topic has started to attract the attention of a large community of researchers, and it is nowadays counted as one of the most promising research areas. In general, text summarization algorithms aim at using a plain text document as input and then output a summary. However, in real-world applications, most of the data is not in a plain text format. Instead, there is much manifold information to be summarized, such as the summary for a web page based on a query in the search engine, extremely long documents (e.g., academic papers), dialog history and so on. In this paper, we focus on the survey of these new summarization tasks and approaches in real-world applications.

16. How Context Affects Language Models' Factual Predictions [PDF] Back to contents
  Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
Abstract: When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for zero-shot cloze-style question answering. However, storing factual knowledge in a fixed number of weights of a language model clearly has limitations. Previous approaches have successfully provided access to information outside the model weights using supervised architectures that combine an information retrieval system with a machine reading component. In this paper, we go a step further and integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way. We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline. Furthermore, processing query and context with different segment tokens allows BERT to utilize its Next Sentence Prediction pre-trained classifier to determine whether the context is relevant or not, substantially improving BERT's zero-shot cloze-style question-answering performance and making its predictions robust to noisy contexts.
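
A minimal sketch of the unsupervised recipe: prepend a retrieved passage to a cloze-style query and let a pre-trained masked language model fill the blank. The passage is hard-coded here (in the paper it comes from a retrieval system, and BERT's next-sentence head additionally filters irrelevant contexts), and the model name is just a common choice, not necessarily the one used in the paper.

    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    query = "The theory of relativity was developed by [MASK]."
    context = ("Albert Einstein published the special theory of relativity "
               "in 1905 while working at the patent office in Bern.")

    # Compare the top prediction with and without the retrieved context.
    print("no context :", fill(query)[0]["token_str"])
    print("with context:", fill(context + " " + query)[0]["token_str"])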

17. Posterior Control of Blackbox Generation [PDF] Back to contents
  Xiang Lisa Li, Alexander M. Rush
Abstract: Text generation often requires high-precision output that obeys task-specific rules. This fine-grained control is difficult to enforce with off-the-shelf deep learning models. In this work, we consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach. Under this formulation, task-specific knowledge can be encoded through a range of rich, posterior constraints that are effectively trained into the model. This approach allows users to ground internal model decisions based on prior knowledge, without sacrificing the representational power of neural generative models. Experiments consider applications of this approach for text generation. We find that this method improves over standard benchmarks, while also providing fine-grained control.

18. Article citation study: Context enhanced citation sentiment detection [PDF] Back to contents
  Vishal Vyas, Kumar Ravi, Vadlamani Ravi, V.Uma, Srirangaraj Setlur, Venu Govindaraju
Abstract: Citation sentiment analysis is one of the little-studied tasks for scientometric analysis. For citation analysis, we developed eight datasets comprising citation sentences, which are manually annotated by us into three sentiment polarities viz. positive, negative, and neutral. Among eight datasets, three were developed by considering the whole context of citations. Furthermore, we proposed an ensembled feature engineering method comprising word embeddings obtained for texts, parts-of-speech tags, and dependency relationships together. Ensembled features were considered as input to deep learning based approaches for citation sentiment classification, which is in turn compared with the Bag-of-Words approach. Experimental results demonstrate that deep learning is useful for a higher number of samples, whereas the support vector machine is the winner for a smaller number of samples. Moreover, context-based samples are proved to be more effective than context-less samples for citation sentiment analysis.

19. What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context [PDF] Back to contents
  Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov
Abstract: Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but an increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online, has made it impossible to fact-check every single suspicious claim, either manually or automatically. Alternatively, we can profile entire news outlets and look for those that are likely to publish fake or biased content. This approach makes it possible to detect likely "fake news" the moment they are published, by simply checking the reliability of their source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context. Here, we study the impact of both, namely (i) what was written (i.e., what was published by the target medium, and how it describes itself on Twitter) vs. (ii) who read it (i.e., analyzing the readers of the target medium on Facebook, Twitter, and YouTube). We further study (iii) what was written about the target medium on Wikipedia. The evaluation results show that what was written matters most, and that putting all information sources together yields huge improvements over the current state-of-the-art.

20. Finding Universal Grammatical Relations in Multilingual BERT [PDF] Back to contents
  Ethan A. Chi, John Hewitt, Christopher D. Manning
Abstract: Recent work has found evidence that Multilingual BERT (mBERT), a transformer-based multilingual masked language model, is capable of zero-shot cross-lingual transfer, suggesting that some aspects of its representations are shared cross-lingually. To better understand this overlap, we extend recent work on finding syntactic trees in neural networks' internal representations to the multilingual setting. We show that subspaces of mBERT representations recover syntactic tree distances in languages other than English, and that these subspaces are approximately shared across languages. Motivated by these results, we present an unsupervised analysis method that provides evidence mBERT learns representations of syntactic dependency labels, in the form of clusters which largely agree with the Universal Dependencies taxonomy. This evidence suggests that even without explicit supervision, multilingual masked language models learn certain linguistic universals.

21. Empowering Active Learning to Jointly Optimize System and User Demands [PDF] Back to contents
  Ji-Ung Lee, Christian M. Meyer, Iryna Gurevych
Abstract: Existing approaches to active learning maximize the system performance by sampling unlabeled instances for annotation that yield the most efficient training. However, when active learning is integrated with an end-user application, this can lead to frustration for participating users, as they spend time labeling instances that they would not otherwise be interested in reading. In this paper, we propose a new active learning approach that jointly optimizes the seemingly counteracting objectives of the active learning system (training efficiently) and the user (receiving useful instances). We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user, while the users should receive only exercises that match their skills. We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.

22. The Structured Weighted Violations MIRA [PDF] Back to contents
  Dor Ringel, Rotem Dror, Roi Reichart
Abstract: We present the Structured Weighted Violation MIRA (SWVM), a new structured prediction algorithm that is based on an hybridization between MIRA (Crammer and Singer, 2003) and the structured weighted violations perceptron (SWVP) (Dror and Reichart, 2016). We demonstrate that the concepts developed in (Dror and Reichart, 2016) combined with a powerful structured prediction algorithm can improve performance on sequence labeling tasks. In experiments with syntactic chunking and named entity recognition (NER), the new algorithm substantially outperforms the original MIRA as well as the original structured perceptron and SWVP. Our code is available at this https URL.

23. Generating Pertinent and Diversified Comments with Topic-aware Pointer-Generator Networks [PDF] Back to contents
  Junheng Huang, Lu Pan, Kang Xu, Weihua Peng, Fayuan Li
Abstract: Comment generation, a new and challenging task in Natural Language Generation (NLG), attracts a lot of attention in recent years. However, comments generated by previous work tend to lack pertinence and diversity. In this paper, we propose a novel generation model based on Topic-aware Pointer-Generator Networks (TPGN), which can utilize the topic information hidden in the articles to guide the generation of pertinent and diversified comments. Firstly, we design a keyword-level and topic-level encoder attention mechanism to capture topic information in the articles. Next, we integrate the topic information into pointer-generator networks to guide comment generation. Experiments on a large scale of comment generation dataset show that our model produces the valuable comments and outperforms competitive baseline models significantly.

24. Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation [PDF] Back to contents
  Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang
Abstract: Dialogue policy optimization often obtains feedback until task completion in task-oriented dialogue systems. This is insufficient for training intermediate dialogue turns since supervision signals (or rewards) are only provided at the end of dialogues. To address this issue, reward learning has been introduced to learn from state-action pairs of an optimal policy to provide turn-by-turn rewards. This approach requires complete state-action annotations of human-to-human dialogues (i.e., expert demonstrations), which is labor intensive. To overcome this limitation, we propose a novel reward learning approach for semi-supervised policy learning. The proposed approach learns a dynamics model as the reward function which models dialogue progress (i.e., state-action sequences) based on expert demonstrations, either with or without annotations. The dynamics model computes rewards by predicting whether the dialogue progress is consistent with expert demonstrations. We further propose to learn action embeddings for a better generalization of the reward function. The proposed approach outperforms competitive policy learning baselines on MultiWOZ, a benchmark multi-domain dataset.

25. It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations [PDF] Back to contents
  Samson Tan, Shafiq Joty, Min-Yen Kan, Richard Socher
Abstract: Training on only perfect Standard English corpora predisposes pre-trained neural networks to discriminate against minorities from non-standard linguistic backgrounds (e.g., African American Vernacular English, Colloquial Singapore English, etc.). We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples that expose these biases in popular NLP models, e.g., BERT and Transformer, and show that adversarially fine-tuning them for a single epoch significantly improves robustness without sacrificing performance on clean data.
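
A toy version of an inflectional perturbation is shown below: one word is swapped for another inflection of the same lemma while the rest of the sentence is untouched. The hand-written inflection table is a hypothetical stand-in for a real morphological inflector, and the paper searches over such perturbations adversarially rather than sampling them at random.

    import random

    # Hypothetical mini inflection table; a real system would use a morphological
    # inflector covering the full vocabulary.
    INFLECTIONS = {
        "plays": ["play", "played", "playing"],
        "play": ["plays", "played", "playing"],
        "watches": ["watch", "watched", "watching"],
        "goes": ["go", "went", "going"],
    }

    def perturb(sentence, rng=random.Random(0)):
        tokens = sentence.split()
        swappable = [i for i, t in enumerate(tokens) if t in INFLECTIONS]
        if not swappable:
            return sentence
        i = rng.choice(swappable)
        tokens[i] = rng.choice(INFLECTIONS[tokens[i]])  # change inflection only
        return " ".join(tokens)

    print(perturb("she plays the violin and watches the stars"))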

26. Diversifying Dialogue Generation with Non-Conversational Text [PDF] Back to contents
  Hui Su, Xiaoyu Shen, Sanqiang Zhao, Xiao Zhou, Pengwei Hu, Randy Zhong, Cheng Niu, Jie Zhou
Abstract: Neural network-based sequence-to-sequence (seq2seq) models strongly suffer from the low-diversity problem when it comes to open-domain dialogue generation. As bland and generic utterances usually dominate the frequency distribution in our daily chitchat, avoiding them to generate more interesting responses requires complex data filtering, sampling techniques or modifying the training objective. In this paper, we propose a new perspective to diversify dialogue generation by leveraging non-conversational text. Compared with bilateral conversations, non-conversational text are easier to obtain, more diverse and cover a much broader range of topics. We collect a large-scale non-conversational corpus from multi sources including forum comments, idioms and book snippets. We further present a training paradigm to effectively incorporate these text via iterative back translation. The resulting model is tested on two conversational datasets and is shown to produce significantly more diverse responses without sacrificing the relevance with context.

27. Generalizing Outside the Training Set: When Can Neural Networks Learn Identity Effects? [PDF] Back to contents
  Simone Brugiapaglia, Matthew Liu, Paul Tupper
Abstract: Often in language and other areas of cognition, whether two components of an object are identical or not determines whether it is well formed. We call such constraints identity effects. When developing a system to learn well-formedness from examples, it is easy enough to build in an identity effect. But can identity effects be learned from the data without explicit guidance? We provide a simple framework in which we can rigorously prove that algorithms satisfying simple criteria cannot make the correct inference. We then show that a broad class of algorithms including deep neural networks with standard architecture and training with backpropagation satisfy our criteria, dependent on the encoding of inputs. Finally, we demonstrate our theory with computational experiments in which we explore the effect of different input encodings on the ability of algorithms to generalize to novel inputs.
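
The failure mode is easy to reproduce at a toy scale: train a small network to label two-symbol strings as well formed exactly when the symbols match, using one-hot inputs, and then test on symbols never seen in training. The setup below is an illustrative assumption rather than the paper's exact experiment.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    letters = list("ABCDEFG")
    onehot = {c: np.eye(len(letters))[i] for i, c in enumerate(letters)}

    def encode(pair):
        # Concatenated one-hot codes of the two symbols.
        return np.concatenate([onehot[pair[0]], onehot[pair[1]]])

    # "Well formed" = the two symbols are identical. Train on pairs over A-E only;
    # the identity rule must then be applied to the unseen symbols F and G.
    train_letters, test_letters = letters[:5], letters[5:]
    X_train = np.array([encode(a + b) for a in train_letters for b in train_letters])
    y_train = np.array([int(a == b) for a in train_letters for b in train_letters])
    X_test = np.array([encode(a + b) for a in test_letters for b in test_letters])
    y_test = np.array([int(a == b) for a in test_letters for b in test_letters])

    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=5000, random_state=0)
    clf.fit(X_train, y_train)
    print("seen-symbol accuracy: ", clf.score(X_train, y_train))
    print("novel-symbol accuracy:", clf.score(X_test, y_test))  # typically near chance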

28. LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation [PDF] Back to contents
  Gustavo Aguilar, Sudipta Kar, Thamar Solorio
Abstract: Recent trends in NLP research have raised an interest in linguistic code-switching (CS); modern approaches have been proposed to solve a wide range of NLP tasks on multiple language pairs. Unfortunately, these proposed methods are hardly generalizable to different code-switched languages. In addition, it is unclear whether a model architecture is applicable for a different task while still being compatible with the code-switching setting. This is mainly because of the lack of a centralized benchmark and the sparse corpora that researchers employ based on their specific needs and interests. To facilitate research in this direction, we propose a centralized benchmark for Linguistic Code-switching Evaluation (LinCE) that combines ten corpora covering four different code-switched language pairs (i.e., Spanish-English, Nepali-English, Hindi-English, and Modern Standard Arabic-Egyptian Arabic) and four tasks (i.e., language identification, named entity recognition, part-of-speech tagging, and sentiment analysis). As part of the benchmark centralization effort, we provide an online platform at this http URL, where researchers can submit their results while comparing with others in real-time. In addition, we provide the scores of different popular models, including LSTM, ELMo, and multilingual BERT so that the NLP community can compare against state-of-the-art systems. LinCE is a continuous effort, and we will expand it with more low-resource languages and tasks.

29. Probing Linguistic Systematicity [PDF] Back to contents
  Emily Goodwin, Koustuv Sinha, Timothy J. O'Donnell
Abstract: Recently, there has been much interest in the question of whether deep natural language understanding models exhibit systematicity; generalizing such that units like words make consistent contributions to the meaning of the sentences in which they appear. There is accumulating evidence that neural models often generalize non-systematically. We examined the notion of systematicity from a linguistic perspective, defining a set of probes and a set of metrics to measure systematic behaviour. We also identified ways in which network architectures can generalize non-systematically, and discuss why such forms of generalization may be unsatisfying. As a case study, we performed a series of experiments in the setting of natural language inference (NLI), demonstrating that some NLU systems achieve high overall performance despite being non-systematic.

30. Temporal Common Sense Acquisition with Minimal Supervision [PDF] Back to contents
  Ben Zhou, Qiang Ning, Daniel Khashabi, Dan Roth
Abstract: Temporal common sense (e.g., duration and frequency of events) is crucial for understanding natural language. However, its acquisition is challenging, partly because such information is often not expressed explicitly in text, and human annotation on such concepts is costly. This work proposes a novel sequence modeling approach that exploits explicit and implicit mentions of temporal common sense, extracted from a large corpus, to build TACOLM, a temporal common sense language model. Our method is shown to give quality predictions of various dimensions of temporal common sense (on UDST and a newly collected dataset from RealNews). It also produces representations of events for relevant tasks such as duration comparison, parent-child relations, event coreference and temporal QA (on TimeBank, HiEVE and MCTACO) that are better than using the standard BERT. Thus, it will be an important component of temporal NLP.

31. Adversarial Learning for Supervised and Semi-supervised Relation Extraction in Biomedical Literature [PDF] Back to contents
  Peng Su, K. Vijay-Shanker
Abstract: Adversarial training is a technique of improving model performance by involving adversarial examples in the training process. In this paper, we investigate adversarial training with multiple adversarial examples to benefit the relation extraction task. We also apply adversarial training technique in semi-supervised scenarios to utilize unlabeled data. The evaluation results on protein-protein interaction and protein subcellular localization task illustrate adversarial training provides improvement on the supervised model, and is also effective on involving unlabeled data in the semi-supervised training case. In addition, our method achieves state-of-the-art performance on two benchmarking datasets.

32. ConvoKit: A Toolkit for the Analysis of Conversations [PDF] Back to contents
  Jonathan P. Chang, Caleb Chiam, Liye Fu, Andrew Z. Wang, Justine Zhang, Cristian Danescu-Niculescu-Mizil
Abstract: This paper describes the design and functionality of ConvoKit, an open-source toolkit for analyzing conversations and the social interactions embedded within. ConvoKit provides an unified framework for representing and manipulating conversational data, as well as a large and diverse collection of conversational datasets. By providing an intuitive interface for exploring and interacting with conversational data, this toolkit lowers the technical barriers for the broad adoption of computational methods for conversational analysis.
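
Since the toolkit itself is the contribution, a short usage sketch may help. It follows ConvoKit's documented quickstart (pip install convokit), and the corpus name is one of the datasets the toolkit distributes; exact method and attribute names may vary across toolkit versions.

    from convokit import Corpus, download

    # Load one of the conversational datasets distributed with the toolkit.
    corpus = Corpus(filename=download("movie-corpus"))
    corpus.print_summary_stats()

    # Walk a single conversation, utterance by utterance.
    conversation = next(corpus.iter_conversations())
    for utterance in conversation.iter_utterances():
        print(utterance.speaker.id, ":", utterance.text[:60])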

33. Balancing Objectives in Counseling Conversations: Advancing Forwards or Looking Backwards [PDF] Back to contents
  Justine Zhang, Cristian Danescu-Niculescu-Mizil
Abstract: Throughout a conversation, participants make choices that can orient the flow of the interaction. Such choices are particularly salient in the consequential domain of crisis counseling, where a difficulty for counselors is balancing between two key objectives: advancing the conversation towards a resolution, and empathetically addressing the crisis situation. In this work, we develop an unsupervised methodology to quantify how counselors manage this balance. Our main intuition is that if an utterance can only receive a narrow range of appropriate replies, then its likely aim is to advance the conversation forwards, towards a target within that range. Likewise, an utterance that can only appropriately follow a narrow range of possible utterances is likely aimed backwards at addressing a specific situation within that range. By applying this intuition, we can map each utterance to a continuous orientation axis that captures the degree to which it is intended to direct the flow of the conversation forwards or backwards. This unsupervised method allows us to characterize counselor behaviors in a large dataset of crisis counseling conversations, where we show that known counseling strategies intuitively align with this axis. We also illustrate how our measure can be indicative of a conversation's progress, as well as its effectiveness.

34. Text-Based Ideal Points [PDF] Back to contents
  Keyon Vafa, Suresh Naidu, David M. Blei
Abstract: Ideal point models analyze lawmakers' votes to quantify their political positions, or ideal points. But votes are not the only way to express a political position. Lawmakers also give speeches, release press statements, and post tweets. In this paper, we introduce the text-based ideal point model (TBIP), an unsupervised probabilistic topic model that analyzes texts to quantify the political positions of its authors. We demonstrate the TBIP with two types of politicized text data: U.S. Senate speeches and senator tweets. Though the model does not analyze their votes or political affiliations, the TBIP separates lawmakers by party, learns interpretable politicized topics, and infers ideal points close to the classical vote-based ideal points. One benefit of analyzing texts, as opposed to votes, is that the TBIP can estimate ideal points of anyone who authors political texts, including non-voting actors. To this end, we use it to study tweets from the 2020 Democratic presidential candidates. Using only the texts of their tweets, it identifies them along an interpretable progressive-to-moderate spectrum.

35. The Safari of Update Structures: Visiting the Lens and Quantum Enclosures [PDF] Back to contents
  Matthew Wilson, James Hefford, Guillaume Boisseau, Vincent Wang
Abstract: We build upon our recently introduced concept of an update structure to show that such structures are a generalisation of very-well-behaved lenses, that is, there is a bijection between a strict subset of update structures and vwb lenses in cartesian categories. We then begin to investigate the zoo of possible update structures. We show that update structures survive decoherence and are sufficiently general to capture quantum observables, pinpointing the additional assumptions required to make the two coincide. In doing so, we shift the focus from dagger-special commutative Frobenius algebras to interacting (co)magma (co)module pairs, showing that the algebraic properties of the (co)multiplication arise from the module-comodule interaction, rather than direct assumptions about the magma-comagma pair. Thus this work is of foundational interest as update structures form a strictly more general class of algebraic objects, the taming of which promises to illuminate novel relationships between separately studied mathematical structures.

36. Commonsense Evidence Generation and Injection in Reading Comprehension [PDF] Back to contents
  Ye Liu, Tao Yang, Zeyu You, Wei Fan, Philip S. Yu
Abstract: Humans tackle reading comprehension not only based on the given context itself but often also rely on commonsense beyond it. To empower the machine with commonsense reasoning, in this paper, we propose a Commonsense Evidence Generation and Injection framework for reading comprehension, named CEGI. The framework injects two kinds of auxiliary commonsense evidence into comprehensive reading to equip the machine with the ability of rational thinking. Specifically, we build two evidence generators: the first generator aims to generate textual evidence via a language model; the other generator aims to extract factual evidence (automatically aligned text-triples) from a commonsense knowledge graph after graph completion. The generated evidence incorporates contextual commonsense and serves as an additional input to the model. Thereafter, we propose a deep contextual encoder to extract semantic relationships among the paragraph, question, option, and evidence. Finally, we employ a capsule network to extract different linguistic units (word and phrase) from the relations, and dynamically predict the optimal option based on the extracted units. Experiments on the CosmosQA dataset demonstrate that the proposed CEGI model outperforms the current state-of-the-art approaches and achieves 83.6% accuracy on the leaderboard.

37. End-To-End Speech Synthesis Applied to Brazilian Portuguese [PDF] Back to contents
  Edresson Casanova, Arnaldo Candido Junior, Frederico Santos de Oliveira, Christopher Shulby, João Paulo Teixeira, Moacir Antonelli Ponti, Sandra Maria Aluisio
Abstract: Voice synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. Voice provides a natural way for human-computer interaction. However, not all languages are at the same level in terms of available resources and systems for voice synthesis. This work consists of the creation of publicly available resources for the Brazilian Portuguese language in the form of a dataset and deep learning models for end-to-end voice synthesis. The dataset has 10.5 hours from a single speaker. We investigated three different architectures to perform end-to-end speech synthesis: Tacotron 1, DCTTS and Mozilla TTS. We also analysed the performance of models according to different vocoders (RTISI-LA, WaveRNN and Universal WaveRNN), phonetic transcriptions usage, transfer learning (from English) and denoising. In the proposed scenario, a model based on Mozilla TTS and the RTISI-LA vocoder presented the best performance, achieving a 4.03 MOS value. We also verified that transfer learning, phonetic transcriptions and denoising are useful to train the models over the presented dataset. The obtained results are comparable to related works covering English, even using a smaller dataset.

38. Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition [PDF] 返回目录
  Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
Abstract: Although attention-based end-to-end models have achieved promising performance in speech recognition, the multi-pass forward computation of beam search increases inference time, which limits their practical application. To address this issue, we propose a non-autoregressive end-to-end speech recognition system called LASO (listen attentively, and spell once). Owing to the non-autoregressive property, LASO predicts each textual token in the sequence without depending on the other tokens. Without beam search, the one-pass forward propagation greatly reduces LASO's inference time. And because the model is built on an attention-based feedforward structure, the computation can be parallelized efficiently. We conduct experiments on the publicly available Chinese dataset AISHELL-1. LASO achieves a character error rate of 6.4%, outperforming the state-of-the-art autoregressive Transformer model (6.7%). The average inference latency is 21 ms, which is 1/50 of that of the autoregressive Transformer model.
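The practical difference from beam search can be sketched in a few lines. Below, `model` is a hypothetical attention-based feedforward network that maps acoustic features to per-position token logits; a single forward pass followed by an argmax at every position produces the whole sentence at once, which is why latency does not grow with output length. This is a generic illustration of non-autoregressive decoding, not the LASO implementation.

```python
# Minimal sketch of one-pass non-autoregressive decoding (assumed model interface).
import torch

@torch.no_grad()
def decode_non_autoregressive(model, features, max_len=60, pad_id=0):
    # One forward pass predicts every output position in parallel; there is no
    # token-by-token loop and no beam search.
    logits = model(features, num_positions=max_len)   # hypothetical: (batch, max_len, vocab)
    tokens = logits.argmax(dim=-1)                    # pick the best token at every position
    return [seq[seq != pad_id].tolist() for seq in tokens]
```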

39. The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [PDF] 返回目录
  Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine
Abstract: This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.

40. Knowledge Graph semantic enhancement of input data for improving AI [PDF] 返回目录
  Shreyansh Bhatt, Amit Sheth, Valerie Shalin, Jinjin Zhao
Abstract: Intelligent systems designed using machine learning algorithms require a large amount of labeled data. Background knowledge provides complementary, real-world factual information that can augment the limited labeled data used to train a machine learning algorithm. The term Knowledge Graph (KG) is in vogue because, for many practical applications, it is convenient and useful to organize this background knowledge in the form of a graph. Recent academic research and deployed industrial intelligent systems have shown promising performance for machine learning algorithms that combine training data with a knowledge graph. In this article, we discuss the use of relevant KGs to enhance the input data for two applications that use machine learning -- recommendation and community detection. The KG improves both accuracy and explainability.
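A minimal sketch of what semantic enhancement of input data can look like in practice (our illustration, not code from the article): an item's feature vector is extended with an aggregate embedding of the KG entities linked to it before being fed to the downstream learner. The entity names and embeddings below are made up.

```python
# Illustrative KG-based feature augmentation for a downstream ML model.
import numpy as np

def kg_enhanced_features(item_features, linked_entities, entity_embeddings):
    """Concatenate an item's own features with the mean embedding of its linked KG entities."""
    neighbours = [entity_embeddings[e] for e in linked_entities if e in entity_embeddings]
    if neighbours:
        kg_vector = np.mean(neighbours, axis=0)
    else:
        dim = len(next(iter(entity_embeddings.values())))
        kg_vector = np.zeros(dim)                      # no KG context available for this item
    return np.concatenate([item_features, kg_vector])

# Hypothetical usage with two linked entities.
embeddings = {"dbr:Inception": np.array([0.2, 0.7]),
              "dbr:Christopher_Nolan": np.array([0.1, 0.9])}
x = kg_enhanced_features(np.array([1.0, 0.0, 3.5]),
                         ["dbr:Inception", "dbr:Christopher_Nolan"], embeddings)
```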

41. Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis [PDF] 返回目录
  Thomas Drugman, Thierry Dutoit
Abstract: It was recently shown that the complex cepstrum can be used effectively for glottal flow estimation by separating the causal and anticausal components of speech. To guarantee a correct estimation, certain constraints on the analysis window have been derived; among these, the window has to be synchronized on a Glottal Closure Instant. This paper proposes an extension of the complex cepstrum-based decomposition that incorporates a chirp analysis. The resulting method is shown to give a reliable estimate of the glottal flow wherever the window is located. The technique is therefore suited to integration into standard speech processing systems, which generally operate asynchronously. Its potential for automatic voice quality analysis is also highlighted.
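For intuition, the plain (non-chirp) decomposition can be sketched with NumPy as below, assuming a GCI-centred, properly windowed frame `x`: positive quefrencies give the causal (vocal-tract) part and negative quefrencies the anticausal (glottal) part. The sketch glosses over the careful phase unwrapping and linear-phase removal a robust implementation needs, and the chirp extension proposed in the paper would evaluate the transform off the unit circle.

```python
# Rough sketch of complex cepstrum-based causal/anticausal separation (illustrative only).
import numpy as np

def complex_cepstrum_split(x):
    n = len(x)
    spectrum = np.fft.fft(x)
    log_spec = np.log(np.abs(spectrum)) + 1j * np.unwrap(np.angle(spectrum))
    cepstrum = np.fft.ifft(log_spec).real
    causal, anticausal = np.zeros(n), np.zeros(n)
    causal[1:n // 2] = cepstrum[1:n // 2]           # positive quefrencies: minimum-phase part
    anticausal[n // 2:] = cepstrum[n // 2:]         # negative quefrencies: maximum-phase (glottal) part
    causal[0] = anticausal[0] = cepstrum[0] / 2     # split the zero-quefrency (gain) term
    to_signal = lambda c: np.fft.ifft(np.exp(np.fft.fft(c))).real
    return to_signal(anticausal), to_signal(causal)
```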

42. BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps [PDF] 返回目录
  Wang Zhu, Hexiang Hu, Jiacheng Chen, Zhiwei Deng, Vihan Jain, Eugene Ie, Fei Sha
Abstract: Learning to follow instructions is of fundamental importance to autonomous agents for vision-and-language navigation (VLN). In this paper, we study how an agent can navigate long paths when learning from a corpus that consists of shorter ones. We show that existing state-of-the-art agents do not generalize well. To this end, we propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially. A specially designed memory buffer is used by the agent to turn its past experiences into contexts for future steps. The learning process consists of two phases. In the first phase, the agent uses imitation learning from demonstrations to accomplish BabySteps. In the second phase, the agent uses curriculum-based reinforcement learning to maximize rewards on navigation tasks with increasingly longer instructions. We create two new benchmark datasets (of long navigation tasks) and use them in conjunction with existing ones to examine BabyWalk's generalization ability. Empirical results show that BabyWalk achieves state-of-the-art results on several metrics and, in particular, follows long instructions better. The code and datasets are released on our project page (this https URL).
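Schematically, the execution loop can be pictured as follows; the agent, environment, instruction splitter, and memory summariser are all hypothetical placeholders rather than interfaces from the released code.

```python
# Schematic sketch of BabyStep-by-BabyStep execution with a memory of past steps.
def follow_long_instruction(agent, env, instruction, split_into_babysteps, summarise):
    memory = []                                 # summaries of completed (babystep, trajectory) pairs
    observation = env.reset()
    for babystep in split_into_babysteps(instruction):
        context = summarise(memory)             # memory buffer turned into a context for the next step
        trajectory, observation = agent.execute(babystep, observation, context, env)
        memory.append((babystep, trajectory))
    return observation                          # final state after completing all BabySteps
```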

43. Transformer-Based Language Models for Similar Text Retrieval and Ranking [PDF] 返回目录
  Javed Qadrud-Din, Ashraf Bah Rabiou, Ryan Walker, Ravi Soni, Martin Gajek, Gabriel Pack, Akhil Rangaraj
Abstract: Most approaches for similar text retrieval and ranking with long natural language queries rely at some level on queries and responses having words in common with each other. Recent applications of transformer-based neural language models to text retrieval and ranking problems have been very promising, but still involve a two-step process in which result candidates are first obtained through bag-of-words-based approaches, and then reranked by a neural transformer. In this paper, we introduce novel approaches for effectively applying neural transformer models to similar text retrieval and ranking without an initial bag-of-words-based step. By eliminating the bag-of-words-based step, our approach is able to accurately retrieve and rank results even when they have no non-stopwords in common with the query. We accomplish this by using bidirectional encoder representations from transformers (BERT) to create vectorized representations of sentence-length texts, along with a vector nearest neighbor search index. We demonstrate both supervised and unsupervised means of using BERT to accomplish this task.
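A hedged sketch of the general recipe (not the authors' exact models): sentence-length texts are embedded with BERT via mean pooling, and retrieval is a vector nearest-neighbour search, with no bag-of-words stage anywhere. The model name and the brute-force index below are illustrative choices.

```python
# Dense retrieval sketch: BERT mean-pooled embeddings + cosine nearest-neighbour search.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = batch and model(**batch).last_hidden_state          # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    vectors = (hidden * mask).sum(1) / mask.sum(1)               # mean over real (non-pad) tokens
    return torch.nn.functional.normalize(vectors, dim=-1).numpy()

corpus = ["The contract may be terminated with 30 days notice.",
          "Payment is due within sixty days of invoicing."]
index = embed(corpus)                                            # brute-force "index" for illustration
query = embed(["How long is the cancellation notice period?"])
ranking = np.argsort(-(index @ query.T).ravel())                 # cosine-similarity order
print([corpus[i] for i in ranking])                              # no shared non-stopwords required
```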

44. Incremental Learning for End-to-End Automatic Speech Recognition [PDF] 返回目录
  Li Fu, Xiaoxiao Li, Libo Zi
Abstract: We propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR) that extends the model's capacity to a new task while retaining performance on existing ones. The proposed method is effective without access to the old dataset, addressing the issues of high training cost and old-dataset unavailability. To achieve this, knowledge distillation is applied as guidance to retain the recognition ability of the previous model, and is combined with the new ASR task for model optimization. With an ASR model pre-trained on 12,000 hours of Mandarin speech, we test the proposed method on a 300-hour new-scenario task and a 1-hour new named-entity task. Experiments show that our method yields 3.25% and 0.88% absolute Character Error Rate (CER) reduction on the new scenario compared with the pre-trained model and the full-data retraining baseline, respectively. It even yields a surprising 0.37% absolute CER reduction on the new scenario compared with fine-tuning. On the new named-entity task, our method significantly improves accuracy compared with the pre-trained model, i.e., a 16.95% absolute CER reduction. For both new-task adaptations, the new models still maintain the same accuracy as the baseline on the old tasks.
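The distillation guidance can be written as a simple combined loss; the sketch below is a generic formulation under assumed tensor shapes, not the paper's exact recipe or hyper-parameters.

```python
# Generic knowledge-distillation loss for incremental learning (illustrative values).
import torch
import torch.nn.functional as F

def incremental_loss(new_logits, old_logits, targets, alpha=0.5, temperature=2.0):
    # Cross-entropy on the new task's labels.
    ce = F.cross_entropy(new_logits, targets)
    # KL divergence to the frozen previous model, preserving old recognition ability.
    kd = F.kl_div(F.log_softmax(new_logits / temperature, dim=-1),
                  F.softmax(old_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return alpha * ce + (1 - alpha) * kd
```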
