
[arXiv Papers] Computation and Language 2020-04-17

Contents

1. Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network [PDF] Abstract
2. Kvistur 2.0: a BiLSTM Compound Splitter for Icelandic [PDF] Abstract
3. Cross-lingual Contextualized Topic Models with Zero-shot Learning [PDF] Abstract
4. Do sequence-to-sequence VAEs learn global features of sentences? [PDF] Abstract
5. Generate, Delete and Rewrite: A Three-Stage Framework for Improving Persona Consistency of Dialogue Generation [PDF] Abstract
6. Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [PDF] Abstract
7. Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers [PDF] Abstract
8. Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [PDF] Abstract
9. Suicidal Ideation and Mental Disorder Detection with Attentive Relation Networks [PDF] Abstract
10. LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation [PDF] Abstract
11. TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition [PDF] Abstract
12. Paraphrase Augmented Task-Oriented Dialog Generation [PDF] Abstract
13. The Right Tool for the Job: Matching Model and Instance Complexities [PDF] Abstract
14. Non-Autoregressive Machine Translation with Latent Alignments [PDF] Abstract
15. Neural Data-to-Text Generation with Dynamic Content Planning [PDF] Abstract
16. HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data [PDF] Abstract
17. Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation [PDF] Abstract
18. Learning Structured Embeddings of Knowledge Graphs with Adversarial Learning Framework [PDF] Abstract
19. There is Strength in Numbers: Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training [PDF] Abstract
20. Deep Generation of Coq Lemma Names Using Elaborated Terms [PDF] Abstract
21. A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation [PDF] Abstract
22. Trakhtenbrot's Theorem in Coq, A Constructive Approach to Finite Model Theory [PDF] Abstract
23. A Discriminator Improves Unconditional Text Generation without Updating the Generator [PDF] Abstract

Abstracts

1. Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network [PDF] Back to contents
  Md. Rezaul Karim, Bharathi Raja Chakravarthi, Mihael Arcan, John P. McCrae, Michael Cochez
Abstract: The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices, but also enables people to express anti-social behaviour like online harassment, cyberbullying, and hate speech. Numerous works have been proposed to utilize these data for social and anti-social behaviour analysis, document characterization, and sentiment analysis by predicting the contexts, mostly for highly resourced languages such as English. However, there are languages that are under-resourced, e.g., South Asian languages like Bengali, Tamil, Assamese, and Telugu, which lack computational resources for NLP tasks. In this paper, we provide several classification benchmarks for Bengali, an \texttt{under-resourced language}. We prepared three datasets expressing hate, commonly used topics, and opinions for hate speech detection, document classification, and sentiment analysis, respectively. We built the largest Bengali word embedding models to date based on 250 million articles, which we call \texttt{BengFastText}. We perform three different experiments, covering document classification, sentiment analysis, and hate speech detection. We incorporate word embeddings into a Multichannel Convolutional-LSTM~(\texttt{MConv-LSTM}) network for predicting different types of hate speech, document classification, and sentiment analysis. Experiments demonstrate that \texttt{BengFastText} can capture the semantics of words from their respective contexts correctly. Evaluations against several baseline embedding models, e.g., Word2Vec and GloVe, yield up to 92.30\%, 82.25\%, and 90.45\% F1-scores for document classification, sentiment analysis, and hate speech detection, respectively, in 5-fold cross-validation tests.

2. Kvistur 2.0: a BiLSTM Compound Splitter for Icelandic [PDF] Back to contents
  Jón Friðrik Daðason, David Erik Mollberg, Hrafn Loftsson, Kristín Bjarnadóttir
Abstract: In this paper, we present a character-based BiLSTM model for splitting Icelandic compound words, and show how varying amounts of training data affect the performance of the model. Compounding is highly productive in Icelandic, and new compounds are constantly being created. This results in a large number of out-of-vocabulary (OOV) words, negatively impacting the performance of many NLP tools. Our model is trained on a dataset of 2.9 million unique word forms and their constituent structures from the Database of Icelandic Morphology. The model learns how to split compound words into two parts and can be used to derive the constituent structure of any word form. Knowing the constituent structure of a word form makes it possible to generate the optimal split for a given task, e.g., a full split for subword tokenization, or, in the case of part-of-speech tagging, splitting an OOV word until the largest known morphological head is found. The model outperforms other previously published methods when evaluated on a corpus of manually split word forms. This method has been integrated into Kvistur, an Icelandic compound word analyzer.
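For readers who want a concrete picture of this architecture class, a character-level BiLSTM splitter can be sketched in a few lines. The following is a hypothetical minimal sketch in PyTorch, not the Kvistur 2.0 code; the vocabulary size, dimensions, and binary split/no-split tagging scheme are assumptions made for illustration.

```python
# Hypothetical character-level BiLSTM that tags each character with
# "split after this character" vs. "no split"; not the authors' implementation.
import torch
import torch.nn as nn

class CharBiLSTMSplitter(nn.Module):
    def __init__(self, n_chars=100, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)      # two labels per character position

    def forward(self, char_ids):                 # char_ids: (batch, word_len)
        h, _ = self.lstm(self.emb(char_ids))     # (batch, word_len, 2*hidden)
        return self.out(h)                       # per-character split logits

model = CharBiLSTMSplitter()
logits = model(torch.randint(1, 100, (2, 12)))   # toy batch: two 12-character words
labels = torch.zeros(2, 12, dtype=torch.long)    # gold boundary labels would go here
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 2), labels.reshape(-1))
```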

3. Cross-lingual Contextualized Topic Models with Zero-shot Learning [PDF] Back to contents
  Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini
Abstract: Many data sets in a domain (reviews, forums, news, etc.) exist in parallel languages. They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-word-based topic models. Models have to be either single-language or suffer from a huge, but extremely sparse vocabulary. Both issues can be addressed by transfer learning. In this paper, we introduce a zero-shot cross-lingual topic model, i.e., our model learns topics on one language (here, English), and predicts them for documents in other languages. By using the text of the same document in different languages, we can evaluate the quality of the predictions. Our results show that topics are coherent and stable across languages, which suggests exciting future research directions.

4. Do sequence-to-sequence VAEs learn global features of sentences? [PDF] Back to contents
  Tom Bosc, Pascal Vincent
Abstract: A longstanding goal in NLP is to compute global sentence representations. Such representations would be useful for sample-efficient semi-supervised learning and controllable text generation. To learn to represent global and local information separately, Bowman et al. (2016) proposed to train a sequence-to-sequence model with the variational auto-encoder (VAE) objective. What precisely is encoded in these latent variables that are expected to capture global features? We measure which words benefit most from the latent information by decomposing the reconstruction loss per position in the sentence. Using this method, we see that VAEs are prone to memorizing the first words and the sentence length, drastically limiting their usefulness. To alleviate this, we propose variants based on bag-of-words assumptions and language model pretraining. These variants learn latents that are more global: they are more predictive of topic or sentiment labels, and their reconstructions are more faithful to the labels of the original documents.
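The diagnostic described here, decomposing the reconstruction loss per position, is easy to reproduce with any autoregressive decoder by skipping the usual loss reduction. Below is a minimal sketch under assumed tensor shapes, not the paper's released code.

```python
# Hypothetical per-position reconstruction loss: average the token NLL over the
# batch separately at each position to see which positions the latent code helps.
import torch
import torch.nn.functional as F

def per_position_nll(logits, targets, pad_id=0):
    """logits: (batch, seq_len, vocab) from the VAE decoder; targets: (batch, seq_len)."""
    nll = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")  # (batch, seq_len)
    mask = (targets != pad_id).float()
    return (nll * mask).sum(dim=0) / mask.sum(dim=0).clamp(min=1)  # (seq_len,)
```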

5. Generate, Delete and Rewrite: A Three-Stage Framework for Improving Persona Consistency of Dialogue Generation [PDF] Back to contents
  Haoyu Song, Yan Wang, Wei-Nan Zhang, Xiaojiang Liu, Ting Liu
Abstract: Maintaining a consistent personality in conversations is quite natural for human beings, but is still a non-trivial task for machines. The persona-based dialogue generation task is thus introduced to tackle the personality-inconsistent problem by incorporating explicit persona text into dialogue generation models. Despite the success of existing persona-based models on generating human-like responses, their one-stage decoding framework can hardly avoid the generation of inconsistent persona words. In this work, we introduce a three-stage framework that employs a generate-delete-rewrite mechanism to delete inconsistent words from a generated response prototype and further rewrite it to a personality-consistent one. We carry out evaluations by both human and automatic metrics. Experiments on the Persona-Chat dataset show that our approach achieves good performance.

6. Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [PDF] Back to contents
  Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg
Abstract: The ability to control for the kinds of information encoded in neural representation has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations. Our method is based on repeated training of linear classifiers that predict a certain property we aim to remove, followed by projection of the representations on their null-space. By doing so, the classifiers become oblivious to that target property, making it hard to linearly separate the data according to it. While applicable for general scenarios, we evaluate our method on bias and fairness use-cases, and show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
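The procedure can be sketched compactly: fit a linear probe for the protected attribute, project the representations onto the probe's nullspace, and repeat on the projected data. The following is a rough sketch assuming a binary attribute and scikit-learn's linear SVM as the probe; the released implementation handles the composition of projections more carefully.

```python
# Hypothetical sketch of Iterative Nullspace Projection (INLP).
import numpy as np
from sklearn.svm import LinearSVC

def inlp(X, z, n_iters=10):
    """X: (n_samples, dim) representations; z: protected-attribute labels."""
    P = np.eye(X.shape[1])                             # accumulated projection
    for _ in range(n_iters):
        clf = LinearSVC(max_iter=5000).fit(X @ P, z)   # probe on the projected data
        w = clf.coef_ / np.linalg.norm(clf.coef_)      # (1, dim) probe direction
        P = P @ (np.eye(X.shape[1]) - w.T @ w)         # project that direction out
    return P

# usage: X_guarded = X @ inlp(X, protected_labels)
```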

7. Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers [PDF] Back to contents
  Robert Litschko, Ivan Vulić, Željko Agić, Goran Glavaš
Abstract: Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, "at treebank level". In this work, we propose and argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS), and present a proof-of-concept study focused on instance-level selection in the framework of delexicalized parser transfer. We start from an empirical observation that different source parsers are the best choice for different Universal POS sequences in the target language. We then propose to predict the best parser at the instance level. To this end, we train a supervised regression model, based on the Transformer architecture, to predict parser accuracies for individual POS-sequences. We compare ILPS against two strong single-best parser selection baselines (SBPS): (1) a model that compares POS n-gram distributions between the source and target languages (KL) and (2) a model that selects the source based on the similarity between manually created language vectors encoding syntactic properties of languages (L2V). The results from our extensive evaluation, coupling 42 source parsers and 20 diverse low-resource test languages, show that ILPS outperforms KL and L2V on 13/20 and 14/20 test languages, respectively. Further, we show that by predicting the best parser "at the treebank level" (SBPS), using the aggregation of predictions from our instance-level model, we outperform the same baselines on 17/20 and 16/20 test languages.

8. Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [PDF] Back to contents
  Ankur Mali, Alexander Ororbia, Daniel Kifer, Clyde Lee Giles
Abstract: Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and voice recognition, these stateful models have several critical shortcomings. Specifically, RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems. For example, RNNs struggle in recognizing complex context free languages (CFLs), never reaching 100% accuracy on training. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on CFLs nor tested on sequences longer than those seen in training. The few efforts that have studied them have shown that continuous differentiable memory structures yield poor generalization for complex CFLs, making the RNN less interpretable. In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms that ensure that the model learns to properly balance the use of its latent states with external memory. Our improved RNN models exhibit better generalization performance and are able to classify long strings generated by complex hierarchical context free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modelling task, and achieve stable, robust performance across these benchmarks. Furthermore, we show that only our memory-augmented networks are capable of retaining memory for a longer duration up to strings of length 160.
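The "external differentiable stack" family of models typically keeps the whole stack as a tensor and updates it with soft push/pop/no-op weights emitted by the controller. The snippet below is a generic sketch of that update, an illustration of the general technique rather than the exact memory architecture evaluated in the paper.

```python
# Hypothetical soft-stack update for a stack-augmented RNN controller.
import torch

def soft_stack_step(stack, value, action_probs):
    """stack: (batch, depth, dim); value: (batch, dim) candidate to push;
    action_probs: (batch, 3) softmax weights over (push, pop, no-op)."""
    push, pop, noop = action_probs.unbind(dim=-1)
    pushed = torch.cat([value.unsqueeze(1), stack[:, :-1]], dim=1)              # shift down, new top
    popped = torch.cat([stack[:, 1:], torch.zeros_like(stack[:, :1])], dim=1)   # shift up
    new_stack = (push[:, None, None] * pushed
                 + pop[:, None, None] * popped
                 + noop[:, None, None] * stack)
    return new_stack, new_stack[:, 0]   # the soft top of the stack is fed back to the RNN
```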

9. Suicidal Ideation and Mental Disorder Detection with Attentive Relation Networks [PDF] Back to contents
  Shaoxiong Ji, Xue Li, Zi Huang, Erik Cambria
Abstract: Mental health is a critical issue in the modern society, mental disorders could sometimes turn to suicidal ideation without effective treatment. Early detection of mental disorders and suicidal ideation from social content provides a potential way for effective social intervention. Classifying suicidal ideation and other mental disorders, however, is a challenging task as they share quite similar patterns in language usage and sentimental polarity. In this paper, we enhance text representation with lexicon-based sentiment scores and latent topics, and propose to use relation networks for detecting suicidal ideation and mental disorders with related risk indicators. The relation module is further equipped with the attention mechanism to prioritize more important relational features. Through experiments on three real-world datasets, our model outperforms most of its counterparts.

10. LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation [PDF] Back to contents
  Dong-Ho Lee, Rahul Khanna, Bill Yuchen Lin, Jamin Chen, Seyeon Lee, Qinyuan Ye, Elizabeth Boschee, Leonardo Neves, Xiang Ren
Abstract: Successfully training a deep neural network demands a huge corpus of labeled data. However, each label only provides limited information to learn from and collecting the requisite number of labels involves massive human effort. In this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN framework for sequence labeling and classification tasks, with an easy-to-use UI that not only allows an annotator to provide the needed labels for a task, but also enables LearnIng From Explanations for each labeling decision. Such explanations enable us to generate useful additional labeled data from unlabeled instances, bolstering the pool of available training data. On three popular NLP tasks (named entity recognition, relation extraction, sentiment analysis), we find that using this enhanced supervision allows our models to surpass competitive baseline F1 scores by more than 5-10 percentage points, while using 2X times fewer labeled instances. Our framework is the first to utilize this enhanced supervision technique and does so for three important tasks -- thus providing improved annotation recommendations to users and an ability to build datasets of (data, label, explanation) triples instead of the regular (data, label) pair.

11. TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition [PDF] Back to contents
  Bill Yuchen Lin, Dong-Ho Lee, Ming Shen, Ryan Moreno, Xiao Huang, Prashant Shiralkar, Xiang Ren
Abstract: Training neural models for named entity recognition (NER) in a new domain often requires additional human annotations (e.g., tens of thousands of labeled instances) that are usually expensive and time-consuming to collect. Thus, a crucial research question is how to obtain supervision in a cost-effective way. In this paper, we introduce "entity triggers", an effective proxy of human explanations for facilitating label-efficient learning of NER models. An entity trigger is defined as a group of words in a sentence that helps to explain why humans would recognize an entity in the sentence. We crowd-sourced 14k entity triggers for two well-studied NER datasets. Our proposed model, named Trigger Matching Network, jointly learns trigger representations and soft matching module with self-attention such that can generalize to unseen sentences easily for tagging. Experiments show that the framework is significantly more cost-effective such that using 20% of the trigger-annotated sentences can result in a comparable performance of conventional supervised approaches using 70% training data. We publicly release the collected entity triggers and our code.

12. Paraphrase Augmented Task-Oriented Dialog Generation [PDF] Back to contents
  Silin Gao, Yichi Zhang, Zhijian Ou, Zhou Yu
Abstract: Neural generative models have achieved promising performance on dialog generation tasks if given a huge data set. However, the lack of high-quality dialog data and the expensive data annotation process greatly limit their application in real-world settings. We propose a paraphrase augmented response generation (PARG) framework that jointly trains a paraphrase model and a response generation model to improve the dialog generation performance. We also design a method to automatically construct paraphrase training data set based on dialog state and dialog act labels. PARG is applicable to various dialog generation models, such as TSCP (Lei et al., 2018) and DAMD (Zhang et al., 2019). Experimental results show that the proposed framework improves these state-of-the-art dialog models further on CamRest676 and MultiWOZ. PARG also significantly outperforms other data augmentation methods in dialog generation tasks, especially under low resource settings.

13. The Right Tool for the Job: Matching Model and Instance Complexities [PDF] Back to contents
  Roy Schwartz, Gabi Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith
Abstract: As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs. To better respect a given inference budget, we propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances, and late (and accurate) exit for hard instances. To achieve this, we add classifiers to different layers of BERT and use their calibrated confidence scores to make early exit decisions. We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks. Our method presents a favorable speed/accuracy tradeoff in almost all cases, producing models which are up to five times faster than the state of the art, while preserving their accuracy. Our method also requires almost no additional training resources (in either time or parameters) compared to the baseline BERT model. Finally, our method alleviates the need for costly retraining of multiple models at different levels of efficiency; we allow users to control the inference speed/accuracy tradeoff using a single trained model, by setting a single variable at inference time. We publicly release our code.
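At inference time the idea reduces to a loop: run one more encoder layer, ask that layer's classifier for a calibrated confidence, and stop as soon as the confidence clears a user-chosen threshold. The sketch below is schematic; the per-layer heads and their calibration are assumed, and the batch size is fixed to one for clarity.

```python
# Hypothetical early-exit inference over a stack of encoder layers, each with
# its own classifier head; `threshold` trades speed against accuracy.
import torch

@torch.no_grad()
def early_exit_predict(layers, heads, hidden, threshold=0.9):
    """layers, heads: equal-length lists of modules; hidden: (1, seq_len, dim)."""
    for layer, head in zip(layers, heads):
        hidden = layer(hidden)
        probs = torch.softmax(head(hidden[:, 0]), dim=-1)   # classify from the [CLS] position
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:                         # confident enough: exit early
            return pred.item()
    return pred.item()                                       # otherwise use the last layer
```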

14. Non-Autoregressive Machine Translation with Latent Alignments [PDF] Back to contents
  Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi
Abstract: This paper investigates two latent alignment models for non-autoregressive machine translation, namely CTC and Imputer. CTC generates outputs in a single step, makes strong conditional independence assumptions about output variables, and marginalizes out latent alignments using dynamic programming. Imputer generates outputs in a constant number of steps, and approximately marginalizes out possible generation orders and latent alignments for training. These models are simpler than existing non-autoregressive methods, since they do not require output length prediction as a pre-process. In addition, our architecture is simpler than typical encoder-decoder architectures, since input-output cross attention is not used. On the competitive WMT'14 En$\rightarrow$De task, our CTC model achieves 25.7 BLEU with a single generation step, while Imputer achieves 27.5 BLEU with 2 generation steps, and 28.0 BLEU with 4 generation steps. This compares favourably to the baseline autoregressive Transformer with 27.8 BLEU.
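The CTC objective used here is the standard one, which marginalizes over monotonic latent alignments by dynamic programming; the only requirement is that the decoder emit at least as many frames as there are target tokens (non-autoregressive MT models typically upsample the source for this). Below is a hedged illustration with PyTorch's built-in loss; all sizes are invented for the example and the paper's model details are not reproduced.

```python
# Hypothetical use of CTC to marginalize latent alignments between decoder
# frames and target tokens (blank id 0; toy sizes).
import torch
import torch.nn as nn

T, B, V, S = 24, 2, 1000, 10                       # frames, batch, vocab, target length
log_probs = torch.randn(T, B, V).log_softmax(-1)   # one output distribution per frame
targets = torch.randint(1, V, (B, S))              # reference token ids

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((B,), T, dtype=torch.long),
           target_lengths=torch.full((B,), S, dtype=torch.long))
```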

15. Neural Data-to-Text Generation with Dynamic Content Planning [PDF] Back to contents
  Kai Chen, Fayuan Li, Baotian Hu, Weihua Peng, Qingcai Chen, Yajuan Lv, Yong Zhu, Hong Yu
Abstract: Neural data-to-text generation models have achieved significant advancement in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often generate descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, abbreviated NDP. The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state-of-the-art on the ROTOWIRE dataset, in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP most of the time. Using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.

16. HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data [PDF] Back to contents
  Wenhu Chen, Hanwen Zha, Zhiyu Chen, Wenhan Xiong, Hong Wang, William Wang
Abstract: Existing question answering datasets focus on dealing with homogeneous information, based either only on text or on KB/Table information alone. However, as human knowledge is distributed over heterogeneous forms, using homogeneous information might lead to severe coverage problems. To fill in the gap, we present \dataset, a new large-scale question-answering dataset that requires reasoning on heterogeneous information. Each question is aligned with a structured Wikipedia table and multiple free-form corpora linked with the entities in the table. The questions are designed to aggregate both tabular information and text information, i.e., the lack of either form would render the question unanswerable. We test with three different models: 1) a table-only model, 2) a text-only model, and 3) a hybrid model \model which combines both table and textual information to build a reasoning path towards the answer. The experimental results show that the first two baselines obtain compromised scores below 20\%, while \model significantly boosts the EM score to over 50\%, which proves the necessity of aggregating both structured and unstructured information in \dataset. However, \model's score is still far behind human performance, hence we believe \dataset to be an ideal and challenging benchmark for studying question answering over heterogeneous information. The dataset and code are available at \url{this https URL}.

17. Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation [PDF] Back to contents
  Idriss Mghabbar, Pirashanth Ratnamogan
Abstract: Lack of specialized data makes building a multi-domain neural machine translation tool challenging. Although emerging literature dealing with low-resource languages starts to show promising results, most state-of-the-art models used millions of sentences. Today, the majority of multi-domain adaptation techniques are based on complex and sophisticated architectures that are not adapted for real-world applications. So far, no scalable method is performing better than the simple yet effective mixed-finetuning, i.e., finetuning a generic model with a mix of all specialized data and generic data. In this paper, we propose a new training pipeline where knowledge distillation and multiple specialized teachers allow us to efficiently finetune a model without adding new costs at inference time. Our experiments demonstrated that our training pipeline allows improving the performance of multi-domain translation over finetuning in configurations with 2, 3, and 4 domains by up to 2 points in BLEU.
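A common way to realize this kind of pipeline is token-level knowledge distillation: on a batch from domain d, the student is trained against the reference translation and against the softened output distribution of domain d's teacher. The sketch below illustrates such a per-batch loss; the weighting, temperature, and scheduling are illustrative assumptions, not the paper's exact settings.

```python
# Hypothetical multi-teacher distillation loss for one batch drawn from domain d.
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5, pad_id=0):
    """student_logits, teacher_logits: (batch, len, vocab); targets: (batch, len)."""
    ce = F.cross_entropy(student_logits.transpose(1, 2), targets, ignore_index=pad_id)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return alpha * ce + (1 - alpha) * kd
```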

18. Learning Structured Embeddings of Knowledge Graphs with Adversarial Learning Framework [PDF] Back to contents
  Jiehang Zeng, Lu Liu, Xiaoqing Zheng
Abstract: Many large-scale knowledge graphs are now available and ready to provide semantically structured information that is regarded as an important resource for question answering and decision support tasks. However, they are built on rigid symbolic frameworks, which makes them hard to use in other intelligent systems. We present a learning method using a generative adversarial architecture designed to embed the entities and relations of the knowledge graphs into a continuous vector space. A generative network (GN) takes two elements of a (subject, predicate, object) triple as input and generates the vector representation of the missing element. A discriminative network (DN) scores a triple to distinguish a positive triple from those generated by GN. The training goal for GN is to deceive DN into making a wrong classification. Upon convergence, GN recovers the training data and can be used for knowledge graph completion, while DN is trained to be a good triple classifier. Unlike the few previous studies based on generative adversarial architectures, our GN is able to generate unseen instances, while they just use GN to better choose negative samples (already existing) for DN. Experiments demonstrate that our method can improve classical relational learning models (e.g., TransE) by a significant margin on both the link prediction and triple classification tasks.

19. There is Strength in Numbers: Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training [PDF] Back to contents
  Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Sebastian Riedel, Tim Rocktäschel
Abstract: Natural Language Inference (NLI) datasets contain annotation artefacts resulting in spurious correlations between the natural language utterances and their respective entailment classes. These artefacts are exploited by neural networks even when only considering the hypothesis and ignoring the premise, leading to unwanted biases. Previous work proposed tackling this problem via adversarial training, but this leads to learned sentence representations that still suffer from the same biases. As a solution, we propose using an ensemble of adversaries during the training, encouraging the model to jointly decrease the accuracy of these different adversaries while fitting the data. We show that using an ensemble of adversaries can prevent the bias from being relearned after the model training is completed, further improving how well the model generalises to different NLI datasets. In particular, these models outperformed previous approaches when tested on 12 different NLI datasets not used in the model training. Finally, the optimal number of adversarial classifiers depends on the dimensionality of the sentence representations, with larger dimensional representations benefiting when trained with a greater number of adversaries.
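One standard way to implement "jointly decrease the accuracy of these different adversaries while fitting the data" is a gradient-reversal layer between the shared hypothesis encoding and each adversary. The sketch below illustrates that setup and is an assumption about the general technique, not the authors' exact formulation.

```python
# Hypothetical NLI loss with an ensemble of hypothesis-only adversaries trained
# through a gradient-reversal layer.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad                      # adversaries learn; the encoder unlearns

def nli_with_adversaries(main_logits, hyp_repr, adversaries, labels, lam=1.0):
    loss = F.cross_entropy(main_logits, labels)
    rev = GradReverse.apply(hyp_repr)     # hypothesis-only representation
    for adv in adversaries:               # each adversary predicts the label from rev
        loss = loss + lam * F.cross_entropy(adv(rev), labels)
    return loss
```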

20. Deep Generation of Coq Lemma Names Using Elaborated Terms [PDF] Back to contents
  Pengyu Nie, Karl Palmskog, Junyi Jessy Li, Milos Gligoric
Abstract: Coding conventions for naming, spacing, and other essentially stylistic properties are necessary for developers to effectively understand, review, and modify source code in large software projects. Consistent conventions in verification projects based on proof assistants, such as Coq, increase in importance as projects grow in size and scope. While conventions can be documented and enforced manually at high cost, emerging approaches automatically learn and suggest idiomatic names in Java-like languages by applying statistical language models on large code corpora. However, due to its powerful language extension facilities and fusion of type checking and computation, Coq is a challenging target for automated learning techniques. We present novel generation models for learning and suggesting lemma names for Coq projects. Our models, based on multi-input neural networks, are the first to leverage syntactic and semantic information from Coq's lexer (tokens in lemma statements), parser (syntax trees), and kernel (elaborated terms) for naming; the key insight is that learning from elaborated terms can substantially boost model performance. We implemented our models in a toolchain, dubbed Roosterize, and applied it on a large corpus of code derived from the Mathematical Components family of projects, known for its stringent coding conventions. Our results show that Roosterize substantially outperforms baselines for suggesting lemma names, highlighting the importance of using multi-input models and elaborated terms.

21. A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation [PDF] Back to contents
  Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, Mark Cieliebak
Abstract: In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of query tokens to OT operations. In our method, we randomly generate OTs from a context-free grammar. Afterwards, annotators have to write the appropriate natural language question that is represented by the OT. Finally, the annotators assign the tokens to the OT operations. We apply the method to create a new corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases. We compare OTTA to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our corpus is a challenging dataset and that the token alignment can be leveraged to increase the performance significantly.
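The first step, sampling operation trees from a context-free grammar, can be pictured with a tiny recursive sampler; the grammar and symbols below are invented for illustration and are not the OTTA grammar.

```python
# Hypothetical recursive sampler over a toy context-free grammar of operation trees.
import random

GRAMMAR = {
    "QUERY":  [["PROJECT", "TABLE"], ["PROJECT", "FILTER"]],
    "FILTER": [["SELECT", "TABLE"], ["JOIN", "TABLE", "TABLE"]],
}
TERMINALS = {"PROJECT", "SELECT", "JOIN", "TABLE"}

def sample_tree(symbol="QUERY", max_depth=5):
    if symbol in TERMINALS or max_depth == 0:
        return symbol
    head, *children = random.choice(GRAMMAR[symbol])   # pick a production uniformly
    return [head] + [sample_tree(c, max_depth - 1) for c in children]

print(sample_tree())   # e.g. ['PROJECT', 'TABLE'] or ['PROJECT', ['SELECT', 'TABLE']]
```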

22. Trakhtenbrot's Theorem in Coq, A Constructive Approach to Finite Model Theory [PDF] Back to contents
  Dominik Kirst, Dominique Larchey-Wendling
Abstract: We study finite first-order satisfiability (FSAT) in the constructive setting of dependent type theory. Employing synthetic accounts of enumerability and decidability, we give a full classification of FSAT depending on the first-order signature of non-logical symbols. On the one hand, our development focuses on Trakhtenbrot's theorem, stating that FSAT is undecidable as soon as the signature contains an at least binary relation symbol. Our proof proceeds by a many-one reduction chain starting from the Post correspondence problem. On the other hand, we establish the decidability of FSAT for monadic first-order logic, i.e. where the signature only contains at most unary function and relation symbols, as well as the enumerability of FSAT for arbitrary enumerable signatures. All our results are mechanised in the framework of a growing Coq library of synthetic undecidability proofs.

23. A Discriminator Improves Unconditional Text Generation without Updating the Generator [PDF] Back to contents
  Xingyuan Chen, Ping Cai, Peng Jin, Hongjun Wang, Xinyu Dai, Jiajun Chen
Abstract: We propose a novel mechanism to improve an unconditional text generator with a discriminator, which is trained to estimate the probability that a sample comes from real or generated data. In contrast to recent discrete language generative adversarial networks (GAN) which update the parameters of the generator directly, our method only retains generated samples which are determined to come from real data with relatively high probability by the discriminator. This not only detects valuable information, but also avoids the mode collapse introduced by GAN. To the best of our knowledge, this is the first method which improves the neural language models (LM) trained with maximum likelihood estimation (MLE) by using a discriminator as a filter. Experimental results show that our mechanism improves both RNN-based and Transformer-based LMs when measuring in sample quality and sample diversity simultaneously at different softmax temperatures (a previously noted deficit of language GANs). Further, by recursively adding more discriminators, more powerful generators are created.
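Operationally the mechanism is rejection filtering: draw samples from the fixed language model, score each with the discriminator, and keep only those scored as likely real. A schematic sketch follows, with `lm.sample()` and `discriminator()` as assumed interfaces rather than a real API.

```python
# Hypothetical rejection filtering of language-model samples with a discriminator;
# the generator itself is never updated.
import torch

@torch.no_grad()
def filtered_samples(lm, discriminator, n_keep=1000, threshold=0.5, batch_size=64):
    kept = []
    while len(kept) < n_keep:
        texts = lm.sample(batch_size)                   # assumed sampling interface
        p_real = torch.sigmoid(discriminator(texts))    # per-sample "looks real" probability
        kept.extend(t for t, p in zip(texts, p_real) if p.item() >= threshold)
    return kept[:n_keep]
```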
