
[arXiv Papers] Computation and Language 2020-04-08

Contents

1. Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering [PDF] Abstract
2. Entity Linking via Dual and Cross-Attention Encoders [PDF] Abstract
3. Fine-Grained Named Entity Typing over Distantly Supervised Data Based on Refined Representations [PDF] Abstract
4. What do Models Learn from Question Answering Datasets? [PDF] Abstract
5. Automated Utterance Generation [PDF] Abstract
6. Operationalizing the legal concept of 'Incitement to Hatred' as an NLP task [PDF] Abstract
7. Emergent Language Generalization and Acquisition Speed are not tied to Compositionality [PDF] Abstract
8. Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking [PDF] Abstract
9. Inexpensive Domain Adaptation of Pretrained Language Models: A Case Study on Biomedical Named Entity Recognition [PDF] Abstract
10. Class-Agnostic Continual Learning of Alternating Languages and Domains [PDF] Abstract
11. Windowing Models for Abstractive Summarization of Long Texts [PDF] Abstract
12. KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding [PDF] Abstract
13. A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products [PDF] Abstract
14. A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events [PDF] Abstract
15. Variational Question-Answer Pair Generation for Machine Reading Comprehension [PDF] Abstract
16. Improving Fluency of Non-Autoregressive Machine Translation [PDF] Abstract
17. More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction [PDF] Abstract
18. Towards Multimodal Simultaneous Neural Machine Translation [PDF] Abstract
19. Machine Translation with Unsupervised Length-Constraints [PDF] Abstract
20. Self-Induced Curriculum Learning in Neural Machine Translation [PDF] Abstract
21. Unsupervised Neural Machine Translation with Indirect Supervision [PDF] Abstract
22. g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset [PDF] Abstract
23. Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation [PDF] Abstract
24. RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases [PDF] Abstract
25. A Sentence Cloze Dataset for Chinese Machine Reading Comprehension [PDF] Abstract
26. Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering [PDF] Abstract
27. Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation [PDF] Abstract
28. Is Graph Structure Necessary for Multi-hop Reasoning? [PDF] Abstract
29. Exemplar Auditing for Multi-Label Biomedical Text Classification [PDF] Abstract
30. Interview: A Large-Scale Open-Source Corpus of Media Dialog [PDF] Abstract
31. Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning [PDF] Abstract
32. Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition [PDF] Abstract
33. Information-Theoretic Probing for Linguistic Structure [PDF] Abstract
34. The Role of Pragmatic and Discourse Context in Determining Argument Impact [PDF] Abstract
35. A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages [PDF] Abstract
36. Query Focused Multi-Document Summarization with Distant Supervision [PDF] Abstract
37. Enhancing Review Comprehension with Domain-Specific Commonsense [PDF] Abstract
38. "You are grounded!": Latent Name Artifacts in Pre-trained Language Models [PDF] Abstract
39. Multi-Step Inference for Reasoning Over Paragraphs [PDF] Abstract
40. Evaluating the Evaluation of Diversity in Natural Language Generation [PDF] Abstract
41. Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics [PDF] Abstract
42. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices [PDF] Abstract
43. An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines [PDF] Abstract
44. Speaker-change Aware CRF for Dialogue Act Classification [PDF] Abstract
45. A Few Topical Tweets are Enough for Effective User-Level Stance Detection [PDF] Abstract
46. Testing pre-trained Transformer models for Lithuanian news clustering [PDF] Abstract
47. Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition [PDF] Abstract
48. MedDialog: A Large-scale Medical Dialogue Dataset [PDF] Abstract
49. Neural Image Inpainting Guided with Descriptive Text [PDF] Abstract
50. Multi-Scale Aggregation Using Feature Pyramid Module for Text-Independent Speaker Verification [PDF] Abstract
51. Multilingual enrichment of disease biomedical ontologies [PDF] Abstract
52. Predicting Strategic Behavior from Free Text [PDF] Abstract

Abstracts

1. Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering [PDF] Back to Contents
  Changmao Li, Jinho D. Choi
Abstract: We introduce a novel approach to transformers that learns hierarchical representations in multiparty dialogue. First, three language modeling tasks are used to pre-train the transformers, token- and utterance-level language modeling and utterance order prediction, that learn both token and utterance embeddings for better understanding in dialogue contexts. Then, multi-task learning between the utterance prediction and the token span prediction is applied to fine-tune for span-based question answering (QA). Our approach is evaluated on the FriendsQA dataset and shows improvements of 3.8% and 1.4% over the two state-of-the-art transformer models, BERT and RoBERTa, respectively.

2. Entity Linking via Dual and Cross-Attention Encoders [PDF] Back to Contents
  Oshin Agarwal, Daniel M. Bikel
Abstract: Entity Linking has two main open areas of research: 1) generate candidate entities without using alias tables and 2) generate more contextual representations for both mentions and entities. Recently, a solution has been proposed for the former as a dual-encoder entity retrieval system (Gillick et al., 2019) that learns mention and entity representations in the same space, and performs linking by selecting the nearest entity to the mention in this space. In this work, we use this retrieval system solely for generating candidate entities. We then rerank the entities by using a cross-attention encoder over the target mention and each of the candidate entities. Whereas a dual encoder approach forces all information to be contained in the small, fixed set of vector dimensions used to represent mentions and entities, a cross-attention model allows for the use of detailed information (read: features) from the entirety of each <mention, context, candidate entity> tuple. We experiment with features used in the reranker including different ways of incorporating document-level context. We achieve state-of-the-art results on the TACKBP-2010 dataset, with 92.05% accuracy. Furthermore, we show how the rescoring model generalizes well when trained on the larger CoNLL-2003 dataset and evaluated on TACKBP-2010.
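
A minimal sketch of the retrieve-then-rerank pattern the abstract describes; `retrieve_candidates` and `cross_encoder_score` are hypothetical stand-ins for the paper's dual and cross-attention encoders, not its actual API.

```python
import numpy as np

def retrieve_candidates(mention_vec, entity_matrix, k=10):
    # Dual-encoder retrieval: mentions and entities share one vector
    # space, so candidates are the k nearest entities by dot product.
    scores = entity_matrix @ mention_vec
    return np.argsort(-scores)[:k]

def link(mention_vec, mention_text, entity_matrix, entity_texts,
         cross_encoder_score):
    # Rerank each <mention, context, candidate entity> pair with a
    # joint cross-attention scorer (hypothetical scoring function).
    cands = retrieve_candidates(mention_vec, entity_matrix)
    return max(cands, key=lambda i: cross_encoder_score(mention_text,
                                                        entity_texts[i]))
```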

3. Fine-Grained Named Entity Typing over Distantly Supervised Data Based on Refined Representations [PDF] Back to Contents
  Muhammad Asif Ali, Yifang Sun, Bing Li, Wei Wang
Abstract: Fine-Grained Named Entity Typing (FG-NET) is a key component in Natural Language Processing (NLP). It aims at classifying an entity mention into a wide range of entity types. Due to a large number of entity types, distant supervision is used to collect training data for this task, which noisily assigns type labels to entity mentions irrespective of the context. In order to alleviate the noisy labels, existing approaches on FGNET analyze the entity mentions entirely independently of each other and assign type labels solely based on the mention's sentence-specific context. This is inadequate for highly overlapping and noisy type labels as it hinders information passing across sentence boundaries. For this, we propose an edge-weighted attentive graph convolution network that refines the noisy mention representations by attending over corpus-level contextual clues prior to the end classification. Experimental evaluation shows that the proposed model outperforms the existing research by a relative score of up to 10.2% and 8.3% for macro-F1 and micro-F1, respectively.

4. What do Models Learn from Question Answering Datasets? [PDF] Back to Contents
  Priyanka Sen, Amir Saffari
Abstract: While models have reached superhuman performance on popular question answering (QA) datasets such as SQuAD, they have yet to outperform humans on the task of question answering itself. In this paper, we investigate what models are really learning from QA datasets by evaluating BERT-based models across five popular QA datasets. We evaluate models on their generalizability to out-of-domain examples, responses to missing or incorrect information in datasets, and ability to handle variations in questions. We find that no single dataset is robust to all of our experiments and identify shortcomings in both datasets and evaluation methods. Following our analysis, we make recommendations for building future QA datasets that better evaluate the task of question answering.

5. Automated Utterance Generation [PDF] Back to Contents
  Soham Parikh, Quaizar Vohra, Mitul Tiwari
Abstract: Conversational AI assistants are becoming popular and question-answering is an important part of any conversational assistant. Using relevant utterances as features in question-answering has shown to improve both the precision and recall for retrieving the right answer by a conversational assistant. Hence, utterance generation has become an important problem with the goal of generating relevant utterances (sentences or phrases) from a knowledge base article that consists of a title and a description. However, generating good utterances usually requires a lot of manual effort, creating the need for an automated utterance generation. In this paper, we propose an utterance generation system which 1) uses extractive summarization to extract important sentences from the description, 2) uses multiple paraphrasing techniques to generate a diverse set of paraphrases of the title and summary sentences, and 3) selects good candidate paraphrases with the help of a novel candidate selection algorithm.

6. Operationalizing the legal concept of 'Incitement to Hatred' as an NLP task [PDF] Back to Contents
  Frederike Zufall, Huangpan Zhang, Katharina Kloppenborg, Torsten Zesch
Abstract: Hate speech detection or offensive language detection are well-established but controversial NLP tasks. There is no denying the temptation to use them for law enforcement or by private actors to censor, delete, or punish online statements. However, given the importance of freedom of expression for the public discourse in a democracy, determining statements that would potentially be subject to these measures requires a legal justification that outweighs the right to free speech in the respective case. The legal concept of 'incitement to hatred' answers this question by preventing discrimination against and segregation of a target group, thereby ensuring the members' acceptance as equal in a society - likewise a prerequisite for democracy. In this paper, we pursue these questions based on the criminal offense of 'incitement to hatred' in § 130 of the German Criminal Code along with the underlying EU Council Framework Decision. Under the German Network Enforcement Act, social media providers are subject to a direct obligation to delete postings violating this offense. We take this as a use case to study the transition from the ill-defined concepts of hate speech or offensive language which are usually used in NLP to an operationalization of an actual legally binding obligation. We first translate the legal assessment into a series of binary decisions and then collect, annotate, and analyze a dataset according to our annotation scheme. Finally, we translate each of the legal decisions into an NLP task based on the annotated data. In this way, we ultimately also explore the extent to which the underlying value-based decisions could be carried over to NLP.

7. Emergent Language Generalization and Acquisition Speed are not tied to Compositionality [PDF] Back to Contents
  Eugene Kharitonov, Marco Baroni
Abstract: Studies of discrete languages emerging when neural agents communicate to solve a joint task often look for evidence of compositional structure. This stems from the expectation that such a structure would allow languages to be acquired faster by the agents and enable them to generalize better. We argue that these beneficial properties are only loosely connected to compositionality. In two experiments, we demonstrate that, depending on the task, non-compositional languages might show equal, or better, generalization performance and acquisition speed than compositional ones. Further research in the area should be clearer about what benefits are expected from compositionality, and how the latter would lead to them.

8. Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking [PDF] Back to Contents
  Su Zhu, Jieyu Li, Lu Chen, Kai Yu
Abstract: Dialogue state tracking (DST) aims at estimating the current dialogue state given all the preceding conversation. For multi-domain DST, the data sparsity problem is a major obstacle due to increased numbers of state candidates and dialogue lengths. To encode the dialogue context efficiently, we propose to utilize the previous dialogue state (predicted) and the current dialogue utterance as the input for DST. To consider relations among different domain-slots, the schema graph involving prior knowledge is exploited. In this paper, a novel context and schema fusion network is proposed to encode the dialogue context and schema graph by using internal and external attention mechanisms. Experiment results show that our approach can obtain new state-of-the-art performance of the open-vocabulary DST on both MultiWOZ 2.0 and MultiWOZ 2.1 benchmarks.

9. Inexpensive Domain Adaptation of Pretrained Language Models: A Case Study on Biomedical Named Entity Recognition [PDF] Back to Contents
  Nina Poerner, Ulli Waltinger, Hinrich Schütze
Abstract: Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by pretraining on in-domain text. While successful, this approach is expensive in terms of hardware, runtime and CO2 emissions. Here, we propose a cheaper alternative: We train Word2Vec on in-domain text and align the resulting word vectors with the input space of a general-domain PTLM (here: BERT). We evaluate on eight biomedical Named Entity Recognition (NER) tasks and compare against the recently proposed BioBERT model (Lee et al., 2020). We cover over 50% of the BioBERT-BERT F1 delta, at 5% of BioBERT's CO2 footprint and 2% of its cloud compute cost.
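
The alignment step admits a closed-form sketch: fit a linear map from the in-domain Word2Vec space to the PTLM's input embedding space on the vocabulary the two spaces share. This is my reading of the abstract, not the paper's verified procedure.

```python
import numpy as np

def align(X_shared, Y_shared):
    # Least-squares map W minimizing ||XW - Y||_F, where rows of X are
    # in-domain Word2Vec vectors and rows of Y are the general-domain
    # PTLM's input embeddings for the same shared vocabulary items.
    W, *_ = np.linalg.lstsq(X_shared, Y_shared, rcond=None)
    return W

# All in-domain word vectors can then be projected: X_indomain @ W
```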

10. Class-Agnostic Continual Learning of Alternating Languages and Domains [PDF] Back to Contents
  Germán Kruszewski, Ionut-Teodor Sorodoc, Tomas Mikolov
Abstract: Continual Learning has often been framed as the problem of training a model on a sequence of tasks. In this regard, Neural Networks have been attested to forget the solutions to previous tasks as they learn new ones. Yet, modelling human life-long learning does not necessarily require any crisp notion of tasks. In this work, we propose a benchmark based on language modelling in a multilingual and multidomain setting that prescinds from any explicit delimitation of training examples into distinct tasks, and propose metrics to study continual learning and catastrophic forgetting in this setting. Then, we introduce a simple Product of Experts learning system that performs strongly on this problem while displaying interesting properties, and investigate its merits for avoiding forgetting.

11. Windowing Models for Abstractive Summarization of Long Texts [PDF] Back to Contents
  Leon Schüller, Florian Wilhelm, Nico Kreiling, Goran Glavaš
Abstract: Neural summarization models suffer from the fixed-size input limitation: if text length surpasses the model's maximal number of input tokens, some document content (possibly summary-relevant) gets truncated. Independently summarizing windows of maximal input size disallows information flow between windows and leads to incoherent summaries. We propose windowing models for neural abstractive summarization of (arbitrarily) long texts. We extend the sequence-to-sequence model augmented with a pointer-generator network by (1) allowing the encoder to slide over different windows of the input document and (2) sharing the decoder and retaining its state across different input windows. We explore two windowing variants: Static Windowing precomputes the number of tokens the decoder should generate from each window (based on training corpus statistics); in Dynamic Windowing the decoder learns to emit a token that signals the encoder's shift to the next input window. Empirical results render our models effective in their intended use-case: summarizing long texts with relevant content not bound to the very beginning of the document.
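
A sketch of the windowing idea under stated assumptions (fixed window size and stride); the paper's Static Windowing variant additionally precomputes how many summary tokens each window should contribute.

```python
def windows(tokens, size, stride):
    # Slide fixed-size windows over an arbitrarily long input so that
    # each chunk fits the model's maximal number of input tokens.
    for start in range(0, max(len(tokens) - size, 0) + 1, stride):
        yield tokens[start:start + size]
```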

12. KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding [PDF] Back to Contents
  Jiyeon Ham, Yo Joong Choe, Kyubyong Park, Ilji Choi, Hyungjoon Soh
Abstract: Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language. Motivated by this, we construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively. Following previous approaches, we machine-translate existing English training sets and manually translate development and test sets into Korean. To accelerate research on Korean NLU, we also establish baselines on KorNLI and KorSTS. Our datasets are made publicly available via our GitHub repository.

13. A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products [PDF] Back to Contents
  Saskia Schön, Veselina Mironova, Aleksandra Gabryszak, Leonhard Hennig
Abstract: Recognizing non-standard entity types and relations, such as B2B products, product classes and their producers, in news and forum texts is important in application areas such as supply chain monitoring and market research. However, there is a decided lack of annotated corpora and annotation guidelines in this domain. In this work, we present a corpus study, an annotation schema and associated guidelines, for the annotation of product entity and company-product relation mentions. We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity and the broad syntactic and semantic variety of their surface realizations. We also describe our ongoing annotation effort, and present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.

14. A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events [PDF] Back to Contents
  Martin Schiersch, Veselina Mironova, Maximilian Schmitt, Philippe Thomas, Aleksandra Gabryszak, Leonhard Hennig
Abstract: Monitoring mobility- and industry-relevant events is important in areas such as personal travel planning and supply chain management, but extracting events pertaining to specific companies, transit routes and locations from heterogeneous, high-volume text streams remains a significant challenge. This work describes a corpus of German-language documents which has been annotated with fine-grained geo-entities, such as streets, stops and routes, as well as standard named entity types. It has also been annotated with a set of 15 traffic- and industry-related n-ary relations and events, such as accidents, traffic jams, acquisitions, and strikes. The corpus consists of newswire texts, Twitter messages, and traffic reports from radio stations, police and railway companies. It allows for training and evaluating both named entity recognition algorithms that aim for fine-grained typing of geo-entities, as well as n-ary relation extraction systems.

15. Variational Question-Answer Pair Generation for Machine Reading Comprehension [PDF] Back to Contents
  Kazutoshi Shinoda, Akiko Aizawa
Abstract: We present a deep generative model of question-answer (QA) pairs for machine reading comprehension. We introduce two independent latent random variables into our model in order to diversify answers and questions separately. We also study the effect of explicitly controlling the KL term in the variational lower bound in order to avoid the "posterior collapse" issue, where the model ignores latent variables and generates QA pairs that are almost the same. Our experiments on SQuAD v1.1 showed that variational methods can aid QA pair modeling capacity, and that the controlled KL term can significantly improve diversity while generating high-quality questions and answers comparable to those of the existing systems.
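
"Explicitly controlling the KL term" can be read as weighting the KL penalty in the variational lower bound, in the spirit of beta-VAE-style objectives (an assumption on my part; the paper's exact control mechanism may differ):

```latex
\mathcal{L} = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
            - \beta \,\mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
```

Keeping the KL term away from zero discourages the decoder from ignoring z, which is the "posterior collapse" failure mode the abstract mentions.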

16. Improving Fluency of Non-Autoregressive Machine Translation [PDF] Back to Contents
  Zdeněk Kasner, Jindřich Libovický, Jindřich Helcl
Abstract: Non-autoregressive (nAR) models for machine translation (MT) manifest superior decoding speed when compared to autoregressive (AR) models, at the expense of impaired fluency of their outputs. We improve the fluency of a nAR model with connectionist temporal classification (CTC) by employing additional features in the scoring model used during beam search decoding. Since the beam search decoding in our model only requires to run the network in a single forward pass, the decoding speed is still notably higher than in standard AR models. We train models for three language pairs: German, Czech, and Romanian from and into English. The results show that our proposed models can be more efficient in terms of decoding speed and still achieve a competitive BLEU score relative to AR models.

17. More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction [PDF] Back to Contents
  Xu Han, Tianyu Gao, Yankai Lin, Hao Peng, Yaoliang Yang, Chaojun Xiao, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Abstract: Relational facts are an important component of human knowledge, which are hidden in vast amounts of text. In order to extract these facts from text, people have been working on relation extraction (RE) for years. From early pattern matching to current neural networks, existing RE methods have achieved significant progress. Yet with the explosion of Web text and the emergence of new relations, human knowledge is increasing drastically, and we thus require "more" from RE: a more powerful RE system that can robustly utilize more data, efficiently learn more relations, easily handle more complicated context, and flexibly generalize to more open domains. In this paper, we look back at existing RE methods, analyze key challenges we are facing nowadays, and show promising directions towards more powerful RE. We hope our view can advance this field and inspire more efforts in the community.

18. Towards Multimodal Simultaneous Neural Machine Translation [PDF] Back to Contents
  Aizhan Imankulova, Masahiro Kaneko, Tosho Hirasawa, Mamoru Komachi
Abstract: Simultaneous translation involves translating a sentence before the speaker's utterance is completed in order to realize real-time understanding in multiple languages. This task is significantly harder than the general full sentence translation because of the shortage of input information during decoding. To alleviate this shortage, we propose multimodal simultaneous neural machine translation (MSNMT) which leverages visual information as an additional modality. Although the usefulness of images as an additional modality is moderate for full sentence translation, we verified, for the first time, its importance for simultaneous translation. Our experiments with the Multi30k dataset showed that MSNMT in a simultaneous setting significantly outperforms its text-only counterpart in situations where 5 or fewer input tokens are needed to begin translation. We then verified the importance of visual information during decoding by (a) performing an adversarial evaluation of MSNMT where we studied how models behave with incongruent input modality and (b) analyzing the image attention.

19. Machine Translation with Unsupervised Length-Constraints [PDF] Back to Contents
  Jan Niehues
Abstract: We have seen significant improvements in machine translation due to the usage of deep learning. While the improvements in translation quality are impressive, the encoder-decoder architecture enables many more possibilities. In this paper, we explore one of these, the generation of constraint translation. We focus on length constraints, which are essential if the translation should be displayed in a given format. In this work, we propose an end-to-end approach for this task. Compared to a traditional method that first translates and then performs sentence compression, the text compression is learned completely unsupervised. By combining the idea with zero-shot multilingual machine translation, we are also able to perform unsupervised monolingual sentence compression. In order to fulfill the length constraints, we investigated several methods to integrate the constraints into the model. Using the presented technique, we are able to significantly improve the translation quality under constraints. Furthermore, we are able to perform unsupervised monolingual sentence compression.

20. Self-Induced Curriculum Learning in Neural Machine Translation [PDF] Back to Contents
  Dana Ruiter, Cristina España-Bonet, Josef van Genabith
Abstract: Self-supervised neural machine translation (SS-NMT) learns how to extract/select suitable training data from comparable (rather than parallel) corpora and how to translate, in a way that the two tasks support each other in a virtuous circle. SS-NMT has been shown to be competitive with state-of-the-art unsupervised NMT. In this study we provide an in-depth analysis of the sampling choices the SS-NMT model takes during training. We show that, without it having been told to do so, the model selects samples of increasing (i) complexity and (ii) task-relevance in combination with (iii) a denoising curriculum. We observe that the dynamics of the mutual supervision of both system-internal representation types is vital for the extraction and hence translation performance. We show that in terms of the human Gunning-Fog Readability index (GF), SS-NMT starts by extracting and learning from Wikipedia data suitable for high school (GF = 10-11) and quickly moves towards content suitable for first-year undergraduate students (GF = 13).

21. Unsupervised Neural Machine Translation with Indirect Supervision [PDF] Back to Contents
  Hongxiao Bai, Mingxuan Wang, Hai Zhao, Lei Li
Abstract: Neural machine translation (NMT) is ineffective for zero-resource languages. Recent works exploring the possibility of unsupervised neural machine translation (UNMT) with only monolingual data can achieve promising results. However, there are still big gaps between UNMT and NMT with parallel supervision. In this work, we introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions. More specifically, for unsupervised language pairs such as En-De, we can make full use of the information from a parallel dataset such as En-Fr to jointly train the unsupervised translation directions, all in one model. The framework is based on multilingual models which require no changes to the standard unsupervised NMT. Empirical results demonstrate that it significantly improves the translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.

22. g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset [PDF] Back to Contents
  Kyubyong Park, Seanie Lee
Abstract: Conversion of Chinese graphemes to phonemes (G2P) is an essential component in Mandarin Chinese Text-To-Speech (TTS) systems. One of the biggest challenges in Chinese G2P conversion is how to disambiguate the pronunciation of polyphones -- characters having multiple pronunciations. Although many academic efforts have been made to address it, there has been no open dataset that can serve as a standard benchmark for fair comparison to date. In addition, most of the reported systems are hard to employ for researchers or practitioners who want to convert Chinese text into pinyin at their convenience. Motivated by these, in this work, we introduce a new benchmark dataset that consists of 99,000+ sentences for Chinese polyphone disambiguation. We train a simple neural network model on it, and find that it outperforms other preexisting G2P systems. Finally, we package our project and share it on PyPi.
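
Since the package is published on PyPI, usage is presumably along these lines (a sketch based on the project's README; the exact signature may differ):

```python
# pip install g2pM
from g2pM import G2pM

model = G2pM()
sentence = "然而，他红了20年以后，他竟退出了大家的视线。"
print(model(sentence, tone=True, char_split=False))
# -> a list of pinyin strings, with polyphones disambiguated in context
```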

23. Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation [PDF] Back to Contents
  Seungjae Shin, Kyungwoo Song, JoonHo Jang, Hyemi Kim, Weonyoung Joo, Il-Chul Moon
Abstract: Recent research demonstrates that word embeddings, trained on a human-generated corpus, have strong gender biases in embedding spaces, and these biases can result in prejudiced outcomes in downstream tasks, e.g. sentiment analysis. Whereas previous debiasing models project word embeddings into a linear subspace, we introduce a Latent Disentangling model with a siamese auto-encoder structure and a gradient reversal layer. Our siamese auto-encoder utilizes gender word pairs to disentangle the semantics and gender information of a given word, and the associated gradient reversal layer provides the negative gradient to distinguish the semantics from the gender. Afterwards, we introduce a Counterfactual Generation model to modify the gender information of words, so the original and the modified embeddings can produce a gender-neutralized word embedding after geometric alignment without loss of semantic information. Experimental results quantitatively and qualitatively indicate that the introduced method is better at debiasing word embeddings and at minimizing the semantic information losses for NLP downstream tasks.
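
The gradient reversal layer is a standard construction (Ganin and Lempitsky, 2015); a minimal PyTorch sketch of the component as it would be used here, pushing the encoder toward features the attached gender classifier cannot exploit:

```python
import torch

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; negated, scaled gradient on the
    # backward pass.
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```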

24. RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases [PDF] Back to Contents
  DongHyun Choi, Myeong Cheol Shin, EungGyun Kim, Dong Ryeol Shin
Abstract: Text-to-SQL is the problem of converting a user question into an SQL query, when the question and database are given. In this paper, we present a neural network approach called RYANSQL (Recursively Yielding Annotation Network for SQL) to solve complex Text-to-SQL tasks for cross-domain databases. Statement Position Code (SPC) is defined to transform a nested SQL query into a set of non-nested SELECT statements; a sketch-based slot-filling approach is proposed to synthesize each SELECT statement for its corresponding SPC. Additionally, two input manipulation methods are presented to improve generation performance further. RYANSQL achieved 58.2% accuracy on the challenging Spider benchmark, a 3.2%p improvement over previous state-of-the-art approaches. At the time of writing, RYANSQL holds the first position on the Spider leaderboard.
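
A hypothetical illustration of the SPC idea (the paper's actual code vocabulary and sketch slots differ in detail): a nested query is flattened into non-nested SELECT statements, each tagged with its position, and each SELECT is then synthesized by sketch-based slot filling conditioned on its SPC.

```python
# Hypothetical Statement Position Codes for a nested query.
nested = "SELECT name FROM city WHERE pop > (SELECT AVG(pop) FROM city)"
decomposed = [
    ("[ROOT]",      "SELECT name FROM city WHERE pop > <SUB0>"),
    ("[WHERE-SUB]", "SELECT AVG(pop) FROM city"),  # fills <SUB0>
]
```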

25. A Sentence Cloze Dataset for Chinese Machine Reading Comprehension [PDF] Back to Contents
  Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang, Guoping Hu
Abstract: Owing to the continuous contributions by the Chinese NLP community, more and more Chinese machine reading comprehension datasets become available, and they have been pushing Chinese MRC research forward. To add diversity in this area, in this paper, we propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC). The proposed task aims to fill the right candidate sentence into the passage that has several blanks. Moreover, to add more difficulties, we also made fake candidates that are similar to the correct ones, which requires the machine to judge their correctness in the context. The proposed dataset contains over 100K blanks (questions) within over 10K passages, which originated from Chinese narrative stories. To evaluate the dataset, we implement several baseline systems based on pre-trained models, and the results show that the state-of-the-art model still underperforms human performance by a large margin. We hope the release of the dataset could further accelerate the machine reading comprehension research. Resources available: this https URL

26. Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering [PDF] Back to Contents
  Pratyay Banerjee, Chitta Baral
Abstract: Open Domain Question Answering requires systems to retrieve external knowledge and perform multi-hop reasoning by composing knowledge spread over multiple sentences. In the recently introduced open domain question answering challenge datasets, QASC and OpenBookQA, we need to perform retrieval of facts and compose facts to correctly answer questions. In our work, we learn a semantic knowledge ranking model to re-rank knowledge retrieved through Lucene-based information retrieval systems. We further propose a "knowledge fusion model" which leverages knowledge in BERT-based language models with externally retrieved knowledge and improves the knowledge understanding of the BERT-based language models. On both OpenBookQA and QASC datasets, the knowledge fusion model with semantically re-ranked knowledge outperforms previous attempts.

27. Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation [PDF] Back to Contents
  Bowen Wu, Huan Zhang, Mengyuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang
Abstract: Recently, BERT has become an essential ingredient of various NLP deep models due to its effectiveness and universal-usability. However, the online deployment of BERT is often blocked by its large-scale parameters and high computational cost. There are plenty of studies showing that the knowledge distillation is efficient in transferring the knowledge from BERT into the model with a smaller size of parameters. Nevertheless, current BERT distillation approaches mainly focus on task-specified distillation, such methodologies lead to the loss of the general semantic knowledge of BERT for universal-usability. In this paper, we propose a sentence representation approximating oriented distillation framework that can distill the pre-trained BERT into a simple LSTM based model without specifying tasks. Consistent with BERT, our distilled model is able to perform transfer learning via fine-tuning to adapt to any sentence-level downstream task. Besides, our model can further cooperate with task-specific distillation procedures. The experimental results on multiple NLP tasks from the GLUE benchmark show that our approach outperforms other task-specific distillation methods or even much larger models, i.e., ELMO, with efficiency well-improved.
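
Read as a representation-approximation objective, the non-task-specific distillation step might look like the following (a sketch assuming a mean-squared-error objective over unlabeled sentences; the paper's actual loss may differ):

```python
import torch.nn.functional as F

def approximation_loss(student_sentence_vec, teacher_sentence_vec):
    # Pull the LSTM student's sentence representation toward BERT's
    # without specifying any downstream task.
    return F.mse_loss(student_sentence_vec, teacher_sentence_vec)
```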

28. Is Graph Structure Necessary for Multi-hop Reasoning? [PDF] Back to Contents
  Nan Shao, Yiming Cui, Ting Liu, Shijin Wang, Guoping Hu
Abstract: Recently, many works attempt to model texts as graph structure and introduce graph neural networks to deal with them on many NLP tasks. In this paper, we investigate whether graph structure is necessary for multi-hop reasoning tasks and what role it plays. Our analysis is centered on HotpotQA. We use the state-of-the-art published model, Dynamically Fused Graph Network (DFGN), as our baseline. By directly modifying the pre-trained model, our baseline model gains a large improvement and significantly surpasses both published and unpublished works. Ablation experiments established that, with the proper use of pre-trained models, graph structure may not be necessary for multi-hop reasoning. We point out that both the graph structure and the adjacency matrix are task-related prior knowledge, and graph attention can be considered as a special case of self-attention. Experiments demonstrate that graph attention or the entire graph structure can be replaced by self-attention or Transformers, achieving results similar to the previous state-of-the-art model.
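
The claim that graph attention is a special case of self-attention can be made concrete: adding the graph's adjacency matrix A as a mask on the attention logits restricts attention to graph edges, and removing the mask recovers ordinary self-attention.

```latex
\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}} + M\right)V,
\qquad
M_{ij} = \begin{cases} 0 & \text{if } A_{ij} = 1 \\ -\infty & \text{otherwise.} \end{cases}
```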

29. Exemplar Auditing for Multi-Label Biomedical Text Classification [PDF] Back to Contents
  Allen Schmaltz, Andrew Beam
Abstract: Many practical applications of AI in medicine consist of semi-supervised discovery: The investigator aims to identify features of interest at a resolution more fine-grained than that of the available human labels. This is often the scenario faced in healthcare applications as coarse, high-level labels (e.g., billing codes) are often the only sources that are readily available. These challenges are compounded for modalities such as text, where the feature space is very high-dimensional, and often contains considerable amounts of noise. In this work, we generalize a recently proposed zero-shot sequence labeling method, "binary labeling via a convolutional decomposition", to the case where the available document-level human labels are themselves relatively high-dimensional. The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors from the training set, under the model. The approach is effective, yet parsimonious, as demonstrated on a well-studied MIMIC-III multi-label classification task of electronic health record data, and is useful as a tool for organizing the analysis of neural model predictions and high-dimensional datasets. Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.

30. Interview: A Large-Scale Open-Source Corpus of Media Dialog [PDF] Back to Contents
  Bodhisattwa Prasad Majumder, Shuyang Li, Jianmo Ni, Julian McAuley
Abstract: Existing conversational datasets consist either of written proxies for dialog or small-scale transcriptions of natural speech. We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts. Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance on existing spoken dialog datasets, demonstrating its usefulness in modeling real-world conversations. 'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems. In fact, experiments on two dialog tasks show that leveraging such labels improves performance over strong speaker-agnostic baselines, and enables models to generate more specific and inquisitive responses in interview-style conversations.

31. Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning [PDF] Back to Contents
  Daya Guo, Akari Asai, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Jian Yin, Ming Zhou
Abstract: We study the problem of generating inferential texts of events for a variety of commonsense relations such as if-else. Existing approaches typically use limited evidence from training examples and learn for each relation individually. In this work, we use multiple knowledge sources as fuels for the model. Existing commonsense knowledge bases like ConceptNet are dominated by taxonomic knowledge (e.g., isA and relatedTo relations), with only a limited amount of inferential knowledge. We use not only structured commonsense knowledge bases, but also natural language snippets from search-engine results. These sources are incorporated into a generative base model via a key-value memory network. In addition, we introduce a meta-learning based multi-task learning algorithm. For each targeted commonsense relation, we regard the learning of examples from other relations as the meta-training process, and the evaluation on examples from the targeted relation as the meta-test process. We conduct experiments on the Event2Mind and ATOMIC datasets. Results show that both the integration of multiple knowledge sources and the use of the meta-learning algorithm improve the performance.

32. Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition [PDF] Back to Contents
  Paloma Jeretic, Alex Warstadt, Suvrat Bhooshan, Adina Williams
Abstract: Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether one sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of 32K semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, BOW, and InferSent NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI contains vanishingly few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences: it reliably treats implicatures triggered by "some" as entailments. For some presupposition triggers like "only", BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.

33. Information-Theoretic Probing for Linguistic Structure [PDF] Back to Contents
  Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell
Abstract: The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually know about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotation in that linguistic task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that using simpler models as probes is better; the logic is that such models will identify linguistic structure, but not learn the task itself. We propose an information-theoretic formalization of probing as estimating mutual information that contradicts this received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate. The empirical portion of our paper focuses on obtaining tight estimates for how much information BERT knows about parts of speech in a set of five typologically diverse languages that are often underrepresented in parsing research, plus English, totaling six languages. We find BERT accounts for only at most 5% more information than traditional, type-based word embeddings.
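
The argument rests on the identity I(R;T) = H(T) - H(T|R): a probe's cross-entropy upper-bounds H(T|R), so every probe yields a lower bound on the mutual information between representations R and linguistic task labels T, and the best-performing probe yields the tightest bound.

```latex
I(R;T) = H(T) - H(T \mid R) \;\ge\; H(T) - \mathbb{E}\!\left[-\log q_\theta(t \mid r)\right]
```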

34. The Role of Pragmatic and Discourse Context in Determining Argument Impact [PDF] Back to Contents
  Esin Durmus, Faisal Ladhak, Claire Cardie
Abstract: Research in the social sciences and psychology has shown that the persuasiveness of an argument depends not only on the language employed, but also on attributes of the source/communicator, the audience, and the appropriateness and strength of the argument's claims given the pragmatic and discourse context of the argument. Among these characteristics of persuasive arguments, prior work in NLP does not explicitly investigate the effect of the pragmatic and discourse context when determining argument quality. This paper presents a new dataset to initiate the study of this aspect of argumentation: it consists of a diverse collection of arguments covering 741 controversial topics and comprising over 47,000 claims. We further propose predictive models that incorporate the pragmatic and discourse context of argumentative claims and show that they outperform models that rely only on claim-specific linguistic features for predicting the perceived impact of individual claims within a particular line of argument.

35. A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages [PDF] Back to Contents
  Daniel Edmiston
Abstract: This work describes experiments which probe the hidden representations of several BERT-style models for morphological content. The goal is to examine the extent to which discrete linguistic structure, in the form of morphological features and feature values, presents itself in the vector representations and attention distributions of pre-trained language models for five European languages. The experiments contained herein show that (i) Transformer architectures largely partition their embedding space into convex sub-regions highly correlated with morphological feature value, (ii) the contextualized nature of transformer embeddings allows models to distinguish ambiguous morphological forms in many, but not all cases, and (iii) very specific attention head/layer combinations appear to hone in on subject-verb agreement.

36. Query Focused Multi-Document Summarization with Distant Supervision [PDF] Back to Contents
  Yumo Xu, Mirella Lapata
Abstract: We consider the problem of better modeling query-cluster interactions to facilitate query focused multi-document summarization (QFS). Due to the lack of training data, existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments. In this work, we leverage distant supervision from question answering where various resources are available to more explicitly capture the relationship between queries and documents. We propose a coarse-to-fine modeling framework which introduces separate modules for estimating whether segments are relevant to the query, likely to contain an answer, and central. Under this framework, a trained evidence estimator further discerns which retrieved segments might answer the query for final selection in the summary. We demonstrate that our framework outperforms strong comparison systems on standard QFS benchmarks.

37. Enhancing Review Comprehension with Domain-Specific Commonsense [PDF] Back to Contents
  Aaron Traylor, Chen Chen, Behzad Golshan, Xiaolan Wang, Yuliang Li, Yoshihiko Suhara, Jinfeng Li, Cagatay Demiralp, Wang-Chiew Tan
Abstract: Review comprehension has played an increasingly important role in improving the quality of online services and products and commonsense knowledge can further enhance review comprehension. However, existing general-purpose commonsense knowledge bases lack sufficient coverage and precision to meaningfully improve the comprehension of domain-specific reviews. In this paper, we introduce xSense, an effective system for review comprehension using domain-specific commonsense knowledge bases (xSense KBs). We show that xSense KBs can be constructed inexpensively and present a knowledge distillation method that enables us to use xSense KBs along with BERT to boost the performance of various review comprehension tasks. We evaluate xSense over three review comprehension tasks: aspect extraction, aspect sentiment classification, and question answering. We find that xSense outperforms the state-of-the-art models for the first two tasks and improves the baseline BERT QA model significantly, demonstrating the usefulness of incorporating commonsense into review comprehension pipelines. To facilitate future research and applications, we publicly release three domain-specific knowledge bases and a domain-specific question answering benchmark along with this paper.
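The abstract does not spell out the distillation objective, so the sketch below shows a generic teacher-student loss of the kind such systems typically use; the temperature and mixing weight are conventional assumptions, not values from the paper.

```python
# Generic knowledge-distillation loss: soft KL term plus hard cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: match the teacher's tempered distribution (KL term).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean") * temperature ** 2
    # Hard targets: the usual cross-entropy against gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(4, 3)  # e.g. 3 aspect-sentiment classes
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student, teacher, labels))
```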

38. "You are grounded!": Latent Name Artifacts in Pre-trained Language Models [PDF] 返回目录
  Vered Shwartz, Rachel Rudinger, Oyvind Tafjord
Abstract: Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next token prediction (e.g., Trump). While helpful in some contexts, grounding happens also in under-specified or inappropriate contexts. For example, endings generated for "Donald is a" substantially differ from those of other names, and often have more-than-average negative sentiment. We demonstrate the potential effect on downstream tasks with reading comprehension probes where name perturbation changes the model answers. As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias.
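A minimal name-perturbation probe in this spirit might look as follows; predict is a hypothetical stand-in for a real reading-comprehension model, and the names and template are examples only.

```python
# Name-perturbation probe: swap the given name and see if the answer moves.
def predict(question: str, context: str) -> str:
    # Stub: a real probe would query a QA model here.
    return "positive" if "Donald" in context else "neutral"

context_template = "{name} is a hard-working employee who was promoted."
question = "How is the employee described?"

answers = {}
for name in ["Donald", "Joseph", "Emma"]:
    answers[name] = predict(question, context_template.format(name=name))

# If answers differ under name substitution alone, the model is grounding
# the name rather than reading the passage.
print(answers)
print("name-sensitive:", len(set(answers.values())) > 1)
```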

39. Multi-Step Inference for Reasoning Over Paragraphs [PDF] 返回目录
  Jiangming Liu, Matt Gardner
Abstract: Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives. Prior work has largely tried to do this either symbolically or with black-box transformers. We present a middle ground between these two extremes: a compositional model reminiscent of neural module networks that can perform chained logical reasoning. This model first finds relevant sentences in the context and then chains them together using neural modules. Our model gives significant performance improvements (up to 29% relative error reduction when combined with a reranker) on ROPES, a recently-introduced complex reasoning dataset.
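A toy rendering of the find-then-chain idea, with an invented selection heuristic and a stub in place of a trained neural module.

```python
# Find relevant sentences, then hand them to a chained "module".
def find_relevant(question: str, context: list[str]) -> list[str]:
    q_words = set(question.lower().split())
    return [s for s in context if q_words & set(s.lower().split())]

def reason_module(sentences: list[str]) -> str:
    # Stand-in for a neural module that reasons over the selected evidence.
    return sentences[0] if sentences else "no evidence"

context = ["Plants with deeper roots survive droughts.",
           "The story is set in a dry valley."]
question = "Which plants survive droughts?"
evidence = find_relevant(question, context)
print(reason_module(evidence))
```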

40. Evaluating the Evaluation of Diversity in Natural Language Generation [PDF] 返回目录
  Guy Tevet, Jonathan Berant
Abstract: Despite growing interest in natural language generation (NLG) models that produce diverse outputs, there is currently no principled method for evaluating the diversity of an NLG system. In this work, we propose a framework for evaluating diversity metrics. The framework measures the correlation between a proposed diversity metric and a diversity parameter, a single parameter that controls some aspect of diversity in generated text. For example, a diversity parameter might be a binary variable used to instruct crowdsourcing workers to generate text with either low or high content diversity. We demonstrate the utility of our framework by: (a) establishing best practices for eliciting diversity judgments from humans, (b) showing that humans substantially outperform automatic metrics in estimating content diversity, and (c) demonstrating that existing methods for controlling diversity by tuning a "decoding parameter" mostly affect form but not meaning. Our framework can advance the understanding of different diversity metrics, an essential step on the road towards better NLG systems.
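A minimal instantiation of the framework, assuming distinct-1 as the candidate diversity metric (our example choice, not one the paper endorses): compute the metric across generations produced under different settings of a diversity parameter and correlate the two.

```python
# Correlate a candidate diversity metric with a known diversity parameter.
from scipy.stats import spearmanr

def distinct_1(texts):
    tokens = [t for text in texts for t in text.split()]
    return len(set(tokens)) / max(len(tokens), 1)

# Each item: (diversity parameter, toy generations produced at that setting).
runs = [
    (0.0, ["the cat sat", "the cat sat", "the cat sat"]),
    (0.5, ["the cat sat", "a dog ran", "the cat sat"]),
    (1.0, ["the cat sat", "a dog ran", "birds fly south"]),
]
params = [p for p, _ in runs]
scores = [distinct_1(texts) for _, texts in runs]

# A good diversity metric should track the parameter that controls diversity.
rho, _ = spearmanr(params, scores)
print(f"Spearman correlation: {rho:.2f}")
```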

41. Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics [PDF] 返回目录
  Xusen Yin, Jonathan May
Abstract: Reinforcement learning algorithms such as Q-learning have shown great promise in training models to learn the optimal action to take for a given system state; a goal in applications with an exploratory or adversarial nature such as task-oriented dialogues or games. However, models that do not have direct access to their state are harder to train; when the only state access is via the medium of language, this can be particularly pronounced. We introduce a new model amenable to deep Q-learning that incorporates a Siamese neural network architecture and a novel refactoring of the Q-value function in order to better represent system state given its approximation over a language channel. We evaluate the model in the context of zero-shot text-based adventure game learning. Extrinsically, our model reaches the baseline's convergence performance point needing only 15% of its iterations, reaches a convergence performance point 15% higher than the baseline's, and is able to play unseen, unrelated games with no fine-tuning. We probe our new model's representation space to determine that intrinsically, this is due to the appropriate clustering of different linguistic mediation into the same state.
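One way the Siamese Q-value refactoring could look, sketched under assumptions: the shared encoder, dimensions, and inner-product form are illustrative, not the paper's exact parameterization.

```python
# Siamese Q-network sketch: state and action texts share one encoder,
# and Q(s, a) is refactored as a similarity between the two encodings.
import torch
import torch.nn as nn

class SiameseQ(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, dim)  # shared text encoder
        self.proj = nn.Linear(dim, dim)

    def encode(self, token_ids):
        return self.proj(self.embed(token_ids))

    def forward(self, state_ids, action_ids):
        s = self.encode(state_ids)   # both sides share weights (Siamese)
        a = self.encode(action_ids)
        return (s * a).sum(-1)       # Q(s, a) as an inner product

net = SiameseQ()
state = torch.randint(0, 1000, (2, 12))   # batch of tokenized states
action = torch.randint(0, 1000, (2, 4))   # batch of tokenized actions
print(net(state, action))                 # one Q-value per (s, a) pair
```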

42. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices [PDF] 返回目录
  Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
Abstract: Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. To train MobileBERT, we first train a specially designed teacher model, a BERT_LARGE model incorporating an inverted bottleneck. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that MobileBERT is 4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known benchmarks. On the natural language inference tasks of GLUE, MobileBERT achieves a GLUE score of 77.7 (0.6 lower than BERT_BASE), and 62 ms latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task, MobileBERT achieves a dev F1 score of 90.0/79.2 (1.5/2.1 higher than BERT_BASE).
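A rough sketch of the bottleneck idea: project the wide inter-block representation down, run attention and a feed-forward stage at the narrow width, and project back up. The real MobileBERT block has further components (e.g. stacked feed-forward networks and the inverted-bottleneck teacher), and the dimensions below are illustrative.

```python
# Thin transformer block with bottleneck entry/exit projections.
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    def __init__(self, wide=512, narrow=128, heads=4):
        super().__init__()
        self.down = nn.Linear(wide, narrow)   # bottleneck entry
        self.attn = nn.MultiheadAttention(narrow, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(narrow, 4 * narrow), nn.ReLU(),
            nn.Linear(4 * narrow, narrow))
        self.up = nn.Linear(narrow, wide)     # bottleneck exit

    def forward(self, x):
        h = self.down(x)
        h = h + self.attn(h, h, h, need_weights=False)[0]
        h = h + self.ffn(h)
        return x + self.up(h)                 # residual at the wide width

x = torch.randn(2, 16, 512)                   # (batch, seq, wide)
print(BottleneckBlock()(x).shape)             # torch.Size([2, 16, 512])
```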

43. An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines [PDF] 返回目录
  Elena Álvarez-Mellado
Abstract: The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for NLP downstream tasks. We introduce a corpus of European Spanish newspaper headlines annotated with anglicisms and a baseline model for anglicism extraction. In this paper we present: (1) a corpus of 21,570 newspaper headlines written in European Spanish annotated with emergent anglicisms and (2) a conditional random field baseline model with handcrafted features for anglicism extraction. We present the newspaper headlines corpus, describe the annotation tagset and guidelines and introduce a CRF model that can serve as baseline for the task of detecting anglicisms. The presented work is a first step towards the creation of an anglicism extractor for Spanish newswire.
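Handcrafted CRF features for borrowing detection might resemble the following; the feature template is invented for illustration and is not the paper's published feature set.

```python
# Per-token feature extraction of the kind a CRF baseline could consume.
def token_features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "suffix3": tok[-3:],
        # Character sequences that are common in English but rare in Spanish.
        "has_english_ngram": any(ng in tok.lower()
                                 for ng in ("sh", "ck", "w", "ing")),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

headline = "El nuevo gameplay del streaming llega a España".split()
features = [token_features(headline, i) for i in range(len(headline))]
print(features[4])  # features for "streaming"
```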

44. Speaker-change Aware CRF for Dialogue Act Classification [PDF] 返回目录
  Guokan Shang, Antoine Jean-Pierre Tixier, Michalis Vazirgiannis, Jean-Pierre Lorré
Abstract: Recent work in Dialogue Act (DA) classification approaches the task as a sequence labeling problem, using neural network models coupled with a Conditional Random Field (CRF) as the last layer. CRF models the conditional probability of the target DA label sequence given the input utterance sequence. However, the task involves another important input sequence, that of speakers, which is ignored by previous work. To address this limitation, this paper proposes a simple modification of the CRF layer that takes speaker-change into account. Experiments on the SwDA corpus show that our modified CRF layer outperforms the original one, with very wide margins for some DA labels. Further, visualizations demonstrate that our CRF layer can learn meaningful, sophisticated transition patterns between DA label pairs conditioned on speaker-change in an end-to-end way. Code is publicly available.
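A compact way to picture the modification: keep two DA-label transition matrices instead of one and select per step according to whether the speaker changed. The labels and scores below are made up.

```python
# Speaker-change-conditioned transitions between dialogue-act labels.
import numpy as np

labels = ["Question", "Answer", "Backchannel"]
T_same = np.array([[0.1, 0.2, 0.7],    # transitions when the speaker stays
                   [0.2, 0.5, 0.3],
                   [0.3, 0.3, 0.4]])
T_change = np.array([[0.1, 0.8, 0.1],  # transitions on a speaker change
                     [0.6, 0.2, 0.2],
                     [0.3, 0.4, 0.3]])

def sequence_score(label_ids, speaker_change_flags):
    # Sum log transition scores, switching matrices on speaker change.
    score = 0.0
    for t in range(1, len(label_ids)):
        T = T_change if speaker_change_flags[t] else T_same
        score += np.log(T[label_ids[t - 1], label_ids[t]])
    return score

# Question -> Answer across a speaker change scores higher under T_change.
print(sequence_score([0, 1], [False, True]))
print(sequence_score([0, 1], [False, False]))
```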

45. A Few Topical Tweets are Enough for Effective User-Level Stance Detection [PDF] 返回目录
  Younes Samih, Kareem Darwish
Abstract: Stance detection entails ascertaining the position of a user towards a target, such as an entity, topic, or claim. Recent work that employs unsupervised classification has shown that performing stance detection on vocal Twitter users, who have many tweets on a target, can yield very high accuracy (+98%). However, such methods perform poorly or fail completely for less vocal users, who may have authored only a few tweets about a target. In this paper, we tackle stance detection for such users using two approaches. In the first approach, we improve user-level stance detection by representing tweets using contextualized embeddings, which capture latent meanings of words in context. We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics. In the second approach, we expand the tweets of a given user using their Twitter timeline tweets, and then we perform unsupervised classification of the user, which entails clustering a user with other users in the training set. This approach achieves 95.6% accuracy and 93.1% macro F-measure.
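A schematic of the second approach, assuming random vectors as stand-ins for contextualized tweet embeddings: represent each user by the mean of their (timeline-expanded) tweet embeddings, then cluster users so stance labels can propagate within clusters.

```python
# User-level clustering over mean tweet embeddings.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_users, tweets_per_user, dim = 50, 10, 32

# Two latent stance groups with different embedding means.
group = rng.integers(0, 2, n_users)
user_vecs = np.stack([
    rng.normal(loc=2.0 * g, size=(tweets_per_user, dim)).mean(axis=0)
    for g in group])

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(user_vecs)
# Agreement up to label permutation:
agree = max((clusters == group).mean(), (clusters != group).mean())
print(f"cluster/stance agreement: {agree:.2f}")
```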

46. Testing pre-trained Transformer models for Lithuanian news clustering [PDF] 返回目录
  Lukas Stankevičius, Mantas Lukoševičius
Abstract: The recent introduction of the Transformer deep learning architecture enabled breakthroughs in various natural language processing tasks. However, non-English languages could not leverage these new opportunities, since the pre-trained models targeted English text. This changed with research focusing on multilingual models, where less-spoken languages are the main beneficiaries. We compare pre-trained multilingual BERT, XLM-R, and older learned text representation methods as encodings for the task of Lithuanian news clustering. Our results indicate that publicly available pre-trained multilingual Transformer models can be fine-tuned to surpass word vectors but still score much lower than specially trained doc2vec embeddings.

47. Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition [PDF] 返回目录
  Yi Zheng, Xianjie Yang, Xuyong Dang
Abstract: This paper proposes a new label smoothing method for automatic speech recognition (ASR) that exploits human-level prior knowledge of a language: homophones. Compared with its forerunners, the proposed method uses pronunciation knowledge of homophones in a more complex way. End-to-end ASR models that jointly learn the acoustic and language models, with characters as modelling units, are necessary conditions for this method. Experiments with a hybrid CTC sequence-to-sequence model show that the new method can reduce the character error rate (CER) by an absolute 0.4%.
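One plausible reading of homophone-based smoothing, sketched with an invented homophone table and smoothing weight: concentrate the smoothing mass on characters sharing the target's pronunciation instead of spreading it uniformly.

```python
# Homophone-aware smoothed target distribution over a toy vocabulary.
import torch

vocab = ["他", "她", "它", "的", "了"]          # toy character vocabulary
homophones = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # 他/她/它 all read "ta"

def homophone_smoothed_target(target_id, eps=0.1, vocab_size=len(vocab)):
    dist = torch.zeros(vocab_size)
    peers = homophones.get(target_id, [])
    if peers:                        # give the smoothing mass to homophones
        dist[target_id] = 1.0 - eps
        for p in peers:
            dist[p] = eps / len(peers)
    else:                            # no homophones: plain uniform smoothing
        dist += eps / (vocab_size - 1)
        dist[target_id] = 1.0 - eps
    return dist

print(homophone_smoothed_target(0))  # mass goes to 她/它, not 的/了
print(homophone_smoothed_target(3))  # falls back to uniform smoothing
```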

48. MedDialog: A Large-scale Medical Dialogue Dataset [PDF] 返回目录
  Shu Chen, Zeqian Ju, Xiangyu Dong, Hongchao Fang, Sicheng Wang, Yue Yang, Jiaqi Zeng, Ruisi Zhang, Ruoyu Zhang, Meng Zhou, Penghui Zhu, Pengtao Xie
Abstract: Medical dialogue systems are promising in assisting in telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate the research and development of medical dialogue systems, we build a large-scale medical dialogue dataset -- MedDialog -- that contains 1.1 million conversations between patients and doctors and 4 million utterances. To our best knowledge, MedDialog is the largest medical dialogue dataset to date. The dataset is available at this https URL

49. Neural Image Inpainting Guided with Descriptive Text [PDF] 返回目录
  Lisai Zhang, Qingcai Chen, Baotian Hu, Shuoran Jiang
Abstract: Neural image inpainting has achieved promising performance in generating semantically plausible content. Most of the recent works mainly focus on inpainting images depending on vision information, while neglecting the semantic information implied in human languages. To acquire more semantically accurate inpainting images, this paper proposes a novel inpainting model named Neural Image Inpainting Guided with Descriptive Text (NIGDT). First, a dual multi-modal attention mechanism is designed to extract the explicit semantic information about corrupted regions. The mechanism is trained to combine the descriptive text and two complementary images through reciprocal attention maps. Second, an image-text matching loss is designed to enforce the model output following the descriptive text. Its goal is to maximize the semantic similarity of the generated image and the text. Finally, experiments are conducted on two open datasets with captions. Experimental results show that the proposed NIGDT model outperforms all compared models on both quantitative and qualitative comparison. The results also demonstrate that the proposed model can generate images consistent with the guidance text, which provides a flexible way for user-guided inpainting. Our systems and code will be released soon.

50. Multi-Scale Aggregation Using Feature Pyramid Module for Text-Independent Speaker Verification [PDF] 返回目录
  Youngmoon Jung, Seongmin Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim
Abstract: Currently, the most widely used approach for speaker verification is deep speaker embedding learning. In this approach, convolutional neural networks are mainly used as a frame-level feature extractor, and speaker embeddings are extracted from the last layer of the feature extractor. Multi-scale aggregation (MSA), which utilizes multi-scale features from different layers of the feature extractor, has recently been introduced into the approach and has shown improved performance for both short and long utterances. This paper improves MSA by using a feature pyramid module, which enhances speaker-discriminative information of features at multiple layers via a top-down pathway and lateral connections. We extract speaker embeddings using the enhanced features that contain rich speaker information at different resolutions. Experiments on the VoxCeleb dataset show that the proposed module improves previous MSA methods with a smaller number of parameters, providing better performance than state-of-the-art approaches.
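A bare-bones feature-pyramid module with a top-down pathway and lateral 1x1 convolutions, as the abstract describes; the channel counts are arbitrary, and the real system applies this inside a speaker-embedding extractor.

```python
# Feature pyramid: lateral 1x1 convs plus a top-down upsampling pathway.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    def __init__(self, in_channels=(64, 128, 256), out=64):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out, kernel_size=1) for c in in_channels)

    def forward(self, feats):  # feats: low -> high level, shrinking size
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down: upsample the higher level and add to the lateral below.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return laterals  # multi-scale maps enriched with high-level info

feats = [torch.randn(1, 64, 32, 32),
         torch.randn(1, 128, 16, 16),
         torch.randn(1, 256, 8, 8)]
print([t.shape for t in FeaturePyramid()(feats)])
```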

51. Multilingual enrichment of disease biomedical ontologies [PDF] 返回目录
  Léo Bouscarrat, Antoine Bonnefoy, Cécile Capponi, Carlos Ramisch
Abstract: Translating biomedical ontologies is an important challenge, but doing it manually requires much time and money. We study the possibility to use open-source knowledge bases to translate biomedical ontologies. We focus on two aspects: coverage and quality. We look at the coverage of two biomedical ontologies focusing on diseases with respect to Wikidata for 9 European languages (Czech, Dutch, English, French, German, Italian, Polish, Portuguese and Spanish) for both ontologies, plus Arabic, Chinese and Russian for the second one. We first use direct links between Wikidata and the studied ontologies and then use second-order links by going through other intermediate ontologies. We then compare the quality of the translations obtained thanks to Wikidata with a commercial machine translation tool, here Google Cloud Translation.

52. Predicting Strategic Behavior from Free Text [PDF] 返回目录
  Omer Ben-Porat, Sharon Hirsch, Lital Kuchy, Guy Elad, Roi Reichart, Moshe Tennenholtz
Abstract: The connection between messaging and action is fundamental both to web applications, such as web search and sentiment analysis, and to economics. However, while prominent online applications exploit messaging in natural (human) language in order to predict non-strategic action selection, the economics literature focuses on the connection between structured stylized messaging to strategic decisions in games and multi-agent encounters. This paper aims to connect these two strands of research, which we consider highly timely and important due to the vast online textual communication on the web. Particularly, we introduce the following question: can free text expressed in natural language serve for the prediction of action selection in an economic context, modeled as a game? In order to initiate the research on this question, we introduce the study of an individual's action prediction in a one-shot game based on free text he/she provides, while being unaware of the game to be played. We approach the problem by attributing commonsensical personality attributes via crowd-sourcing to free texts written by individuals, and employing transductive learning to predict actions taken by these individuals in one-shot games based on these attributes. Our approach allows us to train a single classifier that can make predictions with respect to actions taken in multiple games. In experiments with three well-studied games, our algorithm compares favorably with strong alternative approaches. In ablation analysis, we demonstrate the importance of our modeling choices -- the representation of the text with the commonsensical personality attributes and our classifier -- to the predictive power of our model.
