
[arXiv Papers] Computation and Language 2020-05-04

Contents

1. Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries [PDF] Abstract
2. HipoRank: Incorporating Hierarchical and Positional Information into Graph-based Unsupervised Long Document Extractive Summarization [PDF] Abstract
3. SciREX: A Challenge Dataset for Document-Level Information Extraction [PDF] Abstract
4. Structured Tuning for Semantic Role Labeling [PDF] Abstract
5. ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [PDF] Abstract
6. Knowledge Base Inference for Regular Expression Queries [PDF] Abstract
7. MedType: Improving Medical Entity Linking with Semantic Type Prediction [PDF] Abstract
8. Style Variation as a Vantage Point for Code-Switching [PDF] Abstract
9. USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation [PDF] Abstract
10. Defense of Word-level Adversarial Attacks via Random Substitution Encoding [PDF] Abstract
11. Topological Sort for Sentence Ordering [PDF] Abstract
12. Identifying Necessary Elements for BERT's Multilinguality [PDF] Abstract
13. Will-They-Won't-They: A Very Large Dataset for Stance Detection on Twitter [PDF] Abstract
14. Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research [PDF] Abstract
15. Multilingual Unsupervised Sentence Simplification [PDF] Abstract
16. XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [PDF] Abstract
17. CDL: Curriculum Dual Learning for Emotion-Controllable Response Generation [PDF] Abstract
18. Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi [PDF] Abstract
19. Self-supervised Knowledge Triplet Learning for Zero-shot Question Answering [PDF] Abstract
20. Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [PDF] Abstract
21. Language (Re)modelling: Towards Embodied Language Understanding [PDF] Abstract
22. Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation [PDF] Abstract
23. Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing [PDF] Abstract
24. Facilitating Access to Multilingual COVID-19 Information via Neural Machine Translation [PDF] Abstract
25. Unsupervised Transfer of Semantic Role Models from Verbal to Nominal Domain [PDF] Abstract
26. Towards Controllable Biases in Language Generation [PDF] Abstract
27. Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields [PDF] Abstract
28. AdapterFusion: Non-Destructive Task Composition for Transfer Learning [PDF] Abstract
29. Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage [PDF] Abstract
30. TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions [PDF] Abstract
31. Biomedical Entity Representations with Synonym Marginalization [PDF] Abstract
32. Multi-head Monotonic Chunkwise Attention For Online Speech Recognition [PDF] Abstract
33. KPQA: A Metric for Generative Question Answering Using Word Weights [PDF] Abstract
34. Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks [PDF] Abstract
35. Cross-Linguistic Syntactic Evaluation of Word Prediction Models [PDF] Abstract
36. Sparse, Dense, and Attentional Representations for Text Retrieval [PDF] Abstract
37. Selecting Informative Contexts Improves Language Model Finetuning [PDF] Abstract
38. Universal Adversarial Attacks with Natural Triggers for Text Classification [PDF] Abstract
39. Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity [PDF] Abstract
40. Cross-lingual Entity Alignment for Knowledge Graphs with Incidental Supervision from Free Text [PDF] Abstract
41. Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment [PDF] Abstract
42. Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization [PDF] Abstract
43. Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations [PDF] Abstract
44. Why and when should you pool? Analyzing Pooling in Recurrent Architectures [PDF] Abstract
45. Neural Entity Summarization with Joint Encoding and Weak Supervision [PDF] Abstract
46. Interpretable Entity Representations through Large-Scale Typing [PDF] Abstract
47. Contextual Text Style Transfer [PDF] Abstract
48. Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction [PDF] Abstract
49. Revisiting Memory-Efficient Incremental Coreference Resolution [PDF] Abstract
50. Learning to Faithfully Rationalize by Construction [PDF] Abstract
51. On the Spontaneous Emergence of Discrete and Compositional Signals [PDF] Abstract
52. Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [PDF] Abstract
53. Revisiting Unsupervised Relation Extraction [PDF] Abstract
54. AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages [PDF] Abstract
55. Aspect-Controlled Neural Argument Generation [PDF] Abstract
56. Attribution Analysis of Grammatical Dependencies in LSTMs [PDF] Abstract
57. MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer [PDF] Abstract
58. UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection [PDF] Abstract
59. Context based Text-generation using LSTM networks [PDF] Abstract
60. Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering [PDF] Abstract
61. Generating Persona-Consistent Dialogue Responses Using Deep Reinforcement Learning [PDF] Abstract
62. Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society [PDF] Abstract
63. Representations of Syntax [MASK] Useful: Effects of Constituency and Dependency Structure in Recursive LSTMs [PDF] Abstract
64. Partially-Typed NER Datasets Integration: Connecting Practice to Theory [PDF] Abstract
65. Automatic Discourse Segmentation: Review and Perspectives [PDF] Abstract
66. HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do [PDF] Abstract
67. Bipartite Flat-Graph Network for Nested Named Entity Recognition [PDF] Abstract
68. Diverse Visuo-Lingustic Question Answering (DVLQA) Challenge [PDF] Abstract
69. TransOMCS: From Linguistic Graphs to Commonsense Knowledge [PDF] Abstract
70. HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training [PDF] Abstract
71. On the Merging of Domain-Specific Heterogeneous Ontologies using Wordnet and Web Pattern-based Queries [PDF] Abstract
72. Unsupervised Learning of KB Queries in Task Oriented Dialogs [PDF] Abstract
73. Learning to Rank Intents in Voice Assistants [PDF] Abstract
74. An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety [PDF] Abstract
75. Method for Customizable Automated Tagging: Addressing the Problem of Over-tagging and Under-tagging Text Documents [PDF] Abstract

Abstracts

1. Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries [PDF] Back to Contents
  Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, Jordan Boyd-Graber
Abstract: Cross-lingual word embeddings (CLWE) are often evaluated on bilingual lexicon induction (BLI). Recent CLWE methods use linear projections, which underfit the training dictionary, to generalize on BLI. However, underfitting can hinder generalization to other downstream tasks that rely on words from the training dictionary. We address this limitation by retrofitting CLWE to the training dictionary, which pulls training translation pairs closer in the embedding space and overfits the training dictionary. This simple post-processing step often improves accuracy on two downstream tasks, despite lowering BLI test accuracy. We also retrofit to both the training dictionary and a synthetic dictionary induced from CLWE, which sometimes generalizes even better on downstream tasks. Our results confirm the importance of fully exploiting training dictionary in downstream tasks and explains why BLI is a flawed CLWE evaluation.
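A minimal sketch of the retrofitting idea described above: iteratively pull the embeddings of training translation pairs toward each other so that the training dictionary is deliberately overfit. The interpolation update, the toy vectors, and the hyperparameters are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def retrofit_to_dictionary(src_emb, tgt_emb, pairs, alpha=0.5, iterations=10):
    """Pull embeddings of translation pairs closer together (illustrative).

    src_emb, tgt_emb: dict mapping word -> np.ndarray of shape (d,)
    pairs: list of (src_word, tgt_word) training-dictionary entries
    alpha: interpolation weight between a word's vector and its translation
    """
    src_emb = {w: v.copy() for w, v in src_emb.items()}
    tgt_emb = {w: v.copy() for w, v in tgt_emb.items()}
    for _ in range(iterations):
        for s, t in pairs:
            if s in src_emb and t in tgt_emb:
                s_vec, t_vec = src_emb[s], tgt_emb[t]
                # Move each word part of the way toward its translation,
                # deliberately "overfitting" the training dictionary.
                src_emb[s] = (1 - alpha) * s_vec + alpha * t_vec
                tgt_emb[t] = (1 - alpha) * t_vec + alpha * s_vec
    return src_emb, tgt_emb

# Toy usage with 3-dimensional vectors (purely illustrative).
src = {"dog": np.array([1.0, 0.0, 0.0])}
tgt = {"perro": np.array([0.0, 1.0, 0.0])}
new_src, new_tgt = retrofit_to_dictionary(src, tgt, [("dog", "perro")])
print(np.linalg.norm(new_src["dog"] - new_tgt["perro"]))  # smaller than before retrofitting
```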

2. HipoRank: Incorporating Hierarchical and Positional Information into Graph-based Unsupervised Long Document Extractive Summarization [PDF] Back to Contents
  Yue Dong, Andrei Romascanu, Jackie C. K. Cheung
Abstract: We propose a novel graph-based ranking model for unsupervised extractive summarization of long documents. Graph-based ranking models typically represent documents as undirected fully-connected graphs, where a node is a sentence, an edge is weighted based on sentence-pair similarity, and sentence importance is measured via node centrality. Our method leverages positional and hierarchical information grounded in discourse structure to augment a document's graph representation with hierarchy and directionality. Experimental results on PubMed and arXiv datasets show that our approach outperforms strong unsupervised baselines by wide margins and performs comparably to some of the state-of-the-art supervised models that are trained on hundreds of thousands of examples. In addition, we find that our method provides comparable improvements with various distributional sentence representations; including BERT and RoBERTa models fine-tuned on sentence similarity.
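A minimal sketch of a centrality-based extractive baseline in the spirit of the graph formulation above: sentences are nodes, edges are weighted by sentence similarity, and importance is degree centrality. The bag-of-words similarity and the crude position bonus standing in for HipoRank's hierarchical and directional weighting are assumptions, not the paper's method.

```python
import math
from collections import Counter

def cosine(c1, c2):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(c1) & set(c2)
    num = sum(c1[t] * c2[t] for t in common)
    den = (math.sqrt(sum(v * v for v in c1.values()))
           * math.sqrt(sum(v * v for v in c2.values())))
    return num / den if den else 0.0

def centrality_summary(sentences, k=2, position_bias=0.1):
    """Score sentences by degree centrality in a similarity graph (illustrative)."""
    bows = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    scores = []
    for i in range(n):
        centrality = sum(cosine(bows[i], bows[j]) for j in range(n) if j != i)
        # Crude stand-in for positional information: earlier sentences get a small boost.
        centrality += position_bias * (n - i) / n
        scores.append((centrality, i))
    top = sorted(sorted(scores, reverse=True)[:k], key=lambda x: x[1])
    return [sentences[i] for _, i in top]

doc = ["Graph models rank sentences by centrality.",
       "Edges are weighted by sentence similarity.",
       "Unrelated filler sentence about the weather."]
print(centrality_summary(doc, k=2))
```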

3. SciREX: A Challenge Dataset for Document-Level Information Extraction [PDF] Back to Contents
  Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, Iz Beltagy
Abstract: Extracting information from full documents is an important problem in many domains, but most previous work focus on identifying relationships within a sentence or a paragraph. It is challenging to create a large-scale information extraction (IE) dataset at the document level since it requires an understanding of the whole document to annotate entities and their document-level relationships that usually span beyond sentences or even sections. In this paper, we introduce SciREX, a document level IE dataset that encompasses multiple IE tasks, including salient entity identification and document level $N$-ary relation identification from scientific articles. We annotate our dataset by integrating automatic and human annotations, leveraging existing scientific knowledge resources. We develop a neural model as a strong baseline that extends previous state-of-the-art IE models to document-level IE. Analyzing the model performance shows a significant gap between human performance and current baselines, inviting the community to use our dataset as a challenge to develop document-level IE models. Our data and code are publicly available at this https URL

4. Structured Tuning for Semantic Role Labeling [PDF] Back to Contents
  Tao Li, Parth Anand Jawale, Martha Palmer, Vivek Srikumar
Abstract: Recent neural network-driven semantic role labeling (SRL) systems have shown impressive improvements in F1 scores. These improvements are due to expressive input representations, which, at least at the surface, are orthogonal to knowledge-rich constrained decoding mechanisms that helped linear SRL models. Introducing the benefits of structure to inform neural models presents a methodological challenge. In this paper, we present a structured tuning framework to improve models using softened constraints only at training time. Our framework leverages the expressiveness of neural networks and provides supervision with structured loss components. We start with a strong baseline (RoBERTa) to validate the impact of our approach, and show that our framework outperforms the baseline by learning to comply with declarative constraints. Additionally, our experiments with smaller training sizes show that we can achieve consistent improvements under low-resource scenarios.

5. ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [PDF] Back to Contents
  Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia
Abstract: In order to simplify a sentence, human editors perform multiple rewriting transformations: they split it into several shorter sentences, paraphrase words (i.e. replacing complex words or phrases by simpler synonyms), reorder components, and/or delete information deemed unnecessary. Despite these varied range of possible text alterations, current models for automatic sentence simplification are evaluated using datasets that are focused on a single transformation, such as lexical paraphrasing or splitting. This makes it impossible to understand the ability of simplification models in more realistic settings. To alleviate this limitation, this paper introduces ASSET, a new dataset for assessing sentence simplification in English. ASSET is a crowdsourced multi-reference corpus where each simplification was produced by executing several rewriting transformations. Through quantitative and qualitative experiments, we show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task. Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.

6. Knowledge Base Inference for Regular Expression Queries [PDF] Back to Contents
  Vaibhav Adlakha, Parth Shah, Srikanta Bedathur, Mausam
Abstract: Two common types of tasks on Knowledge Bases have been studied -- single link prediction (Knowledge Base Completion) and path query answering. However, our analysis of user queries on a real-world knowledge base reveals that a significant fraction of queries specify paths using regular expressions(regex). Such regex queries cannot be handled by any of the existing link prediction or path query answering models. In response, we present Regex Query Answering, the novel task of answering regex queries on incomplete KBs. We contribute two datasets for the task, including one where test queries are harvested from actual user querylogs. We train baseline neural models for our new task and propose novel ways to handle disjunction and Kleene plus regex operators.

7. MedType: Improving Medical Entity Linking with Semantic Type Prediction [PDF] Back to Contents
  Shikhar Vashishth, Rishabh Joshi, Ritam Dutt, Denis Newman-Griffis, Carolyn Rose
Abstract: Medical entity linking is the task of identifying and standardizing concepts referred in a scientific article or clinical record. Existing methods adopt a two-step approach of detecting mentions and identifying a list of candidate concepts for them. In this paper, we probe the impact of incorporating an entity disambiguation step in existing entity linkers. For this, we present MedType, a novel method that leverages the surrounding context to identify the semantic type of a mention and uses it for filtering out candidate concepts of the wrong types. We further present two novel largescale, automatically-created datasets of medical entity mentions: WIKIMED, a Wikipediabased dataset for cross-domain transfer learning, and PUBMEDDS, a distantly-supervised dataset of medical entity mentions in biomedical abstracts. Through extensive experiments across several datasets and methods, we demonstrate that MedType pre-trained on our proposed datasets substantially improve medical entity linking and gives state-of-the-art performance. We make our source code and datasets publicly available for medical entity linking research.

8. Style Variation as a Vantage Point for Code-Switching [PDF] Back to Contents
  Khyathi Raghavi Chandu, Alan W Black
Abstract: Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities, thereby attaining prevalence in digital and social media platforms. This increasing prominence demands the need to model CS languages for critical downstream tasks. A major problem in this domain is the dearth of annotated data and a substantial corpora to train large scale neural models. Generating vast amounts of quality text assists several down stream tasks that heavily rely on language modeling such as speech recognition, text-to-speech synthesis etc,. We present a novel vantage point of CS to be style variations between both the participating languages. Our approach does not need any external annotations such as lexical language ids. It mainly relies on easily obtainable monolingual corpora without any parallel alignment and a limited set of naturally CS sentences. We propose a two-stage generative adversarial training approach where the first stage generates competitive negative examples for CS and the second stage generates more realistic CS sentences. We present our experiments on the following pairs of languages: Spanish-English, Mandarin-English, Hindi-English and Arabic-French. We show that the trends in metrics for generated CS move closer to real CS data in each of the above language pairs through the dual stage training process. We believe this viewpoint of CS as style variations opens new perspectives for modeling various tasks in CS text.

9. USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation [PDF] Back to Contents
  Shikib Mehri, Maxine Eskenazi
Abstract: The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR is a reference-free metric that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48 and system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.

10. Defense of Word-level Adversarial Attacks via Random Substitution Encoding [PDF] Back to Contents
  Zhaoyang Wang, Hongtao Wang
Abstract: The adversarial attacks against deep neural networks on computer version tasks has spawned many new technologies that help protect models avoiding false prediction. Recently, word-level adversarial attacks on deep models of Natural Language Processing (NLP) tasks have also demonstrated strong power, e.g., fooling a sentiment classification neural network to make wrong decision. Unfortunately, few previous literatures have discussed the defense of such word-level synonym substitution based attacks since they are hard to be perceived and detected. In this paper, we shed light on this problem and propose a novel defense framework called Random Substitution Encoding (RSE), which introduces a random substitution encoder into the training process of original neural networks. Extensive experiments on text classification tasks demonstrate the effectiveness of our framework on defense of word-level adversarial attacks, under various base and attack models.
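A minimal sketch of the general idea of defending against word-level synonym-substitution attacks by randomly substituting input words before classification, so that an attacker's carefully chosen replacements are themselves perturbed. The synonym table, substitution rate, and function name are hypothetical placeholders, not the RSE encoder from the paper.

```python
import random

# Hypothetical synonym table; in practice this would come from a resource
# such as a thesaurus or counter-fitted embeddings.
SYNONYMS = {
    "good": ["fine", "nice", "great"],
    "movie": ["film", "picture"],
    "bad": ["poor", "awful"],
}

def random_substitution_encode(tokens, rate=0.3, rng=random):
    """Randomly replace words with synonyms before feeding them to a classifier."""
    out = []
    for tok in tokens:
        if tok in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[tok]))
        else:
            out.append(tok)
    return out

random.seed(0)
print(random_substitution_encode("a good movie with a bad ending".split()))
```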

11. Topological Sort for Sentence Ordering [PDF] Back to Contents
  Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W Black
Abstract: Sentence ordering is the task of arranging the sentences of a given text in the correct order. Recent work using deep neural networks for this task has framed it as a sequence prediction problem. In this paper, we propose a new framing of this task as a constraint solving problem and introduce a new technique to solve it. Additionally, we propose a human evaluation for this task. The results on both automatic and human metrics across four different datasets show that this new technique is better at capturing coherence in documents.
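A minimal sketch of how pairwise before/after predictions can be assembled into a full ordering with a topological sort (Kahn's algorithm). The toy constraint list is invented, and handling of cyclic or conflicting predictions is omitted.

```python
from collections import defaultdict, deque

def topological_order(num_sentences, before_pairs):
    """Order sentences given pairwise constraints (i, j) meaning 'i comes before j'."""
    graph = defaultdict(list)
    indegree = [0] * num_sentences
    for i, j in before_pairs:
        graph[i].append(j)
        indegree[j] += 1
    queue = deque(i for i in range(num_sentences) if indegree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return order  # shorter than num_sentences if the constraints contain a cycle

# Pairwise predictions from some classifier: 2 before 0, 0 before 1, 2 before 1.
print(topological_order(3, [(2, 0), (0, 1), (2, 1)]))  # [2, 0, 1]
```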

12. Identifying Necessary Elements for BERT's Multilinguality [PDF] Back to Contents
  Philipp Dufter, Hinrich Schütze
Abstract: It has been shown that multilingual BERT (mBERT) yields high quality multilingual representations and enables effective zero-shot transfer. This is suprising given that mBERT does not use any kind of crosslingual signal during training. While recent literature has studied this effect, the exact reason for mBERT's multilinguality is still unknown. We aim to identify architectural properties of BERT as well as linguistic properties of languages that are necessary for BERT to become multilingual. To allow for fast experimentation we propose an efficient setup with small BERT models and synthetic as well as natural data. Overall, we identify six elements that are potentially necessary for BERT to be multilingual. Architectural factors that contribute to multilinguality are underparameterization, shared special tokens (e.g., "[CLS]"), shared position embeddings and replacing masked tokens with random tokens. Factors related to training data that are beneficial for multilinguality are similar word order and comparability of corpora.

13. Will-They-Won't-They: A Very Large Dataset for Stance Detection on Twitter [PDF] Back to Contents
  Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier
Abstract: We present a new challenging stance detection dataset, called Will-They-Won't-They (WT-WT), which contains 51,284 tweets in English, making it by far the largest available dataset of the type. All the annotations are carried out by experts; therefore, the dataset constitutes a high-quality and reliable benchmark for future research in stance detection. Our experiments with a wide range of recent state-of-the-art stance detection systems show that the dataset poses a strong challenge to existing models in this domain.

14. Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research [PDF] Back to Contents
  Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Rada Mihalcea
Abstract: Sentiment analysis as a field has come a long way since it was first introduced as a task nearly 20 years ago. It has widespread commercial applications in various domains like marketing, risk management, market research, and politics, to name a few. Given its saturation in specific subtasks -- such as sentiment polarity classification -- and datasets, there is an underlying perception that this field has reached its maturity. In this article, we discuss this perception by pointing out the shortcomings and under-explored, yet key aspects of this field that are necessary to attain true sentiment understanding. We analyze the significant leaps responsible for its current relevance. Further, we attempt to chart a possible course for this field that covers many overlooked and unanswered questions.

15. Multilingual Unsupervised Sentence Simplification [PDF] Back to Contents
  Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes, Benoît Sagot
Abstract: Progress in Sentence Simplification has been hindered by the lack of supervised data, particularly in languages other than English. Previous work has aligned sentences from original and simplified corpora such as English Wikipedia and Simple English Wikipedia, but this limits corpus size, domain, and language. In this work, we propose using unsupervised mining techniques to automatically create training corpora for simplification in multiple languages from raw Common Crawl web data. When coupled with a controllable generation mechanism that can flexibly adjust attributes such as length and lexical complexity, these mined paraphrase corpora can be used to train simplification systems in any language. We further incorporate multilingual unsupervised pretraining methods to create even stronger models and show that by training on mined data rather than supervised corpora, we outperform the previous best results. We evaluate our approach on English, French, and Spanish simplification benchmarks and reach state-of-the-art performance with a totally unsupervised approach. We will release our models and code to mine the data in any language included in Common Crawl.

16. XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [PDF] Back to Contents
  Edoardo Maria Ponti, Goran Glavaš, Olga Majewska, Qianchu Liu, Ivan Vulić, Anna Korhonen
Abstract: In order to simulate human language capacity, natural language processing systems must complement the explicit information derived from raw text with the ability to reason about the possible causes and outcomes of everyday situations. Moreover, the acquired world knowledge should generalise to new languages, modulo cultural differences. Advances in machine commonsense reasoning and cross-lingual transfer depend on the availability of challenging evaluation benchmarks. Motivated by both demands, we introduce Cross-lingual Choice of Plausible Alternatives (XCOPA), a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages. We benchmark a range of state-of-the-art models on this novel dataset, revealing that current methods based on multilingual pretraining and zero-shot fine-tuning transfer suffer from the curse of multilinguality and fall short of performance in monolingual settings by a large margin. Finally, we propose ways to adapt these models to out-of-sample resource-lean languages where only a small corpus or a bilingual dictionary is available, and report substantial improvements over the random baseline. XCOPA is available at this http URL.

17. CDL: Curriculum Dual Learning for Emotion-Controllable Response Generation [PDF] Back to Contents
  Lei Shen, Yang Feng
Abstract: Emotion-controllable response generation is an attractive and valuable task that aims to make open-domain conversations more empathetic and engaging. Existing methods mainly enhance the emotion expression by adding regularization terms to standard cross-entropy loss and thus influence the training process. However, due to the lack of further consideration of content consistency, the common problem of response generation tasks, safe response, is intensified. Besides, query emotions that can help model the relationship between query and response are simply ignored in previous models, which would further hurt the coherence. To alleviate these problems, we propose a novel framework named Curriculum Dual Learning (CDL) which extends the emotion-controllable response generation to a dual task to generate emotional responses and emotional queries alternatively. CDL utilizes two rewards focusing on emotion and content to improve the duality. Additionally, it applies curriculum learning to gradually generate high-quality responses based on the difficulties of expressing various emotions. Experimental results show that CDL significantly outperforms the baselines in terms of coherence, diversity, and relation to emotion factors.

18. Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi [PDF] Back to Contents
  Benjamin Muller, Benoit Sagot, Djamé Seddah
Abstract: Building natural language processing systems for non standardized and low resource languages is a difficult challenge. The recent success of large-scale multilingual pretrained language models provides new modeling tools to tackle this. In this work, we study the ability of multilingual language models to process an unseen dialect. We take user generated North-African Arabic as our case study, a resource-poor dialectal variety of Arabic with frequent code-mixing with French and written in Arabizi, a non-standardized transliteration of Arabic to Latin script. Focusing on two tasks, part-of-speech tagging and dependency parsing, we show in zero-shot and unsupervised adaptation scenarios that multilingual language models are able to transfer to such an unseen dialect, specifically in two extreme cases: (i) across scripts, using Modern Standard Arabic as a source language, and (ii) from a distantly related language, unseen during pretraining, namely Maltese. Our results constitute the first successful transfer experiments on this dialect, paving thus the way for the development of an NLP ecosystem for resource-scarce, non-standardized and highly variable vernacular languages.

19. Self-supervised Knowledge Triplet Learning for Zero-shot Question Answering [PDF] Back to Contents
  Pratyay Banerjee, Chitta Baral
Abstract: The aim of all Question Answering (QA) systems is to be able to generalize to unseen questions. Most of the current methods rely on learning every possible scenario which is reliant on expensive data annotation. Moreover, such annotations can introduce unintended bias which makes systems focus more on the bias than the actual task. In this work, we propose Knowledge Triplet Learning, a self-supervised task over knowledge graphs. We propose methods of how to use such a model to perform zero-shot QA and our experiments show considerable improvements over large pre-trained generative models.

20. Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [PDF] Back to Contents
  Prasetya Ajie Utama, Nafise Sadat Moosavi, Iryna Gurevych
Abstract: Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution. Recently, several proposed debiasing methods are shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of performance drop when models are evaluated on the in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourage models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets (e.g., 7pp gain on HANS dataset) while maintaining the original in-distribution accuracy.

21. Language (Re)modelling: Towards Embodied Language Understanding [PDF] Back to Contents
  Ronen Tamari, Chen Shani, Tom Hope, Miriam R. L. Petruck, Omri Abend, Dafna Shahaf
Abstract: While natural language understanding (NLU) is advancing rapidly, today's technology differs from human-like language understanding in fundamental ways, notably in its inferior efficiency, interpretability, and generalization. This work proposes an approach to representation and learning based on the tenets of embodied cognitive linguistics (ECL). According to ECL, natural language is inherently executable (like programming languages), driven by mental simulation and metaphoric mappings over hierarchical compositions of structures and schemata learned through embodied interaction. This position paper argues that the use of grounding by metaphoric inference and simulation will greatly benefit NLU systems, and proposes a system architecture along with a roadmap towards realizing this vision.

22. Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation [PDF] Back to Contents
  Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way
Abstract: Machine translation (MT) has benefited from using synthetic training data originating from translating monolingual corpora, a technique known as backtranslation. Combining backtranslated data from different sources has led to better results than when using such data in isolation. In this work we analyse the impact that data translated with rule-based, phrase-based statistical and neural MT systems has on new MT systems. We use a real-world low-resource use-case (Basque-to-Spanish in the clinical domain) as well as a high-resource language pair (German-to-English) to test different scenarios with backtranslation and employ data selection to optimise the synthetic corpora. We exploit different data selection strategies in order to reduce the amount of data used, while at the same time maintaining high-quality MT systems. We further tune the data selection method by taking into account the quality of the MT systems used for backtranslation and lexical diversity of the resulting corpora. Our experiments show that incorporating backtranslated data from different sources can be beneficial, and that availing of data selection can yield improved performance.

23. Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing [PDF] Back to Contents
  Manikandan Ravikiran, Amin Ekant Muljibhai, Toshinori Miyoshi, Hiroaki Ozaki, Yuta Koreeda, Sakata Masayuki
Abstract: In this paper, we present our participation in SemEval-2020 Task-12 Subtask-A (English Language) which focuses on offensive language identification from noisy labels. To this end, we developed a hybrid system with the BERT classifier trained with tweets selected using Statistical Sampling Algorithm (SA) and Post-Processed (PP) using an offensive wordlist. Our developed system achieved 34 th position with Macro-averaged F1-score (Macro-F1) of 0.90913 over both offensive and non-offensive classes. We further show comprehensive results and error analysis to assist future research in offensive language identification with noisy labels.

24. Facilitating Access to Multilingual COVID-19 Information via Neural Machine Translation [PDF] Back to Contents
  Andy Way, Rejwanul Haque, Guodong Xie, Federico Gaspari, Maja Popovic, Alberto Poncelas
Abstract: Every day, more people are becoming infected and dying from exposure to COVID-19. Some countries in Europe like Spain, France, the UK and Italy have suffered particularly badly from the virus. Others such as Germany appear to have coped extremely well. Both health professionals and the general public are keen to receive up-to-date information on the effects of the virus, as well as treatments that have proven to be effective. In cases where language is a barrier to access of pertinent information, machine translation (MT) may help people assimilate information published in different languages. Our MT systems trained on COVID-19 data are freely available for anyone to use to help translate information published in German, French, Italian, Spanish into English, as well as the reverse direction.

25. Unsupervised Transfer of Semantic Role Models from Verbal to Nominal Domain [PDF] Back to Contents
  Yanpeng Zhao, Ivan Titov
Abstract: Semantic role labeling (SRL) is an NLP task involving the assignment of predicate arguments to types, called semantic roles. Though research on SRL has primarily focused on verbal predicates and many resources available for SRL provide annotations only for verbs, semantic relations are often triggered by other linguistic constructions, e.g., nominalizations. In this work, we investigate a transfer scenario where we assume role-annotated data for the source verbal domain but only unlabeled data for the target nominal domain. Our key assumption, enabling the transfer between the two domains, is that selectional preferences of a role (i.e., preferences or constraints on the admissible arguments) do not strongly depend on whether the relation is triggered by a verb or a noun. For example, the same set of arguments can fill the Acquirer role for the verbal predicate `acquire' and its nominal form `acquisition'. We approach the transfer task from the variational autoencoding perspective. The labeler serves as an encoder (predicting role labels given a sentence), whereas selectional preferences are captured in the decoder component (generating arguments for the predicting roles). Nominal roles are not labeled in the training data, and the learning objective instead pushes the labeler to assign roles predictive of the arguments. Sharing the decoder parameters across the domains encourages consistency between labels predicted for both domains and facilitates the transfer. The method substantially outperforms baselines, such as unsupervised and `direct transfer' methods, on the English CoNLL-2009 dataset.

26. Towards Controllable Biases in Language Generation [PDF] Back to Contents
  Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
Abstract: We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce or avoid biases in generated text containing mentions of specified demographic groups. We then analyze two scenarios: 1) inducing biases for one demographic and avoiding biases for another, and 2) mitigating biases between demographic pairs (e.g., man and woman). The former scenario gives us a tool for detecting the types of biases present in the model, and the latter is useful for mitigating biases in downstream applications (e.g., dialogue generation). Specifically, our approach facilitates more explainable biases by allowing us to 1) use the relative effectiveness of inducing biases for different demographics as a new dimension for bias evaluation, and 2) discover topics that correspond to demographic inequalities in generated text. Furthermore, our mitigation experiments exemplify our technique's effectiveness at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.

27. Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields [PDF] Back to Contents
  Jonas Pfeiffer, Edwin Simpson, Iryna Gurevych
Abstract: We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks. Our analysis is aimed at datasets where each example has labels for multiple tasks. Current approaches use either a separate model for each task or standard multi-task learning to learn shared feature representations. However, these approaches ignore correlations between label sequences, which can provide important information in settings with small training datasets. To analyze which scenarios can profit from modeling dependencies between labels in different tasks, we revisit dynamic conditional random fields (CRFs) and combine them with deep neural networks. We compare single-task, multi-task and dynamic CRF setups for three diverse datasets at both sentence and document levels in English and German low resource scenarios. We show that including silver labels from pretrained part-of-speech taggers as auxiliary tasks can improve performance on downstream tasks. We find that especially in low-resource scenarios, the explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.

28. AdapterFusion: Non-Destructive Task Composition for Transfer Learning [PDF] Back to Contents
  Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych
Abstract: Current approaches to solving classification tasks in NLP involve fine-tuning a pre-trained language model on a single target task. This paper focuses on sharing knowledge extracted not only from a pre-trained language model, but also from several source tasks in order to achieve better performance on the target task. Sequential fine-tuning and multi-task learning are two methods for sharing information, but suffer from problems such as catastrophic forgetting and difficulties in balancing multiple tasks. Additionally, multi-task learning requires simultaneous access to data used for each of the tasks, which does not allow for easy extensions to new tasks on the fly. We propose a new architecture as well as a two-stage learning algorithm that allows us to effectively share knowledge from multiple tasks while avoiding these crucial problems. In the first stage, we learn task specific parameters that encapsulate the knowledge from each task. We then combine these learned representations in a separate combination step, termed AdapterFusion. We show that by separating the two stages, i.e., knowledge extraction and knowledge combination, the classifier can effectively exploit the representations learned from multiple tasks in a non destructive manner. We empirically evaluate our transfer learning approach on 16 diverse NLP tasks, and show that it outperforms traditional strategies such as full fine-tuning of the model as well as multi-task learning.
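A simplified sketch of the second stage described above: the outputs of several frozen, task-specific adapters are combined with learned attention weights. The dot-product attention here omits the value projection and other details of the actual AdapterFusion layer; all shapes, matrices, and toy inputs are assumptions.

```python
import numpy as np

def adapter_fusion(hidden, adapter_outputs, query_proj, key_proj):
    """Combine per-task adapter outputs with attention weights (illustrative).

    hidden:          (d,) transformer hidden state at some layer
    adapter_outputs: (n_tasks, d) outputs of the frozen per-task adapters
    query_proj, key_proj: (d, d) projections learned in the fusion stage
    """
    query = query_proj @ hidden
    keys = adapter_outputs @ key_proj.T
    weights = np.exp(keys @ query)
    weights /= weights.sum()          # softmax over tasks
    return weights @ adapter_outputs  # weighted mixture of adapter outputs

rng = np.random.default_rng(0)
d, n_tasks = 8, 3
fused = adapter_fusion(rng.normal(size=d), rng.normal(size=(n_tasks, d)),
                       rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(fused.shape)  # (8,)
```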

29. Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage [PDF] Back to Contents
  Ashish V. Thapliyal, Radu Soricut
Abstract: Cross-modal language generation tasks such as image captioning are directly hurt in their ability to support non-English languages by the trend of data-hungry models combined with the lack of non-English annotations. We investigate potential solutions for combining existing language-generation annotations in English with translation capabilities in order to create solutions at web-scale in both domain and language coverage. We describe an approach called Pivot-Language Generation Stabilization (PLuGS), which leverages directly at training time both existing English annotations (gold data) as well as their machine-translated versions (silver data); at run-time, it generates first an English caption and then a corresponding target-language caption. We show that PLuGS models outperform other candidate solutions in evaluations performed over 5 different target languages, under a large-domain testset using images from the Open Images dataset. Furthermore, we find an interesting effect where the English captions generated by the PLuGS models are better than the captions generated by the original, monolingual English model.

30. TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions [PDF] Back to Contents
  Qiang Ning, Hao Wu, Rujun Han, Nanyun Peng, Matt Gardner, Dan Roth
Abstract: A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated. However, current machine reading comprehension benchmarks have practically no questions that test temporal phenomena, so systems trained on these benchmarks have no capacity to answer questions such as "what happened before/after [some event]?" We introduce TORQUE, a new English reading comprehension benchmark built on 3.2k news snippets with 21k human-generated questions querying temporal relationships. Results show that RoBERTa-large achieves an exact-match score of 51% on the test set of TORQUE, about 30% behind human performance.

31. Biomedical Entity Representations with Synonym Marginalization [PDF] Back to Contents
  Mujeen Sung, Hwisang Jeon, Jinhyuk Lee, Jaewoo Kang
Abstract: Biomedical named entities often play important roles in many biomedical text mining tools. However, due to the incompleteness of provided synonyms and numerous variations in their surface forms, normalization of biomedical entities is very challenging. In this paper, we focus on learning representations of biomedical entities solely based on the synonyms of entities. To learn from the incomplete synonyms, we use a model-based candidate selection and maximize the marginal likelihood of the synonyms present in top candidates. Our model-based candidates are iteratively updated to contain more difficult negative samples as our model evolves. In this way, we avoid the explicit pre-selection of negative samples from more than 400K candidates. On four biomedical entity normalization datasets having three different entity types (disease, chemical, adverse reaction), our model BioSyn consistently outperforms previous state-of-the-art models almost reaching the upper bound on each dataset.
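A minimal sketch of marginalizing the likelihood over the synonyms that appear among the top candidates, as the abstract describes. The dot-product scorer, the softmax, and the toy vectors are assumptions standing in for the learned BioSyn model and its candidate retrieval.

```python
import numpy as np

def marginal_nll(mention_vec, candidate_vecs, synonym_mask):
    """Negative log of the probability mass on candidates that are gold synonyms.

    mention_vec:    (d,) embedding of the mention
    candidate_vecs: (k, d) embeddings of the top-k candidate names
    synonym_mask:   (k,) boolean array, True where the candidate is a gold synonym
    """
    scores = candidate_vecs @ mention_vec
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    synonym_prob = probs[synonym_mask].sum()   # marginalize over all gold synonyms
    return -np.log(synonym_prob + 1e-12)

rng = np.random.default_rng(0)
mention = rng.normal(size=8)
candidates = rng.normal(size=(5, 8))
mask = np.array([True, False, True, False, False])
print(marginal_nll(mention, candidates, mask))
```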

32. Multi-head Monotonic Chunkwise Attention For Online Speech Recognition [PDF] Back to Contents
  Baiji Liu, Songjun Cao, Sining Sun, Weibin Zhang, Long Ma
Abstract: The attention mechanism of the Listen, Attend and Spell (LAS) model requires the whole input sequence to calculate the attention context and thus is not suitable for online speech recognition. To deal with this problem, we propose multi-head monotonic chunk-wise attention (MTH-MoChA), an improved version of MoChA. MTH-MoChA splits the input sequence into small chunks and computes multi-head attentions over the chunks. We also explore useful training strategies such as LSTM pooling, minimum world error rate training and SpecAugment to further improve the performance of MTH-MoChA. Experiments on AISHELL-1 data show that the proposed model, along with the training strategies, improve the character error rate (CER) of MoChA from 8.96% to 7.68% on test set. On another 18000 hours in-car speech data set, MTH-MoChA obtains 7.28% CER, which is significantly better than a state-of-the-art hybrid system.

33. KPQA: A Metric for Generative Question Answering Using Word Weights [PDF] Back to Contents
  Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Joongbo Shin, Kyomin Jung
Abstract: For the automatic evaluation of Generative Question Answering (genQA) systems, it is essential to assess the correctness of the generated answers. However, n-gram similarity metrics, which are widely used to compare generated texts and references, are prone to misjudge fact-based assessments. Moreover, there is a lack of benchmark datasets to measure the quality of metrics in terms of the correctness. To study a better metric for genQA, we collect high-quality human judgments of correctness on two standard genQA datasets. Using our human-evaluation datasets, we show that existing metrics based on n-gram similarity do not correlate with human judgments. To alleviate this problem, we propose a new metric for evaluating the correctness of genQA. Specifically, the new metric assigns different weights on each token via keyphrase prediction, thereby judging whether a predicted answer sentence captures the key meaning of the human judge's ground-truth. Our proposed metric shows a significantly higher correlation with human judgment than widely used existing metrics.
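A minimal sketch of an answer-correctness score that weights token overlap by per-token importance, e.g. weights produced by a keyphrase predictor, so that matching key content matters more than matching function words. The F1 formulation, the default weight, and the example weights are illustrative, not the exact KPQA definition.

```python
def weighted_f1(prediction, reference, weights, default_weight=0.1):
    """Token-overlap F1 where each token carries an importance weight (illustrative)."""
    pred = set(prediction.lower().split())
    ref = set(reference.lower().split())
    w = lambda t: weights.get(t, default_weight)  # assumed small weight for non-key tokens
    tp = sum(w(t) for t in pred & ref)
    precision = tp / max(sum(w(t) for t in pred), 1e-12)
    recall = tp / max(sum(w(t) for t in ref), 1e-12)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical keyphrase weights for a toy QA pair.
weights = {"paris": 1.0, "capital": 0.8}
print(weighted_f1("it is paris", "the capital is paris", weights))
```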

34. Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks [PDF] Back to Contents
  Winston Wu, Dustin Arendt, Svitlana Volkova
Abstract: We evaluate machine comprehension models' robustness to noise and adversarial attacks by performing novel perturbations at the character, word, and sentence level. We experiment with different amounts of perturbations to examine model confidence and misclassification rate, and contrast model performance in adversarial training with different embedding types on two benchmark datasets. We demonstrate improving model performance with ensembling. Finally, we analyze factors that effect model behavior under adversarial training and develop a model to predict model errors during adversarial attacks.
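A minimal sketch of the kind of character- and word-level perturbations used to probe reading-comprehension robustness. The specific operations (adjacent-character swaps, token drops) and the rates are assumptions, not the paper's full perturbation suite.

```python
import random

def char_swap(word, rng=random):
    """Swap two adjacent characters inside a word (a typo-style perturbation)."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb_question(question, word_drop_rate=0.1, char_swap_rate=0.2, rng=random):
    out = []
    for word in question.split():
        if rng.random() < word_drop_rate:
            continue                      # word-level noise: drop the token
        if rng.random() < char_swap_rate:
            word = char_swap(word, rng)   # character-level noise: typo
        out.append(word)
    return " ".join(out)

random.seed(0)
print(perturb_question("what year did the challenge dataset appear"))
```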

35. Cross-Linguistic Syntactic Evaluation of Word Prediction Models [PDF] Back to Contents
  Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, Natalia Talmina, Tal Linzen
Abstract: A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models' ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.

36. Sparse, Dense, and Attentional Representations for Text Retrieval [PDF] 返回目录
  Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins
Abstract: Dual encoder architectures perform retrieval by encoding documents and queries into dense low-dimensional vectors, and selecting the document that has the highest inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words retrieval models and attentional neural networks. We establish new connections between the encoding dimension and the number of unique terms in each document and query, using both theoretical and empirical analysis. We show an upper bound on the encoding size, which may be unsustainably large for long documents. For cross-attention models, we show an upper bound using much smaller encodings per token, but such models are difficult to scale to realistic retrieval problems due to computational cost. Building on these insights, we propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of attentional architectures, and explore a sparse-dense hybrid to capitalize on the precision of sparse retrieval. These models outperform strong alternatives in open retrieval.
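The retrieval setup and the hybrid idea can be sketched in a few lines (a toy illustration with random vectors; the interpolation weight `alpha` and the BM25-like sparse scores are assumptions, not the paper's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: dense document/query vectors from some dual encoder, and
# sparse bag-of-words scores (e.g. BM25-style) computed separately.
doc_dense = rng.normal(size=(1000, 128))          # 1000 documents, 128-dim encodings
query_dense = rng.normal(size=128)
sparse_scores = rng.random(1000)                  # pretend sparse relevance per document

def dense_scores(query, docs):
    # Dual-encoder retrieval: relevance is the inner product with the query.
    return docs @ query

def hybrid_scores(query, docs, sparse, alpha=0.5):
    # Sparse-dense hybrid: interpolate the two signals to combine dense
    # generalisation with the precision of sparse lexical matching.
    return alpha * dense_scores(query, docs) + (1 - alpha) * sparse

best = int(np.argmax(hybrid_scores(query_dense, doc_dense, sparse_scores)))
print("top document id:", best)
```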

37. Selecting Informative Contexts Improves Language Model Finetuning [PDF] 返回目录
  Richard Antonello, Javier Turek, Alexander Huth
Abstract: We present a general finetuning meta-method that we call information gain filtration for improving the overall training efficiency and final performance of language model finetuning. This method uses a secondary learner which attempts to quantify the benefit of finetuning the language model on each given example. During the finetuning process, we use this learner to decide whether or not each given example should be trained on or skipped. We show that it suffices for this learner to be simple and that the finetuning process itself is dominated by the relatively trivial relearning of a new unigram frequency distribution over the modelled language domain, a process which the learner aids. Our method trains to convergence using 40% fewer batches than normal finetuning, and achieves a median perplexity of 54.0 on a books dataset compared to a median perplexity of 57.3 for standard finetuning using the same neural architecture.
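A minimal sketch of the filtration loop (assumptions: the secondary learner is reduced to a simple `usefulness` function with a fixed threshold, whereas the paper learns this scorer):

```python
def finetune_with_filtration(batches, train_step, usefulness, threshold=0.0):
    """`train_step(batch)` updates the language model on one batch;
    `usefulness(batch)` is the secondary learner's estimate of the benefit
    of finetuning on that batch."""
    used = skipped = 0
    for batch in batches:
        if usefulness(batch) >= threshold:
            train_step(batch)
            used += 1
        else:
            skipped += 1
    return used, skipped

# Toy usage: "benefit" is approximated by the fraction of rare words in the
# batch, standing in for the paper's learned information-gain estimator.
COMMON = {"the", "a", "of", "and", "to", "is"}
rare_fraction = lambda batch: sum(w not in COMMON for w in batch) / len(batch)
batches = [["the", "cat", "sat"], ["the", "a", "of"], ["quantum", "chromodynamics"]]
print(finetune_with_filtration(batches, train_step=lambda b: None,
                               usefulness=rare_fraction, threshold=0.5))
```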

38. Universal Adversarial Attacks with Natural Triggers for Text Classification [PDF] 返回目录
  Liwei Song, Xinwei Yu, Hsuan-Tung Peng, Karthik Narasimhan
Abstract: Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequences of words added to any input instance. Despite being highly successful, the word sequences produced in these attacks are often unnatural, do not carry much semantic meaning, and can be easily distinguished from natural text. In this paper, we develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems when added to benign inputs. To achieve this, we leverage an adversarially regularized autoencoder (ARAE) to generate triggers and propose a gradient-based search method to output natural text that fools a target classifier. Experiments on two different classification tasks demonstrate the effectiveness of our attacks while also being less identifiable than previous approaches on three simple detection metrics.

39. Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity [PDF] 返回目录
  Pedro Rodriguez, Paul Crook, Seungwhan Moon, Zhiguang Wang
Abstract: Open-ended human learning and information-seeking are increasingly mediated by technologies like digital assistants. However, such systems often fail to account for the user's pre-existing knowledge, which is a powerful way to increase engagement and to improve retention. Assuming a correlation between engagement and user responses such as "liking" messages or asking followup questions, we design a Wizard of Oz dialog task that tests the hypothesis that engagement increases when users are presented with facts that relate to their existing knowledge. Through crowd-sourcing of this experimental task we collected and now open-source 14K dialogs (181K utterances) where users and assistants converse about various aspects related to geographic entities. This dataset is annotated with pre-existing user knowledge, message-level dialog acts, message grounding to Wikipedia, user reactions to messages, and per-dialog ratings. Our analysis shows that responses which incorporate a user's prior knowledge do increase engagement. We incorporate this knowledge into a state-of-the-art multi-task model that reproduces human assistant policies, improving over content selection baselines by 13 points.

40. Cross-lingual Entity Alignment for Knowledge Graphs with Incidental Supervision from Free Text [PDF] 返回目录
  Muhao Chen, Weijia Shi, Ben Zhou, Dan Roth
Abstract: Much research effort has been put to multilingual knowledge graph (KG) embedding methods to address the entity alignment task, which seeks to match entities in different language-specific KGs that refer to the same real-world object. Such methods are often hindered by the insufficiency of seed alignment provided between KGs. Therefore, we propose a new model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme, and seeks to improve entity alignment with incidental supervision signals from text. JEANS first deploys an entity grounding process to combine each KG with the monolingual text corpus. Then, two learning processes are conducted: (i) an embedding learning process to encode the KG and text of each language in one embedding space, and (ii) a self-learning based alignment learning process to iteratively induce the correspondence of entities and that of lexemes between embeddings. Experiments on benchmark datasets show that JEANS leads to promising improvement on entity alignment with incidental supervision, and significantly outperforms state-of-the-art methods that solely rely on internal information of KGs.

41. Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment [PDF] 返回目录
  Forrest Davis, Marten van Schijndel
Abstract: A standard approach to evaluating language models analyzes how models assign probabilities to valid versus invalid syntactic constructions (i.e. is a grammatical sentence more probable than an ungrammatical sentence). Our work uses ambiguous relative clause attachment to extend such evaluations to cases of multiple simultaneous valid interpretations, where stark grammaticality differences are absent. We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish. Thus, English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences. We conclude by relating these results to broader concerns about the relationship between comprehension (i.e. typical language model use cases) and production (which generates the training data for language models), suggesting that necessary linguistic biases are not present in the training signal at all.

42. Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization [PDF] 返回目录
  Sajad Sotudeh, Nazli Goharian, Ross W. Filice
Abstract: Sequence-to-sequence (seq2seq) network is a well-established model for text summarization task. It can learn to produce readable content; however, it falls short in effectively identifying key regions of the source. In this paper, we approach the content selection problem for clinical abstractive summarization by augmenting salient ontological terms into the summarizer. Our experiments on two publicly available clinical data sets (107,372 reports of MIMIC-CXR, and 3,366 reports of OpenI) show that our model statistically significantly boosts state-of-the-art results in terms of Rouge metrics (with improvements: 2.9% RG-1, 2.5% RG-2, 1.9% RG-L), in the healthcare domain where any range of improvement impacts patients' welfare.

43. Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations [PDF] 返回目录
  Kai Sun, Richong Zhang, Samuel Mensah, Yongyi Mao, Xudong Liu
Abstract: Named entity recognition (NER) and relation extraction (RE) are two fundamental tasks in natural language processing applications. In practice, these two tasks are often solved simultaneously. Traditional multi-task learning models implicitly capture the correlations between NER and RE. However, there exist intrinsic connections between the output of NER and RE. In this study, we argue that an explicit interaction between the NER model and the RE model will better guide the training of both models. Based on the traditional multi-task learning framework, we design an interactive feature encoding method to capture the intrinsic connections between NER and RE tasks. In addition, we propose a recurrent interaction network to progressively capture the correlation between the two models. Empirical studies on two real-world datasets confirm the superiority of the proposed model.

44. Why and when should you pool? Analyzing Pooling in Recurrent Architectures [PDF] 返回目录
  Pratyush Maini, Keshav Kolluru, Danish Pruthi, Mausam
Abstract: Pooling-based recurrent neural architectures consistently outperform their counterparts without pooling. However, the reasons for their enhanced performance are largely unexamined. In this work, we examine three commonly used pooling techniques (mean-pooling, max-pooling, and attention), and propose max-attention, a novel variant that effectively captures interactions among predictive tokens in a sentence. We find that pooling-based architectures substantially differ from their non-pooling equivalents in their learning ability and positional biases--which elucidate their performance benefits. By analyzing the gradient propagation, we discover that pooling facilitates better gradient flow compared to BiLSTMs. Further, we expose how BiLSTMs are positionally biased towards tokens in the beginning and the end of a sequence. Pooling alleviates such biases. Consequently, we identify settings where pooling offers large benefits: (i) in low resource scenarios, and (ii) when important words lie towards the middle of the sentence. Among the pooling techniques studied, max-attention is the most effective, resulting in significant performance gains on several text classification tasks.
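For reference, the three standard pooling operations (plus a toy approximation of max-attention, which is an assumption rather than the paper's exact definition) look like this over a matrix of hidden states:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

H = np.random.default_rng(0).normal(size=(7, 16))   # 7 timesteps, 16-dim BiLSTM states
w = np.random.default_rng(1).normal(size=16)         # toy learned attention query

mean_pooled = H.mean(axis=0)                 # mean-pooling
max_pooled = H.max(axis=0)                   # max-pooling (elementwise over time)
attn = softmax(H @ w)                        # attention weights over timesteps
attn_pooled = attn @ H                       # attention-pooling: weighted average
# Toy "max-attention"-style variant: attend only over the top-k scoring
# tokens so a few predictive positions dominate the summary.
k = 3
top = np.argsort(H @ w)[-k:]
max_attn_pooled = softmax((H @ w)[top]) @ H[top]

print(mean_pooled.shape, max_pooled.shape, attn_pooled.shape, max_attn_pooled.shape)
```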

45. Neural Entity Summarization with Joint Encoding and Weak Supervision [PDF] 返回目录
  Junyou Li, Gong Cheng, Qingxia Liu, Wen Zhang, Evgeny Kharlamov, Kalpa Gunaratna, Huajun Chen
Abstract: In a large-scale knowledge graph (KG), an entity is often described by a large number of triple-structured facts. Many applications require abridged versions of entity descriptions, called entity summaries. Existing solutions to entity summarization are mainly unsupervised. In this paper, we present a supervised approach NEST that is based on our novel neural model to jointly encode graph structure and text in KGs and generate high-quality diversified summaries. Since it is costly to obtain manually labeled summaries for training, our supervision is weak as we train with programmatically labeled data which may contain noise but is free of manual work. Evaluation results show that our approach significantly outperforms the state of the art on two public benchmarks.

46. Interpretable Entity Representations through Large-Scale Typing [PDF] 返回目录
  Yasumasa Onoe, Greg Durrett
Abstract: In standard methodology for natural language processing, entities in text are typically embedded in dense vector spaces with pre-trained models. Such approaches are strong building blocks for entity-related tasks, but the embeddings they produce require extensive additional processing in neural models, and these entity embeddings are fundamentally difficult to interpret. In this paper, we present an approach to creating interpretable entity representations that are human readable and achieve high performance on entity-related tasks out of the box. Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types, indicating the confidence of a typing model's decision that the entity belongs to the corresponding type. We obtain these representations using a fine-grained entity typing model, trained either on supervised ultra-fine entity typing data (Choi et al. 2018) or distantly-supervised examples from Wikipedia. On entity probing tasks involving recognizing entity identity, our embeddings achieve competitive performance with ELMo and BERT without using any extra parameters. We also show that it is possible to reduce the size of our type set in a learning-based way for particular domains. Finally, we show that these embeddings can be post-hoc modified through simple rules to incorporate domain knowledge and improve performance.
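What such a representation looks like is easy to sketch: each dimension is the posterior probability of one fine-grained type. The type inventory and logits below are made up for illustration; in the paper they come from a trained ultra-fine entity typing model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy fine-grained type inventory and typing-model logits for one mention.
TYPES = ["person", "politician", "athlete", "organization", "location"]
logits = np.array([3.1, 2.4, -1.7, -2.0, -3.2])

# The entity representation is simply the vector of per-type posterior
# probabilities, so each dimension is directly human readable.
entity_vec = sigmoid(logits)
for t, p in zip(TYPES, entity_vec):
    print(f"{t:>12}: {p:.2f}")
```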

47. Contextual Text Style Transfer [PDF] 返回目录
  Yu Cheng, Zhe Gan, Yizhe Zhang, Oussama Elachqar, Dianqi Li, Jingjing Liu
Abstract: We introduce a new task, Contextual Text Style Transfer - translating a sentence into a desired style with its surrounding context taken into account. This brings two key challenges to existing style transfer approaches: ($i$) how to preserve the semantic meaning of target sentence and its consistency with surrounding context during transfer; ($ii$) how to train a robust model with limited labeled data accompanied with context. To realize high-quality style transfer with natural context preservation, we propose a Context-Aware Style Transfer (CAST) model, which uses two separate encoders for each input sentence and its surrounding context. A classifier is further trained to ensure contextual consistency of the generated sentence. To compensate for the lack of parallel data, additional self-reconstruction and back-translation losses are introduced to leverage non-parallel data in a semi-supervised fashion. Two new benchmarks, Enron-Context and Reddit-Context, are introduced for formality and offensiveness style transfer. Experimental results on these datasets demonstrate the effectiveness of the proposed CAST model over state-of-the-art methods across style accuracy, content preservation and contextual consistency metrics.

48. Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction [PDF] 返回目录
  Gideon Maillette de Buy Wenniger, Thomas van Dongen, Eleri Aedmaa, Herbert Teun Kruitbosch, Edwin A. Valentijn, Lambert Schomaker
Abstract: Training recurrent neural networks on long texts, in particular scholarly documents, causes problems for learning. While hierarchical attention networks (HANs) are effective in solving these problems, they still lose important information about the structure of the text. To tackle these problems, we propose the use of HANs combined with structure-tags which mark the role of sentences in the document. Adding tags to sentences, marking them as corresponding to title, abstract or main body text, yields improvements over the state-of-the-art for scholarly document quality prediction: substantial gains on average against other models and consistent improvements over HANs without structure-tags. The proposed system is applied to the task of accept/reject prediction on the PeerRead dataset and compared against a recent BiLSTM-based model and joint textual+visual model. It gains 4.7% accuracy over the best of both models on the computation and language domain and loses 2.4% against the best of both on the machine learning domain. Compared to plain HANs, accuracy increases on both domains, with 1.5% and 2% respectively. We also obtain improvements when introducing the tags for prediction of the number of citations for 88k scientific publications that we compiled from the Allen AI S2ORC dataset. For our HAN-system with structure-tags we reach 28.5% explained variance, an improvement of 1.0% over HANs without structure-tags.

49. Revisiting Memory-Efficient Incremental Coreference Resolution [PDF] 返回目录
  Patrick Xia, João Sedoc, Benjamin Van Durme
Abstract: We explore the task of coreference resolution under fixed memory by extending an incremental clustering algorithm to utilize contextualized encoders and neural components. Our algorithm creates explicit representations for each entity, where given a new sentence, spans are proposed and subsequently scored against each entity representation, leading to emergent clusters. Our approach is end-to-end trainable and can be used to transform existing models, leading to an asymptotic reduction in memory usage while remaining competitive on task performance, which allows for more efficient use of computational resources for short documents and makes coreference more feasible across very long document contexts.
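The incremental clustering loop can be sketched as follows (a simplified illustration with cosine scoring and mean-pooled entity representations; the actual model scores spans with learned neural components):

```python
import numpy as np

def incremental_coref(span_vectors, threshold=0.7):
    """Assign each span to the best-matching entity, or start a new entity
    when no existing entity scores above the threshold."""
    entities, counts, assignments = [], [], []
    for v in span_vectors:
        best, best_sim = -1, -1.0
        for i, e in enumerate(entities):
            sim = float(v @ e / (np.linalg.norm(v) * np.linalg.norm(e)))
            if sim > best_sim:
                best, best_sim = i, sim
        if best_sim >= threshold:
            # Merge the span into the entity and update its running mean.
            entities[best] = (entities[best] * counts[best] + v) / (counts[best] + 1)
            counts[best] += 1
            assignments.append(best)
        else:
            entities.append(v.copy())
            counts.append(1)
            assignments.append(len(entities) - 1)
    return assignments

rng = np.random.default_rng(0)
spans = [rng.normal(size=32) for _ in range(5)]
spans.append(spans[0] + 0.01 * rng.normal(size=32))   # near-duplicate mention of span 0
print(incremental_coref(spans))                        # e.g. [0, 1, 2, 3, 4, 0]
```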

50. Learning to Faithfully Rationalize by Construction [PDF] 返回目录
  Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron C. Wallace
Abstract: In many settings it is important for one to be able to understand why a model made a particular prediction. In NLP this often entails extracting snippets of an input text `responsible for' corresponding model output; when such a snippet comprises tokens that indeed informed the model's prediction, it is a faithful explanation. In some settings, faithfulness may be critical to ensure transparency. Lei et al. (2016) proposed a model to produce faithful rationales for neural text classification by defining independent snippet extraction and prediction modules. However, the discrete selection over input tokens performed by this method complicates training, leading to high variance and requiring careful hyperparameter tuning. We propose a simpler variant of this approach that provides faithful explanations by construction. In our scheme, named FRESH, arbitrary feature importance scores (e.g., gradients from a trained model) are used to induce binary labels over token inputs, which an extractor can be trained to predict. An independent classifier module is then trained exclusively on snippets provided by the extractor; these snippets thus constitute faithful explanations, even if the classifier is arbitrarily complex. In both automatic and manual evaluations we find that variants of this simple framework yield predictive performance superior to `end-to-end' approaches, while being more general and easier to train. Code is available at this https URL

51. On the Spontaneous Emergence of Discrete and Compositional Signals [PDF] 返回目录
  Nur Geffen Lan, Emmanuel Chemla, Shane Steinert-Threlkeld
Abstract: We propose a general framework to study language emergence through signaling games with neural agents. Using a continuous latent space, we are able to (i) train using backpropagation, (ii) show that discrete messages nonetheless naturally emerge. We explore whether categorical perception effects follow and show that the messages are not compositional.

52. Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [PDF] 返回目录
  Alexander Gutkin, Tatiana Merkulova, Martin Jansche
Abstract: The use of linguistic typological resources in natural language processing has been steadily gaining more popularity. It has been observed that the use of typological information, often combined with distributed language representations, leads to significantly more powerful models. While linguistic typology representations from various resources have mostly been used for conditioning the models, there has been relatively little attention on predicting features from these resources from the input data. In this paper we investigate whether the various linguistic features from World Atlas of Language Structures (WALS) can be reliably inferred from multi-lingual text. Such a predictor can be used to infer structural features for a language never observed in training data. We frame this task as a multi-label classification involving predicting the set of non-mutually exclusive and extremely sparse multi-valued labels (WALS features). We construct a recurrent neural network predictor based on byte embeddings and convolutional layers and test its performance on 556 languages, providing analysis for various linguistic types, macro-areas, language families and individual features. We show that some features from various linguistic types can be predicted reliably.

53. Revisiting Unsupervised Relation Extraction [PDF] 返回目录
  Thy Thy Tran, Phong Le, Sophia Ananiadou
Abstract: Unsupervised relation extraction (URE) extracts relations between named entities from raw text without manually-labelled data and existing knowledge bases (KBs). URE methods can be categorised into generative and discriminative approaches, which rely either on hand-crafted features or surface form. However, we demonstrate that by using only named entities to induce relation types, we can outperform existing methods on two popular datasets. We conduct a comparison and evaluation of our findings with other URE techniques, to ascertain the important features in URE. We conclude that entity types provide a strong inductive bias for URE.

54. AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages [PDF] 返回目录
  Anoop Kunchukuttan, Divyanshu Kakwani, Satish Golla, Gokul N.C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar
Abstract: We present the IndicNLP corpus, a large-scale, general-domain corpus containing 2.7 billion words for 10 Indian languages from two language families. We share pre-trained word embeddings trained on these corpora. We create news article category classification datasets for 9 languages to evaluate the embeddings. We show that the IndicNLP embeddings significantly outperform publicly available pre-trained embedding on multiple evaluation tasks. We hope that the availability of the corpus will accelerate Indic NLP research. The resources are available at this https URL.

55. Aspect-Controlled Neural Argument Generation [PDF] 返回目录
  Benjamin Schiller, Johannes Daxenberger, Iryna Gurevych
Abstract: We rely on arguments in our daily lives to deliver our opinions and base them on evidence, making them more convincing in turn. However, finding and formulating arguments can be challenging. In this work, we train a language model for argument generation that can be controlled on a fine-grained level to generate sentence-level arguments for a given topic, stance, and aspect. We define argument aspect detection as a necessary method to allow this fine-granular control and crowdsource a dataset with 5,032 arguments annotated with aspects. Our evaluation shows that our generation model is able to generate high-quality, aspect-specific arguments. Moreover, these arguments can be used to improve the performance of stance detection models via data augmentation and to generate counter-arguments. We publish all datasets and code to fine-tune the language model.
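One way to picture fine-grained control is with control codes prepended to each training argument; the format below is purely illustrative (an assumption), not the authors' exact conditioning scheme:

```python
def control_coded_example(topic, stance, aspect, argument):
    """Prepend hypothetical control codes so a language model can condition
    generation on topic, stance, and aspect at training and inference time."""
    return f"[TOPIC={topic}] [STANCE={stance}] [ASPECT={aspect}] {argument}"

print(control_coded_example(
    topic="nuclear energy", stance="contra", aspect="waste",
    argument="Nuclear plants produce radioactive waste that remains hazardous for millennia."))
```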

56. Attribution Analysis of Grammatical Dependencies in LSTMs [PDF] 返回目录
  Yiding Hao
Abstract: LSTM language models have been shown to capture syntax-sensitive grammatical dependencies such as subject-verb agreement with a high degree of accuracy (Linzen et al., 2016, inter alia). However, questions remain regarding whether they do so using spurious correlations, or whether they are truly able to match verbs with their subjects. This paper argues for the latter hypothesis. Using layer-wise relevance propagation (Bach et al., 2015), a technique that quantifies the contributions of input features to model behavior, we show that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns. Our results suggest that LSTM language models are able to infer robust representations of syntactic dependencies.

57. MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer [PDF] 返回目录
  Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder
Abstract: The main goal behind state-of-the-art pretrained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping NLP applications in low-resource languages through zero-shot or few-shot cross-lingual transfer. However, due to limited model capacity, their transfer performance is the weakest exactly on such low-resource languages and languages unseen during pretraining. We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations. In addition, we introduce a novel invertible adapter architecture and a strong baseline method for adapting a pretrained multilingual model to a new language. MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and achieves competitive results on question answering.

58. UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection [PDF] 返回目录
  Andrey Kutuzov, Mario Giulianelli
Abstract: We apply contextualised word embeddings to lexical semantic change detection in the SemEval-2020 Shared Task 1. This paper focuses on Subtask 2, ranking words by the degree of their semantic drift over time. We analyse the performance of two contextualising architectures (BERT and ELMo) and three change detection algorithms. We find that the most effective algorithms rely on the cosine similarity between averaged token embeddings and the pairwise distances between token embeddings. They outperform strong baselines by a large margin, but interestingly, the choice of a particular algorithm depends on the distribution of gold scores in the test set.
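The two families of measures mentioned here reduce to a few lines of NumPy (a toy sketch: random matrices stand in for the contextualised token embeddings of one word in two time periods):

```python
import numpy as np
from numpy.linalg import norm

def cos(a, b):
    return float(a @ b / (norm(a) * norm(b)))

def averaged_cosine(embs_t1, embs_t2):
    # Cosine similarity between the averaged token embeddings of each period.
    return cos(embs_t1.mean(axis=0), embs_t2.mean(axis=0))

def avg_pairwise_distance(embs_t1, embs_t2):
    # Mean pairwise cosine distance between usages across the two periods;
    # larger values indicate more semantic drift.
    return float(np.mean([1 - cos(a, b) for a in embs_t1 for b in embs_t2]))

rng = np.random.default_rng(0)
usages_1960s = rng.normal(size=(50, 768))              # pretend BERT/ELMo token vectors
usages_2010s = rng.normal(loc=0.3, size=(50, 768))     # shifted distribution = "drift"
print("averaged-embedding cosine:", round(averaged_cosine(usages_1960s, usages_2010s), 3))
print("average pairwise distance:", round(avg_pairwise_distance(usages_1960s, usages_2010s), 3))
```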

59. Context based Text-generation using LSTM networks [PDF] 返回目录
  Sivasurya Santhanam
Abstract: Long short-term memory (LSTM) units in sequence-based models are widely used in translation, question-answering systems, and classification tasks due to their ability to learn long-term dependencies. In natural language generation, LSTM networks provide impressive results on text generation by learning language models with grammatically stable syntax. The downside is that the network does not learn about context: it only learns the input-output function and generates text for a given set of input words irrespective of pragmatics. As the model is trained without any such context, there is no semantic consistency among the generated sentences. The proposed model is trained to generate text for a given set of input words along with a context vector. A context vector is similar to a paragraph vector in that it grasps the semantic meaning (context) of the sentence. Several methods of extracting the context vectors are proposed in this work. While training the language model, context vectors are trained along with the input-output sequences. Due to this structure, the model learns the relation among the input words, the context vector, and the target word. Given a set of context terms, a well-trained model will generate text around the provided context. The model has been tried out with two variations of computing the context vectors (word importance and word clustering). In the word clustering method, suitable embeddings across various domains are also explored. The results are evaluated based on the semantic closeness of the generated text to the given context.

60. Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering [PDF] 返回目录
  Wenhan Xiong, Hong Wang, William Yang Wang
Abstract: To extract answers from a large corpus, open-domain question answering (QA) systems usually rely on information retrieval (IR) techniques to narrow the search space. Standard inverted index methods such as TF-IDF are commonly used thanks to their efficiency. However, their retrieval performance is limited as they simply use shallow and sparse lexical features. To break the IR bottleneck, recent studies show that stronger retrieval performance can be achieved by pretraining an effective paragraph encoder that indexes paragraphs into dense vectors. Once trained, the corpus can be pre-encoded into low-dimensional vectors and stored within an index structure where the retrieval can be efficiently implemented as maximum inner product search. Despite the promising results, pretraining such a dense index is expensive and often requires a very large batch size. In this work, we propose a simple and resource-efficient method to pretrain the paragraph encoder. First, instead of using heuristically created pseudo question-paragraph pairs for pretraining, we utilize an existing pretrained sequence-to-sequence model to build a strong question generator that creates high-quality pretraining data. Second, we propose a progressive pretraining algorithm to ensure the existence of effective negative samples in each batch. Across three datasets, our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
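The retrieval side of such a system is simple to sketch: the corpus is encoded once into a dense matrix, and retrieval is a maximum inner product search (a toy example with random vectors standing in for a trained paragraph encoder):

```python
import numpy as np

class DenseIndex:
    """Toy pre-encoded corpus index; retrieval is maximum inner product search."""

    def __init__(self, paragraph_vectors):
        self.matrix = np.asarray(paragraph_vectors)       # (num_paragraphs, dim)

    def search(self, query_vector, k=5):
        scores = self.matrix @ query_vector               # inner product with every paragraph
        topk = np.argsort(-scores)[:k]
        return [(int(i), float(scores[i])) for i in topk]

rng = np.random.default_rng(0)
index = DenseIndex(rng.normal(size=(10_000, 256)))        # pretend pre-encoded paragraphs
print(index.search(rng.normal(size=256), k=3))
```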

61. Generating Persona-Consistent Dialogue Responses Using Deep Reinforcement Learning [PDF] 返回目录
  Mohsen Mesgar, Edwin Simpson, Yue Wang, Iryna Gurevych
Abstract: Recent transformer-based open-domain dialogue agents are trained by reference responses in a fully supervised scenario. Such agents often display inconsistent personalities as training data potentially contain contradictory responses to identical input utterances and no persona-relevant criteria are used in their training losses. We propose a novel approach to train transformer-based dialogue agents using actor-critic reinforcement learning. We define a new reward function to assess generated responses in terms of persona consistency, topic consistency, and fluency. Our reference-agnostic reward relies only on a dialogue history and a persona defined by a list of facts. Automatic and human evaluations on the PERSONACHAT dataset show that our proposed approach increases the rate of persona-consistent responses compared with its peers that are trained in a fully supervised scenario using reference responses.

62. Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society [PDF] 返回目录
  Firoj Alam, Shaden Shaar, Alex Nikolov, Hamdy Mubarak, Giovanni Da San Martino, Ahmed Abdelali, Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Kareem Darwish, Preslav Nakov
Abstract: Disinformation, i.e., information that is both false and means harm, thrives in social media. Most often, it is used for political purposes, e.g., to influence elections or simply to cause distrust in society. It can also target medical issues, most notably the use of vaccines. With the emergence of the COVID-19 pandemic, the political and the medical aspects merged as disinformation got elevated to a whole new level to become the first global infodemic. Fighting this infodemic is now ranked second on the list of the most important focus areas of the World Health Organization, with dangers ranging from promoting fake cures, rumors, and conspiracy theories to spreading xenophobia and panic. The fight requires solving a number of problems such as identifying tweets containing claims, determining their check-worthiness and factuality, and their potential to do harm as well as the nature of that harm, to mention just a few. These are challenging problems, and some of them have been studied previously, but typically in isolation. Here, we design, annotate, and release to the research community a new dataset for fine-grained disinformation analysis that (i) focuses on COVID-19, (ii) combines the perspectives and the interests of journalists, fact-checkers, social media platforms, policy makers, and society as a whole, and (iii) covers both English and Arabic.

63. Representations of Syntax [MASK] Useful: Effects of Constituency and Dependency Structure in Recursive LSTMs [PDF] 返回目录
  Michael A. Lepori, Tal Linzen, R. Thomas McCoy
Abstract: Sequence-based neural networks show significant sensitivity to syntactic structure, but they still perform less well on syntactic tasks than tree-based networks. Such tree-based networks can be provided with a constituency parse, a dependency parse, or both. We evaluate which of these two representational schemes more effectively introduces biases for syntactic structure that increase performance on the subject-verb agreement prediction task. We find that a constituency-based network generalizes more robustly than a dependency-based one, and that combining the two types of structure does not yield further improvement. Finally, we show that the syntactic robustness of sequential models can be substantially improved by fine-tuning on a small amount of constructed data, suggesting that data augmentation is a viable alternative to explicit constituency structure for imparting the syntactic biases that sequential models are lacking.

64. Partially-Typed NER Datasets Integration: Connecting Practice to Theory [PDF] 返回目录
  Shi Zhi, Liyuan Liu, Yu Zhang, Shiyin Wang, Qi Li, Chao Zhang, Jiawei Han
Abstract: While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may only cover a part of them. Instead of relying on fully-typed NER datasets, many efforts have been made to leverage multiple partially-typed ones for training and allow the resulting model to cover a full type set. However, there is neither a guarantee on the quality of integrated datasets nor guidance on the design of training algorithms. Here, we conduct a systematic analysis and comparison between partially-typed NER datasets and fully-typed ones, in both a theoretical and an empirical manner. Firstly, we derive a bound to establish that models trained with partially-typed annotations can reach a performance similar to those trained with fully-typed annotations, which also provides guidance on algorithm design. Moreover, we conduct controlled experiments, which show that partially-typed datasets lead to performance similar to a model trained with the same amount of fully-typed annotations.

65. Automatic Discourse Segmentation: Review and Perspectives [PDF] 返回目录
  Iria da Cunha, Juan-Manuel Torres-Moreno
Abstract: Multilingual discourse parsing is a very prominent research topic. The first stage of discourse parsing is discourse segmentation. The study reported in this article presents a review of two discourse segmenters available online (for English and Portuguese). We evaluate the possibility of developing similar discourse segmenters for Spanish, French and African languages.

66. HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do [PDF] 返回目录
  Keith Curtis, George Awad, Shahzad Rajput, Ian Soboroff
Abstract: In this paper we propose a new evaluation challenge and direction in the area of High-level Video Understanding. The challenge we are proposing is designed to test automatic video analysis and understanding, and how accurately systems can comprehend a movie in terms of actors, entities, events and their relationship to each other. A pilot High-Level Video Understanding (HLVU) dataset of open source movies was collected for human assessors to build a knowledge graph representing each of them. A set of queries will be derived from the knowledge graph to test systems on retrieving relationships among actors, as well as reasoning and retrieving non-visual concepts. The objective is to benchmark whether a computer system can "understand" non-explicit but obvious relationships the same way humans do when they watch the same movies. This is a long-standing problem that is being addressed in the text domain, and this project moves similar research to the video domain. Work of this nature is foundational to future video analytics and video understanding technologies. This work can be of interest to streaming services and broadcasters hoping to provide more intuitive ways for their customers to interact with and consume video content.

67. Bipartite Flat-Graph Network for Nested Named Entity Recognition [PDF] 返回目录
  Ying Luo, Hai Zhao
Abstract: In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for nested named entity recognition (NER), which contains two subgraph modules: a flat NER module for outermost entities and a graph module for all the entities located in inner layers. Bidirectional LSTM (BiLSTM) and graph convolutional network (GCN) are adopted to jointly learn flat entities and their inner dependencies. Different from previous models, which only consider the unidirectional delivery of information from innermost layers to outer ones (or outside-to-inside), our model effectively captures the bidirectional interaction between them. We first use the entities recognized by the flat NER module to construct an entity graph, which is fed to the next graph module. The richer representation learned from graph module carries the dependencies of inner entities and can be exploited to improve outermost entity predictions. Experimental results on three standard nested NER datasets demonstrate that our BiFlaG outperforms previous state-of-the-art models.

68. Diverse Visuo-Lingustic Question Answering (DVLQA) Challenge [PDF] 返回目录
  Shailaja Sampat, Yezhou Yang, Chitta Baral
Abstract: Existing question answering datasets mostly contain homogeneous contexts, based on either textual or visual information alone. On the other hand, digitalization has evolved the nature of reading, which often includes integrating information across multiple heterogeneous sources. To bridge the gap between the two, we compile a Diverse Visuo-Lingustic Question Answering (DVLQA) challenge corpus, where the task is to derive joint inference about the given image-text modality in a question answering setting. Each dataset item consists of an image and a reading passage, where questions are designed to combine both visual and textual information, i.e., ignoring either of them would make the question unanswerable. We first explore the combination of best existing deep learning architectures for visual question answering and machine comprehension to solve DVLQA subsets and show that they are unable to reason well on the joint task. We then develop a modular method which demonstrates slightly better baseline performance and offers more transparency for interpretation of intermediate outputs. However, this is still far behind the human performance, therefore we believe DVLQA will be a challenging benchmark for question answering involving reasoning over visuo-linguistic context. The dataset, code and public leaderboard will be made available at this https URL.

69. TransOMCS: From Linguistic Graphs to Commonsense Knowledge [PDF] 返回目录
  Hongming Zhang, Daniel Khashabi, Yangqiu Song, Dan Roth
Abstract: Commonsense knowledge acquisition is a key problem for artificial intelligence. Conventional methods of acquiring commonsense knowledge generally require laborious and costly human annotations, which are not feasible on a large scale. In this paper, we explore a practical way of mining commonsense knowledge from linguistic graphs, with the goal of transferring cheap knowledge obtained with linguistic patterns into expensive commonsense knowledge. The result is a conversion of ASER [Zhang et al., 2020], a large-scale selectional preference knowledge resource, into TransOMCS, of the same representation as ConceptNet [Liu and Singh, 2004] but two orders of magnitude larger. Experimental results demonstrate the transferability of linguistic knowledge to commonsense knowledge and the effectiveness of the proposed approach in terms of quantity, novelty, and quality. TransOMCS is publicly available at: this https URL.

70. HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training [PDF] 返回目录
  Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu
Abstract: We present HERO, a Hierarchical EncodeR for Omni-representation learning, for large-scale video+language pre-training. HERO encodes multimodal inputs in a hierarchical fashion, where local textual context of a video frame is captured by a Cross-modal Transformer via multimodal fusion, and global video context is captured by a Temporal Transformer. Besides standard Masked Language Modeling (MLM) and Masked Frame Modeling (MFM) objectives, we design two new pre-training tasks: (i) Video-Subtitle Matching (VSM), where the model predicts both global and local temporal alignment; and (ii) Frame Order Modeling (FOM), where the model predicts the right order of shuffled video frames. Different from previous work that mostly focused on cooking or narrated instructional videos, HERO is jointly trained on HowTo100M and large-scale TV show datasets to learn complex social scenes, dynamics backdrop transitions and multi-character interactions. Extensive experiments demonstrate that HERO achieves new state of the art on both text-based video moment retrieval and video question answering tasks across different domains.

71. On the Merging of Domain-Specific Heterogeneous Ontologies using Wordnet and Web Pattern-based Queries [PDF] 返回目录
  M. Maree, M. Belkhatir
Abstract: Ontologies are of fundamental interest in various computer science disciplines such as the semantic web, information retrieval, and database design. They aim at providing a formal, explicit and shared conceptualization and understanding of common domains between different communities. In addition, they allow concepts and their constraints of a specific domain to be explicitly defined. However, the distributed nature of ontology development and the differences in viewpoints of the ontology engineers have resulted in so-called "semantic heterogeneity" between ontologies. Semantic heterogeneity constitutes the major obstacle to achieving interoperability between ontologies. To overcome this obstacle, we present a multi-purpose framework which exploits the WordNet generic knowledge base for: i) discovering and correcting incorrect semantic relations between the concepts of an ontology in a specific domain, which is a primary step of ontology merging; ii) merging domain-specific ontologies through computing semantic relations between their concepts; iii) handling the issue of missing concepts in WordNet through the acquisition of statistical information on the Web; and iv) enriching WordNet with these missing concepts. An experimental instantiation of the framework and comparisons with state-of-the-art syntactic and semantic-based systems validate our proposal.

72. Unsupervised Learning of KB Queries in Task Oriented Dialogs [PDF] 返回目录
  Dinesh Raghu, Nikhil Gupta, Mausam
Abstract: Task-oriented dialog (TOD) systems converse with users to accomplish a specific task. This task requires the system to query a knowledge base (KB) and use the retrieved results to fulfil user needs. Predicting the KB queries is crucial and can lead to severe under-performance if made incorrectly. KB queries are usually annotated in real-world datasets and are learnt using supervised approaches to achieve acceptable task completion. This need for query annotations prevents TOD systems from easily adapting to new domains. In this paper, we propose a novel problem of learning end-to-end TOD systems using dialogs that do not contain KB query annotations. Our approach first learns to predict the KB queries using reinforcement learning (RL) and then learns the end-to-end system using the predicted queries. However, predicting the correct query in TOD systems is uniquely plagued by correlated attributes, in which, due to data bias, certain attributes always occur together in the KB. This prevents the RL system from generalising, and accuracy suffers as a result. We propose Correlated Attributes Resilient RL (CARRL), a modification to the RL gradient estimation, which mitigates the problem of correlated attributes and predicts KB queries better than existing weakly supervised approaches. Finally, we compare the performance of our end-to-end system trained using predicted queries to a system trained using annotated gold queries.

73. Learning to Rank Intents in Voice Assistants [PDF] 返回目录
  Raviteja Anantha, Srinivas Chappidi, William Dawoodi
Abstract: Voice Assistants aim to fulfill user requests by choosing the best intent from multiple options generated by its Automated Speech Recognition and Natural Language Understanding sub-systems. However, voice assistants do not always produce the expected results. This can happen because voice assistants choose from ambiguous intents -- user-specific or domain-specific contextual information reduces the ambiguity of the user request. Additionally the user information-state can be leveraged to understand how relevant/executable a specific intent is for a user request. In this work, we propose a novel Energy-based model for the intent ranking task, where we learn an affinity metric and model the trade-off between extracted meaning from speech utterances and relevance/executability aspects of the intent. Furthermore we present a Multisource Denoising Autoencoder based pretraining that is capable of learning fused representations of data from multiple sources. We empirically show our approach outperforms existing state of the art methods by reducing the error-rate by 3.8%, which in turn reduces ambiguity and eliminates undesired dead-ends leading to better user experience. Finally, we evaluate the robustness of our algorithm on the intent ranking task and show our algorithm improves the robustness by 33.3%.

74. An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety [PDF] Back to contents
  Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller
Abstract: The COVID-19 outbreak was announced as a global pandemic by the World Health Organisation in March 2020 and has affected a growing number of people in the past few weeks. In this context, advanced artificial intelligence techniques are brought to the fore in the effort to fight against and reduce the impact of this global health crisis. In this study, we focus on developing some potential use-cases of intelligent speech analysis for patients diagnosed with COVID-19. In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety. For this purpose, two established acoustic feature sets and support vector machines are utilised. Our experiments show that an average accuracy of .69 is obtained when estimating the severity of illness, which is derived from the number of days in hospitalisation. We hope that this study could point towards an extremely fast, low-cost, and convenient way to automatically detect COVID-19.
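A hedged sketch of the classification pipeline the abstract outlines follows: fixed-length acoustic feature vectors (assumed to be pre-extracted, e.g. an eGeMAPS-style 88-dimensional functional set) are standardised and classified with a linear SVM under cross-validation. The synthetic data, label binning, and hyper-parameters below are assumptions, not the study's setup.

```python
# Hedged sketch: linear SVM over utterance-level acoustic feature vectors.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 88))     # 88 ~ eGeMAPS functional dimension (assumed feature set)
y = rng.integers(0, 3, size=200)   # e.g. three ordinal severity bins (assumed labels)

clf = make_pipeline(StandardScaler(), LinearSVC(C=0.01, max_iter=10000))
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(5, shuffle=True, random_state=0))
print(f"mean accuracy: {scores.mean():.2f}")  # chance level here, since X is random noise
```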

75. Method for Customizable Automated Tagging: Addressing the Problem of Over-tagging and Under-tagging Text Documents [PDF] Back to contents
  Maharshi R. Pandya, Jessica Reyes, Bob Vanderheyden
Abstract: Using author-provided tags to predict tags for a new document often results in the over-generation of tags. When the author does not provide any tags, documents face a severe under-tagging issue. In this paper, we present a method to generate a universal set of tags that can be applied widely to a large document corpus. Using IBM Watson's NLU service, we first collect keywords/phrases that we call "complex document tags" from 8,854 popular reports in the corpus. We then apply an LDA model over these complex document tags to generate a set of 765 unique "simple tags". To apply the tags to a corpus of documents, we run each document through the IBM Watson NLU service and assign the appropriate simple tags. Using only 765 simple tags, our method allows us to tag 87,397 out of 88,583 total documents in the corpus with at least one tag. About 92.1% of these 87,397 documents are also determined to be sufficiently tagged. Finally, we discuss the performance of our method and its limitations.
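The tag-consolidation step can be sketched as follows: fit an LDA topic model over bags of extracted keyphrases ("complex document tags") and read the top terms of each topic as candidate "simple tags". The keyphrase lists below are toy stand-ins for IBM Watson NLU output, and the topic count is illustrative; the paper derives 765 unique simple tags.

```python
# Hedged sketch: derive candidate "simple tags" by fitting LDA over keyphrases.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

complex_tags_per_doc = [  # hypothetical NLU keyphrases, one comma-separated list per report
    "cloud migration, data center, workload, hybrid cloud",
    "neural network, model training, gpu, deep learning",
    "hybrid cloud, kubernetes, workload, container",
    "deep learning, nlp, transformer, model training",
]

# Treat each comma-separated keyphrase as one token in the bag-of-phrases matrix.
vec = CountVectorizer(tokenizer=lambda s: [t.strip() for t in s.split(",")],
                      token_pattern=None, lowercase=True)
X = vec.fit_transform(complex_tags_per_doc)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:3]]
    print(f"candidate simple tags for topic {k}: {top}")
```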

Note: The Chinese text in this digest is a machine translation.