Table of Contents
4. Constraint Translation Candidates: A Bridge between Neural Query Translation and Cross-lingual Information Retrieval [PDF] Abstract
6. Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification [PDF] Abstract
7. UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models [PDF] Abstract
13. TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking [PDF] Abstract
14. Robust and Consistent Estimation of Word Embedding for Bangla Language by fine-tuning Word2Vec Model [PDF] Abstract
15. Graph Transformer Networks with Syntactic and Semantic Structures for Event Argument Extraction [PDF] Abstract
16. Improving Aspect-based Sentiment Analysis with Gated Graph Convolutional Networks and Syntax-based Regulation [PDF] Abstract
18. Introducing Syntactic Structures into Target Opinion Word Extraction with Deep Learning [PDF] Abstract
19. LXPER Index 2.0: Improving Text Readability Assessment for L2 English Learners in South Korea [PDF] Abstract
21. Fair Embedding Engine: A Library for Analyzing and Mitigating Gender Bias in Word Embeddings [PDF] Abstract
22. Two-stage Textual Knowledge Distillation to Speech Encoder for Spoken Language Understanding [PDF] Abstract
24. Transgender Community Sentiment Analysis from Social Media Data: A Natural Language Processing Approach [PDF] Abstract
29. Towards Medical Knowmetrics: Representing and Computing Medical Knowledge using Semantic Predications as the Knowledge Unit and the Uncertainty as the Knowledge Context [PDF] Abstract
30. CRAB: Class Representation Attentive BERT for Hate Speech Identification in Social Media [PDF] Abstract
31. Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference [PDF] Abstract
40. NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints [PDF] Abstract
48. When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models [PDF] Abstract
52. Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation [PDF] Abstract
56. Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [PDF] Abstract
57. Context-aware Decoder for Neural Machine Translation using a Target-side Document-Level Language Model [PDF] Abstract
59. Cross-neutralising: Probing for joint encoding of linguistic information in multilingual models [PDF] Abstract
68. Exploration of NLU: disassemble the information represented by Natural Language, based on the understanding of the internal structure of information, modeling the storage and processing system of information [PDF] Abstract
69. Efficient End-to-end Learning of Cross-event Dependencies for Document-level Event Extraction [PDF] Abstract
81. NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions [PDF] Abstract
87. Char2Subword: Extending the Subword Embedding Space from Pre-trained Models Using Robust Character Compositionality [PDF] Abstract
89. Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? [PDF] Abstract
90. Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation [PDF] Abstract
92. A Caption Is Worth A Thousand Images: Investigating Image Captions for Multimodal Named Entity Recognition [PDF] Abstract
93. Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education [PDF] Abstract
95. Graph-Based Universal Dependency Parsing in the Age of the Transformer: What Works, and What Doesn't [PDF] Abstract
96. AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization [PDF] Abstract
97. Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training [PDF] Abstract
106. Posterior Differential Regularization with f-divergence for Improving Model Robustness [PDF] Abstract
108. Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text-to-SQL [PDF] Abstract
114. Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020 [PDF] Abstract
Abstracts
1. A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [PDF] Back to Contents
Alexander Kalinowski, Yuan An
Abstract: Neural embedding approaches have become a staple in the fields of computer vision, natural language processing, and more recently, graph analytics. Given the pervasive nature of these algorithms, the natural question becomes how to exploit the embedding spaces to map, or align, embeddings of different data sources. To this end, we survey the current research landscape on word, sentence and knowledge graph embedding algorithms. We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research. By gathering these diverse approaches into a singular survey, we hope to further motivate research into alignment of embedding spaces of varied data types and sources.
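One classic family of alignment techniques covered by such surveys is the orthogonal (Procrustes) mapping between two embedding spaces. The NumPy sketch below is illustrative only and is not taken from the paper; it assumes row-aligned embedding pairs, e.g. obtained from a seed dictionary.

```python
import numpy as np

def procrustes_align(X, Y):
    """Closed-form orthogonal map W minimizing ||X @ W - Y|| for row-aligned pairs."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: a random orthogonal rotation of an embedding space is recovered exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))                 # "source" embeddings
R, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # hidden ground-truth rotation
Y = X @ R                                       # "target" embeddings
W = procrustes_align(X, Y)
print(np.allclose(X @ W, Y))                    # True: the rotation is recovered
```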
2. A Corpus for Argumentative Writing Support in German [PDF] Back to Contents
Thiemo Wambsganss, Christina Niklaus, Matthias Söllner, Siegfried Handschuh, Jan Marco Leimeister
Abstract: In this paper, we present a novel annotation approach to capture claims and premises of arguments and their relations in student-written persuasive peer reviews on business models in German language. We propose an annotation scheme based on annotation guidelines that allows to model claims and premises as well as support and attack relations for capturing the structure of argumentative discourse in student-written peer reviews. We conduct an annotation study with three annotators on 50 persuasive essays to evaluate our annotation scheme. The obtained inter-rater agreement of $\alpha=0.57$ for argument components and $\alpha=0.49$ for argumentative relations indicates that the proposed annotation scheme successfully guides annotators to moderate agreement. Finally, we present our freely available corpus of 1,000 persuasive student-written peer reviews on business models and our annotation guidelines to encourage future research on the design and development of argumentative writing support systems for students.
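For readers unfamiliar with the reported agreement statistic, here is a small sketch of how Krippendorff's alpha can be computed over (annotator, item, label) triples. The annotations below are invented, and NLTK's implementation is only one of several available options.

```python
from nltk.metrics.agreement import AnnotationTask

# Toy annotations: (annotator, item, label) triples for a few argument components.
triples = [
    ("a1", "span1", "claim"),   ("a2", "span1", "claim"),   ("a3", "span1", "premise"),
    ("a1", "span2", "premise"), ("a2", "span2", "premise"), ("a3", "span2", "premise"),
    ("a1", "span3", "claim"),   ("a2", "span3", "premise"), ("a3", "span3", "claim"),
]

task = AnnotationTask(data=triples)
print(f"Krippendorff's alpha: {task.alpha():.2f}")
```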
3. Exploiting Neural Query Translation into Cross Lingual Information Retrieval [PDF] Back to Contents
Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen
Abstract: As a crucial role in cross-language information retrieval (CLIR), query translation has three main challenges: 1) the adequacy of translation; 2) the lack of in-domain parallel training data; and 3) the requisite of low latency. To this end, existing CLIR systems mainly exploit statistical-based machine translation (SMT) rather than the advanced neural machine translation (NMT), limiting the further improvements on both translation and retrieval quality. In this paper, we investigate how to exploit neural query translation model into CLIR system. Specifically, we propose a novel data augmentation method that extracts query translation pairs according to user clickthrough data, thus to alleviate the problem of domain-adaptation in NMT. Then, we introduce an asynchronous strategy which is able to leverage the advantages of the real-time in SMT and the veracity in NMT. Experimental results reveal that the proposed approach yields better retrieval quality than strong baselines and can be well applied into a real-world CLIR system, i.e. Aliexpress e-Commerce search engine. Readers can examine and test their cases on our website: this https URL .
4. Constraint Translation Candidates: A Bridge between Neural Query Translation and Cross-lingual Information Retrieval [PDF] Back to Contents
Tianchi Bi, Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen
Abstract: Query translation (QT) is a key component in cross-lingual information retrieval system (CLIR). With the help of deep learning, neural machine translation (NMT) has shown promising results on various tasks. However, NMT is generally trained with large-scale out-of-domain data rather than in-domain query translation pairs. Besides, the translation model lacks a mechanism at the inference time to guarantee the generated words to match the search index. The two shortages of QT result in readable texts for human but inadequate candidates for the downstream retrieval task. In this paper, we propose a novel approach to alleviate these problems by limiting the open target vocabulary search space of QT to a set of important words mined from search index database. The constraint translation candidates are employed at both of training and inference time, thus guiding the translation model to learn and generate well performing target queries. The proposed methods are exploited and examined in a real-word CLIR system--Aliexpress e-Commerce search engine. Experimental results demonstrate that our approach yields better performance on both translation quality and retrieval accuracy than the strong NMT baseline.
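A minimal sketch of the general idea of restricting the decoder's output vocabulary at inference time, assuming PyTorch; the toy vocabulary, the constraint set and the masking scheme here are illustrative and not the paper's exact mechanism.

```python
import torch

def constrain_logits(logits, allowed_ids):
    """Mask next-token logits so only tokens from the constraint set can be generated.

    logits: (vocab_size,) tensor of next-token scores from the decoder.
    allowed_ids: ids of target words mined from the search index (plus specials like EOS).
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_ids] = 0.0
    return logits + mask

# Toy usage with a made-up 10-word vocabulary and a 4-word constraint set.
logits = torch.randn(10)
allowed = torch.tensor([1, 3, 7, 9])
next_token = constrain_logits(logits, allowed).argmax()
print(int(next_token) in {1, 3, 7, 9})  # True: generation stays inside the constraint set
```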
5. Dutch Humor Detection by Generating Negative Examples [PDF] Back to Contents
Thomas Winters, Pieter Delobelle
Abstract: Detecting if a text is humorous is a hard task to do computationally, as it usually requires linguistic and common sense insights. In machine learning, humor detection is usually modeled as a binary classification task, trained to predict if the given text is a joke or another type of text. Rather than using completely different non-humorous texts, we propose using text generation algorithms for imitating the original joke dataset to increase the difficulty for the learning algorithm. We constructed several different joke and non-joke datasets to test the humor detection abilities of different language technologies. In particular, we compare the humor detection capabilities of classic neural network approaches with the state-of-the-art Dutch language model RobBERT. In doing so, we create and compare the first Dutch humor detection systems. We found that while other language models perform well when the non-jokes came from completely different domains, RobBERT was the only one that was able to distinguish jokes from generated negative examples. This performance illustrates the usefulness of using text generation to create negative datasets for humor recognition, and also shows that transformer models are a large step forward in humor detection.
6. Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification [PDF] Back to Contents
Timo Schick, Helmut Schmid, Hinrich Schütze
Abstract: A recent approach for few-shot text classification is to convert textual inputs to cloze questions that contain some form of task description, process them with a pretrained language model and map the predicted words to labels. Manually defining this mapping between words and labels requires both domain expertise and an understanding of the language model's abilities. To mitigate this issue, we devise an approach that automatically finds such a mapping given small amounts of training data. For a number of tasks, the mapping found by our approach performs almost as well as hand-crafted label-to-word mappings.
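A minimal sketch of the underlying cloze/verbalizer idea, assuming a recent Hugging Face transformers install and a masked language model; the pattern and the hand-picked label words below are illustrative. The paper's contribution is finding such label words automatically from a handful of training examples rather than by hand.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
# A cloze pattern turns classification into masked-word prediction.
label_words = {"positive": "great", "negative": "terrible"}   # hand-crafted verbalizer

def classify(text):
    cloze = f"{text} It was {fill.tokenizer.mask_token}."
    scores = {tok["token_str"]: tok["score"] for tok in fill(cloze, top_k=50)}
    # Score each class by the probability of its label word filling the blank.
    return max(label_words, key=lambda c: scores.get(label_words[c], 0.0))

print(classify("The plot was gripping and the acting superb."))
```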
7. UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models [PDF] Back to Contents
Mircea-Adrian Tanase, Dumitru-Clementin Cercel, Costin-Gabriel Chiru
Abstract: Offensive language detection is one of the most challenging problem in the natural language processing field, being imposed by the rising presence of this phenomenon in online social media. This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages (i.e., English, Arabic, Danish, Greek, and Turkish), which was employed in Subtask A of the Offenseval 2020 shared task. Several neural architectures (i.e., BERT, mBERT, Roberta, XLM-Roberta, and ALBERT), pre-trained using both single-language and multilingual corpora, were fine-tuned and compared using multiple combinations of datasets. Finally, the highest-scoring models were used for our submissions in the competition, which ranked our team 21st of 85, 28th of 53, 19th of 39, 16th of 37, and 10th of 46 for English, Arabic, Danish, Greek, and Turkish, respectively.
8. Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [PDF] Back to Contents
Ozan Caglayan, Pranava Madhyastha, Lucia Specia
Abstract: Automatic evaluation of language generation systems is a well-studied problem in Natural Language Processing. While novel metrics are proposed every year, a few popular metrics remain as the de facto metrics to evaluate tasks such as image captioning and machine translation, despite their known limitations. This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them. In this paper, we urge the community for more careful consideration of how they automatically evaluate their models by demonstrating important failure cases on multiple datasets, language pairs and tasks. Our experiments show that metrics (i) usually prefer system outputs to human-authored texts, (ii) can be insensitive to correct translations of rare words, (iii) can yield surprisingly high scores when given a single sentence as system output for the entire test set.
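A small sketch of how the third failure case can be probed with sacreBLEU: scoring a "system" that emits the same sentence for every input against a test set. The toy references below are made up; the paper's observation concerns real test sets, where such degenerate outputs can score surprisingly well under some metrics.

```python
import sacrebleu

# Stand-in references; in practice these would be a real MT test set.
refs = [
    "the cat sat on the mat .",
    "he plays the piano very well .",
    "we will meet again tomorrow morning .",
]

# Degenerate "system" that returns the same fluent sentence for every input.
constant_output = ["the result is very good ."] * len(refs)

bleu = sacrebleu.corpus_bleu(constant_output, [refs])
print(f"BLEU for a constant single-sentence system: {bleu.score:.1f}")
```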
9. Interpreting convolutional networks trained on textual data [PDF] Back to Contents
Reza Marzban, Christopher John Crick
Abstract: There have been many advances in the artificial intelligence field due to the emergence of deep learning. In almost all sub-fields, artificial neural networks have reached or exceeded human-level performance. However, most of the models are not interpretable. As a result, it is hard to trust their decisions, especially in life and death scenarios. In recent years, there has been a movement toward creating explainable artificial intelligence, but most work to date has concentrated on image processing models, as it is easier for humans to perceive visual patterns. There has been little work in other fields like natural language processing. In this paper, we train a convolutional model on textual data and analyze the global logic of the model by studying its filter values. In the end, we find the most important words in our corpus to our models logic and remove the rest (95%). New models trained on just the 5% most important words can achieve the same performance as the original model while reducing training time by more than half. Approaches such as this will help us to understand NLP models, explain their decisions according to their word choices, and improve them by finding blind spots and biases.
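A rough sketch of ranking corpus words by their strongest convolutional filter response, assuming PyTorch; the toy vocabulary, the untrained model and the exact scoring rule are illustrative stand-ins for the filter-value analysis described above.

```python
import torch
import torch.nn as nn
from collections import defaultdict

# Toy setup: a 1D convolution over word embeddings, as in a standard text CNN.
vocab = ["the", "movie", "was", "brilliant", "boring", "a", "plot", "actor"]
emb = nn.Embedding(len(vocab), 16)
conv = nn.Conv1d(in_channels=16, out_channels=8, kernel_size=3, padding=1)

corpus = [["the", "movie", "was", "brilliant"], ["the", "plot", "was", "boring"]]
importance = defaultdict(float)

with torch.no_grad():
    for sent in corpus:
        ids = torch.tensor([[vocab.index(w) for w in sent]])
        feats = conv(emb(ids).transpose(1, 2))          # (1, filters, seq_len)
        per_word = feats.max(dim=1).values.squeeze(0)   # strongest filter response per position
        for w, score in zip(sent, per_word.tolist()):
            importance[w] = max(importance[w], score)

# Keep only the most important words (the paper keeps roughly the top 5% of its corpus).
top = sorted(importance, key=importance.get, reverse=True)[: max(1, len(importance) // 20)]
print(top)
```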
10. Hierarchical Metadata-Aware Document Categorization under Weak Supervision [PDF] Back to Contents
Yu Zhang, Xiusi Chen, Yu Meng, Jiawei Han
Abstract: Categorizing documents into a given label hierarchy is intuitively appealing due to the ubiquity of hierarchical topic structures in massive text corpora. Although related studies have achieved satisfying performance in fully supervised hierarchical document classification, they usually require massive human-annotated training data and only utilize text information. However, in many domains, (1) annotations are quite expensive where very few training samples can be acquired; (2) documents are accompanied by metadata information. Hence, this paper studies how to integrate the label hierarchy, metadata, and text signals for document categorization under weak supervision. We develop HiMeCat, an embedding-based generative framework for our task. Specifically, we propose a novel joint representation learning module that allows simultaneous modeling of category dependencies, metadata information and textual semantics, and we introduce a data augmentation module that hierarchically synthesizes training documents to complement the original, small-scale training set. Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contribution of our representation learning and data augmentation modules.
11. Meta-Learning for Neural Relation Classification with Distant Supervision [PDF] Back to Contents
Zhenzhen Li, Jian-Yun Nie, Benyou Wang, Pan Du, Yuhan Zhang, Lixin Zou, Dongsheng Li
Abstract: Distant supervision provides a means to create a large number of weakly labeled data at low cost for relation classification. However, the resulting labeled instances are very noisy, containing data with wrong labels. Many approaches have been proposed to select a subset of reliable instances for neural model training, but they still suffer from noisy labeling problem or underutilization of the weakly-labeled data. To better select more reliable training instances, we introduce a small amount of manually labeled data as reference to guide the selection process. In this paper, we propose a meta-learning based approach, which learns to reweight noisy training data under the guidance of reference data. As the clean reference data is usually very small, we propose to augment it by dynamically distilling the most reliable elite instances from the noisy data. Experiments on several datasets demonstrate that the reference data can effectively guide the selection of training data, and our augmented approach consistently improves the performance of relation classification comparing to the existing state-of-the-art methods.
12. Syllabification of the Divine Comedy [PDF] Back to Contents
Andrea Asperti, Stefano Dal Bianco
Abstract: We provide a syllabification algorithm for the Divine Comedy using techniques from probabilistic and constraint programming. We particularly focus on the synalephe, addressed in terms of the "propensity" of a word to take part in a synalephe with adjacent words. We jointly provide an online vocabulary containing, for each word, information about its syllabification, the location of the tonic accent, and the aforementioned synalephe propensity, on the left and right sides. The algorithm is intrinsically nondeterministic, producing different possible syllabifications for each verse, with different likelihoods; metric constraints relative to accents on the 10th, 4th and 6th syllables are used to further reduce the solution space. The most likely syllabification is hence returned as output. We believe that this work could be a major milestone for a lot of different investigations. From the point of view of digital humanities it opens new perspectives on computer assisted analysis of digital sources, comprising automated detection of anomalous and problematic cases, metric clustering of verses and their categorization, or more foundational investigations addressing e.g. the phonetic roles of consonants and vowels. From the point of view of text processing and deep learning, information about syllabification and the location of accents opens a wide range of exciting perspectives, from the possibility of automatic learning syllabification of words and verses, to the improvement of generative models, aware of metric issues, and more respectful of the expected musicality.
13. TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking [PDF] Back to Contents
Yucheng Wang, Bowen Yu, Yueyang Zhang, Tingwen Liu, Hongsong Zhu, Limin Sun
Abstract: Extracting entities and relations from unstructured text has attracted increasing attention in recent years but remains challenging, due to the intrinsic difficulty in identifying overlapping relations with shared entities. Prior works show that joint learning can result in a noticeable performance gain. However, they usually involve sequential interrelated steps and suffer from the problem of exposure bias. At training time, they predict with the ground truth conditions while at inference it has to make extraction from scratch. This discrepancy leads to error accumulation. To mitigate the issue, we propose in this paper a one-stage joint extraction model, namely, TPLinker, which is capable of discovering overlapping relations sharing one or both entities while immune from the exposure bias. TPLinker formulates joint extraction as a token pair linking problem and introduces a novel handshaking tagging scheme that aligns the boundary tokens of entity pairs under each relation type. Experiment results show that TPLinker performs significantly better on overlapping and multiple relation extraction, and achieves state-of-the-art performance on two public datasets.
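A toy illustration of the token-pair ("handshaking") tagging idea for the entity part of the scheme, with invented tag names; the full scheme adds analogous head-to-head and tail-to-tail tags per relation type so that entities and relations are decoded jointly in one stage.

```python
tokens = ["New", "York", "City", "is", "large"]
# Every ordered pair (i, j) with i <= j gets a tag.
pairs = [(i, j) for i in range(len(tokens)) for j in range(i, len(tokens))]

tags = {pair: "O" for pair in pairs}
tags[(0, 2)] = "ENT-H2T"          # "New York City": head token 0 linked to tail token 2

# Decoding reads entity spans straight off the tagged pairs in a single pass.
entities = [" ".join(tokens[i:j + 1]) for (i, j), t in tags.items() if t == "ENT-H2T"]
print(entities)                    # ['New York City']
```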
14. Robust and Consistent Estimation of Word Embedding for Bangla Language by fine-tuning Word2Vec Model [PDF] Back to Contents
Rifat Rahman
Abstract: Word embedding or vector representation of word holds syntactical and semantic characteristics of word which can be an informative feature for any machine learning based models of natural language processing. There are several deep learning based models for the vectorization of words like word2vec, fasttext, gensim, glove etc. In this study, we analysis word2vec model for learning word vectors by tuning different hyper-parameters and present the most effective word embedding for Bangla language. For testing the performances of different word embeddings induced by fine-tuning of word2vec model, we perform both intrinsic and extrinsic evaluations. We cluster the word vectors to examine the relational similarity of words and also use different word embeddings as the feature of news article classifier for extrinsic evaluation. From our experiment, we discover that the word vectors with 300 dimension, generated from 'skip-gram' method of word2vec model using the sliding window size of 4, are giving the most robust vector representations for Bangla language.
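A minimal gensim sketch of the reported configuration (skip-gram, 300 dimensions, window of 4); the two stand-in sentences replace the Bangla corpus, and the parameter names follow gensim 4.x (older releases use size and iter instead of vector_size and epochs).

```python
from gensim.models import Word2Vec

# Tokenized Bangla sentences would go here; a tiny English stand-in keeps this runnable.
sentences = [["this", "is", "a", "sample", "sentence"],
             ["word", "vectors", "capture", "word", "similarity"]]

# sg=1 selects skip-gram; the paper reports 300-dimensional vectors and a window of 4.
model = Word2Vec(sentences, vector_size=300, window=4, sg=1, min_count=1, epochs=10)

print(model.wv["word"].shape)                 # (300,)
print(model.wv.most_similar("word", topn=3))  # intrinsic sanity check on a toy corpus
```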
15. Graph Transformer Networks with Syntactic and Semantic Structures for Event Argument Extraction [PDF] Back to Contents
Amir Pouran Ben Veyseh, Tuan Ngo Nguyen, Thien Huu Nguyen
Abstract: The goal of Event Argument Extraction (EAE) is to find the role of each entity mention for a given event trigger word. It has been shown in the previous works that the syntactic structures of the sentences are helpful for the deep learning models for EAE. However, a major problem in such prior works is that they fail to exploit the semantic structures of the sentences to induce effective representations for EAE. Consequently, in this work, we propose a novel model for EAE that exploits both syntactic and semantic structures of the sentences with the Graph Transformer Networks (GTNs) to learn more effective sentence structures for EAE. In addition, we introduce a novel inductive bias based on information bottleneck to improve generalization of the EAE models. Extensive experiments are performed to demonstrate the benefits of the proposed model, leading to state-of-the-art performance for EAE on standard datasets.
16. Improving Aspect-based Sentiment Analysis with Gated Graph Convolutional Networks and Syntax-based Regulation [PDF] Back to Contents
Amir Pouran Ben Veyseh, Nasim Nouri, Franck Dernoncourt, Quan Hung Tran, Dejing Dou, Thien Huu Nguyen
Abstract: Aspect-based Sentiment Analysis (ABSA) seeks to predict the sentiment polarity of a sentence toward a specific aspect. Recently, it has been shown that dependency trees can be integrated into deep learning models to produce the state-of-the-art performance for ABSA. However, these models tend to compute the hidden/representation vectors without considering the aspect terms and fail to benefit from the overall contextual importance scores of the words that can be obtained from the dependency tree for ABSA. In this work, we propose a novel graph-based deep learning model to overcome these two issues of the prior work on ABSA. In our model, gate vectors are generated from the representation vectors of the aspect terms to customize the hidden vectors of the graph-based models toward the aspect terms. In addition, we propose a mechanism to obtain the importance scores for each word in the sentences based on the dependency trees that are then injected into the model to improve the representation vectors for ABSA. The proposed model achieves the state-of-the-art performance on three benchmark datasets.
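A bare-bones single graph-convolution step over a dependency graph, the kind of building block such models extend with gating and aspect-aware importance scores; the toy parse, the PyTorch code and the pooling below are illustrative and not the paper's architecture.

```python
import torch
import torch.nn as nn

tokens = ["the", "battery", "life", "is", "great"]
edges = [(1, 0), (2, 1), (3, 2), (3, 4)]      # toy (head, dependent) arcs, not a real parse

n, d = len(tokens), 16
A = torch.eye(n)                               # self-loops
for h, t in edges:                             # undirected adjacency from the dependency arcs
    A[h, t] = A[t, h] = 1.0
A = A / A.sum(dim=1, keepdim=True)             # simple row normalization

H = torch.randn(n, d)                          # stand-in for contextual word vectors
layer = nn.Linear(d, d)
H_graph = torch.relu(layer(A @ H))             # one graph-convolution step

aspect = H_graph[1:3].mean(dim=0)              # pooled representation of the aspect "battery life"
print(aspect.shape)                            # torch.Size([16])
```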
17. FastFormers: Highly Efficient Transformer Models for Natural Language Understanding [PDF] Back to Contents
Young Jin Kim, Hany Hassan Awadalla
Abstract: Transformer-based models are the state-of-the-art for Natural Language Understanding (NLU) applications. Models are getting bigger and better on various tasks. However, Transformer models remain computationally challenging since they are not efficient at inference-time compared to traditional approaches. In this paper, we present FastFormers, a set of recipes to achieve efficient inference-time performance for Transformer-based models on various NLU tasks. We show how carefully utilizing knowledge distillation, structured pruning and numerical optimization can lead to drastic improvements on inference efficiency. We provide effective recipes that can guide practitioners to choose the best settings for various NLU tasks and pretrained models. Applying the proposed recipes to the SuperGLUE benchmark, we achieve from 9.8x up to 233.9x speed-up compared to out-of-the-box models on CPU. On GPU, we also achieve up to 12.4x speed-up with the presented methods. We show that FastFormers can drastically reduce cost of serving 100 million requests from 4,223 USD to just 18 USD on an Azure F16s_v2 instance. This translates to a sustainable runtime by reducing energy consumption 6.9x - 125.8x according to the metrics used in the SustaiNLP 2020 shared task.
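As a loose illustration of just one ingredient named above (numerical optimization), the PyTorch sketch below applies post-training dynamic int8 quantization to the Linear layers of a distilled Transformer; the model name and dummy batch are arbitrary examples, and the paper's full recipes additionally combine knowledge distillation and structured pruning.

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
model.eval()

# Replace Linear layers with dynamically quantized int8 versions for faster CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

dummy_ids = torch.randint(0, 1000, (1, 32))    # a made-up batch of token ids
with torch.no_grad():
    logits = quantized(input_ids=dummy_ids).logits
print(logits.shape)                            # torch.Size([1, 2])
```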
18. Introducing Syntactic Structures into Target Opinion Word Extraction with Deep Learning [PDF] Back to Contents
Amir Pouran Ben Veyseh, Nasim Nouri, Franck Dernoncourt, Dejing Dou, Thien Huu Nguyen
Abstract: Targeted opinion word extraction (TOWE) is a sub-task of aspect based sentiment analysis (ABSA) which aims to find the opinion words for a given aspect-term in a sentence. Despite their success for TOWE, the current deep learning models fail to exploit the syntactic information of the sentences that have been proved to be useful for TOWE in the prior research. In this work, we propose to incorporate the syntactic structures of the sentences into the deep learning models for TOWE, leveraging the syntax-based opinion possibility scores and the syntactic connections between the words. We also introduce a novel regularization technique to improve the performance of the deep learning models based on the representation distinctions between the words in TOWE. The proposed model is extensively analyzed and achieves the state-of-the-art performance on four benchmark datasets.
19. LXPER Index 2.0: Improving Text Readability Assessment for L2 English Learners in South Korea [PDF] 返回目录
Bruce W. Lee, Jason Hyung-Jong Lee
Abstract: Most text readability assessment models are developed for the native readers of English and have low accuracy for texts in foreign English Language Training (ELT) curriculum. In this paper, we investigate a text readability assessment model for L2 English learners in Korea. In accordance, we improve and expand the Text Corpus of the Korean ELT curriculum (CoKEC-text). Each text is labeled with its target grade level. We train our model with CoKEC-text and significantly improve the accuracy of readability assessment for texts in the Korean ELT curriculum.
20. The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task [PDF] 返回目录
Alexandra Chronopoulou, Dario Stojanovski, Viktor Hangya, Alexander Fraser
Abstract: This paper describes the submission of LMU Munich to the WMT 2020 unsupervised shared task, in two language directions, German<->Upper Sorbian. Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020), using a monolingual pretrained language generation model (on German) and fine-tuning it on both German and Upper Sorbian, before initializing a UNMT model, which is trained with online backtranslation. Pseudo-parallel data obtained from an unsupervised statistical machine translation (USMT) system is used to fine-tune the UNMT model. We also apply BPE-Dropout to the low resource (Upper Sorbian) data to obtain a more robust system. We additionally experiment with residual adapters and find them useful in the Upper Sorbian->German direction. We explore sampling during backtranslation and curriculum learning to use SMT translations in a more principled way. Finally, we ensemble our best-performing systems and reach a BLEU score of 32.4 on German->Upper Sorbian and 35.2 on Upper Sorbian->German.
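As a rough illustration of the BPE-Dropout idea used for the low-resource side (a toy re-implementation, not the tooling the submission actually used), the sketch below randomly ignores some merges while segmenting a word, so the same word can receive different subword segmentations across training passes:

```python
import random

def bpe_segment(word, ranked_merges, dropout=0.0, seed=None):
    """Greedy BPE segmentation with toy merge dropout (an approximation of BPE-Dropout)."""
    rng = random.Random(seed)
    rank = {pair: i for i, pair in enumerate(ranked_merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Adjacent pairs with a merge rule, each independently dropped with probability `dropout`.
        candidates = [
            (rank[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in rank and rng.random() >= dropout
        ]
        if not candidates:
            break  # simplification: stop once every applicable merge was dropped this round
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
print(bpe_segment("lower", merges))                       # ['lower']
print(bpe_segment("lower", merges, dropout=0.5, seed=3))  # e.g. a finer, noisier segmentation
```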
21. Fair Embedding Engine: A Library for Analyzing and Mitigating Gender Bias in Word Embeddings [PDF] 返回目录
Vaibhav Kumar, Tenzin Singhay Bhotia, Vaibhav Kumar
Abstract: Non-contextual word embedding models have been shown to inherit human-like stereotypical biases of gender, race and religion from the training corpora. To counter this issue, a large body of research has emerged which aims to mitigate these biases while keeping the syntactic and semantic utility of embeddings intact. This paper describes Fair Embedding Engine (FEE), a library for analysing and mitigating gender bias in word embeddings. FEE combines various state of the art techniques for quantifying, visualising and mitigating gender bias in word embeddings under a standard abstraction. FEE will aid practitioners in fast track analysis of existing debiasing methods on their embedding models. Further, it will allow rapid prototyping of new methods by evaluating their performance on a suite of standard metrics.
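Without assuming anything about FEE's actual API, the toy snippet below shows the kind of quantification such libraries wrap: projecting word vectors onto a he-she gender direction. The 4-dimensional vectors are made up purely for illustration:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 4-d vectors standing in for real pre-trained embeddings.
emb = {
    "he":       np.array([ 1.0, 0.2, 0.0, 0.1]),
    "she":      np.array([-1.0, 0.2, 0.0, 0.1]),
    "engineer": np.array([ 0.4, 0.9, 0.1, 0.0]),
    "nurse":    np.array([-0.5, 0.8, 0.2, 0.0]),
}

gender_direction = emb["he"] - emb["she"]
for word in ("engineer", "nurse"):
    # Signed association with the gender direction; values near 0 indicate less direct bias.
    print(word, round(cosine(emb[word], gender_direction), 3))
```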
22. Two-stage Textual Knowledge Distillation to Speech Encoder for Spoken Language Understanding [PDF] 返回目录
Seongbin Kim, Gyuwan Kim, Seongjin Shin, Sangmin Lee
Abstract: End-to-end approaches open a new way for more accurate and efficient spoken language understanding (SLU) systems by alleviating the drawbacks of traditional pipeline systems. Previous works exploit textual information for an SLU model via pre-training with automatic speech recognition or fine-tuning with knowledge distillation. To utilize textual information more effectively, this work proposes a two-stage textual knowledge distillation method that matches utterance-level representations and predicted logits of two modalities during pre-training and fine-tuning, sequentially. We use vq-wav2vec BERT as a speech encoder because it captures general and rich features. Furthermore, we improve the performance, especially in a low-resource scenario, with data augmentation methods by randomly masking spans of discrete audio tokens and contextualized hidden representations. Consequently, we push the state-of-the-art on the Fluent Speech Commands, achieving 99.7% test accuracy in the full dataset setting and 99.5% in the 10% subset setting. Throughout the ablation studies, we empirically verify that all used methods are crucial to the final performance, providing the best practice for spoken language understanding. Code to reproduce our results will be available upon publication.
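A schematic of the two matching objectives (representation matching and logit matching) in PyTorch. This shows only the loss side of the method, with random tensors standing in for the speech student and the text teacher:

```python
import torch
import torch.nn.functional as F

def distillation_losses(student_repr, teacher_repr, student_logits, teacher_logits, T=2.0):
    # Match utterance-level representations of the speech student and the text teacher.
    repr_loss = F.mse_loss(student_repr, teacher_repr)
    # Match predicted distributions (soft targets) with temperature T.
    logit_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return repr_loss, logit_loss

s_repr, t_repr = torch.randn(8, 256), torch.randn(8, 256)
s_logits, t_logits = torch.randn(8, 31), torch.randn(8, 31)
print(distillation_losses(s_repr, t_repr, s_logits, t_logits))
```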
23. Autoencoding Improves Pre-trained Word Embeddings [PDF] 返回目录
Masahiro Kaneko, Danushka Bollegala
Abstract: Prior work investigating the geometry of pre-trained word embeddings have shown that word embeddings to be distributed in a narrow cone and by centering and projecting using principal component vectors one can increase the accuracy of a given set of pre-trained word embeddings. However, theoretically, this post-processing step is equivalent to applying a linear autoencoder to minimise the squared l2 reconstruction error. This result contradicts prior work (Mu and Viswanath, 2018) that proposed to remove the top principal components from pre-trained embeddings. We experimentally verify our theoretical claims and show that retaining the top principal components is indeed useful for improving pre-trained word embeddings, without requiring access to additional linguistic resources or labelled data.
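The post-processing under discussion is easy to state in a few lines of numpy: center the embedding matrix, compute principal directions, then either project onto the top components (what this paper argues for) or subtract that projection (the Mu and Viswanath recipe). A small sketch with a random matrix standing in for pre-trained embeddings:

```python
import numpy as np

def postprocess(E, k=5, remove=False):
    """Center embeddings and retain (project onto) or remove the top-k principal components."""
    X = E - E.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    top = Vt[:k]                    # (k, d) principal directions
    projection = X @ top.T @ top    # part of each vector lying in the top-k subspace
    return X - projection if remove else projection

E = np.random.RandomState(0).randn(1000, 300)   # stand-in for a pre-trained embedding matrix
retained = postprocess(E, k=5)                  # retain the top components
ablated = postprocess(E, k=5, remove=True)      # remove the top components instead
print(retained.shape, ablated.shape)
```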
24. Transgender Community Sentiment Analysis from Social Media Data: A Natural Language Processing Approach [PDF] 返回目录
Mengzhe Li, Yudan Wang, Ying Zhao, Zhixiang Li
Abstract: Transgender community is experiencing a huge disparity in mental health conditions compared with the general population. Interpreting the social medial data posted by transgender people may help us understand the sentiments of these sexual minority groups better and apply early interventions. In this study, we manually categorize 300 social media comments posted by transgender people to the sentiment of negative, positive, and neutral. 5 machine learning algorithms and 2 deep neural networks are adopted to build sentiment analysis classifiers based on the annotated data. Results show that our annotations are reliable with a high Cohen's Kappa score over 0.8 across all three classes. LSTM model yields an optimal performance of accuracy over 0.85 and AUC of 0.876. Our next step will focus on using advanced natural language processing algorithms on a larger annotated dataset.
25. Contextualized Word Embeddings Encode Aspects of Human-Like Word Sense Knowledge [PDF] 返回目录
Sathvik Nair, Mahesh Srinivasan, Stephan Meylan
Abstract: Understanding context-dependent variation in word meanings is a key aspect of human language comprehension supported by the lexicon. Lexicographic resources (e.g., WordNet) capture only some of this context-dependent variation; for example, they often do not encode how closely senses, or discretized word meanings, are related to one another. Our work investigates whether recent advances in NLP, specifically contextualized word embeddings, capture human-like distinctions between English word senses, such as polysemy and homonymy. We collect data from a behavioral, web-based experiment, in which participants provide judgments of the relatedness of multiple WordNet senses of a word in a two-dimensional spatial arrangement task. We find that participants' judgments of the relatedness between senses are correlated with distances between senses in the BERT embedding space. Homonymous senses (e.g., bat as mammal vs. bat as sports equipment) are reliably more distant from one another in the embedding space than polysemous ones (e.g., chicken as animal vs. chicken as meat). Our findings point towards the potential utility of continuous-space representations of sense meanings.
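The core measurement is easy to reproduce with off-the-shelf tools: extract the contextual vector of a target word in two sentences and compare them with cosine similarity. A small sketch (the model choice and sentences are illustrative, not the study's materials):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def word_vector(sentence, word):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]                         # (seq_len, 768)
    idx = tok.convert_ids_to_tokens(enc["input_ids"][0]).index(word)       # works for single-piece words
    return hidden[idx]

cos = torch.nn.functional.cosine_similarity
bat_animal = word_vector("the bat flew out of the cave at dusk", "bat")
bat_sport = word_vector("she swung the bat and hit a home run", "bat")
chicken_animal = word_vector("the chicken pecked at the ground", "chicken")
chicken_meat = word_vector("we grilled chicken for dinner", "chicken")

# Homonymous senses are expected to be farther apart than polysemous ones.
print(cos(bat_animal, bat_sport, dim=0), cos(chicken_animal, chicken_meat, dim=0))
```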
26. Commonsense knowledge adversarial dataset that challenges ELECTRA [PDF] 返回目录
Gongqi Lin, Yuan Miao, Xiaoyong Yang, Wenwu Ou, Lizhen Cui, Wei Guo, Chunyan Miao
Abstract: Commonsense knowledge is critical in human reading comprehension. While machine comprehension has made significant progress in recent years, the ability in handling commonsense knowledge remains limited. Synonyms are one of the most widely used commonsense knowledge. Constructing adversarial dataset is an important approach to find weak points of machine comprehension models and support the design of solutions. To investigate machine comprehension models' ability in handling the commonsense knowledge, we created a Question and Answer Dataset with common knowledge of Synonyms (QADS). QADS are questions generated based on SQuAD 2.0 by applying commonsense knowledge of synonyms. The synonyms are extracted from WordNet. Words often have multiple meanings and synonyms. We used an enhanced Lesk algorithm to perform word sense disambiguation to identify synonyms for the context. ELECTRA achieves the state-of-art result on the SQuAD 2.0 dataset in 2019. With scale, ELECTRA can achieve similar performance as BERT does. However, QADS shows that ELECTRA has little ability to handle commonsense knowledge of synonyms. In our experiment, ELECTRA-small can achieve 70% accuracy on SQuAD 2.0, but only 20% on QADS. ELECTRA-large did not perform much better. Its accuracy on SQuAD 2.0 is 88% but dropped significantly to 26% on QADS. In our earlier experiments, BERT, although also failed badly on QADS, was not as bad as ELECTRA. The result shows that even top-performing NLP models have little ability to handle commonsense knowledge which is essential in reading comprehension.
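For readers unfamiliar with the pipeline, this is what a basic Lesk-plus-WordNet synonym lookup looks like with NLTK; the paper uses an enhanced Lesk variant, so treat this only as a baseline illustration, and note it assumes the WordNet data has been downloaded:

```python
import nltk
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

nltk.download("wordnet", quiet=True)   # one-time corpus download

context = "The pitcher threw the ball toward home plate".split()
sense = lesk(context, "ball", pos=wn.NOUN)   # basic Lesk; may still pick an imperfect sense
print(sense, "-", sense.definition())
print("candidate synonyms:", [lemma.name() for lemma in sense.lemmas()])
```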
27. Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder [PDF] 返回目录
Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
Abstract: Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems. End-to-end (E2E) models based on the encoder-decoder architecture are more suitable for this goal than traditional cascaded systems, but their effectiveness regarding decoding speed has not been explored so far. Inspired by recent progress in non-autoregressive (NAR) methods in text-based translation, which generates target tokens in parallel by eliminating conditional dependencies, we study the problem of NAR decoding for E2E-ST. We propose a novel NAR E2E-ST framework, Orthros, in which both NAR and autoregressive (AR) decoders are jointly trained on the shared speech encoder. The latter is used for selecting better translation among various length candidates generated from the former, which dramatically improves the effectiveness of a large length beam with negligible overhead. We further investigate effective length prediction methods from speech inputs and the impact of vocabulary sizes. Experiments on four benchmarks show the effectiveness of the proposed method in improving inference speed while maintaining competitive translation quality compared to state-of-the-art AR E2E-ST systems.
28. Fine-tuning ERNIE for chest abnormal imaging signs extraction [PDF] 返回目录
Zhaoning Li, Jiangtao Ren
Abstract: Chest imaging reports describe the results of chest radiography procedures. Automatic extraction of abnormal imaging signs from chest imaging reports has a pivotal role in clinical research and a wide range of downstream medical tasks. However, there are few studies on information extraction from Chinese chest imaging reports. In this paper, we formulate chest abnormal imaging sign extraction as a sequence tagging and matching problem. On this basis, we propose a transferred abnormal imaging signs extractor with pretrained ERNIE as the backbone, named EASON (fine-tuning ERNIE with CRF for Abnormal Signs ExtractiON), which can address the problem of data insufficiency. In addition, to assign the attributes (the body part and degree) to corresponding abnormal imaging signs from the results of the sequence tagging model, we design a simple but effective tag2relation algorithm based on the nature of chest imaging report text. We evaluate our method on the corpus provided by a medical big data company, and the experimental results demonstrate that our method achieves significant and consistent improvement compared to other baselines.
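The attribute-assignment step can be pictured with a toy nearest-neighbour rule over BIO tags. This is a much simplified stand-in for the paper's tag2relation algorithm, with made-up labels and a translated example:

```python
def attach_attributes(tokens, tags):
    """Attach the nearest body-part and degree token to each abnormal-sign token."""
    positions = {}
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            positions.setdefault(tag[2:], []).append(i)

    relations = []
    for sign_pos in positions.get("SIGN", []):
        attrs = {}
        for label in ("BODY", "DEGREE"):
            candidates = positions.get(label, [])
            if candidates:
                nearest = min(candidates, key=lambda p: abs(p - sign_pos))
                attrs[label] = tokens[nearest]
        relations.append((tokens[sign_pos], attrs))
    return relations

tokens = ["lung", "shows", "mild", "effusion"]
tags   = ["B-BODY", "O", "B-DEGREE", "B-SIGN"]
print(attach_attributes(tokens, tags))
# [('effusion', {'BODY': 'lung', 'DEGREE': 'mild'})]
```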
29. Towards Medical Knowmetrics: Representing and Computing Medical Knowledge using Semantic Predications as the Knowledge Unit and the Uncertainty as the Knowledge Context [PDF] 返回目录
Xiaoying Li, Suyuan Peng, Jian Du
Abstract: In China, Prof. Hongzhou Zhao and Zeyuan Liu are the pioneers of the concept "knowledge unit" and "knowmetrics" for measuring knowledge. However, the definition of "computable knowledge object" remains controversial so far in different fields. For example, it is defined as 1) quantitative scientific concept in natural science and engineering, 2) knowledge point in the field of education research, and 3) semantic predications, i.e., Subject-Predicate-Object (SPO) triples in biomedical fields. The Semantic MEDLINE Database (SemMedDB), a high-quality public repository of SPO triples extracted from medical literature, provides a basic data infrastructure for measuring medical knowledge. In general, the study of extracting SPO triples as computable knowledge unit from unstructured scientific text has been overwhelmingly focusing on scientific knowledge per se. Since the SPO triples would be possibly extracted from hypothetical, speculative statements or even conflicting and contradictory assertions, the knowledge status (i.e., the uncertainty), which serves as an integral and critical part of scientific knowledge has been largely overlooked. This article aims to put forward a framework for Medical Knowmetrics using the SPO triples as the knowledge unit and the uncertainty as the knowledge context. The lung cancer publications dataset is used to validate the proposed framework. The uncertainty of medical knowledge and how its status evolves over time indirectly reflect the strength of competing knowledge claims, and the probability of certainty for a given SPO triple. We try to discuss the new insights using the uncertainty-centric approaches to detect research fronts, and identify knowledge claims with high certainty level, in order to improve the efficacy of knowledge-driven decision support.
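One way to picture the proposed knowledge unit is a semantic predication that carries its uncertainty as context. The tiny sketch below is only an illustrative data structure, not anything defined in the paper, and the example triple is made up:

```python
from dataclasses import dataclass

@dataclass
class SemanticPredication:
    subject: str
    predicate: str
    obj: str
    certainty: float  # knowledge context: 0.0 = speculative claim, 1.0 = well-established

claim = SemanticPredication(
    subject="gefitinib",
    predicate="TREATS",
    obj="non-small cell lung cancer",
    certainty=0.8,
)
print(claim)
```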
30. CRAB: Class Representation Attentive BERT for Hate Speech Identification in Social Media [PDF] 返回目录
Sayyed M. Zahiri, Ali Ahmadvand
Abstract: In recent years, social media platforms have hosted an explosion of hate speech and objectionable content. The urgent need for effective automatic hate speech detection models have drawn remarkable investment from companies and researchers. Social media posts are generally short and their semantics could drastically be altered by even a single token. Thus, it is crucial for this task to learn context-aware input representations, and consider relevancy scores between input embeddings and class representations as an additional signal. To accommodate these needs, this paper introduces CRAB (Class Representation Attentive BERT), a neural model for detecting hate speech in social media. The model benefits from two semantic representations: (i) trainable token-wise and sentence-wise class representations, and (ii) contextualized input embeddings from state-of-the-art BERT encoder. To investigate effectiveness of CRAB, we train our model on Twitter data and compare it against strong baselines. Our results show that CRAB achieves 1.89% relative improved Macro-averaged F1 over state-of-the-art baseline. The results of this research open an opportunity for the future research on automated abusive behavior detection in social media
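To make "relevancy scores between input embeddings and class representations" concrete, here is a toy attentive-pooling layer with trainable class vectors; it is illustrative only, and the paper's architecture likely differs in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassAttentivePooling(nn.Module):
    """Trainable class vectors attend over token embeddings (e.g. BERT outputs)."""
    def __init__(self, hidden_size=768, num_classes=2):
        super().__init__()
        self.class_repr = nn.Parameter(torch.randn(num_classes, hidden_size))

    def forward(self, token_embs):                       # (batch, seq, hidden)
        scores = token_embs @ self.class_repr.t()        # relevancy of each token to each class
        attn = F.softmax(scores, dim=1)                  # attention over the sequence, per class
        pooled = torch.einsum("bsh,bsc->bch", token_embs, attn)
        logits = (pooled * self.class_repr).sum(dim=-1)  # (batch, num_classes)
        return logits

tokens = torch.randn(4, 16, 768)                         # stand-in for encoder outputs
print(ClassAttentivePooling()(tokens).shape)             # torch.Size([4, 2])
```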
31. Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference [PDF] 返回目录
Jian-Guo Zhang, Kazuma Hashimoto, Wenhao Liu, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, Caiming Xiong
Abstract: Intent detection is one of the core components of goal-oriented dialog systems, and detecting out-of-scope (OOS) intents is also a practically important skill. Few-shot learning is attracting much attention to mitigate data scarcity, but OOS detection becomes even more challenging. In this paper, we present a simple yet effective approach, discriminative nearest neighbor classification with deep self-attention. Unlike softmax classifiers, we leverage BERT-style pairwise encoding to train a binary classifier that estimates the best matched training example for a user input. We propose to boost the discriminative ability by transferring a natural language inference (NLI) model. Our extensive experiments on a large-scale multi-domain intent detection task show that our method achieves more stable and accurate in-domain and OOS detection accuracy than RoBERTa-based classifiers and embedding-based nearest neighbor approaches. More notably, the NLI transfer enables our 10-shot model to perform competitively with 50-shot or even full-shot classifiers, while we can keep the inference time constant by leveraging a faster embedding retrieval model.
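The decision rule can be sketched with an off-the-shelf NLI cross-encoder: score the user utterance against every training example, take the best-matching one, and fall back to out-of-scope when the best score is low. The checkpoint, label index and threshold below are illustrative, not the paper's setup:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"                      # any NLI cross-encoder works for this sketch
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

train_examples = [
    ("i want to move money between my accounts", "transfer"),
    ("what is the balance of my checking account", "balance"),
]

def predict_intent(query, threshold=0.5):
    premises = [query] * len(train_examples)
    hypotheses = [text for text, _ in train_examples]
    enc = tok(premises, hypotheses, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        probs = model(**enc).logits.softmax(dim=-1)
    entail_idx = model.config.label2id.get("ENTAILMENT", 2)  # verify against config.id2label
    scores = probs[:, entail_idx]
    best = int(scores.argmax())
    return train_examples[best][1] if scores[best] >= threshold else "out_of_scope"

print(predict_intent("how do i send funds to my savings account"))
```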
32. Pre-trained Summarization Distillation [PDF] 返回目录
Sam Shleifer, Alexander M. Rush
Abstract: Current state-of-the-art approaches to summarization utilize large pre-trained Transformer models. Distilling these models to smaller student models has become critically important for practical use; however there are many different distillation methods in the NLP literature. Recent work on distilling BERT for classification and regression tasks shows strong performance using standard knowledge distillation. Alternatively, machine translation practitioners, have primarily distilled using pseudo labeling, where a small model is trained on the translations of a larger model. A third approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by transferring parameters to a student model and then fine-tuning. This work considers distillation of BART and Pegasus, two state of the art summarization models, on two datasets across a variety of student models. We produce high quality, fast checkpoints across different computational budgets, and learn some patterns about which distillation techniques perform well in which situations. PyTorch code to rerun our methods, and use the distilled BART and Pegasus checkpoints is available in Hugging Face transformers.
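The "shrink and fine-tune" idea can be sketched with Hugging Face BART: build a student with fewer decoder layers, copy the matching weights, then overwrite the student decoder with alternating teacher layers before fine-tuning. This is a rough sketch of the idea, not the paper's released scripts:

```python
import copy
from transformers import BartForConditionalGeneration

teacher = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

config = copy.deepcopy(teacher.config)
config.decoder_layers = 6                        # shrink: keep half of the 12 decoder layers
student = BartForConditionalGeneration(config)

# Copy every parameter whose name and shape still match (embeddings, encoder, shared head).
student.load_state_dict(teacher.state_dict(), strict=False)

# Re-initialise the student decoder from alternating teacher layers (0, 2, 4, ...).
for student_idx, teacher_idx in enumerate(range(0, teacher.config.decoder_layers, 2)):
    student.model.decoder.layers[student_idx].load_state_dict(
        teacher.model.decoder.layers[teacher_idx].state_dict()
    )
# The student would then be fine-tuned on the summarization data (SFT) or distilled further.
```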
33. Unsupervised Learning of Disentangled Speech Content and Style Representation [PDF] 返回目录
Andros Tjandra, Ruoming Pang, Yu Zhang, Shigeki Karita
Abstract: We present an approach for unsupervised learning of speech representation disentangling contents and styles. Our model consists of: (1) a local encoder that captures per-frame information; (2) a global encoder that captures per-utterance information; and (3) a conditional decoder that reconstructs speech given local and global latent variables. Our experiments show that (1) the local latent variables encode speech contents, as reconstructed speech can be recognized by ASR with low word error rates (WER), even with a different global encoding; (2) the global latent variables encode speaker style, as reconstructed speech shares speaker identity with the source utterance of the global encoding. Additionally, we demonstrate an useful application from our pre-trained model, where we can train a speaker recognition model from the global latent variables and achieve high accuracy by fine-tuning with as few data as one label per speaker.
34. Neural Compound-Word (Sandhi) Generation and Splitting in Sanskrit Language [PDF] 返回目录
Sushant Dave, Arun Kumar Singh, Dr. Prathosh A. P., Prof. Brejesh Lall
Abstract: This paper describes neural network based approaches to the process of the formation and splitting of word-compounding, respectively known as the Sandhi and Vichchhed, in Sanskrit language. Sandhi is an important idea essential to morphological analysis of Sanskrit texts. Sandhi leads to word transformations at word boundaries. The rules of Sandhi formation are well defined but complex, sometimes optional and in some cases, require knowledge about the nature of the words being compounded. Sandhi split or Vichchhed is an even more difficult task given its non uniqueness and context dependence. In this work, we propose the route of formulating the problem as a sequence to sequence prediction task, using modern deep learning techniques. Being the first fully data driven technique, we demonstrate that our model has an accuracy better than the existing methods on multiple standard datasets, despite not using any additional lexical or morphological resources. The code is being made available at this https URL
35. A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns Analysis [PDF] 返回目录
Arun Kumar Singh, Sushant Dave, Dr. Prathosh A. P., Prof. Brejesh Lall, Shresth Mehta
Abstract: This paper presents first benchmark corpus of Sanskrit Pratyaya (suffix) and inflectional words (padas) formed due to suffixes along with neural network based approaches to process the formation and splitting of inflectional words. Inflectional words spans the primary and secondary derivative nouns as the scope of current work. Pratyayas are an important dimension of morphological analysis of Sanskrit texts. There have been Sanskrit Computational Linguistics tools for processing and analyzing Sanskrit texts. Unfortunately there has not been any work to standardize & validate these tools specifically for derivative nouns analysis. In this work, we prepared a Sanskrit suffix benchmark corpus called Pratyaya-Kosh to evaluate the performance of tools. We also present our own neural approach for derivative nouns analysis while evaluating the same on most prominent Sanskrit Morphological Analysis tools. This benchmark will be freely dedicated and available to researchers worldwide and we hope it will motivate all to improve morphological analysis in Sanskrit Language.
36. Disease Normalization with Graph Embeddings [PDF] 返回目录
Dhruba Pujary, Camilo Thorne, Wilker Aziz
Abstract: The detection and normalization of diseases in biomedical texts are key biomedical natural language processing tasks. Disease names need not only be identified, but also normalized or linked to clinical taxonomies describing diseases such as MeSH. In this paper we describe deep learning methods that tackle both tasks. We train and test our methods on the known NCBI disease benchmark corpus. We propose to represent disease names by leveraging MeSH's graphical structure together with the lexical information available in the taxonomy using graph embeddings. We also show that combining neural named entity recognition models with our graph-based entity linking methods via multitask learning leads to improved disease recognition in the NCBI corpus.
37. Causal Effects of Linguistic Properties [PDF] 返回目录
Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, Dhanya Sridhar
Abstract: We consider the problem of estimating the causal effects of linguistic properties on downstream outcomes. For example, does writing a complaint politely lead to a faster response time? How much will a positive product review increase sales? This paper focuses on two challenges related to the problem. First, we formalize the causal quantity of interest as the effect of a writer's intent, and establish the assumptions necessary to identify this from observational data. Second, in practice we only have access to noisy proxies for these linguistic properties---e.g., predictions from classifiers and lexicons. We propose an estimator for this setting and prove that its bias is bounded when we perform an adjustment for the text. The method leverages (1) a pre-trained language model (BERT) to adjust for the text, and (2) distant supervision to improve the quality of noisy proxies. We show that our algorithm produces better causal estimates than related methods on two datasets: predicting the effect of music review sentiment on sales, and complaint politeness on response time.
摘要:我们认为,估计对下游成果的语言特性的因果效应的问题。例如,不写礼貌导致更快的响应时间投诉?多少会产生积极的产品评论增加销售?本文主要对有关问题的两种挑战。首先,我们正式感兴趣的因果量作为一个作家的意图的作用,并建立必要从观测数据识别这个假设。其次,在实践中,我们只能访问嘈杂代理这些语言特性---如,从分类和词汇的预测。我们提出一个估计此设置和证明,当我们进行文本调整其偏置为界。该方法的杠杆(1)一个预训练的语言模型(BERT)调整为文本,和(2)遥远监控提高嘈杂代理的质量。我们证明了我们的算法产生更好的因果估计比相关的方法对两个数据集:预测上销售音乐评论的情绪,投诉礼貌的响应时间的影响。
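As a rough illustration of adjusting for the text when only a noisy treatment proxy is available, the sketch below uses a plain linear regression in place of the paper's BERT-based adjustment and omits the bias bound and the distant-supervision step; all variable names and the toy data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_effect(proxy_t, text_repr, outcome):
    """Crude adjustment estimator of E[Y | do(T=1)] - E[Y | do(T=0)]:
    regress Y on (proxy treatment, text features) and average the
    predicted difference over the corpus."""
    X = np.column_stack([proxy_t, text_repr])
    model = LinearRegression().fit(X, outcome)
    X1 = np.column_stack([np.ones_like(proxy_t), text_repr])
    X0 = np.column_stack([np.zeros_like(proxy_t), text_repr])
    return float(np.mean(model.predict(X1) - model.predict(X0)))

# Toy data: politeness proxy (0/1), 5-dim "text" features, response-time outcome.
rng = np.random.default_rng(1)
n = 500
text = rng.normal(size=(n, 5))
t = (rng.random(n) < 0.5).astype(float)            # noisy classifier output as proxy
y = -2.0 * t + text @ rng.normal(size=5) + rng.normal(size=n)
print(adjusted_effect(t, text, y))                 # close to the simulated effect of -2
```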
38. Word Embeddings for Chemical Patent Natural Language Processing [PDF] 返回目录
Camilo Thorne, Saber Akhondi
Abstract: We evaluate chemical patent word embeddings against known biomedical embeddings and show that they outperform the latter extrinsically and intrinsically. We also show that using contextualized embeddings can induce predictive models of reasonable performance for this domain over a relatively small gold standard.
摘要:我们评估针对已知的生物医学的嵌入化学专利的嵌入文字,并表明他们优于后者外在和内在。我们还表明,使用情境的嵌入可诱发的合理的性能预测模型,该域在一个相对较小的黄金标准。
39. Unsupervised Paraphrase Generation via Dynamic Blocking [PDF] 返回目录
Tong Niu, Semih Yavuz, Yingbo Zhou, Huan Wang, Nitish Shirish Keskar, Caiming Xiong
Abstract: We propose Dynamic Blocking, a decoding algorithm which enables large-scale pretrained autoregressive models (such as BART, T5, GPT-2 and XLNet) to generate high-quality paraphrases in an unsupervised setting. In order to obtain an alternative surface form, whenever the language model emits a token that is present in the source sequence, we prevent the model from generating the subsequent source token for the next time step. We show that our approach achieves state-of-the-art results on benchmark datasets when compared to previous unsupervised approaches, and is even comparable with strong supervised, in-domain models. We also propose a new automatic metric based on self-BLEU and BERTscore which not only discourages the model from copying the input through, but also evaluates text similarity based on distributed representations, hence avoiding reliance on exact keyword matching. In addition, we demonstrate that our model generalizes across languages without any additional training.
摘要:我们提出动态阻塞,解码算法使大规模预训练的自回归模型(如BART,T5,GPT-2和XLNet)以无监督设置生成高质量的释义。为了获得一个替代的表面形状,每当语言模型发出的令牌是本源序列中,我们防止令牌生成用于下一时间步骤中的后续源模型。我们发现,相对于以前无人监督的方法时,我们的方法实现对基准数据集的国家的最先进的成果,甚至是具有很强的监督,在域车型相媲美。我们还提出了一个新的自动根据自我BLEU和BERTscore这不仅不鼓励通过复制输入的模式,也评估基于分布式陈述文字的相似性,从而避免在精确的关键字匹配的依赖度。此外,我们证明了我们的模型跨语言概括,没有任何额外的培训。
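The blocking rule itself is easy to state in code: whenever the model has just emitted a token that occurs in the source, forbid the token that follows it in the source for one decoding step. The sketch below is a minimal reimplementation of that stated rule on integer token ids, not the authors' released code.

```python
from collections import defaultdict

def build_block_map(source_ids):
    """Map each source token to the source token(s) that immediately follow it."""
    nxt = defaultdict(set)
    for a, b in zip(source_ids, source_ids[1:]):
        nxt[a].add(b)
    return nxt

def blocked_next_tokens(last_generated, block_map, active=True):
    """Dynamic Blocking (simplified): if the model just copied a source token,
    forbid the token(s) that follow it in the source for one decoding step."""
    if not active or last_generated not in block_map:
        return set()
    return block_map[last_generated]

# Toy usage with integer token ids standing in for a source sentence.
src = [11, 42, 87, 30, 11, 55]
bm = build_block_map(src)
print(blocked_next_tokens(42, bm))   # {87}: the id that follows 42 in the source
```

In an actual decoder, the returned ids would have their logits set to negative infinity for the next time step only, which forces the model to choose an alternative surface form.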
40. NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints [PDF] 返回目录
Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
Abstract: Conditional text generation often requires lexical constraints, i.e., which words should or shouldn't be included in the output text. While the dominant recipe for conditional text generation has been large-scale pretrained language models that are finetuned on the task-specific training data, such models do not learn to follow the underlying constraints reliably, even when supervised with large amounts of task-specific examples. We propose NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models -- supervised or not -- to generate fluent text while satisfying complex lexical constraints. Our approach is powerful yet efficient. It handles any set of lexical constraints that is expressible under predicate logic, while its asymptotic runtime is equivalent to conventional beam search. Empirical results on four benchmarks show that NeuroLogic Decoding outperforms previous approaches, including algorithms that handle a subset of our constraints. Moreover, we find that unsupervised models with NeuroLogic Decoding often outperform supervised models with conventional decoding, even when the latter is based on considerably larger networks. Our results suggest the limit of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
摘要:条件文本生成往往需要词汇的限制,即,哪些词应该或不应该被包括在输出文本。而占主导地位的配方条件文本一代一直被具体任务的培训数据微调,大型预训练的语言模型,这些模型没有学会遵循的基本约束可靠,即使有大量的特定任务的例子监督。我们建议神经解码,一个简单而有效的算法,使神经语言模型 - 监督或不 - 同时满足复杂的词汇的约束产生文字通顺。我们的做法是功能强大且高效。它处理任何一组词汇约束是下谓词逻辑表达,而它的渐近运行时是相当于传统波束搜索。四个基准实证结果表明,神经系统解码优于以前的方法,包括处理好我们的约束的子集的算法。此外,我们发现,无监督的模式与神经系统解码往往超越传统解码监督模式,即使后者是基于相当大的网络。我们的研究结果表明,大型神经网络的细粒度控制生成和限制的推理时间算法的承诺。
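A small illustration of the predicate-logic view of lexical constraints: constraints can be written in CNF over "phrase must (not) appear" literals, and a hypothesis scored by how many clauses it satisfies, which is the kind of signal a constrained beam search can use. This toy scorer is an assumption for illustration; it ignores the per-hypothesis constraint-state tracking and pruning of the actual algorithm.

```python
def satisfied(text, literal):
    phrase, positive = literal
    present = phrase in text
    return present if positive else not present

def cnf_score(text, cnf):
    """Fraction of CNF clauses satisfied; each clause is a list of
    (phrase, positive) literals and is satisfied if ANY literal holds."""
    hits = sum(any(satisfied(text, lit) for lit in clause) for clause in cnf)
    return hits / len(cnf)

# (include "snow" OR "winter") AND (exclude "summer")
constraints = [[("snow", True), ("winter", True)],
               [("summer", False)]]
print(cnf_score("a quiet winter morning", constraints))   # 1.0
print(cnf_score("a snowy summer morning", constraints))   # 0.5
```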
41. FedE: Embedding Knowledge Graphs in Federated Setting [PDF] 返回目录
Mingyang Chen, Wen Zhang, Zonggang Yuan, Yantao Jia, Huajun Chen
Abstract: Knowledge graphs (KGs) consisting of triples are always incomplete, so it is important to perform Knowledge Graph Completion (KGC) by predicting missing triples. Multi-source KG is a common situation in real KG applications: it can be viewed as a set of related individual KGs, where different KGs contain relations covering different aspects of entities. Intuitively, for each individual KG, completion could benefit greatly from the triples defined and labeled in the other KGs. However, because of data privacy and sensitivity, a set of related knowledge graphs cannot complement each other's KGC simply by pooling data from the different knowledge graphs. Therefore, in this paper, we introduce a federated setting that preserves privacy without transferring triples between KGs, and apply it to knowledge graph embedding, an approach that has proven effective for KGC over the past decade. We propose a Federated Knowledge Graph Embedding framework, FedE, which learns knowledge graph embeddings by aggregating locally computed updates. Finally, we conduct extensive experiments on datasets derived from KGE benchmarks, and the results show the effectiveness of the proposed FedE.
摘要:知识图(KGS)由三元总是不完整的,所以它通过预测丢失三元做知识图完成(KGC)是很重要的。多源KG是可以被看作是一组相关的各个幼儿园的不同地方幼儿园包含的实体的不同方面的关系,真正KG应用的常见情况。这是直观的,对于每个单独的KG,它的建成可极大地促进在其他的定义,并标有三倍。然而,由于数据隐私和敏感,一组相关知识图不能仅通过收集来自不同的知识图数据一起相得益彰的KGC。因此,在本文中,我们介绍了联合设置,以保持自己的隐私没有幼儿园之间传递三倍和嵌入知识图,它在过去十年中被证明有效的KGC的典型方法应用它。我们提出了一个联合知识图嵌入框架FEDE,专注于通过整合本地计算的更新学习知识图嵌入。最后,我们从KGE基准数据集的数据集导出进行广泛的实验和实验结果表明我们提出的FEDE的有效性。
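The aggregation step can be sketched as a FedAvg-style average of entity embeddings in which each entity is averaged only over the clients that actually contain it. This is a simplified illustration under assumed shapes, not the paper's exact protocol.

```python
import numpy as np

def fede_aggregate(client_embeddings, client_masks):
    """Average entity embeddings across clients, counting only clients
    that actually contain the entity (mask == 1)."""
    num = np.zeros_like(client_embeddings[0])
    den = np.zeros((client_embeddings[0].shape[0], 1))
    for emb, mask in zip(client_embeddings, client_masks):
        num += emb * mask[:, None]
        den += mask[:, None]
    return num / np.maximum(den, 1)     # entities unseen anywhere stay zero

# Two clients, three global entities, 4-dimensional embeddings.
rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
m1 = np.array([1.0, 1.0, 0.0])          # client 1 holds entities 0 and 1
m2 = np.array([0.0, 1.0, 1.0])          # client 2 holds entities 1 and 2
server = fede_aggregate([e1, e2], [m1, m2])
print(server.shape)                      # (3, 4); entity 1 is the mean of both clients
```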
42. Revisiting Neural Language Modelling with Syllables [PDF] 返回目录
Arturo Oncevay, Kervy Rivas Rojas
Abstract: Language modelling is regularly analysed at word, subword or character units, but syllables are seldom used. Syllables provide shorter sequences than characters, they can be extracted with rules, and their segmentation typically requires less specialised effort than identifying morphemes. We reconsider syllables for an open-vocabulary generation task in 20 languages. We use rule-based syllabification methods for five languages and address the rest with a hyphenation tool, whose behaviour as a syllable proxy we validate. With a comparable perplexity, we show that syllables outperform characters, annotated morphemes and unsupervised subwords. Finally, we also study the overlap of syllables with other subword pieces and discuss some limitations and opportunities.
摘要:语言模型在词,子词或字符为单位定期分析,但音节很少使用。音节提供比字符较短序列,它们可与规则中提取,和它们的分割通常需要比识别词素以下专门的努力。我们对20种语言的开放式词汇的生成任务重新考虑音节。我们使用五种语言基于规则的音节的方法和解决与一个断字的工具,其行为音节代理验证的其余部分。随着类似的困惑,我们表明,音节文字跑赢大市,注释语素和无监督的子词。最后,我们还研究了关于其他子字块音节的重叠,并讨论了一些限制和机会。
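For languages without rule-based syllabifiers, a hyphenation tool can serve as the syllable proxy. A minimal sketch using the pyphen hyphenation library follows; pyphen is one possible choice for such a tool and is not necessarily the one used in the paper.

```python
# pip install pyphen
import pyphen

def syllable_tokens(word, lang="en_US"):
    """Split a word into syllable-like units, using hyphenation points
    as a proxy for true syllable boundaries."""
    dic = pyphen.Pyphen(lang=lang)
    return dic.inserted(word).split("-")

print(syllable_tokens("language"))        # syllable-like pieces, e.g. ['lan', 'guage']
print(syllable_tokens("computational"))   # e.g. ['com', 'pu', 'ta', ...]
```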
43. Learning Contextualized Knowledge Structures for Commonsense Reasoning [PDF] 返回目录
Jun Yan, Mrigank Raman, Tianyu Zhang, Ryan Rossi, Handong Zhao, Sungchul Kim, Nedim Lipka, Xiang Ren
Abstract: Recently, neural-symbolic architectures have achieved success on commonsense reasoning through effectively encoding relational structures retrieved from external knowledge graphs (KGs) and obtained state-of-the-art results in tasks such as (commonsense) question answering and natural language inference. However, these methods rely on quality and contextualized knowledge structures (i.e., fact triples) that are retrieved at the pre-processing stage but overlook challenges caused by incompleteness of a KG, limited expressiveness of its relations, and retrieved facts irrelevant to the reasoning context. In this paper, we present a novel neural-symbolic model, named Hybrid Graph Network (HGN), which jointly generates feature representations for new triples (as a complement to existing edges in the KG), determines the relevance of the triples to the reasoning context, and learns graph module parameters for encoding the relational information. Our model learns a compact graph structure (comprising both extracted and generated edges) through filtering edges that are unhelpful to the reasoning process. We show marked improvement on three commonsense reasoning benchmarks and demonstrate the superiority of the learned graph structures with user studies.
摘要:通过对常识推理近日,神经象征性的架构都取得了成功有效编码如(常识)答疑和自然语言推理的任务,从外部知识图(KGS)并获得国家的先进成果检索关系结构。然而,这些方法依赖于那些在预处理阶段获取,而忽视挑战一个KG的不完备造成的质量和情境化的知识结构(即事实上三倍),其关系的表现力的限制,并回收事实无关的推理情境。在本文中,我们提出了一种新颖的神经符号模型,名为混合图网(HGN),它们共同产生用于新的三元组特征表示(如在KG的补充现有边),确定三元组的推理的相关性上下文,并得知绘制用于编码所述关系信息模块参数。我们的模型学习通过过滤边缘是无助于推理过程紧凑图结构(包括两者提取并生成边缘)。我们表现出明显的三个常识推理基准改进和展示与用户研究学习图形结构的优越性。
44. Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation [PDF] 返回目录
Mrigank Raman, Siddhant Agarwal, Peifeng Wang, Aaron Chan, Hansen Wang, Sungchul Kim, Ryan Rossi, Handong Zhao, Nedim Lipka, Xiang Ren
Abstract: Symbolic knowledge (e.g., entities, relations, and facts in a knowledge graph) has become an increasingly popular component of neural-symbolic models applied to machine learning tasks, such as question answering and recommender systems. Besides improving downstream performance, these symbolic structures (and their associated attention weights) are often used to help explain the model's predictions and provide "insights" to practitioners. In this paper, we question the faithfulness of such symbolic explanations. We demonstrate that, through a learned strategy (or even simple heuristics), one can produce deceptively perturbed symbolic structures which maintain the downstream performance of the original structure while significantly deviating from the original semantics. In particular, we train a reinforcement learning policy to manipulate relation types or edge connections in a knowledge graph, such that the resulting downstream performance is maximally preserved. Across multiple models and tasks, our approach drastically alters knowledge graphs with little to no drop in performance. These results raise doubts about the faithfulness of explanations provided by learned symbolic structures and the reliability of current neural-symbolic models in leveraging symbolic knowledge.
摘要:符号知识(例如,实体关系,并在知识图的事实)已经成为适用于机器学习任务,比如问答和推荐系统神经象征性的车型越来越受欢迎的组成部分。除了改善下游性能,这些标志性建筑(及其相关的注意权重)经常被用来帮助说明模型的预测和提供“洞察力”练习者。在本文中,我们质疑这种象征性的解释的忠诚。我们证明,通过学到的策略(甚至是简单的启发),一个可以产生扰动看似其保持原有结构的下游性能的同时,从原来的语义显著偏离标志性建筑。特别是,我们培养了强化学习政策操作的关系类型或边缘连接,在知识图,使得产生的下游性能最大限度地保留。跨多个模型和任务,我们的做法大大变造知识图很少或几乎没有性能下降。这些结果提出有关学会标志性建筑提供解释的忠诚和当前神经象征车型在利用象征性知识的可靠性表示怀疑。
45. Large Scale Legal Text Classification Using Transformer Models [PDF] 返回目录
Zein Shaheen, Gerhard Wohlgenannt, Erwin Filtz
Abstract: Large multi-label text classification is a challenging Natural Language Processing (NLP) problem that is concerned with text classification for datasets with thousands of labels. We tackle this problem in the legal domain, where datasets, such as JRC-Acquis and EURLEX57K labeled with the EuroVoc vocabulary were created within the legal information systems of the European Union. The EuroVoc taxonomy includes around 7000 concepts. In this work, we study the performance of various recent transformer-based models in combination with strategies such as generative pretraining, gradual unfreezing and discriminative learning rates in order to reach competitive classification performance, and present new state-of-the-art results of 0.661 (F1) for JRC-Acquis and 0.754 for EURLEX57K. Furthermore, we quantify the impact of individual steps, such as language model fine-tuning or gradual unfreezing in an ablation study, and provide reference dataset splits created with an iterative stratification algorithm.
摘要:大型多标签文本分类是关注文本分类与成千上万的标签的数据集一个具有挑战性的自然语言处理(NLP)的问题。我们在法律领域,其中的数据集,如JRC-Acquis和EURLEX57K标有EuroVoc词汇是欧盟的法律信息系统中创建的解决这个问题。该EuroVoc分类包括7000点左右的概念。在这项工作中,我们与战略,如生殖训练前,逐步解冻,为了达到竞争的分类性能判别学习率,而目前新的国家的最先进成果研究结合最近各种基于变压器的模型的性能0.661(F1),用于JRC-Acquis和0.754为EURLEX57K。此外,我们量化的各个步骤,如在消融研究语言模型微调或逐渐解冻的影响,并提供一种具有分层迭代算法创建参考数据集分割。
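Two of the fine-tuning strategies mentioned, discriminative learning rates and gradual unfreezing, are easy to sketch in PyTorch. In the sketch below, the stack of linear layers merely stands in for transformer blocks, and the learning-rate decay factor is an assumed value.

```python
import torch
from torch import nn

# Stand-in "transformer": a stack of layers plus a classification head.
layers = nn.ModuleList([nn.Linear(32, 32) for _ in range(4)])
head = nn.Linear(32, 10)

# Discriminative learning rates: layers closer to the input get smaller LRs.
base_lr, decay = 1e-3, 0.5
param_groups = [{"params": head.parameters(), "lr": base_lr}]
for depth, layer in enumerate(reversed(list(layers))):
    param_groups.append({"params": layer.parameters(),
                         "lr": base_lr * (decay ** (depth + 1))})
optimizer = torch.optim.AdamW(param_groups)

# Gradual unfreezing: start with everything but the head frozen,
# then unfreeze one layer per stage, from the top down.
for layer in layers:
    layer.requires_grad_(False)

def unfreeze_stage(stage):
    for layer in list(reversed(list(layers)))[:stage]:
        layer.requires_grad_(True)

unfreeze_stage(1)   # after the first stage, the top layer also trains
```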
46. Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation [PDF] 返回目录
Yongchang Hao, Shilin He, Wenxiang Jiao, Zhaopeng Tu, Michael Lyu, Xing Wang
Abstract: Non-Autoregressive machine Translation (NAT) models have demonstrated significant inference speedup but suffer from inferior translation accuracy. The common practice to tackle the problem is transferring the Autoregressive machine Translation (AT) knowledge to NAT models, e.g., with knowledge distillation. In this work, we hypothesize and empirically verify that AT and NAT encoders capture different linguistic properties and representations of source sentences. Therefore, we propose to adopt the multi-task learning to transfer the AT knowledge to NAT models through the encoder sharing. Specifically, we take the AT model as an auxiliary task to enhance NAT model performance. Experimental results on WMT14 English->German and WMT16 English->Romanian datasets show that the proposed multi-task NAT achieves significant improvements over the baseline NAT models. In addition, experimental results demonstrate that our multi-task NAT is complementary to the standard knowledge transfer method, knowledge distillation. Code is publicly available at this https URL
摘要:非自回归机器翻译(NAT)机型已经证明了显著推断加速但劣质翻译的准确性受到影响。最常见的做法来解决这个问题是转移自回归机器翻译(AT)知识NAT模式,例如,用知识升华。在这项工作中,我们假设和实证检验AT和NAT编码器捕捉不同的语言特性和源句子的表示。因此,我们建议采用多任务学习通过编码器共享来传输AT知识NAT模式。具体来说,我们采取的AT模式作为辅助任务来提高NAT模型的性能。在WMT14英语 - >德语和英语的WMT16实验结果>罗马尼亚数据集表明,所提出的多任务NAT实现将比基线NAT模式显著的改善。此外,实验结果表明,我们的多任务NAT是标准的知识转移方法,知识互补蒸馏。代码是公开的,在此HTTPS URL
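A toy view of the shared-encoder multi-task setup: one encoder feeds both an AT head and a NAT head, and the AT loss is added as an auxiliary term. The sketch below simplifies heavily (both heads read encoder states directly, whereas a real AT branch would use its own autoregressive decoder), and the loss weighting is an assumption.

```python
import torch
from torch import nn

class SharedEncoderMT(nn.Module):
    """Toy multi-task model: a shared encoder feeds an autoregressive (AT)
    head and a non-autoregressive (NAT) head; AT acts as an auxiliary task."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)   # stand-in for a Transformer encoder
        self.at_head = nn.Linear(dim, vocab)
        self.nat_head = nn.Linear(dim, vocab)

    def forward(self, src):
        enc, _ = self.encoder(self.embed(src))
        return self.at_head(enc), self.nat_head(enc)

model = SharedEncoderMT()
src = torch.randint(0, 1000, (2, 7))
tgt = torch.randint(0, 1000, (2, 7))
at_logits, nat_logits = model(src)
loss = nn.functional.cross_entropy(nat_logits.transpose(1, 2), tgt) \
     + 0.5 * nn.functional.cross_entropy(at_logits.transpose(1, 2), tgt)  # auxiliary AT loss
loss.backward()
```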
47. Efficiently Mitigating Classification Bias via Transfer Learning [PDF] 返回目录
Xisen Jin, Francesco Barbieri, Aida Mostafazadeh Davani, Brendan Kennedy, Leonardo Neves, Xiang Ren
Abstract: Prediction bias in machine learning models refers to unintended model behaviors that discriminate against inputs mentioning or produced by certain groups; for example, hate speech classifiers predict more false positives for neutral text mentioning specific social groups. Mitigating bias for each task or domain is inefficient, as it requires repetitive model training, data annotation (e.g., demographic information), and evaluation. In pursuit of a more accessible solution, we propose the Upstream Bias Mitigation for Downstream Fine-Tuning (UBM) framework, which mitigate one or multiple bias factors in downstream classifiers by transfer learning from an upstream model. In the upstream bias mitigation stage, explanation regularization and adversarial training are applied to mitigate multiple bias factors. In the downstream fine-tuning stage, the classifier layer of the model is re-initialized, and the entire model is fine-tuned to downstream tasks in potentially novel domains without any further bias mitigation. We expect downstream classifiers to be less biased by transfer learning from de-biased upstream models. We conduct extensive experiments varying the similarity between the source and target data, as well as varying the number of dimensions of bias (e.g., discrimination against specific social groups or dialects). Our results indicate the proposed UBM framework can effectively reduce bias in downstream classifiers.
摘要:在机器学习模型预测偏压是指不期望的行为模型,对输入判别提或通过某些群体产生;例如,仇恨言论分类预测中性文本更误报提的特定的社会群体。缓解偏压为每个任务或域是低效率的,因为它需要重复模型训练,数据注释(例如,人口统计信息),和评价。为了追求更方便的解决方案的,我们提出上游偏置缓解对下游微调(UBM)的框架,它通过转移学习从上游模型减轻在下游的分类器的一个或多个偏压因素。在上游偏压缓解阶段,说明正则化和对抗性训练被施加到多个减轻偏置因子。在下游的微调阶段,模型的分类器层被重新初始化,并且整个模型是微调,以在潜在的新的结构域下游的任务,而无需任何进一步的偏压缓解。我们预计下游分类要少用迁移学习从去偏上游车型偏向。我们进行了广泛的实验改变源和目标数据之间的相似性,以及不同偏压的维度的数量(例如,针对特定的社会群体或方言歧视)。我们的研究结果表明了该UBM框架可以有效地降低下游分类偏差。
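The downstream stage can be sketched as keeping the de-biased encoder, re-initializing the classifier layer for the new label space, and fine-tuning the whole model. The modules and hyperparameters below are placeholders, not the paper's architecture.

```python
import torch
from torch import nn

# Upstream model: an encoder trained with bias mitigation, plus a task classifier.
encoder = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
classifier = nn.Linear(128, 2)
upstream = nn.Sequential(encoder, classifier)
# ... upstream bias-mitigation training would happen here ...

# Downstream fine-tuning: keep the (de-biased) encoder, re-initialize the head.
downstream_head = nn.Linear(128, 4)            # new label space, fresh weights
downstream = nn.Sequential(encoder, downstream_head)
optimizer = torch.optim.AdamW(downstream.parameters(), lr=2e-5)

x = torch.randn(8, 128)
y = torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(downstream(x), y)
loss.backward()
optimizer.step()
```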
48. When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models [PDF] 返回目录
Benjamin Muller, Antonis Anastasopoulos, Benoît Sagot, Djamé Seddah
Abstract: Transfer learning based on pretraining language models on a large amount of raw data has become a new norm to reach state-of-the-art performance in NLP. Still, it remains unclear how this approach should be applied for unseen languages that are not covered by any available large-scale multilingual language model and for which only a small amount of raw data is generally available. In this work, by comparing multilingual and monolingual models, we show that such models behave in multiple ways on unseen languages. Some languages greatly benefit from transfer learning and behave similarly to closely related high resource languages whereas others apparently do not. Focusing on the latter, we show that this failure to transfer is largely related to the impact of the script used to write such languages. Transliterating those languages improves very significantly the ability of large-scale multilingual language models on downstream tasks.
摘要:迁移学习基于对大量原始数据的预训练语言模型已成为一种新的规范在NLP达到国家的最先进的性能。不过,目前还不清楚这种方法是如何应适用于未通过任何可用的大型多语种语言模型覆盖和只有原始数据量小一般可看不见的语言。在这项工作中,通过比较多语言和多语模型,我们表明,这种模式在看不见的语言多种方式表现。有些语言从迁移学习受益匪浅,同样的行为密切相关的高资源语言,而其他人显然没有。着眼于后者,我们表明,这种故障转移主要与用来写这些语言脚本的影响。音译这些语言提高很显著的大型多语种的语言模型对下游任务的能力。
49. ReadOnce Transformers: Reusable Representations of Text for Transformers [PDF] 返回目录
Shih-Ting Lin, Ashish Sabharwal, Tushar Khot
Abstract: While large-scale language models are extremely effective when directly fine-tuned on many end-tasks, such models learn to extract information and solve the task simultaneously from end-task supervision. This is wasteful, as the general problem of gathering information from a document is mostly task-independent and need not be re-learned from scratch each time. Moreover, once the information has been captured in a computable representation, it can now be re-used across examples, leading to faster training and evaluation of models. We present a transformer-based approach, ReadOnce Transformers, that is trained to build such information-capturing representations of text. Our model compresses the document into a variable-length task-independent representation that can now be re-used in different examples and tasks, thereby requiring a document to only be read once. Additionally, we extend standard text-to-text models to consume our ReadOnce Representations along with text to solve multiple downstream tasks. We show our task-independent representations can be used for multi-hop QA, abstractive QA, and summarization. We observe 2x-5x speedups compared to standard text-to-text models, while also being able to handle long documents that would normally exceed the length limit of current models.
摘要:虽然大型语言模型是对许多最终任务时直接微调非常有效,这样的模型学会提取信息,并从最终任务监督同时解决的任务。这是一种浪费,因为从文档收集信息的一般问题主要是任务无关,不必每次都从头开始重新学习。而且,一旦信息已经在可计算表示被抓获,现在可以跨例子再利用,从而更快地培训和模型评价。我们提出了一个基于变压器的方法,ReadOnce变压器,被训练来构建文本的此类捕获信息的表示。我们的模型压缩文档转换为可变长度任务无关的表示现在可以重新使用在不同的实施例和任务,因此需要只被读取一次的文档。此外,我们扩展了标准文本到文本模式来消耗我们ReadOnce陈述文字来解决多个下游任务一起。我们发现我们的任务无关的表示,可用于多跳QA,QA抽象和概括。我们比标准文本到文本模型观察2倍,5倍的速度提升,同时还能够处理长文档,通常会超过现有机型的长度限制。
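A rough picture of the read-once idea: encode a long document a single time into a compact, task-independent representation and reuse it across tasks and examples. The compression below (simple subsampling) and the GRU stand-in are assumptions made for the sketch; the actual model learns the compression and consumes the representation together with text in a text-to-text transformer.

```python
import torch
from torch import nn

encoder = nn.GRU(32, 32, batch_first=True)       # stand-in for the document encoder
qa_head = nn.Linear(32, 2)
sum_head = nn.Linear(32, 32)

doc = torch.randn(1, 200, 32)                    # one long "document"

# Read the document ONCE and keep a compact, task-independent representation.
with torch.no_grad():
    states, _ = encoder(doc)
    read_once_repr = states[:, ::4, :]            # crude 4x length compression

# Reuse the cached representation for several downstream tasks.
qa_scores = qa_head(read_once_repr.mean(dim=1))
summary_vec = sum_head(read_once_repr.mean(dim=1))
print(read_once_repr.shape, qa_scores.shape, summary_vec.shape)
```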
50. CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers [PDF] 返回目录
Shiyang Li, Semih Yavuz, Kazuma Hashimoto, Jia Li, Tong Niu, Nazneen Rajani, Xifeng Yan, Yingbo Zhou, Caiming Xiong
Abstract: Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood. We propose controllable counterfactuals (CoCo) to bridge this gap and evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system successfully tackle the request if the user responded differently but still consistently with the dialogue flow? CoCo leverages turn-level belief states as counterfactual conditionals to produce novel conversation scenarios in two steps: (i) counterfactual goal generation at turn-level by dropping and adding slots followed by replacing slot values, (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow. Evaluating state-of-the-art DST models on MultiWOZ dataset with CoCo-generated counterfactuals results in a significant performance drop of up to 30.8% (from 49.4% to 18.6%) in absolute joint goal accuracy. In comparison, widely used techniques like paraphrasing only affect the accuracy by at most 2%. Human evaluations show that CoCo-generated conversations perfectly reflect the underlying user goal with more than 95% accuracy and are as human-like as the original conversations, further strengthening its reliability and promise to be adopted as part of the robustness evaluation of DST models.
摘要:对话状态追踪器已经对基准数据集显著的进步,但其泛化能力超越持有了新的对话和现实的场景是不太了解。我们建议可控反事实(李玟),以弥合这一差距,在新的方案中评估的对话状态跟踪(DST)模式,即,将系统成功地处理请求,如果用户的反应不同,但仍始终与对话流?的CoCo杠杆转动级信念状态作为反事实条件,以在两个步骤中产生新的会话情形:(ⅰ)反目标生成在通过删除和添加槽导电平,接着通过更换槽值,(ⅱ)反会话生成被上调节的(i)和与对话流保持一致。在MultiWOZ数据集在了一个显著性能下降至30.8%,李玟产生的反事实的结果(从49.4%到18.6%)的绝对联合的目标精度评价国家的最先进的DST模型。相比较而言,像意译广泛使用的技术仅影响由至多2%的精度。人的评估表明,李玟生成的对话是完全反映了超过95%的准确率的基础用户的目标,是人类般的原始对话,进一步加强了其可靠性,并承诺采用为DST模型的稳健性评估的一部分。
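Step (i), counterfactual goal generation, amounts to editing the slot-value goal of a turn by dropping slots, adding slots, and replacing slot values. A minimal sketch with made-up MultiWOZ-style slot names (the slot names and values are illustrative assumptions):

```python
import random

def counterfactual_goal(goal, drop=None, add=None, replace=None, seed=0):
    """Produce a counterfactual user goal by dropping/adding slots and
    swapping slot values (a simplified version of CoCo step (i))."""
    random.seed(seed)
    new_goal = dict(goal)
    for slot in (drop or []):
        new_goal.pop(slot, None)
    for slot, values in (add or {}).items():
        new_goal[slot] = random.choice(values)
    for slot, values in (replace or {}).items():
        if slot in new_goal:
            new_goal[slot] = random.choice([v for v in values if v != new_goal[slot]])
    return new_goal

goal = {"restaurant-food": "italian", "restaurant-area": "centre"}
print(counterfactual_goal(goal,
                          drop=["restaurant-area"],
                          add={"restaurant-pricerange": ["cheap", "expensive"]},
                          replace={"restaurant-food": ["italian", "thai", "indian"]}))
```

Step (ii) would then condition a generation model on the edited goal and the dialogue history to produce the counterfactual user utterance.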
51. FLIN: A Flexible Natural Language Interface for Web Navigation [PDF] 返回目录
Sahisnu Mazumder, Oriana Riva
Abstract: AI assistants have started carrying out tasks on a user's behalf by interacting directly with the web. However, training an interface that maps natural language (NL) commands to web actions is challenging for existing semantic parsing approaches due to the variable and unknown set of actions that characterize websites. We propose FLIN, a natural language interface for web navigation that maps NL commands to concept-level actions rather than low-level UI interactions, thus being able to flexibly adapt to different websites and handle their transient nature. We frame this as a ranking problem where, given a user command and a webpage, FLIN learns to score the most appropriate navigation instruction (involving action and parameter values). To train and evaluate FLIN, we collect a dataset using nine popular websites from three different domains. Quantitative results show that FLIN is capable of adapting to new websites in a given domain.
摘要:AI助手已经开始通过直接与网络交互的执行代表用户的任务。然而,训练自然语言(NL)命令映射到网络行动为现有的语义分析是具有挑战性的界面接近而因变量和未知组动作表征的网站。我们建议FLIN,自然语言界面,网页导航是NL命令映射到理念层面的行动,而不是低层次的UI交互,从而能够灵活地适应不同的网站和处理他们的短暂性。我们这个框架作为地方,给用户命令和网页排名的问题,FLIN学会得分最合适的导航指示(包括动作和参数值)。要培养和评价FLIN,我们收集使用来自三个不同的域九个款热门网站的数据集。定量结果表明,FLIN能够在给定域中适应新的网站。
52. Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation [PDF] 返回目录
Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq Joty, Dragomir Radev, Yashar Mehdad
Abstract: Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on text summarization tasks. However, these models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains. In this work, we introduce a general method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner which makes use of characteristics of the target dataset such as the length and abstractiveness of the desired summaries. We achieve state-of-the-art, zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate the effectiveness of our approach on three additional, diverse datasets. The models fine-tuned in this unsupervised manner are more robust to noisy data and also achieve better few-shot performance using 10 and 100 training examples. We perform ablation studies on the effect of the components of our unsupervised fine-tuning data and analyze the performance of these models in few-shot scenarios along with data augmentation techniques using both automatic and human evaluation.
摘要:对大文本的自我监督的目标模式预先训练语料实现对文本摘要任务的国家的最先进的性能。然而,这些模型通常微调对几十万个数据点,应用聚合,以新的利基领域,当一个不可行的要求的。在这项工作中,我们以无监督,特定的数据集的方式,其利用目标数据集的特性如所期望的摘要的长度和abstractiveness引入用于聚合的一般方法,被称为WikiTransfer,进行微调预训练的模型。我们实现国家的最先进的,零射门的CNN-每日邮报数据集抽象概括的表现和展示三个额外的,不同的数据集,我们的方法的有效性。在这种无人监督的方式微调的模型更加坚固噪声数据,并实现使用10点和100个培训的例子更好几拍性能。我们对我们的监督的微调数据的分量的影响进行烧蚀研究,并用自动和人工评估数据增强技术一起分析这些车型中为数不多的射门场景的表现。
53. Go Figure! A Meta Evaluation of Factuality in Summarization [PDF] 返回目录
Saadia Gabriel, Asli Celikyilmaz, Rahul Jha, Yejin Choi, Jianfeng Gao
Abstract: Text generation models can generate factually inconsistent text containing distorted or fabricated facts about the source text. Recent work has focused on building evaluation models to verify the factual correctness of semantically constrained text generation tasks such as document summarization. While the field of factuality evaluation is growing fast, we don't have well-defined criteria for measuring the effectiveness, generalizability, reliability, or sensitivity of the factuality metrics. Focusing on these aspects, in this paper, we introduce a meta-evaluation framework for evaluating factual consistency metrics. We introduce five necessary, common-sense conditions for effective factuality metrics and experiment with nine recent factuality metrics using synthetic and human-labeled factuality data from short news, long news and dialogue summarization domains. Our framework enables assessing the efficiency of any new factual consistency metric on a variety of dimensions over multiple summarization domains and can be easily extended with new meta-evaluation criteria. We also present our conclusions towards standardizing the factuality evaluation metrics.
摘要:文代车型可以生成包含有关源文本歪曲或捏造事实不符事实的文字。最近的工作重点是建立评估模型来验证语义约束的文本生成任务,如文档文摘的事实正确性。而真实性评价领域正在快速增长,我们没有用于测量的有效性,普遍性,可靠性,或真实性度量的灵敏度良好定义的标准。着眼于这些方面,在本文中,我们介绍了评估事实一致性指标一元评价框架。我们引入有效的真实性指标和试验使用合成和人类标记真实性的数据从短消息,长新闻和对话摘要域9个最近真实性指标5必要的,常识性的条件。我们的框架允许评估对在多个聚合域的各种尺寸的任何新的事实一致性度量的效率,并且可以与新的元评价标准很容易地扩展。我们还提出对规范的真实性评价指标我们的结论。
54. Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions [PDF] 返回目录
Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang
Abstract: Pre-trained contextual vision-and-language (V&L) models have brought impressive performance improvement on various benchmarks. However, the paired text-image data required for pre-training are hard to collect and scale up. We investigate if a strong V&L representation model can be learned without text-image pairs. We propose Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora. Additionally, we introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. Evaluation on four V&L benchmarks shows that Weakly-supervised VisualBERT achieves similar performance with a model pre-trained with paired data. Besides, pre-training on more image-only data further improves a model that already has access to aligned data, suggesting the possibility of utilizing billions of raw images available to enhance V&L models.
摘要:预先训练的情景视觉和语言(V&L)车型带来了各种令人印象深刻的基准性能改进。然而,对于前培训所需要的成对的文本图像数据难以收集和扩大。我们调查,如果一个强大的V&L表示模型可以没有文字,图像对来学习。我们建议弱监督VisualBERT与开展“面具和预测”的语言,只和只显示图像的语料库前培训的核心理念。此外,我们通过介绍一个对象识别模型作为支撑点检测弥合两种模式对象的标记。四个V&L的基准显示评价该弱监督VisualBERT实现与配对数据一个预训练的模型类似的性能。此外,前培训更多的只是图像数据进一步改进了已经访问了对齐的数据模型,这利用了数十亿的RAW图像可增强V&L型的可能性。
55. Cross-Modal Transfer Learning for Multilingual Speech-to-Text Translation [PDF] 返回目录
Chau Tran, Changhan Wang, Yuqing Tang, Yun Tang, Juan Pino, Xian Li
Abstract: We propose an effective approach to utilize pretrained speech and text models to perform speech-to-text translation (ST). Our recipe to achieve cross-modal and cross-lingual transfer learning (XMTL) is simple and generalizable: using an adaptor module to bridge the modules pretrained in different modalities, and an efficient finetuning step which leverages the knowledge from pretrained modules yet making it work on a drastically different downstream task. With this approach, we built a multilingual speech-to-text translation model with pretrained audio encoder (wav2vec) and multilingual text decoder (mBART), which achieves new state-of-the-art on CoVoST 2 ST benchmark [1] for English into 15 languages as well as 6 Romance languages into English with on average +2.8 BLEU and +3.9 BLEU, respectively. On low-resource languages (with less than 10 hours training data), our approach significantly improves the quality of speech-to-text translation with +9.0 BLEU on Portuguese-English and +5.2 BLEU on Dutch-English.
摘要:本文提出利用预训练的语音和文本模式来执行语音到文本的转换(ST)的有效途径。我们的方法来实现跨模式和跨语言迁移学习(XMTL)简单概括的:使用适配器模块,以缩小不同的方式预先训练模块,以及一个高效的细化和微调步骤,从预先训练模块利用了知识又使其工作在一个完全不同的下游任务。通过这种方法,我们建有预训练音频编码器(wav2vec)和多语言文本解码器(mBART),一个多语种语音到文本的转换模型,实现了新的国家的最先进的CoVoST 2 ST基准[1]英语成15种语言,以及6个罗曼语成英文与上分别平均+2.8 BLEU和+3.9 BLEU。在资源匮乏的语言(有不到10个小时的训练数据),我们的做法显著改善语音到文本的翻译与9.0 BLEU葡萄牙 - 英语和+5.2 BLEU在荷兰的英语素质。
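The adaptor module that bridges the two pretrained components can be sketched as a strided projection that shortens the speech-encoder output in time and maps it to the text decoder's hidden size. The dimensions below (768 for wav2vec-style features, 1024 for an mBART-style decoder) and the stride are assumptions, not the paper's exact configuration.

```python
import torch
from torch import nn

class SpeechToTextAdaptor(nn.Module):
    """Toy adaptor: shrink the (long) speech-encoder output in time and
    project it to the text decoder's hidden size."""
    def __init__(self, speech_dim=768, text_dim=1024, stride=4):
        super().__init__()
        self.conv = nn.Conv1d(speech_dim, text_dim, kernel_size=stride, stride=stride)

    def forward(self, speech_states):                 # (batch, time, speech_dim)
        x = speech_states.transpose(1, 2)             # (batch, speech_dim, time)
        x = torch.relu(self.conv(x))
        return x.transpose(1, 2)                      # (batch, time // stride, text_dim)

adaptor = SpeechToTextAdaptor()
wav2vec_out = torch.randn(2, 400, 768)                # pretend speech-encoder features
decoder_in = adaptor(wav2vec_out)
print(decoder_in.shape)                               # torch.Size([2, 100, 1024])
```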
56. Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [PDF] 返回目录
Haoyu Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie, Fei Huang, Ji Wang
Abstract: Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document. Recently, the Sequence-to-Sequence (Seq2Seq) based generative framework has been widely used for the KE task and has obtained competitive performance on various benchmarks. The main challenges of Seq2Seq methods lie in acquiring an informative latent document representation and better modeling the compositionality of the target keyphrase set, which directly affect the quality of the generated keyphrases. In this paper, we propose to adopt Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously. Concretely, we explore integrating dependency trees with GCNs for latent representation learning. Moreover, the graph structure in our model is dynamically modified during the learning process according to the generated keyphrases. To this end, our approach is able to explicitly learn the relations within the keyphrase collection and guarantee information interchange between encoder and decoder in both directions. Extensive experiments on various KE benchmark datasets demonstrate the effectiveness of our approach.
摘要:关键词的提取(KE)旨在总结出了一套能准确表达一个概念或覆盖一个给定的文档中的主题短语。近年来,基于序列对序列(Seq2Seq)生成框架被广泛应用于KE任务,并已获得各种基准竞争力的性能。的Seq2Seq方法的主要挑战在于在获取信息潜文档表示和更好的建模对象的组合性的关键字句设置,这将直接影响产生的关键短语的质量。在本文中,我们建议采用动态图形卷积网络(DGCN)同时解决上述两个问题。具体而言,我们探索的依赖树木GCN整合潜表示学习。此外,在我们的模型中的图形结构动态地在学习过程中,根据所生成的关键字句修改。为此,我们的做法是能够明确地了解关键短语集合内的关系,保证编码器和解码器之间的信息交换在两个方向。各种KE基准数据集大量的实验证明了该方法的有效性。
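For readers unfamiliar with graph convolutions over dependency trees, here is a generic single-layer GCN sketch; the authors' DGCN additionally rewires the graph during generation, which is omitted here, and the normalization shown is our simplification.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution over a dependency-tree adjacency matrix (a generic
    sketch, not the exact DGCN layer)."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):                 # h: (n, dim), adj: (n, n) with 0/1 entries
        adj = adj + torch.eye(adj.size(0))     # add self-loops
        deg = adj.sum(dim=1, keepdim=True)     # node degrees for mean aggregation
        return torch.relu(self.linear((adj / deg) @ h))
```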
57. Context-aware Decoder for Neural Machine Translation using a Target-side Document-Level Language Model [PDF] 返回目录
Amane Sugiyama, Naoki Yoshinaga
Abstract: Although many context-aware neural machine translation models have been proposed to incorporate contexts in translation, most of those models are trained end-to-end on parallel documents aligned at the sentence level. Because only a few domains (and language pairs) have such document-level parallel data, we cannot perform accurate context-aware translation in most domains. We therefore present a simple method to turn a sentence-level translation model into a context-aware model by incorporating a document-level language model into the decoder. Our context-aware decoder is built upon only sentence-level parallel corpora and monolingual corpora; thus no document-level parallel data is needed. From a theoretical viewpoint, the core part of this work is the novel representation of contextual information using point-wise mutual information between the context and the current sentence. We show the effectiveness of our approach in three language pairs, English to French, English to Russian, and Japanese to English, by evaluation in BLEU and contrastive tests for context-aware translation.
摘要:尽管许多上下文感知神经的机器翻译模型已经被提出来结合上下文翻译,大多数这些模型进行培训,终端到终端的上句级平行排列的文档。因为只有少数域(和语言对)有这样文档级的并行数据,我们无法在大多数领域进行准确的上下文感知的翻译。因此,我们提出一个简单的方法,通过将文件级语言模型到解码器把一个句子层次的转换模型转换为上下文感知模型。我们的上下文感知解码器只在一个句子级平行语料库和多语语料库建设;因而不需要文档级并行数据。在理论的角度来看,这项工作的核心部分是利用上下文和当前句子之间的逐点互信息的上下文信息的新代表。我们证明我们的方法有三种语言对英语翻译成法语,英语俄语,日语英语,通过评估\ textsc {}蓝天为背景感知翻译的效果和对比试验。
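The decoding-time combination reduces to a simple score: the sentence-level translation log-probability plus the point-wise mutual information between the document context and the candidate sentence, estimated from a document-level LM and a context-free LM. The function below is a sketch with hypothetical names; the interpolation weight is our assumption.

```python
def context_aware_score(log_p_nmt, log_p_doclm_given_ctx, log_p_lm, weight=1.0):
    """Score a candidate target sentence y for source x under document context c.
    PMI(c; y) = log p(y | c) - log p(y), estimated with a document-level LM and
    a context-free LM (a sketch of the idea, not the paper's exact formulation)."""
    pmi = log_p_doclm_given_ctx - log_p_lm
    return log_p_nmt + weight * pmi
```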
58. Text Editing by Command [PDF] 返回目录
Felix Faltings, Michel Galley, Gerold Hintz, Chris Brockett, Chris Quirk, Jianfeng Gao, Bill Dolan
Abstract: A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step. The one-shot setting is inadequate, however, when the constraints the user wishes to impose on the generated text are dynamic, especially when authoring longer documents. We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text. To this end, we propose a novel text editing task, and introduce WikiDocEdits, a dataset of single-sentence edits crawled from Wikipedia. We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations. We present empirical and qualitative analyses of this model's performance.
摘要:在神经文本生成甲流行范式是单触发发生,其中文本被在一个单一的步骤中产生。该一次性设置是不够的,然而,当用户希望强加于生成的文本约束是动态的,创作较长文档时尤其如此。我们解决这一限制与交互文本生成设定在通过发布修改现有的文本命令与系统用户交互。为此,我们提出了一种新的文本编辑任务,并介绍WikiDocEdits,单句编辑的数据集从维基百科的抓取。我们证明了我们的交互式编辑器,基于变压器的模型中训练的数据集,优于基线并获得在自动和人的评价积极的成果。我们提出这个模型的表现实证和定性分析。
59. Cross-neutralising: Probing for joint encoding of linguistic information in multilingual models [PDF] 返回目录
Rochelle Choenni, Ekaterina Shutova
Abstract: Multilingual sentence encoders are widely used to transfer NLP models across languages. The success of this transfer is, however, dependent on the model's ability to encode the patterns of cross-lingual similarity and variation. Yet, little is known as to how these models are able to do this. We propose a simple method to study how relationships between languages are encoded in two state-of-the-art multilingual models (i.e. M-BERT and XLM-R). The results provide insight into their information sharing mechanisms and suggest that linguistic properties are encoded jointly across typologically-similar languages in these models.
摘要:多语言句子的编码器被广泛用于传输NLP模型跨语言。这种转移的成功,然而,依赖于模型的编码跨语言相似度和变化的模式的能力。然而,鲜为人知的是为这些模型是如何能够做到这一点。我们提出了一个简单的方法来研究如何种语言之间的关系,两个国家的最先进的多语言模型(即M-BERT和XLM-R)进行编码。结果提供深入了解他们的信息共享机制,并认为语言特性在这些模式编码的共同跨越类型学相似的语言。
60. Rethinking embedding coupling in pre-trained language models [PDF] 返回目录
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
Abstract: We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art pre-trained language models. We show that decoupled embeddings provide increased modeling flexibility, allowing us to significantly improve the efficiency of parameter allocation in the input embedding of multilingual models. By reallocating the input embedding parameters in the Transformer layers, we achieve dramatically better performance on standard natural language understanding tasks with the same number of parameters during fine-tuning. We also show that allocating additional capacity to the output embedding provides benefits to the model that persist through the fine-tuning stage even though the output embedding is discarded after pre-training. Our analysis shows that larger output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage Transformer representations to be more general and more transferable to other tasks and languages. Harnessing these findings, we are able to train models that achieve strong performance on the XTREME benchmark without increasing the number of parameters at the fine-tuning stage.
摘要:我们重新评估在国家的最先进的预先训练的语言模型共享输入和输出的嵌入之间的权重的标准做法。我们发现,分离的嵌入提供更高的灵活性建模,使我们能够显著提高参数配置效率输入嵌入多语言模型。通过重新分配输入变压器层嵌入参数,就可以实现与微调在相同数量的参数标准自然语言理解任务,显着提高性能。我们还表明,到输出嵌入分配额外的容量提供的好处,通过即使输出嵌入是经过岗前培训丢弃微调阶段持续的模式。我们的分析表明,较大的输出的嵌入防止overspecializing到前培训任务模型的最后一层,并鼓励变压器表示更普遍,更转移到其他任务和语言。利用这些发现,我们能够列车实现对XTREME基准性能强劲的机型,而不在微调阶段增加的参数的数量。
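A minimal sketch of decoupled embeddings, assuming a toy Transformer language model: the input embedding can stay narrow while the output embedding is made wider, and the conventional coupled setting is only recoverable when both widths match. All dimensions and names below are illustrative.

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy LM with decoupled (untied) input and output embeddings."""

    def __init__(self, vocab=32000, d_in=128, d_model=512, d_out=768, tie=False):
        super().__init__()
        self.embed_in = nn.Embedding(vocab, d_in)
        self.in_proj = nn.Linear(d_in, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.out_proj = nn.Linear(d_model, d_out)
        self.embed_out = nn.Linear(d_out, vocab, bias=False)
        if tie:  # the standard coupled setting requires matching widths
            assert d_in == d_out, "weight tying needs d_in == d_out"
            self.embed_out.weight = self.embed_in.weight

    def forward(self, ids):                        # ids: (batch, length)
        h = self.encoder(self.in_proj(self.embed_in(ids)))
        return self.embed_out(self.out_proj(h))    # (batch, length, vocab) logits
```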
61. "Nice Try, Kiddo": Ad Hominems in Dialogue Systems [PDF] 返回目录
Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
Abstract: Ad hominem attacks are those that attack some feature of a person's character instead of the position the person is maintaining. As a form of toxic and abusive language, ad hominems contain harmful language that could further amplify the skew of power inequality for marginalized populations. Since dialogue systems are designed to respond directly to user input, it is important to study ad hominems in these system responses. In this work, we propose categories of ad hominems that allow us to analyze human and dialogue system responses to Twitter posts. We specifically compare responses to Twitter posts about marginalized communities (#BlackLivesMatter, #MeToo) and other topics (#Vegan, #WFH). Furthermore, we propose a constrained decoding technique that uses salient $n$-gram similarity to apply soft constraints to top-$k$ sampling and can decrease the amount of ad hominems generated by dialogue systems. Our results indicate that 1) responses composed by both humans and DialoGPT contain more ad hominems for discussions around marginalized communities versus other topics, 2) different amounts of ad hominems in the training data can influence the likelihood of the model generating ad hominems, and 3) we can thus carefully choose training data and use constrained decoding techniques to decrease the amount of ad hominems generated by dialogue systems.
摘要:广告人身攻击的攻击是那些攻击一个人的性格的某些特征而不是人保持的位置。有毒,粗言秽语的形式,诉诸人身含有对人体有害的语言,可以进一步放大歪斜功率不平等的边缘人群。因为对话系统被设计成直接响应用户的输入,它来研究诉诸人身在这些系统的反应是很重要的。在这项工作中,我们提出诉诸人身,使我们能够分析人类和对话系统响应到Twitter的职位类别。我们特别比较约边缘化的社区(#BlackLivesMatter,#MeToo)等话题(#Vegan,#WFH)Twitter的职位回应。此外,我们提出了一种约束解码技术,该技术使用凸$ N $ -gram相似应用软约束以顶 - $ $ķ采样,并且可以减少由对话系统中产生诉诸人身的量。我们的研究结果表明由人类和DialoGPT组成:1)响应包含了讨论,更诉诸人身各地被边缘化的群体与其他主题,2)不同量的训练数据诉诸人身的可能影响模型生成诉诸人身的可能性,以及3 )我们可以因此谨慎选择训练数据和使用受限的解码技术,以减少通过对话系统产生诉诸人身的量。
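A much-simplified sketch of the soft-constraint idea at a single decoding step: candidate tokens whose continuation would form a salient ad hominem n-gram are down-weighted before the usual top-k truncation. The real method scores n-gram similarity rather than exact membership, and every name below is ours.

```python
import torch

def soft_constrained_topk(logits, candidate_ngrams, salient_ngrams, penalty=2.0, k=50):
    """logits: 1-D tensor over the vocabulary; candidate_ngrams[i] is the n-gram
    formed by appending token i to the prefix; salient_ngrams is the set to avoid."""
    logits = logits.clone()
    for tok_id, ngram in candidate_ngrams.items():
        if ngram in salient_ngrams:            # exact match stands in for similarity
            logits[tok_id] -= penalty
    topk = torch.topk(logits, k)
    probs = torch.softmax(topk.values, dim=-1)
    choice = torch.multinomial(probs, 1)
    return topk.indices[choice]                # sampled token id
```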
62. Inducing Taxonomic Knowledge from Pretrained Transformers [PDF] 返回目录
Catherine Chen, Kevin Lin, Dan Klein
Abstract: We present a method for inducing taxonomic trees from pretrained transformers. Given a set of input terms, we assign a score for the likelihood that each pair of terms forms a parent-child relation. To produce a tree from pairwise parent-child edge scores, we treat this as a graph optimization problem and output the maximum spanning tree. We train the model by finetuning it on parent-child relations from subtrees of WordNet and test on non-overlapping subtrees. In addition, we incorporate semi-structured definitions from the web to further improve performance. On the task of inducing subtrees of WordNet, the model achieves 66.0 ancestor F_1, a 10.4 point absolute increase over the previous best published result on this task.
摘要:我们提出了从预训练的变压器诱发分类树的方法。给定一组输入方面,我们对于分配每对术语的形成父子关系的可能性的分数。以从两两父子边的分数一棵树,我们把这个作为一个图形优化问题,输出的最大生成树。我们可以通过微调它从互不重叠的子树共发现和测试的子树亲子关系训练模型。此外,我们引入半结构化的定义,从网络,以进一步提高性能。在诱导共发现的子树的任务,该机型达到66.0祖先F_1,在这项任务上一公布的最好结果的10.4点的绝对增加。
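The graph step can be reproduced in a few lines: build a complete directed graph weighted by the pairwise parent-child scores and extract the maximum spanning arborescence (the directed analogue of the maximum spanning tree mentioned above). The toy scoring function below is purely illustrative.

```python
import networkx as nx

def induce_tree(terms, pair_score):
    """pair_score(parent, child) can be any scorer, e.g. a finetuned transformer;
    here we only sketch the graph-optimization step."""
    g = nx.DiGraph()
    for parent in terms:
        for child in terms:
            if parent != child:
                g.add_edge(parent, child, weight=pair_score(parent, child))
    return nx.maximum_spanning_arborescence(g)

# toy usage with a hand-crafted score function
edges = {("animal", "dog"), ("dog", "poodle")}
tree = induce_tree(["animal", "dog", "poodle"],
                   lambda p, c: 1.0 if (p, c) in edges else 0.1)
print(sorted(tree.edges()))                    # [('animal', 'dog'), ('dog', 'poodle')]
```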
63. A Frustratingly Easy Approach for Joint Entity and Relation Extraction [PDF] 返回目录
Zexuan Zhong, Danqi Chen
Abstract: End-to-end relation extraction aims to identify named entities and extract relations between them simultaneously. Most recent work models these two subtasks jointly, either by unifying them in one structured prediction framework or by multi-task learning through shared representations. In this work, we describe a very simple approach for joint entity and relation extraction, and establish a new state of the art on standard benchmarks (ACE04, ACE05, and SciERC). Our approach essentially builds on two independent pre-trained encoders and merely uses the entity model to provide input features for the relation model. Through a series of careful examinations, we validate the importance of learning distinct contextual representations for entities and relations, fusing entity information at the input layer of the relation model, and incorporating global context. Finally, we also present an efficient approximation to our approach which requires only one pass of both encoders at inference time, obtaining an 8-16x speedup with a small accuracy drop.
摘要:结束到终端的关系抽取旨在同时识别它们之间的命名实体和提取的关系。最近的工作模式这两个共同的子任务,方法是通过共享交涉统一他们一个结构预测框架,或者多任务学习。在这项工作中,我们描述了联合实体和关系抽取一个非常简单的方法,并建立新的标准基准测试(ACE04,ACE05和SciERC)国家的最先进的。我们的方法基本上是建立在两个预先训练的独立编码器和仅仅采用实体模型为关系模型提供输入功能。通过一系列的拉网式检查,我们确认学习实体和关系不同情境表示,在关系模型的输入层融合的实体信息,并结合全球范围内的重要性。最后,我们还提出了一个高效的逼近我们的做法,只需要一个在推理时间通两个编码器,获得8-16 $ \ $倍加速用小的精度下降。
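One way the entity model's predictions can act as input features for the relation model is to wrap each candidate pair in typed marker tokens before re-encoding the sentence. The marker format below is a hypothetical illustration of that interface, not the paper's exact markers.

```python
def mark_entity_pair(tokens, subj_span, obj_span, subj_type, obj_type):
    """Insert typed markers around a subject and an object span (inclusive indices)."""
    s0, s1 = subj_span
    o0, o1 = obj_span
    out = []
    for i, tok in enumerate(tokens):
        if i == s0:
            out.append(f"<S:{subj_type}>")
        if i == o0:
            out.append(f"<O:{obj_type}>")
        out.append(tok)
        if i == s1:
            out.append(f"</S:{subj_type}>")
        if i == o1:
            out.append(f"</O:{obj_type}>")
    return out

print(mark_entity_pair("Edison invented the phonograph".split(),
                       (0, 0), (3, 3), "PER", "OBJ"))
```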
64. Paired Representation Learning for Event and Entity Coreference [PDF] 返回目录
Xiaodong Yu, Wenpeng Yin, Dan Roth
Abstract: Co-reference of Events and of Entities is commonly formulated as a binary classification problem, given a pair of events or entities as input. Earlier work addressed the main challenge in these problems, the representation of each element in the input pair, by: (i) modelling the representation of one element (event or entity) without considering the other element in the pair; (ii) encoding all attributes of one element (e.g., arguments of an event) into a single non-interpretable vector, thus losing the ability to compare cross-element attributes. In this work we propose paired representation learning (PairedRL) for coreference resolution. Given a pair of elements (Events or Entities), our model treats the pair's sentences as a single sequence so that each element in the pair learns its representation by encoding its own context as well as the other element's context. In addition, when representing events, PairedRL is structured in that it represents the event's arguments to facilitate their individual contribution to the final prediction. As we show, on both (within-document and cross-document) event and entity coreference benchmarks, our unified approach, PairedRL, outperforms prior state-of-the-art systems by a large margin.
摘要:事件和实体联合引用一般配制成二元分类问题,给定一个对事件或实体输入。早期的工作中处理的主要挑战中的这些问题 - 通过每个元素的所述表示在输入对:(i)没有考虑在对中的另一个元件建模一个元件(事件或实体)的表示; (ii)编码一种元素的所有属性(例如,事件的参数)到单个不可解释的向量,从而失去了比较交叉元素的属性的能力。在这项工作中我们提出配对表示学习(PairedRL)用于指代消解。给定一对元件(活动或实体)我们的模型将所述一对的句子作为一个单一的序列,使得在所述一对的每个元素由编码其自己的上下文以及其它元素的上下文得知它的表示。此外,代表事件的时候,PairedRL的结构,因为它代表事件的参数,方便他们最终预测个人贡献。正如我们表明,在这两个(在文档和跨文档)事件和实体的共参照基准,我们的统一的方法,PairedRL,以大比分胜过技术的系统之前的状态。
65. COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval [PDF] 返回目录
Xinliang Frederick Zhang, Heming Sun, Xiang Yue, Emmett Jesrani, Simon Lin, Huan Sun
Abstract: We present a large challenging dataset, COUGH, for COVID-19 FAQ retrieval. Specifically, similar to a standard FAQ dataset, COUGH consists of three parts: FAQ Bank, User Query Bank and Annotated Relevance Set. FAQ Bank contains ~16K FAQ items scraped from 55 credible websites (e.g., CDC and WHO). For evaluation, we introduce User Query Bank and Annotated Relevance Set, where the former contains 1201 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query. We analyze COUGH by testing different FAQ retrieval models built on top of BM25 and BERT, among which the best model achieves 0.29 under P@5, indicating that the dataset presents a great challenge for future research. Our dataset is freely available at this https URL.
摘要:我们提出了一个具有挑战性的大数据集,咳嗽,为COVID-19 FAQ检索。具体而言,类似于标准FAQ数据集,咳嗽由三个部分组成:常见问题银行,用户查询银行和注释关联设置。常见问题库包含来自55个可信的网站(例如,CDC和WHO)刮〜16K常见问题的项目。对于评估中,我们介绍了用户查询银行和注释关联集,其中前者包含1201人转述查询,而后者包含〜每个查询32人标注的常见问题解答项目。我们分析COUGH通过测试建立在BM25和BERT,其中最好的模式P的作用下达到0.29 @ 5的顶部不同FAQ检索模型,表明该数据集的礼物为今后的研究一个很大的挑战。我们的数据是免费提供在此HTTPS URL。
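For reference, P@5 here is ordinary precision at rank 5; the small helper below (variable names ours) makes the reported metric explicit.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved FAQ items that are annotated as relevant."""
    top = retrieved_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k

print(precision_at_k(["f3", "f7", "f1", "f9", "f2"], {"f1", "f2"}))  # 0.4
```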
66. CaM-Gen:Causally-aware Metric-guided Text Generation [PDF] 返回目录
Navita Goyal, Roodram Paneri, Ayush Agarwal, Udit Kalani, Abhilasha Sancheti, Niyati Chhaya
Abstract: Content is created for a well-defined purpose, often described by a metric or a signal represented in the form of structured information. The relationship between these metrics or the goal of a target content and the content itself is non-trivial. While large-scale language models show promising text generation capabilities, guiding and informing the generated text with external metrics is challenging. These metrics and the content tend to have inherent relationships, and not all of them may directly impact the content. We introduce CaM-Gen: Causally-aware Generative Networks guided by user-defined input metrics and incorporating the causal relationships between the metrics and the content features. We leverage causal inference techniques to identify the causally significant aspects of the text that lead to the target metric and then explicitly guide the generative model towards these via a feedback mechanism. We propose this mechanism for variational autoencoder-based and transformer-based generative models. The proposed models beat baselines in terms of target-metric accuracy while maintaining the fluency and language quality of the generated text. To the best of our knowledge, this is one of the early attempts at incorporating metric guidance using causal inference towards controlled generation.
摘要:含量是良好定义的目的,通常是由公制或在结构化信息的形式表示的信号描述创建。度量或目标内容的目标和内容本身之间的关系是不平凡的。虽然大规模的语言模型显示出有前途的文本生成功能,引导和告知外部指标生成的文本是具有挑战性的。这些指标和内容往往具有内在的关系,而不是所有的人都可能直接影响的内容。我们引入凸轮根:通过将度量和内容特征之间的因果关系用户自定义输入度量引导有因果关系感知剖成网络。我们充分利用因果推理技术来识别文本的因果显著方面,导致了目标指标,然后明确的反馈机制指导对这些生成模型。我们提出这个机制变型变压器的自动编码为基础,生成模型。所提出的模型,目标度精度方面击败基线,同时保持流畅性和生成的文本的语言质量。据我们所知,这是结合使用因果推断对控制生成指标导向的早期尝试之一。
67. X-Class: Text Classification with Extremely Weak Supervision [PDF] 返回目录
Zihan Wang, Dheeraj Mekala, Jingbo Shang
Abstract: In this paper, we explore to conduct text classification with extremely weak supervision, i.e., only relying on the surface text of class names. This is a more challenging setting than the seed-driven weak supervision, which allows a few seed words per class. We opt to attack this problem from a representation learning perspective -- ideal document representations should lead to very close results between clustering and the desired classification. In particular, one can classify the same corpus differently (e.g., based on topics and locations), so document representations must be adaptive to the given class names. We propose a novel framework X-Class to realize it. Specifically, we first estimate comprehensive class representations by incrementally adding the most similar word to each class until inconsistency appears. Following a tailored mixture of class attention mechanisms, we obtain the document representation via a weighted average of contextualized token representations. We then cluster and align the documents to classes with the prior of each document assigned to its nearest class. Finally, we pick the most confident documents from each cluster to train a text classifier. Extensive experiments demonstrate that X-Class can rival and even outperform seed-driven weakly supervised methods on 7 benchmark datasets.
摘要:在本文中,我们探索用极其微弱的监督,即进行文本分类,仅依靠类名的表面文字。这比种子驱动的监管不力,这使得每个班上有几个种子的话更具挑战性的环境。我们选择从一个表示学习的角度攻击这个问题 - 理想文档表示应导致集群和预期的分类之间非常接近的结果。特别是,人们可以在同一语料库不同的分类(例如,根据主题和位置),所以文档表示必须适应给定的类名。我们提出了一个新的框架,X-Class,以实现它。具体而言,我们首先通过逐步添加最相似的单词每个类,直到出现不一致估计综合类的表述。继类关注机构量身定制的混合,我们可以透过情境令牌表示的加权平均文档表示。然后,我们聚集与现有分配给其最近的类每个文件的对齐文件类。最后,我们选择最自信的文件从每个集群训练文本分类。大量的实验证明,X级可媲美,甚至优于大种子驱动弱监督7个基准数据集的方法。
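A sketch of the incremental class-representation step, assuming a plain dictionary mapping words to numpy vectors; the paper works with contextualized representations and a consistency-based stopping rule, both of which are simplified away here.

```python
import numpy as np

def class_representation(class_name, vocab_vecs, max_words=20):
    """Grow a class representation by repeatedly adding the most similar word
    to the class and re-averaging (illustrative simplification of X-Class)."""
    rep = np.asarray(vocab_vecs[class_name], dtype=float)
    chosen = {class_name}
    for _ in range(max_words - 1):
        best, best_sim = None, -1.0
        for word, vec in vocab_vecs.items():
            if word in chosen:
                continue
            sim = vec @ rep / (np.linalg.norm(vec) * np.linalg.norm(rep))
            if sim > best_sim:
                best, best_sim = word, sim
        if best is None:                       # vocabulary exhausted
            break
        chosen.add(best)
        rep = np.mean([vocab_vecs[w] for w in chosen], axis=0)
    return rep
```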
68. Exploration of NLU: disassemble the information represented by Natural Language, based on the understanding of the internal structure of information, modeling the storage and processing system of information [PDF] 返回目录
Limin Zhang
Abstract: Natural language is one of the ways information is encoded, and it has highly abstracted and conceptualized that information. This paper disassembles the information represented by natural language, analyzes the classification coding system of attribute information and the abstraction relation between attribute information and entities in the real world, constructs a storage model of information, and simulates the attribute information processing process in one of the attribute spaces. It interprets how the relations represented by "Be", "Of", "Have", and so on are embodied in the information storage data structures and the corresponding data reading modes, and reclassifies sentence types from the perspective of task types and data reading modes. Then, we simulate the understanding process (the information processing process) on a dialogue example. Finally, the author summarizes the basic conditions of understanding and gives a definition of understanding from a personal point of view. The study in this paper provides a practical and theoretical basis and research methods for NLU. It can also be applied to large-scale, multi-type information processing in the artificial intelligence (AI) area.
摘要:自然语言的信息进行编码的方法之一,它具有高度抽象的概念化和信息。本文拆解通过自然语言表示的信息,分析了分类编码的属性信息和属性信息和在现实世界实体之间的抽象关系系统,构建信息存储模型,并模拟属性信息进动过程中的一个属性的空间,解释它代表由“是”关系,“中”,“具有”,等等被实施在信息存储数据结构和相应的数据读出模式如何,从任务类型的角度重新分类句类型和数据读出模式。然后,模拟上的对话实例的理解处理(信息处理过程)。最后,笔者总结的理解的基本条件,并给出了从一个个人的角度理解的定义。本文的研究提供了NLU实践,理论基础和研究方法。它也可以在大型的,在人工智能多类型的信息处理(AI)领域中。
69. Efficient End-to-end Learning of Cross-event Dependencies for Document-level Event Extraction [PDF] 返回目录
Kung-Hsiang Huang, Nanyun Peng
Abstract: Document-level event extraction is important for indexing the most important information in a document to facilitate downstream tasks such as information retrieval or question answering. However, it is a challenging task because it requires the understanding of event and entity coreference and the capturing of arguments that span across different sentences. Existing works on event extraction generally confine themselves to extracting events from single sentences, which fails to capture the relationships between event mentions at the scale of a document, as well as event arguments that appear in a different sentence than the event trigger. In this paper, we propose an end-to-end model leveraging Deep Value Networks (DVN), a structured prediction algorithm, to efficiently capture cross-event dependencies for document-level event extraction. Experimental results show that our approach achieves performance comparable to a CRF-based model on ACE05, while enjoying significantly higher efficiency.
摘要:文档级事件提取为一个文件索引中最重要的信息,以促进下游任务,如信息检索或答疑重要。然而,这是一项艰巨的任务,因为它需要的事件和实体共指的认识,并捕捉论点在不同的句子跨度。上提取单个的句子,它无法捕捉的关系之间的文档的比例的情况下提到,以及出现在不同的句子比事件触发事件参数事件的现有事件提取一般作品局限。在本文中,我们提出了一种端至端模型利用深度价值网络(DVN),结构化预测算法,以有效地捕获用于文档级事件提取交叉事件的依赖性。实验结果表明,我们的方法达到相当的性能对ACE05基于CRF模型,在享受显著更高的效率。
70. Measuring the `I don't know' Problem through the Lens of Gricean Quantity [PDF] 返回目录
Huda Khayrallah, João Sedoc
Abstract: We consider the intrinsic evaluation of neural generative dialog models through the lens of Grice's Maxims of Conversation (1975). Based on the maxim of Quantity (be informative), we propose Relative Utterance Quantity (RUQ) to diagnose the `I don't know' problem. The RUQ diagnostic compares the model score of a generic response to that of the reference response. We find that for reasonable baseline models, `I don't know' is preferred over the reference more than half the time, but this can be mitigated with hyperparameter tuning.
摘要:我们认为通过对话Grices格言(1975年)的镜头神经生成对话框车型的内在评价。基于数量(具有信息)的格言,我们提出了相对话语数量(RUQ)诊断'我不知道”的问题。所述RUQ诊断比较该参考响应的通用响应的模型得分。我们发现,对于合理的基线模型,'我不知道”优于参考一半以上的时间,但是这可以通过超参数调整来缓解。
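The diagnostic itself is one line once a scoring function is available; the sketch below uses hypothetical names, with score_fn standing in for any (length-normalized) log-likelihood the dialog model assigns to a response given the context.

```python
def relative_utterance_quantity(score_fn, context, reference, generic="i don't know."):
    """Positive values mean the model prefers the generic response over the reference."""
    return score_fn(context, generic) - score_fn(context, reference)
```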
71. Clustering Contextualized Representations of Text for Unsupervised Syntax Induction [PDF] 返回目录
Vikram Gupta, Haoyue Shi, Kevin Gimpel, Mrinmaya Sachan
Abstract: We explore clustering of contextualized text representations for two unsupervised syntax induction tasks: part of speech induction (POSI) and constituency labelling (CoLab). We propose a deep embedded clustering approach which jointly transforms these representations into a lower dimension cluster friendly space and clusters them. We further enhance these representations by augmenting them with task-specific representations. We also explore the effectiveness of multilingual representations for different tasks and languages. With this work, we establish the first strong baselines for unsupervised syntax induction using contextualized text representations. We report competitive performance on 45-tag POSI, state-of-the-art performance on 12-tag POSI across 10 languages, and competitive results on CoLab.
摘要:本文探讨了两个无监督语法归纳任务情境文本表示的聚类:语音感应(POSI)和选区标记(CoLab)的一部分。我们提出了一个深刻的嵌入集群的做法,共同将这些人表示分为集群友好空间和集群,它们较低的层面。我们通过与任务的具体陈述对它们进行进一步增强这些表示。我们还探索不同的任务和语言的多语言表述的有效性。随着这项工作,我们建立了使用情境文本表示无监督语法感应第一强基线。我们报告45-标签POSI,国家的最先进的跨10种语言在12个标签POSI性能,并在CoLab竞争的结果,竞争力的性能。
72. Open-Domain Dialogue Generation Based on Pre-trained Language Models [PDF] 返回目录
Yan Zeng, Jian-Yun Nie
Abstract: Pre-trained language models have been successfully used in response generation for open-domain dialogue. Four main frameworks have been proposed: (1) Transformer-ED using Transformer encoder and decoder separately for source and target sentences; (2) Transformer-Dec using Transformer decoder for both source and target sentences; (3) Transformer-MLM using Transformer decoder that applies bi-directional attention on the source side and left-to-right attention on the target side with masked language model objective; and (4) Transformer-AR that uses auto-regressive objective instead. In this study, we compare these frameworks on 3 datasets, and our comparison reveals that the best framework uses bidirectional attention on the source side and does not separate encoder and decoder. We also examine model discrepancy, and our experiments confirm that the performance of a model is directly impacted by the underlying discrepancies. We then propose two correction methods to reduce the discrepancies, and both improve the model performance. These results show that discrepancies is an important factor to consider when we use a pre-trained model, and a reduction in discrepancies can lead to improved performance.
摘要:预先训练语言模型已经在响应代被成功地用于开放领域的对话。四个主要的框架已提出:(1)使用的变压器的编码器和解码器分别对源和目标句子变压器-ED; (2)使用用于源和目标句子变压器解码器变压器12月; (3)变压器MLM使用适用于源极侧的双向的关注和左到右的注意力与掩蔽语言模型客观目标侧变压器解码器; (4)变压器-AR使用自回归的目标来代替。在这项研究中,我们比较了3个集这些框架,我们的比较显示,最好的框架使用在源端双向关注和不独立编码器和解码器。我们还检查模型的差异,而我们的实验证实,模型的性能直接受底层差异的影响。然后,我们提出了两种校正方法来减少差异,并且都提高了模型的性能。这些结果表明,差异是需要考虑的一个重要因素,当我们使用一个预先训练模型,并在差异的减少可能会导致更高的性能。
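The frameworks differ mainly in how attention is masked over the concatenated source and target. Below is a sketch of the Transformer-MLM style mask (bidirectional over the source, left-to-right over the target); the boolean layout and function name are ours.

```python
import torch

def hybrid_attention_mask(src_len, tgt_len):
    """True means attention is allowed: source tokens attend bidirectionally among
    themselves, target tokens attend to all source tokens and to previous target tokens."""
    n = src_len + tgt_len
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:, :src_len] = True                           # every position sees the source
    tri = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.bool))
    mask[src_len:, src_len:] = tri                     # causal among target tokens
    return mask

print(hybrid_attention_mask(3, 2).int())
```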
73. Fair Hate Speech Detection through Evaluation of Social Group Counterfactuals [PDF] 返回目录
Aida Mostafazadeh Davani, Ali Omrani, Brendan Kennedy, Mohammad Atari, Xiang Ren, Morteza Dehghani
Abstract: Approaches for mitigating bias in supervised models are designed to reduce models' dependence on specific sensitive features of the input data, e.g., mentioned social groups. However, in the case of hate speech detection, it is not always desirable to equalize the effects of social groups because of their essential role in distinguishing outgroup-derogatory hate, such that particular types of hateful rhetoric carry the intended meaning only when contextualized around certain social group tokens. Counterfactual token fairness for a mentioned social group evaluates the model's predictions as to whether they are the same for (a) the actual sentence and (b) a counterfactual instance, which is generated by changing the mentioned social group in the sentence. Our approach assures robust model predictions for counterfactuals that imply similar meaning as the actual sentence. To quantify the similarity of a sentence and its counterfactual, we compare their likelihood score calculated by generative language models. By equalizing model behaviors on each sentence and its counterfactuals, we mitigate bias in the proposed model while preserving the overall classification performance.
摘要:途径用于减轻偏压监督型号的设计,以减少模型的输入数据,例如特定的敏感特性,提及的社会团体的依赖。然而,在仇恨言论检测的情况下,它并不总是需要平衡,因为在区分外类群贬损恨,其重要作用的社会群体的影响,因此特定类型的可恶修辞携带预期,只有当周围一定语境意义社会团体令牌。一提到社会团体反令牌公平性评估模型的预测他们是否是(一)实际的句子和(b)一个反例,这是通过改变提到的社会群体中的一句话产生的相同。我们的做法保证了暗示类似含义的实际句子反事实稳健的模型预测。为了量化句子和其反的相似性,我们比较它们的可能性得分生成语言模型计算。通过对每一个句子和它的反事实均衡模型的行为,我们减轻了模型偏差,同时保持整体的分类性能。
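To make the likelihood comparison concrete, here is a sketch that uses GPT-2 (via the transformers library) as the generative language model scoring a sentence and its counterfactual; the example sentence, the substituted group tokens, and the acceptance threshold are illustrative assumptions rather than the paper's setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def log_likelihood(sentence):
    """Average token log-likelihood under GPT-2."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss        # mean negative log-likelihood
    return -loss.item()

sentence = "the muslim community organized a fundraiser"
counterfactual = sentence.replace("muslim", "christian")
# keep the counterfactual only if the two likelihoods are close (threshold is ours)
similar_meaning = abs(log_likelihood(sentence) - log_likelihood(counterfactual)) < 1.0
print(similar_meaning)
```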
74. Improving Multilingual Models with Language-Clustered Vocabularies [PDF] 返回目录
Hyung Won Chung, Dan Garrette, Kiat Chuan Tan, Jason Riesa
Abstract: State-of-the-art multilingual models depend on vocabularies that cover all of the languages the model will expect to see at inference time, but the standard methods for generating those vocabularies are not ideal for massively multilingual applications. In this work, we introduce a novel procedure for multilingual vocabulary generation that combines the separately trained vocabularies of several automatically derived language clusters, thus balancing the trade-off between cross-lingual subword sharing and language-specific vocabularies. Our experiments show improvements across languages on key multilingual benchmark tasks TyDi QA (+2.9 F1), XNLI (+2.1%), and WikiAnn NER (+2.8 F1), and a factor-of-8 reduction in out-of-vocabulary rate, all without increasing the size of the model or data.
摘要:最先进的国家的最会讲多种语言的模型依赖于覆盖所有的语言模型会希望看到在推理时的词汇,但对于那些产生词汇的标准方法是不理想的大量多语言应用程序。在这项工作中,我们介绍多语言词汇产生组合几种自动导出语言簇的单独训练的词汇,因而平衡跨语言的子词共享和语言特定的词汇之间的折衷的新颖过程。我们的实验显示出的词汇率,都没有跨语言的改进重点多种语言基准任务泰迪QA(2.9 F1),XNLI(+2.1 \%)和WikiAnn NER(+2.8 F1)和8倍减少增加模型或数据的大小。
75. Improved Synthetic Training for Reading Comprehension [PDF] 返回目录
Yanda Chen, Md Arafat Sultan, Vittorio Castelli
Abstract: Automatically generated synthetic training examples have been shown to improve performance in machine reading comprehension (MRC). Compared to human annotated gold standard data, synthetic training data has unique properties, such as high availability at the possible expense of quality. In view of such differences, in this paper, we explore novel applications of synthetic examples to MRC. Our proposed pre-training and knowledge distillation strategies show significant improvements over existing methods. In a particularly surprising discovery, we observe that synthetic distillation often yields students that can outperform the teacher model.
摘要:自动生成的合成训练实例已经显示出改善机器阅读理解(MRC)的性能。相比于人类注释的黄金标准数据,合成训练数据具有独特的性能,如在质量的可能代价高可用性。考虑到这些差异,在本文中,我们探索的合成例向MRC新颖的应用。我们建议前期培训和知识蒸馏策略表明在现有的方法显著的改善。在一个特别令人惊讶的发现,我们观察到合成的蒸馏往往得到的是可以超越老师模范生。
76. Structure-Grounded Pretraining for Text-to-SQL [PDF] 返回目录
Xiang Deng, Ahmed Hassan Awadallah, Christopher Meek, Oleksandr Polozov, Huan Sun, Matthew Richardson
Abstract: Learning to capture text-table alignment is essential for table related tasks like text-to-SQL. The model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (StruG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel prediction tasks: column grounding, value grounding and column-value mapping, and train them using weak supervision without requiring complex SQL annotation. Additionally, to evaluate the model under a more realistic setting, we create a new evaluation set Spider-Realistic based on Spider with explicit mentions of column names removed, and adopt two existing single-database text-to-SQL datasets. StruG significantly outperforms BERT-LARGE on Spider and the realistic evaluation sets, while bringing consistent improvement on the large-scale WikiSQL benchmark.
77. On Learning Text Style Transfer with Direct Rewards [PDF] 返回目录
Yixin Liu, Graham Neubig, John Wieting
Abstract: In most cases, the lack of parallel corpora makes it impossible to directly train supervised models for text style transfer task. In this paper, we explore training algorithms that instead optimize reward functions that explicitly consider different aspects of the style-transferred outputs. In particular, we leverage semantic similarity metrics originally used for fine-tuning neural machine translation models to explicitly assess the preservation of content between system outputs and input texts. We also investigate the potential weaknesses of the existing automatic metrics and propose efficient strategies of using these metrics for training. The experimental results show that our model provides significant gains in both automatic and human evaluation over strong baselines, indicating the effectiveness of our proposed methods and training strategies.
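A reward-driven training signal of the kind described above can be sketched as a REINFORCE-style, reward-weighted likelihood. In the toy code below, similarity, style_score, and log_prob are hypothetical callables standing in for a content-preservation metric, a style classifier, and the generator's log-probability; the weighting and baseline are illustrative, not the paper's actual objective.

    def reward(source, output, similarity, style_score):
        # Combine content preservation with target-style strength; weights are illustrative.
        return 0.5 * similarity(source, output) + 0.5 * style_score(output)

    def policy_gradient_loss(samples, log_prob, similarity, style_score, baseline=0.0):
        # Negative reward-weighted log-likelihood over sampled transfers (REINFORCE-style).
        total = 0.0
        for source, output in samples:
            advantage = reward(source, output, similarity, style_score) - baseline
            total += -advantage * log_prob(output)
        return total / max(len(samples), 1)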
78. Conversational Semantic Parsing for Dialog State Tracking [PDF] 返回目录
Jianpeng Cheng, Devang Agrawal, Hector Martinez Alonso, Shruti Bhargava, Joris Driesen, Federico Flego, Dain Kaplan, Dimitri Kartsaklis, Lin Li, Dhivya Piraviperumal, Jason D Williams, Hong Yu, Diarmuid O Seaghdha, Anders Johannsen
Abstract: We consider a new perspective on dialog state tracking (DST), the task of estimating a user's goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical representations, we can incorporate semantic compositionality, cross-domain knowledge sharing and co-reference. We present TreeDST, a dataset of 27k conversations annotated with tree-structured dialog states and system acts. We describe an encoder-decoder framework for DST with hierarchical representations, which leads to 20% improvement over state-of-the-art DST approaches that operate on a flat meaning space of slot-value pairs.
79. Modularity Improves Out-of-Domain Instruction Following [PDF] 返回目录
Rodolfo Corona, Daniel Fried, Coline Devin, Dan Klein, Trevor Darrell
Abstract: We propose a modular architecture for following natural language instructions that describe sequences of diverse subgoals, such as navigating to landmarks or picking up objects. Standard, non-modular, architectures used in instruction following do not exploit subgoal compositionality and often struggle on out-of-distribution tasks and environments. In our approach, subgoal modules each carry out natural language instructions for a specific subgoal type. A sequence of modules to execute is chosen by learning to segment the instructions and predicting a subgoal type for each segment. When compared to standard sequence-to-sequence approaches on ALFRED, a challenging instruction following benchmark, we find that modularization improves generalization to environments unseen in training and to novel tasks.
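The modular control flow described above (segment the instruction, predict a subgoal type per segment, dispatch to the matching module) can be sketched in a few lines. All callables below are hypothetical placeholders for learned components.

    def follow_instruction(instruction, segment, predict_type, modules, env_state):
        for span in segment(instruction):           # e.g., "go to the lamp", "pick up the mug"
            subgoal_type = predict_type(span)       # e.g., "navigate" or "pickup"
            module = modules[subgoal_type]          # one specialised policy per subgoal type
            env_state = module(span, env_state)     # execute this segment with that module
        return env_state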
80. Measuring Association Between Labels and Free-Text Rationales [PDF] 返回目录
Sarah Wiegreffe, Ana Marasovic, Noah A. Smith
Abstract: Interpretable NLP has taken increasing interest in ensuring that explanations are faithful to the model's decision-making process. This property is crucial for machine learning researchers and practitioners using explanations to better understand models. While prior work focuses primarily on extractive rationales (a subset of the input elements), we investigate their less-studied counterpart: free-text natural language rationales. We demonstrate that existing models for faithful interpretability do not extend cleanly to tasks where free-text rationales are needed. We turn to models that jointly predict and rationalize, a common class of models for free-text rationalization whose faithfulness is not yet established. We propose measurements of label-rationale association, a necessary property of faithful rationales, for these models. Using our measurements, we show that a state-of-the-art joint model based on T5 has strengths and weaknesses for producing faithful rationales.
81. NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions [PDF] 返回目录
Zhiyu Chen, Honglei Liu, Hu Xu, Seungwhan Moon, Hao Zhou, Bing Liu
Abstract: Existing conversational systems are mostly agent-centric, which assumes the user utterances would closely follow the system ontology (for NLU or dialogue state tracking). However, in real-world scenarios, it is highly desirable that the users can speak freely in their own way. It is extremely hard, if not impossible, for the users to adapt to the unknown system ontology. In this work, we attempt to build a user-centric dialogue system. As there is no clean mapping for a user's free form utterance to an ontology, we first model the user preferences as estimated distributions over the system ontology and map the users' utterances to such distributions. Learning such a mapping poses new challenges on reasoning over existing knowledge, ranging from factoid knowledge, commonsense knowledge to the users' own situations. To this end, we build a new dataset named NUANCED that focuses on such realistic settings for conversational recommendation. Collected via dialogue simulation and paraphrasing, NUANCED contains 5.1k dialogues, 26k turns of high-quality user responses. We conduct experiments, showing both the usefulness and challenges of our problem setting. We believe NUANCED can serve as a valuable resource to push existing research from the agent-centric system to the user-centric system. The code and data will be made publicly available.
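One way to picture "mapping utterances to estimated distributions over the ontology" is a soft score over candidate slot values instead of a single hard assignment. The keyword-matching scorer below is purely illustrative; the dataset itself is built with annotation and simulation, not keyword matching.

    def utterance_to_distribution(utterance, value_keywords):
        # value_keywords: {slot_value: [keywords]}; returns a normalised score per value.
        text = utterance.lower()
        scores = {value: sum(text.count(kw) for kw in kws) + 1e-6
                  for value, kws in value_keywords.items()}
        total = sum(scores.values())
        return {value: s / total for value, s in scores.items()}

    prefs = utterance_to_distribution(
        "somewhere quiet, maybe a cozy little bistro",
        {"ambience=quiet": ["quiet", "cozy"], "ambience=lively": ["lively", "bustling"]})
    print(prefs)   # a soft preference over ontology values rather than one hard slot filling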
82. Adding Chit-Chats to Enhance Task-Oriented Dialogues [PDF] 返回目录
Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie
Abstract: The existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations. In this work, we propose to integrate both types of systems by Adding Chit-Chats to ENhance Task-ORiented dialogues (ACCENTOR), with the goal of making virtual assistant conversations more engaging and interactive. Specifically, we propose a flexible approach for generating diverse chit-chat responses to augment task-oriented dialogues with minimal annotation effort. We then present our new chit-chat annotations to 23.8K dialogues from the popular task-oriented datasets (Schema-Guided Dialogue and MultiWOZ 2.1) and demonstrate their advantage over the originals via human evaluation. Lastly, we propose three new models for ACCENTOR explicitly trained to predict user goals and to generate contextually relevant chit-chat responses. Automatic and human evaluations show that, compared with the state-of-the-art task-oriented baseline, our models can code-switch between task and chit-chat to be more engaging, interesting, knowledgeable, and humanlike, while maintaining competitive task performance.
83. Effective Distant Supervision for Temporal Relation Extraction [PDF] 返回目录
Xinyu Zhao, Shih-ting Lin, Greg Durrett
Abstract: A principal barrier to training temporal relation extraction models in new domains is the lack of varied, high quality examples and the challenge of collecting more. We present a method of automatically collecting distantly-supervised examples of temporal relations. We scrape and automatically label event pairs where the temporal relations are made explicit in text, then mask out those explicit cues, forcing a model trained on this data to learn other signals. We demonstrate that a pre-trained Transformer model is able to transfer from the weakly labeled examples to human-annotated benchmarks in both zero-shot and few-shot settings, and that the masking scheme is important in improving generalization.
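The harvesting-and-masking recipe above can be illustrated with a tiny cue-based extractor: find a sentence whose temporal relation is made explicit by a cue word, derive the label from the cue, then mask the cue so a model trained on the example must rely on other signals. The cue list and mask token below are assumptions for illustration.

    import re

    CUES = {"before": "BEFORE", "after": "AFTER"}   # illustrative cue-to-label mapping

    def harvest_example(sentence):
        for cue, label in CUES.items():
            pattern = re.compile(rf"\b{cue}\b", flags=re.IGNORECASE)
            if pattern.search(sentence):
                # The cue gives the (distant) label; masking it forces other signals to be used.
                return {"text": pattern.sub("[MASK]", sentence), "relation": label}
        return None

    print(harvest_example("She finished the report before the meeting started."))
    # {'text': 'She finished the report [MASK] the meeting started.', 'relation': 'BEFORE'}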
84. Temporal Reasoning on Implicit Events from Distant Supervision [PDF] 返回目录
Ben Zhou, Kyle Richardson, Qiang Ning, Tushar Khot, Ashish Sabharwal, Dan Roth
Abstract: Existing works on temporal reasoning among events described in text focus on modeling relationships between explicitly mentioned events and do not handle event end time effectively. However, human readers can infer from natural language text many implicit events that help them better understand the situation and, consequently, better reason about time. This work proposes a new crowd-sourced dataset, TRACIE, which evaluates systems' understanding of implicit events - events that are not mentioned explicitly in the text but can be inferred from it. This is done via textual entailment instances querying both start and end times of events. We show that TRACIE is challenging for state-of-the-art language models. Our proposed model, SymTime, exploits distant supervision signals from the text itself and reasons over events' start time and duration to infer events' end time points. We show that our approach improves over baseline language models, gaining 5% on the i.i.d. split and 9% on an out-of-distribution test split. Our approach is also general to other annotation schemes, gaining 2%-8% on MATRES, an extrinsic temporal relation benchmark.
85. Text Style Transfer: A Review and Experiment Evaluation [PDF] 返回目录
Zhiqiang Hu, Roy Ka-Wei Lee, Charu C. Aggarwal
Abstract: The stylistic properties of text have intrigued computational linguistics researchers in recent years. Specifically, researchers have investigated the Text Style Transfer (TST) task, which aims to change the stylistic properties of the text while retaining its style independent content. Over the last few years, many novel TST algorithms have been developed, while the industry has leveraged these algorithms to enable exciting TST applications. The field of TST research has burgeoned because of this symbiosis. This article aims to provide a comprehensive review of recent research efforts on text style transfer. More concretely, we create a taxonomy to organize the TST models and provide a comprehensive summary of the state of the art. We review the existing evaluation methodologies for TST tasks and conduct a large-scale reproducibility study where we experimentally benchmark 19 state-of-the-art TST algorithms on two publicly available datasets. Finally, we expand on current trends and provide new perspectives on the new and exciting developments in the TST field.
86. An Evaluation Protocol for Generative Conversational Systems [PDF] 返回目录
Seolhwa Lee, Heuiseok Lim, João Sedoc
Abstract: There is a multitude of novel generative models for open-domain conversational systems; however, there is no systematic evaluation of different systems. Systematic comparisons require consistency in experimental design, evaluation sets, conversational systems and their outputs, and statistical analysis. We lay out a protocol for the evaluation of conversational models using head-to-head pairwise comparison. We analyze ten recent models that claim state-of-the-art performance using a paired head-to-head performance (win-loss-tie) on five evaluation datasets. Our findings show that DialoGPT and Blender are superior systems using Bradley-Terry model and TrueSkill ranking methods. These findings demonstrate the feasibility of our protocol to evaluate conversational agents and evaluation sets. Finally, we make all code and evaluations publicly available for researchers to compare their model to other state-of-the-art dialog models.
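For readers unfamiliar with Bradley-Terry ranking, the sketch below fits per-system strengths from pairwise win counts with the standard minorization-maximization update; it ignores ties and TrueSkill, and the example counts are invented.

    def bradley_terry(wins, iters=100):
        # wins[(i, j)] = number of pairwise comparisons in which system i beat system j.
        systems = sorted({s for pair in wins for s in pair})
        p = {s: 1.0 for s in systems}
        for _ in range(iters):
            new_p = {}
            for i in systems:
                total_wins = sum(wins.get((i, j), 0) for j in systems if j != i)
                denom = sum((wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                            for j in systems if j != i)
                new_p[i] = total_wins / denom if denom > 0 else p[i]
            norm = sum(new_p.values())
            p = {s: v / norm for s, v in new_p.items()}
        return p

    print(bradley_terry({("DialoGPT", "BaselineA"): 30, ("BaselineA", "DialoGPT"): 10,
                         ("Blender", "BaselineA"): 35, ("BaselineA", "Blender"): 5}))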
87. Char2Subword: Extending the Subword Embedding Space from Pre-trained Models Using Robust Character Compositionality [PDF] 返回目录
Gustavo Aguilar, Bryan McCann, Tong Niu, Nazneen Rajani, Nitish Keskar, Thamar Solorio
Abstract: Byte-pair encoding (BPE) is a ubiquitous algorithm in the subword tokenization process of language models. BPE provides multiple benefits, such as handling the out-of-vocabulary problem and reducing vocabulary sparsity. However, this process is defined from the pre-training data statistics, making the tokenization on different domains susceptible to infrequent spelling sequences (e.g., misspellings as in social media or character-level adversarial attacks). On the other hand, pure character-level models, though robust to misspellings, often lead to unreasonably large sequence lengths and make it harder for the model to learn meaningful contiguous characters. To alleviate these challenges, we propose a character-based subword transformer module (char2subword) that learns the subword embedding table in pre-trained models like BERT. Our char2subword module builds representations from characters out of the subword vocabulary, and it can be used as a drop-in replacement of the subword embedding table. The module is robust to character-level alterations such as misspellings, word inflection, casing, and punctuation. We integrate it further with BERT through pre-training while keeping BERT transformer parameters fixed. We show our method's effectiveness by outperforming a vanilla multilingual BERT on the linguistic code-switching evaluation (LinCE) benchmark.
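The core idea of composing a subword's embedding from its characters can be pictured as character vectors pooled into a vector of the embedding table's dimensionality, so the result can act as a drop-in replacement for a table lookup. The mean-pooling and random character vectors below are a toy stand-in for the learned transformer module described in the abstract.

    import random

    random.seed(0)
    DIM = 8
    char_emb = {}   # lazily created character embeddings

    def char_vector(ch):
        if ch not in char_emb:
            char_emb[ch] = [random.gauss(0.0, 1.0) for _ in range(DIM)]
        return char_emb[ch]

    def char2subword(subword):
        # Mean-pool the character vectors; the learned module would attend over them instead.
        vectors = [char_vector(ch) for ch in subword]
        return [sum(vals) / len(vectors) for vals in zip(*vectors)]

    print(char2subword("playing"))   # still yields a vector for "playinng", unlike a fixed lookup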
88. ANLIzing the Adversarial Natural Language Inference Dataset [PDF] 返回目录
Adina Williams, Tristan Thrush, Douwe Kiela
Abstract: We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over multiple rounds. We propose a fine-grained annotation scheme of the different aspects of inference that are responsible for the gold classification labels, and use it to hand-code all three of the ANLI development sets. We use these annotations to answer a variety of interesting questions: which inference types are most common, which models have the highest performance on each reasoning type, and which types are the most challenging for state-of-the-art models? We hope that our annotations will enable more fine-grained evaluation of models trained on ANLI, provide us with a deeper understanding of where models fail and succeed, and help us determine how to train better models in future.
89. Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? [PDF] 返回目录
Peter Shaw, Ming-Wei Chang, Panupong Pasupat, Kristina Toutanova
Abstract: Sequence-to-sequence models excel at handling natural language variation, but have been shown to struggle with out-of-distribution compositional generalization. This has motivated new specialized architectures with stronger compositional biases, but most of these approaches have only been evaluated on synthetically-generated datasets, which are not representative of natural language variation. In this work we ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization? To better assess this capability, we propose new train and test splits of non-synthetic datasets. We demonstrate that strong existing semantic parsing approaches do not yet perform well across a broad set of evaluations. We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model. It outperforms existing approaches across several compositional generalization challenges, while also being competitive with the state-of-the-art on standard evaluations. While still far from solving this problem, our study highlights the importance of diverse evaluations and the open challenge of handling both compositional generalization and natural language variation in semantic parsing.
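The hybrid behaviour described above amounts to a simple fallback policy: use the high-precision grammar-based parser when it yields a derivation, and defer to the pretrained sequence-to-sequence model otherwise. Both parser callables in the sketch are hypothetical stand-ins.

    def hybrid_parse(utterance, grammar_parse, seq2seq_parse):
        parse = grammar_parse(utterance)    # assumed to return None when the grammar has no derivation
        if parse is not None:
            return parse                    # high-precision, compositional path
        return seq2seq_parse(utterance)     # flexible path for natural language variation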
90. Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation [PDF] 返回目录
Yuning Mao, Xiang Ren, Heng Ji, Jiawei Han
Abstract: Summaries generated by abstractive summarization are supposed to only contain statements entailed by the source documents. However, state-of-the-art abstractive methods are still prone to hallucinate content inconsistent with the source documents. In this paper, we propose constrained abstractive summarization (CAS), a general setup that preserves the factual consistency of abstractive summarization by specifying tokens as constraints that must be present in the summary. We explore the feasibility of using lexically constrained decoding, a technique applicable to any abstractive method with beam search decoding, to fulfill CAS and conduct experiments in two scenarios: (1) Standard summarization without human involvement, where keyphrase extraction is used to extract constraints from source documents; (2) Interactive summarization with human feedback, which is simulated by taking missing tokens in the reference summaries as constraints. Automatic and human evaluations on two benchmark datasets demonstrate that CAS improves the quality of abstractive summaries, especially on factual consistency. In particular, we observe up to 11.2 ROUGE-2 gains when several ground-truth tokens are used as constraints in the interactive summarization scenario.
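Full lexically constrained beam search is involved; the sketch below is a heavily simplified stand-in that merely reranks ordinary beam-search candidates by how many required constraint tokens they cover, breaking ties by model score. It conveys the intent of enforcing constraints, not the actual constrained decoding algorithm.

    def coverage(summary_tokens, required):
        return sum(1 for c in required if c in summary_tokens)

    def pick_constrained(candidates, constraints):
        # candidates: (token list, model score) pairs from ordinary beam search.
        required = set(constraints)
        return max(candidates, key=lambda c: (coverage(c[0], required), c[1]))

    best = pick_constrained(
        [(["the", "plant", "closed"], -1.2),
         (["the", "plant", "closed", "in", "2019"], -1.5)],
        constraints=["2019"])
    print(best)   # the hypothesis that contains the required token "2019" is preferred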
91. Word2vec Conjecture and A Limitative Result [PDF] 返回目录
Falcon Z. Dai
Abstract: Being inspired by the success of word2vec (Mikolov et al., 2013) in capturing analogies, we study the conjecture that analogical relations can be represented by vector spaces. Unlike many previous works that focus on the distributional semantic aspect of word2vec, we study the purely representational question: can all semantic word-word relations be represented by differences (or directions) of vectors? We call this the word2vec conjecture and point out some of its desirable implications. However, we will exhibit a class of relations that cannot be represented in this way, thus falsifying the conjecture and establishing a limitative result for the representability of semantic relations by vector spaces over fields of characteristic 0, e.g., real or complex numbers.
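Stated slightly more formally (this rendering paraphrases the conjecture as described in the abstract and may differ from the paper's exact definition): a relation $R \subseteq V \times V$ is representable by an embedding $f : V \to \mathbb{R}^d$ if there exists a single offset vector $v_R \in \mathbb{R}^d$ such that

    $$ f(b) - f(a) \approx v_R \quad \text{for all } (a, b) \in R, $$

the motivating special case being the familiar analogy arithmetic $f(\mathrm{king}) - f(\mathrm{man}) + f(\mathrm{woman}) \approx f(\mathrm{queen})$.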
92. A Caption Is Worth A Thousand Images: Investigating Image Captions for Multimodal Named Entity Recognition [PDF] 返回目录
Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio
Abstract: Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context. Due to advances in natural language processing (NLP) and computer vision (CV), many neural techniques have been proposed to incorporate images into the NER task. In this work, we conduct a detailed analysis of current state-of-the-art fusion techniques for MNER and describe scenarios where adding information from the image does not always result in boosts in performance. We also study the use of captions as a way to enrich the context for MNER. We provide extensive empirical analysis and an ablation study on three datasets from popular social platforms to expose the situations where the approach is beneficial.
93. Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education [PDF] 返回目录
Debajyoti Datta, Maria Phillips, Jennifer Chiu, Ginger S. Watson, James P. Bywater, Laura Barnes, Donald Brown
Abstract: Machine learning techniques applied to the Natural Language Processing (NLP) component of conversational agent development show promising results for improved accuracy and quality of feedback that a conversational agent can provide. The effort required to develop an educational scenario-specific conversational agent is time-consuming as it requires domain experts to label and annotate noisy data sources such as classroom videos. Previous approaches to modeling annotations have relied on labeling thousands of examples and calculating inter-annotator agreement and majority votes in order to model the necessary scenarios. This method, while proven successful, ignores individual annotator strengths in labeling a data point and under-utilizes examples that do not have a majority vote for labeling. We propose using a multi-task weak supervision method combined with active learning to address these concerns. This approach requires less labeling than traditional methods and shows significant improvements in precision, efficiency, and time requirements over the majority vote method (Ratner 2019). We demonstrate the validity of this method on the Google Jigsaw data set and then propose a scenario to apply this method using the Instructional Quality Assessment (IQA) to define the categories for labeling. We propose using probabilistic modeling of annotator labeling to generate active learning examples to further label the data. Active learning is able to iteratively improve the training performance and accuracy of the original classification model. This approach combines state-of-the-art labeling techniques of weak supervision and active learning to optimize results in the educational domain and could be further used to lessen the data requirements for expanded scenarios within the education domain through transfer learning.
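The contrast between majority voting and a weighted (probabilistic) aggregation of noisy labeling functions can be seen in a few lines. In the sketch below the per-function accuracy weights are assumed to be known; a real label model, like the weak-supervision setup in the paper, estimates them from the data, and the label names are invented.

    from collections import Counter

    def majority_vote(votes):
        # votes: labels from different labeling functions (None = abstain).
        cast = [v for v in votes if v is not None]
        return Counter(cast).most_common(1)[0][0] if cast else None

    def weighted_vote(votes, accuracies):
        # Accuracy-weighted aggregation; here the weights are assumed known.
        scores = {}
        for vote, acc in zip(votes, accuracies):
            if vote is not None:
                scores[vote] = scores.get(vote, 0.0) + acc
        return max(scores, key=scores.get) if scores else None

    votes = ["high_press", "high_press", "low_press", None]
    print(majority_vote(votes))                            # 'high_press': two weak functions agree
    print(weighted_vote(votes, [0.40, 0.45, 0.90, 0.50]))  # 'low_press': one reliable function outweighs them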
94. Learning to Recognize Dialect Features [PDF] 返回目录
Dorottya Demszky, Devyani Sharma, Jonathan H. Clark, Vinodkumar Prabhakaran, Jacob Eisenstein
Abstract: Linguists characterize dialects by the presence, absence, and frequency of dozens of interpretable features. Detecting these features in text has applications to social science and dialectology, and can be used to assess the robustness of natural language processing systems to dialect differences. For most dialects, large-scale annotated corpora for these features are unavailable, making it difficult to train recognizers. Linguists typically define dialect features by providing a small number of minimal pairs, which are paired examples distinguished only by whether the feature is present, while holding everything else constant. In this paper, we present two multitask learning architectures for recognizing dialect features, both based on pretrained transformers. We evaluate these models on two test sets of Indian English, annotated for a total of 22 dialect features. We find these models learn to recognize many features with high accuracy; crucially, a few minimal pairs can be nearly as effective for training as thousands of labeled examples. We also demonstrate the downstream applicability of our dialect feature detection model as a dialect density measure and as a dialect classifier.
95. Graph-Based Universal Dependency Parsing in the Age of the Transformer: What Works, and What Doesn't [PDF] 返回目录
Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn
Abstract: Current state-of-the-art graph-based dependency parsers differ on various dimensions. Among others, these include (a) the choice of pre-trained word embeddings or language models used for representing tokens, (b) training setups performing only parsing or additional tasks such as part-of-speech tagging, and (c) their mechanism of constructing trees or graphs from edge scores. Because of this, it is difficult to estimate the impact of these architectural decisions when comparing parsers. In this paper, we perform a series of experiments on STEPS, a new modular graph-based parser for basic and enhanced Universal Dependencies, analyzing the effects of architectural configurations. We find that pre-trained embeddings have by far the greatest and most clear-cut impact on parser performance. The choice of factorized vs. unfactorized architectures and a multi-task training setup affect parsing accuracy in more subtle ways, depending on target language and output representation (trees vs. graphs). Our parser achieves new state-of-the-art results for a wide range of languages on both basic as well as enhanced Universal Dependencies, using a unified and comparatively simple architecture for both parsing tasks.
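As background for point (c), the toy snippet below builds a dependency structure from an edge-score matrix by greedily picking the best head for each token; graph-based parsers of the kind described here normally use a maximum-spanning-tree decoder (e.g. Chu-Liu/Edmonds) to guarantee a well-formed tree, so this greedy version is illustrative only.

import numpy as np

# scores[h, d] = score of an edge from head h to dependent d; index 0 is the ROOT.
rng = np.random.default_rng(0)
n_tokens = 5
scores = rng.normal(size=(n_tokens + 1, n_tokens + 1))
np.fill_diagonal(scores, -np.inf)      # no self-loops
scores[:, 0] = -np.inf                 # nothing attaches to ROOT as a dependent

# Greedy decoding: every dependent takes its highest-scoring head.
heads = scores[:, 1:].argmax(axis=0)   # one head index per non-ROOT token
print({dep: int(h) for dep, h in enumerate(heads, start=1)})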
96. AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization [PDF] 返回目录
Sayali Kulkarni, Sheide Chammas, Wan Zhu, Fei Sha, Eugene Ie
Abstract: Summarization is the task of compressing source document(s) into coherent and succinct passages. This is a valuable tool to present users with a concise and accurate sketch of the top ranked documents related to their queries. Query-based multi-document summarization (qMDS) addresses this pervasive need, but the research is severely limited due to lack of training and evaluation datasets as existing single-document and multi-document summarization datasets are inadequate in form and scale. We propose a scalable approach called AQuaMuSe to automatically mine qMDS examples from question answering datasets and large document corpora. Our approach is unique in the sense that it can generate a dual dataset -- for both extractive and abstractive summaries. We publicly release a specific instance of an AQuaMuSe dataset with 5,519 query-based summaries, each associated with an average of 6 input documents selected from an index of 355M documents from Common Crawl. Extensive evaluation of the dataset along with baseline summarization model experiments are provided.
97. Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training [PDF] 返回目录
Oshin Agarwal, Heming Ge, Siamak Shakeri, Rami Al-Rfou
Abstract: Generating natural sentences from Knowledge Graph (KG) triples, known as Data-To-Text Generation, is a task with many datasets for which numerous complex systems have been developed. However, no prior work has attempted to perform this generation at scale by converting an entire KG into natural text. In this paper, we verbalize the entire Wikidata KG, and create a KG-Text aligned corpus in the training process. We discuss the challenges in verbalizing an entire KG versus verbalizing smaller datasets. We further show that verbalizing an entire KG can be used to integrate structured and natural language data. In contrast to the many architectures that have been developed to integrate the structural differences between these two sources, our approach converts the KG into the same format as natural text allowing it to be seamlessly plugged into existing natural language systems. We evaluate this approach by augmenting the retrieval corpus in REALM and showing improvements, both on the LAMA knowledge probe and open domain QA.
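The phrase "converts the KG into the same format as natural text" can be illustrated, in a deliberately naive way, by template-based verbalization of triples; the paper trains a neural verbalizer over all of Wikidata, so the hand-written triples and templates below are only a conceptual stand-in.

# Naive template verbalization of KG triples (illustrative only; the paper uses a
# trained text-to-text verbalizer over Wikidata rather than hand-written templates).
triples = [
    ("Marie Curie", "award received", "Nobel Prize in Physics"),
    ("Marie Curie", "place of birth", "Warsaw"),
]
templates = {
    "award received": "{s} received the {o}.",
    "place of birth": "{s} was born in {o}.",
}
corpus = [templates[p].format(s=s, o=o) for s, p, o in triples]
print(" ".join(corpus))
# Marie Curie received the Nobel Prize in Physics. Marie Curie was born in Warsaw.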
98. Dynamic Contextualized Word Embeddings [PDF] 返回目录
Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
Abstract: Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts. Building on prior work on contextualized and dynamic word embeddings, we introduce dynamic contextualized word embeddings that represent words as a function of both linguistic and extralinguistic context. Based on a pretrained language model (PLM), dynamic contextualized word embeddings model time and social space jointly, which makes them attractive for various tasks in the computational social sciences. We highlight potential applications by means of qualitative and quantitative analyses.
99. Robust Document Representations using Latent Topics and Metadata [PDF] 返回目录
Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso
Abstract: Task specific fine-tuning of a pre-trained neural language model using a custom softmax output layer is the de facto approach of late when dealing with document classification problems. This technique is not adequate when labeled examples are not available at training time and when the metadata artifacts in a document must be exploited. We address these challenges by generating document representations that capture both text and metadata artifacts in a task agnostic manner. Instead of traditional auto-regressive or auto-encoding based training, our novel self-supervised approach learns a soft-partition of the input space when generating text embeddings. Specifically, we employ a pre-learned topic model distribution as surrogate labels and construct a loss function based on KL divergence. Our solution also incorporates metadata explicitly rather than just augmenting them with text. The generated document embeddings exhibit compositional characteristics and are directly used by downstream classification tasks to create decision boundaries from a small number of labeled examples, thereby eschewing complicated recognition methods. We demonstrate through extensive evaluation that our proposed cross-model fusion solution outperforms several competitive baselines on multiple datasets.
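The core objective, matching the encoder's soft partition of the input space to a pre-learned topic distribution with a KL-divergence loss, can be written in a few lines. The sketch below shows only that loss term with random stand-ins for both distributions, not the full document encoder or the metadata handling.

import torch
import torch.nn.functional as F

batch, n_topics = 4, 16
logits = torch.randn(batch, n_topics, requires_grad=True)         # encoder's soft partition (stand-in)
topic_dist = torch.softmax(torch.randn(batch, n_topics), dim=-1)  # surrogate labels from a pre-learned topic model

# KL(topic_dist || predicted); F.kl_div expects log-probabilities as its input argument.
loss = F.kl_div(F.log_softmax(logits, dim=-1), topic_dist, reduction="batchmean")
loss.backward()
print(float(loss))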
100. A Differentiable Relaxation of Graph Segmentation and Alignment for AMR Parsing [PDF] 返回目录
Chunchuan Lyu, Shay B. Cohen, Ivan Titov
Abstract: Abstract Meaning Representations (AMR) are a broad-coverage semantic formalism which represents sentence meaning as a directed acyclic graph. To train most AMR parsers, one needs to segment the graph into subgraphs and align each such subgraph to a word in a sentence; this is normally done at preprocessing, relying on hand-crafted rules. In contrast, we treat both alignment and segmentation as latent variables in our model and induce them as part of end-to-end training. As marginalizing over the structured latent variables is infeasible, we use the variational autoencoding framework. To ensure end-to-end differentiable optimization, we introduce a continuous differentiable relaxation of the segmentation and alignment problems. We observe that inducing segmentation yields substantial gains over using a `greedy' segmentation heuristic. The performance of our method also approaches that of a model that relies on \citet{Lyu2018AMRPA}'s segmentation rules, which were hand-crafted to handle individual AMR constructions.
101. Overcoming Conflicting Data for Model Updates [PDF] 返回目录
David Gaddy, Alex Kouzemtchenko, Pavan Kumar Reddy, Prateek Kolhar, Rushin Shah
Abstract: In this paper, we explore how to use a small amount of new data to update a model when the desired output for some examples has changed. When making updates in this way, one potential problem that arises is the presence of conflicting data, or out-of-date labels in the original training set. To evaluate the impact of this problem, we propose an experimental setup for simulating changes to a neural semantic parser. We show that the presence of conflicting data greatly hinders learning of an update, then explore several methods to mitigate its effect. Our methods lead to large improvements in model accuracy compared to a naive mixing strategy, and our best method closes 86% of the accuracy gap between this baseline and an oracle upper bound.
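To make "conflicting data" concrete: when a small update set is mixed with the original training data, any old example whose input now has a different desired output is out of date. The toy snippet below applies the simplest possible mitigation, dropping such examples before mixing; the paper studies stronger methods, so this is only a baseline-style illustration with invented intent labels.

# Toy illustration: drop old examples whose label conflicts with the update set before mixing.
old_data = [("play jazz", "music.play"), ("call mom", "phone.call"), ("play chess", "game.start")]
new_data = [("play jazz", "audio.play")]        # the desired output for this input has changed

updated = dict(new_data)
filtered_old = [(x, y) for x, y in old_data if updated.get(x, y) == y]
print(filtered_old + new_data)
# [('call mom', 'phone.call'), ('play chess', 'game.start'), ('play jazz', 'audio.play')]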
102. On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer [PDF] 返回目录
Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
Abstract: Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) for the purpose of the external language model (LM) fusion. In HAT, the blank probability and the label probability are estimated using two separate probability distributions, which provides a more accurate solution for internal LM score estimation, and thus works better when combining with an external LM. Previous work mainly focuses on HAT model training with the negative log-likelihood loss, while in this paper, we study the minimum word error rate (MWER) training of HAT -- a criterion that is closer to the evaluation metric for speech recognition, and has been successfully applied to other types of end-to-end models such as sequence-to-sequence (S2S) and RNN-T models. From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models, while at the same time, improving the robustness of the model against the decoding hyper-parameters such as length normalization and decoding beam during inference.
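Minimum word error rate training minimizes the expected number of word errors over an N-best list, with hypothesis posteriors renormalized over that list. The sketch below computes the loss for one utterance from toy hypothesis scores; the hypotheses, scores, and mean-error baseline follow the usual MWER recipe rather than any detail taken from the paper.

import torch

def word_errors(hyp, ref):
    """Levenshtein distance between word sequences."""
    h, r = hyp.split(), ref.split()
    d = [[i + j if i * j == 0 else 0 for j in range(len(r) + 1)] for i in range(len(h) + 1)]
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (h[i - 1] != r[j - 1]))
    return d[len(h)][len(r)]

ref = "turn off the light"
nbest = ["turn off the light", "turn of the light", "turn off light"]      # toy N-best list
log_probs = torch.tensor([-1.2, -1.5, -2.0], requires_grad=True)           # model scores (stand-ins)

errs = torch.tensor([float(word_errors(h, ref)) for h in nbest])
post = torch.softmax(log_probs, dim=0)               # posteriors renormalized over the N-best list
mwer_loss = torch.sum(post * (errs - errs.mean()))   # expected word errors, mean-error baseline subtracted
mwer_loss.backward()
print(float(mwer_loss))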
103. Generating Adequate Distractors for Multiple-Choice Questions [PDF] 返回目录
Cheng Zhang, Yicheng Sun, Hejia Chen, Jie Wang
Abstract: This paper presents a novel approach to automatic generation of adequate distractors for a given question-answer pair (QAP) generated from a given article to form an adequate multiple-choice question (MCQ). Our method is a combination of part-of-speech tagging, named-entity tagging, semantic-role labeling, regular expressions, domain knowledge bases, word embeddings, word edit distance, WordNet, and other algorithms. We use the US SAT (Scholastic Assessment Test) practice reading tests as a dataset to produce QAPs and generate three distractors for each QAP to form an MCQ. We show that, via experiments and evaluations by human judges, each MCQ has at least one adequate distractor and 84\% of MCQs have three adequate distractors.
104. Rapid Domain Adaptation for Machine Translation with Monolingual Data [PDF] 返回目录
Mahdis Mahdieh, Mia Xu Chen, Yuan Cao, Orhan Firat
Abstract: One challenge of machine translation is how to quickly adapt to unseen domains in face of surging events like COVID-19, in which case timely and accurate translation of in-domain information into multiple languages is critical but little parallel data is available yet. In this paper, we propose an approach that enables rapid domain adaptation from the perspective of unsupervised translation. Our proposed approach only requires in-domain monolingual data and can be quickly applied to a preexisting translation system trained on general domain, reaching significant gains on in-domain translation quality with little or no drop on general-domain. We also propose an effective procedure of simultaneous adaptation for multiple domains and languages. To the best of our knowledge, this is the first attempt that aims to address unsupervised multilingual domain adaptation.
105. Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering [PDF] 返回目录
Arij Riabi, Thomas Scialom, Rachel Keraron, Benoît Sagot, Djamé Seddah, Jacopo Staiano
Abstract: Coupled with the availability of large scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performances of state-of-the-art multilingual models are significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one desires to support. We propose a method to improve the Cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. We show that the proposed method allows to significantly outperform the baselines trained on English data only. We report a new state-of-the-art on four multilingual datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr).
106. Posterior Differential Regularization with f-divergence for Improving Model Robustness [PDF] 返回目录
Hao Cheng, Xiaodong Liu, Lis Pereira, Yaoliang Yu, Jianfeng Gao
Abstract: We address the problem of enhancing model robustness through regularization. Specifically, we focus on methods that regularize the model posterior difference between clean and noisy inputs. Theoretically, we provide a connection of two recent methods, Jacobian Regularization and Virtual Adversarial Training, under this framework. Additionally, we generalize the posterior differential regularization to the family of $f$-divergences and characterize the overall regularization framework in terms of Jacobian matrix. Empirically, we systematically compare those regularizations and standard BERT training on a diverse set of tasks to provide a comprehensive profile of their effect on model in-domain and out-of-domain generalization. For both fully supervised and semi-supervised settings, our experiments show that regularizing the posterior differential with $f$-divergence can result in well-improved model robustness. In particular, with a proper $f$-divergence, a BERT-base model can achieve comparable generalization as its BERT-large counterpart for in-domain, adversarial and domain shift scenarios, indicating the great potential of the proposed framework for boosting model generalization for NLP models.
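Concretely, posterior differential regularization penalizes the divergence between the model's output distributions on a clean input and a perturbed copy of it. The snippet below adds a KL term (one member of the f-divergence family) to a toy classifier's task loss; the noise model, classifier, stop-gradient on the clean posterior, and weighting are placeholder choices, not the paper's setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
clf = nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 3))   # toy classifier
x = torch.randn(8, 32)                        # stand-in for clean input representations
y = torch.randint(0, 3, (8,))

logits_clean = clf(x)
logits_noisy = clf(x + 0.1 * torch.randn_like(x))   # perturbed copy of the input

task_loss = F.cross_entropy(logits_clean, y)
# KL(p_clean || p_noisy); detaching the clean posterior is one common (assumed) variant.
reg = F.kl_div(F.log_softmax(logits_noisy, dim=-1),
               F.softmax(logits_clean, dim=-1).detach(), reduction="batchmean")
loss = task_loss + 1.0 * reg                  # the weight 1.0 is an arbitrary placeholder
loss.backward()
print(float(task_loss), float(reg))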
107. Domain Specific Complex Sentence (DCSC) Semantic Similarity Dataset [PDF] 返回目录
Dhivya Chandrasekaran, Vijay Mago
Abstract: Semantic textual similarity is one of the open research challenges in the field of Natural Language Processing. Extensive research has been carried out in this field and near-perfect results are achieved by recent transformed based models in existing benchmark datasets like STS dataset and SICK dataset. In this paper, we study the sentences in these datasets and analyze the sensitivity of various word embeddings with respect to the complexity of the sentences. We propose a new benchmark dataset -- the Domain Specific Complex Sentences (DSCS) dataset comprising of 50 sentence pairs with associated semantic similarity values provided by 15 human annotators. Readability analysis is performed to highlight the increase in complexity of the sentences in the existing benchmark datasets and those in the proposed dataset. Further, we perform a comparative analysis of the performance of various word embeddings and the results justify the hypothesis that the performance of the word embeddings decrease with an increase in complexity of the sentences.
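The comparative analysis described, checking how well embedding-based similarity tracks human judgments, reduces to a few steps: average word vectors per sentence, take the cosine, and correlate the predictions with annotated scores using Spearman's rho. Everything in the sketch below (vectors, sentence pairs, gold scores) is toy data rather than the proposed dataset.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
vocab = "the model estimates aquifer depletion rates using satellite data".split()
vecs = {w: rng.normal(size=50) for w in vocab}       # toy word vectors

def sent_vec(s):
    return np.mean([vecs[w] for w in s.split()], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

pairs = [("the model estimates depletion rates", "the model estimates aquifer depletion rates using satellite data"),
         ("satellite data", "aquifer depletion"),
         ("the model estimates rates", "the model estimates aquifer depletion rates")]
gold = [4.2, 1.3, 4.6]                               # hypothetical human similarity scores

pred = [cosine(sent_vec(a), sent_vec(b)) for a, b in pairs]
rho, _ = spearmanr(pred, gold)
print(pred, rho)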
108. Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text-to-SQL [PDF] 返回目录
Yusen Zhang, Xiangyu Dong, Shuaichen Chang, Tao Yu, Peng Shi, Rui Zhang
Abstract: Neural models have achieved significant results on the text-to-SQL task, in which most current work assumes all the input questions are legal and generates a SQL query for any input. However, in the real scenario, users can input any text that may not be able to be answered by a SQL query. In this work, we propose TriageSQL, the first cross-domain text-to-SQL question intention classification benchmark that requires models to distinguish four types of unanswerable questions from answerable questions. The baseline RoBERTa model achieves a 60% F1 score on the test set, demonstrating the need for further improvement on this task. Our dataset is available at this https URL.
109. Anchor-based Bilingual Word Embeddings for Low-Resource Languages [PDF] 返回目录
Tobias Eder, Viktor Hangya, Alexander Fraser
Abstract: Bilingual word embeddings (BWEs) are useful for many cross-lingual applications, such as bilingual lexicon induction (BLI) and cross-lingual transfer learning. While recent methods have led to good quality BWEs for different language pairs using only weak bilingual signals, they still rely on an abundance of monolingual training data in both languages for their performance. This becomes a problem especially in the case of low resource languages where neither parallel bilingual corpora nor large monolingual training data are available. This paper proposes a new approach for building BWEs in which the vector space of the high resource source language is used as a starting point for training an embedding space for the low resource target language. By using the source vectors as anchors the vector spaces are automatically aligned. We evaluate the resulting BWEs on BLI and show the proposed method outperforms previous approaches in the low-resource setting by a large margin. We show strong results on the standard English-German test pair (using German to simulate low resource). We also show we can build useful BWEs for English-Hiligaynon, a true low-resource language, where previous approaches failed.
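The central idea, reusing high-resource source vectors as fixed anchors while training the low-resource target space, can be sketched as an embedding matrix whose rows for seed-dictionary words are copied from the source vectors and kept frozen so the two spaces stay aligned. The seed dictionary, dimensionality, and freezing-by-gradient-masking below are illustrative assumptions, not the paper's exact training procedure.

import torch
import torch.nn as nn

dim = 100
src_vecs = {"water": torch.randn(dim), "house": torch.randn(dim)}   # pretrained source vectors (stand-ins)
tgt_vocab = ["tubig", "balay", "adlaw", "gabii"]                    # target-language vocabulary
seed_dict = {"tubig": "water", "balay": "house"}                    # small bilingual seed dictionary

emb = nn.Embedding(len(tgt_vocab), dim)
anchor = torch.zeros(len(tgt_vocab), dtype=torch.bool)
with torch.no_grad():
    for i, w in enumerate(tgt_vocab):
        if w in seed_dict:                    # anchor row: copy the source-language vector
            emb.weight[i] = src_vecs[seed_dict[w]]
            anchor[i] = True

# Zero the gradient on anchor rows during training so they stay fixed; the rest of the
# target space is then learned around these anchors and ends up aligned with the source space.
emb.weight.register_hook(lambda g: g.masked_fill(anchor.unsqueeze(1), 0.0))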
110. Topic Modeling with Contextualized Word Representation Clusters [PDF] 返回目录
Laure Thompson, David Mimno
Abstract: Clustering token-level contextualized word representations produces output that shares many similarities with topic models for English text collections. Unlike clusterings of vocabulary-level word embeddings, the resulting models more naturally capture polysemy and can be used as a way of organizing documents. We evaluate token clusterings trained from several different output layers of popular contextualized language models. We find that BERT and GPT-2 produce high quality clusterings, but RoBERTa does not. These cluster models are simple, reliable, and can perform as well as, if not better than, LDA topic models, maintaining high topic quality even when the number of topics is large relative to the size of the local collection.
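The recipe is essentially: run a pretrained model over a small corpus, keep one vector per token, cluster the vectors, and read each cluster as a topic through its most frequent tokens. A compressed version is sketched below; the choice of model, output layer, corpus, and number of clusters is arbitrary here.

from collections import Counter
import torch
from sklearn.cluster import KMeans
from transformers import AutoTokenizer, AutoModel

docs = ["the bank raised interest rates", "the river bank flooded after the storm",
        "stocks fell as rates climbed", "heavy rain flooded the valley"]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

vecs, words = [], []
with torch.no_grad():
    for d in docs:
        batch = tok(d, return_tensors="pt")
        hidden = model(**batch).last_hidden_state[0]              # (seq_len, hidden_size)
        for tid, vec in zip(batch["input_ids"][0], hidden):
            w = tok.convert_ids_to_tokens(int(tid))
            if w not in ("[CLS]", "[SEP]"):
                vecs.append(vec.numpy())
                words.append(w)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vecs)
for k in range(3):                                                # read each cluster as a "topic"
    print(k, Counter(w for w, l in zip(words, labels) if l == k).most_common(5))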
111. Unsupervised Multi-hop Question Answering by Question Generation [PDF] 返回目录
Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang
Abstract: Obtaining training data for Multi-hop Question Answering (QA) is extremely time-consuming and resource-intensive. To address this, we propose the problem of \textit{unsupervised} multi-hop QA, assuming that no human-labeled multi-hop question-answer pairs are available. We propose MQA-QG, an unsupervised question answering framework that can generate human-like multi-hop training pairs from both homogeneous and heterogeneous data sources. Our model generates questions by first selecting or generating relevant information from each data source and then integrating the multiple information to form a multi-hop question. We find that we can train a competent multi-hop QA model with only generated data. The F1 gap between the unsupervised and fully-supervised models is less than 20 in both the HotpotQA and the HybridQA dataset. Further experiments reveal that an unsupervised pretraining with the QA data generated by our model would greatly reduce the demand for human-annotated training data for multi-hop QA.
112. Ranking Creative Language Characteristics in Small Data Scenarios [PDF] 返回目录
Julia Siekiera, Marius Köppel, Edwin Simpson, Kevin Stowe, Iryna Gurevych, Stefan Kramer
Abstract: The ability to rank creative natural language provides an important general tool for downstream language understanding and generation. However, current deep ranking models require substantial amounts of labeled data that are difficult and expensive to obtain for different domains, languages and creative characteristics. A recent neural approach, the DirectRanker, promises to reduce the amount of training data needed but its application to text isn't fully explored. We therefore adapt the DirectRanker to provide a new deep model for ranking creative language with small data. We compare DirectRanker with a Bayesian approach, Gaussian process preference learning (GPPL), which has previously been shown to work well with sparse data. Our experiments with sparse training data show that while the performance of standard neural ranking approaches collapses with small training datasets, DirectRanker remains effective. We find that combining DirectRanker with GPPL increases performance across different settings by leveraging the complementary benefits of both models. Our combined approach outperforms the previous state-of-the-art on humor and metaphor novelty tasks, increasing Spearman's $\rho$ by 14% and 16% on average.
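As background for the ranking models compared here, a pairwise neural ranker scores two texts with a shared network and is trained on which of the two should rank higher. The toy sketch below uses generic feature vectors and a RankNet-style logistic loss, which captures the flavour of this setup without reproducing DirectRanker's exact antisymmetric output layer.

import torch
import torch.nn as nn

torch.manual_seed(0)
scorer = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))   # shared scoring network
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Toy pairwise data: feature vectors for texts A and B, label 1.0 if A should rank higher.
a, b = torch.randn(64, 20), torch.randn(64, 20)
label = (a.sum(dim=1) > b.sum(dim=1)).float()     # synthetic preference labels

for step in range(200):
    diff = scorer(a).squeeze(-1) - scorer(b).squeeze(-1)   # antisymmetric in (a, b)
    loss = bce(diff, label)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))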
113. Enabling Efficient Cyber Threat Hunting With Cyber Threat Intelligence [PDF] 返回目录
Peng Gao, Fei Shao, Xiaoyuan Liu, Xusheng Xiao, Zheng Qin, Fengyuan Xu, Prateek Mittal, Sanjeev R. Kulkarni, Dawn Song
Abstract: Log-based cyber threat hunting has emerged as an important solution to counter sophisticated cyber attacks. However, existing approaches require non-trivial efforts of manual query construction and have overlooked the rich external knowledge about threat behaviors provided by open-source Cyber Threat Intelligence (OSCTI). To bridge the gap, we propose EffHunter, a system that facilitates cyber threat hunting in computer systems using OSCTI. Built upon mature system auditing frameworks, EffHunter provides (1) an unsupervised, light-weight, and accurate NLP pipeline that extracts structured threat behaviors from unstructured OSCTI text, (2) a concise and expressive domain-specific query language, TBQL, to hunt for malicious system activities, (3) a query synthesis mechanism that automatically synthesizes a TBQL query for threat hunting from the extracted threat behaviors, and (4) an efficient query execution engine to search the big audit logging data. Evaluations on a broad set of attack cases demonstrate the accuracy and efficiency of EffHunter in enabling practical threat hunting.
114. Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020 [PDF] 返回目录
Henry Turner, Giulio Lovisotto, Ivan Martinovic
Abstract: In this paper, we present a Distribution-Preserving Voice Anonymization technique, as our submission to the VoicePrivacy Challenge 2020. We notice that the challenge baseline system generates fake X-vectors which are very similar to each other, significantly more so than those extracted from organic speakers. This difference arises from averaging many X-vectors from a pool of speakers in the anonymization processs, causing a loss of information. We propose a new method to generate fake X-vectors which overcomes these limitations by preserving the distributional properties of X-vectors and their intra-similarity. We use population data to learn the properties of the X-vector space, before fitting a generative model which we use to sample fake X-vectors. We show how this approach generates X-vectors that more closely follow the expected intra-similarity distribution of organic speaker X-vectors. Our method can be easily integrated with others as the anonymization component of the system and removes the need to distribute a pool of speakers to use during the anonymization. Our approach leads to an increase in EER of up to 16.8\% in males and 8.4\% in females in scenarios where enrollment and trial utterances are anonymized versus the baseline solution, demonstrating the diversity of our generated voices.
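One simple way to "learn the properties of the X-vector space" and then sample fake X-vectors is to fit a density model to the pool of real vectors and draw new points from it. The sketch below uses a Gaussian mixture purely as an illustration; the paper's actual generative model and real (roughly 512-dimensional) X-vectors are not reproduced here.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
real_xvectors = rng.normal(size=(500, 32))      # stand-in for real speaker X-vectors

gm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(real_xvectors)
fake_xvectors, _ = gm.sample(10)                # draw fake speakers from the learned distribution
print(fake_xvectors.shape)                      # (10, 32)
# Sampling from a model of the real distribution keeps the fakes spread out like organic
# speakers instead of collapsing toward a single averaged voice.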
115. Improved Mask-CTC for Non-Autoregressive End-to-End ASR [PDF] Back to contents
Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi
Abstract: For real-world deployment of automatic speech recognition (ASR), the system should be capable of fast inference while keeping computational requirements low. The recently proposed end-to-end ASR system based on mask-predict with connectionist temporal classification (CTC), Mask-CTC, fulfills this demand by generating tokens in a non-autoregressive fashion. While Mask-CTC achieves remarkably fast inference speed, its recognition performance falls behind that of conventional autoregressive (AR) systems. To boost the performance of Mask-CTC, we first propose to enhance the encoder network by employing the recently proposed Conformer architecture. Next, we propose new training and decoding methods by introducing an auxiliary objective to predict the length of a partial target sequence, which allows the model to delete or insert tokens during inference. Experimental results on different ASR tasks show that the proposed approaches improve Mask-CTC significantly, outperforming a standard CTC model (15.5% $\rightarrow$ 9.1% WER on WSJ). Moreover, Mask-CTC now achieves results competitive with AR models with no degradation of inference speed ($<$ 0.1 RTF using a CPU). We also show a potential application of Mask-CTC to end-to-end speech translation.
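The non-autoregressive decoding that the paper improves can be sketched as a mask-predict refinement loop over the greedy CTC output. In the toy version below, the masked language model is a random stand-in (`model_probs`), the vocabulary and thresholds are invented, and the paper's auxiliary length prediction, which lets the decoder insert or delete tokens inside masked spans, is only indicated by a comment rather than implemented.

```python
# Toy sketch of Mask-CTC style decoding: keep confident CTC tokens,
# mask the rest, and refill them iteratively with an MLM decoder.
import numpy as np

VOCAB, MASK = 10, -1
rng = np.random.default_rng(0)

def model_probs(tokens):
    """Stand-in for the masked-LM decoder: per-position vocab distributions."""
    p = rng.random((len(tokens), VOCAB))
    return p / p.sum(axis=1, keepdims=True)

def mask_predict_decode(ctc_tokens, ctc_conf, threshold=0.6, iterations=3):
    # Start from the greedy CTC output and mask low-confidence positions.
    tokens = [t if c >= threshold else MASK for t, c in zip(ctc_tokens, ctc_conf)]
    n_masked = tokens.count(MASK)
    for _ in range(iterations):
        if MASK not in tokens:
            break
        probs = model_probs(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Easy-first: commit the most confident masked predictions this round.
        k = -(-n_masked // iterations)  # ceil division
        for i in sorted(masked, key=lambda j: probs[j].max(), reverse=True)[:k]:
            tokens[i] = int(probs[i].argmax())
        # In Improved Mask-CTC, an auxiliary length head would be consulted
        # here to insert or delete tokens inside the remaining masked spans.
    return tokens

print(mask_predict_decode(ctc_tokens=[3, 7, 2, 9, 1],
                          ctc_conf=[0.9, 0.4, 0.8, 0.3, 0.95]))
```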
116. ExplanationLP: Abductive Reasoning for Explainable Science Question Answering [PDF] Back to contents
Mokanarangan Thayaparan, Marco Valentino, André Freitas
Abstract: We propose a novel approach for answering and explaining multiple-choice science questions by reasoning on grounding and abstract inference chains. This paper frames question answering as an abductive reasoning problem, constructing plausible explanations for each choice and then selecting the candidate with the best explanation as the final answer. Our system, ExplanationLP, elicits explanations by constructing a weighted graph of relevant facts for each candidate answer and extracting the facts that satisfy certain structural and semantic constraints. To extract the explanations, we employ a linear programming formalism designed to select the optimal subgraph. The graphs' weighting function is composed of a set of parameters, which we fine-tune to optimize answer selection performance. We carry out our experiments on the WorldTree and ARC-Challenge corpus to empirically demonstrate the following conclusions: (1) grounding-abstract inference chains provide the semantic control needed to perform explainable abductive reasoning; (2) the approach learns efficiently and robustly with fewer parameters, outperforming contemporary explainable and transformer-based approaches in a similar setting; and (3) it generalises, outperforming SOTA explainable approaches on general science question sets.
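As a rough illustration of the selection step, the sketch below scores candidate fact subsets with a relevance-minus-redundancy objective under a size budget; the facts, scores, and weights are invented, and a brute-force search stands in for the paper's linear-programming solver purely to keep the example dependency-free.

```python
# Toy stand-in for explanation-subgraph selection: pick the subset of facts
# that maximises weighted relevance minus pairwise redundancy.
from itertools import combinations

facts = ["grounding fact A", "grounding fact B", "abstract fact C", "abstract fact D"]
relevance = [0.9, 0.4, 0.8, 0.3]          # fact-to-question/answer similarity (invented)
overlap = {(0, 1): 0.7, (0, 2): 0.1, (0, 3): 0.2,
           (1, 2): 0.3, (1, 3): 0.1, (2, 3): 0.6}   # pairwise overlap between facts
w_rel, w_red, budget = 1.0, 0.5, 3        # weights of the kind the paper fine-tunes

def explanation_score(subset):
    rel = sum(relevance[i] for i in subset)
    red = sum(overlap[pair] for pair in combinations(subset, 2))
    return w_rel * rel - w_red * red

candidates = (s for k in range(1, budget + 1)
              for s in combinations(range(len(facts)), k))
best = max(candidates, key=explanation_score)
print([facts[i] for i in best], round(explanation_score(best), 2))
```

In the paper this combinatorial choice is instead relaxed into a linear program, and the edge and node weights are the parameters that get fine-tuned for answer selection.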
117. Neural Code Completion with Anonymized Variable Names [PDF] Back to contents
Nadezhda Chirkova
Abstract: Source code processing heavily relies on the methods widely used in natural language processing (NLP), but involves specifics that need to be taken into account to achieve higher quality. An example of this specificity is that renaming variables does not change the semantics of what the code does. In this work, we develop a recurrent architecture that processes code with all variable names anonymized, i.e., replaced with unique placeholders. The proposed architecture outperforms standard NLP baselines on the code completion task by a large margin in the anonymized setting, and improves the base model in the non-anonymized setting when ensembled with it.
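The preprocessing step can be sketched directly with Python's `ast` module; the placeholder scheme (`VAR_i`) and the choice to leave builtins untouched are illustrative assumptions, not the paper's exact specification.

```python
# Minimal sketch: replace every distinct variable name with a unique
# placeholder before feeding code to the completion model. Renaming does
# not change what the code does, which is the invariance the paper exploits.
import ast
import builtins

class Anonymizer(ast.NodeTransformer):
    def __init__(self):
        self.mapping = {}

    def visit_Name(self, node):
        if node.id in dir(builtins):
            return node  # keep names such as print/len readable (assumption)
        if node.id not in self.mapping:
            self.mapping[node.id] = f"VAR_{len(self.mapping)}"
        node.id = self.mapping[node.id]
        return node

source = "total = 0\nfor item in items:\n    total += len(item)\nprint(total)"
tree = Anonymizer().visit(ast.parse(source))
print(ast.unparse(tree))  # total/item/items become VAR_0/VAR_1/VAR_2
```

The recorded mapping can be inverted afterwards to surface the original names in completions (`ast.unparse` requires Python 3.9+).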
118. Long Document Ranking with Query-Directed Sparse Transformer [PDF] Back to contents
Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang
Abstract: The computing cost of transformer self-attention often necessitates breaking long documents to fit in pretrained models in document ranking tasks. In this paper, we design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention. Our model, QDS-Transformer, enforces the principle properties desired in ranking: local contextualization, hierarchical representation, and query-oriented proximity matching, while it also enjoys efficiency from sparsity. Experiments on one fully supervised and three few-shot TREC document ranking benchmarks demonstrate the consistent and robust advantage of QDS-Transformer over previous approaches, as they either retrofit long documents into BERT or use sparse attention without emphasizing IR principles. We further quantify the computing complexity and demonstrate that our TVM implementation of the sparse attention is twice as efficient as fully-connected self-attention. All source code, trained models, and predictions of this work are available at this https URL.
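The flavour of such a sparsity pattern can be illustrated with an attention mask that combines a local sliding window over document tokens with query tokens that attend, and are attended to, globally; the window size and layout below are assumptions for illustration, not the paper's exact configuration.

```python
# Illustrative query-directed sparse attention mask: local windows for the
# document, global attention for the query tokens.
import numpy as np

def qds_attention_mask(n_query, n_doc, window=2):
    n = n_query + n_doc
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_query, :] = True                      # query tokens attend everywhere
    mask[:, :n_query] = True                      # every token attends to the query
    for i in range(n_query, n):                   # sliding window over the document
        lo, hi = max(n_query, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True
    return mask

m = qds_attention_mask(n_query=2, n_doc=8, window=2)
print(m.astype(int))
print("attention density:", m.mean())             # well below 1.0 for long inputs
```

With long documents the number of allowed attention pairs grows roughly linearly in document length rather than quadratically, which is where the efficiency gain comes from.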
119. The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation [PDF] Back to contents
Shurjo Banerjee, Jesse Thomason, Jason J. Corso
Abstract: Autonomous robot systems for applications from search and rescue to assistive guidance should be able to engage in natural language dialog with people. To study such cooperative communication, we introduce Robot Simultaneous Localization and Mapping with Natural Language (RobotSlang), a benchmark of 169 natural language dialogs between a human Driver controlling a robot and a human Commander providing guidance towards navigation goals. In each trial, the pair first cooperates to localize the robot on a global map visible to the Commander, then the Driver follows Commander instructions to move the robot to a sequence of target objects. We introduce a Localization from Dialog History (LDH) and a Navigation from Dialog History (NDH) task where a learned agent is given dialog and visual observations from the robot platform as input and must localize in the global map or navigate towards the next target object, respectively. RobotSlang is comprised of nearly 5k utterances and over 1k minutes of robot camera and control streams. We present an initial model for the NDH task, and show that an agent trained in simulation can follow the RobotSlang dialog-based navigation instructions for controlling a physical robot platform. Code and data are available at this https URL.
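Purely to illustrate the NDH setup, the snippet below sketches an agent that maps the dialog history and the current camera observation to a navigation action; the field names and action set are hypothetical placeholders rather than the benchmark's actual data format or API.

```python
# Hypothetical NDH-style interface: dialog history + observation -> action.
from dataclasses import dataclass
from typing import List

ACTIONS = ["forward", "turn_left", "turn_right", "stop"]

@dataclass
class Step:
    dialog_history: List[str]   # Commander/Driver utterances so far
    observation: bytes          # current robot camera frame

def baseline_policy(step: Step) -> str:
    """Trivial stand-in policy: walk forward until the Commander says stop."""
    if step.dialog_history and "stop" in step.dialog_history[-1].lower():
        return "stop"
    return "forward"

print(baseline_policy(Step(["Go to the blue cup.", "Okay, moving."], b"")))
```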