Contents
8. Stable Style Transformer: Delete and Generate Approach with Encoder-Decoder for Text Style Transfer [PDF] Abstract
9. Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour [PDF] Abstract
14. ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020 [PDF] Abstract
15. Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model [PDF] Abstract
16. How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning [PDF] Abstract
17. Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [PDF] Abstract
18. KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension And Generation [PDF] Abstract
22. When does MAML Work the Best? An Empirical Study on Model-Agnostic Meta-Learning in NLP Applications [PDF] Abstract
25. Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge [PDF] Abstract
26. A Question Type Driven and Copy Loss Enhanced Framework for Answer-Agnostic Neural Question Generation [PDF] Abstract
27. Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding [PDF] Abstract
28. From Witch's Shot to Music Making Bones -- Resources for Medical Laymen to Technical Language and Vice Versa [PDF] Abstract
29. Transformer-based Context-aware Sarcasm Detection in Conversation Threads from Social Media [PDF] Abstract
32. SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding [PDF] Abstract
35. An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling [PDF] Abstract
36. Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection [PDF] Abstract
38. Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques [PDF] Abstract
39. COVID-19 Public Opinion and Emotion Monitoring System Based on Time Series Thermal New Word Mining [PDF] Abstract
Abstracts
1. Demoting Racial Bias in Hate Speech Detection [PDF] Back to Contents
Mengzhou Xia, Anjalie Field, Yulia Tsvetkov
Abstract: In current hate speech datasets, there exists a high correlation between annotators' perceptions of toxicity and signals of African American English (AAE). This bias in annotated training data and the tendency of machine learning models to amplify it cause AAE text to often be mislabeled as abusive/offensive/hate speech with a high false positive rate by current hate speech classifiers. In this paper, we use adversarial training to mitigate this bias, introducing a hate speech classifier that learns to detect toxic sentences while demoting confounds corresponding to AAE texts. Experimental results on a hate speech dataset and an AAE dataset suggest that our method is able to substantially reduce the false positive rate for AAE text while only minimally affecting the performance of hate speech classification.
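The demotion setup can be pictured as a main toxicity head plus an adversarial dialect head behind a gradient-reversal layer. The PyTorch sketch below illustrates that general idea only; it is not the authors' exact architecture, and the encoder, label spaces, and lambda are placeholder assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DemotingClassifier(nn.Module):
    def __init__(self, encoder, hidden=768, lambd=1.0):
        super().__init__()
        self.encoder = encoder                     # any text encoder -> (batch, hidden)
        self.toxicity_head = nn.Linear(hidden, 2)  # main task: toxic vs. not
        self.dialect_head = nn.Linear(hidden, 2)   # adversary: AAE vs. not (assumed labels)
        self.lambd = lambd

    def forward(self, x):
        h = self.encoder(x)
        # The reversed gradient pushes the encoder to discard dialect signal
        # while the adversary tries to recover it.
        return self.toxicity_head(h), self.dialect_head(GradReverse.apply(h, self.lambd))
```

Training minimizes the sum of both cross-entropy losses; the reversal makes the dialect loss act as a penalty on encoder features that encode AAE.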
2. A review of sentiment analysis research in Arabic language [PDF] Back to Contents
Oumaima Oueslati, Erik Cambria, Moez Ben HajHmida, Habib Ounelli
Abstract: Sentiment analysis is a task of natural language processing which has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is ramping up as one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language.
3. AMR quality rating with a lightweight CNN [PDF] Back to Contents
Juri Opitz
Abstract: Structured semantic sentence representations such as Abstract Meaning Representations (AMRs) are potentially useful in a variety of natural language processing tasks. However, the quality of automatic parses can vary greatly and jeopardizes their usefulness. Therefore, we require systems that can accurately rate AMR quality in the absence of costly gold data. To achieve this, we transfer the AMR graph to the domain of images. This allows us to create a simple convolutional neural network (CNN) that imitates a human rater. In our experiments, we show that the method can rate the quality of AMR graphs more accurately than a strong baseline, with respect to several dimensions of interest. Furthermore, the method proves to be more efficient as it reduces the incurred energy consumption.
4. Adapting End-to-End Speech Recognition for Readable Subtitles [PDF] Back to Contents
Danni Liu, Jan Niehues, Gerasimos Spanakis
Abstract: Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task challenging for supervised approaches due to the scarcity of training data. We first investigate a cascaded system, where an unsupervised compression model is used to post-edit the transcribed speech. We then compare several methods of end-to-end speech recognition under output length constraints. The experiments show that with limited data far less than needed for training a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities. Furthermore, the best performance in terms of WER and ROUGE scores is achieved by explicitly modeling the length constraints within the end-to-end ASR system.
5. An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering [PDF] Back to Contents
Chia-Chih Kuo, Shang-Bao Luo, Kuan-Yu Chen
Abstract: In a spoken multiple-choice question answering (SMCQA) task, given a passage, a question, and multiple choices all in the form of speech, the machine needs to pick the correct choice to answer the question. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is utilized in system development. Thanks to the large-scaled pre-trained language representation models, such as the bidirectional encoder representations from transformers (BERT), systems with only auto-transcribed text can still achieve a certain level of performance. However, previous studies have evidenced that acoustic-level statistics can offset text inaccuracies caused by the automatic speech recognition systems or representation inadequacy lurking in word embedding generators, thereby making the SMCQA system robust. Along the line of research, this study concentrates on designing a BERT-based SMCQA framework, which not only inherits the advantages of contextualized language representations learned by BERT, but integrates the complementary acoustic-level information distilled from audio with the text-level information. Consequently, an audio-enriched BERT-based SMCQA framework is proposed. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and SOTA systems on a published Chinese SMCQA dataset.
6. NILE: Natural Language Inference with Faithful Natural Language Explanations [PDF] Back to Contents
Sawan Kumar, Partha Talukdar
Abstract: The recent growth in the popularity and success of deep learning models on NLP classification tasks has accompanied the need for generating some form of natural language explanation of the predicted labels. Such generated natural language (NL) explanations are expected to be faithful, i.e., they should correlate well with the model's internal decision making. In this work, we focus on the task of natural language inference (NLI) and address the following question: can we build NLI systems which produce labels with high accuracy, while also generating faithful explanations of its decisions? We propose Natural-language Inference over Label-specific Explanations (NILE), a novel NLI method which utilizes auto-generated label-specific NL explanations to produce labels along with its faithful explanation. We demonstrate NILE's effectiveness over previously reported methods through automated and human evaluation of the produced labels and explanations. Our evaluation of NILE also supports the claim that accurate systems capable of providing testable explanations of their decisions can be designed. We discuss the faithfulness of NILE's explanations in terms of sensitivity of the decisions to the corresponding explanations. We argue that explicit evaluation of faithfulness, in addition to label and explanation accuracy, is an important step in evaluating model's explanations. Further, we demonstrate that task-specific probes are necessary to establish such sensitivity.
7. Køpsala: Transition-Based Graph Parsing via Efficient Training and Effective Encoding [PDF] Back to Contents
Daniel Hershcovich, Miryam de Lhoneux, Artur Kulmizev, Elham Pejhan, Joakim Nivre
Abstract: We present Køpsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020. Our system is a pipeline consisting of off-the-shelf models for everything but enhanced graph parsing, and for the latter, a transition-based graph parser adapted from Che et al. (2019). We train a single enhanced parser model per language, using gold sentence splitting and tokenization for training, and rely only on tokenized surface forms and multilingual BERT for encoding. While a bug introduced just before submission resulted in a severe drop in precision, its post-submission fix would bring us to 4th place in the official ranking, according to average ELAS. Our parser demonstrates that a unified pipeline is effective for both Meaning Representation Parsing and Enhanced Universal Dependencies.
8. Stable Style Transformer: Delete and Generate Approach with Encoder-Decoder for Text Style Transfer [PDF] Back to Contents
Joosung Lee
Abstract: Text style transfer is the task of generating a sentence that preserves the content of the input sentence while transferring its style. Most existing studies work on non-parallel datasets because parallel datasets are limited and hard to construct. In this work, we introduce a method that follows two stages on non-parallel datasets. The first stage deletes the attribute markers of a sentence directly through a classifier. The second stage generates the transferred sentence by combining the content tokens and the target style. We evaluate systems on two benchmark datasets. Transferred sentences are evaluated in terms of context, style, fluency, and semantics. These evaluation metrics are used to determine a stable system. Only systems that are robust on all evaluation metrics are suitable for use in real applications. Many previous systems are difficult to use in certain situations because they are unstable on some evaluation metrics. However, our system is stable on all evaluation metrics and achieves results comparable to other models.
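One way to picture the delete stage is a masking heuristic: drop the tokens whose removal most reduces a style classifier's confidence in the source style. The sketch below shows that heuristic only; the paper's exact deletion criterion may differ, and classify is an assumed helper returning a probability vector over styles.

```python
def delete_style_markers(tokens, classify, style_id, threshold=0.1):
    """Keep tokens that carry little style signal; drop likely attribute markers."""
    base = classify(tokens)[style_id]          # classifier confidence on the full sentence
    kept = []
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + tokens[i + 1:]   # sentence with one token removed
        drop = base - classify(masked)[style_id]
        if drop < threshold:                   # removing it barely changes the style score
            kept.append(tok)
    return kept   # content tokens, later combined with a target-style tag for generation
```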
9. Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour [PDF] Back to Contents
Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Abhijit Mishra, Pushpak Bhattacharyya
Abstract: The gaze behaviour of a reader is helpful in solving several NLP tasks such as automatic essay grading, named entity recognition, sarcasm detection, etc. However, collecting gaze behaviour from readers is costly in terms of time and money. In this paper, we propose a way to improve automatic essay grading using gaze behaviour, where the gaze features are learnt at run time using a multi-task learning framework. To demonstrate the efficacy of this multi-task learning based approach to automatic essay grading, we collect gaze behaviour for 48 essays across 4 essay sets, and learn gaze behaviour for the rest of the essays, numbering over 7000 essays. Using the learnt gaze behaviour, we achieve a statistically significant improvement in performance over the state-of-the-art system for the essay sets where we have gaze data. We also achieve a statistically significant improvement for 4 other essay sets, numbering about 6000 essays, where we have no gaze behaviour data available. Our approach establishes that learning gaze behaviour improves automatic essay grading.
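Such a setup is typically a shared encoder with a main scoring head and auxiliary gaze-regression heads, so gaze features are predicted at run time rather than required as input. The sketch below is illustrative; the encoder, dimensions, and the number of gaze features are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class MultiTaskGrader(nn.Module):
    def __init__(self, encoder, hidden=768, n_gaze_features=4):
        super().__init__()
        self.encoder = encoder                               # shared text encoder
        self.score_head = nn.Linear(hidden, 1)               # main task: essay score
        self.gaze_head = nn.Linear(hidden, n_gaze_features)  # auxiliary: gaze statistics

    def forward(self, x):
        h = self.encoder(x)
        return self.score_head(h), self.gaze_head(h)

# Sketch of the loss: score_loss + alpha * gaze_loss, where the gaze term is
# computed only for the essays that have recorded gaze behaviour.
```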
10. Knowledge Graph Simple Question Answering for Unseen Domains [PDF] Back to Contents
Georgios Sidiropoulos, Nikos Voskarides, Evangelos Kanoulas
Abstract: Knowledge graph simple question answering (KGSQA), in its standard form, does not take into account that human-curated question answering training data only cover a small subset of the relations that exist in a Knowledge Graph (KG), or even worse, that new domains covering unseen and rather different to existing domains relations are added to the KG. In this work, we study KGSQA in a previously unstudied setting where new, unseen domains are added during test time. In this setting, question-answer pairs of the new domain do not appear during training, thus making the task more challenging. We propose a data-centric domain adaptation framework that consists of a KGSQA system that is applicable to new domains, and a sequence to sequence question generation method that automatically generates question-answer pairs for the new domain. Since the effectiveness of question generation for KGSQA can be restricted by the limited lexical variety of the generated questions, we use distant supervision to extract a set of keywords that express each relation of the unseen domain and incorporate those in the question generation method. Experimental results demonstrate that our framework significantly improves over zero-shot baselines and is robust across domains.
11. Pointwise Paraphrase Appraisal is Potentially Problematic [PDF] Back to Contents
Hannah Chen, Yangfeng Ji, David Evans
Abstract: The prevailing approach for training and evaluating paraphrase identification models is constructed as a binary classification problem: the model is given a pair of sentences and is judged by how accurately it classifies pairs as either paraphrases or non-paraphrases. This pointwise evaluation method does not match well the objective of most real-world applications, so the goal of our work is to understand how models which perform well under pointwise evaluation may fail in practice, and to find better methods for evaluating paraphrase identification models. As a first step towards that goal, we show that although the standard way of fine-tuning BERT for paraphrase identification, pairing two sentences as one sequence, results in a model with state-of-the-art performance, that model may perform poorly on simple tasks like identifying pairs with two identical sentences. Moreover, we show that these models may even assign a pair of randomly-selected sentences a higher paraphrase score than a pair of identical ones.
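The standard setup being critiqued encodes both sentences as one sequence and reads off a pair-level score. A minimal sketch with Hugging Face Transformers follows; the base checkpoint is illustrative and would need fine-tuning on a paraphrase dataset (and the positive-label index verified) before the score is meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

def paraphrase_prob(s1, s2):
    # Encoded as a single sequence: [CLS] s1 [SEP] s2 [SEP]
    inputs = tok(s1, s2, return_tensors="pt")
    logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()   # assumes label 1 = paraphrase

# The paper's sanity check: an identical pair should score near 1.0, and a pair
# of random sentences should not outscore it.
print(paraphrase_prob("The cat sat on the mat.", "The cat sat on the mat."))
```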
12. Deep Learning Models for Automatic Summarization [PDF] Back to Contents
Pirmin Lemberger
Abstract: Text summarization is an NLP task which aims to convert a textual document into a shorter one while keeping as much meaning as possible. This pedagogical article reviews a number of recent Deep Learning architectures that have helped to advance research in this field. We will discuss in particular applications of pointer networks, hierarchical Transformers and Reinforcement Learning. We assume basic knowledge of Seq2Seq architecture and Transformer networks within NLP.
13. Sentiment Analysis: Detecting Valence, Emotions, and Other Affectual States from Text [PDF] Back to Contents
Saif M. Mohammad
Abstract: Recent advances in machine learning have led to computer systems that are human-like in behaviour. Sentiment analysis, the automatic determination of emotions in text, is allowing us to capitalize on substantial previously unattainable opportunities in commerce, public health, government policy, social sciences, and art. Further, analysis of emotions in text, from news to social media posts, is improving our understanding of not just how people convey emotions through language but also how emotions shape our behaviour. This article presents a sweeping overview of sentiment analysis research that includes: the origins of the field, the rich landscape of tasks, challenges, a survey of the methods and resources used, and applications. We also discuss how, without careful forethought, sentiment analysis has the potential for harmful outcomes. We outline the latest lines of research in pursuit of fairness in sentiment analysis.
14. ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020 [PDF] Back to Contents
Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier
Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track. Our contributions focused on data augmentation and ensembling of multiple models. In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask. For speech-to-text simultaneous translation, we attach a wait-k MT system to a hybrid ASR system. We propose an algorithm to control the latency of the ASR+MT cascade and achieve a good latency-quality trade-off on both subtasks.
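The wait-k policy has a simple shape: read k source tokens, then alternate one target write per source read. The sketch below shows the generic policy (Ma et al., 2019), not the consortium's full systems; source_stream and translate_step are assumed interfaces.

```python
def wait_k_decode(k, source_stream, translate_step, eos="</s>"):
    """source_stream: iterator over source tokens; translate_step(src, out) -> next target token."""
    src, out = [], []
    for _ in range(k):                      # 1) wait: read the first k source tokens
        tok = next(source_stream, None)
        if tok is None:
            break
        src.append(tok)
    while True:                             # 2) alternate: one write per read
        out.append(translate_step(src, out))
        if out[-1] == eos:
            return out
        tok = next(source_stream, None)     # keep writing once the source is exhausted
        if tok is not None:
            src.append(tok)
```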
15. Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model [PDF] Back to Contents
Satoru Katsumata, Mamoru Komachi
Abstract: Grammatical error correction (GEC) literature has reported on the effectiveness of pretraining a Seq2Seq model with a large amount of pseudo data. In this study, we explored two generic pretrained encoder-decoder (Enc-Dec) models, including BART, which reported the state-of-the-art (SOTA) results for several Seq2Seq tasks other than GEC. We found that monolingual and multilingual BART models achieve high performance in GEC, including a competitive result compared with the current SOTA result in English GEC. Our implementations will be publicly available at GitHub.
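Treating GEC as plain seq2seq generation with a pretrained Enc-Dec model is straightforward to sketch with Hugging Face Transformers. The snippet below loads the generic BART checkpoint; as the paper describes, it would still need fine-tuning on GEC data before the output is an actual correction.

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

src = "She go to school every days."                   # ungrammatical input
batch = tok(src, return_tensors="pt")
ids = model.generate(batch["input_ids"], max_length=64, num_beams=5)
print(tok.decode(ids[0], skip_special_tokens=True))    # corrected sentence after fine-tuning
```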
16. How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning [PDF] Back to Contents
Aviad Elyashar, Rami Puzis, Michael Fire
Abstract: Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users issue queries containing a name to web search engines to find the person they are looking for. Typically, Web search engines provide just a few accurate results associated with a name-containing query. Currently, most solutions for suggesting synonyms in online search are based on pattern matching and phonetic encoding; however, very often the performance of such solutions is less than optimal. In this paper, we propose SpokenName2Vec, a novel and generic approach which addresses the similar name suggestion problem by utilizing automated speech generation and deep learning to produce spoken name embeddings. These embeddings capture the way people pronounce names in any language and accent. Utilizing the name pronunciation can be helpful for both differentiating and detecting names that sound alike but are written differently. The proposed approach was demonstrated on a large-scale dataset consisting of 250,000 forenames and evaluated using a machine learning classifier and 7,399 names with their verified synonyms. The performance of the proposed approach was found to be superior to 12 other algorithms evaluated in this study, including widely used phonetic and string similarity algorithms and two recently proposed algorithms. The results obtained suggest that the proposed approach could serve as a useful and valuable tool for solving the similar name suggestion problem.
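The pipeline reduces to: synthesize the name, embed the audio, and rank candidates by embedding similarity. The sketch below is purely conceptual; tts and audio_encoder are hypothetical helpers standing in for a speech synthesizer and an audio embedding model, which the paper builds with speech generation and deep learning.

```python
import numpy as np

def spoken_embedding(name, tts, audio_encoder):
    wav = tts(name)              # hypothetical: text -> waveform
    return audio_encoder(wav)    # hypothetical: waveform -> fixed-size vector

def suggest_synonyms(query, candidates, tts, audio_encoder, top_n=5):
    q = spoken_embedding(query, tts, audio_encoder)
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [(c, cos(q, spoken_embedding(c, tts, audio_encoder))) for c in candidates]
    return sorted(scored, key=lambda x: -x[1])[:top_n]   # names that *sound* closest
```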
17. Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [PDF] Back to Contents
Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, Goran Glavaš
Abstract: Following the major success of neural language models (LMs) such as BERT or GPT-2 on a variety of language understanding tasks, recent work focused on injecting (structured) knowledge from external resources into these models. While on the one hand, joint pretraining (i.e., training from scratch, adding objectives based on external knowledge to the primary LM objective) may be prohibitively computationally expensive, post-hoc fine-tuning on external knowledge, on the other hand, may lead to the catastrophic forgetting of distributional knowledge. In this work, we investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus, respectively, using adapter training. While overall results on the GLUE benchmark paint an inconclusive picture, a deeper analysis reveals that our adapter-based models substantially outperform BERT (up to 15-20 performance points) on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
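Adapter-based injection adds small bottleneck modules to a frozen pretrained transformer and trains only those on the knowledge source. A standard Houlsby-style adapter is sketched below; the dimensions are illustrative, not the paper's exact configuration.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck inserted after a transformer sub-layer."""
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, h):
        # Only the adapter's parameters are trained on the knowledge corpus
        # (e.g., OMCS); the surrounding BERT weights stay frozen.
        return h + self.up(self.act(self.down(h)))
```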
18. KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension And Generation [PDF] Back to Contents
Jiajing Wan, Xinting Huang
Abstract: This paper presents our strategies in SemEval 2020 Task 4: Commonsense Validation and Explanation. We propose a novel way to search for evidence and choose the different large-scale pre-trained models as the backbone for three subtasks. The results show that our evidence-searching approach improves model performance on commonsense explanation task. Our team ranks 2nd in subtask C according to human evaluation score.
19. Adversarial NLI for Factual Correctness in Text Summarisation Models [PDF] Back to Contents
Mario Barrantes, Benedikt Herudek, Richard Wang
Abstract: We apply the Adversarial NLI dataset to train an NLI model and show that the model has the potential to enhance factual correctness in abstractive summarization. We follow the work of Falke et al. (2019), which ranks multiple generated summaries based on the entailment probabilities between a source document and the summaries, and selects the summary with the highest entailment probability. The authors' earlier study concluded that current NLI models are not sufficiently accurate for this ranking task. We show that Transformer models fine-tuned on the new dataset achieve significantly higher accuracy and have the potential to select a coherent summary.
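The reranking step reduces to scoring each candidate summary by its entailment probability against the source document and picking the argmax. The sketch below uses a public MNLI checkpoint purely for illustration; the paper instead fine-tunes on the Adversarial NLI dataset.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def entailment_prob(premise, hypothesis):
    inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
    probs = torch.softmax(model(**inputs).logits, dim=-1)
    return probs[0, 2].item()    # index 2 = entailment for this checkpoint

def rerank(document, summaries):
    # Select the candidate most strongly entailed by the source document.
    return max(summaries, key=lambda s: entailment_prob(document, s))
```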
20. GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning [PDF] Back to Contents
Jianfeng Liu, Feiyang Pan, Ling Luo
Abstract: A chatbot that converses like a human should be goal-oriented (i.e., be purposeful in conversation), which is beyond language generation. However, existing dialogue systems often heavily rely on cumbersome hand-crafted rules or costly labelled datasets to reach the goals. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for end-to-end training chatbots to maximize the longterm return from offline multi-turn dialogue datasets. Our framework utilizes hierarchical reinforcement learning (HRL), where the high-level policy guides the conversation towards the final goal by determining some sub-goals, and the low-level policy fulfills the sub-goals by generating the corresponding utterance for response. In our experiments on a real-world dialogue dataset for anti-fraud in financial, our approach outperforms previous methods on both the quality of response generation as well as the success rate of accomplishing the goal.
21. A Novel Distributed Representation of News (DRNews) for Stock Market Predictions [PDF] Back to Contents
Ye Ma, Lu Zong, Peiwan Wang
Abstract: In this study, a novel Distributed Representation of News (DRNews) model is developed and applied in deep learning-based stock market predictions. With the merit of integrating contextual information and cross-documental knowledge, the DRNews model creates news vectors that describe both the semantic information and potential linkages among news events through an attributed news network. Two stock market prediction tasks, namely short-term stock movement prediction and stock crisis early warning, are implemented in the framework of an attention-based Long Short-Term Memory (LSTM) network. It is suggested that DRNews substantially enhances the results of both tasks compared with five baseline news embedding models. Further, the attention mechanism suggests that short-term stock trends and stock market crises both receive influences from daily news, with the former demonstrating more critical responses to information related to the stock market per se, whilst the latter draws more on concerns about the banking sector and economic policies.
22. When does MAML Work the Best? An Empirical Study on Model-Agnostic Meta-Learning in NLP Applications [PDF] 返回目录
Zequn Liu, Ruiyi Zhang, Yiping Song, Ming Zhang
Abstract: Model-Agnostic Meta-Learning (MAML), a model-agnostic meta-learning method, is successfully employed in NLP applications including few-shot text classification and multi-domain low-resource language generation. Many impacting factors, including data quantity, similarity among tasks, and the balance between general language model and task-specific adaptation, can affect the performance of MAML in NLP, but few works have thoroughly studied them. In this paper, we conduct an empirical study to investigate these impacting factors and conclude when MAML works the best based on the experimental results.
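For reference, the method under study can be summarised in a few lines of PyTorch; below is a minimal first-order MAML (FOMAML) meta-update. The first-order approximation and the task-tuple format are our simplifications, not the paper's experimental setup.

```python
import copy
import torch

def fomaml_step(model, tasks, loss_fn, meta_opt, inner_lr=1e-2, inner_steps=1):
    """One first-order MAML meta-update. Each task is a tuple
    (support_x, support_y, query_x, query_y); gradients are summed over tasks."""
    meta_opt.zero_grad()
    for sx, sy, qx, qy in tasks:
        fast = copy.deepcopy(model)                     # task-specific copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                    # inner-loop adaptation
            opt.zero_grad()
            loss_fn(fast(sx), sy).backward()
            opt.step()
        qloss = loss_fn(fast(qx), qy)                   # evaluate the adapted model
        grads = torch.autograd.grad(qloss, fast.parameters())
        for p, g in zip(model.parameters(), grads):     # first-order: copy grads back
            p.grad = g.clone() if p.grad is None else p.grad + g
    meta_opt.step()

# usage sketch:
# model = torch.nn.Linear(8, 2)
# meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# fomaml_step(model, tasks, torch.nn.functional.cross_entropy, meta_opt)
```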
23. Integrated Node Encoder for Labelled Textual Networks [PDF] 返回目录
Ye Ma, Lu Zong
Abstract: Numerous works have exploited content-enhanced network embedding models, with little focus on the labelled information of nodes. Although TriDNR leverages node labels by treating them as node attributes, it fails to enrich unlabelled node vectors with the labelled information, which leads to a weaker classification result on the test set in comparison to existing unsupervised textual network embedding models. In this study, we design an integrated node encoder (INE) for textual networks which is jointly trained on structure-based and label-based objectives. As a result, the node encoder preserves the integrated knowledge of not only the network text and structure, but also the labelled information. Furthermore, INE allows the creation of label-enhanced vectors for unlabelled nodes by entering their node contents. Our node embedding achieves state-of-the-art performance in the classification task on two public citation networks, namely Cora and DBLP, pushing benchmarks up by 10.0\% and 12.1\%, respectively, with a 70\% training ratio. Additionally, a feasible solution that generalizes our model from textual networks to a broader range of networks is proposed.
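A minimal sketch of what joint training on structure-based and label-based objectives can look like: a single text encoder whose node embeddings feed both a link-prediction loss and a node-classification loss. The bag-of-words encoder, the simplified structural loss (negative sampling omitted), and all dimensions are our assumptions, not the INE architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntegratedNodeEncoder(nn.Module):
    """One shared text encoder; its output serves both objectives."""
    def __init__(self, vocab=5000, dim=128, n_labels=7):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, dim)   # bag-of-words node-text encoder
        self.label_head = nn.Linear(dim, n_labels) # label-based objective head

    def forward(self, token_ids, offsets):
        return self.embed(token_ids, offsets)      # node embedding

def joint_loss(enc, batch, alpha=1.0):
    z_u = enc(batch["u_tokens"], batch["u_offsets"])      # a labelled node
    z_v = enc(batch["v_tokens"], batch["v_offsets"])      # a linked neighbour
    # structure-based objective: linked nodes should have similar embeddings
    # (negative sampling omitted for brevity)
    struct = -F.logsigmoid((z_u * z_v).sum(-1)).mean()
    # label-based objective: classify the labelled node
    label = F.cross_entropy(enc.label_head(z_u), batch["labels"])
    return struct + alpha * label
```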
24. MASK: A flexible framework to facilitate de-identification of clinical texts [PDF] 返回目录
Nikola Milosevic, Gangamma Kalappa, Hesam Dadafarin, Mahmoud Azimaee, Goran Nenadic
Abstract: Medical health records and clinical summaries contain a vast amount of important information in textual form that can help advance research on treatments, drugs and public health. However, most of this information is not shared because it contains private information about patients, their families, or the medical staff treating them. Regulations such as HIPAA in the US, PHIPA in Canada and the GDPR govern the protection, processing and distribution of this information. If this information is de-identified and personal information is replaced or redacted, it can be distributed to the research community. In this paper, we present MASK, a software package designed to perform the de-identification task. The software is able to perform named entity recognition using some of the state-of-the-art techniques and then mask or redact recognized entities. The user is able to select the named entity recognition algorithm (currently implemented are two versions of CRF-based techniques and a BiLSTM-based neural network with pre-trained GloVe and ELMo embeddings) and the masking algorithm (e.g. shift dates, replace names/locations, totally redact entity).
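The pipeline the abstract describes, run NER and then apply a per-entity masking strategy, can be sketched in a few lines. The regex-based NER stub and the masking table below are purely illustrative and are not the MASK package's API.

```python
import re

def toy_ner(text):
    """Stand-in for the CRF/BiLSTM NER models the paper mentions; here a
    date regex only, returning (start, end, label) spans for illustration."""
    return [(m.start(), m.end(), "DATE")
            for m in re.finditer(r"\d{4}-\d{2}-\d{2}", text)]

MASKERS = {
    "DATE": lambda span: "[DATE]",     # redact entirely
    "NAME": lambda span: "PATIENT",    # replace with a placeholder
}

def deidentify(text, ner=toy_ner):
    out, last = [], 0
    for start, end, label in sorted(ner(text)):
        out.append(text[last:start])
        out.append(MASKERS.get(label, lambda s: "[REDACTED]")(text[start:end]))
        last = end
    out.append(text[last:])
    return "".join(out)

print(deidentify("Admitted on 2020-05-12 with chest pain."))
# -> Admitted on [DATE] with chest pain.
```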
25. Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge [PDF] 返回目录
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Abstract: In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019. The main theme of this challenge is to build a speech synthesizer without any textual information or phonetic labels. To tackle these challenges, we build a system that addresses two major components: 1) given speech audio, extract subword units in an unsupervised way, and 2) re-synthesize the audio for novel speakers. The system also needs to balance codebook performance between the ABX error rate and the bitrate compression rate. Our main contributions are a Transformer-based VQ-VAE for unsupervised unit discovery and a Transformer-based inverter for speech synthesis given the extracted codebook. Additionally, we explored several regularization methods to improve performance even further.
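The core of any VQ-VAE, Transformer-based or otherwise, is the nearest-neighbour codebook lookup with a straight-through gradient; the snippet below shows that step in isolation. The encoder, decoder, and the codebook/commitment losses are omitted, and all sizes are arbitrary.

```python
import torch

def vector_quantize(z, codebook):
    """Nearest-neighbour lookup with a straight-through gradient estimator.
    z: (batch, time, dim) encoder outputs; codebook: (K, dim)."""
    d = torch.cdist(z, codebook.unsqueeze(0).expand(z.size(0), -1, -1))
    idx = d.argmin(-1)                  # discrete unit ids, (batch, time)
    q = codebook[idx]                   # quantised vectors, (batch, time, dim)
    q = z + (q - z).detach()            # straight-through: gradients flow to z
    return q, idx

codebook = torch.randn(256, 64, requires_grad=True)   # 256 units, arbitrary dim
quantised, units = vector_quantize(torch.randn(2, 100, 64), codebook)
```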
26. A Question Type Driven and Copy Loss Enhanced Framework for Answer-Agnostic Neural Question Generation [PDF] 返回目录
Xiuyu Wu, Nan Jiang, Yunfang Wu
Abstract: Answer-agnostic question generation is a significant and challenging task, which aims to automatically generate questions for a given sentence without an answer. In this paper, we propose two new strategies to deal with this task: question type prediction and a copy loss mechanism. The question type module predicts the types of questions that should be asked, which allows our model to generate multiple types of questions for the same source sentence. The new copy loss enhances the original copy mechanism to make sure that every important word in the source sentence has been copied when generating questions. Our integrated model outperforms the state-of-the-art approach in answer-agnostic question generation, achieving a BLEU-4 score of 13.9 on SQuAD. Human evaluation further validates the high quality of our generated questions. We will make our code publicly available for further research.
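One plausible way to realise such a copy loss, penalising important source words that are never copied during generation, is sketched below; the exact formulation in the paper may differ, and the tensor shapes are assumptions.

```python
import torch

def copy_loss(copy_attn, important_mask, eps=1e-8):
    """For every important source token, take its maximum copy probability
    over all decoding steps and penalise tokens that were never copied.
    copy_attn: (batch, tgt_len, src_len); important_mask: (batch, src_len)."""
    max_copy = copy_attn.max(dim=1).values               # best copy prob per word
    nll = -torch.log(max_copy + eps)                     # large if never copied
    return (nll * important_mask).sum() / important_mask.sum().clamp(min=1)

attn = torch.softmax(torch.randn(2, 12, 20), dim=-1)    # toy copy distributions
mask = (torch.rand(2, 20) > 0.7).float()                # 1 = important source word
print(copy_loss(attn, mask))
```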
27. Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding [PDF] 返回目录
Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen, Kai Yu
Abstract: Spoken Language Understanding (SLU) converts hypotheses from an automatic speech recognizer (ASR) into structured semantic representations. ASR recognition errors can severely degrade the performance of the subsequent SLU module. To address this issue, word confusion networks (WCNs) have been used to encode the input for SLU, as they contain richer information than 1-best or n-best hypothesis lists. To further eliminate ambiguity, the last system act of the dialogue context is also utilized as additional input. In this paper, a novel BERT-based SLU model (WCN-BERT SLU) is proposed to encode WCNs and the dialogue context jointly. It can integrate both structural information and ASR posterior probabilities of WCNs in the BERT architecture. Experiments on DSTC2, a benchmark for SLU, show that the proposed method is effective and can significantly outperform previous state-of-the-art models.
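To give a feel for how a WCN can enter a BERT-style encoder, the sketch below collapses one WCN bin (competing ASR arcs for the same position) into a single input vector by posterior-weighted averaging. This is one simple integration strategy, not necessarily the paper's exact mechanism.

```python
import torch

def encode_wcn_bin(arc_embeddings, posteriors):
    """Collapse one WCN bin into a single input vector by weighting each
    competing arc's token embedding with its (renormalised) ASR posterior."""
    w = torch.tensor(posteriors)
    w = w / w.sum()
    return (w.unsqueeze(-1) * arc_embeddings).sum(dim=0)

# three competing arcs with 768-d token embeddings and their ASR posteriors
bin_vector = encode_wcn_bin(torch.randn(3, 768), [0.6, 0.3, 0.1])
```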
28. From Witch's Shot to Music Making Bones -- Resources for Medical Laymen to Technical Language and Vice Versa [PDF] 返回目录
Laura Seiffe, Oliver Marten, Michael Mikhailov, Sven Schmeier, Sebastian Möller, Roland Roller
Abstract: Many people share information in social media or forums, such as the food they eat, the sports activities they do, or the events they have visited. This also applies to information about a person's health status. The information we share online reveals, directly or indirectly, details about our lifestyle and health situation and thus provides a valuable data resource. If we can take advantage of that data, applications can be created that enable, e.g., the detection of possible risk factors of diseases or adverse drug reactions of medications. However, as most people are not medical experts, the language used may be descriptive rather than the precise medical expressions that medics use. To detect and use this relevant information, laymen language has to be translated and/or linked to the corresponding medical concepts. This work presents baseline data sources to address this challenge for German. We introduce a new data set which annotates medical laymen and technical expressions in a patient forum, along with a set of medical synonyms and definitions, and present first baseline results on the data.
29. Transformer-based Context-aware Sarcasm Detection in Conversation Threads from Social Media [PDF] 返回目录
Xiangjue Dong, Changmao Li, Jinho D. Choi
Abstract: We present a transformer-based sarcasm detection model that accounts for the context from the entire conversation thread for more robust predictions. Our model uses deep transformer layers to perform multi-head attention over the target utterance and the relevant context in the thread. The context-aware models are evaluated on two datasets from social media, Twitter and Reddit, and show 3.1% and 7.0% improvements over their baselines. Our best models give F1-scores of 79.0% and 75.0% for the Twitter and Reddit datasets respectively, becoming one of the highest performing systems among 36 participants in this shared task.
30. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [PDF] 返回目录
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
Abstract: Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, the other can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
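The heart of the RAG-Sequence formulation is the marginalisation p(y|x) = sum_z p(z|x) p(y|x,z) over the top-K retrieved documents. The snippet below computes that negative log-likelihood from precomputed scores; the retriever and generator themselves are omitted, and the toy shapes are assumptions.

```python
import torch

def rag_sequence_nll(doc_scores, seq_logprobs):
    """doc_scores: (batch, K) retrieval logits for the top-K documents;
    seq_logprobs: (batch, K) log p(y | x, z) from the generator.
    Returns the NLL of the marginal p(y|x) = sum_z p(z|x) p(y|x,z)."""
    log_p_z = torch.log_softmax(doc_scores, dim=-1)      # log p(z|x)
    return -torch.logsumexp(log_p_z + seq_logprobs, dim=-1).mean()

# toy shapes: batch of 4 questions, K=5 retrieved passages each
nll = rag_sequence_nll(torch.randn(4, 5), -torch.rand(4, 5))
```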
31. Towards Open Domain Event Trigger Identification using Adversarial Domain Adaptation [PDF] 返回目录
Aakanksha Naik, Carolyn Rosé
Abstract: We tackle the task of building supervised event trigger identification models which can generalize better across domains. Our work leverages the adversarial domain adaptation (ADA) framework to introduce domain-invariance. ADA uses adversarial training to construct representations that are predictive for trigger identification, but not predictive of the example's domain. It requires no labeled data from the target domain, making it completely unsupervised. Experiments with two domains (English literature and news) show that ADA leads to an average F1 score improvement of 3.9 on out-of-domain data. Our best performing model (BERT-A) reaches 44-49 F1 across both domains, using no labeled target data. Preliminary experiments reveal that finetuning on 1% labeled data, followed by self-training leads to substantial improvement, reaching 51.5 and 67.2 F1 on literature and news respectively.
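Adversarial domain adaptation of this kind is commonly implemented with a gradient reversal layer: the domain classifier trains normally while the feature extractor receives negated gradients, pushing it toward domain-invariant representations. Below is a generic PyTorch sketch of that trick, not the paper's code.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def domain_adversarial_loss(features, domain_head, domain_labels, lam=1.0):
    reversed_feats = GradReverse.apply(features, lam)
    return F.cross_entropy(domain_head(reversed_feats), domain_labels)

# usage sketch: domain_head = torch.nn.Linear(256, 2) over 256-d trigger features
```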
32. SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding [PDF] 返回目录
Li Zhang, Han Wang, Lingxiao Li
Abstract: Pair-based metric learning has been widely adopted to learn sentence embeddings for many NLP tasks, such as semantic text similarity, due to its efficiency in computation. Most existing works employed a sequence encoder model and utilized a limited set of sentence pairs with a pair-based loss to learn discriminating sentence representations. However, it is known that the sentence representation can be biased when the sampled sentence pairs deviate from the true distribution of all sentence pairs. In this paper, our theoretical analysis shows that existing works severely suffer from the lack of a good pair sampling and instance weighting strategy. Instead of one-time pair selection and learning on equally weighted pairs, we propose a unified locality weighting and learning framework to learn task-specific sentence embedding. Our model, SentPWNet, exploits the neighboring spatial distribution of each sentence as a locality weight to indicate the informative level of each sentence pair. The weight is updated along with pair-loss optimization in each round, ensuring that the model keeps learning the most informative sentence pairs. Extensive experiments on four publicly available datasets and a self-collected place-search benchmark with 1.4 million places clearly demonstrate that our model consistently outperforms existing sentence embedding methods with comparable efficiency.
33. The Discussion Tracker Corpus of Collaborative Argumentation [PDF] 返回目录
Christopher Olshefski, Luca Lugini, Ravneet Singh, Diane Litman, Amanda Godley
Abstract: Although Natural Language Processing (NLP) research on argument mining has advanced considerably in recent years, most studies draw on corpora of asynchronous and written texts, often produced by individuals. Few published corpora of synchronous, multi-party argumentation are available. The Discussion Tracker corpus, collected in American high school English classes, is an annotated dataset of transcripts of spoken, multi-party argumentation. The corpus consists of 29 multi-party discussions of English literature transcribed from 985 minutes of audio. The transcripts were annotated for three dimensions of collaborative argumentation: argument moves (claims, evidence, and explanations), specificity (low, medium, high) and collaboration (e.g., extensions of and disagreements about others' ideas). In addition to providing descriptive statistics on the corpus, we provide performance benchmarks and associated code for predicting each dimension separately, illustrate the use of the multiple annotations in the corpus to improve performance via multi-task learning, and finally discuss other ways the corpus might be used to further NLP research.
34. NENET: An Edge Learnable Network for Link Prediction in Scene Text [PDF] 返回目录
Mayank Kumar Singh, Sayan Banerjee, Shubhasis Chaudhuri
Abstract: Text detection in scenes based on deep neural networks has shown promising results. Instead of using word bounding box regression, recent state-of-the-art methods have started focusing on character bounding boxes and pixel-level prediction. This necessitates linking adjacent characters, which we propose in this paper using a novel Graph Neural Network (GNN) architecture that allows us to learn both node and edge features, as opposed to only the node features under a typical GNN. The main advantage of using a GNN for link prediction lies in its ability to connect characters which are spatially separated and have an arbitrary orientation. We demonstrate our concept on the well-known SynthText dataset, achieving top results compared to state-of-the-art methods.
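The abstract's key point, learning edge features alongside node features, corresponds to a message-passing layer that updates both. Below is a generic such layer; the MLPs, dimensions, and sum aggregation are our choices, not the NENET architecture.

```python
import torch
import torch.nn as nn

class NodeEdgeLayer(nn.Module):
    """One message-passing step that updates node and edge features."""
    def __init__(self, dn, de):
        super().__init__()
        self.edge_mlp = nn.Linear(2 * dn + de, de)  # edge update from its endpoints
        self.node_mlp = nn.Linear(dn + de, dn)      # node update from incident edges

    def forward(self, x, e, src, dst):
        # x: (N, dn) nodes; e: (E, de) edges; src/dst: (E,) endpoint indices
        e = torch.relu(self.edge_mlp(torch.cat([x[src], x[dst], e], dim=-1)))
        agg = torch.zeros(x.size(0), e.size(1)).index_add_(0, dst, e)
        x = torch.relu(self.node_mlp(torch.cat([x, agg], dim=-1)))
        return x, e

layer = NodeEdgeLayer(dn=16, de=8)
x, e = layer(torch.randn(5, 16), torch.randn(7, 8),
             torch.randint(0, 5, (7,)), torch.randint(0, 5, (7,)))
```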
35. An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling [PDF] 返回目录
Bi-Cheng Yan, Meng-Che Wu, Hsiao-Tsung Hung, Berlin Chen
Abstract: Mispronunciation detection and diagnosis (MDD) is a core component of computer-assisted pronunciation training (CAPT). Most of the existing MDD approaches focus on dealing with categorical errors (viz. one canonical phone is substituted by another one, aside from those mispronunciations caused by deletions or insertions). However, accurate detection and diagnosis of non-categorical or distortion errors (viz. approximating L2 phones with L1 (first-language) phones, or erroneous pronunciations in between) still seems out of reach. In view of this, we propose to conduct MDD with a novel end-to-end automatic speech recognition (E2E-based ASR) approach. In particular, we expand the original L2 phone set with a corresponding anti-phone set, giving the E2E-based MDD approach a better capability to take in both categorical and non-categorical mispronunciations, aiming to provide better mispronunciation detection and diagnosis feedback. Furthermore, a novel transfer-learning paradigm is devised to obtain the initial model estimate of the E2E-based MDD system without recourse to any phonological rules. Extensive sets of experimental results on the L2-ARCTIC dataset show that our best system can outperform the existing E2E baseline system and a pronunciation-scoring based method (GOP) in terms of the F1-score, by 11.05% and 27.71%, respectively.
36. Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection [PDF] 返回目录
Murong Ma, Haiwei Wu, Xuyang Wang, Lin Yang, Junjie Wang, Ming Li
Abstract: In this paper, we propose a deep convolutional neural network-based acoustic word embedding system for code-switching query-by-example spoken term detection. Different from previous configurations, we combine audio data in two languages for training, instead of using only a single language. We transform the acoustic features of keyword templates and of the search content into fixed-dimensional vectors and calculate the distances between keyword segments and search-content segments obtained in a sliding manner. An auxiliary variability-invariant loss is also applied to training data within the same word but from different speakers. This strategy is used to prevent the extractor from encoding undesired speaker- or accent-related information into the acoustic word embeddings. Experimental results show that our proposed system produces promising search results in the code-switching test scenario. With an increased number of templates and the employment of the variability-invariant loss, the search performance is further enhanced.
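Once keyword templates and sliding content segments are embedded in the same space, detection reduces to a thresholded distance comparison, as sketched below; the cosine distance and the threshold value are assumptions, not the paper's settings.

```python
import torch

def qbe_search(query_emb, segment_embs, threshold=0.4):
    """query_emb: (dim,) embedding of a keyword template; segment_embs:
    (n_segments, dim) embeddings of sliding content segments. Returns the
    indices of segments within `threshold` cosine distance, plus all distances."""
    q = query_emb / query_emb.norm()
    s = segment_embs / segment_embs.norm(dim=-1, keepdim=True)
    dist = 1.0 - s @ q                       # cosine distance per segment
    return (dist < threshold).nonzero().squeeze(-1), dist

hits, dist = qbe_search(torch.randn(64), torch.randn(100, 64))
```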
37. Lite Audio-Visual Speech Enhancement [PDF] 返回目录
Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang
Abstract: Previous studies have confirmed the effectiveness of incorporating visual information into speech enhancement (SE) systems. Despite improved denoising performance, two problems may be encountered when implementing an audio-visual SE (AVSE) system: (1) additional processing costs are incurred to incorporate visual input and (2) the use of face or lip images may cause privacy problems. In this study, we propose a Lite AVSE (LAVSE) system to address these problems. The system includes two visual data compression techniques and removes the visual feature extraction network from the training model, yielding better online computation efficiency. Our experimental results indicate that the proposed LAVSE system can provide notably better performance than an audio-only SE system with a similar number of model parameters. In addition, the experimental results confirm the effectiveness of the two techniques for visual data compression.
38. Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques [PDF] 返回目录
Thomas Drugman, Thomas Dubuisson, Alexis Moinet, Nicolas D'Alessandro, Thierry Dutoit
Abstract: This paper addresses the problem of estimating the voice source directly from speech waveforms. A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase. This technique is compared to two other well-known state-of-the-art methods, namely the Zeros of the Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF) algorithms. Decomposition quality is assessed on synthetic signals through two objective measures: the spectral distortion and a glottal formant determination rate. Technique robustness is tested by analyzing the influence of noise and of Glottal Closure Instant (GCI) location errors. In addition, the impacts of the fundamental frequency and the first formant on performance are evaluated. Our proposed approach shows a significant improvement in robustness, which could be of great interest when decomposing real speech.
39. COVID-19 Public Opinion and Emotion Monitoring System Based on Time Series Thermal New Word Mining [PDF] 返回目录
Yixian Zhang, Jieren Chen, Boyi Liu, Yifan Yang, Haocheng Li, Xinyi Zheng, Xi Chen, Tenglong Ren, Naixue Xiong
Abstract: With the spread and development of new epidemics, identifying the changing trends of public emotion during an epidemic is of great reference value. We designed and implemented a COVID-19 public opinion monitoring system based on time-series thermal new-word mining. A new-word structure discovery scheme based on the timing explosion of network topics and a Chinese sentiment analysis method for the COVID-19 public opinion environment are proposed. We establish a "Scrapy-Redis-Bloomfilter" distributed crawler framework to collect data. The system can judge the positive and negative emotions of the reviewer based on the comments, and can also reflect the depth of the seven emotions such as Hopeful, Happy, and Depressed. Finally, we improved the sentiment discriminant model of this system and compared its sentiment discrimination error on COVID-19-related comments with that of the Jiagu deep learning model. The results show that our model has better generalization ability and smaller discriminant error. We designed a large data visualization screen, which can clearly show the trend of public emotions, the proportion of various emotion categories, keywords, hot topics, etc., and fully and intuitively reflect the development of public opinion.