
[arXiv papers] Computation and Language 2020-05-13

Table of Contents

1. Semantic Scaffolds for Pseudocode-to-Code Generation [PDF] Abstract
2. Intersectional Bias in Hate Speech and Abusive Language Datasets [PDF] Abstract
3. TextAttack: A Framework for Adversarial Attacks in Natural Language Processing [PDF] Abstract
4. Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [PDF] Abstract
5. Prta: A System to Support the Analysis of Propaganda Techniques in the News [PDF] Abstract
6. A Report on the 2020 Sarcasm Detection Shared Task [PDF] Abstract
7. Document Modeling with Graph Attention Networks for Multi-grained Machine Reading Comprehension [PDF] Abstract
8. Reassessing Claims of Human Parity and Super-Human Performance in Machine Translation at WMT 2019 [PDF] Abstract
9. Dynamic Memory Induction Networks for Few-Shot Text Classification [PDF] Abstract
10. Detecting Multiword Expression Type Helps Lexical Complexity Assessment [PDF] Abstract
11. On the Robustness of Language Encoders against Grammatical Errors [PDF] Abstract
12. Learning and Evaluating Emotion Lexicons for 91 Languages [PDF] Abstract
13. A Frobenius Algebraic Analysis for Parasitic Gaps [PDF] Abstract
14. SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis [PDF] Abstract
15. Neighborhood Matching Network for Entity Alignment [PDF] Abstract
16. Simultaneous paraphrasing and translation by fine-tuning Transformer models [PDF] Abstract
17. DiscreTalk: Text-to-Speech as a Machine Translation Problem [PDF] Abstract
18. Psychometric Analysis and Coupling of Emotions Between State Bulletins and Twitter in India during COVID-19 Infodemic [PDF] Abstract
19. A Framework for Hierarchical Multilingual Machine Translation [PDF] Abstract
20. Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020) [PDF] Abstract
21. Schema-Guided Natural Language Generation [PDF] Abstract
22. Neural Polysynthetic Language Modelling [PDF] Abstract
23. Luganda Text-to-Speech Machine [PDF] Abstract
24. On the Generation of Medical Dialogues for COVID-19 [PDF] Abstract
25. Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data [PDF] Abstract
26. MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [PDF] Abstract
27. Enabling Language Models to Fill in the Blanks [PDF] Abstract
28. Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis [PDF] Abstract
29. COVID-19Base: A knowledgebase to explore biomedical entities related to COVID-19 [PDF] Abstract
30. WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge [PDF] Abstract
31. Do not let the history haunt you -- Mitigating Compounding Errors in Conversational Question Answering [PDF] Abstract
32. AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN [PDF] Abstract
33. Discriminative Multi-modality Speech Recognition [PDF] Abstract

Abstracts

1. Semantic Scaffolds for Pseudocode-to-Code Generation [PDF] Back to contents
  Ruiqi Zhong, Mitchell Stern, Dan Klein
Abstract: We propose a method for program generation based on semantic scaffolds, lightweight structures representing the high-level semantic and syntactic composition of a program. By first searching over plausible scaffolds then using these as constraints for a beam search over programs, we achieve better coverage of the search space when compared with existing techniques. We apply our hierarchical search method to the SPoC dataset for pseudocode-to-code generation, in which we are given line-level natural language pseudocode annotations and aim to produce a program satisfying execution-based test cases. By using semantic scaffolds during inference, we achieve a 10% absolute improvement in top-100 accuracy over the previous state-of-the-art. Additionally, we require only 11 candidates to reach the top-3000 performance of the previous best approach when tested against unseen problems, demonstrating a substantial improvement in efficiency.
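The two-stage procedure the abstract describes — first searching over plausible scaffolds, then beam-searching over programs constrained to the chosen scaffold — can be sketched as follows. All candidates, scaffold symbols, and scores below are toy stand-ins, and stage 1 enumerates exhaustively where the paper searches; this is not the authors' implementation.

```python
import itertools

# Hypothetical per-line candidates: (code, scaffold symbol, log-probability).
candidates = [
    [("int i = 0;", "decl", -0.1), ("i = 0;", "stmt", -0.5)],
    [("while (i < n) {", "open", -0.2), ("if (i < n) {", "open", -0.9)],
    [("}", "close", -0.1)],
]

def top_scaffolds(cands, k=2):
    """Stage 1: rank scaffold sequences by the best score achievable under each."""
    scores = {}
    for combo in itertools.product(*cands):
        key = tuple(sym for _, sym, _ in combo)
        s = sum(lp for _, _, lp in combo)
        scores[key] = max(scores.get(key, float("-inf")), s)
    return sorted(scores, key=scores.get, reverse=True)[:k]

def beam_search(cands, scaffold, width=3):
    """Stage 2: beam search over programs, with the scaffold as a hard constraint."""
    beams = [((), 0.0)]
    for i, line_cands in enumerate(cands):
        nxt = [(prog + (code,), score + lp)
               for prog, score in beams
               for code, sym, lp in line_cands
               if sym == scaffold[i]]          # prune candidates off the scaffold
        beams = sorted(nxt, key=lambda b: b[1], reverse=True)[:width]
    return beams

best = beam_search(candidates, top_scaffolds(candidates)[0])[0]
```

Constraining stage 2 to one scaffold at a time is what gives the coverage gain: the beam cannot be swamped by many near-duplicate programs sharing a single high-scoring structure.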

2. Intersectional Bias in Hate Speech and Abusive Language Datasets [PDF] Back to contents
  Jae Yeon Kim, Carlos Ortiz, Sarah Nam, Sarah Santiago, Vivek Datta
Abstract: Algorithms are widely applied to detect hate speech and abusive language in social media. We investigated whether the human-annotated data used to train these algorithms are biased. We utilized a publicly available annotated Twitter dataset (Founta et al. 2018) and classified the racial, gender, and party identification dimensions of 99,996 tweets. The results showed that African American tweets were up to 3.7 times more likely to be labeled as abusive, and African American male tweets were up to 77% more likely to be labeled as hateful compared to the others. These patterns were statistically significant and robust even when party identification was added as a control variable. This study provides the first systematic evidence on intersectional bias in datasets of hate speech and abusive language.

3. TextAttack: A Framework for Adversarial Attacks in Natural Language Processing [PDF] Back to contents
  John X. Morris, Eli Lifland, Jin Yong Yoo, Yanjun Qi
Abstract: TextAttack is a library for running adversarial attacks against natural language processing (NLP) models. TextAttack builds attacks from four components: a search method, goal function, transformation, and a set of constraints. Researchers can use these components to easily assemble new attacks. Individual components can be isolated and compared for easier ablation studies. TextAttack currently supports attacks on models trained for text classification and entailment across a variety of datasets. Additionally, TextAttack's modular design makes it easily extensible to new NLP tasks, models, and attack strategies. TextAttack code and tutorials are available at this https URL.
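The four-component decomposition (search method, goal function, transformation, constraints) can be illustrated with a self-contained toy attack. The function names and the trivial "model" below are illustrative stand-ins, not TextAttack's actual API.

```python
def transformation(text):
    # Generate candidates by perturbing one word at a time (uppercasing here
    # stands in for e.g. synonym substitution).
    words = text.split()
    return [" ".join(words[:i] + [words[i].upper()] + words[i + 1:])
            for i in range(len(words))]

def constraint(original, perturbed):
    # Allow at most one modified word.
    return sum(a != b for a, b in zip(original.split(), perturbed.split())) <= 1

def goal_function(model, original, perturbed):
    # Untargeted goal: succeed when the predicted label flips.
    return model(perturbed) != model(original)

def greedy_search(model, text):
    # Search method: try candidates in order, return the first success.
    for cand in transformation(text):
        if constraint(text, cand) and goal_function(model, text, cand):
            return cand
    return None

toy_model = lambda t: "neg" if "GOOD" in t else "pos"  # contrived toy classifier
adv = greedy_search(toy_model, "the movie was good")    # "the movie was GOOD"
```

Because each piece is a plain function, any one component can be swapped out independently, which is exactly what makes the ablation studies mentioned in the abstract easy.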

4. Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [PDF] Back to contents
  Wenyu Du, Zhouhan Lin, Yikang Shen, Timothy J. O'Donnell, Yoshua Bengio, Yue Zhang
Abstract: It is commonly believed that knowledge of syntactic structure should improve language modeling. However, effectively and computationally efficiently incorporating syntactic structure into neural language models has been a challenging topic. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances", where information between these two separate objectives shares the same intermediate representation. Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
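One common formalization of "syntactic distance" assigns each adjacent word pair the height of its lowest common ancestor in a binarized parse tree, turning a tree into a flat sequence a network can regress. A minimal sketch, assuming that reading:

```python
def height(t):
    """Height of a (binarized) subtree; a bare word is a leaf of height 0."""
    if isinstance(t, str):
        return 0
    left, right = t
    return 1 + max(height(left), height(right))

def syntactic_distances(t):
    """One distance per adjacent word pair: the height of the pair's lowest
    common ancestor. The root separates its two subtrees' word spans."""
    if isinstance(t, str):
        return []
    left, right = t
    return syntactic_distances(left) + [height(t)] + syntactic_distances(right)

tree = (("the", "cat"), ("sat", "down"))  # words: the cat sat down
dists = syntactic_distances(tree)         # [1, 2, 1]
```

The peak at position 2 marks the main constituent boundary between "the cat" and "sat down" — the signal the multi-task objective asks the language model to predict alongside the next word.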

5. Prta: A System to Support the Analysis of Propaganda Techniques in the News [PDF] Back to contents
  Giovanni Da San Martino, Shaden Shaar, Yifan Zhang, Seunghak Yu, Alberto Barrón-Cedeño, Preslav Nakov
Abstract: Recent events, such as the 2016 US Presidential Campaign, Brexit and the COVID-19 "infodemic", have brought into the spotlight the dangers of online disinformation. There has been a lot of research focusing on fact-checking and disinformation detection. However, little attention has been paid to the specific rhetorical and psychological techniques used to convey propaganda messages. Revealing the use of such techniques can help promote media literacy and critical thinking, and eventually contribute to limiting the impact of "fake news" and disinformation campaigns. Prta (Propaganda Persuasion Techniques Analyzer) allows users to explore the articles crawled on a regular basis by highlighting the spans in which propaganda techniques occur and to compare them on the basis of their use of propaganda techniques. The system further reports statistics about the use of such techniques, overall and over time, or according to filtering criteria specified by the user based on time interval, keywords, and/or political orientation of the media. Moreover, it allows users to analyze any text or URL through a dedicated interface or via an API. The system is available online: this https URL

6. A Report on the 2020 Sarcasm Detection Shared Task [PDF] Back to contents
  Debanjan Ghosh, Avijit Vajpayee, Smaranda Muresan
Abstract: Figurative language analysis, such as sarcasm and irony detection has established itself as one of the popular NLP tasks in the last decade. As the community working on computational approaches to such problems is growing it is imperative to conduct benchmarking studies to analyze the current state-of-the-art, thus facilitating progress in this area. In this paper we report on the shared task on sarcasm detection we conducted as a part of the 2nd Workshop on Figurative Language Processing (FigLang2020) at ACL 2020.

7. Document Modeling with Graph Attention Networks for Multi-grained Machine Reading Comprehension [PDF] Back to contents
  Bo Zheng, Haoyang Wen, Yaobo Liang, Nan Duan, Wanxiang Che, Daxin Jiang, Ming Zhou, Ting Liu
Abstract: Natural Questions is a new challenging machine reading comprehension benchmark with two-grained answers, which are a long answer (typically a paragraph) and a short answer (one or more entities inside the long answer). Despite the effectiveness of existing methods on this benchmark, they treat these two sub-tasks individually during training while ignoring their dependencies. To address this issue, we present a novel multi-grained machine reading comprehension framework that focuses on modeling documents at their hierarchical nature, which are different levels of granularity: documents, paragraphs, sentences, and tokens. We utilize graph attention networks to obtain different levels of representations so that they can be learned simultaneously. The long and short answers can be extracted from paragraph-level representation and token-level representation, respectively. In this way, we can model the dependencies between the two-grained answers to provide evidence for each other. We jointly train the two sub-tasks, and our experiments show that our approach significantly outperforms previous systems at both long and short answer criteria.

8. Reassessing Claims of Human Parity and Super-Human Performance in Machine Translation at WMT 2019 [PDF] Back to contents
  Antonio Toral
Abstract: We reassess the claims of human parity and super-human performance made at the news shared task of WMT 2019 for three translation directions: English-to-German, English-to-Russian and German-to-English. First we identify three potential issues in the human evaluation of that shared task: (i) the limited amount of intersentential context available, (ii) the limited translation proficiency of the evaluators and (iii) the use of a reference translation. We then conduct a modified evaluation taking these issues into account. Our results indicate that all the claims of human parity and super-human performance made at WMT 2019 should be refuted, except the claim of human parity for English-to-German. Based on our findings, we put forward a set of recommendations and open questions for future assessments of human parity in machine translation.

9. Dynamic Memory Induction Networks for Few-Shot Text Classification [PDF] Back to contents
  Ruiying Geng, Binhua Li, Yongbin Li, Jian Sun, Xiaodan Zhu
Abstract: This paper proposes Dynamic Memory Induction Networks (DMIN) for few-shot text classification. The model utilizes dynamic routing to provide more flexibility to memory-based few-shot learning in order to better adapt the support sets, which is a critical capacity of few-shot classification models. Based on that, we further develop induction models with query information, aiming to enhance the generalization ability of meta-learning. The proposed model achieves new state-of-the-art results on the miniRCV1 and ODIC dataset, improving the best performance (accuracy) by 2~4%. Detailed analysis is further performed to show the effectiveness of each component.
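"Dynamic routing" here presumably refers to the capsule-style routing-by-agreement scheme that induction networks build on: support-set prediction vectors are iteratively re-weighted by how much they agree with the induced class vector. A minimal numpy sketch with invented dimensions, not the paper's exact model:

```python
import numpy as np

def squash(v):
    """Squash nonlinearity: keeps direction, maps the norm into [0, 1)."""
    n2 = (v ** 2).sum(axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + 1e-9)

def dynamic_routing(u_hat, iters=3):
    """u_hat: (num_support, dim) prediction vectors; returns one class vector."""
    b = np.zeros(len(u_hat))                 # routing logits
    v = squash(u_hat.mean(axis=0))
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum()      # coupling coefficients (softmax)
        s = (c[:, None] * u_hat).sum(axis=0) # agreement-weighted sum
        v = squash(s)
        b = b + u_hat @ v                    # raise logits of agreeing inputs
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(5, 4))              # 5 support examples, 4-d features
v = dynamic_routing(u_hat)
```

The adaptivity the abstract emphasizes comes from the loop: the class vector is not a fixed mean of the support set but a consensus that down-weights outlying support examples.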

10. Detecting Multiword Expression Type Helps Lexical Complexity Assessment [PDF] Back to contents
  Ekaterina Kochmar, Sian Gooding, Matthew Shardlow
Abstract: Multiword expressions (MWEs) represent lexemes that should be treated as single lexical units due to their idiosyncratic nature. Multiple NLP applications have been shown to benefit from MWE identification, however the research on lexical complexity of MWEs is still an under-explored area. In this work, we re-annotate the Complex Word Identification Shared Task 2018 dataset of Yimam et al. (2017), which provides complexity scores for a range of lexemes, with the types of MWEs. We release the MWE-annotated dataset with this paper, and we believe this dataset represents a valuable resource for the text simplification community. In addition, we investigate which types of expressions are most problematic for native and non-native readers. Finally, we show that a lexical complexity assessment system benefits from the information about MWE types.

11. On the Robustness of Language Encoders against Grammatical Errors [PDF] Back to contents
  Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang
Abstract: We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.
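Simulating learner errors on clean text can be as simple as substituting tokens according to mined error patterns. The pattern table below is a hypothetical stand-in for the errors the authors collect from non-native speakers:

```python
import random

# Hypothetical error patterns mined from learner corpora:
# a correct token mapped to a plausible grammatical error.
ERROR_PATTERNS = {"has": "have", "is": "are", "an": "a", "went": "goed"}

def perturb(sentence, max_errors=1, seed=0):
    """Inject up to `max_errors` natural-looking grammatical errors."""
    rng = random.Random(seed)
    tokens = sentence.split()
    slots = [i for i, t in enumerate(tokens) if t in ERROR_PATTERNS]
    for i in rng.sample(slots, min(max_errors, len(slots))):
        tokens[i] = ERROR_PATTERNS[tokens[i]]
    return " ".join(tokens)

perturbed = perturb("she has an idea")  # one token replaced by its error form
```

An adversarial variant of this would pick the substitution that most changes the encoder's prediction rather than sampling at random.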

12. Learning and Evaluating Emotion Lexicons for 91 Languages [PDF] Back to contents
  Sven Buechel, Susanna Rücker, Udo Hahn
Abstract: Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis. Yet, manually curated lexicons are only available for a handful of languages, leaving most languages of the world without such a precious resource for downstream applications. Even worse, their coverage is often limited both in terms of the lexical units they contain and the emotional variables they feature. In order to break this bottleneck, we here introduce a methodology for creating almost arbitrarily large emotion lexicons for any target language. Our approach requires nothing but a source language emotion lexicon, a bilingual word translation model, and a target language embedding model. Fulfilling these requirements for 91 languages, we are able to generate representationally rich high-coverage lexicons comprising eight emotional variables with more than 100k lexical entries each. We evaluated the automatically generated lexicons against human judgment from 26 datasets, spanning 12 typologically diverse languages, and found that our approach produces results in line with state-of-the-art monolingual approaches to lexicon creation and even surpasses human reliability for some languages and variables. Code and data are available at this https URL archived under DOI this https URL.
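The three required resources — a source-language emotion lexicon, a word-translation model, and target-language embeddings — compose straightforwardly: translate the seed entries, fit a map from target embeddings to emotion scores, then score every target word. A toy sketch with invented words, vectors, and scores, using a plain least-squares fit rather than the authors' models:

```python
import numpy as np

src_lexicon = {"joy": 0.9, "grief": -0.8, "calm": 0.3}   # valence scores
translate = {"joy": "freude", "grief": "trauer", "calm": "ruhe"}
emb = {"freude": [1.0, 0.1, 0.0], "trauer": [-0.9, 0.2, 0.1],
       "ruhe": [0.3, 0.8, 0.0], "glueck": [0.95, 0.15, 0.05]}

# Fit a linear map from target embeddings to emotion scores on the
# translated seed entries.
X = np.array([emb[translate[w]] for w in src_lexicon])
y = np.array(list(src_lexicon.values()))
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Score the whole target vocabulary, including words never seen in the seeds.
target_lexicon = {word: float(np.array(v) @ w) for word, v in emb.items()}
```

"glueck" receives a high score purely because its embedding sits near "freude" — this generalization beyond the translated seeds is how the approach reaches 100k+ entries per language.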

13. A Frobenius Algebraic Analysis for Parasitic Gaps [PDF] Back to contents
  Michael Moortgat, Mehrnoosh Sadrzadeh, Gijs Wijnholds
Abstract: The interpretation of parasitic gaps is an ostensible case of non-linearity in natural language composition. Existing categorial analyses, both in the typelogical and in the combinatory traditions, rely on explicit forms of syntactic copying. We identify two types of parasitic gapping where the duplication of semantic content can be confined to the lexicon. Parasitic gaps in adjuncts are analysed as forms of generalized coordination with a polymorphic type schema for the head of the adjunct phrase. For parasitic gaps affecting arguments of the same predicate, the polymorphism is associated with the lexical item that introduces the primary gap. Our analysis is formulated in terms of Lambek calculus extended with structural control modalities. A compositional translation relates syntactic types and derivations to the interpreting compact closed category of finite dimensional vector spaces and linear maps with Frobenius algebras over it. When interpreted over the necessary semantic spaces, the Frobenius algebras provide the tools to model the proposed instances of lexical polymorphism.

14. SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis [PDF] Back to contents
  Hao Tian, Can Gao, Xinyan Xiao, Hao Liu, Bolei He, Hua Wu, Haifeng Wang, Feng Wu
Abstract: Recently, sentiment analysis has seen remarkable advance with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that they are widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at this https URL.
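The sentiment-masking step can be sketched as replacing mined sentiment words with a mask token while recording (position, word, polarity) prediction targets; the paper additionally masks aspect-sentiment pairs under a multi-label objective. The tiny knowledge table here is hypothetical:

```python
# Hypothetical mined sentiment knowledge: polar words and their polarity.
SENTIMENT_WORDS = {"great": "+", "terrible": "-"}

def sentiment_mask(tokens):
    """Replace sentiment words with [MASK]; keep (position, word, polarity)
    as pre-training prediction targets."""
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if tok in SENTIMENT_WORDS:
            masked.append("[MASK]")
            targets.append((i, tok, SENTIMENT_WORDS[tok]))
        else:
            masked.append(tok)
    return masked, targets

masked, targets = sentiment_mask("the battery is great".split())
# masked  -> ['the', 'battery', 'is', '[MASK]']
# targets -> [(3, 'great', '+')]
```

Compared with random BERT-style masking, biasing the mask toward sentiment words forces the encoder to spend its capacity on exactly the tokens downstream sentiment tasks care about.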

15. Neighborhood Matching Network for Entity Alignment [PDF] Back to contents
  Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Dongyan Zhao
Abstract: Structural heterogeneity between knowledge graphs is an outstanding challenge for entity alignment. This paper presents Neighborhood Matching Network (NMN), a novel entity alignment framework for tackling the structural heterogeneity challenge. NMN estimates the similarities between entities to capture both the topological structure and the neighborhood difference. It provides two innovative components for better learning representations for entity alignment. It first uses a novel graph sampling method to distill a discriminative neighborhood for each entity. It then adopts a cross-graph neighborhood matching module to jointly encode the neighborhood difference for a given entity pair. Such strategies allow NMN to effectively construct matching-oriented entity representations while ignoring noisy neighbors that have a negative impact on the alignment task. Extensive experiments performed on three entity alignment datasets show that NMN can well estimate the neighborhood similarity in more tough cases and significantly outperforms 12 previous state-of-the-art methods.

16. Simultaneous paraphrasing and translation by fine-tuning Transformer models [PDF] Back to contents
  Rakesh Chada
Abstract: This paper describes the third place submission to the shared task on simultaneous translation and paraphrasing for language education at the 4th workshop on Neural Generation and Translation (WNGT) for ACL 2020. The final system leverages pre-trained translation models and uses a Transformer architecture combined with an oversampling strategy to achieve a competitive performance. This system significantly outperforms the baseline on Hungarian (27% absolute improvement in Weighted Macro F1 score) and Portuguese (33% absolute improvement) languages.

17. DiscreTalk: Text-to-Speech as a Machine Translation Problem [PDF] Back to contents
  Tomoki Hayashi, Shinji Watanabe
Abstract: This paper proposes a new end-to-end text-to-speech (E2E-TTS) model based on neural machine translation (NMT). The proposed model consists of two components; a non-autoregressive vector quantized variational autoencoder (VQ-VAE) model and an autoregressive Transformer-NMT model. The VQ-VAE model learns a mapping function from a speech waveform into a sequence of discrete symbols, and then the Transformer-NMT model is trained to estimate this discrete symbol sequence from a given input text. Since the VQ-VAE model can learn such a mapping in a fully-data-driven manner, we do not need to consider hyperparameters of the feature extraction required in the conventional E2E-TTS models. Thanks to the use of discrete symbols, we can use various techniques developed in NMT and automatic speech recognition (ASR) such as beam search, subword units, and fusions with a language model. Furthermore, we can avoid an over smoothing problem of predicted features, which is one of the common issues in TTS. The experimental evaluation with the JSUT corpus shows that the proposed method outperforms the conventional Transformer-TTS model with a non-autoregressive neural vocoder in naturalness, achieving the performance comparable to the reconstruction of the VQ-VAE model.
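The discretization step of the VQ-VAE is a nearest-codebook lookup: each feature frame becomes the index of its closest code vector, and that index sequence is what the Transformer-NMT model learns to "translate" text into. A minimal numpy sketch with an invented 3-entry codebook:

```python
import numpy as np

def quantize(frames, codebook):
    """Map each frame to the index of its nearest code vector (L2 distance).
    The resulting index sequence is the 'discrete symbol' representation."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])  # toy, 2-d codes
frames = np.array([[0.9, 1.1], [0.1, -0.1], [-0.8, 0.6]])   # toy feature frames
symbols = quantize(frames, codebook)                         # array([1, 0, 2])
```

Once speech is a symbol sequence, standard NMT/ASR machinery (beam search, subword units, language-model fusion) applies unchanged, which is the point of the paper.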

18. Psychometric Analysis and Coupling of Emotions Between State Bulletins and Twitter in India during COVID-19 Infodemic [PDF] Back to contents
  Baani Leen Kaur Jolly, Palash Aggrawal, Amogh Gulati, Amarjit Singh Sethi, Ponnurangam Kumaraguru, Tavpritesh Sethi
Abstract: COVID-19 infodemic has been spreading faster than the pandemic itself with misinformation riding upon the infodemic wave being a major threat to people's health and governance systems. Since social media is the largest source of information, managing the infodemic not only requires mitigating of misinformation but also an early understanding of psychological patterns resulting from it. During the COVID-19 crisis, Twitter alone has seen a sharp 45% increase in the usage of its curated events page, and a 30% increase in its direct messaging usage, since March 6th 2020. In this study, we analyze the psychometric impact and coupling of the COVID-19 infodemic with the official bulletins related to COVID-19 at the national and state level in India. We look at these two sources with a psycho-linguistic lens of emotions and quantified the extent and coupling between the two. We modified path, a deep skip-gram based open-sourced lexicon builder for effective capture of health-related emotions. We were then able to capture the time-evolution of health-related emotions in social media and official bulletins. An analysis of lead-lag relationships between the time series of extracted emotions from official bulletins and social media using Granger's causality showed that state bulletins were leading the social media for some emotions such as fear. Further insights that are potentially relevant for the policymaker and the communicators actively engaged in mitigating misinformation are also discussed. Our paper also introduces CoronaIndiaDataset2, the first social media based COVID-19 dataset at national and state levels from India with over 5.6 million national and 2.6 million state-level tweets. Finally, we present our findings as COVibes, an interactive web application capturing psychometric insights captured upon the CoronaIndiaDataset, both at a national and state level.
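A simplified proxy for the lead-lag analysis (Granger causality proper compares lagged autoregressions with an F-test) is the correlation between one series and a lagged copy of the other. In the toy data below, "bulletins" lead "tweets" by exactly one step:

```python
def lagged_corr(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t]; lag > 0 tests x leading y."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = lambda s, m: sum((v - m) ** 2 for v in s)
    return cov / (var(x, mx) * var(y, my)) ** 0.5

# Toy emotion-intensity series: tweets copy bulletins one step later.
bulletins = [0, 1, 0, 2, 0, 3, 0, 4]
tweets = [0] + bulletins[:-1]

lead = lagged_corr(bulletins, tweets, 1)  # bulletins one step earlier: high
same = lagged_corr(bulletins, tweets, 0)  # contemporaneous: low here
```

A lag-1 correlation far above the lag-0 one is the qualitative pattern behind the paper's finding that state bulletins led social media for emotions such as fear.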

19. A Framework for Hierarchical Multilingual Machine Translation [PDF] Back to contents
  Ion Madrazo Azpiazu, Maria Soledad Pera
Abstract: Multilingual machine translation has recently been in vogue given its potential for improving machine translation performance for low-resource languages via transfer learning. Empirical examinations demonstrating the success of existing multilingual machine translation strategies, however, are limited to experiments in specific language groups. In this paper, we present a hierarchical framework for building multilingual machine translation strategies that takes advantage of a typological language family tree for enabling transfer among similar languages while avoiding the negative effects that result from incorporating languages that are too different to each other. Exhaustive experimentation on a dataset with 41 languages demonstrates the validity of the proposed framework, especially when it comes to improving the performance of low-resource languages via the use of typologically related families for which richer sets of resources are available.

20. Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020) [PDF] Back to contents
  Takashi Morita, Hiroki Koda
Abstract: In this study, we reported our exploration of Text-To-Speech without Text (TTS without T) in the ZeroSpeech Challenge 2020, in which participants proposed an end-to-end, unsupervised system that learned speech recognition and TTS together. We addressed the challenge using biologically/psychologically motivated modules of Artificial Neural Networks (ANN), with a particular interest in unsupervised learning of human language as a biological/psychological problem. The system first processes Mel Frequency Cepstral Coefficient (MFCC) frames with an Echo-State Network (ESN), and simulates computations in cortical microcircuits. The outcome is discretized by our original Variational Autoencoder (VAE) that implements the Dirichlet-based Bayesian clustering widely accepted in computational linguistics and cognitive science. The discretized signal is then reverted into sound waveform via a neural-network implementation of the source-filter model for speech production.
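The reservoir of an Echo-State Network is a fixed random recurrent matrix, scaled to spectral radius below 1 so past inputs fade; only a readout on top is ever trained. A minimal numpy sketch processing fake MFCC frames (all dimensions invented, no leak term or readout shown):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res = 13, 50                 # e.g. 13 MFCC coefficients, 50 reservoir units
W_in = rng.normal(scale=0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # spectral radius 0.9 < 1: echo-state property

def esn_step(x, u):
    """One reservoir update; W_in and W stay fixed, only a readout is trained."""
    return np.tanh(W_in @ u + W @ x)

x = np.zeros(n_res)
for u in rng.normal(size=(10, n_in)):      # ten fake MFCC frames
    x = esn_step(x, u)
```

The fixed, recurrently decaying state is what makes the ESN a plausible stand-in for cortical microcircuit dynamics, as the abstract suggests.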

21. Schema-Guided Natural Language Generation [PDF] Back to contents
  Yuheng Du, Shereen Oraby, Vittorio Perera, Minmin Shen, Anjali Narayan-Chen, Tagyoung Chung, Anu Venkatesh, Dilek Hakkani-Tur
Abstract: Neural network based approaches to natural language generation (NLG) have gained popularity in recent years. The goal of the task is to generate a natural language string to realize an input meaning representation, hence large datasets of paired utterances and their meaning representations are used for training the network. However, dataset creation for language generation is an arduous task, and popular datasets designed for training these generators mostly consist of simple meaning representations composed of slot and value tokens to be realized. These simple meaning representations do not include any contextual information that may be helpful for training an NLG system to generalize, such as domain information and descriptions of slots and values. In this paper, we present the novel task of Schema-Guided Natural Language Generation, in which we repurpose an existing dataset for another task: dialog state tracking. Dialog state tracking data includes a large and rich schema spanning multiple different attributes, including information about the domain, user intent, and slot descriptions. We train different state-of-the-art models for neural natural language generation on this data and show that inclusion of the rich schema allows our models to produce higher quality outputs both in terms of semantics and diversity. We also conduct experiments comparing model performance on seen versus unseen domains. Finally, we present human evaluation results and analysis demonstrating high ratings for overall output quality.
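The core idea, enriching a flat meaning representation with schema text before feeding it to a seq2seq generator, can be sketched as simple string linearization. The schema entries and the input format below are hypothetical, not the paper's exact data:

```python
def linearize(domain_desc, intent, slots, schema):
    """Flatten a meaning representation plus schema descriptions into one
    seq2seq input string (format is illustrative)."""
    parts = [f"domain: {domain_desc}", f"intent: {intent}"]
    for name, value in slots.items():
        desc = schema.get(name, name)        # fall back to the bare slot name
        parts.append(f"{desc} = {value}")
    return " | ".join(parts)

schema = {"restaurant_name": "name of the restaurant",
          "time": "time of the reservation"}
mr = {"restaurant_name": "Opa!", "time": "6 pm"}
src = linearize("make a restaurant reservation", "ReserveRestaurant", mr, schema)
print(src)
```

With slot descriptions in the input, the generator sees natural-language context ("name of the restaurant") instead of an opaque token ("restaurant_name"), which is what lets it generalize to unseen domains.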

22. Neural Polysynthetic Language Modelling [PDF]
  Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk, Han Liu, Coleman Haley, Katherine J. Zhang, Robbie Jimmerson, Vasilisa Andriyanets, Aldrian Obaja Muis, Naoki Otani, Jong Hyuk Park, Zhisong Zhang
Abstract: Research in natural language processing commonly assumes that approaches that work well for English and other widely-used languages are "language agnostic". In high-resource languages, especially those that are analytic, a common approach is to treat morphologically-distinct variants of a common root as completely independent word types. This assumes that there are limited morphological inflections per root, and that the majority will appear in a large enough corpus, so that the model can adequately learn statistics about each form. Approaches like stemming, lemmatization, or subword segmentation are often used when either of those assumptions does not hold, particularly in the case of synthetic languages like Spanish or Russian that have more inflection than English. In the literature, languages like Finnish or Turkish are held up as extreme examples of complexity that challenge common modelling assumptions. Yet, when considering all of the world's languages, Finnish and Turkish are closer to the average case. When we consider polysynthetic languages (those at the extreme of morphological complexity), approaches like stemming, lemmatization, or subword modelling may not suffice. These languages have very high numbers of hapax legomena, showing the need for appropriate morphological handling of words, without which it is not possible for a model to capture enough word statistics. We examine the current state-of-the-art in language modelling, machine translation, and text prediction for four polysynthetic languages: Guaraní, St. Lawrence Island Yupik, Central Alaskan Yupik, and Inuktitut. We then propose a novel framework for language modelling that combines knowledge representations from finite-state morphological analyzers with Tensor Product Representations in order to enable neural language models capable of handling the full range of typologically variant languages.
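Tensor Product Representations bind role and filler vectors with an outer product and sum the results; with orthonormal role vectors, a filler can be recovered exactly by unbinding with its role. A toy sketch (dimensions arbitrary):

```python
import numpy as np

d = 4
roles = np.eye(d)                     # orthonormal role vectors (e.g. morpheme slots)
fillers = np.arange(d * d, dtype=float).reshape(d, d)  # one filler vector per role

# Bind: T = sum_i  filler_i (outer) role_i
T = sum(np.outer(fillers[i], roles[i]) for i in range(d))

# Unbind role 2 by multiplying with its (orthonormal) role vector.
recovered = T @ roles[2]
print(np.allclose(recovered, fillers[2]))  # True
```

Exact recovery holds because the roles are orthonormal: T @ r_j = sum_i f_i (r_i . r_j) = f_j. In the paper's setting, the roles would come from a finite-state morphological analysis rather than an identity matrix.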

23. Luganda Text-to-Speech Machine [PDF]
  Irene Nandutu, Ernest Mwebaze
Abstract: In Uganda, Luganda is the most widely spoken native language. It is used for communication in informal as well as formal business transactions. Globally, TTS-related technology startups have mainly worked with languages like English, French, etc. These languages are supported in TTS engines by Google, Microsoft, and others, allowing developers in those regions to innovate TTS products. Luganda is not supported because the language has not been built into and trained on these engines. In this study, we analyzed Luganda language structure and constructions and then proposed and developed a Luganda TTS. The system was built and trained using locally sourced Luganda text and audio. The engine is now able to capture text and read it aloud. We tested accuracy using MRT and MOS; both test results are quite good, with MRT scoring higher, and the overall score was 71%. This study will enhance previous solutions to NLP gaps in Uganda, as well as provide raw data so that other research in this area can take place.

24. On the Generation of Medical Dialogues for COVID-19 [PDF]
  Wenmian Yang, Guangtao Zeng, Bowen Tan, Zeqian Ju, Subrato Chakravorty, Xuehai He, Shu Chen, Xingyi Yang, Qingyang Wu, Zhou Yu, Eric Xing, Pengtao Xie
Abstract: Under the pandemic of COVID-19, people experiencing COVID19-related symptoms or exposed to risk factors have a pressing need to consult doctors. Due to hospital closure, a lot of consulting services have been moved online. Because of the shortage of medical professionals, many people cannot receive online consultations timely. To address this problem, we aim to develop a medical dialogue system that can provide COVID19-related consultations. We collected two dialogue datasets -CovidDialog- (in English and Chinese respectively) containing conversations between doctors and patients about COVID-19. On these two datasets, we train several dialogue generation models based on Transformer, GPT, and BERT-GPT. Since the two COVID-19 dialogue datasets are small in size, which bears high risk of overfitting, we leverage transfer learning to mitigate data deficiency. Specifically, we take the pretrained models of Transformer, GPT, and BERT-GPT on dialog datasets and other large-scale texts, then finetune them on our CovidDialog datasets. Experiments demonstrate that these approaches are promising in generating meaningful medical dialogues about COVID-19. But more advanced approaches are needed to build a fully useful dialogue system that can offer accurate COVID-related consultations. The data and code are available at this https URL

25. Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data [PDF]
  Soumya Banerjee, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, Parthapratim Das
Abstract: The abstract of a scientific paper distills the contents of the paper into a short paragraph. In the biomedical literature, it is customary to structure an abstract into discourse categories like BACKGROUND, OBJECTIVE, METHOD, RESULT, and CONCLUSION, but this segmentation is uncommon in other fields like computer science. Explicit categories could be helpful for more granular, that is, discourse-level search and recommendation. The sparsity of labeled data makes it challenging to construct supervised machine learning solutions for automatic discourse-level segmentation of abstracts in non-bio domains. In this paper, we address this problem using transfer learning. In particular, we define three discourse categories, BACKGROUND, TECHNIQUE, and OBSERVATION, for an abstract because these three categories are the most common. We train a deep neural network on structured abstracts from PubMed, then fine-tune it on a small hand-labeled corpus of computer science papers. We observe an accuracy of 75% on the test corpus. We perform an ablation study to highlight the roles of the different parts of the model. Our method appears to be a promising solution to the automatic segmentation of abstracts, where the labeled data is sparse.

26. MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [PDF]
  Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal
Abstract: Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph. Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture. The memory module generates a highly summarized memory state from the video segments and the sentence history so as to help better prediction of the next sentence (w.r.t. coreference and repetition aspects), thus encouraging coherent paragraph generation. Extensive experiments, human evaluations, and qualitative analyses on two popular datasets ActivityNet Captions and YouCookII show that MART generates more coherent and less repetitive paragraph captions than baseline methods, while maintaining relevance to the input video events. All code is available open-source at: this https URL
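One way to picture a "highly summarized memory state" is attention pooling over a segment's frame features followed by a gated blend with the old state. The shapes and the scalar gate below are assumptions for illustration, not MART's exact equations:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_q = rng.standard_normal((d, d)) / np.sqrt(d)   # fixed projection (assumed)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def update_memory(memory, segment):
    """Pool the new segment's frame features by attending from the
    current memory, then blend old and new state with a scalar gate."""
    scores = segment @ (W_q @ memory)        # one attention score per frame
    pooled = softmax(scores) @ segment       # weighted sum of frame features
    gate = 1.0 / (1.0 + np.exp(-memory @ pooled))  # sigmoid gate in (0, 1)
    return gate * memory + (1.0 - gate) * pooled

mem = np.zeros(d)
for seg in rng.standard_normal((3, 5, d)):   # 3 video segments, 5 frames each
    mem = update_memory(mem, seg)
print(mem.shape)  # (8,)
```

Because the memory has a fixed size regardless of how many segments have been seen, the next-sentence decoder can condition on the whole history cheaply, which is what supports coreference and repetition control across the paragraph.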

27. Enabling Language Models to Fill in the Blanks [PDF]
  Chris Donahue, Mina Lee, Percy Liang
Abstract: We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document. While infilling could enable rich functionality especially for writing assistance tools, more attention has been devoted to language modeling---a special case of infilling where text is predicted at the end of a document. In this paper, we aim to extend the capabilities of language models (LMs) to the more general task of infilling. To this end, we train (or fine-tune) off-the-shelf LMs on sequences containing the concatenation of artificially-masked text and the text which was masked. We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics. Furthermore, we show that humans have difficulty identifying sentences infilled by our approach as machine-generated in the domain of short stories.
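The training-example construction, concatenating artificially masked text with the text that was masked, can be sketched as plain string manipulation; the special tokens ([blank], [answer], [sep]) are placeholders, not necessarily the paper's vocabulary:

```python
def make_infilling_example(text, spans):
    """Replace each (start, end) span with a blank token and append
    the removed spans after a separator (token names are assumed)."""
    masked, answers, prev = [], [], 0
    for start, end in spans:
        masked.append(text[prev:start])
        masked.append("[blank]")
        answers.append(text[start:end] + " [answer]")
        prev = end
    masked.append(text[prev:])
    return "".join(masked) + " [sep] " + " ".join(answers)

ex = make_infilling_example("She ate leftover pasta for lunch.",
                            [(8, 22), (27, 32)])
print(ex)
# She ate [blank] for [blank]. [sep] leftover pasta [answer] lunch [answer]
```

An off-the-shelf LM fine-tuned on such sequences learns to emit the answers left-to-right after the separator, so ordinary left-to-right decoding yields the infilled spans.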

28. Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis [PDF]
  Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro
Abstract: In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer. Flowtron borrows insights from IAF and revamps Tacotron in order to provide high-quality and expressive mel-spectrogram synthesis. Flowtron is optimized by maximizing the likelihood of the training data, which makes training simple and stable. Flowtron learns an invertible mapping of data to a latent space that can be manipulated to control many aspects of speech synthesis (pitch, tone, speech rate, cadence, accent). Our mean opinion scores (MOS) show that Flowtron matches state-of-the-art TTS models in terms of speech quality. In addition, we provide results on control of speech variation, interpolation between samples and style transfer between speakers seen and unseen during training. Code and pre-trained models will be made publicly available at this https URL
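The invertible mapping that flow models rely on can be illustrated with a single affine flow step: the forward pass maps data to latents and the inverse reconstructs the input exactly. The parameters here are fixed constants rather than learned networks:

```python
import math

# One affine flow step: z = (x - mu) * exp(-log_s); exactly invertible.
mu, log_s = 0.5, 0.2

def forward(x):           # data -> latent
    return [(xi - mu) * math.exp(-log_s) for xi in x]

def inverse(z):           # latent -> data
    return [zi * math.exp(log_s) + mu for zi in z]

x = [0.1, -1.3, 2.7]
assert all(abs(a - b) < 1e-12 for a, b in zip(inverse(forward(x)), x))
```

Because every step is invertible, one can sample or edit points in latent space (e.g. interpolate between two utterances) and map them back to valid spectrograms; training just maximizes the exact likelihood through the change-of-variables formula.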

29. COVID-19Base: A knowledgebase to explore biomedical entities related to COVID-19 [PDF]
  Junaed Younus Khan, Md. Tawkat Islam Khondaker, Iram Tazim Hoque, Hamada Al-Absi, Mohammad Saifur Rahman, Tanvir Alam, M. Sohel Rahman
Abstract: We present COVID-19Base, a knowledgebase highlighting the biomedical entities related to COVID-19 disease based on literature mining. To develop COVID-19Base, we mine the information from publicly available scientific literature and related public resources. Seven topic-specific dictionaries, including human genes, human miRNAs, human lncRNAs, diseases, Protein Databank, drugs, and drug side effects, are integrated to mine all scientific evidence related to COVID-19. We have employed an automated literature mining and labeling system through a novel approach to measure the effectiveness of drugs against diseases based on natural language processing, sentiment analysis, and deep learning. To the best of our knowledge, this is the first knowledgebase dedicated to COVID-19, which integrates such a large variety of related biomedical entities through literature mining. Proper investigation of the mined biomedical entities, along with the identified interactions among them, reported in COVID-19Base, would help the research community to discover possible ways for the therapeutic treatment of COVID-19.
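The dictionary-based mining step can be sketched as scanning sentences for terms from topic-specific dictionaries and recording co-occurrences with COVID-19. The dictionaries and the matching rule below are toy assumptions:

```python
dictionaries = {
    "drug": {"remdesivir", "chloroquine"},
    "gene": {"ace2", "tmprss2"},
}

def mine(sentence):
    """Return (entity_type, term) pairs co-occurring with COVID-19."""
    words = set(sentence.lower().replace(",", " ").split())
    if "covid-19" not in words:
        return []
    return sorted((etype, t) for etype, terms in dictionaries.items()
                  for t in terms & words)

hits = mine("ACE2 expression may explain COVID-19 severity")
print(hits)  # [('gene', 'ace2')]
```

A real system would add tokenization, synonym handling, and the paper's NLP/sentiment scoring on top of this co-occurrence skeleton.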

30. WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge [PDF]
  Hongming Zhang, Xinran Zhao, Yangqiu Song
Abstract: In this paper, we present the first comprehensive categorization of essential commonsense knowledge for answering the Winograd Schema Challenge (WSC). For each of the questions, we invite annotators to first provide reasons for making correct decisions and then categorize them into six major knowledge categories. By doing so, we better understand the limitation of existing methods (i.e., what kind of knowledge cannot be effectively represented or inferred with existing methods) and shed some light on the commonsense knowledge that we need to acquire in the future for better commonsense reasoning. Moreover, to investigate whether current WSC models can understand the commonsense or they simply solve the WSC questions based on the statistical bias of the dataset, we leverage the collected reasons to develop a new task called WinoWhy, which requires models to distinguish plausible reasons from very similar but wrong reasons for all WSC questions. Experimental results prove that even though pre-trained language representation models have achieved promising progress on the original WSC dataset, they are still struggling at WinoWhy. Further experiments show that even though supervised models can achieve better performance, the performance of these models can be sensitive to the dataset distribution. WinoWhy and all codes are available at: this https URL.

31. Do not let the history haunt you -- Mitigating Compounding Errors in Conversational Question Answering [PDF]
  Angrosh Mandya, James O'Neill, Danushka Bollegala, Frans Coenen
Abstract: The Conversational Question Answering (CoQA) task involves answering a sequence of inter-related conversational questions about a contextual paragraph. Although existing approaches employ human-written ground-truth answers for answering conversational questions at test time, in a realistic scenario, the CoQA model will not have any access to ground-truth answers for the previous questions, compelling the model to rely upon its own previously predicted answers for answering the subsequent questions. In this paper, we find that compounding errors occur when using previously predicted answers at test time, significantly lowering the performance of CoQA systems. To solve this problem, we propose a sampling strategy that dynamically selects between target answers and model predictions during training, thereby closely simulating the situation at test time. Further, we analyse the severity of this phenomenon as a function of the question type, conversation length, and domain type.
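The proposed sampling strategy, dynamically choosing between the gold answer and the model's own prediction while training, is a form of scheduled sampling. A sketch, with the mixing probability as an assumed hyperparameter:

```python
import random

def pick_history_answer(gold, predicted, p_gold, rng=random):
    """During training, feed the gold answer with probability p_gold,
    otherwise the model's own previous prediction (as at test time)."""
    return gold if rng.random() < p_gold else predicted

random.seed(0)
history = [pick_history_answer("in 1912", "in 1911", p_gold=0.5)
           for _ in range(1000)]
gold_rate = history.count("in 1912") / len(history)
print(round(gold_rate, 2))  # roughly 0.5
```

In practice p_gold would typically be annealed from 1.0 toward 0.0 over training, so the model gradually learns to cope with its own noisy history.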

32. AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN [PDF]
  Zewang Zhang, Qiao Tian, Heng Lu, Ling-Hui Chen, Shan Liu
Abstract: This paper investigates how to leverage a DurIAN-based average model to enable a new speaker to have both accurate pronunciation and fluent cross-lingual speaking with very limited monolingual data. A weakness of the recently proposed end-to-end text-to-speech (TTS) systems is that robust alignment is hard to achieve, which hinders it to scale well with very limited data. To cope with this issue, we introduce AdaDurIAN by training an improved DurIAN-based average model and leverage it to few-shot learning with the shared speaker-independent content encoder across different speakers. Several few-shot learning tasks in our experiments show AdaDurIAN can outperform the baseline end-to-end system by a large margin. Subjective evaluations also show that AdaDurIAN yields higher mean opinion score (MOS) of naturalness and more preferences of speaker similarity. In addition, we also apply AdaDurIAN to emotion transfer tasks and demonstrate its promising performance.

33. Discriminative Multi-modality Speech Recognition [PDF]
  Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang
Abstract: Vision is often used as a complementary modality for audio speech recognition (ASR), especially in noisy environments where the performance of the audio-only modality significantly deteriorates. After combining the visual modality, ASR is upgraded to multi-modality speech recognition (MSR). In this paper, we propose a two-stage speech recognition model. In the first stage, the target voice is separated from background noises with help from the corresponding visual information of lip movements, making the model understand clearly. At the second stage, the audio modality combines the visual modality again to better understand the speech by a MSR sub-network, further improving the recognition rate. There are some other key contributions: we introduce a pseudo-3D residual convolution (P3D)-based visual front-end to extract more discriminative features; we upgrade the temporal convolution block from 1D ResNet to the temporal convolutional network (TCN), which is more suitable for temporal tasks; the MSR sub-network is built on top of the Element-wise-Attention Gated Recurrent Unit (EleAtt-GRU), which is more effective than the Transformer for long sequences. We conducted extensive experiments on the LRS3-TED and LRW datasets. Our two-stage model (audio-enhanced multi-modality speech recognition, AE-MSR) consistently achieves state-of-the-art performance by a significant margin, which demonstrates the necessity and effectiveness of AE-MSR.
