Table of Contents
1. The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs [PDF] Abstract
2. Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation [PDF] Abstract
3. Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation [PDF] Abstract
4. RuBQ: A Russian Dataset for Question Answering over Wikidata [PDF] Abstract
5. Towards Finite-State Morphology of Kurdish [PDF] Abstract
6. Unsupervised Quality Estimation for Neural Machine Translation [PDF] Abstract
7. MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora [PDF] Abstract
8. LaCulturaNonSiFerma -- Report su uso e la diffusione degli hashtag delle istituzioni culturali italiane durante il periodo di lockdown [PDF] Abstract
9. Fluent Response Generation for Conversational Question Answering [PDF] Abstract
10. Symptom extraction from the narratives of personal experiences with COVID-19 on Reddit [PDF] Abstract
11. MTSS: Learn from Multiple Domain Teachers and Become a Multi-domain Dialogue Expert [PDF] Abstract
12. Text-to-Text Pre-Training for Data-to-Text Tasks [PDF] Abstract
13. Automated Question Answer medical model based on Deep Learning Technology [PDF] Abstract
14. Stance Prediction and Claim Verification: An Arabic Perspective [PDF] Abstract
15. Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models [PDF] Abstract
16. ScriptWriter: Narrative-Guided Script Generation [PDF] Abstract
17. Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation [PDF] Abstract
18. SafeComp: Protocol For Certifying Cloud Computations Integrity [PDF] Abstract
19. Graph-based, Self-Supervised Program Repair from Diagnostic Feedback [PDF] Abstract
20. Hidden Markov Chains, Entropic Forward-Backward, and Part-Of-Speech Tagging [PDF] Abstract
21. Dynamic Sparsity Neural Networks for Automatic Speech Recognition [PDF] Abstract
22. Multistream CNN for Robust Acoustic Modeling [PDF] Abstract
26. Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis [PDF] Abstract
Abstracts
1. The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs [PDF] Back to Table of Contents
Alexander Mehler, Bernhard Jussen, Tim Geelhaar, Alexander Henlein, Giuseppe Abrami, Daniel Baumartz, Tolga Uslu, Wahed Hemati
Abstract: In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations using a limited crowdsourcing process aimed at continuous review and updating of the FLL. Starting from the texts resulting from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversing by means of SemioGraphs completes the digital enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.
2. Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation [PDF] Back to Table of Contents
Weixin Liang, James Zou, Zhou Yu
Abstract: Open Domain dialog system evaluation is one of the most important challenges in dialog research. Existing automatic evaluation metrics, such as BLEU, are mostly reference-based. They calculate the difference between the generated response and a limited number of available references. Likert-score based self-reported user rating is widely adopted by social conversational systems, such as Amazon Alexa Prize chatbots. However, self-reported user rating suffers from bias and variance among different users. To alleviate this problem, we formulate dialog evaluation as a comparison task. We also propose an automatic evaluation model CMADE (Comparison Model for Automatic Dialog Evaluation) that automatically cleans self-reported user ratings as it trains on them. Specifically, we first use a self-supervised method to learn better dialog feature representation, and then use KNN and Shapley to remove confusing samples. Our experiments show that CMADE achieves 89.2% accuracy in the dialog comparison task.
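The abstract only names the cleaning ingredients (a learned dialog representation, KNN, and Shapley values), so the sketch below is an assumption-laden illustration of the KNN step alone: flag a self-reported rating as noisy when it disagrees with most of its nearest neighbours in feature space. The function and variable names (`flag_confusing_samples`, `features`, `ratings`) are hypothetical, and the Shapley step is omitted.

```python
"""Minimal sketch of KNN-based label cleaning in a learned feature space.

This is NOT the CMADE implementation; it only illustrates the idea of
flagging self-reported ratings that disagree with their nearest
neighbours. Feature extraction and the Shapley-value step are omitted.
"""
import numpy as np
from sklearn.neighbors import NearestNeighbors


def flag_confusing_samples(features, ratings, k=10, agreement_threshold=0.5):
    """Return a boolean mask of samples whose rating disagrees with most neighbours.

    features: (n, d) array of dialog representations (e.g. from a
              self-supervised encoder); ratings: (n,) integer labels.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)          # idx[:, 0] is the sample itself
    neighbour_ratings = ratings[idx[:, 1:]]   # (n, k)
    agreement = (neighbour_ratings == ratings[:, None]).mean(axis=1)
    return agreement < agreement_threshold    # True = likely noisy rating


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 16))
    labels = (feats[:, 0] > 0).astype(int)    # clean labels
    labels[:10] = 1 - labels[:10]             # inject 10 noisy ratings
    noisy_mask = flag_confusing_samples(feats, labels, k=15)
    print(f"flagged {noisy_mask.sum()} of 200 samples as potentially noisy")
```

In the paper the cleaning is interleaved with training; here it is a single offline pass purely for illustration.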
3. Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation [PDF] Back to Table of Contents
Shun-Po Chuang, Tzu-Wei Sung, Alexander H. Liu, Hung-yi Lee
Abstract: Speech translation (ST) aims to learn transformations from speech in the source language to the text in the target language. Previous works show that multitask learning improves the ST performance, in which the recognition decoder generates the text of the source language, and the translation decoder obtains the final translations based on the output of the recognition decoder. Because whether the output of the recognition decoder has the correct semantics is more critical than its accuracy, we propose to improve the multitask ST model by utilizing word embedding as the intermediate.
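The abstract does not spell out how the embedding intermediate is wired in, so the snippet below only illustrates one plausible reading: instead of (or in addition to) predicting discrete source tokens, the recognition branch is pushed toward pretrained word embeddings of the source transcript with a cosine loss. The tensor names and the loss form are assumptions, not the authors' implementation.

```python
"""Minimal sketch (not the paper's code): supervising an intermediate
representation with pretrained word embeddings instead of discrete tokens.

`pretrained_emb` is assumed to be a frozen (vocab, dim) embedding matrix;
`recognizer_states` stands in for per-token decoder states of the ASR branch.
"""
import torch
import torch.nn.functional as F


def embedding_intermediate_loss(recognizer_states, target_ids, pretrained_emb):
    """Cosine-distance loss pulling recognizer states toward the word
    embeddings of the reference source transcript."""
    targets = pretrained_emb[target_ids]                     # (batch, len, dim)
    cos = F.cosine_similarity(recognizer_states, targets, dim=-1)
    return (1.0 - cos).mean()


if __name__ == "__main__":
    vocab, dim = 1000, 256
    emb = torch.randn(vocab, dim)                            # stand-in for pretrained embeddings
    states = torch.randn(4, 12, dim, requires_grad=True)     # ASR-branch states
    ids = torch.randint(0, vocab, (4, 12))                   # reference transcript ids
    loss = embedding_intermediate_loss(states, ids, emb)
    loss.backward()
    print("intermediate loss:", float(loss))
```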
4. RuBQ: A Russian Dataset for Question Answering over Wikidata [PDF] Back to Table of Contents
Vladislav Korablinov, Pavel Braslavski
Abstract: The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.
5. Towards Finite-State Morphology of Kurdish [PDF] Back to Table of Contents
Sina Ahmadi, Hossein Hassani
Abstract: Morphological analysis is the study of the formation and structure of words. It plays a crucial role in various tasks in Natural Language Processing (NLP) and Computational Linguistics (CL) such as machine translation and text and speech generation. Kurdish is a less-resourced multi-dialect Indo-European language with highly inflectional morphology. In this paper, as the first attempt of its kind, the morphology of the Kurdish language (Sorani dialect) is described from a computational point of view. We extract morphological rules which are transformed into finite-state transducers for generating and analyzing words. The result of this research assists in conducting studies on language generation for Kurdish and enhances the Information Retrieval (IR) capacity for the language while leveraging the Kurdish NLP and CL into a more advanced computational level.
6. Unsupervised Quality Estimation for Neural Machine Translation [PDF] Back to Table of Contents
Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia
Abstract: Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time. Existing approaches require large amounts of expert annotated data, computation and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required. Different from most of the current work that treats the MT system as a black box, we explore useful information that can be extracted from the MT system as a by-product of translation. By employing methods for uncertainty quantification, we achieve very good correlation with human judgments of quality, rivalling state-of-the-art supervised QE models. To evaluate our approach we collect the first dataset that enables work on both black-box and glass-box approaches to QE.
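As a rough illustration of the "glass-box" idea (the exact uncertainty features are not given in the abstract and are assumed here), two cheap signals that require nothing beyond the MT system itself are the mean log-probability of the emitted tokens and the disagreement across stochastic forward passes with dropout left on:

```python
"""Minimal sketch of two glass-box, unsupervised QE signals of the kind the
paper builds on (the exact features are assumptions, not the authors' code):
mean token log-probability of the MT hypothesis, and disagreement between
several stochastic (dropout-on) forward passes."""
import math
import statistics


def mean_log_prob(token_probs):
    """Sentence-level confidence: average log-probability of the tokens
    the decoder actually emitted (higher suggests better quality)."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def mc_dropout_disagreement(scores_per_pass):
    """Uncertainty proxy: std. dev. of sentence scores across N forward
    passes with dropout enabled at inference (lower means more certain)."""
    return statistics.pstdev(scores_per_pass)


if __name__ == "__main__":
    # toy numbers standing in for decoder output probabilities
    confident = [0.9, 0.85, 0.95, 0.9]
    shaky = [0.4, 0.2, 0.6, 0.3]
    print("mean log-prob (confident):", round(mean_log_prob(confident), 3))
    print("mean log-prob (shaky):   ", round(mean_log_prob(shaky), 3))
    print("MC-dropout disagreement: ", round(mc_dropout_disagreement([-0.4, -0.9, -0.6]), 3))
```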
7. MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora [PDF] Back to Table of Contents
Lifeng Han, Gareth J.F. Jones, Alan F. Smeaton
Abstract: Multi-word expressions (MWEs) are a hot topic in research in natural language processing (NLP), including topics such as MWE detection, MWE decomposition, and research investigating the exploitation of MWEs in other NLP fields such as Machine Translation. However, the availability of bilingual or multi-lingual MWE corpora is very limited. The only bilingual MWE corpora that we are aware of is from the PARSEME (PARSing and Multi-word Expressions) EU Project. This is a small collection of only 871 pairs of English-German MWEs. In this paper, we present multi-lingual and bilingual MWE corpora that we have extracted from root parallel corpora. Our collections are 3,159,226 and 143,042 bilingual MWE pairs for German-English and Chinese-English respectively after filtering. We examine the quality of these extracted bilingual MWEs in MT experiments. Our initial experiments applying MWEs in MT show improved translation performances on MWE terms in qualitative analysis and better general evaluation scores in quantitative analysis, on both German-English and Chinese-English language pairs. We follow a standard experimental pipeline to create our MultiMWE corpora which are available online. Researchers can use this free corpus for their own models or use them in a knowledge base as model features.
8. LaCulturaNonSiFerma -- Report su uso e la diffusione degli hashtag delle istituzioni culturali italiane durante il periodo di lockdown [PDF] Back to Table of Contents
Carola Carlino, Gennaro Nolano, Maria Pia di Buono, Johanna Monti
Abstract: This report presents an analysis of #hashtags used by Italian Cultural Heritage institutions to promote and communicate cultural content during the COVID-19 lock-down period in Italy. Several activities to support and engage users have been proposed using social media. Most of these activities present one or more #hashtags which help to aggregate content and create a community on specific topics. Results show that on one side Italian institutions have been very proactive in adapting to the pandemic scenario and on the other side users reacted very positively, increasing their participation in the proposed activities.
9. Fluent Response Generation for Conversational Question Answering [PDF] Back to Table of Contents
Ashutosh Baheti, Alan Ritter, Kevin Small
Abstract: Question answering (QA) is an important aspect of open-domain conversational agents, garnering specific research focus in the conversational QA (ConvQA) subtask. One notable limitation of recent ConvQA efforts is the response being answer span extraction from the target corpus, thus ignoring the natural language generation (NLG) aspect of high-quality conversational agents. In this work, we propose a method for situating QA responses within a SEQ2SEQ NLG approach to generate fluent grammatical answer responses while maintaining correctness. From a technical perspective, we use data augmentation to generate training data for an end-to-end system. Specifically, we develop Syntactic Transformations (STs) to produce question-specific candidate answer responses and rank them using a BERT-based classifier (Devlin et al., 2019). Human evaluation on SQuAD 2.0 data (Rajpurkar et al., 2018) demonstrates that the proposed model outperforms baseline CoQA and QuAC models in generating conversational responses. We further show our model's scalability by conducting tests on the CoQA dataset. The code and data are available at this https URL.
10. Symptom extraction from the narratives of personal experiences with COVID-19 on Reddit [PDF] Back to Table of Contents
Curtis Murray, Lewis Mitchell, Jonathan Tuke, Mark Mackay
Abstract: Social media discussion of COVID-19 provides a rich source of information into how the virus affects people's lives that is qualitatively different from traditional public health datasets. In particular, when individuals self-report their experiences over the course of the virus on social media, it can allow for identification of the emotions each stage of symptoms engenders in the patient. Posts to the Reddit forum r/COVID19Positive contain first-hand accounts from COVID-19 positive patients, giving insight into personal struggles with the virus. These posts often feature a temporal structure indicating the number of days after developing symptoms the text refers to. Using topic modelling and sentiment analysis, we quantify the change in discussion of COVID-19 throughout individuals' experiences for the first 14 days since symptom onset. Discourse on early symptoms such as fever, cough, and sore throat was concentrated towards the beginning of the posts, while language indicating breathing issues peaked around ten days. Some conversation around critical cases was also identified and appeared at a roughly constant rate. We identified two clear clusters of positive and negative emotions associated with the evolution of these symptoms and mapped their relationships. Our results provide a perspective on the patient experience of COVID-19 that complements other medical data streams and can potentially reveal when mental health issues might appear.
11. MTSS: Learn from Multiple Domain Teachers and Become a Multi-domain Dialogue Expert [PDF] Back to Table of Contents
Shuke Peng, Feng Ji, Zehao Lin, Shaobo Cui, Haiqing Chen, Yin Zhang
Abstract: How to build a high-quality multi-domain dialogue system is a challenging work due to its complicated and entangled dialogue state space among each domain, which seriously limits the quality of dialogue policy, and further affects the generated response. In this paper, we propose a novel method to acquire a satisfying policy and subtly circumvent the knotty dialogue state representation problem in the multi-domain setting. Inspired by real school teaching scenarios, our method is composed of multiple domain-specific teachers and a universal student. Each individual teacher only focuses on one specific domain and learns its corresponding domain knowledge and dialogue policy based on a precisely extracted single domain dialogue state representation. Then, these domain-specific teachers impart their domain knowledge and policies to a universal student model and collectively make this student model a multi-domain dialogue expert. Experiment results show that our method reaches competitive results with SOTAs in both multi-domain and single domain setting.
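The abstract describes a teacher-student setup but gives no training details, so the following is only a hedged sketch of the generic mechanism: each frozen domain teacher supervises the student on its own domain's examples via a KL term. Names such as `multi_teacher_distillation_loss` and the choice of KL divergence are assumptions, not the MTSS recipe.

```python
"""Minimal sketch of distilling several domain-specific teacher policies into
one student (the general idea behind MTSS; the dialogue-state extraction and
the actual policy networks are not reproduced). Each teacher only supervises
the student on examples from its own domain."""
import torch
import torch.nn.functional as F


def multi_teacher_distillation_loss(student_logits, teacher_logits, domain_ids):
    """student_logits: (batch, n_actions); teacher_logits: dict domain -> (batch, n_actions)
    from the corresponding frozen teacher; domain_ids: (batch,) domain of each example."""
    loss = 0.0
    for domain, t_logits in teacher_logits.items():
        mask = domain_ids == domain
        if mask.any():
            loss = loss + F.kl_div(
                F.log_softmax(student_logits[mask], dim=-1),
                F.softmax(t_logits[mask], dim=-1),
                reduction="batchmean",
            )
    return loss


if __name__ == "__main__":
    torch.manual_seed(0)
    student = torch.randn(6, 10, requires_grad=True)   # student policy logits
    teachers = {0: torch.randn(6, 10), 1: torch.randn(6, 10)}
    domains = torch.tensor([0, 0, 1, 1, 0, 1])
    loss = multi_teacher_distillation_loss(student, teachers, domains)
    loss.backward()
    print("distillation loss:", float(loss))
```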
12. Text-to-Text Pre-Training for Data-to-Text Tasks [PDF] Back to Table of Contents
Mihir Kale
Abstract: We study the pre-train + fine-tune strategy for data-to-text tasks. Fine-tuning T5 achieves state-of-the-art results on the WebNLG, MultiWoz and ToTTo benchmarks. Moreover, the models are fully end-to-end and do not rely on any intermediate planning steps, delexicalization or copy mechanisms. T5 pre-training also enables stronger generalization, as evidenced by large improvements on out-of-domain test sets. We hope our work serves as a useful baseline for future research, as pre-training becomes ever more prevalent for data-to-text tasks.
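A minimal fine-tuning sketch in the spirit of the paper, assuming the Hugging Face `transformers` implementation of T5 (the paper's own code, task prefix, and data linearization are not reproduced; the `translate Graph to Text:` prefix and the toy WebNLG-style triple are illustrative only):

```python
"""Pre-train + fine-tune sketch with Hugging Face T5 (assumed dependency;
not the paper's code). Input is a linearized data record, output is the
target text, as in WebNLG-style data-to-text."""
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# one toy (record, text) pair; real training would loop over WebNLG/ToTTo/MultiWoz
source = "translate Graph to Text: <subject> Alan_Bean <predicate> occupation <object> astronaut"
target = "Alan Bean worked as an astronaut."

batch = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

model.train()
loss = model(**batch, labels=labels).loss   # standard seq2seq cross-entropy
loss.backward()
optimizer.step()

model.eval()
with torch.no_grad():
    generated = model.generate(**batch, max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```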
13. Automated Question Answer medical model based on Deep Learning Technology [PDF] Back to Table of Contents
Abdelrahman Abdallah, Mahmoud Kasem, Mohamed Hamada, Shaymaa Sdeek
Abstract: Artificial intelligence can now provide more solutions for different problems, especially in the medical field. One of those problems is the lack of answers to any given medical/health-related question. The Internet is full of forums that allow people to ask some specific questions and get great answers for them. Nevertheless, browsing these questions in order to locate one similar to your own, and then finding a satisfactory answer, is a difficult and time-consuming task. This research will introduce a solution to this problem by automating the process of generating qualified answers to these questions and creating a kind of digital doctor. Furthermore, this research will train an end-to-end model using the framework of RNN and the encoder-decoder to generate sensible and useful answers to a small set of medical/health issues. The proposed model was trained and evaluated using data from various online services, such as WebMD, HealthTap, eHealthForums, and iCliniq.
14. Stance Prediction and Claim Verification: An Arabic Perspective [PDF] Back to Table of Contents
Jude Khouja
Abstract: This work explores the application of textual entailment in news claim verification and stance prediction using a new corpus in Arabic. The publicly available corpus comes in two perspectives: a version consisting of 4,547 true and false claims and a version consisting of 3,786 pairs (claim, evidence). We describe the methodology for creating the corpus and the annotation process. Using the introduced corpus, we also develop two machine learning baselines for two proposed tasks: claim verification and stance prediction. Our best model utilizes pretraining (BERT) and achieves 76.7 F1 on the stance prediction task and 64.3 F1 on the claim verification task. Our preliminary experiments shed some light on the limits of automatic claim verification that relies on claims text only. Results hint that while the linguistic features and world knowledge learned during pretraining are useful for stance prediction, such learned representations from pretraining are insufficient for verifying claims without access to context or evidence.
15. Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models [PDF] Back to Table of Contents
Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky
Abstract: Recent models for unsupervised representation learning of text have employed a number of techniques to improve contextual word representations but have put little focus on discourse-level representations. We propose CONPONO, an inter-sentence objective for pretraining language models that models discourse coherence and the distance between sentences. Given an anchor sentence, our model is trained to predict the text k sentences away using a sampled-softmax objective where the candidates consist of neighboring sentences and sentences randomly sampled from the corpus. On the discourse representation benchmark DiscoEval, our model improves over the previous state-of-the-art by up to 13% and on average 4% absolute across 7 tasks. Our model is the same size as BERT-Base, but outperforms the much larger BERT- Large model and other more recent approaches that incorporate discourse. We also show that CONPONO yields gains of 2%-6% absolute even for tasks that do not explicitly evaluate discourse: textual entailment (RTE), common sense reasoning (COPA) and reading comprehension (ReCoRD).
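The objective is described only at a high level, so the code below is a simplified stand-in: a sampled-softmax/contrastive loss in which the anchor sentence must score its true neighbouring sentence above randomly sampled corpus sentences. The k-distance prediction and the transformer encoder are left out, and all names are hypothetical.

```python
"""Minimal sketch of an inter-sentence contrastive objective of the kind
CONPONO uses (a sampled softmax over neighbouring vs. random sentences);
the encoder and the k-distance prediction details are simplified away."""
import torch
import torch.nn.functional as F


def sentence_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """anchor: (batch, dim); positives: (batch, dim) encoding of the true
    neighbouring sentence; negatives: (batch, n_neg, dim) random corpus
    sentences. Returns a softmax cross-entropy with the positive at index 0."""
    pos_scores = (anchor * positives).sum(-1, keepdim=True)          # (batch, 1)
    neg_scores = torch.einsum("bd,bnd->bn", anchor, negatives)       # (batch, n_neg)
    logits = torch.cat([pos_scores, neg_scores], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long)           # positive is class 0
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    torch.manual_seed(0)
    a = torch.randn(8, 128, requires_grad=True)
    p = a.detach() + 0.05 * torch.randn(8, 128)    # nearby sentence: similar encoding
    n = torch.randn(8, 20, 128)                    # 20 sampled negatives
    loss = sentence_contrastive_loss(a, p, n)
    loss.backward()
    print("contrastive loss:", float(loss))
```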
16. ScriptWriter: Narrative-Guided Script Generation [PDF] Back to Table of Contents
Yutao Zhu, Ruihua Song, Zhicheng Dou, Jian-Yun Nie, Jin Zhou
Abstract: It is appealing to have a system that generates a story or scripts automatically from a story-line, even though this is still out of our reach. In dialogue systems, it would also be useful to drive dialogues by a dialogue plan. In this paper, we address a key problem involved in these applications - guiding a dialogue by a narrative. The proposed model ScriptWriter selects the best response among the candidates that fit the context as well as the given narrative. It keeps track of what in the narrative has been said and what is to be said. A narrative plays a different role than the context (i.e., previous utterances), which is generally used in current dialogue systems. Due to the unavailability of data for this new application, we construct a new large-scale data collection GraphMovie from a movie website where end-users can upload their narratives freely when watching a movie. Experimental results on the dataset show that our proposed approach based on narratives significantly outperforms the baselines that simply use the narrative as a kind of context.
17. Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation [PDF] Back to Table of Contents
Bryan Eikema, Wilker Aziz
Abstract: Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest that there is something fundamentally wrong with NMT as a model or its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode, under the model distribution. We argue that the evidence corroborates the inadequacy of MAP decoding more than casts doubt on the model and its training algorithm. In this work, we criticise NMT models probabilistically showing that stochastic samples following the model's own generative story do reproduce various statistics of the training data well, but that it is beam search that strays from such statistics. We show that some of the known pathologies of NMT are due to MAP decoding and not to NMT's statistical assumptions nor MLE. In particular, we show that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary. We therefore advocate for the use of decision rules that take into account statistics gathered from the model distribution holistically. As a proof of concept we show that a straightforward implementation of minimum Bayes risk decoding gives good results outperforming beam search using as little as 30 samples, confirming that MLE-trained NMT models do capture important aspects of translation well in expectation.
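A toy version of the minimum Bayes risk decision rule the authors advocate: sample several translations from the model and return the candidate with the highest expected utility against the other samples. The unigram-F1 utility and the hard-coded sample list below are placeholders, not the paper's setup (which samples from a trained NMT model and, per the abstract, needs as few as 30 samples).

```python
"""Minimal minimum-Bayes-risk (MBR) decoding sketch over model samples.
The utility here is a simple unigram F1 (a stand-in; the paper's choice of
utility is not reproduced), and `samples` would come from ancestral sampling
of an NMT model rather than the hard-coded toy list."""
from collections import Counter


def unigram_f1(hyp, ref):
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(samples, utility=unigram_f1):
    """Pick the sample with the highest average utility against all other
    samples, i.e. the candidate closest to the model's own distribution."""
    def expected_utility(candidate):
        others = [s for s in samples if s is not candidate]
        return sum(utility(candidate, o) for o in others) / len(others)
    return max(samples, key=expected_utility)


if __name__ == "__main__":
    samples = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "a cat is sitting on the mat",
        "completely unrelated output text",
    ]
    print("MBR choice:", mbr_decode(samples))
```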
18. SafeComp: Protocol For Certifying Cloud Computations Integrity [PDF] Back to Table of Contents
Evgeny Shishkin, Evgeny Kislitsyn
Abstract: We define a problem of certifying computation integrity performed by some remote party we do not necessarily trust. We present a multi-party interactive protocol called SafeComp that solves this problem under specified constraints. Comparing to the nearest related work, our protocol reduces a proof construction complexity from $O(n \log{n})$ to $O(n)$, turning a communication complexity to exactly one round using a certificate of a comparable length.
19. Graph-based, Self-Supervised Program Repair from Diagnostic Feedback [PDF] Back to Table of Contents
Michihiro Yasunaga, Percy Liang
Abstract: We consider the problem of learning to repair programs from diagnostic feedback (e.g., compiler error messages). Program repair is challenging for two reasons: First, it requires reasoning and tracking symbols across source code and diagnostic feedback. Second, labeled datasets available for program repair are relatively small. In this work, we propose novel solutions to these two challenges. First, we introduce a program-feedback graph, which connects symbols relevant to program repair in source code and diagnostic feedback, and then apply a graph neural network on top to model the reasoning process. Second, we present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online to create a large amount of extra program repair examples, which we use to pre-train our models. We evaluate our proposed approach on two applications: correcting introductory programming assignments (DeepFix dataset) and correcting the outputs of program synthesis (SPoC dataset). Our final system, DrRepair, significantly outperforms prior work, achieving 66.1% full repair rate on DeepFix (+20.8% over the prior best), and 48.0% synthesis success rate on SPoC (+3.3% over the prior best).
20. Hidden Markov Chains, Entropic Forward-Backward, and Part-Of-Speech Tagging [PDF] Back to Table of Contents
Elie Azeraf, Emmanuel Monfrini, Emmanuel Vignon, Wojciech Pieczynski
Abstract: The ability to take into account the characteristics - also called features - of observations is essential in Natural Language Processing (NLP) problems. Hidden Markov Chain (HMC) model associated with classic Forward-Backward probabilities cannot handle arbitrary features like prefixes or suffixes of any size, except with an independence condition. For twenty years, this default has encouraged the development of other sequential models, starting with the Maximum Entropy Markov Model (MEMM), which elegantly integrates arbitrary features. More generally, it led to neglect HMC for NLP. In this paper, we show that the problem is not due to HMC itself, but to the way its restoration algorithms are computed. We present a new way of computing HMC based restorations using original Entropic Forward and Entropic Backward (EFB) probabilities. Our method allows taking into account features in the HMC framework in the same way as in the MEMM framework. We illustrate the efficiency of HMC using EFB in Part-Of-Speech Tagging, showing its superiority over MEMM based restoration. We also specify, as a perspective, how HMCs with EFB might appear as an alternative to Recurrent Neural Networks to treat sequential data with a deep architecture.
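For context, the classic Forward-Backward restoration that the paper starts from looks as follows in a toy two-tag setting; the Entropic Forward-Backward (EFB) variant that is the paper's actual contribution is not reproduced here, and the probabilities are invented toy numbers.

```python
"""Classic forward-backward smoothing for a small HMC POS tagger, shown only
as background for the paper above; the Entropic Forward-Backward variant it
proposes is NOT implemented here."""
import numpy as np


def forward_backward(pi, A, B, obs):
    """pi: (S,) initial tag probs; A: (S, S) tag transitions;
    B: (S, V) emission probs; obs: list of word ids.
    Returns per-position posterior tag probabilities gamma of shape (T, S)."""
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # 2 tags (e.g. NOUN, VERB) and 3 words in the vocabulary, toy numbers
    pi = np.array([0.6, 0.4])
    A = np.array([[0.3, 0.7],
                  [0.8, 0.2]])
    B = np.array([[0.6, 0.3, 0.1],
                  [0.1, 0.2, 0.7]])
    posteriors = forward_backward(pi, A, B, obs=[0, 2, 1])
    print(np.round(posteriors, 3))   # most likely tag per position = argmax of each row
```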
21. Dynamic Sparsity Neural Networks for Automatic Speech Recognition [PDF] 返回目录
Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang
Abstract: In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, in order to optimize for hardware with different resource specifications and for applications that have various latency requirements, models with varying sparsity levels usually need to be trained and deployed separately. In this paper, generalizing from slimmable neural networks, we present dynamic sparsity neural networks (DSNN) that, once trained, can instantly switch to execute at any given sparsity level at run-time. We show the efficacy of such models on ASR through comprehensive experiments and demonstrate that the performance of a dynamic sparsity model is on par with, and in some cases exceeds, the performance of individually trained single sparsity networks. A trained DSNN model can therefore greatly ease the training process and simplify deployment in diverse scenarios with resource constraints.
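A minimal illustration of the run-time behaviour described above: one shared weight matrix executed at several sparsity levels by applying a magnitude-based mask on the fly. The pruning criterion and the toy dimensions are assumptions for this sketch; the paper's training procedure for making all levels perform well is more involved.

```python
import numpy as np

def sparsity_mask(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(sparsity * weights.size)
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
x = rng.normal(size=4)
for s in (0.0, 0.5, 0.9):
    mask = sparsity_mask(W, s)                 # switch sparsity level at run time
    y = (W * mask) @ x
    print(f"sparsity={s:.1f}  kept={int(mask.sum())}/16  y[0]={y[0]:+.3f}")
```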
22. Multistream CNN for Robust Acoustic Modeling [PDF] 返回目录
Kyu J. Han, Jing Pan, Venkata Krishna Naveen Tadala, Tao Ma, Dan Povey
Abstract: This paper presents multistream CNN, a novel neural network architecture for robust acoustic modeling in speech recognition tasks. The proposed architecture accommodates diverse temporal resolutions in multiple streams to achieve robustness in acoustic modeling. For the diversity of temporal resolution in embedding processing, we consider dilation on TDNN-F, a variant of 1D-CNN. Each stream stacks narrower TDNN-F layers whose kernel has a unique, stream-specific dilation rate when processing input speech frames in parallel. Hence it can better represent acoustic events without increasing model complexity. We validate the effectiveness of the proposed multistream CNN architecture by showing consistent improvement across various data sets. Trained with data augmentation methods, multistream CNN improves the WER of the test-other set in the LibriSpeech corpus by 12% (relative). On custom data from ASAPP's production system for a contact center, it records a relative WER improvement of 11% for the customer channel audio (10% on average for the agent and customer channel recordings), demonstrating the superiority of the proposed model architecture in the wild. In terms of real-time factor (RTF), multistream CNN outperforms the normal TDNN-F by 15%, which also suggests its practicality on production systems or applications.
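The stream-specific dilation idea can be sketched as parallel 1-D convolutions whose outputs are concatenated. Plain Conv1d layers stand in here for the factorized TDNN-F layers of the paper, and the dilation rates and dimensions are illustrative only.

```python
import torch
import torch.nn as nn

class MultistreamConv(nn.Module):
    """Parallel 1-D conv streams with stream-specific dilation rates, concatenated."""
    def __init__(self, in_dim, stream_dim, dilations=(1, 3, 6, 9)):
        super().__init__()
        self.streams = nn.ModuleList([
            nn.Conv1d(in_dim, stream_dim, kernel_size=3,
                      dilation=d, padding=d)          # keeps the time length
            for d in dilations
        ])

    def forward(self, x):                              # x: (batch, in_dim, time)
        return torch.cat([torch.relu(s(x)) for s in self.streams], dim=1)

frames = torch.randn(2, 40, 100)                       # e.g. 40-dim filterbank features
out = MultistreamConv(40, 64)(frames)
print(out.shape)                                       # torch.Size([2, 256, 100])
```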
23. ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition [PDF] 返回目录
Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu J. Han, Tao Lei, Tao Ma
Abstract: In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling. In the hybrid ASR framework, the multistream CNN acoustic model processes an input of speech frames in multiple parallel pipelines where each stream has a unique dilation rate for diversity. Trained with the SpecAugment data augmentation method, it achieves relative word error rate (WER) improvements of 4% on test-clean and 14% on test-other. We further improve the performance via N-best rescoring using a 24-layer self-attentive SRU language model, achieving WERs of 1.75% on test-clean and 4.46% on test-other.
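The second-pass step can be illustrated with generic N-best rescoring: each first-pass hypothesis is re-scored by adding a weighted external language-model score, and the list is re-ranked. The toy word-frequency LM and the interpolation weights below are placeholders, not the paper's 24-layer self-attentive SRU or its tuned weights.

```python
def rescore_nbest(nbest, lm_logprob, lm_weight=0.8, length_bonus=0.5):
    """Re-rank first-pass hypotheses with an external language model.

    nbest: list of (hypothesis, first_pass_score); lm_logprob: callable returning
    a log-probability for a hypothesis. Weights here are illustrative.
    """
    def total(hyp, first_pass_score):
        return (first_pass_score
                + lm_weight * lm_logprob(hyp)
                + length_bonus * len(hyp.split()))
    return max(nbest, key=lambda h: total(*h))

def toy_lm(hyp):
    # Toy word-frequency "LM"; a real system would use a neural LM here.
    logp = {"i": -2.0, "scream": -6.0, "for": -2.5, "ice": -5.0, "cream": -5.0}
    return sum(logp.get(w, -12.0) for w in hyp.split())

nbest = [("i scream for ice cream", -12.1),
         ("eye scream four ice cream", -11.9)]
print(rescore_nbest(nbest, toy_lm))   # the LM overturns the first-pass ranking
```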
24. Simplified Self-Attention for Transformer-based End-to-End Speech Recognition [PDF] 返回目录
Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie
Abstract: Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies. However, such improvements are usually obtained through the use of very large neural networks. Transformer models mainly include two submodules - position-wise feedforward layers and self-attention (SAN) layers. In this paper, to reduce the model complexity while maintaining good performance, we propose a simplified self-attention (SSAN) layer which employs an FSMN memory block instead of projection layers to form query and key vectors for transformer-based end-to-end speech recognition. We evaluate the SSAN-based and the conventional SAN-based transformers on the public AISHELL-1, internal 1000-hour and 20,000-hour large-scale Mandarin tasks. Results show that our proposed SSAN-based transformer model can achieve over 20% relative reduction in model parameters and a 6.7% relative CER reduction on the AISHELL-1 task. With an impressive 20% parameter reduction, our model shows no loss of recognition performance on the 20,000-hour large-scale task.
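A rough sketch of the SSAN idea, under the assumption that the FSMN memory block can be approximated by a depthwise 1-D convolution over neighbouring frames: the shared memory output plays the role of both query and key, so only the value path keeps a projection matrix. The exact block definition and dimensions follow the paper, not this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedSelfAttention(nn.Module):
    """Self-attention whose query/key come from an FSMN-style memory block
    (approximated here by a depthwise 1-D convolution) instead of learned
    projection matrices; only the value path keeps a linear projection."""
    def __init__(self, dim, context=5):
        super().__init__()
        self.memory = nn.Conv1d(dim, dim, kernel_size=2 * context + 1,
                                padding=context, groups=dim, bias=False)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                                   # x: (batch, time, dim)
        qk = self.memory(x.transpose(1, 2)).transpose(1, 2)  # shared query/key
        attn = F.softmax(qk @ qk.transpose(1, 2) * self.scale, dim=-1)
        return attn @ self.value(x)

out = SimplifiedSelfAttention(256)(torch.randn(2, 50, 256))
print(out.shape)                                             # torch.Size([2, 50, 256])
```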
25. Training Keyword Spotting Models on Non-IID Data with Federated Learning [PDF] 返回目录
Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio Lopez Moreno, Rajiv Mathews
Abstract: We demonstrate that a production-quality keyword-spotting model can be trained on-device using federated learning and achieve comparable false accept and false reject rates to a centrally-trained model. To overcome the algorithmic constraints associated with fitting on-device data (which are inherently non-independent and identically distributed), we conduct thorough empirical studies of optimization algorithms and hyperparameter configurations using large-scale federated simulations. To overcome resource constraints, we replace memory-intensive MTR data augmentation with SpecAugment, which reduces the false reject rate by 56%. Finally, to label examples (given the zero visibility into on-device data), we explore teacher-student training.
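The server side of the federated training loop described above reduces to Federated Averaging: each round, client updates are combined as an example-weighted mean. The sketch below shows only that aggregation step with simulated, non-IID client data; the optimizers and hyperparameters studied in the paper are not modeled.

```python
import numpy as np

def federated_average(client_weights, client_num_examples):
    """Server step of Federated Averaging: example-weighted mean of client models.

    client_weights: one list of numpy arrays per client (all with matching shapes).
    """
    total = float(sum(client_num_examples))
    coeffs = [n / total for n in client_num_examples]
    return [sum(c * weights[i] for c, weights in zip(coeffs, client_weights))
            for i in range(len(client_weights[0]))]

# Three simulated clients holding non-IID amounts of on-device data.
rng = np.random.default_rng(0)
global_model = [rng.normal(size=(8, 4)), rng.normal(size=4)]
clients = [[w + 0.01 * rng.normal(size=w.shape) for w in global_model]
           for _ in range(3)]
new_global = federated_average(clients, client_num_examples=[120, 40, 900])
print([w.shape for w in new_global])    # [(8, 4), (4,)]
```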
26. Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis [PDF] 返回目录
Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Abstract: Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce high-quality speech directly from text or simple linguistic features such as phonemes. Unlike traditional pipeline TTS, the neural sequence-to-sequence TTS does not require manually annotated and complicated linguistic features such as part-of-speech tags and syntactic structures for system training. However, it must be carefully designed and well optimized so that it can implicitly extract useful linguistic features from the input features. In this paper we investigate under what conditions the neural sequence-to-sequence TTS can work well in Japanese and English along with comparisons with deep neural network (DNN) based pipeline TTS systems. Unlike past comparative studies, the pipeline systems also use autoregressive probabilistic modeling and a neural vocoder. We investigated systems from three aspects: a) model architecture, b) model parameter size, and c) language. For the model architecture aspect, we adopt modified Tacotron systems that we previously proposed and their variants using an encoder from Tacotron or Tacotron2. For the model parameter size aspect, we investigate two model parameter sizes. For the language aspect, we conduct listening tests in both Japanese and English to see if our findings can be generalized across languages. Our experiments suggest that a) a neural sequence-to-sequence TTS system should have a sufficient number of model parameters to produce high quality speech, b) it should also use a powerful encoder when it takes characters as inputs, and c) the encoder still has room for improvement and needs an improved architecture to learn supra-segmental features more appropriately.
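Point (b) above, the need for a powerful encoder when the input is raw characters, can be illustrated with a Tacotron2-style text encoder: character embedding, a convolution stack, and a bidirectional LSTM. The layer sizes below are illustrative, not the configurations evaluated in the paper.

```python
import torch
import torch.nn as nn

class CharacterEncoder(nn.Module):
    """Tacotron2-style text encoder: char embedding -> conv stack -> BiLSTM."""
    def __init__(self, vocab_size, dim=256, num_convs=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.convs = nn.Sequential(*[
            nn.Sequential(nn.Conv1d(dim, dim, kernel_size=5, padding=2),
                          nn.BatchNorm1d(dim), nn.ReLU())
            for _ in range(num_convs)
        ])
        self.rnn = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, char_ids):                        # (batch, num_chars)
        x = self.embed(char_ids).transpose(1, 2)        # (batch, dim, num_chars)
        x = self.convs(x).transpose(1, 2)               # (batch, num_chars, dim)
        return self.rnn(x)[0]                           # (batch, num_chars, dim)

enc = CharacterEncoder(vocab_size=40)
print(enc(torch.randint(0, 40, (2, 17))).shape)         # torch.Size([2, 17, 256])
```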
Note: The Chinese abstracts are machine translations of the English originals.