Table of Contents
1. Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation [PDF] Abstract
2. WOLI at SemEval-2020 Task 12: Arabic Offensive Language Identification on Different Twitter Datasets [PDF] Abstract
3. A Comparison of LSTM and BERT for Small Corpus [PDF] Abstract
4. Deep Learning for Semantic Relations [PDF] Abstract
5. IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding [PDF] Abstract
6. UPB at SemEval-2020 Task 11: Propaganda Detection with Domain-Specific Trained BERT [PDF] Abstract
7. Weakly Supervised Content Selection for Improved Image Captioning [PDF] Abstract
8. Sparsifying Transformer Models with Differentiable Representation Pooling [PDF] Abstract
9. Accelerating Real-Time Question Answering via Question Generation [PDF] Abstract
10. FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [PDF] Abstract
11. Rank over Class: The Untapped Potential of Ranking in Natural Language Processing [PDF] Abstract
12. RadLex Normalization in Radiology Reports [PDF] Abstract
13. Dialogue Relation Extraction with Document-level Heterogeneous Graph Attention Networks [PDF] Abstract
14. Narratives and Needs: Analyzing Experiences of Cyclone Amphan Using Twitter Discourse [PDF] Abstract
15. Systematic Generalization on gSCAN with Language Conditioned Embedding [PDF] Abstract
16. RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications [PDF] Abstract
17. Patient Cohort Retrieval using Transformer Language Models [PDF] Abstract
Abstracts
1. Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation [PDF] Back to Contents
Toms Bergmanis, Artūrs Stafanovičs, Mārcis Pinnis
Abstract: Neural machine translation systems are typically trained on curated corpora and break when faced with non-standard orthography or punctuation. Resilience to spelling mistakes and typos, however, is crucial, as machine translation systems are used to translate texts of informal origin, such as chat conversations, social media posts and web pages. We propose a simple generative noise model that produces adversarial examples of ten different types. We use these to augment machine translation systems' training data and show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data, while baseline systems' performance drops by 2-3 BLEU points. To measure the robustness and noise invariance of machine translation systems' outputs, we use the average translation edit rate between the translation of the original sentence and those of its noised variants. Using this measure, we show that systems trained on adversarial examples yield, on average, 50% consistency improvements over baselines trained on clean data.
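As a minimal illustration of the noise model and the consistency measure, the sketch below implements two of the possible noise types (adjacent-character swap and punctuation removal; the paper's ten types are not enumerated in the abstract) and computes the average word-level edit rate between the translation of the clean sentence and translations of its noised variants. The `translate` callable is a hypothetical stand-in for a real MT system.

```python
import random

def noise_swap(s: str) -> str:
    """Swap two adjacent characters at a random position (simulates a typo)."""
    if len(s) < 2:
        return s
    i = random.randrange(len(s) - 1)
    return s[:i] + s[i + 1] + s[i] + s[i + 2:]

def noise_drop_punct(s: str) -> str:
    """Remove all punctuation (simulates informal chat input)."""
    return "".join(ch for ch in s if ch not in ".,!?;:")

def word_edit_distance(a: list, b: list) -> int:
    """Word-level Levenshtein distance, a stand-in for TER's edit count."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]

def consistency(translate, source: str, noisers, n: int = 10) -> float:
    """Average normalized edit rate between the translation of the clean
    source and translations of its noised variants (lower is better)."""
    ref = translate(source).split()
    rates = []
    for _ in range(n):
        noised = random.choice(noisers)(source)
        hyp = translate(noised).split()
        rates.append(word_edit_distance(ref, hyp) / max(len(ref), 1))
    return sum(rates) / len(rates)

# A dummy identity "translator" just to exercise the metric end to end:
print(consistency(lambda s: s, "hello there, world!", [noise_swap, noise_drop_punct]))
```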
2. WOLI at SemEval-2020 Task 12: Arabic Offensive Language Identification on Different Twitter Datasets [PDF] Back to Contents
Yasser Otiefy, Ahmed Abdelmalek, Islam El Hosary
Abstract: Communicating through social platforms has become one of the principal means of personal communication and interaction. Unfortunately, healthy communication is often interfered with by offensive language that can have damaging effects on users. A key to fighting offensive language on social media is the existence of an automatic offensive language detection system. This paper presents the results and main findings of our participation in SemEval-2020 Task 12, OffensEval Sub-task A (Zampieri et al., 2020), on identifying and categorising offensive language in social media. The task was based on the Arabic OffensEval dataset (Mubarak et al., 2020). We describe the system submitted by WideBot AI Lab for the shared task, which ranked 10th out of 52 participants with a macro-F1 of 86.9% on the golden dataset under the CodaLab username "yasserotiefy". We experimented with various models; the best is a linear SVM over a combination of character and word n-grams. We also introduce a neural network approach, comprising CNN, highway network, Bi-LSTM, and attention layers, that enhanced our system's predictive ability.
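A minimal sketch of the described best model, a linear SVM over a union of word and character n-gram features, using scikit-learn. The toy examples and the exact n-gram ranges are assumptions; the real system was trained on the Arabic OffensEval data.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the Arabic OffensEval dataset.
texts = ["you are awful", "have a nice day", "awful person", "nice work"]
labels = ["OFF", "NOT", "OFF", "NOT"]

model = Pipeline([
    ("features", FeatureUnion([
        # Word and character n-grams, combined into one feature space.
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
    ])),
    ("clf", LinearSVC()),
])
model.fit(texts, labels)
print(model.predict(["awful day"]))
```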
3. A Comparison of LSTM and BERT for Small Corpus [PDF] Back to Contents
Aysu Ezen-Can
Abstract: Recent advancements in the NLP field have shown that transfer learning helps achieve state-of-the-art results on new tasks by tuning pre-trained models instead of starting from scratch. Transformers have significantly advanced the state of the art for many NLP tasks, including but not limited to text classification, text generation, and sequence labeling. Most of these success stories were based on large datasets. In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than with simple models? To answer this question, we use a small dataset for intent classification collected for building chatbots and compare the performance of a simple bidirectional LSTM model with that of a pre-trained BERT model. Our experimental results show that bidirectional LSTM models can achieve significantly better results than a BERT model on a small dataset, and that these simple models train in much less time than it takes to tune their pre-trained counterparts. We conclude that the performance of a model depends on the task and the data; these factors should therefore be considered before making a model choice, rather than directly picking the most popular model.
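For concreteness, here is a minimal bidirectional LSTM intent classifier of the kind compared against BERT, sketched in PyTorch; the vocabulary size, dimensions, and number of intents are placeholder values.

```python
import torch
import torch.nn as nn

class BiLSTMIntentClassifier(nn.Module):
    """Minimal bidirectional LSTM classifier of the kind compared against BERT."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_intents):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids)                  # (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(x)                   # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=-1)        # final forward + backward states
        return self.out(h)                         # (batch, num_intents)

logits = BiLSTMIntentClassifier(1000, 64, 128, 7)(torch.randint(1, 1000, (4, 12)))
print(logits.shape)  # torch.Size([4, 7])
```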
4. Deep Learning for Semantic Relations [PDF] Back to Contents
Vivi Nastase, Stan Szpakowicz
Abstract: The second edition of "Semantic Relations Between Nominals" (by Vivi Nastase, Stan Szpakowicz, Preslav Nakov and Diarmuid Ó Séaghdha) will be published by Morgan & Claypool. A new Chapter 5 of the book discusses relation classification/extraction in the deep-learning paradigm which arose after the first edition appeared. This is a preview of Chapter 5, made public by the kind permission of Morgan & Claypool.
5. IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding [PDF] Back to Contents
Bryan Wilie, Karissa Vincentio, Genta Indra Winata, Samuel Cahyawijaya, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, Ayu Purwarianti
Abstract: Although Indonesian is known to be the fourth most frequently used language on the internet, research progress on this language in natural language processing (NLP) has been slow-moving due to a lack of available resources. In response, we introduce the first-ever vast resource for training, evaluating, and benchmarking on Indonesian natural language understanding (IndoNLU) tasks. IndoNLU includes twelve tasks, ranging from single-sentence classification to pair-sentence sequence labeling, with different levels of complexity. The datasets for the tasks lie in different domains and styles to ensure task diversity. We also provide a set of Indonesian pre-trained models (IndoBERT) trained on a large and clean Indonesian dataset, Indo4B, collected from publicly available sources such as social media texts, blogs, news, and websites. We release baseline models for all twelve tasks, as well as the framework for benchmark evaluation, thus enabling everyone to benchmark their system performance.
6. UPB at SemEval-2020 Task 11: Propaganda Detection with Domain-Specific Trained BERT [PDF] Back to Contents
Andrei Paraschiv, Dumitru-Clementin Cercel, Mihai Dascalu
Abstract: Manipulative and misleading news have become a commodity for some online news outlets, and such news has gained a significant impact on the global mindset of people. Propaganda is a frequently employed manipulation method whose goal is to influence readers by spreading ideas meant to distort or manipulate their opinions. This paper describes our participation in the SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles competition. Our approach specializes a pre-trained BERT model on propagandistic and hyperpartisan news articles, enabling it to create more adequate representations for the two subtasks, namely propaganda Span Identification (SI) and propaganda Technique Classification (TC). Our proposed system achieved an F1-score of 46.060% on subtask SI, ranking 5th of 36 teams on the leaderboard, and a micro-averaged F1-score of 54.302% on subtask TC, ranking 19th of 32 teams.
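Span Identification is naturally cast as token-level tagging on top of a BERT encoder. A minimal sketch with the Hugging Face transformers API follows; the checkpoint name and the two-label scheme are illustrative assumptions, and the classification head is randomly initialised until fine-tuned on the SI annotations.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Token tagging over a BERT encoder: label 0 = outside, 1 = propaganda span.
# "bert-base-cased" is a placeholder, not the authors' domain-specific checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=2)

enc = tokenizer("The enemy is lying to you again.", return_tensors="pt")
with torch.no_grad():
    pred = model(**enc).logits.argmax(-1)[0]      # one label per word piece
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
# Word pieces tagged as propaganda; arbitrary here, since the head is untrained.
print([t for t, p in zip(tokens, pred) if p == 1])
```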
7. Weakly Supervised Content Selection for Improved Image Captioning [PDF] Back to Contents
Khyathi Raghavi Chandu, Piyush Sharma, Soravit Changpinyo, Ashish Thapliyal, Radu Soricut
Abstract: Image captioning involves identifying semantic concepts in the scene and describing them in fluent natural language. Recent approaches do not explicitly model the semantic concepts and train the model only for the end goal of caption generation. Such models lack interpretability and controllability, primarily due to sub-optimal content selection. We address this problem by breaking down the captioning task into two simpler, manageable and more controllable tasks: skeleton prediction and skeleton-based caption generation. We approach the former as a weakly supervised task, using a simple off-the-shelf language syntax parser and avoiding the need for additional human annotations; the latter uses a supervised-learning approach. We investigate three methods of conditioning the caption on the skeleton: in the encoder, in the decoder, and in both. Our compositional model generates significantly better-quality captions on out-of-domain test images, as judged by human annotators. Additionally, we demonstrate the cross-language effectiveness of the English skeleton for other languages, including French, Italian, German, Spanish and Hindi. This compositional nature of captioning exhibits the potential of unpaired image captioning, thereby reducing the dependence on expensive image-caption pairs. Furthermore, we investigate the use of skeletons as a knob to control certain properties of the generated image caption, such as length, content, and gender expression.
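To illustrate how weak supervision for skeleton prediction might come from an off-the-shelf parser, the sketch below extracts the heads of noun chunks from a caption with spaCy. Both the parser choice and the skeleton definition are assumptions; the abstract names neither.

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def skeleton(caption: str) -> list:
    """Derive a content-word "skeleton" from a caption with a stock parser,
    avoiding any extra human annotation. Noun-chunk heads are an illustrative
    choice; the paper's exact skeleton definition may differ."""
    doc = nlp(caption)
    return [chunk.root.text for chunk in doc.noun_chunks]

print(skeleton("A small brown dog chases a red ball across the park."))
# -> ['dog', 'ball', 'park']
```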
8. Sparsifying Transformer Models with Differentiable Representation Pooling [PDF] Back to Contents
Michał Pietruszka, Łukasz Borchmann, Filip Graliński
Abstract: We propose a novel method to sparsify attention in the Transformer model by learning to select the most-informative token representations, thus leveraging the model's information bottleneck with twofold strength. A careful analysis shows that the contextualization of encoded representations in our model is significantly more effective than in the original Transformer. We achieve a notable reduction in memory usage due to an improved differentiable top-k operator, making the model suitable for processing long documents, as shown on an example summarization task.
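One common relaxation of differentiable top-k selection is sketched below: token representations are scored, the k highest-scoring ones are kept, and the kept vectors are scaled by their (differentiable) scores so gradients still reach the scorer. This is a generic stand-in, not necessarily the paper's improved operator.

```python
import torch
import torch.nn as nn

class SoftTopKPool(nn.Module):
    """Keep the k highest-scoring token representations. Hard indices are used
    for selection, but the kept vectors are weighted by their sigmoid scores so
    the scorer still receives gradients (a common relaxation; assumed here)."""
    def __init__(self, dim: int, k: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)
        self.k = k

    def forward(self, x):                            # x: (batch, seq, dim)
        scores = self.scorer(x).squeeze(-1)          # (batch, seq)
        topk = scores.topk(self.k, dim=-1).indices   # (batch, k)
        gathered = x.gather(1, topk.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        weights = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        return gathered * weights                    # (batch, k, dim)

pooled = SoftTopKPool(dim=16, k=4)(torch.randn(2, 100, 16))
print(pooled.shape)  # torch.Size([2, 4, 16]) -- sequence shortened from 100 to 4
```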
9. Accelerating Real-Time Question Answering via Question Generation [PDF] Back to Contents
Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu
Abstract: Existing approaches to real-time question answering (RTQA) rely on learning the representations of only key phrases in the documents, then matching them with the question representation to derive the answer. However, such approaches are bottlenecked by the encoding time of real-time questions, and thus suffer from detectable latency when deployed for large-volume traffic. To accelerate RTQA for practical use, we present Ocean-Q (an Ocean of Questions), a novel approach that leverages question generation (QG) for RTQA. Ocean-Q introduces a QG model to generate a large pool of question-answer (QA) pairs offline, then in real time matches an input question against the candidate QA pool to predict the answer without question encoding. To further improve QG quality, we propose a new data augmentation method and leverage multi-task learning and diverse beam search to boost RTQA performance. Experiments on the SQuAD(-open) and HotpotQA benchmarks demonstrate that Ocean-Q is able to accelerate the fastest state-of-the-art RTQA system by 4x, with only a 3+% accuracy drop.
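The run-time behaviour can be illustrated with a toy QA pool and a purely lexical matcher: all expensive work (question generation and indexing) happens offline, and answering a live question requires no neural encoding. The TF-IDF matcher and the pool contents below are illustrative assumptions; the paper's matching function may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical offline QA pool; in Ocean-Q it would be produced by a question
# generation model run over the document collection ahead of time.
qa_pool = [
    ("who wrote hamlet", "William Shakespeare"),
    ("what is the capital of france", "Paris"),
    ("when did world war ii end", "1945"),
]

vectorizer = TfidfVectorizer().fit([q for q, _ in qa_pool])
pool_matrix = vectorizer.transform([q for q, _ in qa_pool])  # precomputed offline

def answer(question: str) -> str:
    """At run time only a cheap lexical match is needed: no neural encoding
    of the incoming question."""
    sims = cosine_similarity(vectorizer.transform([question]), pool_matrix)[0]
    return qa_pool[sims.argmax()][1]

print(answer("who is the author of hamlet"))  # -> William Shakespeare
```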
10. FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [PDF] Back to Contents
Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu
Abstract: Large-scale cross-lingual language models (LM), such as mBERT, Unicoder and XLM, have achieved great success in cross-lingual representation learning. However, when applied to zero-shot cross-lingual transfer tasks, most existing methods use only single-language input for LM finetuning, without leveraging the intrinsic cross-lingual alignment between different languages that is essential for multilingual tasks. In this paper, we propose FILTER, an enhanced fusion method that takes cross-lingual data as input for XLM finetuning. Specifically, FILTER first encodes text input in the source language and its translation in the target language independently in the shallow layers, then performs cross-lingual fusion to extract multilingual knowledge in the intermediate layers, and finally performs further language-specific encoding. During inference, the model makes predictions based on the text input in the target language and its translation in the source language. For simple tasks such as classification, translated text in the target language shares the same label as the source language. However, this shared label becomes less accurate or even unavailable for more complex tasks such as question answering, NER and POS tagging. For better model scalability, we further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language. Extensive experiments demonstrate that FILTER achieves new state of the art (77.0 on average) on the challenging multilingual multi-task benchmark, XTREME.
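The self-teaching loss can be sketched as a standard distillation-style KL divergence between the model's predictions on the translated input and the auto-generated soft pseudo-labels. The temperature scaling and other details below are assumptions; FILTER's exact formulation is not given in the abstract.

```python
import torch
import torch.nn.functional as F

def self_teaching_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between the model's predictions on the translated input
    (student) and soft pseudo-labels auto-generated for it (teacher).
    A generic distillation-style loss, assumed here as an illustration."""
    t = temperature
    student = F.log_softmax(student_logits / t, dim=-1)
    teacher = F.softmax(teacher_logits / t, dim=-1).detach()  # no grad through pseudo-labels
    return F.kl_div(student, teacher, reduction="batchmean") * (t * t)

loss = self_teaching_loss(torch.randn(8, 3), torch.randn(8, 3))
print(loss.item())
```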
11. Rank over Class: The Untapped Potential of Ranking in Natural Language Processing [PDF] Back to Contents
Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough
Abstract: Text classification has long been a staple in natural language processing with applications spanning across sentiment analysis, online content tagging, recommender systems and spam detection. However, text classification, by nature, suffers from a variety of issues stemming from dataset imbalance, text ambiguity, subjectivity and the lack of linguistic context in the data. In this paper, we explore the use of text ranking, commonly used in information retrieval, to carry out challenging classification-based tasks. We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences, which are in turn passed into a context aggregating network outputting ranking scores used to determine an ordering to the sequences based on some notion of relevance. We perform numerous experiments on publicly-available datasets and investigate the possibility of applying our ranking approach to certain problems often addressed using classification. In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification, demonstrating the efficacy of text ranking over text classification in certain scenarios.
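A minimal sketch of training with a ranking objective rather than a classification one: a scorer is optimised with a margin ranking loss so that relevant pairs outrank irrelevant ones. The small feed-forward scorer stands in for the paper's Transformer and context-aggregation network, and the inputs are assumed pre-computed pair representations.

```python
import torch
import torch.nn as nn

# Stand-in scorer for the paper's Transformer + context-aggregation stack.
scorer = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MarginRankingLoss(margin=1.0)

pos = torch.randn(16, 32)   # representations of relevant text pairs (assumed given)
neg = torch.randn(16, 32)   # representations of irrelevant text pairs
target = torch.ones(16, 1)  # +1 means "first argument should score higher"

loss = loss_fn(scorer(pos), scorer(neg), target)
loss.backward()             # trains the scorer to order items by relevance
print(loss.item())
```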
12. RadLex Normalization in Radiology Reports [PDF] Back to Contents
Surabhi Datta, Jordan Godfrey-Stovall, Kirk Roberts
Abstract: Radiology reports have been widely used for extraction of various clinically significant information about patients' imaging studies. However, limited research has focused on standardizing the entities to a common radiology-specific vocabulary. Further, no study to date has attempted to leverage RadLex for standardization. In this paper, we aim to normalize a diverse set of radiological entities to RadLex terms. We manually construct a normalization corpus by annotating entities from three types of reports. This contains 1706 entity mentions. We propose two deep learning-based NLP methods based on a pre-trained language model (BERT) for automatic normalization. First, we employ BM25 to retrieve candidate concepts for the BERT-based models (re-ranker and span detector) to predict the normalized concept. The results are promising, with the best accuracy (78.44%) obtained by the span detector. Additionally, we discuss the challenges involved in corpus construction and propose new RadLex terms.
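The candidate-retrieval step can be sketched with an off-the-shelf BM25 implementation (the rank_bm25 package here). The mini-lexicon below is invented for illustration and merely stands in for RadLex; the returned candidates would then be re-ranked by the BERT-based models.

```python
from rank_bm25 import BM25Okapi

# Hypothetical mini-lexicon standing in for RadLex terms.
radlex_terms = ["pleural effusion", "pulmonary edema", "cardiomegaly", "pneumothorax"]
bm25 = BM25Okapi([t.split() for t in radlex_terms])

# An entity mention from a report, matched against the lexicon.
mention = "effusion in the pleura".split()
scores = bm25.get_scores(mention)
candidates = sorted(zip(radlex_terms, scores), key=lambda p: -p[1])[:2]
print(candidates)  # top candidates go to the BERT re-ranker / span detector
```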
13. Dialogue Relation Extraction with Document-level Heterogeneous Graph Attention Networks [PDF] Back to Contents
Hui Chen, Pengfei Hong, Wei Han, Navonil Majumder, Soujanya Poria
Abstract: Dialogue relation extraction (DRE) aims to detect the relation between two entities mentioned in a multi-party dialogue. It plays an important role in constructing knowledge graphs from conversational data, which is increasingly abundant on the internet, and in facilitating intelligent dialogue system development. Prior methods of DRE do not meaningfully leverage speaker information; they just prepend the utterances with the respective speaker names. Thus, they fail to model the crucial inter-speaker relations that may give additional context to relevant argument entities through pronouns and triggers. We instead present a graph-attention-network-based method for DRE in which we construct a graph containing meaningfully connected speaker, entity, entity-type, and utterance nodes. This graph is fed to a graph attention network for context propagation among relevant nodes, which effectively captures the dialogue context. We empirically show that this graph-based approach quite effectively captures the relations between different entity pairs in a dialogue, as it outperforms the state-of-the-art approaches by a significant margin on the benchmark dataset DialogRE.
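The core message-passing step can be illustrated with a generic single-head graph attention layer in which each node attends only to its neighbours under an adjacency mask. The paper's heterogeneous graph over speaker, entity, entity-type, and utterance nodes is richer than this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention: each node attends only to its neighbours,
    as given by a boolean adjacency mask. A generic GAT-style layer, not the
    paper's full heterogeneous architecture."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, nodes, adj):                    # nodes: (n, dim), adj: (n, n) bool
        att = self.q(nodes) @ self.k(nodes).T / nodes.size(-1) ** 0.5
        att = att.masked_fill(~adj, float("-inf"))    # block non-neighbours
        return F.softmax(att, dim=-1) @ self.v(nodes)

n = 5  # e.g. 2 speaker nodes, 2 entity nodes, 1 utterance node
adj = torch.rand(n, n) > 0.5
adj |= torch.eye(n, dtype=torch.bool)                 # self-loops keep softmax finite
out = GraphAttentionLayer(16)(torch.randn(n, 16), adj)
print(out.shape)  # torch.Size([5, 16])
```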
14. Narratives and Needs: Analyzing Experiences of Cyclone Amphan Using Twitter Discourse [PDF] Back to Contents
Ancil Crayton, João Fonseca, Kanav Mehra, Michelle Ng, Jared Ross, Marcelo Sandoval-Castañeda, Rachel von Gnechten
Abstract: People often turn to social media to comment upon and share information about major global events. Accordingly, social media is receiving increasing attention as a rich data source for understanding people's social, political and economic experiences of extreme weather events. In this paper, we contribute two novel methodologies that leverage Twitter discourse to characterize narratives and identify unmet needs in response to Cyclone Amphan, which affected 18 million people in May 2020.
15. Systematic Generalization on gSCAN with Language Conditioned Embedding [PDF] Back to Contents
Tong Gao, Qi Huang, Raymond J. Mooney
Abstract: Systematic Generalization refers to a learning algorithm's ability to extrapolate learned behavior to unseen situations that are distinct but semantically similar to its training data. As shown in recent work, state-of-the-art deep learning models fail dramatically even on tasks for which they are designed when the test set is systematically different from the training data. We hypothesize that explicitly modeling the relations between objects in their contexts while learning their representations will help achieve systematic generalization. Therefore, we propose a novel method that learns objects' contextualized embeddings with dynamic message passing conditioned on the input natural language and end-to-end trainable with other downstream deep learning modules. To our knowledge, this model is the first one that significantly outperforms the provided baseline and reaches state-of-the-art performance on grounded-SCAN (gSCAN), a grounded natural language navigation dataset designed to require systematic generalization in its test splits.
16. RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications [PDF] Back to Contents
Adriana Stan
Abstract: Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good-quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Because the languages differ in orthographic transparency and in the number of available phonetic entries, the DNN's hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and the trained G2P models are made freely available.
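A tiny (1+lambda) evolution strategy conveys the hyperparameter-search idea. The actual search space, fitness function, and ES variant used for the G2P models are not specified in the abstract, so everything below is an illustrative assumption.

```python
import random

def evolve(fitness, init, sigma=0.3, population=8, generations=20):
    """Tiny (1+lambda) evolution strategy over continuous hyperparameters:
    mutate the current best with Gaussian noise, keep any improvement."""
    best, best_fit = init, fitness(init)
    for _ in range(generations):
        for _ in range(population):
            child = [g + random.gauss(0, sigma) for g in best]
            f = fitness(child)
            if f > best_fit:
                best, best_fit = child, f
    return best, best_fit

# Stand-in objective: pretend G2P accuracy peaks at learning rate 0.3, dropout 0.1.
accuracy = lambda h: -((h[0] - 0.3) ** 2 + (h[1] - 0.1) ** 2)
print(evolve(accuracy, init=[0.5, 0.5]))
```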
17. Patient Cohort Retrieval using Transformer Language Models [PDF] Back to Contents
Sarvesh Soni, Kirk Roberts
Abstract: We apply deep learning-based language models to the task of patient cohort retrieval (CR) with the aim of assessing their efficacy. The task of CR requires the extraction of relevant documents from electronic health records (EHRs) on the basis of a given query. Given the recent advancements in the field of document retrieval, we map the task of CR to a document retrieval task and apply various deep neural models implemented for general-domain tasks. In this paper, we propose a framework for retrieving patient cohorts using neural language models without the need for explicit feature engineering and domain expertise. We find that a majority of our models outperform the BM25 baseline method on various evaluation metrics.