Abstracts
1. Entity and Evidence Guided Relation Extraction for DocRED
Kevin Huang, Guangtao Wang, Tengyu Ma, Jing Huang
Abstract: Document-level relation extraction is a challenging task that requires reasoning over multiple sentences in order to predict relations in a document. In this paper, we propose a joint training framework, E2GRE (Entity and Evidence Guided Relation Extraction), for this task. First, we introduce entity-guided sequences as inputs to a pre-trained language model (e.g. BERT, RoBERTa). These entity-guided sequences help a pre-trained language model (LM) to focus on areas of the document related to the entity. Secondly, we guide the fine-tuning of the pre-trained language model by using its internal attention probabilities as additional features for evidence prediction. Our new approach encourages the pre-trained language model to focus on the entities and supporting/evidence sentences. We evaluate our E2GRE approach on DocRED, a recently released large-scale dataset for relation extraction. Our approach is able to achieve state-of-the-art results on the public leaderboard across all metrics, showing that our E2GRE is both effective and synergistic on relation extraction and evidence prediction.
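The entity-guided input described in this abstract can be pictured with a minimal sketch. It assumes the entity's tokens are simply prepended to the document tokens with BERT-style [CLS]/[SEP] markers; the function names, the marker layout, and the max_len truncation are illustrative assumptions, not details taken from the abstract.

```python
from typing import List


def build_entity_guided_sequence(entity_tokens: List[str],
                                 doc_tokens: List[str],
                                 max_len: int = 512) -> List[str]:
    """Prepend one entity to the document so the LM sees it first.

    Hypothetical layout: [CLS] entity [SEP] document [SEP].
    """
    prefix = ["[CLS]"] + entity_tokens + ["[SEP]"]
    # Reserve one slot for the closing [SEP]; truncate the document if needed.
    budget = max_len - len(prefix) - 1
    if budget < 0:
        raise ValueError("max_len is too small for the entity prefix")
    return prefix + doc_tokens[:budget] + ["[SEP]"]


def entity_guided_batch(entities: List[List[str]],
                        doc_tokens: List[str],
                        max_len: int = 512) -> List[List[str]]:
    """Build one entity-guided sequence per entity mentioned in the document."""
    return [build_entity_guided_sequence(e, doc_tokens, max_len) for e in entities]
```

Under this assumption a document with N entities yields N input sequences, each pairing the full document with a different leading entity.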
2. Adaptable Filtering using Hierarchical Embeddings for Chinese Spell Check
Minh Nguyen, Gia H. Ngo, Nancy F. Chen
Abstract: Spell check is a useful application which involves processing noisy human-generated text. Compared to other languages like English, it is more challenging to detect and correct spelling errors in Chinese because it has more (up to 100k) characters. For Chinese spell check, using confusion sets narrows the search space and makes finding corrections easier. However, most, if not all, confusion sets used to date are fixed and thus do not include new, evolving error patterns. We propose a scalable approach to adapt confusion sets by exploiting hierarchical character embeddings to (1) obviate the need to handcraft confusion sets, and (2) resolve sparsity issues related to seldom-occurring errors. Our approach establishes new SOTA results in spelling error correction on the 2014 and 2015 Chinese Spelling Correction Bake-off datasets.
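As a rough illustration of how a confusion set narrows the search space, the sketch below generates correction candidates only from characters listed as confusable. The toy confusion set and the caller-supplied score function are assumptions made for illustration; this shows the fixed confusion sets the abstract contrasts against, not the adaptive, embedding-based approach the authors propose.

```python
from typing import Callable, Dict, List, Set

# Toy confusion set (illustrative only): each character maps to the
# characters it is plausibly confused with.
CONFUSION: Dict[str, Set[str]] = {
    "在": {"再"},
    "再": {"在"},
    "的": {"地", "得"},
}


def candidates(sentence: str, confusion: Dict[str, Set[str]]) -> List[str]:
    """All sentences reachable by swapping one character for a confusable one."""
    out = []
    for i, ch in enumerate(sentence):
        for alt in sorted(confusion.get(ch, ())):
            out.append(sentence[:i] + alt + sentence[i + 1:])
    return out


def correct(sentence: str,
            confusion: Dict[str, Set[str]],
            score: Callable[[str], float]) -> str:
    """Pick the best-scoring sentence among the original and its candidates."""
    return max([sentence] + candidates(sentence, confusion), key=score)
```

Without the confusion set, every position would have to be tested against the full character inventory; with it, each position contributes only a handful of candidates.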
3. Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus
Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén
Abstract: This article introduces the Wanca 2017 corpus, a collection of texts crawled from the internet, from which the sentences in rare Uralic languages used in the Uralic Language Identification (ULI) 2020 shared task were collected. We describe the ULI dataset and how it was constructed using the Wanca 2017 corpus and texts in different languages from the Leipzig corpora collection. We also provide baseline language identification experiments conducted using the ULI 2020 dataset.
4. GREEK-BERT: The Greeks visiting Sesame Street
John Koutsikakis, Ilias Chalkidis, Prodromos Malakasiotis, Ion Androutsopoulos
Abstract: Transformer-based language models, such as BERT and its variants, have achieved state-of-the-art performance in several downstream natural language processing (NLP) tasks on generic benchmark datasets (e.g., GLUE, SQUAD, RACE). However, these models have mostly been applied to the resource-rich English language. In this paper, we present GREEK-BERT, a monolingual BERT-based language model for modern Greek. We evaluate its performance in three NLP tasks, i.e., part-of-speech tagging, named entity recognition, and natural language inference, obtaining state-of-the-art performance. Interestingly, in two of the benchmarks GREEK-BERT outperforms two multilingual Transformer-based models (M-BERT, XLM-R), as well as shallower neural baselines operating on pre-trained word embeddings, by a large margin (5%-10%). Most importantly, we make both GREEK-BERT and our training code publicly available, along with code illustrating how GREEK-BERT can be fine-tuned for downstream NLP tasks. We expect these resources to boost NLP research and applications for modern Greek.
5. A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai, Akash Kumar Mohankumar, Mitesh M. Khapra
Abstract: The success of Deep Learning has created a surge in interest in a wide range of Natural Language Generation (NLG) tasks. Deep Learning has not only pushed the state of the art in several existing NLG tasks but has also enabled researchers to explore various newer NLG tasks such as image captioning. Such rapid progress in NLG has necessitated the development of accurate automatic evaluation metrics that would allow us to track the progress in the field of NLG. However, unlike classification tasks, automatically evaluating NLG systems in itself is a huge challenge. Several works have shown that early heuristic-based metrics such as BLEU and ROUGE are inadequate for capturing the nuances in the different NLG tasks. The expanding number of NLG models and the shortcomings of the current metrics have led to a rapid surge in the number of evaluation metrics proposed since 2014. Moreover, various evaluation metrics have shifted from using pre-determined heuristic-based formulae to trained transformer models. This rapid change in a relatively short time has led to the need for a survey of the existing NLG metrics to help existing and new researchers to quickly come up to speed with the developments that have happened in NLG evaluation in the last few years. Through this survey, we first wish to highlight the challenges and difficulties in automatically evaluating NLG systems. Then, we provide a coherent taxonomy of the evaluation metrics to organize the existing metrics and to better understand the developments in the field. We also describe the different metrics in detail and highlight their key contributions. Later, we discuss the main shortcomings identified in the existing metrics and describe the methodology used to evaluate evaluation metrics. Finally, we discuss our suggestions and recommendations on the next steps forward to improve the automatic evaluation metrics.
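To make the notion of a heuristic-based metric concrete, here is a minimal sketch of clipped n-gram precision, the counting step at the core of BLEU-style metrics. It is a deliberate simplification (single reference, no brevity penalty, no averaging over n-gram orders) and is not the full BLEU definition; the function names are chosen for illustration.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: List[str], reference: List[str], n: int = 1) -> float:
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it appears
    in the reference ("clipping"), so repeating a matching word does not help.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total
```

Because the score depends only on surface overlap, a faithful paraphrase that shares no words with the reference scores zero, which is one way such metrics can miss the nuances the abstract refers to.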
6. Query Focused Multi-document Summarisation of Biomedical Texts
Diego Molla, Christopher Jones, Vincent Nguyen
Abstract: This paper presents the participation of Macquarie University and the Australian National University in Task B Phase B of the 2020 BioASQ Challenge (BioASQ8b). Our overall framework implements query-focused multi-document extractive summarisation by applying either a classification or a regression layer to the candidate sentence embeddings and to the comparison between the question and sentence embeddings. We experiment with variants using BERT and BioBERT, Siamese architectures, and reinforcement learning. We observe the best results when BERT is used to obtain the word embeddings, followed by an LSTM layer to obtain sentence embeddings. Variants using Siamese architectures or BioBERT did not improve the results.
7. Opinion-aware Answer Generation for Review-driven Question Answering in E-Commerce
Yang Deng, Wenxuan Zhang, Wai Lam
Abstract: Product-related question answering (QA) is an important but challenging task in E-Commerce. It leads to a great demand for automatic review-driven QA, which aims at providing instant responses to user-posted questions based on diverse product reviews. Nevertheless, the rich information about personal opinions in product reviews, which is essential to answer those product-specific questions, is underutilized in current generation-based review-driven QA studies. There are two main challenges when exploiting the opinion information from the reviews to facilitate opinion-aware answer generation: (i) jointly modeling opinionated and interrelated information between the question and reviews to capture important information for answer generation, and (ii) aggregating diverse opinion information to uncover the common opinion towards the given question. In this paper, we tackle opinion-aware answer generation by jointly learning answer generation and opinion mining tasks with a unified model. Two kinds of opinion fusion strategies, namely static and dynamic fusion, are proposed to distill and aggregate important opinion information learned from the opinion mining task into the answer generation process. Then a multi-view pointer-generator network is employed to generate opinion-aware answers for a given product-related question. Experimental results show that our method achieves superior performance on real-world E-Commerce QA datasets and effectively generates opinionated and informative answers.
8. Improvement of a dedicated model for open domain persona-aware dialogue generation
Qiang Han
Abstract: This paper analyzes some methods proposed in recent years for improving the speed and performance of the Transformer architecture, focusing on their application to dedicated model training. The dedicated model studied here is an open-domain persona-aware dialogue generation model, and the dataset consists of multi-turn short dialogues; the total length of a single input sequence is no more than 105 tokens. Therefore, the many improvements to the Transformer architecture and its attention mechanism that target long-sequence processing are not discussed in this paper. The source code of the experiments has been open sourced: this https URL
9. Relation/Entity-Centric Reading Comprehension
Takeshi Onishi
Abstract: Constructing a machine that understands human language is one of the most elusive and long-standing challenges in artificial intelligence. This thesis addresses this challenge through studies of reading comprehension with a focus on understanding entities and their relationships. More specifically, we focus on question answering tasks designed to measure reading comprehension. We focus on entities and relations because they are typically used to represent the semantics of natural language.
10. Automatic Speech Summarisation: A Scoping Review
Dana Rezazadegan, Shlomo Berkovsky, Juan C. Quiroz, A. Baki Kocaballi, Ying Wang, Liliana Laranjo, Enrico Coiera
Abstract: Speech summarisation techniques take human speech as input and then output an abridged version as text or speech. Speech summarisation has applications in many domains from information technology to health care, for example improving speech archives or reducing clinical documentation burden. This scoping review maps the speech summarisation literature, with no restrictions on time frame, language summarised, research method, or paper type. We reviewed a total of 110 papers out of a set of 153 found through a literature search and extracted speech features used, methods, scope, and training corpora. Most studies employ one of four speech summarisation architectures: (1) Sentence extraction and compaction; (2) Feature extraction and classification or rank-based sentence selection; (3) Sentence compression and compression summarisation; and (4) Language modelling. We also discuss the strengths and weaknesses of these different methods and speech features. Overall, supervised methods (e.g. Hidden Markov support vector machines, Ranking support vector machines, Conditional random fields) performed better than unsupervised methods. As supervised methods require manually annotated training data which can be costly, there was more interest in unsupervised methods. Recent research into unsupervised methods focusses on extending language modelling, for example by combining Uni-gram modelling with deep neural networks. Protocol registration: The protocol for this scoping review is registered at this https URL.
11. AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
Xinsong Zhang, Hang Li
Abstract: Pre-trained language models such as BERT have exhibited remarkable performance in many natural language understanding (NLU) tasks. The tokens in the models are usually fine-grained in the sense that for languages like English they are words or sub-words and for languages like Chinese they are characters. In English, for example, there are multi-word expressions which form natural lexical units, and thus the use of coarse-grained tokenization also appears to be reasonable. In fact, both fine-grained and coarse-grained tokenizations have advantages and disadvantages for learning pre-trained language models. In this paper, we propose a novel pre-trained language model, referred to as AMBERT (A Multi-grained BERT), on the basis of both fine-grained and coarse-grained tokenizations. For English, AMBERT takes both the sequence of words (fine-grained tokens) and the sequence of phrases (coarse-grained tokens) as input after tokenization, employs one encoder for processing the sequence of words and the other encoder for processing the sequence of phrases, utilizes shared parameters between the two encoders, and finally creates a sequence of contextualized representations of the words and a sequence of contextualized representations of the phrases. Experiments have been conducted on benchmark datasets for Chinese and English, including CLUE, GLUE, SQuAD and RACE. The results show that AMBERT outperforms the existing best performing models in almost all cases, and the improvements are particularly significant for Chinese.
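The difference between the two tokenization granularities can be sketched as follows. This assumes whitespace splitting stands in for the fine-grained tokenizer and a small hand-written phrase lexicon with greedy longest-match merging stands in for the coarse-grained one; both are illustrative assumptions, and the two encoders with shared parameters are not modelled here.

```python
from typing import List, Set, Tuple


def fine_grained(text: str) -> List[str]:
    """Word-level tokens (whitespace split stands in for a real tokenizer)."""
    return text.split()


def coarse_grained(words: List[str],
                   phrases: Set[Tuple[str, ...]],
                   max_len: int = 3) -> List[str]:
    """Greedy longest-match merge of known multi-word expressions."""
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest span first, down to two words.
        for span in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + span]) in phrases:
                out.append("_".join(words[i:i + span]))
                i += span
                break
        else:
            # No phrase starts here; keep the single word.
            out.append(words[i])
            i += 1
    return out
```

Both token sequences are derived from the same text, so a model in the spirit of the abstract would receive the word sequence and the phrase sequence as two parallel inputs.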
12. On the Optimality of Vagueness: "Around", "Between", and the Gricean Maxims [PDF] Back to contents
Paul Egré, Benjamin Spector, Adèle Mortier, Steven Verheyen
Abstract: Why is our language vague? We argue that in contexts in which a cooperative speaker is not perfectly informed about the world, the use of vague expressions can offer an optimal tradeoff between truthfulness (Gricean Quality) and informativeness (Gricean Quantity). Focusing on expressions of approximation such as "around", which are semantically vague, we show that they allow the speaker to convey indirect probabilistic information, in a way that gives the listener a more accurate representation of the information available to the speaker than any more precise expression would (intervals of the form "between"). We give a probabilistic treatment of the interpretation of "around", and offer a model for the interpretation and use of "around"-statements within the Rational Speech Act (RSA) framework. Our model differs in substantive ways from the Lexical Uncertainty model often used within the RSA framework for vague predicates.
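Summary: Why is our language vague? We argue that in contexts where a cooperative speaker is not perfectly informed about the world, vague expressions can offer an optimal tradeoff between truthfulness (Gricean Quality) and informativeness (Gricean Quantity). Focusing on expressions of approximation such as "around", which are semantically vague, we show that they let the speaker convey indirect probabilistic information, giving the listener a more accurate representation of the speaker's information than any more precise expression (an interval of the form "between") would. We give a probabilistic treatment of the interpretation of "around" and offer a model of the interpretation and use of "around"-statements within the Rational Speech Act (RSA) framework. Our model differs in substantive ways from the Lexical Uncertainty model often used within the RSA framework for vague predicates.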
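The contrast between "around" and "between" can be illustrated with a toy Bayesian listener. This is a sketch under assumptions of its own, not the paper's RSA model: it assumes that "around n" means x lies in [n - y, n + y] for an unknown integer slack y, that y is uniform on 0..max_slack, and that the prior on x is uniform. The function names and parameters are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def around_posterior(n: int, max_slack: int) -> Dict[int, Fraction]:
    """Listener's posterior over x after hearing "around n" (toy assumptions above)."""
    weights: Dict[int, int] = {}
    for x in range(n - max_slack, n + max_slack + 1):
        # Count the slack values y that make "x in [n - y, n + y]" true.
        weights[x] = sum(1 for y in range(max_slack + 1) if abs(x - n) <= y)
    total = sum(weights.values())
    return {x: Fraction(w, total) for x, w in weights.items()}


def between_posterior(lo: int, hi: int) -> Dict[int, Fraction]:
    """Listener's posterior over x after hearing "between lo and hi" (uniform prior)."""
    size = hi - lo + 1
    return {x: Fraction(1, size) for x in range(lo, hi + 1)}


around = around_posterior(20, 2)     # peaked at 20, tapering toward 18 and 22
between = between_posterior(18, 22)  # flat over 18..22
```

Under these assumptions the "around" posterior is peaked at n while the "between" posterior is flat over the same support, which is one way to read the abstract's claim that "around" conveys indirect probabilistic information.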
13. SHAP values for Explaining CNN-based Text Classification Models [PDF] Back to contents
Wei Zhao, Tarun Joshi, Vijayan N. Nair, Agus Sudjianto
Abstract: Deep neural networks are increasingly used in natural language processing (NLP) models. However, the need to interpret and explain the results from complex algorithms are limiting their widespread adoption in regulated industries such as banking. There has been recent work on interpretability of machine learning algorithms with structured data. But there are only limited techniques for NLP applications where the problem is more challenging due to the size of the vocabulary, high-dimensional nature, and the need to consider textual coherence and language structure. This paper develops a methodology to compute SHAP values for local explainability of CNN-based text classification models. The approach is also extended to compute global scores to assess the importance of features. The results are illustrated on sentiment analysis of Amazon Electronic Review data.
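Summary: Deep neural networks are increasingly used in natural language processing (NLP) models. However, the need to interpret and explain the results of complex algorithms is limiting their widespread adoption in regulated industries such as banking. There has been recent work on the interpretability of machine learning algorithms for structured data, but only limited techniques exist for NLP applications, where the problem is harder because of vocabulary size, high dimensionality, and the need to consider textual coherence and language structure. This paper develops a methodology for computing SHAP values for local explainability of CNN-based text classification models. The approach is also extended to compute global scores that assess feature importance. The results are illustrated with sentiment analysis of Amazon Electronic Review data.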
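For readers unfamiliar with the quantity being computed, the following sketch evaluates exact Shapley values by enumerating every subset of tokens. It is not the paper's methodology and does not involve a CNN: `toy_score` is a hypothetical two-token sentiment scorer invented for this example, and exhaustive enumeration is only feasible for very short inputs.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List

TOKENS = ["not", "good"]  # hypothetical two-token review


def toy_score(present: FrozenSet[int]) -> float:
    """Hypothetical sentiment score given the indices of tokens that are kept."""
    words = {TOKENS[i] for i in present}
    if "good" not in words:
        return 0.0
    return -1.0 if "not" in words else 1.0


def shapley_values(n: int, score: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley value of each of n features, by enumerating all coalitions."""
    values: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # k = size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi += weight * (score(s | {i}) - score(s))
        values.append(phi)
    return values


phi = shapley_values(len(TOKENS), toy_score)
```

For this toy scorer the attribution falls on "not" (value -1.0), while "good" gets 0.0 because its positive effect alone is cancelled by its negative effect alongside "not". The values sum to the difference between the full-input score and the empty-input score, which is the efficiency property of Shapley values.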
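Note: The Chinese summaries in the original post are machine translations. The cover image is a word cloud of the paper titles.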