Contents
9. "What is on your mind?" Automated Scoring of Mindreading in Childhood and Early Adolescence [PDF] Abstract
10. Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness [PDF] Abstract
14. Performance of Transfer Learning Model vs. Traditional Neural Network in Low System Resource Environment [PDF] Abstract
23. Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases [PDF] Abstract
25. IIT_kgp at FinCausal 2020, Shared Task 1: Causality Detection using Sentence Embeddings in Financial Reports [PDF] Abstract
26. ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments [PDF] Abstract
31. Iterative Self-Learning for Enhanced Back-Translation in Low Resource Neural Machine Translation [PDF] Abstract
33. Words are the Window to the Soul: Language-based User Representations for Fake News Detection [PDF] Abstract
34. Conditioned Natural Language Generation using only Unconditioned Language Model: An Exploration [PDF] Abstract
38. CL-IMS @ DIACR-Ita: Volente o Nolente: BERT does not outperform SGNS on Semantic Change Detection [PDF] Abstract
43. Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition [PDF] Abstract
46. Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [PDF] Abstract
47. hyper-sinh: An Accurate and Reliable Function from Shallow to Deep Learning in TensorFlow and Keras [PDF] Abstract
Abstracts
1. Learning from Task Descriptions [PDF] Back to Contents
Orion Weller, Nicholas Lourie, Matt Gardner, Matthew E. Peters
Abstract: Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks. Formulating task descriptions as questions, we ensure each is general enough to apply to many possible inputs, thus comprehensively evaluating a model's ability to solve each task. Moreover, the dataset's structure tests specific types of systematic generalization. We find that the state-of-the-art T5 model achieves a score of 12% on ZEST, leaving a significant challenge for NLP researchers.
2. A Dataset for Tracking Entities in Open Domain Procedural Text [PDF] Back to Contents
Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy
Abstract: We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a new task formulation where given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as judged by humans and completely vetted) and large-scale dataset comprising 29,928 state changes over 4,050 sentences from 810 procedural real-world paragraphs from this http URL. A current state-of-the-art generation model on this task achieves 16.1% F1 based on the BLEU metric, leaving enough room for novel model architectures.
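To make the task formulation concrete, here is a minimal sketch of the tuple structure described above; the field names and the example values are illustrative, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class StateChange:
    """One open-vocabulary state change predicted for a single step."""
    entity: str        # e.g. "car window"
    attribute: str     # e.g. "clarity"
    before_state: str  # e.g. "foggy"
    after_state: str   # e.g. "clear"

# A plausible tuple for the fog-removal example in the abstract:
example = StateChange("car window", "clarity", "foggy", "clear")
```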
3. Tweet Sentiment Quantification: An Experimental Re-Evaluation [PDF] Back to Contents
Alejandro Moreo, Fabrizio Sebastiani
Abstract: Sentiment quantification is the task of estimating the relative frequency (or "prevalence") of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts; this is especially important when these texts are tweets, since most sentiment classification endeavours carried out on Twitter data actually have quantification (and not the classification of individual tweets) as their ultimate goal. It is well-known that solving quantification via "classify and count" (i.e., by classifying all unlabelled items via a standard classifier and counting the items that have been assigned to a given class) is suboptimal in terms of accuracy, and that more accurate quantification methods exist. In 2016, Gao and Sebastiani carried out a systematic comparison of quantification methods on the task of tweet sentiment quantification. In hindsight, we observe that the experimental protocol followed in that work is flawed, and that its results are thus unreliable. We now re-evaluate those quantification methods on the very same datasets, this time following a now consolidated and much more robust experimental protocol that involves 5775 as many experiments as run in the original study. Our experimentation yields results dramatically different from those obtained by Gao and Sebastiani, and thus provides a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.
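The "classify and count" baseline criticized above is simple enough to sketch. The classifier and class names below are placeholders, and this is the naive method the paper argues against, not one of the more accurate quantification methods.

```python
from collections import Counter

def classify_and_count(classifier, texts, classes=("Positive", "Neutral", "Negative")):
    """Estimate class prevalence by labelling every text and counting labels."""
    counts = Counter(classifier(t) for t in texts)
    return {c: counts[c] / len(texts) for c in classes}

# Toy usage with a hypothetical rule-based classifier:
toy = lambda t: "Positive" if "good" in t else "Negative"
print(classify_and_count(toy, ["good day", "bad luck", "good vibes"]))
# {'Positive': 0.667, 'Neutral': 0.0, 'Negative': 0.333} (approximately)
```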
4. Answer Identification in Collaborative Organizational Group Chat [PDF] Back to Contents
Naama Tepper, Naama Zwerdling, David Naori, Inbal Ronen
Abstract: We present a simple unsupervised approach for answer identification in organizational group chat. In recent years, organizational group chat is on the rise, enabling asynchronous text-based collaboration between co-workers in different locations and time zones. Finding answers to questions is often critical for work efficiency. However, group chat is characterized by intertwined conversations and 'always on' availability, making it hard for users to pinpoint answers to questions they care about in real-time or search for answers in retrospect. In addition, structural and lexical characteristics differ between chat groups, making it hard to find a 'one model fits all' approach. Our Kernel Density Estimation (KDE) based clustering approach termed Ans-Chat implicitly learns discussion patterns as a means for answer identification, thus eliminating the need for channel-specific tagging. Empirical evaluation shows that this solution outperforms other approaches.
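The abstract does not spell out how the KDE clustering works; the sketch below is only one plausible reading, segmenting a chat stream wherever the estimated temporal message density has a local minimum. The bandwidth, the use of timestamps, and the splitting rule are all assumptions rather than the paper's implementation.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def segment_messages(timestamps, bandwidth=60.0, grid_step=10.0):
    """Cluster messages into discussions by cutting at local minima of a
    Gaussian KDE fitted over message timestamps (illustrative only)."""
    t = np.sort(np.asarray(timestamps, dtype=float))
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(t.reshape(-1, 1))
    grid = np.arange(t[0], t[-1] + grid_step, grid_step)
    density = np.exp(kde.score_samples(grid.reshape(-1, 1)))
    cuts = [grid[i] for i in range(1, len(grid) - 1)
            if density[i] < density[i - 1] and density[i] < density[i + 1]]
    clusters, current, cuts = [], [], cuts + [float("inf")]
    for ts in t:
        while ts > cuts[0]:
            clusters.append(current)
            current, cuts = [], cuts[1:]
        current.append(ts)
    clusters.append(current)
    return clusters
```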
5. Analyzing Sustainability Reports Using Natural Language Processing [PDF] Back to Contents
Alexandra Luccioni, Emily Bailor, Nicolas Duchene
Abstract: Climate change is a far-reaching, global phenomenon that will impact many aspects of our society, including the global stock market \cite{dietz2016climate}. In recent years, companies have increasingly been aiming to both mitigate their environmental impact and adapt to the changing climate context. This is reported via increasingly exhaustive reports, which cover many types of climate risks and exposures under the umbrella of Environmental, Social, and Governance (ESG). However, given this abundance of data, sustainability analysts are obliged to comb through hundreds of pages of reports in order to find relevant information. We leveraged recent progress in Natural Language Processing (NLP) to create a custom model, ClimateQA, which allows the analysis of financial reports in order to identify climate-relevant sections based on a question answering approach. We present this tool and the methodology that we used to develop it in the present article.
6. Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles [PDF] Back to Contents
Amanuel Alambo, Cori Lohstroh, Erik Madaus, Swati Padhee, Brandy Foster, Tanvi Banerjee, Krishnaprasad Thirunarayan, Michael Raymer
Abstract: Recent advances in natural language processing have enabled automation of a wide range of tasks, including machine translation, named entity recognition, and sentiment analysis. Automated summarization of documents, or groups of documents, however, has remained elusive, with many efforts limited to extraction of keywords, key phrases, or key sentences. Accurate abstractive summarization has yet to be achieved due to the inherent difficulty of the problem, and limited availability of training data. In this paper, we propose a topic-centric unsupervised multi-document summarization framework to generate extractive and abstractive summaries for groups of scientific articles across 20 Fields of Study (FoS) in Microsoft Academic Graph (MAG) and news articles from DUC-2004 Task 2. The proposed algorithm generates an abstractive summary by developing salient language unit selection and text generation techniques. Our approach matches the state-of-the-art when evaluated on automated extractive evaluation metrics and performs better for abstractive summarization on five human evaluation metrics (entailment, coherence, conciseness, readability, and grammar). We achieve a kappa score of 0.68 between two co-author linguists who evaluated our results. We plan to publicly share MAG-20, a human-validated gold standard dataset of topic-clustered research articles and their summaries to promote research in abstractive summarization.
7. JNLP Team: Deep Learning for Legal Processing in COLIEE 2020 [PDF] Back to Contents
Ha-Thanh Nguyen, Hai-Yen Thi Vuong, Phuong Minh Nguyen, Binh Tran Dang, Quan Minh Bui, Sinh Trong Vu, Chau Minh Nguyen, Vu Tran, Ken Satoh, Minh Le Nguyen
Abstract: We propose deep learning based methods for automatic systems of legal retrieval and legal question-answering in COLIEE 2020. These systems are all characterized by being pre-trained on large amounts of data before being finetuned for the specified tasks. This approach helps to overcome the data scarcity and achieve good performance, thus can be useful for tackling related problems in information retrieval, and decision support in the legal domain. Besides, the approach can be explored to deal with other domain specific problems.
8. Hierarchical Transformer for Task Oriented Dialog Systems [PDF] Back to Contents
Bishal Santra, Potnuru Anusha, Pawan Goyal
Abstract: Generative models for dialog systems have gained a lot of interest because of the success of recent RNN and Transformer based models in complex natural language tasks like question answering and summarization. Although the task of dialog response generation is generally seen as a sequence to sequence (Seq2Seq) problem, researchers in the past have found it challenging to train dialog systems using the standard Seq2Seq models. Therefore, to help the model learn important utterance and conversation level features, Sordoni et al. (2015) and Serban et al. (2016) proposed the Hierarchical RNN architecture, which was later adopted by several other RNN based dialog systems. With transformer based models dominating seq2seq problems lately, a natural question is to understand the applicability of the notion of hierarchy in transformer based dialog systems. In this paper, we show how a standard transformer can be morphed into a hierarchical one by using specially designed attention masks and positional embeddings. Our experiments show strong improvements in context-to-response generation performance for task-oriented dialog systems over the current state-of-the-art approaches.
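The exact masks and positional embeddings are defined in the paper; purely to illustrate the idea, the sketch below builds one plausible token-level mask in which each token may only attend to tokens of its own utterance. This is an assumed simplification, not the paper's design.

```python
import torch

def same_utterance_mask(utterance_ids):
    """Return a (T, T) boolean matrix that is True where two token positions
    belong to the same utterance; utterance_ids maps each token position to
    its utterance index."""
    ids = torch.as_tensor(utterance_ids).unsqueeze(0)  # (1, T)
    return ids.t() == ids                              # (T, T)

# Example: a six-token dialog split into utterances [0, 0, 0, 1, 1, 2].
allowed = same_utterance_mask([0, 0, 0, 1, 1, 2])
# Depending on the attention implementation, this may need to be inverted
# (e.g. torch.nn.MultiheadAttention expects True at *disallowed* positions).
print(allowed.int())
```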
9. "What is on your mind?" Automated Scoring of Mindreading in Childhood and Early Adolescence [PDF] Back to Contents
Venelin Kovatchev, Phillip Smith, Mark Lee, Imogen Grumley Traynor, Irene Luque Aguilera, Rory T. Devine
Abstract: In this paper we present the first work on the automated scoring of mindreading ability in middle childhood and early adolescence. We create MIND-CA, a new corpus of 11,311 question-answer pairs in English from 1,066 children aged 7 to 14. We perform machine learning experiments and carry out extensive quantitative and qualitative evaluation. We obtain promising results, demonstrating the applicability of state-of-the-art NLP solutions to a new domain and task.
10. Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness [PDF] Back to Contents
António Branco, João Rodrigues, Małgorzata Salawa, Ruben Branco, Chakaveh Saedi
Abstract: Lexical semantics theories differ in advocating that the meaning of words is represented as an inference graph, a feature mapping or a vector space, thus raising the question: is it the case that one of these approaches is superior to the others in representing lexical semantics appropriately? Or in its non antagonistic counterpart: could there be a unified account of lexical semantics where these approaches seamlessly emerge as (partial) renderings of (different) aspects of a core semantic knowledge base? In this paper, we contribute to these research questions with a number of experiments that systematically probe different lexical semantics theories for their levels of cognitive plausibility and of technological usefulness. The empirical findings obtained from these experiments advance our insight on lexical semantics as the feature-based approach emerges as superior to the other ones, and arguably also move us closer to finding answers to the research questions above.
11. The Person Index Challenge: Extraction of Persons from Messy, Short Texts [PDF] Back to Contents
Markus Schröder, Christian Jilek, Michael Schulze, Andreas Dengel
Abstract: When persons are mentioned in texts with their first name, last name and/or middle names, there can be high variation in which of their names are used, how their names are ordered, and whether their names are abbreviated. If multiple persons are mentioned consecutively in very different ways, especially short texts can be perceived as "messy". Once ambiguous names occur, associations to persons may not be inferred correctly. Despite these eventualities, in this paper we ask how well an unsupervised algorithm can build a person index from short texts. We define a person index as a structured table that distinctly catalogs individuals by their names. First, we give a formal definition of the problem and describe a procedure to generate ground truth data for future evaluations. To give a first solution to this challenge, a baseline approach is implemented. By using our proposed evaluation strategy, we test the performance of the baseline and suggest further improvements. For future research, the source code is publicly available.
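To give a flavour of what a deliberately naive baseline might look like, the sketch below groups mentions by last token plus first initial; the keying scheme is an assumption for illustration and is far weaker than what the task actually demands (middle names, reordering, ambiguous names).

```python
from collections import defaultdict

def naive_person_index(mentions):
    """Group name mentions by (last token, first initial) after stripping periods."""
    index = defaultdict(set)
    for m in mentions:
        parts = m.replace(".", "").split()
        if parts:
            index[(parts[-1].lower(), parts[0][0].lower())].add(m)
    return dict(index)

print(naive_person_index(["J. Smith", "John Smith", "Jane Doe"]))
# {('smith', 'j'): {'J. Smith', 'John Smith'}, ('doe', 'j'): {'Jane Doe'}}
```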
12. Datasets and Models for Authorship Attribution on Italian Personal Writings [PDF] Back to Contents
Gaetana Ruggiero, Albert Gatt, Malvina Nissim
Abstract: Existing research on Authorship Attribution (AA) focuses on texts for which a lot of data is available (e.g. novels), mainly in English. We approach AA via Authorship Verification on short Italian texts in two novel datasets, and analyze the interaction between genre, topic, gender and length. Results show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might overtake more specific aspects of personal style.
13. Learning from similarity and information extraction from structured documents [PDF] Back to Contents
Martin Holeček
Abstract: Neural networks have successfully advanced in the task of information extraction from structured documents. In business document processing, more precise techniques equal more automation and less manual work. In this paper we design and examine various fully trainable approaches that use siamese networks, concepts of similarity, one-shot learning and context/memory awareness. The aim is to improve micro F_{1} of per-word classification on a testing split of an existing real-world document dataset. The results verify the hypothesis that access to a similar (yet still different) page with its target information improves the information extraction. Furthermore, the added contributions (in addition to siamese networks) of employing class information, a query-answer attention module and skip connections to the similar page are all required to beat the previous results. Our best model improves previous state-of-the-art results by a 0.0825 gain in F1 score. All the techniques used are not problem-specific and should be generalizable to help in other tasks and contexts. The code and an anonymized version of the dataset are provided.
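As a rough illustration of the siamese idea (a shared encoder and a similarity score between the current page and a similar reference page), here is a minimal PyTorch sketch; the encoder, the feature dimensions and the omission of the attention module and skip connections are all simplifications, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseWordScorer(nn.Module):
    """Embed words from two pages with the same encoder and score each word
    on the current page by cosine similarity to the reference page's words."""
    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, page_feats, ref_feats):
        a = F.normalize(self.encoder(page_feats), dim=-1)  # (N, H)
        b = F.normalize(self.encoder(ref_feats), dim=-1)   # (M, H)
        return a @ b.t()                                   # (N, M) similarities

scores = SiameseWordScorer(64)(torch.randn(10, 64), torch.randn(8, 64))
```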
14. Performance of Transfer Learning Model vs. Traditional Neural Network in Low System Resource Environment [PDF] Back to Contents
William Hui
Abstract: Recently, the use of pre-trained models to build neural networks based on the transfer learning methodology has become increasingly popular. These pre-trained models offer the benefit of using less computing resources to train a model with a smaller amount of training data. The rise of state-of-the-art models such as BERT, XLNet and GPT boosts accuracy and makes them attractive as base models for transfer learning. However, these models are still too complex and consume too many computing resources to train for transfer learning with low GPU memory. We will compare the performance and cost of a lighter transfer learning model against a purposely built neural network for the NLP applications of text classification and NER.
15. An Empirical Investigation of Contextualized Number Prediction [PDF] Back to Contents
Daniel Spokoyny, Taylor Berg-Kirkpatrick
Abstract: We conduct a large scale empirical investigation of contextualized number prediction in running text. Specifically, we consider two tasks: (1) masked number prediction: predicting a missing numerical value within a sentence, and (2) numerical anomaly detection: detecting an errorful numeric value within a sentence. We experiment with novel combinations of contextual encoders and output distributions over the real number line. Specifically, we introduce a suite of output distribution parameterizations that incorporate latent variables to add expressivity and better fit the natural distribution of numeric values in running text, and combine them with both recurrent and transformer-based encoder architectures. We evaluate these models on two numeric datasets in the financial and scientific domain. Our findings show that output distributions that incorporate discrete latent variables and allow for multiple modes outperform simple flow-based counterparts on all datasets, yielding more accurate numerical prediction and anomaly detection. We also show that our models effectively utilize textual context and benefit from general-purpose unsupervised pretraining.
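One concrete way to realize "an output distribution over the real number line with latent variables" is a mixture-density head; the sketch below is an assumption-laden illustration, and the component count, parameterization and loss are not the paper's exact choices.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureDensityNumberHead(nn.Module):
    """Map a contextual hidden state to a K-component Gaussian mixture over
    the real line and score an observed number by its negative log-likelihood."""
    def __init__(self, hidden_dim: int, k: int = 5):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, 3 * k)

    def nll(self, hidden, target):
        logits, mu, log_sigma = self.proj(hidden).chunk(3, dim=-1)
        log_pi = F.log_softmax(logits, dim=-1)
        z = (target.unsqueeze(-1) - mu) / log_sigma.exp()
        log_prob = -0.5 * z ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)
        return -torch.logsumexp(log_pi + log_prob, dim=-1)  # (batch,)

head = MixtureDensityNumberHead(hidden_dim=32)
loss = head.nll(torch.randn(4, 32), torch.tensor([3.5, 100.0, -2.0, 0.01])).mean()
```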
16. Explicitly Modeling Syntax in Language Model improves Generalization [PDF] Back to Contents
Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, Aaron Courville
Abstract: Syntax is fundamental to our thinking about language. Although neural networks are very successful in many tasks, they do not explicitly model syntactic structure. Failing to capture the structure of inputs could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with a one-step look-ahead parser and maintains the conditional probability setting of the standard language model. Experiments show that SOM can achieve strong results in language modeling and syntactic generalization tests, while using fewer parameters than other models.
17. Pre-training Text-to-Text Transformers for Concept-centric Common Sense [PDF] Back to Contents
Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill Yuchen Lin, Xiang Ren
Abstract: Pre-trained language models (PTLM) have achieved impressive results in a range of natural language understanding (NLU) and generation (NLG) tasks. However, current pre-training objectives such as masked token prediction (for BERT-style PTLMs) and masked span infilling (for T5-style PTLMs) do not explicitly model the relational commonsense knowledge about everyday concepts, which is crucial to many downstream tasks that need common sense to understand or generate. To augment PTLMs with concept-centric commonsense knowledge, in this paper, we propose both generative and contrastive objectives for learning common sense from the text, and use them as intermediate self-supervised learning tasks for incrementally pre-training PTLMs (before task-specific fine-tuning on downstream datasets). Furthermore, we develop a joint pre-training framework to unify generative and contrastive objectives so that they can mutually reinforce each other. Extensive experimental results show that our method, concept-aware language model (CALM), can pack more commonsense knowledge into the parameters of a pre-trained text-to-text transformer without relying on external knowledge graphs, yielding better performance on both NLU and NLG tasks. We show that while only incrementally pre-trained on a relatively small corpus for a few steps, CALM outperforms baseline methods by a consistent margin and even comparable with some larger PTLMs, which suggests that CALM can serve as a general, plug-and-play method for improving the commonsense reasoning ability of a PTLM.
18. Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions [PDF] Back to Contents
Muhammad N. ElNokrashy, Amr Hendy, Mohamed Abdelghaffar, Mohamed Afify, Ahmed Tawfik, Hany Hassan Awadalla
Abstract: This paper describes our submission to the WMT20 sentence filtering task. We combine scores from (1) a custom LASER built for each source language, (2) a classifier built to distinguish positive and negative pairs by semantic alignment, and (3) the original scores included in the task devkit. For the mBART finetuning setup, provided by the organizers, our method shows 7% and 5% relative improvement over baseline, in sacreBLEU score on the test set for Pashto and Khmer respectively.
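The abstract does not give the combination recipe; the sketch below shows one generic way to merge the three signals (min-max normalization followed by a weighted sum) with made-up weights, purely to illustrate the kind of score combination involved.

```python
def combine_scores(laser, classifier, devkit, weights=(0.4, 0.4, 0.2)):
    """Rank sentence pairs by a weighted sum of min-max normalized signals."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    signals = [norm(s) for s in (laser, classifier, devkit)]
    return [sum(w * s[i] for w, s in zip(weights, signals))
            for i in range(len(laser))]

print(combine_scores([0.7, 0.2, 0.9], [0.8, 0.1, 0.6], [10.0, 3.0, 8.0]))
```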
19. Text Information Aggregation with Centrality Attention [PDF] Back to Contents
Jingjing Gong, Hang Yan, Yining Zheng, Xipeng Qiu, Xuanjing Huang
Abstract: A lot of natural language processing problems need to encode the text sequence as a fixed-length vector, which usually involves an aggregation process of combining the representations of all the words, such as pooling or self-attention. However, these widely used aggregation approaches did not take higher-order relationships among the words into consideration. Hence we propose a new way of obtaining aggregation weights, called eigen-centrality self-attention. More specifically, we build a fully-connected graph for all the words in a sentence, then compute the eigen-centrality as the attention score of each word. The explicit modeling of relationships as a graph is able to capture some higher-order dependencies among words, which helps us achieve better results in 5 text classification tasks and one SNLI task than baseline models such as pooling, self-attention and dynamic routing. Besides, in order to compute the dominant eigenvector of the graph, we adopt the power method algorithm to get the eigen-centrality measure. Moreover, we also derive an iterative approach to get the gradient for the power method process to reduce both memory consumption and computation requirements.
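The aggregation itself is easy to sketch: build pairwise word affinities, run the power method to approximate the dominant eigenvector, and use it to weight the word vectors. The affinity function and the final softmax normalization below are assumptions; the paper's exact formulation (and its memory-efficient gradient) differs.

```python
import torch
import torch.nn.functional as F

def eigen_centrality_pool(word_vecs, n_iter=50):
    """Aggregate a (T, H) matrix of word representations into one (H,) vector
    using power-iteration eigen-centrality weights over a fully-connected graph."""
    affinity = torch.relu(word_vecs @ word_vecs.t())     # (T, T), non-negative
    v = torch.full((word_vecs.size(0),), 1.0 / word_vecs.size(0))
    for _ in range(n_iter):                              # power method
        v = affinity @ v
        v = v / (v.norm() + 1e-9)
    weights = F.softmax(v, dim=0)                        # centrality -> attention
    return weights @ word_vecs

sentence = torch.randn(7, 32)                            # 7 words, 32-dim each
pooled = eigen_centrality_pool(sentence)                 # (32,)
```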
20. Evaluating Sentence Segmentation and Word Tokenization Systems on Estonian Web Texts [PDF] Back to Contents
Kairit Sirts, Kairit Peekman
Abstract: Texts obtained from web are noisy and do not necessarily follow the orthographic sentence and word boundary rules. Thus, sentence segmentation and word tokenization systems that have been developed on well-formed texts might not perform so well on unedited web texts. In this paper, we first describe the manual annotation of sentence boundaries of an Estonian web dataset and then present the evaluation results of three existing sentence segmentation and word tokenization systems on this corpus: EstNLTK, Stanza and UDPipe. While EstNLTK obtains the highest performance compared to other systems on sentence segmentation on this dataset, the sentence segmentation performance of Stanza and UDPipe remains well below the results obtained on the more well-formed Estonian UD test set.
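Sentence segmentation is typically scored by comparing predicted and gold boundary offsets; the small helper below shows one common precision/recall/F1 formulation, not necessarily the exact metric used in the paper.

```python
def boundary_prf(gold_ends, pred_ends):
    """Precision, recall and F1 over sentence-end character offsets."""
    gold, pred = set(gold_ends), set(pred_ends)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(boundary_prf([12, 40, 77], [12, 40, 60]))  # approx. (0.667, 0.667, 0.667)
```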
21. WikiAsp: A Dataset for Multi-domain Aspect-based Summarization [PDF] Back to Contents
Hiroaki Hayashi, Prashant Budania, Peng Wang, Chris Ackerson, Raj Neervannan, Graham Neubig
Abstract: Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
22. Deep Shallow Fusion for RNN-T Personalization [PDF] 返回目录
Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer
Abstract: End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks. However, these models are more challenging to personalize compared to traditional hybrid systems due to the lack of external language models and difficulties in recognizing rare long-tail words, specifically entity names. In this work, we present novel techniques to improve RNN-T's ability to model rare WordPieces, infuse extra information into the encoder, enable the use of alternative graphemic pronunciations, and perform deep fusion with personalized language models for more robust biasing. We show that these combined techniques result in 15.4%-34.5% relative Word Error Rate improvement compared to a strong RNN-T baseline which uses shallow fusion and text-to-speech augmentation. Our work helps push the boundary of RNN-T personalization and close the gap with hybrid systems on use cases where biasing and entity recognition are crucial.
23. Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases [PDF] 返回目录
Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, Yu Su
Abstract: Existing studies on question answering on knowledge bases (KBQA) mainly operate with the standard i.i.d assumption, i.e., training distribution over questions is the same as the test distribution. However, i.i.d may be neither reasonably achievable nor desirable on large-scale KBs because 1) true user distribution is hard to capture and 2) randomly sample training examples from the enormous space would be highly data-inefficient. Instead, we suggest that KBQA models should have three levels of built-in generalization: i.i.d, compositional, and zero-shot. To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,495 questions, GrailQA, and provide evaluation settings for all three levels of generalization. In addition, we propose a novel BERT-based KBQA model. The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA.
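The three generalization levels can be thought of as a function of which KB schema items a test question uses relative to the training set. The toy sketch below shows one way such bucketing might be done, assuming each question is annotated with the schema items and the composition it uses; the field names and the exact bucketing criteria are illustrative assumptions, not the dataset's released tooling.

```python
def generalization_level(test_q, train_schema_items, train_compositions):
    """Bucket a test question as 'iid', 'compositional', or 'zero-shot'.

    test_q: dict with hypothetical fields 'schema_items' (set of relations/classes)
    and 'composition' (a canonical form of how those items are combined).
    """
    if not test_q["schema_items"] <= train_schema_items:
        return "zero-shot"          # uses schema items never seen in training
    if test_q["composition"] not in train_compositions:
        return "compositional"      # seen items, unseen combination
    return "iid"                    # both items and their combination seen in training

train_items = {"people.person.place_of_birth", "location.location.containedby"}
train_comps = {("people.person.place_of_birth",)}
q = {"schema_items": {"people.person.place_of_birth", "location.location.containedby"},
     "composition": ("people.person.place_of_birth", "location.location.containedby")}
print(generalization_level(q, train_items, train_comps))  # -> compositional
```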
24. Reinforced Medical Report Generation with X-Linear Attention and Repetition Penalty [PDF] 返回目录
Wenting Xu, Chang Qi, Zhenghua Xu, Thomas Lukasiewicz
Abstract: To reduce doctors' workload, deep-learning-based automatic medical report generation has recently attracted more and more research efforts, where attention mechanisms and reinforcement learning are integrated with the classic encoder-decoder architecture to enhance the performance of deep models. However, these state-of-the-art solutions mainly suffer from two shortcomings: (i) their attention mechanisms cannot utilize high-order feature interactions, and (ii) due to the use of TF-IDF-based reward functions, these methods are fragile with generating repeated terms. Therefore, in this work, we propose a reinforced medical report generation solution with x-linear attention and repetition penalty mechanisms (ReMRG-XR) to overcome these problems. Specifically, x-linear attention modules are used to explore high-order feature interactions and achieve multi-modal reasoning, while repetition penalty is used to apply penalties to repeated terms during the model's training process. Extensive experimental studies have been conducted on two public datasets, and the results show that ReMRG-XR greatly outperforms the state-of-the-art baselines in terms of all metrics.
25. IIT_kgp at FinCausal 2020, Shared Task 1: Causality Detection using Sentence Embeddings in Financial Reports [PDF] 返回目录
Arka Mitra, Harshvardhan Srivastava, Yugam Tiwari
Abstract: The paper describes the work that the team submitted to FinCausal 2020 Shared Task. This work is associated with the first sub-task of identifying causality in sentences. The various models used in the experiments tried to obtain a latent space representation for each of the sentences. Linear regression was performed on these representations to classify whether the sentence is causal or not. The experiments have shown BERT (Large) performed the best, giving a F1 score of 0.958, in the task of detecting the causality of sentences in financial texts and reports. The class imbalance was dealt with a modified loss function to give a better metric score for the evaluation.
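The described pipeline, a sentence embedding followed by a linear model over that representation, can be sketched with scikit-learn. The `embed` function below is a placeholder for whatever encoder produced the latent representations (the abstract reports BERT-Large performing best); since the abstract mentions linear regression, the sketch thresholds the regression output at 0.5 to obtain causal/non-causal labels.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def embed(sentences):
    """Placeholder: replace with the actual sentence encoder (e.g. BERT pooled outputs)."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), 768))

train_sents = ["Profits fell because demand dropped.", "The report covers Q3 2019."]
train_labels = np.array([1, 0])                 # 1 = causal, 0 = non-causal

reg = LinearRegression().fit(embed(train_sents), train_labels)
scores = reg.predict(embed(["Revenue rose due to new contracts."]))
predictions = (scores >= 0.5).astype(int)       # threshold the regression output
print(scores, predictions)
```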
26. ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments [PDF] 返回目录
Hyounghun Kim, Abhay Zala, Graham Burri, Hao Tan, Mohit Bansal
Abstract: For embodied agents, navigation is an important ability but not an isolated goal. Agents are also expected to perform specific tasks after reaching the target location, such as picking up objects and assembling them into a particular arrangement. We combine Vision-and-Language Navigation, assembling of collected objects, and object referring expression comprehension, to create a novel joint navigation-and-assembly task, named ArraMon. During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment, but then also ARRAnge the collected objects part-by-part in an egocentric grid-layout environment. To support this task, we implement a 3D dynamic environment simulator and collect a dataset (in English; and also extended to Hindi) with human-written navigation and assembling instructions, and the corresponding ground truth trajectories. We also filter the collected instructions via a verification stage, leading to a total of 7.7K task instances (30.8K instructions and paths). We present results for several baseline models (integrated and biased) and metrics (nDTW, CTC, rPOD, and PTC), and the large model-human performance gap demonstrates that our task is challenging and presents a wide scope for future work. Our dataset, simulator, and code are publicly available at: this https URL
27. DORB: Dynamically Optimizing Multiple Rewards with Bandits [PDF] 返回目录
Ramakanth Pasunuru, Han Guo, Mohit Bansal
Abstract: Policy gradients-based reinforcement learning has proven to be a promising approach for directly optimizing non-differentiable evaluation metrics for language generation tasks. However, optimizing for a specific metric reward leads to improvements in mostly that metric only, suggesting that the model is gaming the formulation of that metric in a particular way without often achieving real qualitative improvements. Hence, it is more beneficial to make the model optimize multiple diverse metric rewards jointly. While appealing, this is challenging because one needs to manually decide the importance and scaling weights of these metric rewards. Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time. Considering the above aspects, in our work, we automate the optimization of multiple metric rewards simultaneously via a multi-armed bandit approach (DORB), where at each round, the bandit chooses which metric reward to optimize next, based on expected arm gains. We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks: question generation and data-to-text generation, including on an unseen-test transfer setup. Finally, we present interpretable analyses of the learned bandit curriculum over the optimized rewards.
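Exp3 itself is compact enough to sketch. Below is a generic Exp3 loop in which each arm would correspond to one metric reward; the reward function here is a stub, and in practice the signal would come from validation-metric improvements. The naming is illustrative, not the paper's implementation.

```python
import math
import random

def exp3(num_arms, num_rounds, reward_fn, gamma=0.1):
    """Exp3: exponential-weight algorithm for adversarial multi-armed bandits."""
    weights = [1.0] * num_arms
    choices = []
    for _ in range(num_rounds):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = random.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(arm)                      # assumed to lie in [0, 1]
        estimated = reward / probs[arm]              # importance-weighted reward estimate
        weights[arm] *= math.exp(gamma * estimated / num_arms)
        choices.append(arm)
    return choices

# Stub reward: pretend arm 2 (one particular metric) tends to help the most.
random.seed(0)
history = exp3(num_arms=3, num_rounds=200,
               reward_fn=lambda a: random.random() * (0.4 + 0.3 * (a == 2)))
print(history[-10:])   # later rounds should favour the more rewarding arm
```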
28. The Challenge of Diacritics in Yoruba Embeddings [PDF] 返回目录
Tosin P. Adewumi, Foteini Liwicki, Marcus Liwicki
Abstract: The major contributions of this work include the empirical establishment of a better performance for Yoruba embeddings from undiacritized (normalized) dataset and provision of new analogy sets for evaluation. The Yoruba language, being a tonal language, utilizes diacritics (tonal marks) in written form. We show that this affects embedding performance by creating embeddings from exactly the same Wikipedia dataset but with the second one normalized to be undiacritized. We further compare average intrinsic performance with two other work (using analogy test set & WordSim) and we obtain the best performance in WordSim and corresponding Spearman correlation.
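An "undiacritized" (normalized) variant of a corpus can be produced by Unicode normalization that removes combining marks, which is presumably close to what was done here; note that this simple NFD-based filter removes tonal marks and underdots alike, and the abstract does not say whether the underdots were kept.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks (tonal accents, underdots) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

print(strip_diacritics("Báwo ni ọjà ṣe rí?"))  # -> "Bawo ni oja se ri?"
```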
29. Morphologically Aware Word-Level Translation [PDF] 返回目录
Paula Czarnowska, Sebastian Ruder, Ryan Cotterell, Ann Copestake
Abstract: We propose a novel morphologically aware probability model for bilingual lexicon induction, which jointly models lexeme translation and inflectional morphology in a structured way. Our model exploits the basic linguistic intuition that the lexeme is the key lexical unit of meaning, while inflectional morphology provides additional syntactic information. This approach leads to substantial performance improvements - 19% average improvement in accuracy across 6 language pairs over the state of the art in the supervised setting and 16% in the weakly supervised setting. As another contribution, we highlight issues associated with modern BLI that stem from ignoring inflectional morphology, and propose three suggestions for improving the task.
30. Target Guided Emotion Aware Chat Machine [PDF] 返回目录
Wei Wei, Jiayi Liu, Xianling Mao, Guibin Guo, Feida Zhu, Pan Zhou, Yuchong Hu, Shanshan Feng
Abstract: The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions. However, this challenge is not well addressed in the literature, since most of the approaches neglect the emotional information conveyed by a post while generating responses. This article addresses this problem by proposing a unifed end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post and leverage target information for generating more intelligent responses with appropriately expressed emotions. Extensive experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.
31. Iterative Self-Learning for Enhanced Back-Translation in Low Resource Neural Machine Translation [PDF] 返回目录
Idris Abdulmumin, Bashir Shehu Galadanci, Ismaila Idris Sinan
Abstract: Many language pairs are low resource - the amount and/or quality of parallel data is not sufficient to train a neural machine translation (NMT) model which can reach an acceptable standard of accuracy. Many works have explored the use of the easier-to-get monolingual data to improve the performance of translation models in this category of languages - and even high resource languages. The most successful of such works is the back-translation - using the translations of the target language monolingual data to increase the amount of the training data. The quality of the backward model - trained on the available parallel data - has been shown to determine the performance of the back-translation approach. Many approaches have been explored to improve the performance of this model especially in low resource languages where the amount of parallel data is not sufficient to train an acceptable backward model. Among such works are the use of self-learning and the iterative back-translation. These methods were shown to perform better than the standard back-translation. This work presents the iterative self-training approach as an improvement over the self-learning approach to further enhance the performance of the backward model. Over several iterations, the synthetic data generated by the backward model is used to improve its performance through forward translation. Experiments have shown that the method outperforms both the standard back-translation and self-learning approach on IWSLT'14 English German low resource NMT. While the method also outperforms the iterative back-translation, though slightly, the number of models required to be trained is reduced exactly by the number of iterations.
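The iterative procedure described above amounts to alternating forward translation of target-side monolingual data with retraining of the backward model. The schematic loop below conveys only the control flow: `train_nmt`, `translate` and the toy data are placeholders standing in for a real NMT toolkit, not an actual API.

```python
def iterative_self_training(parallel_data, target_monolingual, train_nmt, translate,
                            num_iterations=3):
    """Schematic loop: the backward (target -> source) model is repeatedly
    retrained on its own forward-translated synthetic data."""
    backward = train_nmt(parallel_data)
    for _ in range(num_iterations):
        synthetic_sources = translate(backward, target_monolingual)
        synthetic_pairs = list(zip(synthetic_sources, target_monolingual))
        backward = train_nmt(parallel_data + synthetic_pairs)
    # The improved backward model is then used for standard back-translation
    # when training the forward (source -> target) model.
    return backward

# Stub "training" and "translation" so the control flow can be exercised end to end.
toy_parallel = [("ein haus", "a house")]
toy_mono = ["a house", "a dog"]
model = iterative_self_training(
    toy_parallel, toy_mono,
    train_nmt=lambda pairs: {"size": len(pairs)},
    translate=lambda m, sents: [s.upper() for s in sents],
)
print(model)  # {'size': 3}: trained on 1 authentic + 2 synthetic pairs in the last round
```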
32. Lessons from Computational Modelling of Reference Production in Mandarin and English [PDF] 返回目录
Guanyi Chen, Kees van Deemter
Abstract: Referring expression generation (REG) algorithms offer computational models of the production of referring expressions. In earlier work, a corpus of referring expressions (REs) in Mandarin was introduced. In the present paper, we annotate this corpus, evaluate classic REG algorithms on it, and compare the results with earlier results on the evaluation of REG for English referring expressions. Next, we offer an in-depth analysis of the corpus, focusing on issues that arise from the grammar of Mandarin. We discuss shortcomings of previous REG evaluations that came to light during our investigation and we highlight some surprising results. Perhaps most strikingly, we found a much higher proportion of under-specified expressions than previous studies had suggested, not just in Mandarin but in English as well.
33. Words are the Window to the Soul: Language-based User Representations for Fake News Detection [PDF] 返回目录
Marco Del Tredici, Raquel Fernández
Abstract: Cognitive and social traits of individuals are reflected in language use. Moreover, individuals who are prone to spread fake news online often share common traits. Building on these ideas, we introduce a model that creates representations of individuals on social media based only on the language they produce, and use them to detect fake news. We show that language-based user representations are beneficial for this task. We also present an extended analysis of the language of fake news spreaders, showing that its main features are mostly domain independent and consistent across two English datasets. Finally, we exploit the relation between language use and connections in the social graph to assess the presence of the Echo Chamber effect in our data.
34. Conditioned Natural Language Generation using only Unconditioned Language Model: An Exploration [PDF] 返回目录
Fan-Keng Sun, Cheng-I Lai
Abstract: Transformer-based language models have shown to be very powerful for natural language generation (NLG). However, text generation conditioned on some user inputs, such as topics or attributes, is non-trivial. Past approach relies on either modifying the original LM architecture, re-training the LM on corpora with attribute labels, or having separately trained `guidance models' to guide text generation in decoding. We argued that the above approaches are not necessary, and the original unconditioned LM is sufficient for conditioned NLG. We evaluated our approaches by the samples' fluency and diversity with automated and human evaluation.
35. Meaningful Answer Generation of E-Commerce Question-Answering [PDF] 返回目录
Shen Gao, Xiuying Chen, Zhaochun Ren, Dongyan Zhao, Rui Yan
Abstract: In e-commerce portals, generating answers for product-related questions has become a crucial task. In this paper, we focus on the task of product-aware answer generation, which learns to generate an accurate and complete answer from large-scale unlabeled e-commerce reviews and product attributes. However, safe answer problems pose significant challenges to text generation tasks, and e-commerce question-answering task is no exception. To generate more meaningful answers, in this paper, we propose a novel generative neural model, called the Meaningful Product Answer Generator (MPAG), which alleviates the safe answer problem by taking product reviews, product attributes, and a prototype answer into consideration. Product reviews and product attributes are used to provide meaningful content, while the prototype answer can yield a more diverse answer pattern. To this end, we propose a novel answer generator with a review reasoning module and a prototype answer reader. Our key idea is to obtain the correct question-aware information from a large scale collection of reviews and learn how to write a coherent and meaningful answer from an existing prototype answer. To be more specific, we propose a read-and-write memory consisting of selective writing units to conduct reasoning among these reviews. We then employ a prototype reader consisting of comprehensive matching to extract the answer skeleton from the prototype answer. Finally, we propose an answer editor to generate the final answer by taking the question and the above parts as input. Conducted on a real-world dataset collected from an e-commerce platform, extensive experimental results show that our model achieves state-of-the-art performance in terms of both automatic metrics and human evaluations. Human evaluation also demonstrates that our model can consistently generate specific and proper answers.
36. Sentiment Analysis for Sinhala Language using Deep Learning Techniques [PDF] 返回目录
Lahiru Senevirathne, Piyumal Demotte, Binod Karunanayake, Udyogi Munasinghe, Surangika Ranathunga
Abstract: Due to the high impact of the fast-evolving fields of machine learning and deep learning, Natural Language Processing (NLP) tasks have further obtained comprehensive performances for highly resourced languages such as English and Chinese. However Sinhala, which is an under-resourced language with a rich morphology, has not experienced these advancements. For sentiment analysis, there exists only two previous research with deep learning approaches, which focused only on document-level sentiment analysis for the binary case. They experimented with only three types of deep learning models. In contrast, this paper presents a much comprehensive study on the use of standard sequence models such as RNN, LSTM, Bi-LSTM, as well as more recent state-of-the-art models such as hierarchical attention hybrid neural networks, and capsule networks. Classification is done at document-level but with more granularity by considering POSITIVE, NEGATIVE, NEUTRAL, and CONFLICT classes. A data set of 15059 Sinhala news comments, annotated with these four classes and a corpus consists of 9.48 million tokens are publicly released. This is the largest sentiment annotated data set for Sinhala so far.
37. DebateSum: A large-scale argument mining and summarization dataset [PDF] 返回目录
Allen Roush, Arvind Balaji
Abstract: Prior work in Argument Mining frequently alludes to its potential applications in automatic debating systems. Despite this focus, almost no datasets or models exist which apply natural language processing techniques to problems found within competitive formal debate. To remedy this, we present the DebateSum dataset. DebateSum consists of 187,386 unique pieces of evidence with corresponding argument and extractive summaries. DebateSum was made using data compiled by competitors within the National Speech and Debate Association over a 7-year period. We train several transformer summarization models to benchmark summarization performance on DebateSum. We also introduce a set of fasttext word-vectors trained on DebateSum called debate2vec. Finally, we present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association today. The DebateSum search engine is available to the public here: http://www.debate.cards
38. CL-IMS @ DIACR-Ita: Volente o Nolente: BERT does not outperform SGNS on Semantic Change Detection [PDF] 返回目录
Severin Laicher, Gioia Baldissin, Enrique Castañeda, Dominik Schlechtweg, Sabine Schulte im Walde
Abstract: We present the results of our participation in the DIACR-Ita shared task on lexical semantic change detection for Italian. We exploit Average Pairwise Distance of token-based BERT embeddings between time points and rank 5 (of 8) in the official ranking with an accuracy of $.72$. While we tune parameters on the English data set of SemEval-2020 Task 1 and reach high performance, this does not translate to the Italian DIACR-Ita data set. Our results show that we do not manage to find robust ways to exploit BERT embeddings in lexical semantic change detection.
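Average Pairwise Distance (APD) between two sets of contextualised token vectors is straightforward to compute once usages of a target word have been embedded for each time period. A sketch using cosine distance follows; the BERT layer selection and preprocessing choices are left out and would need to match the actual system.

```python
import numpy as np
from scipy.spatial.distance import cdist

def average_pairwise_distance(usages_t1, usages_t2):
    """Mean cosine distance between every usage vector from period 1 and period 2.

    usages_t1, usages_t2: arrays of shape (n_usages, dim) holding token-level
    BERT embeddings of the target word in each time period.
    """
    return cdist(usages_t1, usages_t2, metric="cosine").mean()

rng = np.random.default_rng(0)
t1 = rng.normal(size=(40, 768))
t2 = rng.normal(loc=0.3, size=(55, 768))   # toy "shifted" usage distribution
print(average_pairwise_distance(t1, t2))   # larger values suggest more semantic change
```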
39. Utilizing Bidirectional Encoder Representations from Transformers for Answer Selection [PDF] 返回目录
Md Tahmid Rahman Laskar, Enamul Hoque, Jimmy Xiangji Huang
Abstract: Pre-training a transformer-based model for the language modeling task in a large dataset and then fine-tuning it for downstream tasks has been found very useful in recent years. One major advantage of such pre-trained language models is that they can effectively absorb the context of each word in a sentence. However, for tasks such as the answer selection task, the pre-trained language models have not been extensively used yet. To investigate their effectiveness in such tasks, in this paper, we adopt the pre-trained Bidirectional Encoder Representations from Transformer (BERT) language model and fine-tune it on two Question Answering (QA) datasets and three Community Question Answering (CQA) datasets for the answer selection task. We find that fine-tuning the BERT model for the answer selection task is very effective and observe a maximum improvement of 13.1% in the QA datasets and 18.7% in the CQA datasets compared to the previous state-of-the-art.
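A common way to realise this setup is a pointwise cross-encoder: each (question, candidate answer) pair is packed into one sequence, scored with a classification head, and candidates are ranked by the positive-class score. A minimal Hugging Face Transformers sketch along those lines, illustrative rather than the authors' exact configuration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

question = "Who wrote the novel Dracula?"
candidates = ["Bram Stoker wrote Dracula in 1897.", "Dracula is set largely in Transylvania."]

# Encode each (question, candidate) pair as one [CLS] q [SEP] a [SEP] sequence.
batch = tokenizer([question] * len(candidates), candidates,
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
scores = logits.softmax(dim=-1)[:, 1]     # probability of the "correct answer" class
ranking = scores.argsort(descending=True)
print(scores.tolist(), ranking.tolist())
# Fine-tuning would minimise cross-entropy over labelled (question, candidate) pairs.
```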
40. Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling [PDF] 返回目录
Shruti Bhosale, Kyra Yee, Sergey Edunov, Michael Auli
Abstract: Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks. On the other hand, traditional machine translation has a long history of leveraging unlabeled data through noisy channel modeling. The same idea has recently been shown to achieve strong improvements for neural machine translation. Unfortunately, naïve noisy channel modeling with modern sequence to sequence models is up to an order of magnitude slower than alternatives. We address this issue by introducing efficient approximations to make inference with the noisy channel approach as fast as strong ensembles while increasing accuracy. We also show that the noisy channel approach can outperform strong pre-training results by achieving a new state of the art on WMT Romanian-English translation.
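Noisy channel reranking combines a direct model, a channel model and a language model: for a candidate translation y of source x, a typical combined score is log P(y|x) + λ1·log P(x|y) + λ2·log P(y). The toy function below only illustrates that combination; the weights, any length normalisation, and the model calls themselves are placeholders rather than the paper's exact scoring rule.

```python
def noisy_channel_score(log_p_direct, log_p_channel, log_p_lm,
                        lam_channel=1.0, lam_lm=0.3):
    """Combined score for reranking a candidate translation.

    log_p_direct : log P(y | x) from the direct (source -> target) model
    log_p_channel: log P(x | y) from the channel (target -> source) model
    log_p_lm     : log P(y) from a target-side language model
    """
    return log_p_direct + lam_channel * log_p_channel + lam_lm * log_p_lm

# Rerank two hypothetical candidates: the channel and LM terms can overturn the direct model.
candidates = {
    "candidate A": noisy_channel_score(-3.1, -4.0, -10.2),
    "candidate B": noisy_channel_score(-3.4, -2.9, -8.7),
}
print(max(candidates, key=candidates.get))  # -> candidate B
```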
41. IIRC: A Dataset of Incomplete Information Reading Comprehension Questions [PDF] 返回目录
James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi
Abstract: Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information. To fill this gap, we present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia that provide only partial information to answer them, with the missing information occurring in one or more linked documents. The questions were written by crowd workers who did not have access to any of the linked documents, leading to questions that have little lexical overlap with the contexts where the answers appear. This process also gave many questions without answers, and those that require discrete reasoning, increasing the difficulty of the task. We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset, finding that it achieves 31.1% F1 on this task, while estimated human performance is 88.4%. The dataset, code for the baseline system, and a leaderboard can be found at this https URL.
42. Zero-shot Learning for Relation Extraction [PDF] 返回目录
Jiaying Gong, Hoda Eldardiry
Abstract: Most existing supervised and few-shot learning relation extraction methods rely on labeled training data. However, in real-world scenarios, there exist many relations for which no training data is available. We address this issue from the perspective of zero-shot learning (ZSL), which is similar to the way humans learn and recognize new concepts with no prior knowledge. We propose a zero-shot learning relation extraction (ZSLRE) framework, which focuses on recognizing novel relations that have no corresponding labeled data available for training. Our proposed ZSLRE model aims to recognize new relations based on prototypical networks that are modified to utilize side (auxiliary) information. The additional use of side information allows those modified prototype networks to recognize novel relations in addition to previously known relations. We construct side information from labels and their synonyms, hypernyms of named entities, and keywords. We build an automatic hypernym extraction framework to obtain hypernyms of various named entities directly from the web. Using extensive experiments on two public datasets (NYT and FewRel), we demonstrate that our proposed model significantly outperforms state-of-the-art methods on supervised learning, few-shot learning, and zero-shot learning tasks. Our experimental results also demonstrate the effectiveness and robustness of our proposed model in a combination scenario. Once accepted for publication, we will publish ZSLRE's source code and datasets to enable reproducibility and encourage further research.
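The core idea, prototypical networks whose class prototypes are mixed with side-information embeddings, can be sketched as follows. The embeddings, the interpolation weight `alpha`, and the zero-shot fallback are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np

def prototypes_with_side_info(support_emb, support_labels, side_emb, alpha=0.5):
    """Build one prototype per relation by mixing the mean support embedding
    with an embedding of side information (label text, synonyms, hypernyms).

    support_emb    : (n_examples, d) array of sentence embeddings
    support_labels : (n_examples,) array of relation ids
    side_emb       : dict relation id -> (d,) side-information embedding
    alpha          : interpolation weight (a free design choice here)
    """
    protos = {}
    for r in np.unique(support_labels):
        mean_emb = support_emb[support_labels == r].mean(axis=0)
        protos[r] = alpha * mean_emb + (1 - alpha) * side_emb[r]
    return protos

def classify(query_emb, protos):
    """Assign the relation whose prototype is closest in Euclidean distance."""
    return min(protos, key=lambda r: np.linalg.norm(query_emb - protos[r]))

# For unseen (zero-shot) relations there are no support examples, so the
# prototype can fall back to the side-information embedding alone.
```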
43. Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition [PDF] 返回目录
Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer
Abstract: Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition. One major challenge of attention-based models is the need for access to the full sequence and the computational cost that grows quadratically with the sequence length. These characteristics pose challenges, especially for low-latency scenarios, where the system is often required to be streaming. In this paper, we build a compact and streaming speech recognition system on top of the end-to-end neural transducer architecture with attention-based modules augmented with convolution. The proposed system equips the end-to-end models with the streaming capability and reduces the large footprint of the streaming attention-based model by using an augmented memory. On the LibriSpeech dataset, our proposed system achieves word error rates of 2.7% on test-clean and 5.8% on test-other, to the best of our knowledge the lowest among streaming approaches reported so far.
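A toy sketch of the augmented-memory idea: the input is split into fixed-size chunks, each chunk attends only over the memory bank plus its own frames, and a compressed summary of the chunk is appended to the memory, so the attention cost per step depends on the chunk size rather than the full sequence length. Real implementations use learned projections, multiple heads, and left/right context frames; the single-head attention and mean-pooled summaries below are simplifications.

```python
import torch
import torch.nn.functional as F

def streaming_attention_with_memory(x, chunk_size, d):
    """Toy chunk-wise self-attention with an augmented memory bank.

    x : (seq_len, d) input features.
    """
    memory = torch.zeros(0, d)          # grows by one summary vector per chunk
    outputs = []
    for start in range(0, x.size(0), chunk_size):
        chunk = x[start:start + chunk_size]            # (c, d)
        keys = torch.cat([memory, chunk], dim=0)       # (m + c, d)
        attn = F.softmax(chunk @ keys.t() / d ** 0.5, dim=-1)
        outputs.append(attn @ keys)                    # (c, d)
        summary = chunk.mean(dim=0, keepdim=True)      # compress chunk to one vector
        memory = torch.cat([memory, summary], dim=0)
    return torch.cat(outputs, dim=0)

out = streaming_attention_with_memory(torch.randn(32, 16), chunk_size=8, d=16)
```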
44. The Teacher-Student Chatroom Corpus [PDF] 返回目录
Andrew Caines, Helen Yannakoudakis, Helena Edmondson, Helen Allen, Pascual Pérez-Paredes, Bill Byrne, Paula Buttery
Abstract: The Teacher-Student Chatroom Corpus (TSCC) is a collection of written conversations captured during one-to-one lessons between teachers and learners of English. The lessons took place in an online chatroom and therefore involve more interactive, immediate and informal language than might be found in asynchronous exchanges such as email correspondence. The fact that the lessons were one-to-one means that the teacher was able to focus exclusively on the linguistic abilities and errors of the student, and to offer personalised exercises, scaffolding and correction. The TSCC contains more than one hundred lessons between two teachers and eight students, amounting to 13.5K conversational turns and 133K words: it is freely available for research use. We describe the corpus design, data collection procedure and annotations added to the text. We perform some preliminary descriptive analyses of the data and consider possible uses of the TSCC.
45. Sampling Approach Matters: Active Learning for Robotic Language Acquisition [PDF] 返回目录
Nisha Pillai, Edward Raff, Francis Ferraro, Cynthia Matuszek
Abstract: Ordering the selection of training data using active learning can lead to improvements in learning efficiently from smaller corpora. We present an exploration of active learning approaches applied to three grounded language problems of varying complexity in order to analyze what methods are suitable for improving data efficiency in learning. We present a method for analyzing the complexity of data in this joint problem space, and report on how characteristics of the underlying task, along with design decisions such as feature selection and classification model, drive the results. We observe that representativeness, along with diversity, is crucial in selecting data samples.
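A small sketch of one common way to combine the two criteria the abstract highlights, uncertainty and diversity, when selecting a batch for annotation. The entropy shortlist size and the greedy max-min selection are illustrative choices, not the specific sampling strategies evaluated in the paper.

```python
import numpy as np

def select_batch(probs, embeddings, k):
    """Pick k unlabeled examples that are both uncertain and mutually diverse.

    probs      : (n, n_classes) predicted class probabilities from the current model
    embeddings : (n, d) feature vectors used to measure diversity
    """
    # Uncertainty: entropy of the predictive distribution, highest first.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    shortlist = list(np.argsort(-entropy)[: 5 * k])

    chosen = [shortlist.pop(0)]
    while len(chosen) < k and shortlist:
        # Greedy max-min: add the shortlisted example farthest from those already chosen.
        def min_dist(i):
            return np.linalg.norm(embeddings[i] - embeddings[chosen], axis=1).min()
        best = max(shortlist, key=min_dist)
        shortlist.remove(best)
        chosen.append(best)
    return chosen

# Example with random data: 100 pool examples, 3 classes, pick 10 to label.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(3), size=100)
e = rng.normal(size=(100, 16))
print(select_batch(p, e, k=10))
```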
46. Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [PDF] 返回目录
Rahul Yedida, Saad Mohammad Abrar, Cleber Melo-Filho, Eugene Muratov, Rada Chirkova, Alexander Tropsha
Abstract: Objective: We aim to learn potential novel cures for diseases from unstructured text sources. More specifically, we seek to extract drug-disease pairs of potential cures to diseases by simple reasoning over the structure of spoken text. Materials and Methods: We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline for systematically pre-processing the text to ensure quality input to the core classification model, which feeds into a series of post-processing steps for obtaining filtered results. Our classification model itself uses a language model pre-trained on PubMed text. The modular nature of our pipeline allows for future developments in this area by substituting higher-quality components at each stage of the pipeline. As a validation measure, we use ROBOKOP, an engine over a medical knowledge graph with only validated pathways, as a ground-truth source for checking the existence of the proposed pairs. For the proposed pairs not found in ROBOKOP, we provide further verification using Chemotext. Results: We found 30.4% of our proposed pairs in the ROBOKOP database. For example, our model successfully identified that Omeprazole can help treat heartburn. We discuss the significance of this result, showing some examples of the proposed pairs. Discussion and Conclusion: The agreement of our results with the existing knowledge source indicates a step in the right direction. Given the plug-and-play nature of our framework, it is easy to add, remove, or modify parts to improve the model as necessary. We discuss the results with some examples and note that this is a potentially new line of research with further scope to be explored. Although our approach was originally oriented toward radio podcast transcripts, it is input-agnostic and could be applied to any source of textual data and to any problem of interest.
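The following skeleton only shows how the modular stages described above can be chained so that individual components are easy to swap out. Every stage here is a hypothetical placeholder (the keyword rule, the dummy transcript, and the function names are not from the paper); the real pipeline uses a cloud transcription service, a PubMed-pretrained language model, and knowledge-graph validation.

```python
def transcribe(audio_uri):
    """Speech-to-text stage (in the paper, a cloud transcription API) -- placeholder."""
    return "omeprazole may help treat heartburn . unrelated sentence ."

def preprocess(text):
    """Cleanup and sentence splitting -- placeholder."""
    return [s.strip() for s in text.split(".") if s.strip()]

def classify_sentences(sentences):
    """Flag sentences likely to state a treatment relation -- placeholder keyword rule."""
    return [s for s in sentences if "treat" in s]

def extract_pairs(flagged):
    """Pull out candidate (drug, disease) pairs -- placeholder."""
    return [("omeprazole", "heartburn") for _ in flagged]

def validate(pairs):
    """Check candidates against an external knowledge source -- placeholder."""
    return pairs

def run_pipeline(audio_uri):
    text = transcribe(audio_uri)
    return validate(extract_pairs(classify_sentences(preprocess(text))))

print(run_pipeline("episode_001.wav"))
```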
47. hyper-sinh: An Accurate and Reliable Function from Shallow to Deep Learning in TensorFlow and Keras [PDF] 返回目录
Luca Parisi, Renfei Ma, Narrendar RaviChandran, Matteo Lanzillotta
Abstract: This paper presents the 'hyper-sinh', a variation of the m-arcsinh activation function suitable for Deep Learning (DL)-based algorithms for supervised learning, such as Convolutional Neural Networks (CNN). hyper-sinh, developed in the open source Python libraries TensorFlow and Keras, is thus described and validated as an accurate and reliable activation function for both shallow and deep neural networks. Improvements in accuracy and reliability in image and text classification tasks on five (N = 5) benchmark data sets available from Keras are discussed. Experimental results demonstrate the overall competitive classification performance of both shallow and deep neural networks, obtained via this novel function. This function is evaluated with respect to gold standard activation functions, demonstrating its overall competitive accuracy and reliability for both image and text classification.
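Since the abstract names TensorFlow and Keras explicitly, the sketch below shows how a custom activation of this kind can be wired into a Keras model. The piecewise body is a placeholder standing in for the published definition (see the paper for the exact formula), and the layer sizes are arbitrary.

```python
import tensorflow as tf
from tensorflow import keras

def hyper_sinh_like(x):
    # Placeholder: a sinh-based branch for positive inputs, a polynomial branch
    # otherwise; substitute the paper's exact definition here.
    return tf.where(x > 0, tf.math.sinh(x) / 3.0, tf.pow(x, 3) / 4.0)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation=hyper_sinh_like),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```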
48. Deep multi-modal networks for book genre classification based on its cover [PDF] 返回目录
Chandra Kundu, Lukun Zheng
Abstract: Book covers are usually the very first impression to readers, and they often convey important information about the content of the book. Book genre classification based on the cover would be highly beneficial to many modern retrieval systems, considering that the complete digitization of books is an extremely expensive task. At the same time, it is also an extremely challenging task due to the following reasons: First, there exists a wide variety of book genres, many of which are not concretely defined. Second, book covers, as graphic designs, vary in many different ways such as colors, styles, textual information, etc., even for books of the same genre. Third, book cover designs may vary due to many external factors such as country, culture, target reader populations, etc. With the growing competitiveness in the book industry, cover designers and typographers push cover designs to the limit in the hope of attracting sales. Cover-based book classification systems have become a particularly exciting research topic in recent years. In this paper, we propose a multi-modal deep learning framework to solve this problem. The contribution of this paper is four-fold. First, our method adds an extra modality by extracting texts automatically from the book covers. Second, image-based and text-based state-of-the-art models are evaluated thoroughly for the task of book cover classification. Third, we develop an efficient and scalable multi-modal framework based only on the images and texts shown on the covers. Fourth, a thorough analysis of the experimental results is given, and future work to improve performance is suggested. The results show that the multi-modal framework significantly outperforms the current state-of-the-art image-based models. However, more effort and resources are needed for this classification task in order to reach a satisfactory level.
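A hedged sketch of the kind of two-branch, late-fusion architecture the abstract describes: one branch encodes the cover image, the other the text extracted from the cover, and the two are concatenated before the genre classifier. The layer choices (a small CNN, a BiLSTM over OCR tokens) and all sizes are illustrative assumptions, not the authors' configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_genres, vocab_size = 30, 20000

# Image branch: a small CNN over the cover image.
img_in = keras.Input(shape=(224, 224, 3), name="cover_image")
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Text branch: tokens extracted from the cover (e.g. by OCR), embedded and encoded.
txt_in = keras.Input(shape=(None,), dtype="int32", name="cover_text")
t = layers.Embedding(vocab_size, 64)(txt_in)
t = layers.Bidirectional(layers.LSTM(64))(t)

# Late fusion of the two modalities, followed by the genre classifier.
fused = layers.concatenate([x, t])
out = layers.Dense(num_genres, activation="softmax")(fused)

model = keras.Model(inputs=[img_in, txt_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```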
49. Generating Negative Commonsense Knowledge [PDF] 返回目录
Tara Safavi, Danai Koutra
Abstract: The acquisition of commonsense knowledge is an important open challenge in artificial intelligence. In this work-in-progress paper, we study the task of automatically augmenting commonsense knowledge bases (KBs) with novel statements. We show empirically that obtaining meaningful negative samples for the completion task is nontrivial, and propose NegatER, a framework for generating negative commonsense knowledge, to address this challenge. In our evaluation we demonstrate the intrinsic value and extrinsic utility of the knowledge generated by NegatER, opening up new avenues for future research in this direction.
50. Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following [PDF] 返回目录
Valts Blukis, Ross A. Knepper, Yoav Artzi
Abstract: We study the problem of learning a robot policy to follow natural language instructions that can be easily extended to reason about new objects. We introduce a few-shot language-conditioned object grounding method trained from augmented reality data that uses exemplars to identify objects and align them to their mentions in instructions. We present a learned map representation that encodes object locations and their instructed use, and construct it from our few-shot grounding output. We integrate this mapping approach into an instruction-following policy, thereby allowing it to reason about previously unseen objects at test-time by simply adding exemplars. We evaluate on the task of learning to map raw observations and instructions to continuous control of a physical quadcopter. Our approach significantly outperforms the prior state of the art in the presence of new objects, even when the prior approach observes all objects during training.