Contents
3. UCD-CS at W-NUT 2020 Shared Task-3: A Text to Text Approach for COVID-19 Event Extraction on Social Media [PDF] Abstract
4. Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference [PDF] Abstract
7. Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [PDF] Abstract
13. Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries [PDF] Abstract
16. Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [PDF] Abstract
21. Relation Extraction from Biomedical and Clinical Text: Unified Multitask Learning Framework [PDF] Abstract
28. Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation [PDF] Abstract
29. Biomedical Event Extraction on Graph Edge-conditioned Attention Networks with Hierarchical Knowledge Graphs [PDF] Abstract
30. Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation [PDF] Abstract
32. BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition [PDF] Abstract
35. Aggressive Language Detection with Joint Text Normalization via Adversarial Multi-task Learning [PDF] Abstract
38. CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes [PDF] Abstract
42. Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation [PDF] Abstract
43. Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing [PDF] Abstract
46. Deliberate Self-Attention Network with Uncertainty Estimation for Multi-Aspect Review Rating Prediction [PDF] Abstract
47. A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection [PDF] Abstract
55. Can questions summarize a corpus? Using question generation for characterizing COVID-19 research [PDF] Abstract
Abstracts
1. Composed Variational Natural Language Generation for Few-shot Intents [PDF] Back to Contents
Congying Xia, Caiming Xiong, Philip Yu, Richard Socher
Abstract: In this paper, we focus on generating training examples for few-shot intents in the realistic imbalanced scenario. To build connections between existing many-shot intents and few-shot intents, we consider an intent as a combination of a domain and an action, and propose a composed variational natural language generator (CLANG), a transformer-based conditional variational autoencoder. CLANG utilizes two latent variables to represent the utterances corresponding to two different independent parts (domain and action) in the intent, and the latent variables are composed together to generate natural examples. Additionally, to improve the generator learning, we adopt the contrastive regularization loss that contrasts the in-class with the out-of-class utterance generation given the intent. To evaluate the quality of the generated utterances, experiments are conducted on the generalized few-shot intent detection task. Empirical results show that our proposed model achieves state-of-the-art performances on two real-world intent detection datasets.
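A minimal sketch of the composed-latent idea: one latent variable per intent part (domain and action), each sampled with the reparameterization trick and composed before decoding. The dimensions and fusion-by-concatenation are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ComposedLatents(nn.Module):
    """Two latents (domain, action), each via the reparameterization trick,
    composed by concatenation before decoding. Sizes and the fusion choice
    are illustrative assumptions, not CLANG's exact design."""
    def __init__(self, emb_dim=128, z_dim=64):
        super().__init__()
        self.domain_head = nn.Linear(emb_dim, 2 * z_dim)  # predicts mu, log-var
        self.action_head = nn.Linear(emb_dim, 2 * z_dim)

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, domain_emb, action_emb):
        z_domain = self.sample(self.domain_head(domain_emb))
        z_action = self.sample(self.action_head(action_emb))
        return torch.cat([z_domain, z_action], dim=-1)  # fed to the decoder

z = ComposedLatents()(torch.randn(4, 128), torch.randn(4, 128))
print(z.shape)  # torch.Size([4, 128])
```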
2. Latin BERT: A Contextual Language Model for Classical Philology [PDF] Back to Contents
David Bamman, Patrick J. Burns
Abstract: We present Latin BERT, a contextual language model for the Latin language, trained on 642.7 million words from a variety of sources spanning the Classical era to the 21st century. In a series of case studies, we illustrate the affordances of this language-specific model both for work in natural language processing for Latin and in using computational methods for traditional scholarship: we show that Latin BERT achieves a new state of the art for part-of-speech tagging on all three Universal Dependency datasets for Latin and can be used for predicting missing text (including critical emendations); we create a new dataset for assessing word sense disambiguation for Latin and demonstrate that Latin BERT outperforms static word embeddings; and we show that it can be used for semantically-informed search by querying contextual nearest neighbors. We publicly release trained models to help drive future work in this space.
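The "semantically-informed search by querying contextual nearest neighbors" can be approximated as below with mean-pooled BERT representations and cosine similarity; "latin-bert" is a placeholder checkpoint id, not the authors' release name.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# "latin-bert" is a placeholder checkpoint id, not the authors' release name.
tok = AutoTokenizer.from_pretrained("latin-bert")
model = AutoModel.from_pretrained("latin-bert")

@torch.no_grad()
def embed(sentence: str) -> torch.Tensor:
    hidden = model(**tok(sentence, return_tensors="pt")).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)  # mean-pooled contextual vector

query = embed("arma virumque cano")
corpus = ["gallia est omnis divisa in partes tres", "quousque tandem abutere"]
vecs = torch.stack([embed(s) for s in corpus])
sims = torch.nn.functional.cosine_similarity(query.unsqueeze(0), vecs)
print(corpus[int(sims.argmax())])  # nearest neighbor under cosine similarity
```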
3. UCD-CS at W-NUT 2020 Shared Task-3: A Text to Text Approach for COVID-19 Event Extraction on Social Media [PDF] Back to Contents
Congcong Wang, David Lillis
Abstract: In this paper, we describe our approach in the shared task: COVID-19 event extraction from Twitter. The objective of this task is to extract answers from COVID-related tweets to a set of predefined slot-filling questions. Our approach treats the event extraction task as a question answering task by leveraging the transformer-based T5 text-to-text model. According to the official exact match based evaluation scores returned, namely F1, our submitted run can achieve competitive performance as compared to other participating runs (Top 3). However, we argue that this evaluation can potentially underestimate the actual performance of runs based on text-generation approaches (e.g. our run). This is due to the fact that although some predictions of such runs answer the slot questions well, they may not be an exact string match for the gold standard answers. To further measure the extent of this underestimation, we adopt a simple exact answer transformation method aiming at converting the well-answered predictions to exactly-matched predictions. The results show that after the transformation our run reaches the same level of performance as the best participating run. Our code is publicly available to aid reproducibility.
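The text-to-text framing the abstract describes can be reproduced with HuggingFace's T5 classes roughly as follows; the `t5-base` checkpoint and the `question: ... context: ...` prompt template are assumptions for illustration, not the submitted system's exact setup.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

tweet = "Just tested positive for COVID-19 after visiting Seattle last week."
question = "Where is the author located?"
# Hypothetical prompt template; the shared-task system's format may differ.
prompt = f"question: {question} context: {tweet}"

ids = tok(prompt, return_tensors="pt").input_ids
answer_ids = model.generate(ids, max_length=16)
print(tok.decode(answer_ids[0], skip_special_tokens=True))
```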
4. Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference [PDF] Back to Contents
Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, Tim Althoff
Abstract: Leveraging text, such as social media posts, for causal inferences requires the use of NLP models to 'learn' and adjust for confounders, which could otherwise impart bias. However, evaluating such models is challenging, as ground truth is almost never available. We demonstrate the need for empirical evaluation frameworks for causal inference in natural language by showing that existing, commonly used models regularly disagree with one another on real world tasks. We contribute the first such framework, generalizing several challenges across these real world tasks. Using this framework, we evaluate a large set of commonly used causal inference models based on propensity scores and identify their strengths and weaknesses to inform future improvements. We make all tasks, data, and models public to inform applications and encourage additional research.
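For context, here is a toy inverse-propensity-weighting (IPW) estimator, one standard member of the propensity-score family the framework evaluates. In the paper's setting the propensity e(x) would come from an NLP model over text confounders; this sketch substitutes a numeric confounder so the correction is visible.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)                    # confounder (text features in the paper)
e = 1 / (1 + np.exp(-x))                     # true propensity e(x) = P(T=1 | x)
T = rng.binomial(1, e)                       # treatment assignment depends on x
Y = 0.5 * T + x + rng.normal(size=5000)      # outcome; true causal effect = 0.5
naive = Y[T == 1].mean() - Y[T == 0].mean()  # biased by confounding
ate_ipw = np.mean(T * Y / e - (1 - T) * Y / (1 - e))  # reweighting removes the bias
print(naive, ate_ipw)                        # naive is inflated; IPW is near 0.5
```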
5. WESSA at SemEval-2020 Task 9: Code-Mixed Sentiment Analysis using Transformers [PDF] Back to Contents
Ahmed Sultan, Mahmoud Salim, Amina Gaber, Islam El Hosary
Abstract: In this paper, we describe our system submitted for SemEval 2020 Task 9, Sentiment Analysis for Code-Mixed Social Media Text alongside other experiments. Our best performing system is a Transfer Learning-based model that fine-tunes "XLM-RoBERTa", a transformer-based multilingual masked language model, on monolingual English and Spanish data and Spanish-English code-mixed data. Our system outperforms the official task baseline by achieving a 70.1% average F1-Score on the official leaderboard using the test set. For later submissions, our system manages to achieve a 75.9% average F1-Score on the test set using CodaLab username "ahmed0sultan".
6. Content Planning for Neural Story Generation with Aristotelian Rescoring [PDF] Back to Contents
Seraphina Goldfarb-Tarrant, Tuhin Chakrabarty, Ralph Weischedel, Nanyun Peng
Abstract: Long-form narrative text generated from large language models manages a fluent impersonation of human writing, but only at the local sentence level, and lacks structure or global cohesion. We posit that many of the problems of story generation can be addressed via high-quality content planning, and present a system that focuses on how to learn good plot structures to guide story generation. We utilize a plot-generation language model along with an ensemble of rescoring models that each implement an aspect of good story-writing as detailed in Aristotle's Poetics. We find that stories written with our more principled plot-structure are both more relevant to a given prompt and higher quality than baselines that do not content plan, or that plan in an unprincipled way.
7. Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [PDF] Back to Contents
Ziming Li, Julia Kiseleva, Maarten de Rijke
Abstract: Dialogue policy learning for task-oriented dialogue systems has enjoyed great progress recently mostly through employing reinforcement learning methods. However, these approaches have become very sophisticated. It is time to re-evaluate it. Are we really making progress developing dialogue agents only based on reinforcement learning? We demonstrate how (1)~traditional supervised learning together with (2)~a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods. First, we introduce a simple dialogue action decoder to predict the appropriate actions. Then, the traditional multi-label classification solution for dialogue policy learning is extended by adding dense layers to improve the dialogue agent performance. Finally, we employ the Gumbel-Softmax estimator to alternatively train the dialogue agent and the dialogue reward model without using reinforcement learning. Based on our extensive experimentation, we can conclude the proposed methods can achieve more stable and higher performance with fewer efforts, such as the domain knowledge required to design a user simulator and the intractable parameter tuning in reinforcement learning. Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
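PyTorch ships the Gumbel-Softmax estimator the abstract mentions (`torch.nn.functional.gumbel_softmax`); a minimal sketch of differentiable discrete action selection follows. The action-space size and the stand-in reward score are illustrative, not the paper's model.

```python
import torch
import torch.nn.functional as F

action_logits = torch.randn(1, 10, requires_grad=True)  # 10 candidate dialogue actions
# hard=True yields a one-hot action in the forward pass while gradients flow
# through the soft sample (straight-through estimator).
action = F.gumbel_softmax(action_logits, tau=1.0, hard=True)
score = (action * torch.randn(1, 10)).sum()  # stand-in for the reward model's score
score.backward()
print(action_logits.grad is not None)  # True: the policy trains without RL
```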
8. SDST: Successive Decoding for Speech-to-text Translation [PDF] Back to Contents
Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li
Abstract: End-to-end speech-to-text translation (ST), which directly translates source-language speech into target-language text, has attracted intensive attention recently. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose SDST, an integral framework with Successive Decoding for the end-to-end Speech-to-text Translation task. The method is verified on two mainstream datasets. Experiments show that our proposed method improves on previous state-of-the-art methods by large margins.
9. Multitask Pointer Network for Multi-Representational Parsing [PDF] Back to Contents
Daniel Fernández-González, Carlos Gómez-Rodríguez
Abstract: We propose a transition-based approach that, by training a single model, can efficiently parse any input sentence with both constituent and dependency trees, supporting both continuous/projective and discontinuous/non-projective syntactic structures. To that end, we develop a Pointer Network architecture with two separate task-specific decoders and a common encoder, and follow a multitask learning strategy to jointly train them. The resulting quadratic system, not only becomes the first parser that can jointly produce both unrestricted constituent and dependency trees from a single model, but also proves that both syntactic formalisms can benefit from each other during training, achieving state-of-the-art accuracies in several widely-used benchmarks such as the continuous English and Chinese Penn Treebanks, as well as the discontinuous German NEGRA and TIGER datasets.
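A bare skeleton of the shared-encoder, two-decoder layout described above; it omits the pointer-style attention over encoder states, and all cell types and sizes are placeholder assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MultitaskPointerParser(nn.Module):
    """Skeleton only: one shared encoder, two task-specific decoders.
    The real model scores pointer attention over encoder states;
    sizes and recurrent cells here are placeholder assumptions."""
    def __init__(self, vocab_size=10000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.const_decoder = nn.LSTM(2 * dim, dim, batch_first=True)  # constituency
        self.dep_decoder = nn.LSTM(2 * dim, dim, batch_first=True)    # dependency

    def forward(self, tokens):
        enc, _ = self.encoder(self.embed(tokens))
        const_states, _ = self.const_decoder(enc)
        dep_states, _ = self.dep_decoder(enc)
        return const_states, dep_states  # the multitask loss sums both tasks' losses

model = MultitaskPointerParser()
c, d = model(torch.randint(0, 10000, (2, 7)))
```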
10. Empathetic Dialogue Generation via Knowledge Enhancing and Emotion Dependency Modeling [PDF] Back to Contents
Qintong Li, Piji Li, Zhumin Chen, Zhaochun Ren
Abstract: Enabling the machines with empathetic abilities to provide context-consistent responses is crucial on both semantic and emotional levels. The task of empathetic dialogue generation is proposed to address this problem. However, two challenges still exist in this task: perceiving nuanced emotions implied in the dialogue context and modelling emotional dependencies. Lacking useful external knowledge makes it challenging to perceive implicit fine-grained emotions. Missing the emotional interactions among interlocutors also restricts the performance of empathetic dialogue generation. To address above challenges, we propose a knowledge-enhanced framework, named Know-EDG. We first enrich dialogue context by bunches of emotion-related concepts and construct a knowledge-enhanced context graph. Then we introduce a graph-aware Transformer encoder to learn graph's semantic and emotional representations, which are the prerequisites of the emotion identifier to predicate the target emotion signal. Finally, we propose an emotion-focused attention mechanism to exploit the emotional dependencies between dialogue context and target empathetic response. Conducted on a benchmark dataset, extensive experimental results show that our proposed framework outperforms state-of-the-art baselines in terms of automatic metrics and human evaluations.
11. TED: Triple Supervision Decouples End-to-end Speech-to-text Translation [PDF] Back to Contents
Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li
Abstract: An end-to-end speech-to-text translation (ST) system takes audio in a source language and outputs text in a target language. Inspired by neuroscience, where humans use separate perception and cognitive systems to process different information, we propose TED (Transducer-Encoder-Decoder), a unified framework with triple supervision to decouple the end-to-end speech-to-text translation task. In addition to the target sentence translation loss, our method includes two auxiliary supervising signals: one guides the acoustic transducer that extracts acoustic features from the input, and the other guides the semantic encoder to extract semantic features relevant to the source transcription text. Our method achieves state-of-the-art performance on both English-French and English-German speech translation benchmarks.
12. Profile Consistency Identification for Open-domain Dialogue Agents [PDF] Back to Contents
Haoyu Song, Yan Wang, Wei-Nan Zhang, Zhengyu Zhao, Ting Liu, Xiaojiang Liu
Abstract: Maintaining a consistent attribute profile is crucial for dialogue agents to naturally converse with humans. Existing studies on improving attribute consistency mainly explored how to incorporate attribute information in the responses, but few efforts have been made to identify the consistency relations between response and attribute profile. To facilitate the study of profile consistency identification, we create a large-scale human-annotated dataset with over 110K single-turn conversations and their key-value attribute profiles. Explicit relation between response and profile is manually labeled. We also propose a key-value structure information enriched BERT model to identify the profile consistency, and it gained improvements over strong baselines. Further evaluations on downstream tasks demonstrate that the profile consistency identification model is conducive for improving dialogue consistency.
13. Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries [PDF] Back to Contents
Hideyuki Tachibana, Yotaro Katayama
Abstract: In Japanese text-to-speech (TTS), it is necessary to add accent information to the input sentence. However, there are a limited number of publicly available accent dictionaries, and those dictionaries e.g. UniDic, do not contain many compound words, proper nouns, etc., which are required in a practical TTS system. In order to build a large scale accent dictionary that contains those words, the authors developed an accent estimation technique that predicts the accent of a word from its limited information, namely the surface (e.g. kanji) and the yomi (simplified phonetic information). It is experimentally shown that the technique can estimate accents with high accuracies, especially for some categories of words. The authors applied this technique to an existing large vocabulary Japanese dictionary NEologd, and obtained a large vocabulary Japanese accent dictionary. Many cases have been observed in which the use of this dictionary yields more appropriate phonetic information than UniDic.
14. Alleviating the Inequality of Attention Heads for Neural Machine Translation [PDF] Back to Contents
Zewei Sun, Shujian Huang, Xinyu Dai, Jiajun Chen
Abstract: Recent studies show that the attention heads in Transformer are not equal. We relate this phenomenon to the imbalance training of multi-head attention and the model dependence on specific heads. To tackle this problem, we propose a simple masking method: HeadMask, in two specific ways. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
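The abstract does not spell out the two HeadMask variants, so the sketch below shows only the generic idea of zeroing whole attention heads at random during training; treat it as an assumption, not the paper's exact method.

```python
import torch

def mask_heads(attn_out: torch.Tensor, p: float = 0.2) -> torch.Tensor:
    """Randomly zero whole attention heads during training.
    attn_out: (batch, n_heads, seq_len, head_dim)."""
    b, h, _, _ = attn_out.shape
    keep = (torch.rand(b, h, 1, 1, device=attn_out.device) > p).float()
    return attn_out * keep  # surviving heads must carry the full signal

masked = mask_heads(torch.randn(2, 8, 5, 64))
```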
15. Generative Imagination Elevates Machine Translation [PDF] Back to Contents
Quanyu Long, Mingxuan Wang, Lei Li
Abstract: There are thousands of languages on earth, but visual perception is shared among peoples. Existing multimodal neural machine translation (MNMT) methods achieve knowledge transfer by enforcing one encoder to learn shared representation across textual and visual modalities. However, the training and inference process heavily relies on well-aligned bilingual sentence - image triplets as input, which are often limited in quantity. In this paper, we hypothesize that visual imagination via synthesizing visual representation from source text could help the neural model map two languages with different symbols, thus helps the translation task. Our proposed end-to-end imagination-based machine translation model (ImagiT) first learns to generate semantic-consistent visual representation from source sentence, and then generate target sentence based on both text representation and imagined visual representation. Experiments demonstrate that our translation model benefits from visual imagination and significantly outperforms the text-only neural machine translation (NMT) baseline. We also conduct analyzing experiments, and the results show that imagination can help fill in missing information when performing the degradation strategy.
16. Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [PDF] Back to Contents
Wenliang Dai, Zihan Liu, Tiezheng Yu, Pascale Fung
Abstract: Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationship between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially for unseen emotions. In this paper, we propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues. We use pre-trained word embeddings to represent emotion categories for textual data. Then, two mapping functions are learned to transfer these embeddings into visual and acoustic spaces. For each modality, the model calculates the representation distance between the input sequence and target emotions and makes predictions based on the distances. By doing so, our model can directly adapt to the unseen emotions in any modality since we have their pre-trained embeddings and modality mapping functions. Experiments show that our model achieves state-of-the-art performance on most of the emotion categories. In addition, our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
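A toy version of the distance-based prediction step: score an utterance representation against pretrained emotion word embeddings and pick the nearest label. The random vectors below stand in for real GloVe-style embeddings; because an unseen emotion only needs its word vector, the same code covers the zero-shot case.

```python
import torch
import torch.nn.functional as F

emotion_names = ["joy", "anger", "sadness", "fear"]
label_embs = torch.randn(4, 300)      # would be pretrained word vectors
utterance_repr = torch.randn(1, 300)  # from a text/audio/visual encoder

sims = F.cosine_similarity(utterance_repr, label_embs)  # higher = closer
print(emotion_names[int(sims.argmax())])
```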
17. Weakly Supervised Learning of Nuanced Frames for Analyzing Polarization in News Media [PDF] Back to Contents
Shamik Roy, Dan Goldwasser
Abstract: In this paper we suggest a minimally-supervised approach for identifying nuanced frames in news article coverage of politically divisive topics. We suggest to break the broad policy frames suggested by Boydstun et al., 2014 into fine-grained subframes which can capture differences in political ideology in a better way. We evaluate the suggested subframes and their embedding, learned using minimal supervision, over three topics, namely, immigration, gun-control and abortion. We demonstrate the ability of the subframes to capture ideological differences and analyze political discourse in news media.
18. Assessing the Severity of Health States based on Social Media Posts [PDF] Back to Contents
Shweta Yadav, Joy Prakash Sain, Amit Sheth, Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya
Abstract: The unprecedented growth of Internet users has resulted in an abundance of unstructured information on social media including health forums, where patients request health-related information or opinions from other users. Previous studies have shown that online peer support has limited effectiveness without expert intervention. Therefore, a system capable of assessing the severity of health state from the patients' social media posts can help health professionals (HP) in prioritizing the user's post. In this study, we inspect the efficacy of different aspects of Natural Language Understanding (NLU) to identify the severity of the user's health state in relation to two perspectives(tasks) (a) Medical Condition (i.e., Recover, Exist, Deteriorate, Other) and (b) Medication (i.e., Effective, Ineffective, Serious Adverse Effect, Other) in online health communities. We propose a multiview learning framework that models both the textual content as well as contextual-information to assess the severity of the user's health state. Specifically, our model utilizes the NLU views such as sentiment, emotions, personality, and use of figurative language to extract the contextual information. The diverse NLU views demonstrate its effectiveness on both the tasks and as well as on the individual disease to assess a user's health.
19. Improving Robustness and Generality of NLP Models Using Disentangled Representations [PDF] Back to Contents
Jiawei Wu, Xiaoya Li, Xiang Ao, Yuxian Meng, Fei Wu, Jiwei Li
Abstract: Supervised neural networks, which first map an input $x$ to a single representation $z$ and then map $z$ to the output label $y$, have achieved remarkable success in a wide range of natural language processing (NLP) tasks. Despite their success, neural models lack both robustness and generality: small perturbations to inputs can result in absolutely different outputs; the performance of a model trained on one domain drops drastically when tested on another domain. In this paper, we present methods to improve the robustness and generality of NLP models from the standpoint of disentangled representation learning. Instead of mapping $x$ to a single representation $z$, the proposed strategy maps $x$ to a set of representations $\{z_1, z_2, \dots, z_K\}$ while forcing them to be disentangled. These representations are then mapped to different logits $l_k$, the ensemble of which is used to make the final prediction $y$. We propose different methods to incorporate this idea into currently widely-used models, including adding an $L_2$ regularizer on the $z_k$ or adding Total Correlation (TC) under the framework of the variational information bottleneck (VIB). We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
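A sketch of the multi-representation idea with the $L_2$ regularizer (the simpler of the two options named above); $K$, the layer sizes, and mean-ensembling of the per-head logits are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledHeads(nn.Module):
    """Map x to K representations, ensemble the per-head logits, and penalize
    each z_k with an L2 term. K, sizes, and mean-ensembling are assumptions."""
    def __init__(self, in_dim=300, z_dim=64, n_classes=2, K=4):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(in_dim, z_dim) for _ in range(K))
        self.cls = nn.ModuleList(nn.Linear(z_dim, n_classes) for _ in range(K))

    def forward(self, x):
        zs = [p(x) for p in self.proj]
        logits = torch.stack([c(z) for c, z in zip(self.cls, zs)]).mean(0)
        l2 = sum(z.pow(2).sum(-1).mean() for z in zs)  # regularizer on the z_k
        return logits, l2

model = DisentangledHeads()
logits, l2 = model(torch.randn(8, 300))
loss = F.cross_entropy(logits, torch.randint(0, 2, (8,))) + 1e-3 * l2
```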
20. Vector Projection Network for Few-shot Slot Tagging in Natural Language Understanding [PDF] Back to Contents
Su Zhu, Ruisheng Cao, Lu Chen, Kai Yu
Abstract: Few-shot slot tagging becomes appealing for rapid domain transfer and adaptation, motivated by the tremendous development of conversational dialogue systems. In this paper, we propose a vector projection network for few-shot slot tagging, which exploits projections of contextual word embeddings on each target label vector as word-label similarities. Essentially, this approach is equivalent to a normalized linear model with an adaptive bias. The contrastive experiment demonstrates that our proposed vector projection based similarity metric can significantly surpass other variants. Specifically, in the five-shot setting on benchmarks SNIPS and NER, our method outperforms the strongest few-shot learning baseline by $6.30$ and $13.79$ points on F$_1$ score, respectively. Our code will be released at this https URL.
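The projection-based similarity is simple enough to state in a few lines: project each contextual word embedding onto the unit-normalized label vector and add a bias. The bias parameterization below is a guess at the "adaptive bias" the abstract mentions, and the dimensions are placeholders.

```python
import torch

def projection_scores(word_embs, label_vecs, bias):
    """word_embs: (seq_len, dim); label_vecs: (n_labels, dim); bias: (n_labels,).
    Similarity = projection of each word embedding onto the unit label vector."""
    unit = label_vecs / label_vecs.norm(dim=-1, keepdim=True)
    return word_embs @ unit.t() + bias  # (seq_len, n_labels)

scores = projection_scores(torch.randn(6, 768), torch.randn(5, 768), torch.zeros(5))
labels = scores.argmax(-1)  # per-token slot predictions
```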
21. Relation Extraction from Biomedical and Clinical Text: Unified Multitask Learning Framework [PDF] 返回目录
Shweta Yadav, Srivatsa Ramesh, Sriparna Saha, Asif Ekbal
Abstract: To rein in the ever-growing amount of time invested in biomedical literature search, numerous approaches for automated knowledge extraction have been proposed. Relation extraction is one such task, where semantic relations between entities are identified from free text. In the biomedical domain, extraction of regulatory pathways, metabolic processes, adverse drug reactions or disease models necessitates knowledge from individual relations, for example, physical or regulatory interactions between genes, proteins, drugs, chemicals, diseases or phenotypes. In this paper, we study the relation extraction task on three major biomedical and clinical tasks, namely drug-drug interaction, protein-protein interaction, and medical concept relation extraction. Towards this, we model the relation extraction problem in a multi-task learning (MTL) framework and introduce for the first time a structured self-attentive network complemented with an adversarial learning approach for predicting relationships from biomedical and clinical text. The fundamental notion of MTL is to learn multiple problems simultaneously by exploiting a shared representation. Additionally, we also build a highly efficient single-task model, which exploits a shortest-dependency-path embedding learned over an attentive gated recurrent unit, to compare against our proposed MTL models. The proposed framework significantly improves over the baselines (deep learning techniques) and single-task models for predicting the relationships, without compromising performance on any of the tasks.
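The adversarial MTL ingredient can be illustrated with a standard gradient-reversal setup: a shared encoder feeds task-specific heads, while a task discriminator trained through a reversed gradient pushes the shared representation to be task-invariant. The sketch below is a generic stand-in (a GRU plus mean pooling instead of the paper's structured self-attentive network and private components):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: the discriminator learns to identify the task,
    while the shared encoder receives the negated gradient."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class AdversarialMTL(nn.Module):
    def __init__(self, dim: int, num_tasks: int, num_labels: int):
        super().__init__()
        self.shared = nn.GRU(dim, dim, batch_first=True)   # shared encoder
        self.heads = nn.ModuleList(
            nn.Linear(dim, num_labels) for _ in range(num_tasks))
        self.discriminator = nn.Linear(dim, num_tasks)     # task discriminator

    def forward(self, x, task_id: int):
        h, _ = self.shared(x)                # (batch, seq, dim)
        rep = h.mean(dim=1)                  # shared representation
        task_logits = self.discriminator(GradReverse.apply(rep))
        return self.heads[task_id](rep), task_logits
```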
22. Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging [PDF] 返回目录
Ehsan Doostmohammadi, Minoo Nassajian, Adel Rahimi
Abstract: Ezafe is a grammatical particle in some Iranian languages that links two words together. Despite the important information it conveys, it is almost never indicated in Persian script, resulting in mistakes in reading complex sentences and errors in natural language processing tasks. In this paper, we experiment with different machine learning methods to achieve state-of-the-art results in the task of ezafe recognition. Transformer-based methods, BERT and XLMRoBERTa, achieve the best results, the latter improving on the previous state-of-the-art by 2.68% F1-score. Moreover, we use ezafe information to improve Persian part-of-speech tagging results, show that such information is not useful to transformer-based methods, and explain why that might be the case.
23. Dialogue Distillation: Open-domain Dialogue Augmentation Using Unpaired Data [PDF] 返回目录
Rongsheng Zhang, Yinhe Zheng, Jianzhi Shao, Xiaoxi Mao, Yadong Xi, Minlie Huang
Abstract: Recent advances in open-domain dialogue systems rely on the success of neural models that are trained on large-scale data. However, collecting large-scale dialogue data is usually time-consuming and labor-intensive. To address this data dilemma, we propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data. Specifically, a data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data. A ranking module is employed to filter out low-quality dialogues. Further, a model-level distillation process is employed to distill a teacher model trained on high-quality paired data to augmented dialogue pairs, thereby preventing dialogue models from being affected by the noise in the augmented data. Automatic and manual evaluation indicates that our method can produce high-quality dialogue pairs with diverse contents, and the proposed data-level and model-level dialogue distillation can improve the performance of competitive baselines.
24. F^2-Softmax: Diversifying Neural Text Generation via Frequency Factorized Softmax [PDF] 返回目录
Byung-Ju Choi, Jimin Hong, David Keetae Park, Sang Wan Lee
Abstract: Despite recent advances in neural text generation, encoding the rich diversity of human language remains elusive. We argue that sub-optimal text generation is mainly attributable to the imbalanced token distribution, which particularly misdirects the learning model when trained with the maximum-likelihood objective. As a simple yet effective remedy, we propose two novel methods, F^2-Softmax and MefMax, for balanced training even with a skewed frequency distribution. MefMax assigns tokens uniquely to frequency classes, trying to group tokens with similar frequencies and equalize frequency mass between the classes. F^2-Softmax then decomposes the probability distribution of the target token into the product of two conditional probabilities: (i) the frequency class, and (ii) the token within the target frequency class. Models learn more uniform probability distributions because they are confined to subsets of the vocabulary. Significant performance gains on seven relevant metrics suggest the supremacy of our approach in improving not only the diversity but also the quality of generated texts.
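The factorization itself is easy to write down. A hedged sketch, assuming a class head, one token head per frequency class, and precomputed token-to-class mappings (all names are illustrative; MefMax would supply the class assignment):

```python
import torch
import torch.nn.functional as F

def f2_log_prob(hidden, class_head, token_heads, cls_of, idx_in_cls, tok):
    """log p(tok | hidden) = log p(class | hidden) + log p(tok | class, hidden).

    class_head:  nn.Linear scoring the frequency classes
    token_heads: one nn.Linear per class, scoring only that class's tokens
    cls_of[tok], idx_in_cls[tok]: the token's class and its index within it
    """
    c = cls_of[tok]
    log_p_class = F.log_softmax(class_head(hidden), dim=-1)[c]
    log_p_token = F.log_softmax(token_heads[c](hidden), dim=-1)[idx_in_cls[tok]]
    return log_p_class + log_p_token
```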
25. Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation [PDF] 返回目录
Chujie Zheng, Yunbo Cao, Daxin Jiang, Minlie Huang
Abstract: In a multi-turn knowledge-grounded dialog, the difference between the knowledge selected at different turns usually provides potential clues to knowledge selection, which has been largely neglected in previous research. In this paper, we propose a difference-aware knowledge selection method. It first computes the difference between the candidate knowledge sentences provided at the current turn and those chosen in the previous turns. Then, the differential information is fused with or disentangled from the contextual information to facilitate final knowledge selection. Automatic, human observational, and interactive evaluation shows that our method is able to select knowledge more accurately and generate more informative responses, significantly outperforming the state-of-the-art baselines. The codes are available at this https URL.
26. Softmax Tempering for Training Neural Machine Translation Models [PDF] 返回目录
Raj Dabre, Atsushi Fujita
Abstract: Neural machine translation (NMT) models are typically trained using a softmax cross-entropy loss where the softmax distribution is compared against smoothed gold labels. In low-resource scenarios, NMT models tend to over-fit because the softmax distribution quickly approaches the gold label distribution. To address this issue, we propose to divide the logits by a temperature coefficient, prior to applying softmax, during training. In our experiments on 11 language pairs in the Asian Language Treebank dataset and the WMT 2019 English-to-German translation task, we observed significant improvements in translation quality of up to 3.9 BLEU points. Furthermore, softmax tempering makes greedy search as good as beam search decoding in terms of translation quality, enabling 1.5 to 3.5 times speed-up. We also study the impact of softmax tempering on multilingual NMT and recurrently stacked NMT, both of which aim to reduce the NMT model size by parameter sharing, thereby verifying the utility of temperature in developing compact NMT models. Finally, an analysis of softmax entropies and gradients reveals the impact of our method on the internal behavior of NMT models.
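The proposed change is a one-liner at training time; the sketch below shows it with an illustrative temperature value (not the paper's tuned setting):

```python
import torch.nn.functional as F

def tempered_loss(logits, gold, temperature=2.0):
    """Softmax tempering: divide the logits by a temperature coefficient
    before the softmax cross-entropy, during training only.  A temperature
    above 1 flattens the softmax distribution, so it cannot race toward the
    gold label distribution and over-fit."""
    return F.cross_entropy(logits / temperature, gold)

# Decoding (greedy or beam search) uses the untempered logits.
```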
27. Understanding Mention Detector-Linker Interaction for Neural Coreference Resolution [PDF] 返回目录
Zhaofeng Wu, Matt Gardner
Abstract: Coreference resolution is an important task for discourse-level natural language understanding. However, despite significant recent progress, the quality of current state-of-the-art systems still considerably trails behind human-level performance. Using the CoNLL-2012 and PreCo datasets, we dissect the best instantiation of the mainstream end-to-end coreference resolution model that underlies most current best-performing coreference systems, and empirically analyze the behavior of its two components: the mention detector and mention linker. While the detector traditionally focuses heavily on recall as a design decision, we demonstrate the importance of precision, calling for their balance. However, we point out the difficulty in building a precise detector due to its inability to make important anaphoricity decisions. We also highlight the enormous room for improving the linker and that the rest of its errors mainly involve pronoun resolution. We hope our findings will help future research in building coreference resolution systems.
28. Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation [PDF] 返回目录
Tahmid Hasan, Abhik Bhattacharjee, Kazi Samin, Md Hasan, Madhusudan Basak, M. Sohel Rahman, Rifat Shahriyar
Abstract: Despite being the seventh most widely spoken language in the world, Bengali has received much less attention in the machine translation literature due to being low in resources. Most publicly available parallel corpora for Bengali are not large enough, and have rather poor quality, mostly because of incorrect sentence alignments resulting from erroneous sentence segmentation, and also because of a high volume of noise present in them. In this work, we build a customized sentence segmenter for Bengali and propose two novel methods for parallel corpus creation in low-resource setups: aligner ensembling and batch filtering. With the segmenter and the two methods combined, we compile a high-quality Bengali-English parallel corpus comprising 2.75 million sentence pairs, more than 2 million of which were not available before. Training neural machine translation models on it, we achieve an improvement of more than 9 BLEU over previous approaches to Bengali-English machine translation. We also evaluate on a new test set of 615 pairs made with extensive quality control. We release the segmenter, parallel corpus, and the evaluation set, thus elevating Bengali from its low-resource status. To the best of our knowledge, this is the first ever large-scale study on Bengali-English machine translation. We believe our study will pave the way for future research on Bengali-English machine translation as well as other low-resource languages.
29. Biomedical Event Extraction on Graph Edge-conditioned Attention Networks with Hierarchical Knowledge Graphs [PDF] 返回目录
Kung-Hsiang Huang, Mu Yang, Nanyun Peng
Abstract: Biomedical event extraction is critical in understanding biomolecular interactions described in the scientific corpus. One of the main challenges is to identify nested structured events that are associated with non-indicative trigger words. We propose to incorporate domain knowledge from the Unified Medical Language System (UMLS) into a pre-trained language model via Graph Edge-conditioned Attention Networks (GEANet) and hierarchical graph representation. To better recognize the trigger words, each sentence is first grounded to a sentence graph based on a jointly modeled hierarchical knowledge graph from UMLS. The grounded graphs are then propagated by GEANet, a novel graph neural network with enhanced capabilities for inferring complex events. On the BioNLP 2011 GENIA Event Extraction task, our approach achieved 1.41% and 3.19% F1 improvements on all events and complex events, respectively. Ablation studies confirm the importance of GEANet and the hierarchical KG.
30. Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation [PDF] 返回目录
Fajri Koto, Ikhwan Koto
Abstract: Although some linguists (Rusmali et al., 1985; Crouch, 2009) have made fair attempts to define the morphology and syntax of Minangkabau, information processing in this language is still absent due to the scarcity of annotated resources. In this work, we release two Minangkabau corpora, for sentiment analysis and machine translation, that are harvested and constructed from Twitter and Wikipedia. We conduct the first computational linguistics study of the Minangkabau language, employing classic machine learning and sequence-to-sequence models such as LSTM and Transformer. Our first experiments show that classification performance over Minangkabau text drops significantly when tested with a model trained on Indonesian. In the machine translation experiment, by contrast, a simple word-to-word translation using a bilingual dictionary outperforms the LSTM and Transformer models in terms of BLEU score.
31. Word class flexibility: A deep contextualized approach [PDF] 返回目录
Bai Li, Guillaume Thomas, Yang Xu, Frank Rudzicz
Abstract: Word class flexibility refers to the phenomenon whereby a single word form is used across different grammatical categories. Extensive work in linguistic typology has sought to characterize word class flexibility across languages, but quantifying this phenomenon accurately and at scale has been fraught with difficulties. We propose a principled methodology to explore regularity in word class flexibility. Our method builds on recent work in contextualized word embeddings to quantify semantic shift between word classes (e.g., noun-to-verb, verb-to-noun), and we apply this method to 37 languages. We find that contextualized embeddings not only capture human judgment of class variation within words in English, but also uncover shared tendencies in class flexibility across languages. Specifically, we find greater semantic variation when flexible lemmas are used in their dominant word class, supporting the view that word class flexibility is a directional process. Our work highlights the utility of deep contextualized models in linguistic typology.
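One plausible way to quantify such semantic shift from contextualized embeddings (the paper's exact metric may differ) is the cosine distance between the centroids of a lemma's noun-context and verb-context vectors:

```python
import numpy as np

def class_shift(noun_vecs: np.ndarray, verb_vecs: np.ndarray) -> float:
    """noun_vecs / verb_vecs: (n, dim) contextualized embeddings (e.g. BERT
    vectors) of one lemma in noun vs. verb contexts.  The shift is measured
    here as the cosine distance between the two class centroids."""
    mu_n, mu_v = noun_vecs.mean(axis=0), verb_vecs.mean(axis=0)
    cos = float(mu_n @ mu_v / (np.linalg.norm(mu_n) * np.linalg.norm(mu_v)))
    return 1.0 - cos
```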
32. BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition [PDF] 返回目录
Usman Naseem, Matloob Khushi, Vinay Reddy, Sakthivel Rajendran, Imran Razzak, Jinman Kim
Abstract: In recent years, with the growing number of biomedical documents, coupled with advancements in natural language processing algorithms, research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER research is challenging because NER in the biomedical domain (i) is often restricted by the limited amount of training data, (ii) must handle entities that can refer to multiple types and concepts depending on their context, and (iii) relies heavily on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results. We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), bioALBERT, an effective domain-specific language model trained on large-scale biomedical corpora and designed to capture biomedical context-dependent NER. We adopt the self-supervised loss used in ALBERT, which focuses on modelling inter-sentence coherence, to better learn context-dependent representations, and incorporate parameter reduction techniques to lower memory consumption and increase training speed in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets covering four different entity types. We have trained four different variants of BioALBERT models, which are available for the research community to use in future research.
33. Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [PDF] 返回目录
Yuan Zang, Bairu Hou, Fanchao Qi, Zhiyuan Liu, Xiaojun Meng, Maosong Sun
Abstract: Adversarial attacking aims to fool deep neural networks with adversarial examples. In the field of natural language processing, various textual adversarial attack models have been proposed, varying in their access to the victim model. Among them, the attack models that only require the output of the victim model are a better fit for real-world situations of adversarial attacking. However, to achieve high attack performance, these models usually need to query the victim model too many times, which is neither efficient nor viable in practice. To tackle this problem, we propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently. In experiments, we evaluate our model by attacking several state-of-the-art models on the benchmark datasets of multiple tasks including sentiment analysis, text classification and natural language inference. Experimental results demonstrate that our model consistently achieves both better attack performance and higher efficiency than recently proposed baseline methods. We also find that our attack model can bring more robustness improvement to the victim model through adversarial training. All the code and data of this paper will be made public.
34. OpenAttack: An Open-source Textual Adversarial Attack Toolkit [PDF] 返回目录
Guoyang Zeng, Fanchao Qi, Qianrui Zhou, Tingji Zhang, Bairu Hou, Yuan Zang, Zhiyuan Liu, Maosong Sun
Abstract: Textual adversarial attacking has received wide and increasing attention in recent years. Various attack models have been proposed; they are enormously distinct and implemented with different programming frameworks and settings. These facts hinder quick utilization and fair comparison of attack models. In this paper, we present an open-source textual adversarial attack toolkit named OpenAttack. It currently builds in 12 typical attack models that cover all the attack types. Its highly inclusive modular design not only supports quick utilization of existing attack models, but also enables great flexibility and extensibility. OpenAttack has broad uses including comparing and evaluating attack models, measuring the robustness of a victim model, assisting in developing new attack models, and adversarial training. Source code, built-in models and documentation can be obtained at this https URL.
35. Aggressive Language Detection with Joint Text Normalization via Adversarial Multi-task Learning [PDF] 返回目录
Shengqiong Wu, Hao Fei, Donghong Ji
Abstract: Aggressive language detection (ALD), detecting abusive and offensive language in texts, is one of the crucial applications in the NLP community. Most existing works treat ALD as regular classification with neural models, while ignoring an inherent trait of social media text: it is quite unnormalized and irregular. In this work, we target improving ALD by jointly performing text normalization (TN), via an adversarial multi-task learning framework. The private encoders for ALD and TN focus on retrieving the respective task-specific features, and the shared encoder learns the underlying common features across the two tasks. During adversarial training, a task discriminator distinguishes the separate learning of ALD and TN. Experimental results on four ALD datasets show that our model outperforms all baselines under differing settings by large margins, demonstrating the necessity of jointly learning TN with ALD. Further analysis is conducted for a better understanding of our method.
36. Nominal Compound Chain Extraction: A New Task for Semantic-enriched Lexical Chain [PDF] 返回目录
Bobo Li, Hao Fei, Yafeng Ren, Donghong Ji
Abstract: A lexical chain consists of cohesive words in a document; it reflects the underlying structure of a text and thus facilitates downstream NLP tasks. Nevertheless, existing work focuses on detecting simple surface lexicons with shallow syntax associations, ignoring semantic-aware lexical compounds as well as latent semantic frames (e.g., topic), which can be much more crucial for real-world NLP applications. In this paper, we introduce a novel task, Nominal Compound Chain Extraction (NCCE), extracting and clustering all the nominal compounds that share identical semantic topics. In addition, we model the task as a two-stage prediction (i.e., compound extraction and chain detection), which is handled via a proposed joint framework. The model employs the BERT encoder to yield contextualized document representations. Also, HowNet is exploited as an external resource offering rich sememe information. The experiments are based on our manually annotated corpus, and the results prove the necessity of the NCCE task as well as the effectiveness of our joint approach.
37. Extracting Summary Knowledge Graphs from Long Documents [PDF] 返回目录
Zeqiu Wu, Rik Koncel-Kedziorski, Mari Ostendorf, Hannaneh Hajishirzi
Abstract: Knowledge graphs capture entities and relations from long documents and can facilitate reasoning in many downstream applications. Extracting compact knowledge graphs containing only salient entities and relations is important but challenging for understanding and summarizing long documents. We introduce a new text-to-graph task of predicting summarized knowledge graphs from long documents. We develop a dataset of 200k document/graph pairs using automatic and human annotations. We also develop strong baselines for this task based on graph learning and text summarization, and provide quantitative and qualitative studies of their effect.
38. CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes [PDF] 返回目录
Raeid Saqur, Ameet Deshpande
Abstract: The CLEVR dataset has been used extensively in language grounded visual reasoning in the Machine Learning (ML) and Natural Language Processing (NLP) domains. We present a graph parser library for CLEVR that provides functionality for object-centric attribute and relationship extraction, and for the construction of structural graph representations for dual modalities. Structural order-invariant representations enable geometric learning and can aid in downstream tasks like language grounding to vision, robotics, compositionality, interpretability, and computational grammar construction. We provide three extensible main components - parser, embedder, and visualizer - that can be tailored to suit specific learning setups. We also provide out-of-the-box functionality for seamless integration with popular deep graph neural network (GNN) libraries. Additionally, we discuss downstream usage and applications of the library, and how it accelerates research for the NLP research community.
39. Weight Distillation: Transferring the Knowledge in Neural Network Parameters [PDF] 返回目录
Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu
Abstract: Knowledge distillation has been proven to be effective in model acceleration and compression. It allows a small network to learn to generalize in the same way as a large network. Recent successes in pre-training suggest the effectiveness of transferring model parameters. Inspired by this, we investigate methods of model acceleration and compression in another line of research. We propose Weight Distillation to transfer the knowledge in the large network's parameters through a parameter generator. Our experiments on the WMT16 En-Ro, NIST12 Zh-En, and WMT14 En-De machine translation tasks show that weight distillation can train a small network that is 1.88~2.94x faster than the large network yet has competitive performance. With a small network of the same size, weight distillation outperforms knowledge distillation by 0.51~1.82 BLEU points.
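The parameter-generator idea can be sketched as a small trainable transform from a teacher weight matrix to a student weight matrix; the factorized form below is an assumption for illustration, not necessarily the paper's design:

```python
import torch
import torch.nn as nn

class ParameterGenerator(nn.Module):
    """Produce a student weight matrix from a teacher weight matrix via two
    small trainable maps (an assumed factorization for this sketch)."""
    def __init__(self, t_out: int, t_in: int, s_out: int, s_in: int):
        super().__init__()
        self.row_map = nn.Parameter(torch.randn(s_out, t_out) * 0.02)
        self.col_map = nn.Parameter(torch.randn(t_in, s_in) * 0.02)

    def forward(self, teacher_w: torch.Tensor) -> torch.Tensor:
        # (s_out, t_out) @ (t_out, t_in) @ (t_in, s_in) -> (s_out, s_in)
        return self.row_map @ teacher_w @ self.col_map
```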
40. Enhancing Dialogue Generation via Multi-Level Contrastive Learning [PDF] 返回目录
Xin Li, Piji Li, Yan Wang, Xiaojiang Liu, Wai Lam
Abstract: Most existing work on dialogue generation trains data-driven models directly on corpora crawled from websites. Such work mainly focuses on improving the model architecture to produce better responses and pays little attention to the relative quality of the training data. In this paper, we propose a multi-level contrastive learning paradigm to model the fine-grained quality of responses with respect to the query. A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives. However, these objectives are calculated at the sentence level, which may erroneously encourage or suppress the generation of uninformative or informative words. To tackle this incidental issue, on one hand, we design a token-level strategy for estimating the instance loss more accurately. On the other hand, we build a Knowledge Inference (KI) component to capture keyword knowledge from the reference during training and exploit it to encourage the generation of informative words. We evaluate the proposed model on a carefully annotated dialogue dataset, and the results suggest that our model generates more relevant and diverse responses than the baseline models.
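One way to read the multi-level objective is as a set of margin constraints: a response of higher quality rank should score above one of lower rank. A hedged sketch of such a loss (the paper's RC network and token-level refinement are not reproduced here):

```python
import torch
import torch.nn.functional as F

def multilevel_ranking_loss(scores, ranks, margin=0.1):
    """scores: model scores for candidate responses to one query, shape (n,).
    ranks: quality levels (higher = better), shape (n,).
    Penalizes any pair where a lower-quality response outscores a higher one,
    with a margin that grows with the rank gap (an assumed design)."""
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(n):
            if ranks[i] > ranks[j]:  # i should score higher than j
                gap = margin * (ranks[i] - ranks[j])
                loss = loss + F.relu(gap - (scores[i] - scores[j]))
    return loss / max(n * (n - 1) / 2, 1)

scores = torch.tensor([0.2, 0.9, 0.5], requires_grad=True)
ranks = torch.tensor([2, 0, 1])  # the first response is the best
print(multilevel_ranking_loss(scores, ranks))
```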
41. Prior Art Search and Reranking for Generated Patent Text [PDF] 返回目录
Jieh-Sheng Lee, Jieh Hsiang
Abstract: Generative models, such as GPT-2, have demonstrated impressive results recently. A fundamental question we'd like to address is: where did the generated text come from? This work is our initial effort toward answering the question by using prior art search. The purpose of the prior art search is to find the most similar prior text in the training data of GPT-2. We take a reranking approach and apply it to the patent domain. Specifically, we pre-train GPT-2 models from scratch by using the patent data from the USPTO. The input for the prior art search is the patent text generated by the GPT-2 model. We also pre-trained BERT models from scratch for converting patent text to embeddings. The steps of reranking are: (1) search the most similar text in the training data of GPT-2 by taking a bag-of-word ranking approach (BM25), (2) convert the search results in text format to BERT embeddings, and (3) provide the final result by ranking the BERT embeddings based on their similarities with the patent text generated by GPT-2. The experiments in this work show that such reranking is better than ranking with embeddings alone. However, our mixed results also indicate that calculating the semantic similarities among long text spans is still challenging. To our knowledge, this work is the first to implement a reranking system to identify retrospectively the most similar inputs to a GPT model based on its output.
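The three reranking steps map directly onto code: BM25 retrieval over the training corpus, embedding of the top candidates, and cosine ranking against the generated patent text. A sketch using the rank_bm25 package; `bert_embed` is a placeholder for the authors' from-scratch patent BERT encoder (an assumption here):

```python
import numpy as np
from rank_bm25 import BM25Okapi

def bert_embed(text: str) -> np.ndarray:
    """Placeholder for a pooled BERT embedding of `text` (assumption: the
    real system uses a BERT model pre-trained from scratch on patent data)."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(768)

def rerank(generated_text, corpus, k=100):
    tokenized = [doc.split() for doc in corpus]
    bm25 = BM25Okapi(tokenized)                        # step 1: bag-of-words ranking
    scores = bm25.get_scores(generated_text.split())
    top = np.argsort(scores)[::-1][:k]
    q = bert_embed(generated_text)                     # step 2: embed query and candidates
    cands = np.stack([bert_embed(corpus[i]) for i in top])
    sims = cands @ q / (np.linalg.norm(cands, axis=1) * np.linalg.norm(q))
    order = top[np.argsort(sims)[::-1]]                # step 3: rank by embedding similarity
    return [corpus[i] for i in order]

corpus = ["a method for wireless charging", "a neural network accelerator",
          "a battery charging circuit"]
print(rerank("an apparatus for charging a battery wirelessly", corpus, k=2))
```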
42. Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation [PDF] 返回目录
Pei Zhang, Boxing Chen, Niyu Ge, Kai Fan
Abstract: Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually at the cost of an increasing number of parameters and higher computational complexity. However, little attention is paid to the baseline model. In this paper, we extensively study the pros and cons of the standard transformer in document-level translation and find that the auto-regressive property can simultaneously bring the advantage of consistency and the disadvantage of error accumulation. Therefore, we propose a surprisingly simple long-short term masking self-attention on top of the standard transformer that both effectively captures long-range dependence and reduces the propagation of errors. We examine our approach on two publicly available document-level datasets, achieving strong BLEU results and capturing discourse phenomena.
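One plausible instantiation of the long-short term mask (an assumption; the paper's exact scheme may differ): each token attends with full resolution inside its own sentence (short-term) and reaches earlier sentences only through their boundary tokens (long-term). The sketch below builds such a boolean attention mask:

```python
import numpy as np

def long_short_mask(sent_ids):
    """Builds a boolean attention mask for one document (True = may attend).
    Assumption-level sketch: tokens fully attend within their own sentence
    (short-term) and, for long-range context, only to the final token of
    each preceding sentence."""
    sent_ids = np.asarray(sent_ids)
    n = len(sent_ids)
    last_of = {s: np.where(sent_ids == s)[0][-1] for s in np.unique(sent_ids)}
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, sent_ids == sent_ids[i]] = True   # short-term: own sentence
        for s, j in last_of.items():
            if s < sent_ids[i]:
                mask[i, j] = True                 # long-term: sentence summaries
    return mask

# A document of three sentences with 3, 2, and 2 tokens.
print(long_short_mask([0, 0, 0, 1, 1, 2, 2]).astype(int))
```

The appeal of the approach is that only the mask changes; the standard transformer itself is left untouched, which is why it works as a strong baseline.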
43. Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing [PDF] 返回目录
Ke Wang, Jiayi Wang, Niyu Ge, Yangbing Shi, Yu Zhao, Kai Fan
Abstract: With the advent of neural machine translation, there has been a marked shift towards leveraging and consuming machine translation results. However, the gap between machine translation systems and human translators still needs to be closed manually by post-editing. In this paper, we propose an end-to-end deep learning framework for quality estimation and automatic post-editing of machine translation output. Our goal is to provide error-correction suggestions and to further relieve the burden on human translators through an interpretable model. To imitate the behavior of human translators, we design three efficient delegation modules - quality estimation, generative post-editing, and atomic-operation post-editing - and construct a hierarchical model based on them. We examine this approach on the English-German dataset from the WMT 2017 APE shared task, and our experimental results achieve state-of-the-art performance. We also verify in human evaluation that certified translators can significantly expedite their post-editing with our model.
44. Will it Unblend? [PDF] 返回目录
Yuval Pinter, Cassandra L. Jacobs, Jacob Eisenstein
Abstract: Natural language processing systems often struggle with out-of-vocabulary (OOV) terms, which do not appear in training data. Blends, such as "innoventor", are one particularly challenging class of OOV, as they are formed by fusing together two or more bases that relate to the intended meaning in unpredictable manners and degrees. In this work, we run experiments on a novel dataset of English OOV blends to quantify the difficulty of interpreting the meanings of blends by large-scale contextual language models such as BERT. We first show that BERT's processing of these blends does not fully access the component meanings, leaving their contextual representations semantically impoverished. We find this is mostly due to the loss of characters resulting from blend formation. Then, we assess how easily different models can recognize the structure and recover the origin of blends, and find that context-aware embedding systems outperform character-level and context-free embeddings, although their results are still far from satisfactory.
45. Tradeoffs in Sentence Selection Techniques for Open-Domain Question Answering [PDF] 返回目录
Shih-Ting Lin, Greg Durrett
Abstract: Current methods in open-domain question answering (QA) usually employ a pipeline of first retrieving relevant documents, then applying strong reading comprehension (RC) models to that retrieved text. However, modern RC models are complex and expensive to run, so techniques to prune the space of retrieved text are critical to allow this approach to scale. In this paper, we focus on approaches which apply an intermediate sentence selection step to address this issue, and investigate the best practices for this approach. We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question. We examine trade-offs between processing speed and task performance in these two approaches, and demonstrate an ensemble module that represents a hybrid of the two. From experiments on Open-SQuAD and TriviaQA, we show that very lightweight QA models can do well at this task, but retrieval-based models are faster still. An ensemble module we describe balances between the two and generalizes well cross-domain.
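A retrieval-based selector of the kind described can be as simple as scoring each passage sentence against the question with TF-IDF cosine similarity and keeping only the top-k sentences for the expensive RC model. A minimal scikit-learn sketch, purely illustrative of the pipeline shape rather than the paper's models:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_sentences(question, sentences, k=2):
    """Keep the k sentences most similar to the question (TF-IDF cosine)."""
    vec = TfidfVectorizer().fit(sentences + [question])
    S = vec.transform(sentences)
    q = vec.transform([question])
    scores = cosine_similarity(S, q).ravel()
    keep = scores.argsort()[::-1][:k]
    return [sentences[i] for i in sorted(keep)]  # preserve passage order

sents = ["The Eiffel Tower is in Paris.",
         "It was completed in 1889.",
         "Paris is the capital of France."]
print(select_sentences("When was the Eiffel Tower built?", sents, k=1))
```

The trade-off the abstract measures is exactly this: a cheap selector like the above is fast but lossy, while a QA-based selector is more accurate but slower.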
46. Deliberate Self-Attention Network with Uncertainty Estimation for Multi-Aspect Review Rating Prediction [PDF] 返回目录
Tian Shi, Ping Wang, Chandan K. Reddy
Abstract: In recent years, several online platforms have seen a rapid increase in the number of review systems that ask users to provide aspect-level feedback. Multi-Aspect Rating Prediction (MARP), where the goal is to predict the ratings of a review at the individual aspect level, has become a challenging and pressing problem. To tackle this challenge, we propose a deliberate self-attention deep neural network model, named FEDAR, for the MARP problem, which achieves competitive performance while also being able to interpret the predictions made. As opposed to previous studies, which use hand-crafted keywords to determine aspects in sentiment predictions, our model does not suffer from human bias issues, since aspect keywords are automatically detected through a self-attention mechanism. FEDAR is equipped with a highway word embedding layer that transfers knowledge from pre-trained word embeddings, an RNN encoder layer with output features enriched by pooling and factorization techniques, and a deliberate self-attention layer. In addition, we propose an Attention-driven Keywords Ranking (AKR) method, which automatically extracts aspect-level sentiment-related keywords from the review corpus based on the attention weights. Since crowdsourcing annotation can be an alternate way to recover missing ratings of reviews, we propose a LEcture-AuDience (LEAD) strategy to estimate model uncertainty in the context of multi-task learning, so that valuable human resources can focus on the most uncertain predictions. Our extensive set of experiments on different DMSC datasets demonstrates the superiority of the proposed FEDAR and LEAD models. Visualization of aspect-level sentiment keywords demonstrates the interpretability of our model and the effectiveness of our AKR method.
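The attention-driven keyword ranking step is straightforward to sketch: accumulate each token's self-attention weight over the review corpus and rank by the total. The weights below are synthetic stand-ins for the model's deliberate self-attention output (an assumption about how AKR consumes them):

```python
from collections import defaultdict

def rank_keywords(reviews, attention_weights, top_n=3):
    """reviews: list of token lists; attention_weights: matching lists of
    per-token weights taken from the self-attention layer (synthetic here).
    Returns tokens ranked by accumulated attention weight."""
    score = defaultdict(float)
    for tokens, weights in zip(reviews, attention_weights):
        for tok, w in zip(tokens, weights):
            score[tok] += w
    return sorted(score, key=score.get, reverse=True)[:top_n]

reviews = [["battery", "life", "is", "great"], ["poor", "battery", "again"]]
weights = [[0.6, 0.2, 0.05, 0.15], [0.3, 0.55, 0.15]]
print(rank_keywords(reviews, weights))  # ['battery', 'poor', 'life']
```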
47. A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection [PDF] 返回目录
Tian Shi, Liuqing Li, Ping Wang, Chandan K. Reddy
Abstract: Unsupervised aspect detection (UAD) aims at automatically extracting interpretable aspects and identifying aspect-specific segments (such as sentences) from online reviews. However, recent deep learning-based topic models, specifically aspect-based autoencoders, suffer from several problems, such as extracting noisy aspects and poorly mapping the aspects discovered by the model to the aspects of interest. To tackle these challenges, in this paper we first propose a self-supervised contrastive learning framework and an attention-based model equipped with a novel smooth self-attention (SSA) module for the UAD task, in order to learn better representations for aspects and review segments. Secondly, we introduce a high-resolution selective mapping (HRSMap) method to efficiently assign the aspects discovered by the model to aspects of interest. We also propose using a knowledge distilling technique to further improve aspect detection performance. Our methods outperform several recent unsupervised and weakly supervised approaches on publicly available benchmark user review datasets. Aspect interpretation results show that the extracted aspects are meaningful, have good coverage, and can be easily mapped to aspects of interest. Ablation studies and attention weight visualization also demonstrate the effectiveness of SSA and the knowledge distilling method.
48. Looking Beyond Sentence-Level Natural Language Inference for Downstream Tasks [PDF] 返回目录
Anshuman Mishra, Dhruvesh Patel, Aparna Vijayakumar, Xiang Li, Pavan Kapanipathi, Kartik Talamadupula
Abstract: In recent years, the Natural Language Inference (NLI) task has garnered significant attention, with new datasets and models achieving near human-level performance on it. However, the full promise of NLI -- particularly that it learns knowledge that should be generalizable to other downstream NLP tasks -- has not been realized. In this paper, we study this unfulfilled promise through the lens of two downstream tasks: question answering (QA) and text summarization. We conjecture that a key difference between the NLI datasets and these downstream tasks concerns the length of the premise, and that creating new long-premise NLI datasets out of existing QA datasets is a promising avenue for training a truly generalizable NLI model. We validate our conjecture by showing competitive results on the task of QA and obtaining the best reported results on the task of Checking Factual Correctness of Summaries.
49. COMET: A Neural Framework for MT Evaluation [PDF] 返回目录
Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie
Abstract: We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metrics. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.
50. Presenting Simultaneous Translation in Limited Space [PDF] 返回目录
Dominik Macháček, Ondřej Bojar
Abstract: Some methods of automatic simultaneous translation of long-form speech allow revisions of outputs, trading accuracy for low latency. Deploying these systems for users faces the problem of presenting subtitles in a limited space, such as two lines on a television screen. The subtitles must be shown promptly, incrementally, and with adequate time for reading. We provide an algorithm for subtitling. Furthermore, we propose a way to estimate the overall usability of the combination of automatic translation and subtitling by measuring the quality, latency, and stability on a test set, and propose an improved measure for translation latency.
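A toy version of the constraint the paper addresses: outputs arrive incrementally and may be revised, but the display holds at most two lines and each shown window must stay visible for a minimum reading time. The sketch below is not the paper's algorithm, just an illustration of the setting under assumed parameters (fixed line width, fixed reading time):

```python
import textwrap

def present(stream, width=40, min_show=1.5):
    """stream: iterable of (timestamp, full_translation_so_far) pairs, where
    later items may revise earlier text. Emits (time, two_lines) only when
    the previous window has been visible at least min_show seconds."""
    shown_at, window = None, None
    for t, text in stream:
        lines = textwrap.wrap(text, width)[-2:]  # last two lines fit the screen
        if window is None or (t - shown_at >= min_show and lines != window):
            shown_at, window = t, lines
            yield t, "\n".join(lines)

stream = [(0.0, "Good evening and welcome"),
          (1.0, "Good evening and welcome to the news"),  # too early: suppressed
          (2.0, "Good evening and welcome to the evening news broadcast")]
for t, sub in present(stream):
    print(f"[{t:.1f}s]\n{sub}\n")
```

The stability measure the paper proposes would penalize exactly the flicker that the `min_show` guard above suppresses.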
51. Visual-Semantic Embedding Model Informed by Structured Knowledge [PDF] 返回目录
Mirantha Jayathilaka, Tingting Mu, Uli Sattler
Abstract: We propose a novel approach to improve a visual-semantic embedding model by incorporating concept representations captured from an external structured knowledge base. We investigate its performance on image classification under both standard and zero-shot settings. We propose two novel evaluation frameworks to analyse classification errors with respect to the class hierarchy indicated by the knowledge base. The approach is tested using the ILSVRC 2012 image dataset and a WordNet knowledge base. With respect to both standard and zero-shot image classification, our approach shows superior performance compared with the original approach, which uses word embeddings.
52. Finding Influential Instances for Distantly Supervised Relation Extraction [PDF] 返回目录
Zifeng Wang, Rui Wen, Xi Chen, Shao-Lun Huang, Ningyu Zhang, Yefeng Zheng
Abstract: Distant supervision has been demonstrated to be highly beneficial to enhance relation extraction models, but it often suffers from high label noise. In this work, we propose a novel model-agnostic instance subsampling method for distantly supervised relation extraction, namely REIF, which bridges the gap of realizing influence subsampling in deep learning. It encompasses two key steps: first calculating instance-level influences that measure how much each training instance contributes to the validation loss change of our model, then deriving sampling probabilities via the proposed sigmoid sampling function to perform batch-in-bag sampling. We design a fast influence subsampling scheme that reduces the computational complexity from O(mn) to O(1), and analyze its robustness when the sigmoid sampling function is employed. Empirical experiments demonstrate our method's superiority over the baselines, and its ability to support interpretable instance selection.
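The two key steps translate directly to code: score each instance by its (estimated) influence on the validation loss, then sample within each bag with a sigmoid of that score. The exact sigmoid form below is an assumption about the paper's sampling function:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_in_bag_sample(influences, temperature=1.0, rng=None):
    """influences: per-instance influence scores within one bag (negative =
    the instance reduces validation loss, i.e. is helpful). Keeps each
    instance with probability sigmoid(-influence / T) -- an assumed form,
    so that harmful (likely mislabeled) instances are rarely sampled."""
    rng = rng or np.random.default_rng(0)
    p = sigmoid(-np.asarray(influences) / temperature)
    keep = rng.random(len(p)) < p
    return np.nonzero(keep)[0], p

idx, probs = batch_in_bag_sample([-2.0, 0.1, 3.5])  # helpful, neutral, noisy
print(idx, probs.round(2))  # the noisy instance is kept with low probability
```

The paper's O(1) scheme concerns how the influence scores themselves are computed; the sampling step above is cheap either way.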
53. DiffWave: A Versatile Diffusion Model for Audio Synthesis [PDF] 返回目录
Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
Abstract: In this work, we propose DiffWave, a versatile Diffusion probabilistic model for conditional and unconditional Waveform generation. The model is non-autoregressive, and converts the white noise signal into structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of variational bound on the data likelihood. DiffWave produces high-fidelity audios in Different Waveform generation tasks, including neural vocoding conditioned on mel spectrogram, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality~(MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity from various automatic and human evaluations.
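The fixed-step Markov chain is the standard diffusion formulation: noise is added via q(x_t | x_0) = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and the network is trained to predict eps. A compact sketch of this generic denoising objective (not DiffWave's full conditional architecture; the schedule values are assumptions):

```python
import torch

T = 50                                   # constant number of diffusion steps
betas = torch.linspace(1e-4, 0.05, T)    # noise schedule (assumed values)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(eps_model, x0):
    """One training step of the denoising objective: corrupt x0 at a random
    step t and ask the network to predict the injected noise."""
    t = torch.randint(0, T, (x0.size(0),))
    a = alpha_bar[t].sqrt().unsqueeze(-1)
    s = (1.0 - alpha_bar[t]).sqrt().unsqueeze(-1)
    eps = torch.randn_like(x0)
    x_t = a * x0 + s * eps               # q(x_t | x_0) in closed form
    return ((eps_model(x_t, t) - eps) ** 2).mean()

# Toy epsilon-network over short "waveforms" (a stand-in for DiffWave's net).
eps_model = lambda x, t: torch.zeros_like(x)
print(diffusion_loss(eps_model, torch.randn(4, 1000)))
```

Synthesis then runs the learned reverse chain for the same constant number of steps, which is what makes the model non-autoregressive and fast relative to WaveNet.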
54. Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation [PDF] 返回目录
Francisco Vargas, Ryan Cotterell
Abstract: Bolukbasi et al. (2016) presents one of the first gender bias mitigation techniques for word embeddings. Their method takes pre-trained word embeddings as input and attempts to isolate a linear subspace that captures most of the gender bias in the embeddings. As judged by an analogical evaluation task, their method virtually eliminates gender bias in the embeddings. However, an implicit and untested assumption of their method is that the bias sub-space is actually linear. In this work, we generalize their method to a kernelized, non-linear version. We take inspiration from kernel principal component analysis and derive a non-linear bias isolation technique. We discuss and overcome some of the practical drawbacks of our method for non-linear gender bias mitigation in word embeddings and analyze empirically whether the bias subspace is actually linear. Our analysis shows that gender bias is in fact well captured by a linear subspace, justifying the assumption of Bolukbasi et al. (2016).
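The linear baseline being generalized is Bolukbasi et al.'s construction: take difference vectors of gendered word pairs, find their principal direction(s), and project that subspace out of each embedding. A compact numpy sketch of the linear case; the paper's kernelized variant would replace the SVD/PCA step with kernel PCA:

```python
import numpy as np

def bias_subspace(pairs, k=1):
    """pairs: list of (he_vec, she_vec)-style embedding pairs.
    Returns the top-k principal directions of the difference vectors."""
    D = np.stack([a - b for a, b in pairs])
    D -= D.mean(axis=0)
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k]                                   # (k, dim) bias directions

def debias(v, B):
    """Remove the component of v lying in the subspace spanned by rows of B."""
    return v - B.T @ (B @ v)

rng = np.random.default_rng(0)
pairs = [(rng.standard_normal(50), rng.standard_normal(50)) for _ in range(10)]
B = bias_subspace(pairs)
w = rng.standard_normal(50)
print(np.allclose(B @ debias(w, B), 0))             # True: bias component removed
```

The paper's finding is that, empirically, this linear subspace already captures the bias well, so the kernelized generalization confirms rather than overturns the original assumption.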
55. Can questions summarize a corpus? Using question generation for characterizing COVID-19 research [PDF] 返回目录
Gabriela Surita, Rodrigo Nogueira, Roberto Lotufo
Abstract: What are the latent questions on some textual data? In this work, we investigate using question generation models for exploring a collection of documents. Our method, dubbed corpus2question, consists of applying a pre-trained question generation model over a corpus and aggregating the resulting questions by frequency and time. This technique is an alternative to methods such as topic modelling and word cloud for summarizing large amounts of textual data. Results show that applying corpus2question on a corpus of scientific articles related to COVID-19 yields relevant questions about the topic. The most frequent questions are "what is covid 19" and "what is the treatment for covid". Among the 1000 most frequent questions are "what is the threshold for herd immunity" and "what is the role of ace2 in viral entry". We show that the proposed method generated similar questions for 13 of the 27 expert-made questions from the CovidQA question answering dataset. The code to reproduce our experiments and the generated questions are available at: this https URL
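The method itself is a two-stage loop: run a question generation model over each document, then aggregate the generated questions by frequency (and, optionally, by document date). A sketch using the Hugging Face pipeline; the QG checkpoint name is illustrative, not necessarily the one the authors used:

```python
from collections import Counter
from transformers import pipeline

# Any text2text question-generation checkpoint works; this name is illustrative.
qg = pipeline("text2text-generation", model="valhalla/t5-small-qg-hl")

def corpus2question(docs, top_n=10):
    counts = Counter()
    for doc in docs:
        out = qg(doc, max_length=32)
        counts.update(q["generated_text"].strip().lower() for q in out)
    return counts.most_common(top_n)

docs = ["The reproduction number of the virus was estimated at 2.5.",
        "Masks reduce droplet transmission of the virus."]
print(corpus2question(docs))
```

Because identical or near-identical questions recur across documents, the frequency counts act as the summary, playing the role that top words play in topic models or word clouds.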
56. Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect [PDF] 返回目录
Zheni Zeng, Chaojun Xiao, Yuan Yao, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun
Abstract: Recommender systems aim to provide item recommendations for users and are usually faced with the data sparsity problem (e.g., cold start) in real-world scenarios. Recently, pre-trained models have shown their effectiveness in knowledge transfer between domains and tasks, which can potentially alleviate the data sparsity problem in recommender systems. In this survey, we first provide a review of recommender systems with pre-training. In addition, we show the benefits of pre-training for recommender systems through experiments. Finally, we discuss several promising directions for future research on recommender systems with pre-training.
57. Active Learning for Product Type Ontology Enhancement in E-commerce [PDF] 返回目录
Yun Zhu, Sayyed M. Zahiri, Jiaqi Wang, Han-Yu Chen, Faizan Javed
Abstract: Entity-based semantic search has been widely adopted in modern search engines to improve search accuracy by understanding users' intent. In e-commerce, an accurate and complete product type (PT) ontology is essential for recognizing product entities in queries and retrieving relevant products from the catalog. However, finding product types (PTs) to construct such an ontology is usually expensive due to the considerable amount of human effort it may involve. In this work, we propose an active learning framework that efficiently utilizes domain experts' knowledge for PT discovery. We also show the quality and coverage of the resulting PTs in the experimental results.
58. An AI based talent acquisition and benchmarking for job [PDF] 返回目录
Rudresh Mishra, Ricardo Rodriguez, Valentin Portillo
Abstract: In the recruitment industry, selecting the best CV for a particular job post from a pile of thousands of CVs is quite challenging. Finding a perfect candidate who fits an organization's culture is a difficult task. To help recruiters fill these gaps, we leverage AI. We propose a methodology that solves these problems by matching skill graphs generated from CVs and job posts. In this report, our approach is to perform a business-level analysis in order to explain why such problems arise and how we intend to solve them using natural language processing and machine learning techniques. We limit our project to problems in the domain of the computer science industry.
59. Intimate Partner Violence and Injury Prediction From Radiology Reports [PDF] 返回目录
Irene Y. Chen, Emily Alsentzer, Hyesun Park, Richard Thomas, Babina Gosangi, Rahul Gujrathi, Bharti Khurana
Abstract: Intimate partner violence (IPV) is an urgent, prevalent, and under-detected public health issue. We present machine learning models to assess patients for IPV and injury. We train the predictive algorithms on radiology reports with 1) IPV labels based on entry to a violence prevention program and 2) injury labels provided by emergency radiology fellowship-trained physicians. Our full dataset includes 34,642 radiology reports and 1,479 patients, comprising IPV victims and control patients. We are able to accurately predict IPV victims and injury labels, and our best model predicts IPV a median of 1.34 years before violence prevention program entry, with a sensitivity of 95% and a specificity of 71%. Our findings align with known clinical patterns of IPV injuries. We conduct error analysis to determine for which patients our model has especially high or low performance, and discuss next steps for a deployed clinical risk model.