Table of Contents
12. SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces [PDF] Abstract
14. MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models [PDF] Abstract
19. STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++ [PDF] Abstract
22. How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds [PDF] Abstract
23. Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling [PDF] Abstract
24. Near-imperceptible Neural Linguistic Steganography via Self-Adjusting Arithmetic Coding [PDF] Abstract
25. Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers [PDF] Abstract
28. Building Large Lexicalized Ontologies from Text: a Use Case in Automatic Indexing of Biotechnology Patents [PDF] Abstract
Abstracts
1. Cross-Lingual Transfer Learning for Complex Word Identification [PDF] Back to Contents
George-Eduard Zaharia, Dumitru-Clementin Cercel, Mihai Dascalu
Abstract: Complex Word Identification (CWI) is a task centered on detecting hard-to-understand words, or groups of words, in texts from different areas of expertise. The purpose of CWI is to highlight problematic structures that non-native speakers would usually find difficult to understand. Our approach uses zero-shot, one-shot, and few-shot learning techniques, alongside state-of-the-art solutions for Natural Language Processing (NLP) tasks (i.e., Transformers). Our aim is to provide evidence that the proposed models can learn the characteristics of complex words in a multilingual environment by relying on the CWI shared task 2018 dataset available for four different languages (i.e., English, German, Spanish, and also French). Our approach surpasses state-of-the-art cross-lingual results in terms of macro F1-score on English (0.774), German (0.782), and Spanish (0.734) languages, for the zero-shot learning scenario. At the same time, our model also outperforms the state-of-the-art monolingual result for German (0.795 macro F1-score).
2. Multi-Modal Open-Domain Dialogue [PDF] Back to Contents
Kurt Shuster, Eric Michael Smith, Da Ju, Jason Weston
Abstract: Recent work in open-domain conversational agents has demonstrated that significant improvements in model engagingness and humanness metrics can be achieved via massive scaling in both pre-training data and model size (Adiwardana et al., 2020; Roller et al., 2020). However, if we want to build agents with human-like abilities, we must expand beyond handling just text. A particularly important topic is the ability to see images and communicate about what is perceived. With the goal of engaging humans in multi-modal dialogue, we investigate combining components from state-of-the-art open-domain dialogue agents with those from state-of-the-art vision models. We study incorporating different image fusion schemes and domain-adaptive pre-training and fine-tuning strategies, and show that our best resulting model outperforms strong existing models in multi-modal dialogue while simultaneously performing as well as its predecessor (text-only) BlenderBot (Roller et al., 2020) in text-based conversation. We additionally investigate and incorporate safety components in our final model, and show that such efforts do not diminish model performance with respect to engagingness metrics.
3. HUMAN: Hierarchical Universal Modular ANnotator [PDF] Back to Contents
Moritz Wolf, Dana Ruiter, Ashwin Geet D'Sa, Liane Reiners, Jan Alexandersson, Dietrich Klakow
Abstract: A lot of real-world phenomena are complex and cannot be captured by single task annotations. This causes a need for subsequent annotations, with interdependent questions and answers describing the nature of the subject at hand. Even in the case a phenomenon is easily captured by a single task, the high specialisation of most annotation tools can result in having to switch to another tool if the task only slightly changes. We introduce HUMAN, a novel web-based annotation tool that addresses the above problems by a) covering a variety of annotation tasks on both textual and image data, and b) the usage of an internal deterministic state machine, allowing the researcher to chain different annotation tasks in an interdependent manner. Further, the modular nature of the tool makes it easy to define new annotation tasks and integrate machine learning algorithms e.g., for active learning. HUMAN comes with an easy-to-use graphical user interface that simplifies the annotation task and management.
4. Syntax Representation in Word Embeddings and Neural Networks -- A Survey [PDF] Back to Contents
Tomasz Limisiewicz, David Mareček
Abstract: Neural networks trained on natural language processing tasks capture syntax even though it is not provided as a supervision signal. This indicates that syntactic analysis is essential to the understanding of language in artificial intelligence systems. This overview paper covers approaches of evaluating the amount of syntactic information included in the representations of words for different neural network architectures. We mainly summarize research on English monolingual data on language modeling tasks and multilingual data for neural machine translation systems and multilingual language models. We describe which pre-trained models and representations of language are best suited for transfer to syntactic tasks.
5. Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data [PDF] Back to Contents
Nils Rethmeier, Isabelle Augenstein
Abstract: For natural language processing (NLP) tasks such as sentiment or topic classification, currently prevailing approaches heavily rely on pretraining large self-supervised models on massive external data resources. However, this methodology is being critiqued for: exceptional compute and pretraining data requirements; diminishing returns on both large and small datasets; and importantly, favourable evaluation settings that overestimate performance differences. The core belief behind current methodology, coined `the bitter lesson' by R. Sutton, is that `compute scale-up beats data and compute-efficient algorithms', neglecting that progress in compute hardware scale-up is based almost entirely on the miniaturisation of resource consumption. We thus approach pretraining from a miniaturisation perspective, such as not to require massive external data sources and models, or learned translations from continuous input embeddings to discrete labels. To minimise overly favourable evaluation, we examine learning on a long-tailed, low-resource, multi-label text classification dataset with noisy, highly sparse labels and many rare concepts. To this end, we propose a novel `dataset-internal' contrastive autoencoding approach to self-supervised pretraining and demonstrate marked improvements in zero-shot, few-shot and solely supervised learning performance; even under an unfavorable low-resource scenario, and without defaulting to large-scale external datasets for self-supervision. We also find empirical evidence that zero and few-shot learning markedly benefit from adding more `dataset-internal', self-supervised training signals, which is of practical importance when retrieving or computing on large external sources of such signals is infeasible.
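The abstract does not spell out the contrastive objective, so the sketch below is only a generic InfoNCE-style loss, included to illustrate the kind of self-supervised contrastive signal that "dataset-internal" pretraining could exploit; the pairing of anchors and positives is an assumption for illustration, not the authors' construction.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """Generic InfoNCE loss: each anchor's positive is the same-index row of
    `positives`; every other row in the batch serves as an in-batch negative."""
    a = F.normalize(anchors, dim=-1)    # (batch, dim)
    p = F.normalize(positives, dim=-1)  # (batch, dim)
    logits = a @ p.t() / temperature    # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Usage sketch: anchors could be label-text embeddings and positives the matching
# input-text embeddings from the same small encoder (purely illustrative pairing).
loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128))
```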
6. LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention [PDF] Back to Contents
Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto
Abstract: Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering). Our source code and pretrained representations are available at this https URL.
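The entity-aware self-attention described here chooses the query projection according to whether the attending and attended tokens are words or entities, while keys stay shared. The loop-based sketch below (not the vectorized implementation, and with made-up parameter names) illustrates that selection rule.

```python
import torch

def entity_aware_attention_scores(x, token_is_entity, q_ww, q_we, q_ew, q_ee, k_proj):
    """Sketch of entity-aware attention: pick a different query projection per
    (query-token-type, key-token-type) pair; keys are shared across types.
    x: (seq, dim); token_is_entity: sequence of bool; q_*, k_proj: (dim, dim)."""
    keys = x @ k_proj
    scores = torch.empty(x.size(0), x.size(0))
    for i in range(x.size(0)):
        for j in range(x.size(0)):
            if not token_is_entity[i] and not token_is_entity[j]:
                q = x[i] @ q_ww      # word attends to word
            elif not token_is_entity[i] and token_is_entity[j]:
                q = x[i] @ q_we      # word attends to entity
            elif token_is_entity[i] and not token_is_entity[j]:
                q = x[i] @ q_ew      # entity attends to word
            else:
                q = x[i] @ q_ee      # entity attends to entity
            scores[i, j] = q @ keys[j] / x.size(1) ** 0.5
    return scores  # a softmax over the last dimension would follow, as in standard attention

seq, dim = 5, 16
scores = entity_aware_attention_scores(
    torch.randn(seq, dim), [False, False, True, True, False],
    *(torch.randn(dim, dim) for _ in range(5)))
```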
7. Unsupervised Text Style Transfer with Padded Masked Language Models [PDF] Back to Contents
Eric Malmi, Aliaksei Severyn, Sascha Rothe
Abstract: We propose Masker, an unsupervised text-editing method for style transfer. To tackle cases when no parallel source-target pairs are available, we train masked language models (MLMs) for both the source and the target domain. Then we find the text spans where the two models disagree the most in terms of likelihood. This allows us to identify the source tokens to delete to transform the source text to match the style of the target domain. The deleted tokens are replaced with the target MLM, and by using a padded MLM variant, we avoid having to predetermine the number of inserted tokens. Our experiments on sentence fusion and sentiment transfer demonstrate that Masker performs competitively in a fully unsupervised setting. Moreover, in low-resource settings, it improves supervised methods' accuracy by over 10 percentage points when pre-training them on silver training data generated by Masker.
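To make the "spans where the two models disagree the most in terms of likelihood" step concrete, here is a toy sketch that scores candidate spans by the gap between per-token log-likelihoods under a source-domain and a target-domain MLM; the numbers, function name, and span-length cap are illustrative, not the paper's actual procedure.

```python
from typing import List, Tuple

def most_disagreeing_span(src_logp: List[float], tgt_logp: List[float],
                          max_len: int = 4) -> Tuple[int, int]:
    """Return (start, end) of the token span (up to max_len tokens) where the
    source-domain MLM assigns much higher likelihood than the target-domain MLM,
    i.e. the span most responsible for the source style."""
    best, best_span = float("-inf"), (0, 0)
    n = len(src_logp)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            gap = sum(src_logp[start:end]) - sum(tgt_logp[start:end])
            if gap > best:
                best, best_span = gap, (start, end)
    return best_span

# e.g. token log-probs for "the food was absolutely terrible" under the two MLMs
span = most_disagreeing_span([-1.5, -1.0, -1.2, -1.1, -0.5],
                             [-1.3, -0.9, -1.0, -2.5, -6.0])
# -> (3, 5): the last two tokens carry the source style; Masker would delete such a
# span and re-fill it with the (padded) target-domain MLM.
```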
8. TeRo: A Time-aware Knowledge Graph Embedding via Temporal Rotation [PDF] Back to Contents
Chengjin Xu, Mojtaba Nayyeri, Fouad Alkhoury, Hamed Shariat Yazdi, Jens Lehmann
Abstract: In the last few years, there has been a surge of interest in learning representations of entities and relations in knowledge graph (KG). However, the recent availability of temporal knowledge graphs (TKGs) that contain time information for each fact created the need for reasoning over time in such TKGs. In this regard, we present a new approach of TKG embedding, TeRo, which defines the temporal evolution of entity embedding as a rotation from the initial time to the current time in the complex vector space. Specially, for facts involving time intervals, each relation is represented as a pair of dual complex embeddings to handle the beginning and the end of the relation, respectively. We show our proposed model overcomes the limitations of the existing KG embedding models and TKG embedding models and has the ability of learning and inferring various relation patterns over time. Experimental results on four different TKGs show that TeRo significantly outperforms existing state-of-the-art models for link prediction. In addition, we analyze the effect of time granularity on link prediction over TKGs, which as far as we know has not been investigated in previous literature.
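Reading the abstract, the core operation is an element-wise rotation of a complex entity embedding by a time-specific phase, followed by a translation-style distance score. The sketch below is our hedged reconstruction of that idea; the paper's exact scoring function, choice of norm, and handling of relation intervals may differ.

```python
import numpy as np

def temporal_rotate(entity: np.ndarray, time_phase: np.ndarray) -> np.ndarray:
    """Rotate a complex entity embedding by a time-specific phase (unit-modulus rotation)."""
    return entity * np.exp(1j * time_phase)

def tero_style_score(head, rel, tail, time_phase) -> float:
    """Hedged sketch of a TeRo-style score: rotate head and tail to the fact's time,
    then measure a TransE-like translation distance (conjugating the tail); a lower
    score suggests a more plausible (head, relation, tail, time) fact."""
    h_t = temporal_rotate(head, time_phase)
    t_t = temporal_rotate(tail, time_phase)
    return float(np.linalg.norm(h_t + rel - np.conj(t_t), ord=1))

dim = 4
score = tero_style_score(np.random.randn(dim) + 1j * np.random.randn(dim),
                         np.random.randn(dim) + 1j * np.random.randn(dim),
                         np.random.randn(dim) + 1j * np.random.randn(dim),
                         np.random.randn(dim))
```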
9. MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale [PDF] Back to Contents
Andreas Rücklé, Jonas Pfeiffer, Iryna Gurevych
Abstract: We study the zero-shot transfer capabilities of text matching models on a massive scale, by self-supervised training on 140 source domains from community question answering forums in English. We investigate the model performances on nine benchmarks of answer selection and question similarity tasks, and show that all 140 models transfer surprisingly well, where the large majority of models substantially outperforms common IR baselines. We also demonstrate that considering a broad selection of source domains is crucial for obtaining the best zero-shot transfer performances, which contrasts the standard procedure that merely relies on the largest and most similar domains. In addition, we extensively study how to best combine multiple source domains. We propose to incorporate self-supervised with supervised multi-task learning on all available source domains. Our best zero-shot transfer model considerably outperforms in-domain BERT and the previous state of the art on six benchmarks. Fine-tuning of our model with in-domain data results in additional large gains and achieves the new state of the art on all nine benchmarks.
10. Continual Learning for Natural Language Generation in Task-oriented Dialog Systems [PDF] Back to Contents
Fei Mi, Liangwei Chen, Mengjie Zhao, Minlie Huang, Boi Faltings
Abstract: Natural language generation (NLG) is an essential component of task-oriented dialog systems. Despite the recent success of neural approaches for NLG, they are typically developed in an offline manner for particular domains. To better fit real-life applications where new data come in a stream, we study NLG in a "continual learning" setting to expand its knowledge to new domains or functionalities incrementally. The major challenge towards this goal is catastrophic forgetting, meaning that a continually trained model tends to forget the knowledge it has learned before. To this end, we propose a method called ARPER (Adaptively Regularized Prioritized Exemplar Replay) by replaying prioritized historical exemplars, together with an adaptive regularization technique based on ElasticWeight Consolidation. Extensive experiments to continually learn new domains and intents are conducted on MultiWoZ-2.0 to benchmark ARPER with a wide range of techniques. Empirical results demonstrate that ARPER significantly outperforms other methods by effectively mitigating the detrimental catastrophic forgetting issue.
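The abstract names two ingredients, prioritized exemplar replay and an Elastic Weight Consolidation-style adaptive regularizer, which suggests a combined training objective of roughly the following shape. This is our paraphrase, not the paper's exact formulation; the weight lambda, the Fisher values F_i, and the exemplar set E are assumptions.

```latex
\mathcal{L}(\theta) =
  \underbrace{\mathcal{L}_{\mathrm{NLG}}(\mathcal{D}_{\mathrm{new}};\theta)}_{\text{new domain}}
  + \underbrace{\mathcal{L}_{\mathrm{NLG}}(\mathcal{E}_{\mathrm{replay}};\theta)}_{\text{prioritized exemplars}}
  + \lambda \sum_i F_i\,(\theta_i - \theta_i^{*})^2
```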
11. Autoregressive Entity Retrieval [PDF] Back to Contents
Nicola De Cao, Gautier Izacard, Sebastian Riedel, Fabio Petroni
Abstract: Entities are at the center of how we represent and aggregate knowledge. For instance, Encyclopedias such as Wikipedia are structured by entities (e.g., one per article). The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering. One way to understand current approaches is as classifiers among atomic labels, one for each entity. Their weight vectors are dense entity representations produced by encoding entity information such as descriptions. This approach leads to several shortcomings: i) context and entity affinity is mainly captured through a vector dot product, potentially missing fine-grained interactions between the two; ii) a large memory footprint is needed to store dense representations when considering large entity sets; iii) an appropriately hard set of negative data has to be subsampled at training time. We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion, and conditioned on the context. This enables to mitigate the aforementioned technical issues: i) the autoregressive formulation allows us to directly capture relations between context and entity name, effectively cross encoding both; ii) the memory footprint is greatly reduced because the parameters of our encoder-decoder architecture scale with vocabulary size, not entity count; iii) the exact softmax loss can be efficiently computed without the need to subsample negative data. We show the efficacy of the approach with more than 20 datasets on entity disambiguation, end-to-end entity linking and document retrieval tasks, achieving new SOTA, or very competitive results while using a tiny fraction of the memory of competing systems. Finally, we demonstrate that new entities can be added by simply specifying their unambiguous name.
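Generating entity names token by token only works if decoding is constrained to valid names. A standard way to do this, sketched below with made-up token ids, is a prefix trie that returns the admissible next tokens for the current prefix; GENRE additionally handles end-of-name markers and beam search, which the sketch omits.

```python
from typing import Dict, List, Sequence

class PrefixTrie:
    """Trie over tokenized entity names; used to restrict generation to valid entities."""
    def __init__(self, names: Sequence[Sequence[int]]):
        self.root: Dict = {}
        for tokens in names:
            node = self.root
            for tok in tokens:
                node = node.setdefault(tok, {})

    def allowed_next_tokens(self, prefix: Sequence[int]) -> List[int]:
        """Tokens that can legally follow `prefix` so the output stays a valid entity name."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return list(node.keys())

# Toy token ids; e.g. "New York City" -> [7, 12, 3], "New Zealand" -> [7, 19]
trie = PrefixTrie([[7, 12, 3], [7, 19]])
assert trie.allowed_next_tokens([7]) == [12, 19]
# At each decoding step, logits for tokens outside this set would be masked to -inf.
```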
12. SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces [PDF] Back to Contents
K Vani, Sandra Mitrovic, Alessandro Antonucci, Fabio Rinaldi
Abstract: Lexical semantic change detection (also known as semantic shift tracing) is a task of identifying words that have changed their meaning over time. Unsupervised semantic shift tracing, focal point of SemEval2020, is particularly challenging. Given the unsupervised setup, in this work, we propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings. As such, disagreements in obtained clusters naturally allow to quantify the level of semantic shift per each target word in four target languages. To leverage this idea, clustering is performed on contextualized (BERT-based) embeddings of word occurrences. The obtained results show that our approach performs well both measured separately (per language) and overall, where we surpass all provided SemEval baselines.
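A minimal sketch of the clustering idea: embed every occurrence of a target word from both time periods (e.g. with BERT), cluster the occurrences jointly, and quantify shift by how differently the two periods populate the clusters. The choice of k-means and Jensen-Shannon divergence here is an assumption for illustration, not necessarily the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

def semantic_shift_score(embs_t1: np.ndarray, embs_t2: np.ndarray, k: int = 4) -> float:
    """Cluster contextual embeddings of one target word's occurrences from two time
    periods jointly, then compare how the two periods distribute over the clusters;
    a larger divergence suggests a larger semantic shift."""
    all_embs = np.vstack([embs_t1, embs_t2])
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(all_embs)
    l1, l2 = labels[: len(embs_t1)], labels[len(embs_t1):]
    p = np.bincount(l1, minlength=k) / len(l1)
    q = np.bincount(l2, minlength=k) / len(l2)
    return float(jensenshannon(p, q))

# Placeholder embeddings standing in for BERT vectors of the word's occurrences.
shift = semantic_shift_score(np.random.randn(50, 768), np.random.randn(60, 768))
```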
13. Which *BERT? A Survey Organizing Contextualized Encoders [PDF] Back to Contents
Patrick Xia, Shijie Wu, Benjamin Van Durme
Abstract: Pretrained contextualized text encoders are now a staple of the NLP community. We present a survey on language representation learning with the aim of consolidating a series of shared lessons learned across a variety of recent efforts. While significant advancements continue at a rapid pace, we find that enough has now been discovered, in different directions, that we can begin to organize advances according to common themes. Through this organization, we highlight important considerations when interpreting recent contributions and choosing which model to use.
14. MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models [PDF] Back to Contents
Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro
Abstract: Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
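The abstract describes a four-stage pipeline (keyword predictor, knowledge retriever, contextual knowledge ranker, conditional generator). The skeleton below, with placeholder callables, shows how control can be exercised by swapping the predicted keywords for user-supplied ones before retrieval; it is an architectural sketch, not the released implementation.

```python
from typing import Callable, List, Sequence

def controlled_generation_step(context: str,
                               predict_keywords: Callable[[str], List[str]],
                               retrieve_knowledge: Callable[[List[str]], List[str]],
                               rank_knowledge: Callable[[str, List[str]], List[str]],
                               generate: Callable[[str, Sequence[str]], str],
                               top_k: int = 2) -> str:
    """Skeleton of the four-stage pipeline: keyword predictor -> knowledge retriever ->
    contextual knowledge ranker -> conditional text generator. All callables are
    placeholders; controllability comes from overriding the predicted keywords
    with user-chosen ones before retrieval."""
    keywords = predict_keywords(context)          # or user-supplied keywords
    candidates = retrieve_knowledge(keywords)     # sentences from an external KB
    selected = rank_knowledge(context, candidates)[:top_k]
    return generate(context, selected)
```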
15. JAKET: Joint Pre-training of Knowledge Graph and Language Understanding [PDF] Back to Contents
Donghan Yu, Chenguang Zhu, Yiming Yang, Michael Zeng
Abstract: Knowledge graphs (KGs) contain rich information about world knowledge, entities and relations. Thus, they can be great supplements to existing pre-trained language models. However, it remains a challenge to efficiently integrate information from KG into language modeling. And the understanding of a knowledge graph requires related context. We propose a novel joint pre-training framework, JAKET, to model both the knowledge graph and language. The knowledge module and language module provide essential information to mutually assist each other: the knowledge module produces embeddings for entities in text while the language module generates context-aware initial embeddings for entities and relations in the graph. Our design enables the pre-trained model to easily adapt to unseen knowledge graphs in new domains. Experimental results on several knowledge-aware NLP tasks show that our proposed framework achieves superior performance by effectively leveraging knowledge in language understanding.
16. An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training [PDF] Back to Contents
Kristjan Arumae, Qing Sun, Parminder Bhatia
Abstract: Pre-training large language models has become a standard in the natural language processing community. Such models are pre-trained on generic data (e.g. BookCorpus and English Wikipedia) and often fine-tuned on tasks in the same domain. However, in order to achieve state-of-the-art performance on out of domain tasks such as clinical named entity recognition and relation extraction, additional in domain pre-training is required. In practice, staged multi-domain pre-training presents performance deterioration in the form of catastrophic forgetting (CF) when evaluated on a generic benchmark such as GLUE. In this paper we conduct an empirical investigation into known methods to mitigate CF. We find that elastic weight consolidation provides best overall scores yielding only a 0.33% drop in performance across seven generic tasks while remaining competitive in bio-medical tasks. Furthermore, we explore gradient and latent clustering based data selection techniques to improve coverage when using elastic weight consolidation and experience replay methods.
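Since elastic weight consolidation is the method the paper finds most effective, a short reminder of its penalty term may help: parameters that were important for earlier domains (large Fisher information) are anchored to their previous values. A PyTorch-style sketch follows; the dictionary layout for the stored parameters and Fisher estimates is an assumption.

```python
import torch

def ewc_penalty(model: torch.nn.Module,
                old_params: dict, fisher: dict, lam: float) -> torch.Tensor:
    """Elastic weight consolidation penalty: lam/2 * sum_i F_i * (theta_i - theta_i*)^2,
    added to the new-domain loss so parameters important for earlier domains
    (large Fisher value) are discouraged from drifting."""
    penalty = torch.zeros(())
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# loss = task_loss + ewc_penalty(model, params_after_prev_domain, fisher_estimates, lam=0.1)
```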
17. Enhancing Fine-grained Sentiment Classification Exploiting Local Context Embedding [PDF] Back to Contents
Heng Yang, Biqing Zeng
Abstract: Target-oriented sentiment classification is a fine-grained task of natural language processing to analyze the sentiment polarity of the targets. To improve the performance of sentiment classification, many approaches proposed various attention mechanisms to capture the important context words of a target. However, previous approaches ignored the significant relatedness of a target's sentiment and its local context. This paper proposes a local context-aware network (LCA-Net), equipped with the local context embedding and local context prediction loss, to strengthen the model by emphasizing the sentiment information of the local context. The experimental results on three common datasets show that local context-aware network performs superior to existing approaches in extracting local context features. Besides, the local context-aware framework is easy to adapt to many models, with the potential to improve other target-level tasks.
18. Enriching Word Embeddings with Temporal and Spatial Information [PDF] Back to Contents
Hongyu Gong, Suma Bhat, Pramod Viswanath
Abstract: The meaning of a word is closely linked to sociocultural factors that can change over time and location, resulting in corresponding meaning changes. Taking a global view of words and their meanings in a widely used language, such as English, may require us to capture more refined semantics for use in time-specific or location-aware situations, such as the study of cultural trends or language use. However, popular vector representations for words do not adequately include temporal or spatial information. In this work, we present a model for learning word representation conditioned on time and location. In addition to capturing meaning changes over time and location, we require that the resulting word embeddings retain salient semantic and geometric properties. We train our model on time- and location-stamped corpora, and show using both quantitative and qualitative evaluations that it can capture semantics across time and locations. We note that our model compares favorably with the state-of-the-art for time-specific embedding, and serves as a new benchmark for location-specific embeddings.
19. STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++ [PDF] Back to Contents
Jack G. M. FitzGerald
Abstract: Slot-filling, Translation, Intent classification, and Language identification, or STIL, is a newly-proposed task for multilingual Natural Language Understanding (NLU). By performing simultaneous slot filling and translation into a single output language (English in this case), some portion of downstream system components can be monolingual, reducing development and maintenance cost. Results are given using the multilingual BART model (Liu et al., 2020) fine-tuned on 7 languages using the MultiATIS++ dataset. When no translation is performed, mBART's performance is comparable to the current state of the art system (Cross-Lingual BERT by Xu et al. (2020)) for the languages tested, with better average intent classification accuracy (96.07% versus 95.50%) but worse average slot F1 (89.87% versus 90.81%). When simultaneous translation is performed, average intent classification accuracy degrades by only 1.7% relative and average slot F1 degrades by only 1.2% relative.
摘要:吸入式灌装,翻译,意图分类,语言识别,或STIL,是多语种自然语言理解(NLU)新近提出的任务。通过执行同时出现信道的填充和翻译成一个单一的输出语言(英语在这种情况下),下游系统部件的一些部分可以是单语,降低开发和维护成本。使用多语言模型BART给出结果(Liu等人,2020年)微调上使用MultiATIS ++数据集7种语言。如果不进行翻译,mBART的性能与艺术系统的当前状态(跨语言BERT徐等人(2020))的语言测试,具有更好的平均意图分类精度(96.07%对95.50%)但更坏平均槽F1(89.87%对90.81%)。当同声翻译仅由1.7%的相对和平均时隙F1降解相对只有1.2%的执行,平均意图分类精度降低。
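One way to picture the "simultaneous" formulation is a single seq2seq target string that carries the intent label, the slot values, and the English translation together, so downstream components only ever see English. The serialization format below is purely an assumption for illustration; the abstract does not specify the one used with mBART:

```python
# Hypothetical single-target serialization for joint intent classification,
# slot filling, and translation into English (format is an assumption).
def build_target(intent: str, slots: dict, english_translation: str) -> str:
    slot_str = " ; ".join(f"{k}={v}" for k, v in sorted(slots.items()))
    return f"<intent> {intent} <slots> {slot_str} <translation> {english_translation}"

def parse_target(target: str):
    """Invert build_target so monolingual downstream components can consume it."""
    _, rest = target.split("<intent>", 1)
    intent, rest = rest.split("<slots>", 1)
    slot_str, translation = rest.split("<translation>", 1)
    slots = dict(kv.split("=", 1) for kv in slot_str.split(";") if "=" in kv)
    slots = {k.strip(): v.strip() for k, v in slots.items()}
    return intent.strip(), slots, translation.strip()

target = build_target("atis_flight",
                      {"toloc.city_name": "boston", "depart_date.day_name": "monday"},
                      "i want a flight to boston on monday")
print(target)
print(parse_target(target))
```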
20. A Survey of the State of Explainable AI for Natural Language Processing [PDF] 返回目录
Marina Danilevsky, Kun Qian, Ranit Aharonov, Yannis Katsis, Ban Kawas, Prithviraj Sen
Abstract: Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area.
摘要:最近几年,在国家的最先进的车型质量的重要进展,但为此付出的模型的费用越来越少解释。本次调查提出解释的AI(XAI)的当前状态的概况,自然语言处理(NLP)的领域内。我们讨论的说明的主要分类,以及各种方式的说明即可抵达和可视化。我们详细的操作和目前可用于为NLP模型预测生成的解释explainability技术,以作为在社区模式开发的资源。最后,我们指出目前的差距,并鼓励对今后的工作方向在这一重要的研究领域。
21. Nearest Neighbor Machine Translation [PDF] 返回目录
Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
Abstract: We introduce $k$-nearest-neighbor machine translation ($k$NN-MT), which predicts tokens with a nearest neighbor classifier over a large datastore of cached examples, using representations from a neural translation model for similarity search. This approach requires no additional training and scales to give the decoder direct access to billions of examples at test time, resulting in a highly expressive model that consistently improves performance across many settings. Simply adding nearest neighbor search improves a state-of-the-art German-English translation model by 1.5 BLEU. $k$NN-MT allows a single model to be adapted to diverse domains by using a domain-specific datastore, improving results by an average of 9.2 BLEU over zero-shot transfer, and achieving new state-of-the-art results---without training on these domains. A massively multilingual model can also be specialized for particular language pairs, with improvements of 3 BLEU for translating from English into German and Chinese. Qualitatively, $k$NN-MT is easily interpretable; it combines source and target context to retrieve highly relevant examples.
摘要:介绍$ $ķ-nearest邻机器翻译($ķ$ NN-MT),其预测与近邻分类在大数据存储的缓存例子令牌,使用表示从相似性搜索神经翻译模型。这种方法不需要额外的培训和规模给予解码器,以数十亿的例子直接访问在测试时,导致持续改善在许多设置,性能表现力极强的模型。简单地增加近邻搜索提高了1.5 BLEU一个国家的最先进的德国英语翻译模型。 $ $ķNN-MT允许通过使用特定域的数据存储,提高平均9.2 BLEU过零触发转移的结果,实现状态的最先进的新成果为一个单一的模型适应于不同领域在这些领域--without培训。一个大型多语种模型也可以专门为特定的语言对,从英语翻译成德语和中国3 BLEU的改进。定性,$ķ$ NN-MT是容易解释的;它结合了源和目标上下文检索高度相关的例子。
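The retrieval step described in the abstract is easy to sketch: cache (decoder hidden state, next token) pairs in a datastore, turn the distances of the k nearest neighbours into a token distribution, and mix it with the base model's distribution. The softmax temperature and interpolation weight below are illustrative choices, not numbers from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, K, TEMP, LAMBDA = 6, 4, 3, 10.0, 0.5  # toy sizes, assumed hyperparameters

# Datastore built offline: one key (decoder hidden state) and one value
# (the gold next-token id) per target position in the training corpus.
keys = rng.normal(size=(50, DIM))
values = rng.integers(0, VOCAB, size=50)

def knn_distribution(query):
    d = np.linalg.norm(keys - query, axis=1)          # L2 distance to every key
    nn = np.argsort(d)[:K]                            # k nearest neighbours
    w = np.exp(-d[nn] / TEMP)                         # closer neighbours weigh more
    w /= w.sum()
    p = np.zeros(VOCAB)
    np.add.at(p, values[nn], w)                       # aggregate weight per token id
    return p

def interpolate(p_model, query, lam=LAMBDA):
    """Final next-token distribution: mix model and kNN predictions."""
    return lam * knn_distribution(query) + (1.0 - lam) * p_model

p_model = np.full(VOCAB, 1.0 / VOCAB)                 # stand-in for the NMT softmax
print(interpolate(p_model, rng.normal(size=DIM)))
```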
22. How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds [PDF] 返回目录
Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston
Abstract: We seek to create agents that both act and communicate with other agents in pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al. 2019)---a large-scale crowd-sourced fantasy text-game---with a dataset of quests. These contain natural language motivations paired with in-game goals and human demonstrations; completing a quest might require dialogue or actions (or both). We introduce a reinforcement learning system that (1) incorporates large-scale language modeling-based and commonsense reasoning-based pre-training to imbue the agent with relevant priors; and (2) leverages a factorized action space of action commands and dialogue, balancing between the two. We conduct zero-shot evaluations using held-out human expert demonstrations, showing that our agents are able to act consistently and talk naturally with respect to their motivations.
摘要:我们寻求建立代理这两个行为,在追求目标的其他代理进行通信。为此,我们延长光源(Urbanek等2019)---大规模人群来源的幻想的文字游戏---与任务的数据集。这些包含在游戏中的目标和人示威配对自然语言的动机;在完成一个任务可能需要对话或行动(或两者)。我们引进了强化学习系统(1)采用了基于模型的大型语言和推理常识前培训,灌输有关先验的代理人;和(2)利用动作命令和对话的因式分解动作空间,两者之间的平衡。我们进行了零次评估采用持有人进行示威活动的专家,这表明我们的代理商能够始终如一地和谈话相对于他们的动机自然行事。
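The "factorized action space of action commands and dialogue" can be pictured as two output heads, one over game actions and one over utterances, with the agent first deciding whether to act or to speak. The sketch below is only a schematic of that factorization with random toy parameters; the paper's actual policy, pre-training, and balancing are far richer:

```python
import numpy as np

rng = np.random.default_rng(0)

ACTIONS = ["go north", "pick up sword", "give coin to merchant"]
UTTERANCES = ["Greetings, traveller.", "Will you trade with me?", "Onward to the quest!"]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def act(state_vec, w_switch, w_act, w_say):
    """Toy factorized policy: first choose act-vs-speak, then choose within that head."""
    p_speak = 1.0 / (1.0 + np.exp(-state_vec @ w_switch))   # probability of speaking
    if rng.random() < p_speak:
        probs = softmax(w_say @ state_vec)
        return "say", UTTERANCES[rng.choice(len(UTTERANCES), p=probs)]
    probs = softmax(w_act @ state_vec)
    return "act", ACTIONS[rng.choice(len(ACTIONS), p=probs)]

DIM = 5
state = rng.normal(size=DIM)                 # stand-in for an encoded game/dialogue state
print(act(state, rng.normal(size=DIM),
          rng.normal(size=(len(ACTIONS), DIM)),
          rng.normal(size=(len(UTTERANCES), DIM))))
```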
23. Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling [PDF] 返回目录
Yan Shvartzshnaider, Ananth Balashankar, Vikas Patidar, Thomas Wies, Lakshminarayanan Subramanian
Abstract: This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity, an established social theory framework for reasoning about privacy norms. Privacy policies, written by lawyers, are lengthy and often comprise incomplete and vague statements. In this paper, we show that traditional NLP tasks, including the recently proposed Question-Answering based solutions, are insufficient to address the privacy parameter extraction problem and provide poor precision and recall. We describe 4 different types of conventional methods that can be partially adapted to address the parameter extraction task with varying degrees of success: Hidden Markov Models, BERT fine-tuned models, Dependency Type Parsing (DP) and Semantic Role Labeling (SRL). Based on a detailed evaluation across 36 real-world privacy policies of major enterprises, we demonstrate that a solution combining syntactic DP coupled with type-specific SRL tasks provides the highest accuracy for retrieving contextual privacy parameters from privacy statements. We also observe that incorporating domain-specific knowledge is critical to achieving high precision and recall, thus inspiring new NLP research to address this important problem in the privacy domain.
摘要:本文从制定的隐私策略中提取的隐私参数,通过语境完整性,推理隐私规范建立的社会理论框架的镜片的新任务。隐私政策,由律师写的,是漫长的,常常包含不完整的,含糊的声明。在本文中,我们证明了传统的自然语言处理任务,包括最近提出的问题回答基础的解决方案,是不足以解决隐私参数提取问题,并提供精确的穷人和召回。我们描述了4种不同类型的常规方法,可以进行部分适于与不同程度的成功解决参数提取任务:隐马尔可夫模型,BERT微调模式,依赖类型分析(DP)和语义角色标注(SRL)。基于各主要企业的36现实世界的隐私政策进行详细的评估,我们证明了结合语法DP的解决方案加上特定类型-SRL任务提供了从隐私权声明获取上下文的隐私参数最高的精度。我们也观察到,结合特定领域的知识,以实现高精确度和召回,从而激发新的NLP研究在隐私领域,以解决这一重要问题的关键。
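A minimal way to see how semantic roles can feed Contextual Integrity parameters is a mapping from SRL-style predicate-argument tuples to the CI slots (sender, recipient, information type, transmission principle). The role inventory and the mapping below are invented for illustration and are not the authors' rules:

```python
# Hypothetical mapping from SRL-style arguments to Contextual Integrity parameters.
ROLE_TO_CI = {
    "ARG0": "sender",                      # who shares the information
    "ARG1": "information_type",            # what is shared
    "ARG2": "recipient",                   # with whom it is shared
    "ARGM-PRP": "transmission_principle",  # purpose / condition of sharing
}

def ci_parameters(srl_frame: dict) -> dict:
    """Project one predicate's labelled arguments onto CI parameter slots."""
    params = {"predicate": srl_frame.get("predicate")}
    for role, span in srl_frame.get("args", {}).items():
        slot = ROLE_TO_CI.get(role)
        if slot:
            params[slot] = span
    return params

frame = {  # toy SRL output for: "We share your email address with advertisers to personalize ads."
    "predicate": "share",
    "args": {"ARG0": "We", "ARG1": "your email address",
             "ARG2": "advertisers", "ARGM-PRP": "to personalize ads"},
}
print(ci_parameters(frame))
```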
24. Near-imperceptible Neural Linguistic Steganography via Self-Adjusting Arithmetic Coding [PDF] 返回目录
Jiaming Shen, Heng Ji, Jiawei Han
Abstract: Linguistic steganography studies how to hide secret messages in natural language cover texts. Traditional methods aim to transform a secret message into an innocent text via lexical substitution or syntactical modification. Recently, advances in neural language models (LMs) enable us to directly generate cover text conditioned on the secret message. In this study, we present a new linguistic steganography method which encodes secret messages using self-adjusting arithmetic coding based on a neural language model. We formally analyze the statistical imperceptibility of this method and empirically show it outperforms the previous state-of-the-art methods on four datasets by 15.3% and 38.9% in terms of bits/word and KL metrics, respectively. Finally, human evaluations show that 51% of generated cover texts can indeed fool eavesdroppers.
摘要:语言隐写术研究如何隐藏在自然语言文字盖秘密信息。传统的方法的目标是变换秘密消息分成通过词法取代或句法变形例的无辜文本。最近,神经语言模型(LMS)的进步使我们能够直接产生封面文字空调的秘密信息。在这项研究中,我们提出了编码使用自调整算法基于神经语言模型编码秘密消息的新的语言隐写术的方法。我们正式分析这个方法的统计不可感知性和经验表明它优于分别位/字和KL指标,计15.3%和38.9%,在四个数据集以前的国家的最先进的方法。最后,人的评估表明,产生的封面文章的51%,确实可以糊弄窃听。
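The core idea, steering a language model's token choices with the secret bit stream, can be shown with a deliberately simplified fixed-width scheme: at each step the next b bits of the secret pick one of the 2^b most probable tokens. The paper's method instead uses self-adjusting arithmetic coding over the full distribution; the sketch below only conveys the flavour, with a toy stand-in for the language model:

```python
# Toy "language model": a fixed next-token distribution regardless of context,
# so the example is deterministic and runnable without any ML dependency.
VOCAB = ["the", "a", "sun", "river", "sings", "sleeps", "softly", "today"]

def lm_next_distribution(context):
    probs = [0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.06, 0.04]
    return sorted(zip(VOCAB, probs), key=lambda t: -t[1])

B = 2  # bits hidden per generated token (fixed-width; the paper's coding is adaptive)

def encode(secret_bits: str, length: int) -> str:
    context, cover = [], []
    chunks = iter([secret_bits[i:i + B] for i in range(0, len(secret_bits), B)])
    for _ in range(length):
        chunk = next(chunks, "0" * B).ljust(B, "0")
        ranked = lm_next_distribution(context)
        token = ranked[int(chunk, 2)][0]      # pick the token indexed by the bit chunk
        cover.append(token)
        context.append(token)
    return " ".join(cover)

def decode(cover_text: str) -> str:
    context, bits = [], []
    for token in cover_text.split():
        ranked = [t for t, _ in lm_next_distribution(context)]
        bits.append(format(ranked.index(token), f"0{B}b"))
        context.append(token)
    return "".join(bits)

stego = encode("1101001011", length=5)
print(stego, "->", decode(stego))
```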
25. Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers [PDF] 返回目录
Hanjie Chen, Yangfeng Ji
Abstract: To build an interpretable neural text classifier, most of the prior work has focused on designing inherently interpretable models or finding faithful explanations. A new line of work on improving model interpretability has just started, and many existing methods require either prior information or human annotations as additional inputs in training. To address this limitation, we propose the variational word mask (VMASK) method to automatically learn task-specific important words and reduce irrelevant information on classification, which ultimately improves the interpretability of model predictions. The proposed method is evaluated with three neural text classifiers (CNN, LSTM, and BERT) on seven benchmark text classification datasets. Experiments show the effectiveness of VMASK in improving both model prediction accuracy and interpretability.
摘要:为建立可解释的神经文本分类,大部分以前的工作都集中于设计本身解释模型或找到忠实的解释。工作对提高模型解释性新线刚刚起步,许多现有方法需要事先两种信息或人的注释,如额外的培训投入。为了解决这个限制,我们提出了变字面具(VMASK)方法来自动学习特定任务的重要词汇和减少对分类不相关的信息,从而最终提高模型预测的可解释性。该方法与七个基准文本分类数据集3个神经文本分类(CNN,LSTM和BERT)进行评估。实验表明,在同时提高模型的预测准确度和可解释性VMASK的有效性。
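The mechanism, a learned per-word mask that gates how much of each word's embedding reaches the classifier, can be sketched as a single forward pass. Everything below (the dimensions, the deterministic sigmoid gate, mean pooling) is a simplified stand-in; the actual VMASK layer is variational and is trained jointly with the classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CLASSES = 6, 2

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(embeddings, w_mask, w_clf):
    """Gate each word embedding with a learned mask, then classify the masked average."""
    scores = embeddings @ w_mask                 # one scalar score per word
    mask = sigmoid(scores)                       # soft keep/drop decision in (0, 1)
    masked = embeddings * mask[:, None]          # irrelevant words are attenuated
    pooled = masked.mean(axis=0)
    logits = w_clf @ pooled
    return mask, np.exp(logits) / np.exp(logits).sum()

sentence = rng.normal(size=(5, DIM))             # stand-in for 5 word embeddings
mask, probs = forward(sentence, rng.normal(size=DIM), rng.normal(size=(N_CLASSES, DIM)))
print("word importances:", np.round(mask, 2))
print("class probabilities:", np.round(probs, 2))
```

The mask values double as word-level importance scores, which is where the improved interpretability comes from.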
26. Predicting User Engagement Status for Online Evaluation of Intelligent Assistants [PDF] 返回目录
Rui Meng, Zhen Yue, Alyssa Glass
Abstract: Evaluation of intelligent assistants in large-scale and online settings remains an open challenge. User behavior-based online evaluation metrics have demonstrated great effectiveness for monitoring large-scale web search and recommender systems. Therefore, we consider predicting user engagement status as the very first and critical step to online evaluation for intelligent assistants. In this work, we first proposed a novel framework for classifying user engagement status into four categories -- fulfillment, continuation, reformulation and abandonment. We then demonstrated how to design simple but indicative metrics based on the framework to quantify user engagement levels. We also aim for automating user engagement prediction with machine learning methods. We compare various models and features for predicting engagement status using four real-world datasets. We conducted detailed analyses on features and failure cases to discuss the performance of current models as well as challenges.
摘要:在大型和网络设置智能助手的评价仍然是一个悬而未决的挑战。网上评价指标基于用户的行为已经证明用于监控大型网络搜索和推荐系统的巨大效力。因此,我们认为预测用户参与状态。因为第一个和关键的一步网上评估智能助手。在这项工作中,我们首先提出了用户参与的状态分成四类一个新的框架 - 履行,延续,修订和遗弃。然后,我们演示了如何设计框架基础上,以用户进行量化敬业度简单,但指示指标。我们的目标是用机器学习方法自动化用户参与预测。我们比较各种型号和功能预测使用四个真实世界的数据集互动状态。我们进行了详细的功能和失败的案例分析,讨论当前的模型和挑战的表现。
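Once sessions are labelled with the four engagement categories, the prediction step is a standard multi-class classification problem over behavioural features. The feature names and the tiny synthetic dataset below are made up for illustration only and are not the paper's feature set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

LABELS = ["fulfillment", "continuation", "reformulation", "abandonment"]

# Hypothetical behavioural features per session:
# [n_follow_up_queries, dwell_time_seconds, query_overlap_with_previous, task_completed]
X = np.array([
    [0,  40.0, 0.0, 1],
    [2,  95.0, 0.2, 0],
    [1,  30.0, 0.8, 0],
    [0,   5.0, 0.0, 0],
    [0,  55.0, 0.1, 1],
    [3, 120.0, 0.7, 0],
])
y = np.array([0, 1, 2, 3, 0, 2])   # indices into LABELS

clf = LogisticRegression(max_iter=1000).fit(X, y)
new_session = np.array([[1, 20.0, 0.9, 0]])
print(LABELS[int(clf.predict(new_session)[0])])
```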
27. Discontinuous Constituent Parsing as Sequence Labeling [PDF] 返回目录
David Vilares, Carlos Gómez-Rodríguez
Abstract: This paper reduces discontinuous parsing to sequence labeling. It first shows that existing reductions for constituent parsing as labeling do not support discontinuities. Second, it fills this gap and proposes to encode tree discontinuities as nearly ordered permutations of the input sequence. Third, it studies whether such discontinuous representations are learnable. The experiments show that despite the architectural simplicity, under the right representation, the models are fast and accurate.
摘要:降低非连续解析到序列标注。它首先表明,对于现有成分减少解析为标签不支持不连续。其次,它填补了这一空白,并建议编码树的不连续性作为输入序列的近有序排列。第三,研究了这种不连续的表示是否可学习。实验结果表明,尽管建筑简单,正确的表象下,模型是快速,准确。
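The key trick, treating the discontinuity itself as a per-token label derived from a permutation of the sentence, can be shown in a toy way: reorder the words so every constituent becomes contiguous, and record for each word the position it moves to. This is only a cartoon of "nearly ordered permutations"; the paper's actual label encodings are more involved:

```python
# Toy illustration: "A hearing ... on the issue" is a discontinuous constituent.
sentence = ["A", "hearing", "is", "scheduled", "on", "the", "issue", "today"]

# Hypothetical canonical order in which all constituents are contiguous.
canonical = ["A", "hearing", "on", "the", "issue", "is", "scheduled", "today"]

# Per-token label: the position each word occupies in the canonical order.
permutation = [canonical.index(w) for w in sentence]
print(permutation)            # [0, 1, 5, 6, 2, 3, 4, 7] -- "nearly ordered"

# Decoding is just inverting the permutation.
restored = [None] * len(sentence)
for src, tgt in enumerate(permutation):
    restored[tgt] = sentence[src]
print(restored == canonical)  # True
```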
28. Building Large Lexicalized Ontologies from Text: a Use Case in Automatic Indexing of Biotechnology Patents [PDF] 返回目录
Claire Nédellec, Wiktoria Golik, Sophie Aubin, Robert Bossy
Abstract: This paper presents a tool, TyDI, and methods experimented in the building of a termino-ontology, i.e. a lexicalized ontology aimed at fine-grained indexation for semantic search applications. TyDI provides facilities for knowledge engineers and domain experts to efficiently collaborate to validate, organize and conceptualize corpus extracted terms. A use case on biotechnology patent search demonstrates TyDI's potential.
摘要:本文提出了一种工具,泰迪和方法在termino-本体建设试验,即一个词汇化的本体,旨在为语义搜索应用的细粒度指数。泰迪提供知识工程师和领域专家设施,以有效协作来验证,组织和概念化语料库提取的字词。生物技术专利检索用例演示了泰迪的潜力。
29. Contrastive Learning of Medical Visual Representations from Paired Images and Text [PDF] 返回目录
Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D. Manning, Curtis P. Langlotz
Abstract: Learning visual representations of medical images is core to medical image understanding but its progress has been held back by the small size of hand-labeled datasets. Existing work commonly relies on transferring weights from ImageNet pretraining, which is suboptimal due to drastically different image characteristics, or rule-based label extraction from the textual report data paired with medical images, which is inaccurate and hard to generalize. We propose an alternative unsupervised strategy to learn medical visual representations directly from the naturally occurring pairing of images and textual data. Our method of pretraining medical image encoders with the paired text data via a bidirectional contrastive objective between the two modalities is domain-agnostic, and requires no additional expert input. We test our method by transferring our pretrained weights to 4 medical image classification tasks and 2 zero-shot retrieval tasks, and show that our method leads to image representations that considerably outperform strong baselines in most settings. Notably, in all 4 classification tasks, our method requires only 10% as much labeled training data as an ImageNet initialized counterpart to achieve better or comparable performance, demonstrating superior data efficiency.
摘要:学习医学图像的视觉表现为核心,以医疗图像理解,但其进展一直由手工标记的数据集的规模小了。现有的工作通常依赖于从ImageNet训练前,这是不理想的,由于完全不同的图像特征,或者从医学图像配对的文字报告数据,这是不准确的,很难一概而论基于规则的标签提取转移的权重。我们提出替代无人监督的策略,直接从图像和文本数据的自然发生的配对学习医学可视化表示。我们通过两个模态之间的双向对比目标训练前与成对的文本数据的医用图像编码器的方法是结构域无关的,并且不需要附加的专门的输入。我们通过我们的预训练的权重转移到4个医学图像分类任务和2个零次检索任务,并显示测试我们的方法,我们的方法导致图像表示,在大多数的设置大大超越强大的基线。值得注意的是,在所有4个分类任务,我们的方法只需要10%之多标记的训练数据作为ImageNet初始化配对,以达到更好或相当的性能,展示了出色的数据效率。
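The "bidirectional contrastive objective between the two modalities" is essentially a symmetric InfoNCE loss over paired image and text embeddings: each image should score highest with its own report, and vice versa. The sketch below uses random vectors and an assumed temperature; it shows the shape of the loss, not the authors' encoders or training setup:

```python
import numpy as np

rng = np.random.default_rng(0)
BATCH, DIM, TAU = 4, 8, 0.1   # TAU is an assumed temperature

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def bidirectional_contrastive_loss(img_emb, txt_emb):
    """Symmetric cross-entropy: image i should match text i, and text i should match image i."""
    sim = normalize(img_emb) @ normalize(txt_emb).T / TAU   # (BATCH, BATCH) similarity matrix
    targets = np.arange(len(sim))
    img_to_txt = -log_softmax(sim)[targets, targets].mean()
    txt_to_img = -log_softmax(sim.T)[targets, targets].mean()
    return 0.5 * (img_to_txt + txt_to_img)

img = rng.normal(size=(BATCH, DIM))   # stand-ins for image-encoder outputs
txt = rng.normal(size=(BATCH, DIM))   # stand-ins for text-encoder outputs
print(bidirectional_contrastive_loss(img, txt))
```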
30. Cycle-Consistent Adversarial Autoencoders for Unsupervised Text Style Transfer [PDF] 返回目录
Yufang Huang, Wentao Zhu, Deyi Xiong, Yiye Zhang, Changjian Hu, Feiyu Xu
Abstract: Unsupervised text style transfer is full of challenges due to the lack of parallel data and difficulties in content preservation. In this paper, we propose a novel neural approach to unsupervised text style transfer, which we refer to as Cycle-consistent Adversarial autoEncoders (CAE) trained from non-parallel data. CAE consists of three essential components: (1) LSTM autoencoders that encode a text in one style into its latent representation and decode an encoded representation into its original text or a transferred representation into a style-transferred text, (2) adversarial style transfer networks that use an adversarially trained generator to transform a latent representation in one style into a representation in another style, and (3) a cycle-consistent constraint that enhances the capacity of the adversarial style transfer networks in content preservation. The entire CAE with these three components can be trained end-to-end. Extensive experiments and in-depth analyses on two widely-used public datasets consistently validate the effectiveness of proposed CAE in both style transfer and content preservation against several strong baselines in terms of four automatic evaluation metrics and human evaluation.
摘要:无监督的文本样式转移是充满挑战,由于缺乏并行数据和内容保存困难。在本文中,我们提出了无监督文本样式转移,我们称之为非并行数据训练周期一致对抗性自动编码(CAE)一种新型的神经途径。 CAE由三个基本部件组成:(1)LSTM自动编码,在一个风格文本编码成它的潜表示和解码的编码表示成其原始文本或转移表示成式转印文本,(2)对抗式传送网络使用一个adversarially训练发生器在一个风格转换潜表示形式转换为表示在另一种风格,以及(3)一个周期一致的约束增强在保存内容的对抗式传输网络的容量。这三个组件的整个CAE可以训练结束到终端。两种广泛使用的公共数据集大量的实验和深入的分析一致验证提出CAE技术在四个自动评价指标和评价人既条款转让的风格和内容保存对几种强基线的有效性。
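The cycle-consistent constraint is the easiest of the three components to isolate: transfer a sentence's latent code from style A to style B and back, then penalize the distance to the original code. The linear "transfer networks" below are toy stand-ins so the computation runs; the real model uses LSTM autoencoders and adversarially trained transfer networks as described above:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 6

# Toy stand-ins for the two style-transfer networks: simple linear maps
# between the latent spaces of style A and style B.
W_a2b = rng.normal(size=(DIM, DIM)) * 0.3 + np.eye(DIM)
W_b2a = rng.normal(size=(DIM, DIM)) * 0.3 + np.eye(DIM)

def transfer_a_to_b(z):
    return W_a2b @ z

def transfer_b_to_a(z):
    return W_b2a @ z

def cycle_consistency_loss(z_a):
    """Round-trip A -> B -> A should land close to the starting latent code."""
    z_cycled = transfer_b_to_a(transfer_a_to_b(z_a))
    return float(np.mean((z_cycled - z_a) ** 2))

z_a = rng.normal(size=DIM)          # stand-in for an LSTM autoencoder's latent code
print(cycle_consistency_loss(z_a))  # minimized during training alongside the other losses
```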
31. Evaluating a Generative Adversarial Framework for Information Retrieval [PDF] 返回目录
Ameet Deshpande, Mitesh M. Khapra
Abstract: Recent advances in Generative Adversarial Networks (GANs) have resulted in its widespread applications to multiple domains. A recent model, IRGAN, applies this framework to Information Retrieval (IR) and has gained significant attention over the last few years. In this focused work, we critically analyze multiple components of IRGAN, while providing experimental and theoretical evidence of some of its shortcomings. Specifically, we identify issues with the constant baseline term in the policy gradients optimization and show that the generator harms IRGAN's performance. Motivated by our findings, we propose two models influenced by self-contrastive estimation and co-training which outperform IRGAN on two out of the three tasks considered.
摘要:在创成对抗性网络(甘斯)的最新进展已经导致其被广泛应用到多个领域。最近的一个模型,IRGAN,应用该框架信息检索(IR),并已获得显著的关注,在过去的几年。在这个集中的工作,我们批判地分析IRGAN的多个组件,同时提供的一些缺点实验和理论依据。具体而言,我们认同在政策梯度优化的恒定基线项,表明发电机危害IRGAN的性能问题。通过我们的研究结果的启发,我们提出通过自我对比估计和联合培训,跑赢IRGAN上三分之二的考虑的三个任务的影响两种车型。
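The "constant baseline term in the policy gradients optimization" that the authors analyse can be made concrete with a tiny REINFORCE example: the gradient estimate weights the score function by (reward - baseline), so the baseline leaves the expected gradient unchanged but alters its variance. The toy policy and reward values below are invented for illustration and have nothing to do with IRGAN's actual retrieval setup:

```python
import numpy as np

rng = np.random.default_rng(0)
N_DOCS, SAMPLES = 4, 5000

theta = rng.normal(size=N_DOCS)          # generator's unnormalized scores over documents
rewards = np.array([1.0, 0.0, 0.2, 0.0]) # toy relevance signal from the "discriminator"

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_grad(baseline):
    """Monte-Carlo REINFORCE estimate of d/d theta E[reward], with a constant baseline."""
    probs = softmax(theta)
    grads = np.zeros_like(theta)
    for _ in range(SAMPLES):
        a = rng.choice(N_DOCS, p=probs)
        score = -probs.copy()
        score[a] += 1.0                  # d log pi(a) / d theta for a softmax policy
        grads += (rewards[a] - baseline) * score
    return grads / SAMPLES

print("baseline 0.0:", np.round(reinforce_grad(0.0), 3))
print("baseline 0.3:", np.round(reinforce_grad(0.3), 3))  # same expectation, different variance
```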
Note: the Chinese abstracts are machine translations. The cover image is a word cloud of the paper titles.