Table of Contents
5. Building Legal Case Retrieval Systems with Lexical Matching and Summarization using A Pre-Trained Phrase Scoring Model [PDF] Abstract
8. Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models [PDF] Abstract
13. Fake News Spreader Detection on Twitter using Character N-Grams. Notebook for PAN at CLEF 2020 [PDF] Abstract
16. SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery [PDF] Abstract
17. A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation [PDF] Abstract
18. Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation [PDF] Abstract
27. The design and implementation of Language Learning Chatbot with XAI using Ontology and Transfer Learning [PDF] Abstract
30. VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training [PDF] Abstract
Abstracts
1. Contrastive Distillation on Intermediate Representations for Language Model Compression [PDF] Back to Contents
Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu
Abstract: Existing language model compression methods mostly use a simple L2 loss to distill knowledge from the intermediate representations of a large BERT model into a smaller one. Although widely used, this objective by design assumes that all the dimensions of hidden representations are independent, failing to capture important structural knowledge in the intermediate layers of the teacher network. To achieve better distillation efficacy, we propose Contrastive Distillation on Intermediate Representations (CoDIR), a principled knowledge distillation framework where the student is trained to distill knowledge through intermediate layers of the teacher via a contrastive objective. By learning to distinguish the positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of rich information in the teacher's hidden layers. CoDIR can be readily applied to compress large-scale language models in both the pre-training and fine-tuning stages, and achieves superb performance on the GLUE benchmark, outperforming state-of-the-art compression methods.
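The following is a minimal sketch of a contrastive objective over intermediate representations in the spirit of CoDIR; the pooled inputs, cosine similarity, temperature, and tensor shapes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_h, teacher_pos, teacher_neg, temperature=0.1):
    """student_h:   (B, D) pooled student intermediate representation
       teacher_pos: (B, D) teacher representation of the same input
       teacher_neg: (B, K, D) teacher representations of K negative samples"""
    s = F.normalize(student_h, dim=-1)
    p = F.normalize(teacher_pos, dim=-1)
    n = F.normalize(teacher_neg, dim=-1)
    pos = (s * p).sum(-1, keepdim=True)          # (B, 1) positive similarity
    neg = torch.einsum("bd,bkd->bk", s, n)       # (B, K) negative similarities
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, labels)

loss = contrastive_distill_loss(torch.randn(4, 768), torch.randn(4, 768),
                                torch.randn(4, 16, 768))
```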
2. Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank [PDF] Back to Contents
Ethan C. Chau, Lucy H. Lin, Noah A. Smith
Abstract: Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This presents a challenge for language varieties unfamiliar to these models, whose labeled \emph{and unlabeled} data is too limited to train a monolingual model effectively. We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings. Using dependency parsing of four diverse low-resource language varieties as a case study, we show that these methods significantly improve performance over baselines, especially in the lowest-resource cases, and demonstrate the importance of the relationship between such models' pretraining data and target language varieties.
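As a concrete illustration of vocabulary augmentation, the sketch below adds new word pieces to multilingual BERT and resizes the embedding matrix via the Hugging Face transformers API; the token list is a hypothetical placeholder, and continued language-specific pretraining would then run the usual masked-language-modeling objective on monolingual text.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Hypothetical in-language word pieces missing from mBERT's vocabulary.
new_pieces = ["##ngomong", "wiçin"]  # placeholder tokens, not from the paper
tokenizer.add_tokens(new_pieces)
model.resize_token_embeddings(len(tokenizer))  # new rows are randomly initialised
```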
3. A Survey on Semantic Parsing from the perspective of Compositionality [PDF] Back to Contents
Pawan Kumar, Srikanta Bedathur
Abstract: Different from previous surveys in semantic parsing (Kamath and Das, 2018) and knowledge base question answering (KBQA) (Chakraborty et al., 2019; Zhu et al., 2019; Hoffner et al., 2017), we try to take a different perspective on the study of semantic parsing. Specifically, we focus on (a) meaning composition from syntactic structure (Partee, 1975), and (b) the ability of semantic parsers to handle lexical variation given the context of a knowledge base (KB). After an introduction to the field of semantic parsing and its uses in KBQA, we describe meaning representation using the grammar formalism CCG (Steedman, 1996). We discuss semantic composition using formal languages in Section 2. In Section 3 we consider systems that use formal languages, e.g., $\lambda$-calculus (Steedman, 1996) and $\lambda$-DCS (Liang, 2013). Sections 4 and 5 consider semantic parsers that use structured languages for the logical form. Section 6 covers benchmark datasets, ComplexQuestions (Bao et al., 2016) and GraphQuestions (Su et al., 2016), that can be used to evaluate semantic parsers on their ability to answer complex questions that are highly compositional in nature.
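To make the notion of meaning composition from syntactic structure concrete, here is a toy λ-calculus-style composition written with Python lambdas; the predicates and the miniature knowledge base are invented for illustration.

```python
KB = {("state", "texas"), ("state", "oklahoma"), ("borders", "texas", "oklahoma")}

state = lambda x: ("state", x) in KB
borders = lambda y: lambda x: ("borders", y, x) in KB

# "states that border Texas"  ~  \x. state(x) AND borders(texas)(x)
meaning = lambda x: state(x) and borders("texas")(x)

entities = {"texas", "oklahoma"}
print([e for e in entities if meaning(e)])  # -> ['oklahoma']
```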
4. Improving Low Compute Language Modeling with In-Domain Embedding Initialisation [PDF] Back to Contents
Charles Welch, Rada Mihalcea, Jonathan K. Kummerfeld
Abstract: Many NLP applications, such as biomedical data and technical support, have 10-100 million tokens of in-domain data and limited computational resources for learning from it. How should we train a language model in this scenario? Most language modeling research considers either a small dataset with a closed vocabulary (like the standard 1 million token Penn Treebank), or the whole web with byte-pair encoding. We show that for our target setting in English, initialising and freezing input embeddings using in-domain data can improve language model performance by providing a useful representation of rare words, and this pattern holds across several different domains. In the process, we show that the standard convention of tying input and output embeddings does not improve perplexity when initializing with embeddings trained on in-domain data.
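A minimal PyTorch sketch of the setup described here: input embeddings initialised from vectors trained on in-domain text and then frozen, with an untied, trainable output projection. The vocabulary size, dimensionality, and random stand-in vectors are assumptions.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 50_000, 300
in_domain_vectors = torch.randn(vocab_size, emb_dim)  # stand-in for vectors trained on in-domain data

# Initialise and freeze the input embeddings.
embedding = nn.Embedding.from_pretrained(in_domain_vectors, freeze=True)

# Per the paper's finding, do not tie these to the output layer; keep the
# output projection as a separate, trainable parameter matrix.
output_proj = nn.Linear(emb_dim, vocab_size, bias=False)
```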
5. Building Legal Case Retrieval Systems with Lexical Matching and Summarization using A Pre-Trained Phrase Scoring Model [PDF] Back to Contents
Vu Tran, Minh Le Nguyen, Ken Satoh
Abstract: We present our method for tackling the legal case retrieval task of the Competition on Legal Information Extraction/Entailment 2019. Our approach is based on the idea that summarization is important for retrieval. On one hand, we adopt a summarization based model called encoded summarization which encodes a given document into continuous vector space which embeds the summary properties of the document. We utilize the resource of COLIEE 2018 on which we train the document representation model. On the other hand, we extract lexical features on different parts of a given query and its candidates. We observe that by comparing different parts of the query and its candidates, we can achieve better performance. Furthermore, the combination of the lexical features with latent features by the summarization-based method achieves even better performance. We have achieved the state-of-the-art result for the task on the benchmark of the competition.
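At a high level, the described system scores candidates with both lexical matching and a latent summary-aware representation; the sketch below interpolates the two signals, with a TF-IDF cosine standing in for the lexical features and random vectors standing in for the pre-trained phrase scoring encoder.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

query = "alleged breach of contract over delayed delivery"
candidates = ["contract dispute about late delivery of goods",
              "criminal appeal concerning sentencing guidelines"]

# Lexical score: TF-IDF cosine between the query and each candidate case.
tfidf = TfidfVectorizer().fit([query] + candidates)
lex = cosine_similarity(tfidf.transform([query]), tfidf.transform(candidates))[0]

# Latent score: cosine between dense document vectors (random stand-in encoder).
rng = np.random.default_rng(0)
encode = lambda texts: rng.normal(size=(len(texts), 64))
lat = cosine_similarity(encode([query]), encode(candidates))[0]

scores = 0.5 * lex + 0.5 * lat          # assumed interpolation weight
print(scores.argsort()[::-1])           # candidate ranking
```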
6. Neural Topic Modeling by Incorporating Document Relationship Graph [PDF] Back to Contents
Deyu Zhou, Xuemeng Hu, Rui Wang
Abstract: Graph Neural Networks (GNNs) that capture the relationships between graph nodes via message passing have been a hot research direction in the natural language processing community. In this paper, we propose Graph Topic Model (GTM), a GNN based neural topic model that represents a corpus as a document relationship graph. Documents and words in the corpus become nodes in the graph and are connected based on document-word co-occurrences. By introducing the graph structure, the relationships between documents are established through their shared words and thus the topical representation of a document is enriched by aggregating information from its neighboring nodes using graph convolution. Extensive experiments on three datasets were conducted and the results demonstrate the effectiveness of the proposed approach.
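The sketch below builds the kind of document-word co-occurrence graph the abstract describes and runs one generic graph-convolution step over it; the symmetric normalisation, feature dimensions, and random initial features are standard GCN conventions assumed for illustration, not GTM's exact architecture.

```python
import torch

docs = [["graph", "topic", "model"], ["neural", "topic", "model"]]
vocab = sorted({w for d in docs for w in d})
n_docs = len(docs)
n = n_docs + len(vocab)                   # documents and words are all nodes

A = torch.eye(n)                          # adjacency with self-loops
for i, d in enumerate(docs):
    for w in d:                           # connect each document to its words
        j = n_docs + vocab.index(w)
        A[i, j] = A[j, i] = 1.0

deg = A.sum(1)
A_hat = A / torch.sqrt(deg[:, None] * deg[None, :])   # symmetric normalisation

X = torch.randn(n, 32)                    # placeholder initial node features
W = torch.randn(32, 16)
H = torch.relu(A_hat @ X @ W)             # one graph-convolution layer
```

Documents sharing words sit two hops apart in this graph, which is how graph convolution lets neighbouring documents enrich each other's topical representation.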
7. Neural Topic Modeling with Cycle-Consistent Adversarial Training [PDF] Back to Contents
Xuemeng Hu, Rui Wang, Deyu Zhou, Yuxuan Xiong
Abstract: Advances in deep generative models have attracted significant research interest in neural topic modeling. The recently proposed Adversarial-neural Topic Model models topics with an adversarially trained generator network and employs a Dirichlet prior to capture the semantic patterns in latent topics. It is effective in discovering coherent topics but unable to infer topic distributions for given documents or utilize available document labels. To overcome such limitations, we propose Topic Modeling with Cycle-consistent Adversarial Training (ToMCAT) and its supervised version sToMCAT. ToMCAT employs a generator network to interpret topics and an encoder network to infer document topics. Adversarial training and cycle-consistent constraints are used to encourage the generator and the encoder to produce realistic samples that coordinate with each other. sToMCAT extends ToMCAT by incorporating document labels into the topic modeling process to help discover more coherent topics. The effectiveness of the proposed models is evaluated on unsupervised/supervised topic modeling and text classification. The experimental results show that our models can produce both coherent and informative topics, outperforming a number of competitive baselines.
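A rough sketch of the cycle-consistency idea only (the adversarial discriminators are omitted): encode a document to a topic mixture and decode it back, and likewise for sampled topic vectors, penalising both reconstruction gaps. The two linear networks and the L1 penalties are illustrative assumptions.

```python
import torch
import torch.nn as nn

vocab, n_topics = 2000, 50
encoder = nn.Sequential(nn.Linear(vocab, n_topics), nn.Softmax(dim=-1))
generator = nn.Linear(n_topics, vocab)        # topic mixture -> word space

bow = torch.rand(8, vocab)                    # batch of bag-of-words vectors
cycle_doc = (generator(encoder(bow)) - bow).abs().mean()   # doc -> topics -> doc

z = torch.distributions.Dirichlet(torch.ones(n_topics)).sample((8,))
cycle_topic = (encoder(generator(z)) - z).abs().mean()     # topics -> doc -> topics

cycle_loss = cycle_doc + cycle_topic          # added to the adversarial objectives
```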
8. Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models [PDF] Back to Contents
YuSheng Su, Xu Han, Zhengyan Zhang, Peng Li, Zhiyuan Liu, Yankai Lin, Jie Zhou, Maosong Sun
Abstract: Several recent efforts have been devoted to enhancing pre-trained language models (PLMs) by utilizing extra heterogeneous knowledge in knowledge graphs (KGs), and achieved consistent improvements on various knowledge-driven NLP tasks. However, most of these knowledge-enhanced PLMs embed static sub-graphs of KGs ("knowledge context"), ignoring the fact that the knowledge required by PLMs may change dynamically according to the specific text ("textual context"). In this paper, we propose a novel framework named DKPLM that dynamically selects and embeds knowledge context according to the textual context for PLMs, which can avoid the effect of redundant and ambiguous knowledge in KGs that does not match the input text. Our experimental results show that DKPLM outperforms various baselines on typical knowledge-driven NLP tasks, indicating the effectiveness of utilizing dynamic knowledge context for language understanding. Besides the performance improvements, the dynamically selected knowledge in DKPLM can describe the semantics of text-related knowledge in a more interpretable form than conventional PLMs. Our source code and datasets will be made available to provide more details for DKPLM.
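A toy sketch of what dynamic knowledge selection can look like: score candidate KG triples against the textual context and keep only the best matches. The encoders here are random stand-ins, and the dot-product scoring is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
embed_text = lambda s: rng.normal(size=64)     # placeholder text encoder
embed_triple = lambda t: rng.normal(size=64)   # placeholder KG encoder

def select_knowledge(sentence, triples, k=2):
    q = embed_text(sentence)
    scored = [(float(np.dot(q, embed_triple(t))), t) for t in triples]
    scored.sort(reverse=True)                  # keep best-matching facts; drop
    return [t for _, t in scored[:k]]          # redundant or ambiguous ones

triples = [("Mozart", "occupation", "composer"),
           ("Mozart", "born_in", "Salzburg"),
           ("Salzburg", "located_in", "Austria")]
print(select_knowledge("Mozart wrote his first symphony at eight.", triples))
```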
9. Aligning Intraobserver Agreement by Transitivity [PDF] Back to Contents
Jacopo Amidei
Abstract: Annotation reproducibility and accuracy rely on good consistency within annotators. We propose a novel method for measuring within annotator consistency or annotator Intraobserver Agreement (IA). The proposed approach is based on transitivity, a measure that has been thoroughly studied in the context of rational decision-making. The transitivity measure, in contrast with the commonly used test-retest strategy for annotator IA, is less sensitive to the several types of bias introduced by the test-retest strategy. We present a representation theorem to the effect that relative judgement data that meet transitivity can be mapped to a scale (in terms of measurement theory). We also discuss a further application of transitivity as part of data collection design for addressing the problem of the quadratic complexity of data collection of relative judgements.
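To illustrate the measure, the snippet below counts transitivity violations in one annotator's pairwise relative judgements; the toy preference set is invented, and a real IA score would normalise such counts.

```python
from itertools import permutations

prefs = {("a", "b"), ("b", "c"), ("c", "a")}   # toy judgements: a>b, b>c, c>a

items = {x for pair in prefs for x in pair}
violations = sum(1 for a, b, c in permutations(items, 3)
                 if (a, b) in prefs and (b, c) in prefs and (a, c) not in prefs)
print(violations)   # 3 ordered triples witness the intransitive cycle
```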
10. Utterance-level Dialogue Understanding: An Empirical Study [PDF] Back to Contents
Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria
Abstract: The recent abundance of conversational data on the Web and elsewhere calls for effective NLP systems for dialog understanding. Complete utterance-level understanding often requires context understanding, defined by nearby utterances. In recent years, a number of approaches have been proposed for various utterance-level dialogue understanding tasks. Most of these approaches account for the context for effective understanding. In this paper, we explore and quantify the role of context for different aspects of a dialogue, namely emotion, intent, and dialogue act identification, using state-of-the-art dialog understanding methods as baselines. Specifically, we employ various perturbations to distort the context of a given utterance and study its impact on the different tasks and baselines. This provides us with insights into the fundamental contextual controlling factors of different aspects of a dialogue. Such insights can inspire more effective dialogue understanding models, and provide support for future text generation approaches. The implementation pertaining to this work is available at this https URL dialogue-understanding.
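A sketch of the kind of context perturbation described: shuffle or drop the utterances surrounding a target utterance before re-running emotion, intent, or dialogue-act prediction. The specific modes and the halving heuristic are assumptions for illustration.

```python
import random

def perturb_context(dialogue, target_idx, mode="shuffle", seed=0):
    rng = random.Random(seed)
    context = [u for i, u in enumerate(dialogue) if i != target_idx]
    if mode == "shuffle":
        rng.shuffle(context)                     # distort utterance order
    elif mode == "drop":
        context = context[: len(context) // 2]   # remove part of the context
    return context, dialogue[target_idx]

dialogue = ["Hi!", "I lost my wallet.", "Oh no, that's awful.", "Yeah..."]
print(perturb_context(dialogue, target_idx=2, mode="shuffle"))
```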
11. Sequence-to-Sequence Learning for Indonesian Automatic Question Generator [PDF] Back to Contents
Ferdiant Joshua Muis, Ayu Purwarianti
Abstract: Automatic question generation is defined as the task of automating the creation of questions from various kinds of textual data. Research in automatic question generators (AQG) has been conducted for more than 10 years, mainly focused on factoid questions. In all these studies, the state of the art is attained using a sequence-to-sequence approach. However, AQG systems for Indonesian have not been researched intensively. In this work we construct an Indonesian automatic question generator, adapting the architecture from some previous works. In summary, we used a sequence-to-sequence approach using BiGRU, BiLSTM, and Transformer with additional linguistic features, a copy mechanism, and a coverage mechanism. Since there is no large, popular public Indonesian dataset for question generation, we translated the SQuAD v2.0 factoid question answering dataset, with an additional Indonesian TyDiQA dev set for testing. The system achieved BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 38.35, 20.96, 10.68, 5.78, and 43.4 on SQuAD, and 39.9, 20.78, 10.26, 6.31, and 44.13 on TyDiQA, respectively. The system performed well when the expected answers are named entities and are syntactically close to the context explaining them. Additionally, from a native Indonesian perspective, the best questions generated by our best models in their best cases are acceptable and reasonably useful.
12. Utility is in the Eye of the User: A Critique of NLP Leaderboards [PDF] Back to Contents
Kawin Ethayarajh, Dan Jurafsky
Abstract: Benchmarks such as GLUE have helped drive advances in NLP by incentivizing the creation of more accurate models. While this leaderboard paradigm has been remarkably successful, a historical focus on performance-based evaluation has been at the expense of other qualities that the NLP community values in models, such as compactness, fairness, and energy efficiency. In this opinion paper, we study the divergence between what is incentivized by leaderboards and what is useful in practice through the lens of microeconomic theory. We frame both the leaderboard and NLP practitioners as consumers and the benefit they get from a model as its utility to them. With this framing, we formalize how leaderboards -- in their current form -- can be poor proxies for the NLP community at large. For example, a highly inefficient model would provide less utility to practitioners but not to a leaderboard, since it is a cost that only the former must bear. To allow practitioners to better estimate a model's utility to them, we advocate for more transparency on leaderboards, such as the reporting of statistics that are of practical concern (e.g., model size, energy efficiency, and inference latency).
13. Fake News Spreader Detection on Twitter using Character N-Grams. Notebook for PAN at CLEF 2020 [PDF] Back to Contents
Inna Vogel, Meghana Meghana
Abstract: The authors of fake news often use facts from verified news sources and mix them with misinformation to create confusion and provoke unrest among the readers. The spread of fake news can thereby have serious implications on our society. They can sway political elections, push down the stock price or crush reputations of corporations or public figures. Several websites have taken on the mission of checking rumors and allegations, but are often not fast enough to check the content of all the news being disseminated. Especially social media websites have offered an easy platform for the fast propagation of information. Towards limiting fake news from being propagated among social media users, the task of this year's PAN 2020 challenge lays the focus on the fake news spreaders. The aim of the task is to determine whether it is possible to discriminate authors that have shared fake news in the past from those that have never done it. In this notebook, we describe our profiling system for the fake news detection task on Twitter. For this, we conduct different feature extraction techniques and learning experiments from a multilingual perspective, namely English and Spanish. Our final submitted systems use character n-grams as features in combination with a linear SVM for English and Logistic Regression for the Spanish language. Our submitted models achieve an overall accuracy of 73% and 79% on the English and Spanish official test set, respectively. Our experiments show that it is difficult to differentiate solidly fake news spreaders on Twitter from users who share credible information leaving room for further investigations. Our model ranked 3rd out of 72 competitors.
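The described English pipeline maps naturally onto a few lines of scikit-learn: character n-gram TF-IDF features feeding a linear SVM. The n-gram range and toy feed texts below are assumptions; per the abstract, the Spanish system would swap in LogisticRegression.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

feeds = ["totally true, share before they delete it!!!",
         "city council approves new budget after public hearing"]
labels = [1, 0]   # 1 = fake news spreader, 0 = not (toy examples)

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 5)),  # character n-grams
    LinearSVC(),
)
clf.fit(feeds, labels)
print(clf.predict(["you won't believe what they are hiding!!!"]))
```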
14. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [PDF] Back to Contents
Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong
Abstract: We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel text-schema linking objective that predicts the syntactic role of a table field in the SQL for each question-SQL pair. To maintain the model's ability to represent real-world data, we also include masked language modeling (MLM) over several existing table-and-language datasets to regularize the pre-training process. On four popular fully supervised and weakly supervised table semantic parsing benchmarks, GraPPa significantly outperforms RoBERTa-large as the feature representation layers and establishes new state-of-the-art results on all of them.
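The sketch below mimics the synthetic-data idea with a single toy synchronized rule that emits an aligned question-SQL pair from the same random choices; the rule, table schema, and values are invented, whereas GraPPa induces its SCFG from existing text-to-SQL datasets.

```python
import random

columns = ["name", "population"]
values = {"population": ["1000000", "5000000"]}

def sample_pair(seed=None):
    rng = random.Random(seed)
    col = rng.choice(columns)
    filt_val = rng.choice(values["population"])
    # One synchronized production: NL and SQL come from the same choices.
    question = f"show the {col} of rows whose population is greater than {filt_val}"
    sql = f"SELECT {col} FROM t WHERE population > {filt_val}"
    return question, sql

print(sample_pair(0))
```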
15. HINT3: Raising the bar for Intent Detection in the Wild [PDF] Back to Contents
Gaurav Arora, Chirag Jain, Manas Chaturvedi, Krupal Modi
Abstract: Intent Detection systems in the real world are exposed to complexities of imbalanced datasets containing varying perception of intent, unintended correlations and domain-specific aberrations. To facilitate benchmarking which can reflect near real-world scenarios, we introduce 3 new datasets created from live chatbots in diverse domains. Unlike most existing datasets that are crowdsourced, our datasets contain real user queries received by the chatbots and facilitates penalising unwanted correlations grasped during the training process. We evaluate 4 NLU platforms and a BERT based classifier and find that performance saturates at inadequate levels on test sets because all systems latch on to unintended patterns in training data.
16. SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery [PDF] Back to Contents
Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han
Abstract: Entity set expansion and synonym discovery are two critical NLP tasks. Previous studies accomplish them separately, without exploring their interdependencies. In this work, we hypothesize that these two tasks are tightly coupled because two synonymous entities tend to have similar likelihoods of belonging to various semantic classes. This motivates us to design SynSetExpan, a novel framework that enables two tasks to mutually enhance each other. SynSetExpan uses a synonym discovery model to include popular entities' infrequent synonyms into the set, which boosts the set expansion recall. Meanwhile, the set expansion model, being able to determine whether an entity belongs to a semantic class, can generate pseudo training data to fine-tune the synonym discovery model towards better accuracy. To facilitate the research on studying the interplays of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing. Extensive experiments on the SE2 dataset and previous benchmarks demonstrate the effectiveness of SynSetExpan for both entity set expansion and synonym discovery tasks.
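A high-level sketch of the iterative loop: set expansion proposes new entities, synonym discovery attaches their synonyms (boosting recall for infrequent surface forms), and the enlarged set feeds the next round. Both scoring functions below are trivial placeholders for the actual models.

```python
def expand_entities(current_set, corpus_entities):
    # placeholder: a real model ranks entities by class-membership likelihood
    return [e for e in corpus_entities if e not in current_set][:2]

def discover_synonyms(entity, corpus_entities):
    # placeholder: a real model scores candidate synonym pairs
    return [e for e in corpus_entities if e.lower() == entity.lower() and e != entity]

def joint_expand(seed_set, corpus_entities, rounds=3):
    current = set(seed_set)
    for _ in range(rounds):
        for e in expand_entities(current, corpus_entities):
            current.add(e)
            current.update(discover_synonyms(e, corpus_entities))
    return current

print(joint_expand({"USA"}, ["United States", "united states", "Canada"]))
```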
17. A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation [PDF] Back to Contents
Dinghan Shen, Mingzhi Zheng, Yelong Shen, Yanru Qu, Weizhu Chen
Abstract: Adversarial training has been shown effective at endowing the learned representations with stronger generalization ability. However, it typically requires expensive computation to determine the direction of the injected perturbations. In this paper, we introduce a set of simple yet effective data augmentation strategies dubbed cutoff, where part of the information within an input sentence is erased to yield its restricted views (during the fine-tuning stage). Notably, this process relies merely on stochastic sampling and thus adds little computational overhead. A Jensen-Shannon Divergence consistency loss is further utilized to incorporate these augmented samples into the training objective in a principled manner. To verify the effectiveness of the proposed strategies, we apply cutoff to both natural language understanding and generation problems. On the GLUE benchmark, it is demonstrated that cutoff, in spite of its simplicity, performs on par or better than several competitive adversarial-based approaches. We further extend cutoff to machine translation and observe significant gains in BLEU scores (based upon the Transformer Base model). Moreover, cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
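A sketch of token-level cutoff with a Jensen-Shannon consistency term: erase a random contiguous token span to form a restricted view of the input, then pull the model's predictions on the two views together. The span ratio, padding-based erasure, and loss combination are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def token_cutoff(input_ids, pad_id=0, ratio=0.15, seed=0):
    g = torch.Generator().manual_seed(seed)
    ids = input_ids.clone()
    L = ids.size(1)
    span = max(1, int(L * ratio))
    start = torch.randint(0, L - span + 1, (1,), generator=g).item()
    ids[:, start:start + span] = pad_id        # erase a contiguous span
    return ids

def js_consistency(logits_a, logits_b):
    p, q = F.softmax(logits_a, -1), F.softmax(logits_b, -1)
    m = 0.5 * (p + q)
    kl = lambda x, y: (x * (x.log() - y.log())).sum(-1)
    return 0.5 * (kl(p, m) + kl(q, m)).mean()

ids = torch.randint(1, 1000, (2, 16))
aug = token_cutoff(ids)
# total loss ~ task_loss(ids) + task_loss(aug) + js_consistency(model(ids), model(aug))
```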
18. Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation [PDF] Back to Contents
Yinfei Yang, Ning Jin, Kuo Lin, Mandy Guo, Daniel Cer
Abstract: Neural models that independently project questions and answers into a shared embedding space allow for efficient continuous space retrieval from large corpora. Independently computing embeddings for questions and answers results in late fusion of information related to matching questions to their answers. While critical for efficient retrieval, late fusion underperforms models that make use of early fusion (e.g., a BERT based classifier with cross-attention between question-answer pairs). We present a supervised data mining method using an accurate early fusion model to improve the training of an efficient late fusion retrieval model. We first train an accurate classification model with cross-attention between questions and answers. The accurate cross-attention model is then used to annotate additional passages in order to generate weighted training examples for a neural retrieval model. The resulting retrieval model with additional data significantly outperforms retrieval models directly trained with gold annotations on Precision at $N$ (P@N) and Mean Reciprocal Rank (MRR).
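A schematic version of the augmentation loop: an accurate cross-attention model scores extra question-passage pairs, and those scores become weighted training examples for the efficient dual-encoder retriever. Both the scorer stub and the thresholding scheme are assumptions for illustration.

```python
def cross_encoder_score(question, passage):
    # stand-in for an early-fusion classifier (e.g., BERT with cross-attention
    # over the concatenated question-passage pair)
    return 0.9 if "einstein" in passage.lower() else 0.1

def make_weighted_examples(question, unlabeled_passages, threshold=0.5):
    examples = []
    for p in unlabeled_passages:
        s = cross_encoder_score(question, p)
        label = 1 if s >= threshold else 0
        examples.append((question, p, label, s))   # score doubles as a weight
    return examples

passages = ["Einstein developed general relativity.", "Paris is in France."]
for ex in make_weighted_examples("who developed general relativity?", passages):
    print(ex)
```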
19. Double Graph Based Reasoning for Document-level Relation Extraction [PDF] Back to Contents
Shuang Zeng, Runxin Xu, Baobao Chang, Lei Li
Abstract: Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across a document. In this paper, we propose Graph Aggregation-and-Inference Network (GAIN) featuring double graphs. GAIN first constructs a heterogeneous mention-level graph (hMG) to model complex interaction among different mentions across the document. It also constructs an entity-level graph (EG), based on which we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art. Our code is available at this https URL .
Shuang Zeng, Runxin Xu, Baobao Chang, Lei Li
Abstract: Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across a document. In this paper, we propose Graph Aggregation-and-Inference Network (GAIN) featuring double graphs. GAIN first constructs a heterogeneous mention-level graph (hMG) to model complex interaction among different mentions across the document. It also constructs an entity-level graph (EG), based on which we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art. Our code is available at this https URL .
摘要:文档级关系抽取旨在提取文档中的实体之间的关系。从句子层次关系抽取不同的是,它需要多个句子跨文档推理过来。在本文中,我们提出了图形汇聚和推理网络(GAIN)配备双图。 GAIN首先构造横跨文献提及的非均相提级图表(HMG),以复杂的相互作用模型中不同。它也构建了一个实体级图(EG),在此基础上,我们提出了一个新的路径推理机制来推断实体之间的关系。在公共数据集,DocRED,实验表明GAIN实现了显著的性能提升(2.85上F1)比上年国家的最先进的。我们的代码可在此HTTPS URL。
20. Leader: Prefixing a Length for Faster Word Vector Serialization [PDF] 返回目录
Brian Lester
Abstract: Two competing file formats have become the de facto standards for distributing pre-trained word embeddings. Both are named after the most popular pre-trained embeddings that are distributed in that format. The GloVe format is an entirely text-based format that suffers from huge file sizes and slow reads, and the word2vec format is a smaller binary format that mixes a textual representation of words with a binary representation of the vectors themselves. Both formats have problems that we solve with a new format we call the Leader format. We include a word length prefix for faster reads while maintaining the smaller file size a binary format offers. We also created a minimalist library to facilitate the reading and writing of various word vector formats, as well as tools for converting pre-trained embeddings to our new Leader format.
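The speed-up from a length prefix is easy to see in code: the reader can grab each word in one sized read instead of scanning byte by byte for a delimiter, as word2vec readers must. The sketch below uses an invented record layout in the spirit of the abstract (the actual Leader layout may differ).

```python
import struct

def write_leader(path, vectors, dim):
    """Toy length-prefixed binary format, one illustrative assumption of the
    idea above: <u32 word byte-length><word bytes><dim float32 values>."""
    with open(path, "wb") as f:
        f.write(struct.pack("<II", len(vectors), dim))   # header: count, dim
        for word, vec in vectors.items():
            w = word.encode("utf-8")
            f.write(struct.pack("<I", len(w)))           # the length prefix
            f.write(w)
            f.write(struct.pack(f"<{dim}f", *vec))

def read_leader(path):
    vectors = {}
    with open(path, "rb") as f:
        count, dim = struct.unpack("<II", f.read(8))
        for _ in range(count):
            (n,) = struct.unpack("<I", f.read(4))
            word = f.read(n).decode("utf-8")             # sized read, no scanning
            vectors[word] = struct.unpack(f"<{dim}f", f.read(4 * dim))
    return vectors, dim

write_leader("vecs.bin", {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}, dim=2)
vecs, dim = read_leader("vecs.bin")
```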
21. Improve Transformer Models with Better Relative Position Embeddings [PDF] 返回目录
Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang
Abstract: Transformer architectures rely on explicit position encodings in order to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the initial proposal of a sinusoid embedding is fixed and not learnable. In this paper, we first review absolute position embeddings and existing methods for relative position embeddings. We then propose new techniques that encourage increased interaction between query, key and relative position embeddings in the self-attention mechanism. Our most promising approach is a generalization of the absolute position embedding, improving results on SQuAD1.1 compared to previous position embeddings approaches. In addition, we address the inductive property of whether a position embedding can be robust enough to handle long sequences. We demonstrate empirically that our relative position embedding method is reasonably generalized and robust from the inductive perspective. Finally, we show that our proposed method can be adopted as a near drop-in replacement for improving the accuracy of large models with a small computational budget.
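The basic mechanism being generalized here can be sketched in a few lines: a learned bias indexed by clipped relative distance is added to the attention logits before the softmax. This is only the simplest member of the family (the paper's techniques add query/key interactions with the relative embeddings); the module below is an illustrative assumption, not the authors' model.

```python
import torch
import torch.nn as nn

class RelativePositionBias(nn.Module):
    """One learned scalar per clipped relative distance, added to logits."""
    def __init__(self, max_dist=16):
        super().__init__()
        self.max_dist = max_dist
        self.bias = nn.Embedding(2 * max_dist + 1, 1)

    def forward(self, seq_len):
        pos = torch.arange(seq_len)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_dist, self.max_dist)
        return self.bias(rel + self.max_dist).squeeze(-1)   # (seq_len, seq_len)

# Usage: add to raw attention scores before the softmax.
scores = torch.randn(2, 8, 8)                   # (batch, seq, seq) logits
scores = scores + RelativePositionBias()(8)     # broadcast over the batch
attn = scores.softmax(dim=-1)
```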
22. Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems [PDF] 返回目录
Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, Pascale Fung
Abstract: Task-oriented dialogue systems are either modularized with separate dialogue state tracking (DST) and management steps or end-to-end trainable. In either case, the knowledge base (KB) plays an essential role in fulfilling user requests. Modularized systems rely on DST to interact with the KB, which is expensive in terms of annotation and inference time. End-to-end systems use the KB directly as input, but they cannot scale when the KB is larger than a few hundred entries. In this paper, we propose a method to embed the KB, of any size, directly into the model parameters. The resulting model does not require any DST or template responses, nor the KB as input, and it can dynamically update its KB via fine-tuning. We evaluate our solution in five task-oriented dialogue datasets with small, medium, and large KB size. Our experiments show that end-to-end models can effectively embed knowledge bases in their parameters and achieve competitive performance in all evaluated datasets.
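One way to picture "embedding the KB into the parameters" is to flatten KB rows into synthetic training strings and fine-tune the dialogue model on them, so the knowledge lives in the weights rather than in the input. The sketch below is a loose illustration of that idea only; the templates, fields, and update procedure are invented, not the paper's method.

```python
# Hypothetical KB rows; fields and values are invented for illustration.
kb = [
    {"name": "Curry House", "food": "indian", "area": "centre"},
    {"name": "Pizza Hut", "food": "italian", "area": "south"},
]

def kb_to_examples(kb_rows):
    """Flatten KB rows into (prompt, answer) pairs for fine-tuning."""
    examples = []
    for row in kb_rows:
        prompt = f"What do you know about {row['name']}?"
        answer = f"{row['name']} serves {row['food']} food in the {row['area']}."
        examples.append((prompt, answer))
    return examples

# Fine-tuning on these pairs absorbs the KB into the model parameters;
# re-running the step with updated rows would refresh the knowledge.
for p, a in kb_to_examples(kb):
    print(p, "->", a)
```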
23. Conversational Semantic Parsing [PDF] 返回目录
Armen Aghajanyan, Jean Maillard, Akshat Shrivastava, Keith Diedrick, Mike Haeger, Haoran Li, Yashar Mehdad, Ves Stoyanov, Anuj Kumar, Mike Lewis, Sonal Gupta
Abstract: The structured representation for semantic parsing in task-oriented assistant systems is geared towards simple understanding of one-turn queries. Due to the limitations of the representation, the session-based properties such as co-reference resolution and context carryover are processed downstream in a pipelined system. In this paper, we propose a semantic representation for such task-oriented conversational systems that can represent concepts such as co-reference and context carryover, enabling comprehensive understanding of queries in a session. We release a new session-based, compositional task-oriented parsing dataset of 20k sessions consisting of 60k utterances. Unlike Dialog State Tracking Challenges, the queries in the dataset have compositional forms. We propose a new family of Seq2Seq models for the session-based parsing above, which achieve better or comparable performance to the current state-of-the-art on ATIS, SNIPS, TOP and DSTC2. Notably, we improve the best known results on DSTC2 by up to 5 points for slot-carryover.
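"Compositional forms" here means a slot value can itself contain an intent, which a flat intent-slot scheme cannot express. A small invented example of such a TOP-style nested parse, written as a Python literal:

```python
# Invented example of a compositional parse: the DESTINATION slot of
# GET_DIRECTIONS is filled by a nested GET_EVENT intent with its own slots.
parse = {
    "intent": "GET_DIRECTIONS",
    "slots": {
        "DESTINATION": {
            "intent": "GET_EVENT",
            "slots": {"NAME_EVENT": "the concert", "DATE_TIME": "tonight"},
        }
    },
}
```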
24. Visual Pivoting for (Unsupervised) Entity Alignment [PDF] 返回目录
Fangyu Liu, Muhao Chen, Dan Roth, Nigel Collier
Abstract: This work studies the use of visual semantic representations to align entities in heterogeneous knowledge graphs (KGs). Images are natural components of many existing KGs. By combining visual knowledge with other auxiliary information, we show that the proposed new approach, EVA, creates a holistic entity representation that provides strong signals for cross-graph entity alignment. Besides, previous entity alignment methods require human labelled seed alignment, restricting availability. EVA provides a completely unsupervised solution by leveraging the visual similarity of entities to create an initial seed dictionary (visual pivots). Experiments on benchmark data sets DBP15k and DWY15k show that EVA offers state-of-the-art performance on both monolingual and cross-lingual entity alignment tasks. Furthermore, we discover that images are particularly useful to align long-tail KG entities, which inherently lack the structural contexts necessary for capturing the correspondences.
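The unsupervised seed step can be sketched as mutual nearest-neighbor matching over image embeddings of entities from the two KGs. The code below is a toy version under assumptions: embeddings are presumed to come from some pretrained image encoder, and the mutual-NN-plus-threshold rule is an illustrative choice rather than EVA's exact procedure.

```python
import numpy as np

def visual_seed_dict(img_emb_a, img_emb_b, threshold=0.9):
    """Pair entities across two KGs whose image embeddings are mutually
    most similar (cosine), keeping only confident matches as seeds."""
    a = img_emb_a / np.linalg.norm(img_emb_a, axis=1, keepdims=True)
    b = img_emb_b / np.linalg.norm(img_emb_b, axis=1, keepdims=True)
    sims = a @ b.T
    seeds = []
    for i in range(sims.shape[0]):
        j = int(sims[i].argmax())
        # mutual nearest neighbours above a confidence threshold
        if int(sims[:, j].argmax()) == i and sims[i, j] >= threshold:
            seeds.append((i, j))
    return seeds

seeds = visual_seed_dict(np.random.rand(5, 64), np.random.rand(6, 64))
```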
25. Non-Pharmaceutical Intervention Discovery with Topic Modeling [PDF] 返回目录
Jonathan Smith, Borna Ghotbi, Seungeun Yi, Mahboobeh Parsapoor
Abstract: We consider the task of discovering categories of non-pharmaceutical interventions during the evolving COVID-19 pandemic. We explore topic modeling on two corpora with national and international scope. These models discover existing categories when compared with human intervention labels, while reducing the human effort needed.
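A minimal topic-modeling pipeline of the kind described might look like the sketch below. Note the assumptions: the abstract does not name the model, so LDA is used here as a representative choice, and the documents are invented stand-ins for intervention reports.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy documents standing in for the national/international NPI corpora.
docs = [
    "schools closed and classes moved online",
    "masks required on public transit",
    "borders closed to international travel",
    "stay at home order issued for the region",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
# lda.components_ now gives per-topic word weights, to be compared against
# human intervention labels.
```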
26. DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue [PDF] 返回目录
Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur
Abstract: A long-standing goal of task-oriented dialogue research is the ability to flexibly adapt dialogue models to new domains. To progress research in this direction, we introduce \textbf{DialoGLUE} (Dialogue Language Understanding Evaluation), a public benchmark consisting of 7 task-oriented dialogue datasets covering 4 distinct natural language understanding tasks, designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning. We release several strong baseline models, demonstrating performance improvements over a vanilla BERT architecture and state-of-the-art results on 5 out of 7 tasks, by pre-training on a large open-domain dialogue corpus and task-adaptive self-supervised training. Through the DialoGLUE benchmark, the baseline methods, and our evaluation scripts, we hope to facilitate progress towards the goal of developing more general task-oriented dialogue models.
27. The design and implementation of Language Learning Chatbot with XAI using Ontology and Transfer Learning [PDF] 返回目录
Nuobei Shi, Qin Zeng, Raymond Lee
Abstract: In this paper, we propose a transfer-learning-based English language learning chatbot whose output, generated by GPT-2, can be explained by a corresponding ontology graph rooted in the fine-tuning dataset. We design three levels for systematic English learning: a phonetics level for speech recognition and pronunciation correction, a semantic level for specific-domain conversation, and, at the highest level of chatbot communication, a free-style conversation agent that simulates open-ended conversation in English. As an academic contribution, we implement the ontology graph to explain the performance of free-style conversation, following the concept of XAI (Explainable Artificial Intelligence) to visualize the connections of the neural network in bionics and to explain the output sentences of the language model. From an implementation perspective, our language learning agent integrates a WeChat mini-program as the front-end and a fine-tuned GPT-2 transfer-learning model as the back-end, interpreting the responses via the ontology graph.
28. EEMC: Embedding Enhanced Multi-tag Classification [PDF] 返回目录
Yanlin Li, Shi An, Ruisheng Zhang
Abstract: Recent representation learning achieves attractive performance in NLP and on complex networks, and it is becoming a fundamental technology in machine learning and data mining. How to use representation learning to improve classifier performance is a significant research direction. We use representation learning to map raw data (nodes of a graph) into a low-dimensional feature space. In this space, each raw data point obtains a lower-dimensional vector representation; we apply simple linear operations to those vectors to produce virtual data, and use both the original vectors and the virtual data to train a multi-tag classifier. We then measure classifier performance by F1 score (Macro-F1 and Micro-F1). Our method raises Macro-F1 by 28%-450% and average F1 by 12%-224%. By contrast, we also train the classifier directly on the lower-dimensional vectors and measure its performance. We validate our algorithm on three public data sets and find that the virtual data helps the classifier greatly improve its F1 score. Therefore, our algorithm is an effective way to improve classifier performance. These results suggest that virtual data generated by simple linear operations in representation space still retains the information of the raw data. The approach is also of great significance for learning from small sample data sets.
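The "simple linear operations" step can be sketched as interpolating pairs of same-label embedding vectors to synthesize extra training points. The exact operations used in the paper may differ; the single-label toy below (the paper targets multi-tag data) is an illustrative assumption.

```python
import numpy as np

def make_virtual_examples(vectors, labels, alpha=0.5):
    """Interpolate pairs of same-label vectors in representation space
    to create virtual training examples carrying the shared label."""
    virtual_x, virtual_y = [], []
    for label in set(labels):
        idx = [i for i, y in enumerate(labels) if y == label]
        for i in idx:
            for j in idx:
                if i < j:
                    virtual_x.append(alpha * vectors[i] + (1 - alpha) * vectors[j])
                    virtual_y.append(label)
    return np.array(virtual_x), np.array(virtual_y)

# Toy usage: 6 embedding vectors with binary labels.
vx, vy = make_virtual_examples(np.random.rand(6, 16), [0, 0, 1, 1, 1, 0])
```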
29. Adversarial Attacks Against Deep Learning Systems for ICD-9 Code Assignment [PDF] 返回目录
Sharan Raja, Rudraksh Tuwani
Abstract: Manual annotation of ICD-9 codes is a time-consuming and error-prone process. Deep learning based systems tackling the problem of automated ICD-9 coding have achieved competitive performance. Given the increased proliferation of electronic medical records, such automated systems are expected to eventually replace human coders. In this work, we investigate how a simple typo-based adversarial attack strategy can impact the performance of state-of-the-art models for the task of predicting the top 50 most frequent ICD-9 codes from discharge summaries. Preliminary results indicate that a malicious adversary, using gradient information, can craft specific perturbations that appear as regular human typos, for less than 3% of words in the discharge summary, to significantly affect the performance of the baseline model.
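The perturbation itself is simple to sketch: swap adjacent characters in a small fraction of words so the result still reads as a plausible human typo. In the paper the attacked words are chosen with gradient information; the sketch below picks them at random purely for illustration.

```python
import random

def typo_attack(text, fraction=0.03, seed=0):
    """Perturb a small fraction of words with adjacent-character swaps,
    mimicking human typos (word selection here is random, not gradient-guided)."""
    rng = random.Random(seed)
    words = text.split()
    n_attack = max(1, int(fraction * len(words)))
    for i in rng.sample(range(len(words)), n_attack):
        w = words[i]
        if len(w) > 3:
            j = rng.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]   # swap two chars
    return " ".join(words)

print(typo_attack("patient admitted with acute myocardial infarction history"))
```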
30. VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training [PDF] 返回目录
Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
Abstract: It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps). In this challenge, no additional image-caption training data, other than COCO Captions, is allowed for model training. Thus, conventional Vision-Language Pre-training (VLP) methods cannot be applied. This paper presents VIsual VOcabulary pre-training (VIVO) that performs pre-training in the absence of caption annotations. By breaking the dependency of paired image-caption training data in VLP, VIVO can leverage large amounts of paired image-tag data to learn a visual vocabulary. This is done by pre-training a multi-layer Transformer model that learns to align image-level tags with their corresponding image region features. To address the unordered nature of image tags, VIVO uses a Hungarian matching loss with masked tag prediction to conduct pre-training. We validate the effectiveness of VIVO by fine-tuning the pre-trained model for image captioning. In addition, we perform an analysis of the visual-text alignment inferred by our model. The results show that our model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects. Our single model has achieved new state-of-the-art results on nocaps and surpassed the human CIDEr score.
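The Hungarian matching loss for unordered tags can be sketched concisely: find the minimum-cost assignment between predictions and gold tags, then penalize only the matched pairs, so the loss is invariant to tag order. The squared-distance cost below is an illustrative choice and not necessarily VIVO's exact objective.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_tag_loss(pred_tag_vecs, gold_tag_vecs):
    """Order-invariant loss over a set of tags: optimal bipartite matching
    between predicted and gold tag vectors, summing the matched costs."""
    diff = pred_tag_vecs[:, None, :] - gold_tag_vecs[None, :, :]
    cost = (diff ** 2).sum(-1)                  # (n_pred, n_gold) cost matrix
    rows, cols = linear_sum_assignment(cost)    # optimal assignment
    return cost[rows, cols].sum()

loss = hungarian_tag_loss(np.random.rand(3, 8), np.random.rand(3, 8))
```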
31. Joint Spatio-Textual Reasoning for Answering Tourism Questions [PDF] 返回目录
Danish Contractor, Shashank Goel, Mausam, Parag Singla
Abstract: Our goal is to answer real-world tourism questions that seek Points-of-Interest (POI) recommendations. Such questions express various kinds of spatial and non-spatial constraints, necessitating a combination of textual and spatial reasoning. In response, we develop the first joint spatio-textual reasoning model, which combines geo-spatial knowledge with information in textual corpora to answer questions. We first develop a modular spatial-reasoning network that uses geo-coordinates of location names mentioned in a question, and of candidate answer POIs, to reason over only spatial constraints. We then combine our spatial-reasoner with a textual reasoner in a joint model and present experiments on a real world POI recommendation task. We report substantial improvements over existing models without joint spatio-textual reasoning.
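The spatial half of such reasoning ultimately scores candidate POIs by distance to locations mentioned in the question. A minimal sketch of one such constraint check, using the standard haversine formula (the coordinates and the hard 2 km cutoff are invented; the paper's spatial reasoner learns soft constraints rather than applying a fixed rule):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Toy constraint of the kind a question might express:
# "POIs within 2 km of the landmark mentioned in the question".
landmark = (48.8584, 2.2945)
candidate_pois = {"cafe": (48.8606, 2.3376), "bistro": (48.8550, 2.2980)}
nearby = {name: pos for name, pos in candidate_pois.items()
          if haversine_km(*landmark, *pos) <= 2.0}
```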