Contents
2. A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support [PDF] Abstract
3. Modeling Task Effects on Meaning Representation in the Brain via Zero-Shot MEG Prediction [PDF] Abstract
6. Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA [PDF] Abstract
7. What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus [PDF] Abstract
10. DSC IIT-ISM at SemEval-2020 Task 6: Boosting BERT with Dependencies for Definition Extraction [PDF] Abstract
11. ISCAS at SemEval-2020 Task 5: Pre-trained Transformers for Counterfactual Statement Modeling [PDF] Abstract
14. Multi^2OIE: Multilingual Open Information Extraction based on Multi-Head Attention with BERT [PDF] Abstract
15. A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning [PDF] Abstract
19. Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning [PDF] Abstract
20. Self-supervised pre-training and contrastive representation learning for multiple-choice video QA [PDF] Abstract
24. Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis [PDF] Abstract
29. Understanding Effects of Editing Tweets for News Sharing by Media Accounts through a Causal Inference Framework [PDF] Abstract
Abstracts
1. Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks [PDF] Back to contents
Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum
Abstract: Self-supervised pre-training of transformer models has revolutionized NLP applications. Such pre-training with language modeling objectives provides a useful initial point for parameters that generalize well to new tasks with fine-tuning. However, fine-tuning is still data inefficient -- when there are few labeled examples, accuracy can be low. Data efficiency can be improved by optimizing pre-training directly for future fine-tuning with few examples; this can be treated as a meta-learning problem. However, standard meta-learning techniques require many training tasks in order to generalize; unfortunately, finding a diverse set of such supervised tasks is usually difficult. This paper proposes a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text. This is achieved using a cloze-style objective, but creating separate multi-class classification tasks by gathering tokens-to-be blanked from among only a handful of vocabulary terms. This yields as many unique meta-training tasks as the number of subsets of vocabulary terms. We meta-train a transformer model on this distribution of tasks using a recent meta-learning framework. On 17 NLP tasks, we show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning. Furthermore, we show how the self-supervised tasks can be combined with supervised tasks for meta-learning, providing substantial accuracy gains over previous supervised meta-learning.
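The task-construction recipe above (blank out occurrences of a small word subset and classify which word was removed) is concrete enough to illustrate. The sketch below builds one cloze-style classification task from unlabeled sentences; it is a hypothetical reconstruction of the idea, not the authors' code, and the function name, sampling scheme, and mask token are assumptions.

```python
import random
from typing import List, Tuple

def make_cloze_task(sentences: List[List[str]], vocab_subset: List[str],
                    examples_per_class: int = 4, mask_token: str = "[MASK]",
                    seed: int = 0) -> List[Tuple[List[str], int]]:
    """Build one few-shot classification task: each example is a sentence with
    one word from `vocab_subset` blanked out, and the label is the index of
    that word within the subset. Different subsets define different tasks."""
    rng = random.Random(seed)
    task = []
    for label, word in enumerate(vocab_subset):
        hits = [s for s in sentences if word in s]
        for sent in rng.sample(hits, min(examples_per_class, len(hits))):
            masked = [mask_token if tok == word else tok for tok in sent]
            task.append((masked, label))
    rng.shuffle(task)
    return task

# Toy usage: every distinct subset of vocabulary terms yields a new meta-training task.
corpus = [s.split() for s in [
    "the bird sat on the branch", "a dog chased the ball",
    "the bird flew away", "the dog barked loudly"]]
print(make_cloze_task(corpus, ["bird", "dog"], examples_per_class=2))
```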
2. A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support [PDF] Back to contents
Ashish Sharma, Adam S. Miner, David C. Atkins, Tim Althoff
Abstract: Empathy is critical to successful mental health support. Empathy measurement has predominantly occurred in synchronous, face-to-face settings, and may not translate to asynchronous, text-based contexts. Because millions of people use text-based platforms for mental health support, understanding empathy in these contexts is crucial. In this work, we present a computational approach to understanding how empathy is expressed in online mental health platforms. We develop a novel unifying theoretically-grounded framework for characterizing the communication of empathy in text-based conversations. We collect and share a corpus of 10k (post, response) pairs annotated using this empathy framework with supporting evidence for annotations (rationales). We develop a multi-task RoBERTa-based bi-encoder model for identifying empathy in conversations and extracting rationales underlying its predictions. Experiments demonstrate that our approach can effectively identify empathic conversations. We further apply this model to analyze 235k mental health interactions and show that users do not self-learn empathy over time, revealing opportunities for empathy training and feedback.
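As a rough illustration of the modeling setup described here, the sketch below wires two RoBERTa encoders into a multi-task head: one output for the empathy label of a (post, response) pair and one token-level output for rationale extraction, using the Hugging Face `transformers` API. The pooling, head shapes, and class counts are assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class EmpathyBiEncoder(nn.Module):
    """Bi-encoder: separate encoders for the seeker post and the response, with
    (i) an empathy classifier over the pair and (ii) a per-token rationale tagger
    over the response."""
    def __init__(self, name: str = "roberta-base", num_levels: int = 3):
        super().__init__()
        self.post_enc = AutoModel.from_pretrained(name)
        self.resp_enc = AutoModel.from_pretrained(name)
        hidden = self.resp_enc.config.hidden_size
        self.level_head = nn.Linear(2 * hidden, num_levels)   # empathy level of the pair
        self.rationale_head = nn.Linear(hidden, 2)             # token inside/outside a rationale

    def forward(self, post_inputs, resp_inputs):
        post_cls = self.post_enc(**post_inputs).last_hidden_state[:, 0]
        resp_states = self.resp_enc(**resp_inputs).last_hidden_state
        level_logits = self.level_head(torch.cat([post_cls, resp_states[:, 0]], dim=-1))
        rationale_logits = self.rationale_head(resp_states)
        return level_logits, rationale_logits
```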
3. Modeling Task Effects on Meaning Representation in the Brain via Zero-Shot MEG Prediction [PDF] Back to contents
Mariya Toneva, Otilia Stretcu, Barnabas Poczos, Leila Wehbe, Tom M. Mitchell
Abstract: How meaning is represented in the brain is still one of the big open questions in neuroscience. Does a word (e.g., bird) always have the same representation, or does the task under which the word is processed alter its representation (answering "can you eat it?" versus "can it fly?")? The brain activity of subjects who read the same word while performing different semantic tasks has been shown to differ across tasks. However, it is still not understood how the task itself contributes to this difference. In the current work, we study Magnetoencephalography (MEG) brain recordings of participants tasked with answering questions about concrete nouns. We investigate the effect of the task (i.e. the question being asked) on the processing of the concrete noun by predicting the millisecond-resolution MEG recordings as a function of both the semantics of the noun and the task. Using this approach, we test several hypotheses about the task-stimulus interactions by comparing the zero-shot predictions made by these hypotheses for novel tasks and nouns not seen during training. We find that incorporating the task semantics significantly improves the prediction of MEG recordings, across participants. The improvement occurs 475-550ms after the participants first see the word, which corresponds to what is considered to be the ending time of semantic processing for a word. These results suggest that only the end of semantic processing of a word is task-dependent, and pose a challenge for future research to formulate new hypotheses for earlier task effects as a function of the task and stimuli.
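One simple way to read the prediction setup in this abstract is as an encoding model: regress the recordings on a joint feature vector of the noun and the task, and evaluate on noun/task combinations held out of training. The ridge-regression sketch below only illustrates that framing; the feature construction and regression family are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_encoding_model(noun_vecs, task_vecs, meg, train_idx, test_idx, alpha=1.0):
    """noun_vecs: (n_trials, d_noun); task_vecs: (n_trials, d_task);
    meg: (n_trials, n_outputs), e.g. flattened sensors x time for one window.
    Keeping the test nouns/tasks out of training is what makes this zero-shot."""
    X = np.hstack([noun_vecs, task_vecs])
    model = Ridge(alpha=alpha).fit(X[train_idx], meg[train_idx])
    pred = model.predict(X[test_idx])
    # correlation between predicted and observed recordings, per output dimension
    r = np.array([np.corrcoef(pred[:, j], meg[test_idx, j])[0, 1]
                  for j in range(meg.shape[1])])
    return model, r

# Toy shapes only, with random data standing in for word vectors and recordings.
rng = np.random.default_rng(0)
nouns, tasks = rng.normal(size=(40, 300)), rng.normal(size=(40, 16))
meg = rng.normal(size=(40, 102))
_, r = fit_encoding_model(nouns, tasks, meg, np.arange(30), np.arange(30, 40))
```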
4. Evaluating Interactive Summarization: an Expansion-Based Framework [PDF] Back to contents
Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael Amsterdamer, Ido Dagan
Abstract: Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results. Different ideas for interactive summarization have been proposed in previous work but these solutions are highly divergent and incomparable. In this paper, we develop an end-to-end evaluation framework for expansion-based interactive summarization, which considers the accumulating information along an interactive session. Our framework includes a procedure of collecting real user sessions and evaluation measures relying on standards, but adapted to reflect interaction. All of our solutions are intended to be released publicly as a benchmark, allowing comparison of future developments in interactive summarization. We demonstrate the use of our framework by evaluating and comparing baseline implementations that we developed for this purpose, which will serve as part of our benchmark. Our extensive experimentation and analysis of these systems motivate our design choices and support the viability of our framework.
5. More Embeddings, Better Sequence Labelers? [PDF] Back to contents
Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
Abstract: Recent work proposes a family of contextual embeddings that significantly improves the accuracy of sequence labelers over non-contextual embeddings. However, there is no definite conclusion on whether we can build better sequence labelers by combining different kinds of embeddings in various settings. In this paper, we conduct extensive experiments on 3 tasks over 18 datasets and 8 languages to study the accuracy of sequence labeling with various embedding concatenations and make three observations: (1) concatenating more embedding variants leads to better accuracy in rich-resource and cross-domain settings and some conditions of low-resource settings; (2) concatenating additional contextual sub-word embeddings with contextual character embeddings hurts the accuracy in extremely low-resource settings; (3) based on the conclusion of (1), concatenating additional similar contextual embeddings cannot lead to further improvements. We hope these conclusions can help people build stronger sequence labelers in various settings.
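The comparison in this paper comes down to how embedding variants are combined before the labeler. A minimal sketch of embedding concatenation feeding a BiLSTM tagger is below; the real systems use contextual embedders and typically a CRF layer, so this is only the skeleton of the idea.

```python
import torch
import torch.nn as nn

class ConcatEmbedderLabeler(nn.Module):
    """Sequence labeler over the concatenation of several embedding variants.
    Each embedder maps token ids to (batch, seq_len, dim_i); the outputs are
    concatenated before a BiLSTM + softmax tagger (CRF omitted for brevity)."""
    def __init__(self, embedders, hidden: int = 256, num_labels: int = 9):
        super().__init__()
        self.embedders = nn.ModuleList(embedders)
        total_dim = sum(e.embedding_dim for e in embedders)
        self.lstm = nn.LSTM(total_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_ids):
        embs = torch.cat([e(token_ids) for e in self.embedders], dim=-1)
        h, _ = self.lstm(embs)
        return self.out(h)

# Toy usage with two "variants" of a word embedding sharing one vocabulary.
labeler = ConcatEmbedderLabeler([nn.Embedding(1000, 100), nn.Embedding(1000, 300)])
logits = labeler(torch.randint(0, 1000, (2, 7)))   # (2, 7, num_labels)
```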
6. Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA [PDF] Back to contents
Ieva Staliūnaitė, Ignacio Iacobacci
Abstract: Many NLP tasks have benefited from transferring knowledge from contextualized word embeddings, however the picture of what type of knowledge is transferred is incomplete. This paper studies the types of linguistic phenomena accounted for by language models in the context of a Conversational Question Answering (CoQA) task. We identify the problematic areas for the finetuned RoBERTa, BERT and DistilBERT models through systematic error analysis - basic arithmetic (counting phrases), compositional semantics (negation and Semantic Role Labeling), and lexical semantics (surprisal and antonymy). When enhanced with the relevant linguistic knowledge through multitask learning, the models improve in performance. Ensembles of the enhanced models yield a boost between 2.2 and 2.7 points in F1 score overall, and up to 42.1 points in F1 on the hardest question classes. The results show differences in ability to represent compositional and lexical information between RoBERTa, BERT and DistilBERT.
7. What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus [PDF] Back to contents
Yonatan Bilu, Shai Gretz, Edo Cohen, Noam Slonim
Abstract: One of the most impressive human endeavors of the past two decades is the collection and categorization of human knowledge in the free and accessible format that is Wikipedia. In this work we ask what makes a term worthy of entering this edifice of knowledge, and having a page of its own in Wikipedia? To what extent is this a natural product of on-going human discourse and discussion rather than an idiosyncratic choice of Wikipedia editors? Specifically, we aim to identify such "wiki-worthy" terms in a massive news corpus, and see if this can be done with no, or minimal, dependency on actual Wikipedia entries. We suggest a five-step pipeline for doing so, providing baseline results for all five, and the relevant datasets for benchmarking them. Our work sheds new light on the domain-specific Automatic Term Extraction problem, with the problem at hand being a domain-independent variant of it.
8. Fast and Accurate Sequence Labeling with Approximate Inference Network [PDF] Back to contents
Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
Abstract: The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches. Exact probabilistic inference algorithms such as the forward-backward and Viterbi algorithms are typically applied in training and prediction stages of the CRF model. However, these algorithms require sequential computation that makes parallelization impossible. In this paper, we propose to employ a parallelizable approximate variational inference algorithm for the CRF model. Based on this algorithm, we design an approximate inference network that can be connected with the encoder of the neural CRF model to form an end-to-end network, which is amenable to parallelization for faster training and prediction. The empirical results show that our proposed approaches achieve a 12.7-fold improvement in decoding speed with long sentences and a competitive accuracy compared with the traditional CRF approach.
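The abstract does not say which variational family the approximate inference network uses, so the sketch below shows one natural choice: mean-field inference for a linear-chain CRF, where every position is updated from its neighbours' current marginals and each iteration is a fixed number of matrix products rather than a sequential forward-backward scan.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_crf(unary, trans, n_iter=3):
    """Mean-field inference for a linear-chain CRF.
    unary: (seq_len, n_labels) emission scores; trans: (n_labels, n_labels)
    transition scores, trans[i, j] = score of label i followed by label j.
    Returns approximate marginals q of shape (seq_len, n_labels)."""
    q = softmax(unary, axis=-1)
    for _ in range(n_iter):
        msg = np.zeros_like(unary)
        msg[1:] += q[:-1] @ trans          # expectation over the previous label
        msg[:-1] += q[1:] @ trans.T        # expectation over the next label
        q = softmax(unary + msg, axis=-1)
    return q

q = mean_field_crf(np.random.randn(6, 5), np.random.randn(5, 5))
print(q.argmax(-1))   # approximate MAP label sequence
```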
9. Generating Label Cohesive and Well-Formed Adversarial Claims [PDF] Back to contents
Pepa Atanasova, Dustin Wright, Isabelle Augenstein
Abstract: Adversarial attacks reveal important vulnerabilities and flaws of trained models. One potent type of attack are universal adversarial triggers, which are individual n-grams that, when appended to instances of a class under attack, can trick a model into predicting a target class. However, for inference tasks such as fact checking, these triggers often inadvertently invert the meaning of instances they are inserted in. In addition, such attacks produce semantically nonsensical inputs, as they simply concatenate triggers to existing samples. Here, we investigate how to generate adversarial attacks against fact checking systems that preserve the ground truth meaning and are semantically valid. We extend the HotFlip attack algorithm used for universal trigger generation by jointly minimising the target class loss of a fact checking model and the entailment class loss of an auxiliary natural language inference model. We then train a conditional language model to generate semantically valid statements, which include the found universal triggers. We find that the generated attacks maintain the directionality and semantic validity of the claim better than previous work.
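A sketch of the two ingredients named in the abstract: the joint objective over the fact-checking model and the auxiliary NLI model, and HotFlip's first-order scoring of trigger-token replacements. The function names, loss weighting, and sign conventions are assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def joint_trigger_loss(fc_logits, nli_logits, target_class, entail_class, lam=1.0):
    """Joint objective: push the fact-checking model towards the attack's target
    class while keeping the auxiliary NLI model predicting entailment, so the
    modified claim keeps its ground-truth meaning."""
    tgt = torch.full((fc_logits.size(0),), target_class, dtype=torch.long)
    ent = torch.full((nli_logits.size(0),), entail_class, dtype=torch.long)
    return F.cross_entropy(fc_logits, tgt) + lam * F.cross_entropy(nli_logits, ent)

def hotflip_candidate_scores(trigger_grad, trigger_embed, embedding_matrix):
    """First-order HotFlip estimate of the loss change when trigger token t is
    swapped to vocabulary word w: (E_w - E_t) . dL/dE_t.
    trigger_grad, trigger_embed: (num_trigger_tokens, dim);
    embedding_matrix: (vocab_size, dim). Lower is better when minimising loss."""
    swap_term = trigger_grad @ embedding_matrix.T                      # (num_trig, vocab)
    keep_term = (trigger_embed * trigger_grad).sum(-1, keepdim=True)   # (num_trig, 1)
    return swap_term - keep_term
```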
10. DSC IIT-ISM at SemEval-2020 Task 6: Boosting BERT with Dependencies for Definition Extraction [PDF] Back to contents
Aadarsh Singh, Priyanshu Kumar, Aman Sinha
Abstract: We explore the performance of Bidirectional Encoder Representations from Transformers (BERT) at definition extraction. We further propose a joint model of BERT and Text Level Graph Convolutional Network so as to incorporate dependencies into the model. Our proposed model produces better results than BERT and achieves comparable results to BERT with fine tuned language model in DeftEval (Task 6 of SemEval 2020), a shared task of classifying whether a sentence contains a definition or not (Subtask 1).
11. ISCAS at SemEval-2020 Task 5: Pre-trained Transformers for Counterfactual Statement Modeling [PDF] Back to contents
Yaojie Lu, Annan Li, Hongyu Lin, Xianpei Han, Le Sun
Abstract: ISCAS participated in two subtasks of SemEval 2020 Task 5: detecting counterfactual statements and detecting antecedent and consequence. This paper describes our system which is based on pre-trained transformers. For the first subtask, we train several transformer-based classifiers for detecting counterfactual statements. For the second subtask, we formulate antecedent and consequence extraction as a query-based question answering problem. The two subsystems both achieved third place in the evaluation. Our system is openly released at this https URL.
12. End-to-End Neural Event Coreference Resolution [PDF] Back to contents
Yaojie Lu, Hongyu Lin, Jialong Tang, Xianpei Han, Le Sun
Abstract: Traditional event coreference systems usually rely on pipeline framework and hand-crafted features, which often face error propagation problem and have poor generalization ability. In this paper, we propose an End-to-End Event Coreference approach -- E3C neural network, which can jointly model event detection and event coreference resolution tasks, and learn to extract features from raw text automatically. Furthermore, because event mentions are highly diversified and event coreference is intricately governed by long-distance, semantic-dependent decisions, a type-guided event coreference mechanism is further proposed in our E3C neural network. Experiments show that our method achieves new state-of-the-art performance on two standard datasets.
13. FewJoint: A Few-shot Learning Benchmark for Joint Language Understanding [PDF] Back to contents
Yutai Hou, Jiafeng Mao, Yongkui Lai, Cheng Chen, Wanxiang Che, Zhigang Chen, Ting Liu
Abstract: Few-shot learning (FSL) is one of the key future steps in machine learning and has raised a lot of attention. However, in contrast to the rapid development in other domains, such as Computer Vision, the progress of FSL in Natural Language Processing (NLP) is much slower. One of the key reasons for this is the lack of public benchmarks. NLP FSL researches always report new results on their own constructed few-shot datasets, which is pretty inefficient in results comparison and thus impedes cumulative progress. In this paper, we present FewJoint, a novel Few-Shot Learning benchmark for NLP. Different from most NLP FSL research that only focus on simple N-classification problems, our benchmark introduces few-shot joint dialogue language understanding, which additionally covers the structure prediction and multi-task reliance problems. This allows our benchmark to reflect the real-world NLP complexity beyond simple N-classification. Our benchmark is used in the few-shot learning contest of SMP2020-ECDT task-1. We also provide a compatible FSL platform to ease experiment set-up.
14. Multi^2OIE: Multilingual Open Information Extraction based on Multi-Head Attention with BERT [PDF] Back to contents
Youngbin Ro, Yukyung Lee, Pilsung Kang
Abstract: In this paper, we propose Multi^2OIE, which performs open information extraction (open IE) by combining BERT with multi-head attention. Our model is a sequence-labeling system with an efficient and effective argument extraction method. We use a query, key, and value setting inspired by the Multimodal Transformer to replace the previously used bidirectional long short-term memory architecture with multi-head attention. Multi^2OIE outperforms existing sequence-labeling systems with high computational efficiency on two benchmark evaluation datasets, Re-OIE2016 and CaRB. Additionally, we apply the proposed method to multilingual open IE using multilingual BERT. Experimental results on new benchmark datasets introduced for two languages (Spanish and Portuguese) demonstrate that our model outperforms other multilingual systems without training data for the target languages.
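The abstract replaces a BiLSTM argument extractor with multi-head attention in a query/key/value setup over BERT features; the sketch below is one plausible shape for such an argument head (predicate-aware query, BIO token classification). The exact query construction and label inventory are assumptions and not taken from the released model.

```python
import torch
import torch.nn as nn

class ArgumentHead(nn.Module):
    """Argument extraction head: a predicate span summary conditions the query,
    BERT hidden states act as keys/values, and a linear layer emits BIO labels."""
    def __init__(self, hidden: int = 768, heads: int = 8, num_labels: int = 9):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, bert_states, predicate_mask):
        # bert_states: (B, T, H); predicate_mask: (B, T), 1 on predicate tokens.
        w = predicate_mask.float().unsqueeze(-1)
        pred_repr = (bert_states * w).sum(1, keepdim=True) / w.sum(1, keepdim=True).clamp(min=1)
        ctx, _ = self.attn(query=bert_states + pred_repr, key=bert_states, value=bert_states)
        return self.classifier(ctx)   # (B, T, num_labels), BIO argument tags per token

head = ArgumentHead()
mask = torch.zeros(2, 12)
mask[:, 3] = 1                        # pretend token 3 is the extracted predicate
logits = head(torch.randn(2, 12, 768), mask)
```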
15. A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning [PDF] Back to contents
Yichi Zhang, Zhijian Ou, Huixin Wang, Junlan Feng
Abstract: Structured belief states are crucial for user goal tracking and database query in task-oriented dialog systems. However, training belief trackers often requires expensive turn-level annotations of every user utterance. In this paper we aim at alleviating the reliance on belief state labels in building end-to-end dialog systems, by leveraging unlabeled dialog data towards semi-supervised learning. We propose a probabilistic dialog model, called the LAtent BElief State (LABES) model, where belief states are represented as discrete latent variables and jointly modeled with system responses given user inputs. Such latent variable modeling enables us to develop semi-supervised learning under the principled variational learning framework. Furthermore, we introduce LABES-S2S, which is a copy-augmented Seq2Seq model instantiation of LABES. In supervised experiments, LABES-S2S obtains strong results on three benchmark datasets of different scales. In utilizing unlabeled dialog data, semi-supervised LABES-S2S significantly outperforms both supervised-only and semi-supervised baselines. Remarkably, we can reduce the annotation demands to 50% without performance loss on MultiWOZ.
16. A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching [PDF] Back to contents
Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, Federico Nanni
Abstract: Recognizing toponyms and resolving them to their real-world referents is required for providing advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a toponym previously recognized. While it has traditionally received little attention in the research community, it has been shown that candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a flexible deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several new realistic datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors). We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.
17. Code-switching pre-training for neural machine translation [PDF] Back to contents
Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, Qi Ju
Abstract: This paper proposes a new pre-training method, called Code-Switching Pre-training (CSP for short) for Neural Machine Translation (NMT). Unlike traditional pre-training method which randomly masks some fragments of the input sentence, the proposed CSP randomly replaces some words in the source sentence with their translation words in the target language. Specifically, we firstly perform lexicon induction with unsupervised word embedding mapping between the source and target languages, and then randomly replace some words in the input sentence with their translation words according to the extracted translation lexicons. CSP adopts the encoder-decoder framework: its encoder takes the code-mixed sentence as input, and its decoder predicts the replaced fragment of the input sentence. In this way, CSP is able to pre-train the NMT model by explicitly making the most of the cross-lingual alignment information extracted from the source and target monolingual corpus. Additionally, we relieve the pretrain-finetune discrepancy caused by the artificial symbols like [mask]. To verify the effectiveness of the proposed method, we conduct extensive experiments on unsupervised and supervised NMT. Experimental results show that CSP achieves significant improvements over baselines without pre-training or with other pre-training methods.
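The data-construction step described here is easy to make concrete: words of the source sentence that appear in a bilingual lexicon are randomly replaced by their target-language translations, and the decoder is trained to recover the replaced fragment. The snippet below is an illustrative sketch; in the paper the lexicon comes from unsupervised embedding mapping, whereas here it is a toy dictionary, and the function name and replacement rate are assumptions.

```python
import random

def make_csp_example(sentence: str, lexicon: dict, replace_prob: float = 0.15,
                     seed: int | None = None):
    """Build one Code-Switching Pre-training example: source words found in the
    lexicon are randomly swapped for their target-language translation; the
    training target is the sequence of replaced source words."""
    rng = random.Random(seed)
    mixed, replaced = [], []
    for tok in sentence.split():
        if tok in lexicon and rng.random() < replace_prob:
            mixed.append(lexicon[tok])
            replaced.append(tok)
        else:
            mixed.append(tok)
    return " ".join(mixed), " ".join(replaced)

toy_lexicon = {"house": "maison", "red": "rouge", "cat": "chat"}
print(make_csp_example("the red cat sleeps in the house", toy_lexicon,
                       replace_prob=0.5, seed=3))
```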
18. On the Transferability of Minimal Prediction Preserving Inputs in Question Answering [PDF] Back to contents
Shayne Longpre, Yi Lu, Christopher DuBois
Abstract: Recent work (Feng et al., 2018) establishes the presence of short, uninterpretable input fragments that yield high confidence and accuracy in neural models. We refer to these as Minimal Prediction Preserving Inputs (MPPIs). In the context of question answering, we investigate competing hypotheses for the existence of MPPIs, including poor posterior calibration of neural models, lack of pretraining, and "dataset bias" (where a model learns to attend to spurious, non-generalizable cues in the training data). We discover a perplexing invariance of MPPIs to random training seed, model architecture, pretraining, and training domain. MPPIs demonstrate remarkable transferability across domains - closing half the gap between models' performance on comparably short queries and original queries. Additionally, penalizing over-confidence on MPPIs fails to improve either generalization or adversarial robustness. These results suggest the interpretability of MPPIs is insufficient to characterize generalization capacity of these models. We hope this focused investigation encourages a more systematic analysis of model behavior outside of the human interpretable distribution of examples.
19. Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning [PDF] Back to contents
Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding
Abstract: Pretrained large-scale language models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. However, the limited weight storage and computational speed on hardware platforms have impeded the popularity of pretrained models, especially in the era of edge computing. In this work, we propose an efficient transformer-based large-scale language representation using hardware-friendly block structure pruning. We incorporate the reweighted group Lasso into block-structured pruning for optimization. Besides the significantly reduced weight storage and computation, the proposed approach achieves high compression rates. Experimental results on different models (BERT, RoBERTa, and DistilBERT) on the General Language Understanding Evaluation (GLUE) benchmark tasks show that we achieve up to 5.0x with zero or minor accuracy degradation on certain task(s). Our proposed method is also orthogonal to existing compact pretrained language models such as DistilBERT using knowledge distillation, since a further 1.79x average compression rate can be achieved on top of DistilBERT with zero or minor accuracy degradation. It is suitable to deploy the final compressed model on resource-constrained edge devices.
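To make "block structured pruning" concrete, the sketch below tiles a weight matrix into fixed-size blocks, ranks the blocks by Frobenius norm, and zeroes out the weakest ones; whole-block zeros are what keeps the resulting sparsity hardware friendly (regular memory access, easy to skip in kernels). The reweighted group-Lasso training that drives blocks towards zero is not shown, and the block size and sparsity level are assumptions.

```python
import torch

def block_prune(weight: torch.Tensor, block=(16, 16), sparsity: float = 0.5):
    """Zero the `sparsity` fraction of (br x bc) tiles of `weight` with the
    smallest Frobenius norm; returns the pruned matrix and the block mask."""
    rows, cols = weight.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "pad the matrix to a block multiple"
    tiles = weight.reshape(rows // br, br, cols // bc, bc).permute(0, 2, 1, 3)
    norms = tiles.pow(2).sum(dim=(-2, -1)).sqrt()           # one norm per block
    k = int(sparsity * norms.numel())
    thresh = norms.flatten().kthvalue(k).values if k > 0 else -1.0
    mask = (norms > thresh).float()[..., None, None]
    pruned = (tiles * mask).permute(0, 2, 1, 3).reshape(rows, cols)
    return pruned, mask

w = torch.randn(64, 128)
pruned, mask = block_prune(w, block=(16, 16), sparsity=0.5)
print(mask.mean().item())   # fraction of blocks kept
```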
20. Self-supervised pre-training and contrastive representation learning for multiple-choice video QA [PDF] 返回目录
Seonhoon Kim, Seohyeong Jeong, Eunbyul Kim, Inho Kang, Nojun Kwak
Abstract: Video Question Answering (Video QA) requires fine-grained understanding of both video and language modalities to answer the given questions. In this paper, we propose novel training schemes for multiple-choice video question answering with a self-supervised pre-training stage and a supervised contrastive learning in the main stage as an auxiliary learning. In the self-supervised pre-training stage, we transform the original problem format of predicting the correct answer into the one that predicts the relevant question to provide a model with broader contextual inputs without any further dataset or annotation. For contrastive learning in the main stage, we add a masking noise to the input corresponding to the ground-truth answer, and consider the original input of the ground-truth answer as a positive sample, while treating the rest as negative samples. By mapping the positive sample closer to the masked input, we show that the model performance is improved. We further employ locally aligned attention to focus more effectively on the video frames that are particularly relevant to the given corresponding subtitle sentences. We evaluate our proposed model on highly competitive benchmark datasets related to multiple-choice videoQA: TVQA, TVQA+, and DramaQA. Experimental results show that our model achieves state-of-the-art performance on all datasets. We also validate our approaches through further analyses.
摘要:视频答疑(视频QA)需要细粒度视频和语言模式的理解来回答给定的问题。在本文中,我们提出了多项选择题视频答疑新颖的培训计划与自我监督前培训阶段,在主舞台监督对比学习作为辅助学习。在自我监督前培训阶段,我们把预测的正确答案为一个预测的相关问题提供更广泛背景输入的模式没有任何另外的数据集或注释的原始问题的格式。对于在主舞台对比学习,我们添加了一个屏蔽噪音相当于地面实况答案输入,并考虑地面实况答案为阳性样品的原始输入,而处理其余为阴性样品。通过映射阳性样品接近屏蔽输入,我们表明,该模型性能的提高。我们进一步采用局部比关注更有效地专注于对给定相应的字幕句子特别相关的视频帧。我们评估我们提出了相关的选择题videoQA竞争激烈的基准数据集模型:TVQA,TVQA +和DramaQA。实验结果表明,我们的模型实现对所有数据集的国家的最先进的性能。我们还通过进一步分析验证了我们的方法。
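The contrastive objective is described only verbally above; a minimal InfoNCE-style reading of it is sketched below, where the masked ground-truth input is the anchor, its unmasked version the positive, and the other answer candidates the negatives. The encoder, the masking, and the temperature value are placeholders rather than the authors' setup.

import torch
import torch.nn.functional as F

def contrastive_answer_loss(anchor, positive, negatives, tau: float = 0.1):
    """anchor: embedding of the masked ground-truth input, shape [d];
    positive: embedding of the unmasked ground-truth input, shape [d];
    negatives: embeddings of the remaining answer candidates, shape [k, d]."""
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(positive, dim=-1)
    negs = F.normalize(negatives, dim=-1)
    logits = torch.cat([(a * pos).sum(-1, keepdim=True), negs @ a]) / tau
    labels = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), labels)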
21. Towards Fully 8-bit Integer Inference for the Transformer Model [PDF] 返回目录
Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, Jingbo Zhu
Abstract: 8-bit integer inference, as a promising direction in reducing both the latency and storage of deep neural networks, has made great progress recently. On the other hand, previous systems still rely on 32-bit floating point for certain functions in complex models (e.g., Softmax in Transformer), and make heavy use of quantization and de-quantization. In this work, we show that after a principled modification on the Transformer architecture, dubbed Integer Transformer, an (almost) fully 8-bit integer inference algorithm Scale Propagation could be derived. De-quantization is adopted when necessary, which makes the network more efficient. Our experiments on WMT16 En<->Ro, WMT14 En<->De and En->Fr translation tasks as well as the WikiText-103 language modelling task show that the fully 8-bit Transformer system achieves comparable performance with the floating point baseline but requires nearly 4X less memory footprint.
摘要:8位整数推论,因为在降低延迟和深层神经网络的存储一个有前途的方向,最近取得了很大进展。在另一方面,以前的系统仍然依赖于32位浮点在复杂模型的某些功能(例如,使用SoftMax变压器),并大量使用量化和去量化的。在这项工作中,我们表明,在变压器架构的原则性修改,被称为整数变压器,之后的(几乎)全部8位整数推理算法鳞片繁殖可以得到。去量化,采用必要的时候,这使得网络更加高效。 < - >滚装,WMT14恩< - >我们的WMT16恩实验德和恩>神父翻译任务还有wikitext的-103语言建模任务显示,全8位变压器系统实现了与浮点基准相当的性能但需要近4倍更少的内存占用。
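The Scale Propagation algorithm itself is not given in the abstract; the snippet below only shows the basic symmetric 8-bit quantize / integer-matmul / de-quantize step that such (almost) fully integer pipelines are built from, with per-tensor scales as a simplifying assumption.

import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization to int8; returns (q, scale)."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(qa, sa, qb, sb):
    """Integer matrix multiply with int32 accumulation; the floating-point
    scale is re-applied only once, at the output."""
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (sa * sb)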
22. Multi-modal Summarization for Video-containing Documents [PDF] 返回目录
Xiyan Fu, Jun Wang, Zhenglu Yang
Abstract: Summarization of multimedia data becomes increasingly significant as it is the basis for many real-world applications, such as question answering, Web search, and so forth. Most existing multi-modal summarization works however have used visual complementary features extracted from images rather than videos, thereby losing abundant information. Hence, we propose a novel multi-modal summarization task to summarize from a document and its associated video. In this work, we also build a baseline general model with effective strategies, i.e., bi-hop attention and improved late fusion mechanisms to bridge the gap between different modalities, and a bi-stream summarization strategy to employ text and video summarization simultaneously. Comprehensive experiments show that the proposed model is beneficial for multi-modal summarization and superior to existing methods. Moreover, we collect a novel dataset and it provides a new resource for future study that results from documents and videos.
摘要:多媒体数据的总结变得越来越显著,因为它是许多现实世界的应用,如问答,网页搜索的基础上,等等。大多数现有的多模态汇总作品不过已经用从图像而不是视频提取出来,从而失去了丰富的信息可视化互补的特征。因此,我们提出了一种新的多模式摘要任务从文件及其相关视频来概括。在这项工作中,我们还建立与有效的策略,即双跳重视和改进后融合机制的基准通用模型的桥梁和双向流汇总策略采用的文字和视频摘要同时在不同的模式之间的差距。综合实验表明,该模型是多模态聚合,以现有的方法是有利的和卓越的。此外,我们收集了新的数据集,它提供了一种新的资源用于未来的研究,从文档和视频效果。
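Bi-hop attention and the improved late fusion are only named in the abstract; as a rough illustration of what a late-fusion module can look like, the sketch below gates between separately encoded text and video representations. The architecture and dimensions are assumptions, not the paper's model.

import torch
import torch.nn as nn

class GatedLateFusion(nn.Module):
    """Fuse pooled text and video representations after separate encoders."""
    def __init__(self, d_text: int, d_video: int, d_out: int):
        super().__init__()
        self.text_proj = nn.Linear(d_text, d_out)
        self.video_proj = nn.Linear(d_video, d_out)
        self.gate = nn.Linear(2 * d_out, d_out)

    def forward(self, text_repr, video_repr):
        t, v = self.text_proj(text_repr), self.video_proj(video_repr)
        g = torch.sigmoid(self.gate(torch.cat([t, v], dim=-1)))
        return g * t + (1 - g) * v  # learned per-dimension mix of the two modalities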
23. State-Machine-Based Dialogue Agents with Few-Shot Contextual Semantic Parsers [PDF] 返回目录
Giovanni Campagna, Sina J. Semnani, Ryan Kearns, Lucas Jun Koba Sato, Monica S. Lam
Abstract: This paper presents a methodology and toolkit for creating a rule-based multi-domain conversational agent for transactions from (1) language annotations of the domains' database schemas and APIs and (2) a couple of hundreds of annotated human dialogues. There is no need for a large annotated training set, which is expensive to acquire. The toolkit uses a pre-defined abstract dialogue state machine to synthesize millions of dialogues based on the domains' information. The annotated and synthesized data are used to train a contextual semantic parser that interprets the user's latest utterance in the context of a formal representation of the conversation up to that point. Developers can refine the state machine to achieve higher accuracy. On the MultiWOZ benchmark, we achieve over 71% turn-by-turn slot accuracy on a cleaned, reannotated test set, without using any of the original training data. Our state machine can model 96% of the human agent turns. Our training strategy improves by 9% over a baseline that uses the same amount of hand-labeled data, showing the benefit of synthesizing data using the state machine.
摘要:本文介绍了从域名数据库模式和API(1)语言注释创建交易规则为基础的多领域对话剂的方法和工具;(2)一对夫妇数百个注释的人类对话的。有没有需要大量注释的训练集,这是昂贵的收购。该工具包使用预先定义的抽象对话状态机来合成数百万基于域的信息对话。该注释和合成的数据被用来训练上下文语义解析该解释在交谈到那个点的形式表示的情况下用户的最新发言。开发人员可以细化状态机来实现更高的精度。在MultiWOZ基准,我们实现了在清洁,reannotated测试集71%转由转槽的精度,而无需使用任何原始训练数据。我们的国家机器可以模拟人类代理圈的96%。我们的培养策略由9%,比使用手标记的相同的数据量,表示使用状态机合成数据的益处的基线提高。
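The abstract describes synthesizing millions of dialogues from an abstract dialogue state machine plus domain information; the toy sketch below shows the shape of such a generator. The states, transitions, and templates here are hypothetical and far simpler than the released toolkit.

import random

TRANSITIONS = {
    "greet": ["ask_slot"],
    "ask_slot": ["inform_slot"],
    "inform_slot": ["confirm", "ask_slot"],
    "confirm": ["end"],
}
TEMPLATES = {
    "greet": "Agent: How can I help you?",
    "ask_slot": "Agent: Which {slot} would you like?",
    "inform_slot": "User: I want {value}.",
    "confirm": "Agent: Booking {value}. Is that correct?",
    "end": "User: Yes, thank you.",
}

def synthesize_dialogue(slot: str = "restaurant", value: str = "Thai food"):
    """Walk the state machine, emitting one templated turn per visited state."""
    state, turns = "greet", []
    while state != "end":
        turns.append(TEMPLATES[state].format(slot=slot, value=value))
        state = random.choice(TRANSITIONS[state])
    turns.append(TEMPLATES["end"])
    return turns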
24. Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis [PDF] 返回目录
Xiaoyu Xing, Zhijing Jin, Di Jin, Bingning Wang, Qi Zhang, Xuanjing Huang
Abstract: Aspect-based sentiment analysis (ABSA) aims to predict the sentiment towards a specific aspect in the text. However, existing ABSA test sets cannot be used to probe whether a model can distinguish the sentiment of the target aspect from the non-target aspects. To solve this problem, we develop a simple but effective approach to enrich ABSA test sets. Specifically, we generate new examples to disentangle the confounding sentiments of the non-target aspects from the target aspect's sentiment. Based on the SemEval 2014 dataset, we construct the Aspect Robustness Test Set (ARTS) as a comprehensive probe of the aspect robustness of ABSA models. Over 92% data of ARTS show high fluency and desired sentiment on all aspects by human evaluation. Using ARTS, we analyze the robustness of nine ABSA models, and observe, surprisingly, that their accuracy drops by up to 69.73%. Our code and new test set are available at this https URL.
摘要:基于Aspect的情感分析(ABSA)的目的是预测情绪对文本的特定方面。但是,现有的测试ABSA集不能被用于探测的模型是否可以从非目标方面区分目标方面的情绪。为了解决这个问题,我们开发了一个简单而有效的方法来充实ABSA测试集。具体来说,我们产生新的例子解开的从目标方面的情绪非目标方面的混杂的情绪。基于该SemEval 2014集,我们构建了纵横稳健性测试仪(ARTS)作为ABSA模型的鲁棒性方面的综合探测。超过92%的科学数据显示,对人的评价各方面的高流畅度和期望的情绪。使用ARTS,我们分析了九款ABSA的稳健性,并观察,出人意料的是,他们的准确度高达下降到69.73%。我们的代码和新的测试集可在此HTTPS URL。
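The generation strategies behind ARTS are only summarized above; one of them, reversing the sentiment of non-target aspects, can be illustrated with the toy substitution below. The antonym table and the aspect annotation format are hypothetical, not the released code.

ANTONYMS = {"tasty": "bland", "soggy": "crispy", "friendly": "rude"}

def flip_non_target_opinions(tokens, target_aspect, aspect_of):
    """aspect_of[i] names the aspect that opinion token i modifies (or None).
    Opinion words tied to non-target aspects are flipped, so the target
    aspect's gold label is preserved while confounding cues reverse."""
    return [ANTONYMS.get(tok, tok)
            if aspect_of[i] is not None and aspect_of[i] != target_aspect
            else tok
            for i, tok in enumerate(tokens)]

# "tasty burgers , soggy fries" with target aspect "burgers" becomes
# "tasty burgers , crispy fries": a robust ABSA model should still predict
# positive sentiment for "burgers".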
25. How to marry a star: probabilistic constraints for meaning in context [PDF] 返回目录
Katrin Erk, Aurelie Herbelot
Abstract: In this paper, we derive a notion of word meaning in context from Fillmore's 'semantics of understanding', in which a listener draws on their knowledge of both language and the world to 'envision' the situation described in an utterance. We characterize utterance understanding as a combination of cognitive semantics and Discourse Representation Theory, formalized as a situation description system: a probabilistic model which takes utterance understanding to be the mental process of describing one or more situations that would account for an observed utterance. Our model captures the interplay of local and global contexts and their joint influence upon the lexical representation of sentence constituents. We implement the system using a directed graphical model, and apply it to examples containing various contextualization phenomena.
摘要:在本文中,我们得出的字从菲尔莫尔的“理解语义”,其中一个监听器借鉴了他们的语言和世界“预想”在话语中所描述的情况的知识情形下,意味着一个概念。我们表征话语理解认知语义和话语表达理论的结合,正式的情况说明系统:一个概率模型,这需要话语理解是描述了将占到观察到的话语的一个或多个的情况下的心理过程。我们的模型捕获的本地和全球范围内的相互作用和在句子成分的词汇表示他们的共同影响。我们实施使用定向图形模型系统,并将其应用到包含各种情境化现象的例子。
26. A Multimodal Memes Classification: A Survey and Open Research Issues [PDF] 返回目录
Tariq Habib Afridi, Aftab Alam, Muhammad Numan Khan, Jawad Khan, Young-Koo Lee
Abstract: Memes are graphics and text overlapped so that together they present concepts that become dubious if one of them is absent. It is spread mostly on social media platforms, in the form of jokes, sarcasm, motivating, etc. After the success of BERT in Natural Language Processing (NLP), researchers inclined to Visual-Linguistic (VL) multimodal problems like memes classification, image captioning, Visual Question Answering (VQA), and many more. Unfortunately, many memes get uploaded each day on social media platforms that need automatic censoring to curb misinformation and hate. Recently, this issue has attracted the attention of researchers and practitioners. State-of-the-art methods that performed significantly on other VL dataset, tends to fail on memes classification. In this context, this work aims to conduct a comprehensive study on memes classification, generally on the VL multimodal problems and cutting edge solutions. We propose a generalized framework for VL problems. We cover the early and next-generation works on VL problems. Finally, we identify and articulate several open research issues and challenges. This is the first study that presents the generalized view of the advanced classification techniques concerning memes classification to the best of our knowledge. We believe this study presents a clear road-map for the Machine Learning (ML) research community to implement and enhance memes classification techniques.
摘要:模因是图形和文字重叠,使得这成为可疑的,如果他们中的一个不存在他们一起发明的概念。它是传播主要是在社会化媒体平台,以开玩笑,挖苦,激励等BERT的自然语言处理(NLP)成功后的形式,研究人员倾向于视觉语言学(VL),如拟子分类,图像多式联运问题字幕,视觉答疑(VQA),等等。不幸的是,许多模因获得上载需要自动删除遏制误传和仇恨的社交媒体平台的每一天。最近,这个问题已经引起了研究人员和从业者的关注。这对其他VL集显著执行国家的最先进的方法,往往会失败的记因分类。在此背景下,这项工作旨在对拟子分类进行了全面的研究,一般在VL多式联运问题和前沿解决方案。我们提出了VL问题广义框架。我们覆盖的VL问题早期和下一代作品。最后,我们确定和阐明几个开放的研究问题和挑战。这是第一次研究呈现的关于模因先进的分类技术广义视图分类为我们所知。我们相信,这项研究提出了一个明确的路线图的机器学习(ML)研究界在实施和加强拟子分类技术。
27. Impact and dynamics of hate and counter speech online [PDF] 返回目录
Joshua Garland, Keyan Ghazi-Zahedi, Jean-Gabriel Young, Laurent Hébert-Dufresne, Mirta Galesic
Abstract: Citizen-generated counter speech is a promising way to fight hate speech and promote peaceful, non-polarized discourse. However, there is a lack of large-scale longitudinal studies of its effectiveness for reducing hate speech. We investigate the effectiveness of counter speech using several different macro- and micro-level measures of over 180,000 political conversations that took place on German Twitter over four years. We report on the dynamic interactions of hate and counter speech over time and provide insights into whether, as in `classic' bullying situations, organized efforts are more effective than independent individuals in steering online discourse. Taken together, our results build a multifaceted picture of the dynamics of hate and counter speech online. They suggest that organized hate speech produced changes in the public discourse. Counter speech, especially when organized, could help in curbing hate speech in online discussions.
摘要:市民产生计数器讲话打击仇恨言论和促进和平,非极化的话语有前途的方法。然而,减少仇恨言论缺乏其有效性的大型纵向研究。我们使用的是发生在德国的Twitter在四年内超过180,000政治对话的几个不同的宏观和微观层面的措施,调查反演说的效果。我们仇恨和反言论随时间变化的动态交互报告,并提供深入了解是否为'经典”欺负的情况下,有组织的努力比转向在线话语独立的个体更有效。总之,我们的结果建立仇恨和反言论动态的多方面图片在线。他们认为,有组织的仇恨言论产生了公共话语的变化。计数器演讲,组织特别是,可以帮助在网上讨论遏制仇恨言论。
28. GraphCodeBERT: Pre-training Code Representations with Data Flow [PDF] 返回目录
Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Jian Yin, Daxin Jiang, Ming Zhou
Abstract: Pre-trained models for programming language have achieved dramatic empirical improvements on a variety of code-related tasks such as code search, code completion, code summarization, etc. However, existing pre-trained models regard a code snippet as a sequence of tokens, while ignoring the inherent structure of code, which provides crucial code semantics and would enhance the code understanding process. We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code. Instead of taking syntactic-level structure of code like abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables. Such a semantic-level structure is neat and does not bring an unnecessarily deep hierarchy of AST, the property of which makes the model more efficient. We develop GraphCodeBERT based on Transformer. In addition to using the task of masked language modeling, we introduce two structure-aware pre-training tasks. One is to predict code structure edges, and the other is to align representations between source code and code structure. We implement the model in an efficient way with a graph-guided masked attention function to incorporate the code structure. We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement. Results show that code structure and newly introduced pre-training tasks can improve GraphCodeBERT and achieves state-of-the-art performance on the four downstream tasks. We further show that the model prefers structure-level attentions over token-level attentions in the task of code search.
摘要:编程语言预先训练模型已经在各种代码相关的任务,如代码搜索,代码完成,代码总结,等实现了惊人的经验改进。然而,现有的预训练的模型把一个代码片段作为一个序列令牌,而忽略的码固有的结构,其提供关键码语义和将加强代码理解过程。我们提出GraphCodeBERT,编程是考虑代码的内在结构语言预先训练的模式。而不是采取的像抽象语法树(AST)代码句法级结构中,我们使用的数据流在预训练阶段,这是一个代码语义层次结构编码的“的关系,其中,所述值-自带 - 摘自”变量之间。这样的语义层次结构整齐,不带AST的不必要的深层次的属性,它的使模型更有效。我们基于变压器的发展GraphCodeBERT。除了使用屏蔽语言建模的任务,我们引入两个结构感知前的训练任务。一个是预测码结构的边缘,而另一个为源代码和代码结构之间对准表示。我们与图形引导蒙面注意功能的有效方法,将代码结构实现模型。我们评估的4项任务,其中包括代码搜索,克隆检测,代码转换和代码细化模型。结果表明,代码结构和新引进的前训练任务可以提高GraphCodeBERT并实现在四个下游任务的国家的最先进的性能。进一步的研究表明,该模型程序首选标记级别的关注结构层次关注的代码搜索的任务。
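One way to realize the graph-guided masked attention mentioned above is a boolean attention mask that lets data-flow nodes attend only along data-flow edges and to the code token they come from, while code tokens attend to each other. The construction below is an illustrative assumption, not GraphCodeBERT's exact scheme.

import torch

def graph_attention_mask(n_code: int, n_nodes: int, edges, node_to_token):
    """Boolean [N, N] mask (True = attention allowed) for a sequence that
    concatenates n_code code tokens followed by n_nodes data-flow nodes."""
    n = n_code + n_nodes
    allow = torch.zeros(n, n, dtype=torch.bool)
    allow[:n_code, :n_code] = True                 # code token <-> code token
    for i, j in edges:                             # "where-the-value-comes-from" edges
        allow[n_code + i, n_code + j] = True
    for i, t in node_to_token.items():             # node <-> its source code token
        allow[n_code + i, t] = True
        allow[t, n_code + i] = True
    return allow

# Applied additively inside attention:
# scores = scores.masked_fill(~allow, float("-inf"))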
29. Understanding Effects of Editing Tweets for News Sharing by Media Accounts through a Causal Inference Framework [PDF] 返回目录
Kunwoo Park, Haewoon Kwak, Jisun An, Sanjay Chawla
Abstract: To reach a broader audience and optimize traffic toward news articles, media outlets commonly run social media accounts and share their content with a short text summary. Despite its importance of writing a compelling message in sharing articles, research community does not own a sufficient level of understanding of what kinds of editing strategies are effective in promoting audience engagement. In this study, we aim to fill the gap by analyzing the current practices of media outlets using a data-driven approach. We first build a parallel corpus of original news articles and their corresponding tweets that were shared by eight media outlets. Then, we explore how those media edited tweets against original headlines, and the effects would be. To estimate the effects of editing news headlines for social media sharing in audience engagement, we present a systematic analysis that incorporates a causal inference technique with deep learning; using propensity score matching, it allows for estimating potential (dis-)advantages of an editing style compared to counterfactual cases where a similar news article is shared with a different style. According to the analyses of various editing styles, we report common and differing effects of the styles across the outlets. To understand the effects of various editing styles, media outlets could apply our easy-to-use tool by themselves.
摘要:为了达到更广泛的受众,优化交通走向新闻报道,媒体运行常用社交媒体帐户,并分享他们的一个简短的文字摘要内容。尽管在写文章共享一个引人注目的消息,它的重要性,研究界没有自己的什么样的编辑策略,能有效地促进观众的参与了解足够的水平。在这项研究中,我们的目标是通过分析使用数据驱动的方法媒体的现行做法填补了国内空白。我们首先构建了八个媒体共享原始新闻报道和其相应的鸣叫的平行语料库。然后,我们探讨这些媒体如何编辑对原标题鸣叫,而且效果会。为了估计编辑社交媒体在受众参与共享新闻标题的影响,我们提出整合与深度学习的因果推理技术进行系统的分析;使用倾向评分匹配,它允许用于估计的编辑风格电位(解散)相比的优点,其中相似的新闻文章与不同的样式共享反例。据各种编辑风格的分析,我们报告的整个网点的风格共同和不同的影响。要了解各种编辑风格的影响,媒体可以通过自己的应用我们的易于使用的工具。
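The causal-inference step is described only at a high level; a minimal propensity-score-matching sketch is given below, estimating the effect of one editing style on engagement via 1-nearest-neighbour matching. The covariates, the treatment indicator, and the outcome variable are placeholders, and this is not the authors' tool.

import numpy as np
from sklearn.linear_model import LogisticRegression

def att_via_psm(X, treated, outcome):
    """Average effect on the treated units. X: covariates [n, d];
    treated: 0/1 array (1 = tweet uses the editing style); outcome: engagement."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.where(treated == 1)[0]
    c_idx = np.where(treated == 0)[0]
    effects = [outcome[i] - outcome[c_idx[np.argmin(np.abs(ps[c_idx] - ps[i]))]]
               for i in t_idx]                     # match each treated tweet to its closest control
    return float(np.mean(effects))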
30. Type-augmented Relation Prediction in Knowledge Graphs [PDF] 返回目录
Zijun Cui, Pavan Kapanipathi, Kartik Talamadupula, Tian Gao, Qiang Ji
Abstract: Knowledge graphs (KGs) are of great importance to many real world applications, but they generally suffer from incomplete information in the form of missing relations between entities. Knowledge graph completion (also known as relation prediction) is the task of inferring missing facts given existing ones. Most of the existing work is proposed by maximizing the likelihood of observed instance-level triples. Not much attention, however, is paid to the ontological information, such as type information of entities and relations. In this work, we propose a type-augmented relation prediction (TaRP) method, where we apply both the type information and instance-level information for relation prediction. In particular, type information and instance-level information are encoded as prior probabilities and likelihoods of relations respectively, and are combined by following Bayes' rule. Our proposed TaRP method achieves significantly better performance than state-of-the-art methods on three benchmark datasets: FB15K, YAGO26K-906, and DB111K-174. In addition, we show that TaRP achieves significantly improved data efficiency. More importantly, the type information extracted from a specific dataset can generalize well to other datasets through the proposed TaRP model.
摘要:知识图(KGS)是许多现实世界的应用非常重要,但它们通常在缺少实体之间关系的形式不完全信息受到影响。知识图完成(也被称为关系预测)是推断给予现有的失踪事实的任务。现有的大部分工作是由最大化观察实例级三元的可能性建议。没有太多的关注,但是,支付给本体信息,如实体和关系的类型信息。在这项工作中,我们提出了一种增加了的关系预测(TARP)的方法,在这里我们应用类型信息以及相关预测实例级信息两者。具体地,类型信息和实例级信息被分别编码为先验概率和关系的似然性,并通过下面的贝叶斯规则相结合。我们提出的问题资产救助计划的方法实现了比对三个标准数据集的国家的最先进的方法显著更好的性能:FB15K,YAGO26K-906,和DB111K-174。此外,我们表明,TARP达到显著提高数据效率。更重要的是,从一个特定的数据集提取出的类型的信息可以概括以及通过所提出的篷布模型的其他数据集。
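The Bayes-rule combination described above can be written out directly: the type information gives a prior over relations and the instance-level model gives a likelihood, and the two multiply into a posterior. The snippet below is a plain reading of that sentence, not the exact TaRP scoring function.

import numpy as np

def relation_posterior(type_prior: np.ndarray, instance_loglik: np.ndarray) -> np.ndarray:
    """P(r | h, t) proportional to P(h, t | r) * P(r | type(h), type(t)).
    type_prior: prior probability of each relation given the entity types;
    instance_loglik: log-likelihood of the (h, t) pair under each relation."""
    log_post = np.log(type_prior + 1e-12) + instance_loglik
    log_post -= log_post.max()          # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()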
注:中文为机器翻译结果!封面为论文标题词云图!