Table of Contents
3. Training Multilingual Machine Translation by Alternately Freezing Language-Specific Encoders-Decoders [PDF] Abstract
4. A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss [PDF] Abstract
6. A Contextual Hierarchical Attention Network with Adaptive Objective for Dialogue State Tracking [PDF] Abstract
11. Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer [PDF] Abstract
12. Enhanced Universal Dependency Parsing with Second-Order Inference and Mixture of Training Data [PDF] Abstract
13. Embeddings of Label Components for Sequence Labeling: A Case Study of Fine-grained Named Entity Recognition [PDF] Abstract
18. An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results [PDF] Abstract
19. BERT-based Ensembles for Modeling Disclosure and Support in Conversational Social Media Text [PDF] Abstract
20. Do All Good Actors Look The Same? Exploring News Veracity Detection Across The U.S. and The U.K [PDF] Abstract
24. Word-Emoji Embeddings from large scale Messaging Data reflect real-world Semantic Associations of Expressive Icons [PDF] Abstract
25. A Thousand Words are Worth More Than One Recording: NLP Based Speaker Change Point Detection [PDF] Abstract
26. CS-NLP team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP Deep Learning Architectures on Commonsense Reasoning Task [PDF] Abstract
29. An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features [PDF] Abstract
Abstracts
1. Web Document Categorization Using Naive Bayes Classifier and Latent Semantic Analysis [PDF] Back to Contents
Alireza Saleh Sedghpour, Mohammad Reza Saleh Sedghpour
Abstract: The rapid growth of web documents driven by heavy use of the World Wide Web calls for efficient techniques to classify documents on the web, as high volumes of highly diverse data are produced every second. Automatic classification of these growing amounts of web documents is one of the biggest challenges we face today. Probabilistic classification algorithms such as Naive Bayes have become commonly used for web document classification, mainly because of their relatively high classification accuracy in many application areas, even though they offer little support for the high-dimensional and sparse data that characterize textual representations. Traditional feature selection methods also tend to neglect the semantic relations between words when dealing with big data and large-scale web documents. To address this problem, we propose a method for web document classification that uses LSA to increase the similarity of documents within the same class and improve classification precision. Using this approach, we design a faster and more accurate classifier for web documents. Experimental results show that the described preprocessing improves the accuracy and speed of Naive Bayes, as reflected in the precision and recall metrics.
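The abstract does not spell out the exact pipeline, but the general combination it describes (LSA-style dimensionality reduction feeding a Naive Bayes classifier) can be sketched as follows. The TF-IDF features, the number of SVD components, and the Gaussian Naive Bayes variant (chosen here because LSA features can be negative) are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of an LSA + Naive Bayes pipeline in the spirit of the paper above.
# Component choices (TF-IDF, 2 SVD dimensions, GaussianNB) are assumptions.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.naive_bayes import GaussianNB

docs = ["cheap flights and hotel deals", "new GPU benchmarks released",
        "stock markets rally on earnings", "travel tips for backpackers"]
labels = ["travel", "tech", "finance", "travel"]

pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # sparse, high-dimensional text features
    TruncatedSVD(n_components=2),           # LSA: project onto a latent semantic space
    GaussianNB(),                           # NB variant that accepts dense real-valued features
)
pipeline.fit(docs, labels)
print(pipeline.predict(["budget hotels near the airport"]))
```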
2. DiscSense: Automated Semantic Analysis of Discourse Markers [PDF] Back to Contents
Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller
Abstract: Discourse markers ({\it by contrast}, {\it happily}, etc.) are words or phrases that are used to signal semantic and/or pragmatic relationships between clauses or sentences. Recent work has fruitfully explored the prediction of discourse markers between sentence pairs in order to learn accurate sentence representations that are useful in various classification tasks. In this work, we take another perspective: using a model trained to predict discourse markers between sentence pairs, we predict plausible markers between sentence pairs with a known semantic relation (provided by existing classification datasets). These predictions allow us to study the link between discourse markers and the semantic relations annotated in classification datasets. Handcrafted mappings have been proposed between markers and discourse relations on a limited set of markers and a limited set of categories, but there exist hundreds of discourse markers expressing a wide variety of relations, and there is no consensus on the taxonomy of relations between competing discourse theories (which are largely built in a top-down fashion). By using an automatic prediction method over existing semantically annotated datasets, we provide a bottom-up characterization of discourse markers in English. The resulting dataset, named DiscSense, is publicly available.
3. Training Multilingual Machine Translation by Alternately Freezing Language-Specific Encoders-Decoders [PDF] Back to Contents
Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe
Abstract: We propose a modular architecture of language-specific encoder-decoders that constitutes a multilingual machine translation system that can be incrementally extended to new languages without the need for retraining the existing system when adding new languages. Differently from previous works, we simultaneously train $N$ languages in all translation directions by alternately freezing encoder or decoder modules, which indirectly forces the system to train in a common intermediate representation for all languages. Experimental results from multilingual machine translation show that we can successfully train this modular architecture improving on the initial languages while falling slightly behind when adding new languages or doing zero-shot translation. Additional comparison of the quality of sentence representation in the task of natural language inference shows that the alternately freezing training is also beneficial in this direction.
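As a rough illustration of the alternate-freezing schedule described above, the toy sketch below toggles requires_grad between the encoder side and the decoder side on successive steps; the linear modules, the language set, and the dummy batches are placeholders rather than the paper's actual architecture or data.

```python
# Toy sketch of alternate freezing: in each phase, either all language-specific
# encoders or all decoders are frozen, so the unfrozen side must adapt to a
# shared intermediate representation. Sizes, languages and data are placeholders.
import torch
import torch.nn as nn

langs = ["en", "de", "fr"]
dim = 16
encoders = nn.ModuleDict({l: nn.Linear(dim, dim) for l in langs})
decoders = nn.ModuleDict({l: nn.Linear(dim, dim) for l in langs})

def set_frozen(modules, frozen):
    for p in modules.parameters():
        p.requires_grad = not frozen

params = list(encoders.parameters()) + list(decoders.parameters())
optim = torch.optim.Adam(params, lr=1e-3)

for step in range(4):
    freeze_encoders = step % 2 == 0           # alternate which side is frozen
    set_frozen(encoders, freeze_encoders)
    set_frozen(decoders, not freeze_encoders)
    src_lang, tgt_lang = "en", "de"            # one translation direction per step here
    x = torch.randn(8, dim)                    # dummy source batch
    y = torch.randn(8, dim)                    # dummy target batch
    out = decoders[tgt_lang](encoders[src_lang](x))
    loss = nn.functional.mse_loss(out, y)
    optim.zero_grad()
    loss.backward()
    optim.step()
```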
4. A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss [PDF] Back to Contents
Hou Pong Chan, Wang Chen, Irwin King
Abstract: Acquiring accurate summarization and sentiment from user reviews is an essential component of modern e-commerce platforms. Review summarization aims at generating a concise summary that describes the key opinions and sentiment of a review, while sentiment classification aims to predict a sentiment label indicating the sentiment attitude of a review. To effectively leverage the shared sentiment information in both review summarization and sentiment classification tasks, we propose a novel dual-view model that jointly improves the performance of these two tasks. In our model, an encoder first learns a context representation for the review, then a summary decoder generates a review summary word by word. After that, a source-view sentiment classifier uses the encoded context representation to predict a sentiment label for the review, while a summary-view sentiment classifier uses the decoder hidden states to predict a sentiment label for the generated summary. During training, we introduce an inconsistency loss to penalize the disagreement between these two classifiers. It helps the decoder to generate a summary to have a consistent sentiment tendency with the review and also helps the two sentiment classifiers learn from each other. Experiment results on four real-world datasets from different domains demonstrate the effectiveness of our model.
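A minimal sketch of how an inconsistency term could be combined with the two sentiment classification losses is given below; the abstract does not state the exact form of the loss, so the symmetric KL divergence between the two views is an illustrative assumption.

```python
# Illustrative dual-view objective: source-view and summary-view classification
# losses plus an inconsistency penalty on their predicted distributions.
# The KL form and the 0.5 weight are assumptions; the summary generation loss
# is omitted from this sketch.
import torch
import torch.nn.functional as F

batch, n_classes = 4, 3
source_logits = torch.randn(batch, n_classes, requires_grad=True)   # from encoder states
summary_logits = torch.randn(batch, n_classes, requires_grad=True)  # from decoder states
labels = torch.randint(0, n_classes, (batch,))

cls_loss = F.cross_entropy(source_logits, labels) + F.cross_entropy(summary_logits, labels)

p = F.log_softmax(source_logits, dim=-1)
q = F.log_softmax(summary_logits, dim=-1)
# symmetric KL between the two views as the inconsistency penalty
inconsistency = 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean")
                       + F.kl_div(q, p, log_target=True, reduction="batchmean"))

total_loss = cls_loss + 0.5 * inconsistency
total_loss.backward()
```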
5. Exploring Cross-sentence Contexts for Named Entity Recognition with BERT [PDF] Back to Contents
Jouni Luoma, Sampo Pyysalo
Abstract: Named entity recognition (NER) is frequently addressed as a sequence classification task where each input consists of one sentence of text. It is nevertheless clear that useful information for the task can often be found outside of the scope of a single-sentence context. Recently proposed self-attention models such as BERT can both efficiently capture long-distance relationships in input as well as represent inputs consisting of several sentences, creating new opportunities for approaches that incorporate cross-sentence information in natural language processing tasks. In this paper, we present a systematic study exploring the use of cross-sentence information for NER using BERT models in five languages. We find that adding context in the form of additional sentences to BERT input systematically increases NER performance on all of the tested languages and models. Including multiple sentences in each input also allows us to study the predictions of the same sentences in different contexts. We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate this to further increase NER performance with BERT. Our approach does not require any changes to the underlying BERT architecture, rather relying on restructuring examples for training and prediction. Evaluation on established datasets, including the CoNLL'02 and CoNLL'03 NER benchmarks, demonstrates that our proposed approach can improve on the state-of-the-art NER results on English, Dutch, and Finnish, achieves the best reported BERT-based results on German, and is on par with performance reported with other BERT-based approaches in Spanish. We release all methods implemented in this work under open licenses.
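The Contextual Majority Voting step can be illustrated with a short sketch: the same sentence is tagged in several contexts and each token receives its most frequent label. Tie-breaking and any confidence weighting are not specified above and are assumptions here.

```python
# Minimal sketch of Contextual Majority Voting (CMV): the same sentence is
# tagged in several different contexts, and the final label for each token is
# the most frequent prediction across those contexts.
from collections import Counter

def contextual_majority_vote(predictions):
    """predictions: list of label sequences for the SAME sentence, one per context."""
    length = len(predictions[0])
    assert all(len(p) == length for p in predictions)
    return [Counter(labels_at_i).most_common(1)[0][0]
            for labels_at_i in zip(*predictions)]

preds_in_three_contexts = [
    ["B-PER", "I-PER", "O", "B-LOC"],
    ["B-PER", "O",     "O", "B-LOC"],
    ["B-PER", "I-PER", "O", "B-ORG"],
]
print(contextual_majority_vote(preds_in_three_contexts))
# ['B-PER', 'I-PER', 'O', 'B-LOC']
```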
6. A Contextual Hierarchical Attention Network with Adaptive Objective for Dialogue State Tracking [PDF] Back to Contents
Yong Shan, Zekang Li, Jinchao Zhang, Fandong Meng, Yang Feng, Cheng Niu, Jie Zhou
Abstract: Recent studies in dialogue state tracking (DST) leverage historical information to determine states which are generally represented as slot-value pairs. However, most of them have limitations to efficiently exploit relevant context due to the lack of a powerful mechanism for modeling interactions between the slot and the dialogue history. Besides, existing methods usually ignore the slot imbalance problem and treat all slots indiscriminately, which limits the learning of hard slots and eventually hurts overall performance. In this paper, we propose to enhance the DST through employing a contextual hierarchical attention network to not only discern relevant information at both word level and turn level but also learn contextual representations. We further propose an adaptive objective to alleviate the slot imbalance problem by dynamically adjust weights of different slots during training. Experimental results show that our approach reaches 52.68% and 58.55% joint accuracy on MultiWOZ 2.0 and MultiWOZ 2.1 datasets respectively and achieves new state-of-the-art performance with considerable improvements (+1.24% and +5.98%).
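One plausible reading of the adaptive objective is sketched below: slots with lower recent accuracy receive larger loss weights. The specific weighting rule, the slot names, and the dummy logits are assumptions for illustration only.

```python
# Sketch of an adaptive objective in the spirit described above: slots the model
# currently gets wrong more often receive larger loss weights. The inverse-accuracy
# rule below is an illustrative assumption, not the paper's formula.
import torch
import torch.nn.functional as F

def adaptive_slot_weights(per_slot_accuracy, alpha=1.0):
    """Map recent per-slot accuracies to loss weights (harder slots weigh more)."""
    acc = torch.tensor(per_slot_accuracy)
    weights = (1.0 - acc).pow(alpha) + 1e-3         # avoid zero weight for solved slots
    return weights / weights.sum() * len(weights)   # normalise to mean 1

slots = ["hotel-area", "hotel-parking", "train-day"]
weights = adaptive_slot_weights([0.95, 0.60, 0.80])

# weighted sum of per-slot cross-entropy losses over dummy slot-value logits
logits = {s: torch.randn(4, 10, requires_grad=True) for s in slots}
targets = {s: torch.randint(0, 10, (4,)) for s in slots}
loss = sum(w * F.cross_entropy(logits[s], targets[s]) for w, s in zip(weights, slots))
loss.backward()
```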
7. WikiBERT models: deep transfer learning for many languages [PDF] Back to Contents
Sampo Pyysalo, Jenna Kanerva, Antti Virtanen, Filip Ginter
Abstract: Deep neural language models such as BERT have enabled substantial recent advances in many natural language processing tasks. Due to the effort and computational cost involved in their pre-training, language-specific models are typically introduced only for a small number of high-resource languages such as English. While multilingual models covering large numbers of languages are available, recent work suggests monolingual training can produce better models, and our understanding of the tradeoffs between mono- and multilingual training is incomplete. In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most for languages up to now lacking dedicated deep neural language models. We assess the merits of these models using the state-of-the-art UDify parser on Universal Dependencies data, contrasting performance with results using the multilingual BERT model. We find that UDify using WikiBERT models outperforms the parser using mBERT on average, with the language-specific models showing substantially improved performance for some languages, yet limited improvement or a decrease in performance for others. We also present preliminary results as first steps toward an understanding of the conditions under which language-specific models are most beneficial. All of the methods and models introduced in this work are available under open licenses from this https URL.
8. Situated and Interactive Multimodal Conversations [PDF] Back to Contents
Seungwhan Moon, Satwik Kottur, Paul A. Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard
Abstract: Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, etc., in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context in addition to the dialog history. We provide two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances) using a multimodal Wizard-of-Oz (WoZ) setup, on two shopping domains: (a) furniture (grounded in a shared virtual environment) and, (b) fashion (grounded in an evolving set of images). We also provide logs of the items appearing in each scene, and contextual NLU and coreference annotations, using a novel and unified framework of SIMMC conversational acts for both user and assistant utterances. Finally, we present several tasks within SIMMC as objective evaluation protocols, such as Structural API Prediction and Response Generation. We benchmark a collection of existing models on these SIMMC tasks as strong baselines, and demonstrate rich multimodal conversational interactions. Our data, annotations, code, and models will be made publicly available.
9. An Empirical Methodology for Detecting and Prioritizing Needs during Crisis Events [PDF] Back to Contents
M. Janina Sarol, Ly Dinh, Rezvaneh Rezapour, Chieh-Li Chin, Pingjing Yang, Jana Diesner
Abstract: In times of crisis, identifying the essential needs is a crucial step to providing appropriate resources and services to affected entities. Social media platforms such as Twitter contain vast amount of information about the general public's needs. However, the sparsity of the information as well as the amount of noisy content present a challenge to practitioners to effectively identify shared information on these platforms. In this study, we propose two novel methods for two distinct but related needs detection tasks: the identification of 1) a list of resources needed ranked by priority, and 2) sentences that specify who-needs-what resources. We evaluated our methods on a set of tweets about the COVID-19 crisis. For task 1 (detecting top needs), we compared our results against two given lists of resources and achieved 64% precision. For task 2 (detecting who-needs-what), we compared our results on a set of 1,000 annotated tweets and achieved a 68% F1-score.
10. BERT Based Multilingual Machine Comprehension in English and Hindi [PDF] Back to Contents
Somil Gupta, Nilesh Khade
Abstract: Multilingual Machine Comprehension (MMC) is a Question-Answering (QA) sub-task that involves quoting the answer for a question from a given snippet, where the question and the snippet can be in different languages. Recently released multilingual variant of BERT (m-BERT), pre-trained with 104 languages, has performed well in both zero-shot and fine-tuned settings for multilingual tasks; however, it has not been used for English-Hindi MMC yet. We, therefore, present in this article, our experiments with m-BERT for MMC in zero-shot, mono-lingual (e.g. Hindi Question-Hindi Snippet) and cross-lingual (e.g. English Question-Hindi Snippet) fine-tune setups. These model variants are evaluated on all possible multilingual settings and results are compared against the current state-of-the-art sequential QA system for these languages. Experiments show that m-BERT, with fine-tuning, improves performance on all evaluation settings across both the datasets used by the prior model, therefore establishing m-BERT based MMC as the new state-of-the-art for English and Hindi. We also publish our results on an extended version of the recently released XQuAD dataset, which we propose to use as the evaluation benchmark for future research.
11. Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer [PDF] Back to Contents
Yuan Shangguan, Kate Knister, Yanzhang He, Ian McGraw, Francoise Beaufays
Abstract: The demand for fast and accurate incremental speech recognition increases as the applications of automatic speech recognition (ASR) proliferate. Incremental speech recognizers output chunks of partially recognized words while the user is still talking. Partial results can be revised before the ASR finalizes its hypothesis, causing instability issues. We analyze the quality and stability of on-device streaming end-to-end (E2E) ASR models. We first introduce a novel set of metrics that quantify the instability at word and segment levels. We study the impact of several model training techniques that improve E2E model qualities but degrade model stability. We categorize the causes of instability and explore various solutions to mitigate them in a streaming E2E ASR system. Index Terms: ASR, stability, end-to-end, text normalization,on-device, RNN-T
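The abstract does not reproduce the metric definitions, but a toy word-level instability measure in the same spirit could count how many words emitted in partial hypotheses are later revised, as in the hedged sketch below; the counting rule is an illustrative assumption, not the paper's exact metric.

```python
# Toy word-level instability measure: fraction of words emitted in partial
# hypotheses that are later revised before the final result.
def word_instability(partials, final):
    """partials: list of partial hypotheses (word lists) in emission order."""
    emitted = revised = 0
    for prev, curr in zip(partials, partials[1:] + [final]):
        emitted += len(prev)
        # a previously emitted word counts as revised if the next hypothesis
        # no longer contains it at the same position
        revised += sum(1 for i, w in enumerate(prev)
                       if i >= len(curr) or curr[i] != w)
    return revised / max(emitted, 1)

partials = [["play"], ["play", "some"], ["play", "sum", "music"]]
final = ["play", "some", "music"]
print(f"instability: {word_instability(partials, final):.2f}")
```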
12. Enhanced Universal Dependency Parsing with Second-Order Inference and Mixture of Training Data [PDF] Back to Contents
Xinyu Wang, Yong Jiang, Kewei Tu
Abstract: This paper presents the system used in our submission to the \textit{IWPT 2020 Shared Task}. Our system is a graph-based parser with second-order inference. For the low-resource Tamil corpus, we specially mixed the training data of Tamil with other languages and significantly improved the performance of Tamil. Due to our misunderstanding of the submission requirements, we submitted graphs that are not connected, which makes our system only rank \textbf{6th} over 10 teams. However, after we fixed this problem, our system is 0.6 ELAS higher than the team that ranked \textbf{1st} in the official results.
13. Embeddings of Label Components for Sequence Labeling: A Case Study of Fine-grained Named Entity Recognition [PDF] Back to Contents
Takuma Kato, Kaori Abe, Hiroki Ouchi, Shumpei Miyawaki, Jun Suzuki, Kentaro Inui
Abstract: In general, the labels used in sequence labeling consist of different types of elements. For example, IOB-format entity labels, such as B-Person and I-Person, can be decomposed into span (B and I) and type information (Person). However, while most sequence labeling models do not consider such label components, the shared components across labels, such as Person, can be beneficial for label prediction. In this work, we propose to integrate label component information as embeddings into models. Through experiments on English and Japanese fine-grained named entity recognition, we demonstrate that the proposed method improves performance, especially for instances with low-frequency labels.
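The decomposition described above can be sketched directly: split an IOB label into its span and type components and build the label representation from shared component embeddings. Summation and the embedding size are illustrative assumptions.

```python
# Sketch of label-component embeddings: "B-Person" is split into a span part
# ("B") and a type part ("Person"), and the label representation is built from
# embeddings of those shared components.
import torch
import torch.nn as nn

spans = {"B": 0, "I": 1, "O": 2}
types = {"Person": 0, "Location": 1, "NONE": 2}   # "O" has no type component

span_emb = nn.Embedding(len(spans), 32)
type_emb = nn.Embedding(len(types), 32)

def label_embedding(label):
    if label == "O":
        span, typ = "O", "NONE"
    else:
        span, typ = label.split("-", 1)
    span_id = torch.tensor([spans[span]])
    type_id = torch.tensor([types[typ]])
    return span_emb(span_id) + type_emb(type_id)   # components shared across labels

print(label_embedding("B-Person").shape)   # torch.Size([1, 32])
print(label_embedding("I-Person").shape)   # shares the "Person" component with B-Person
```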
14. A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension [PDF] Back to Contents
Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu
Abstract: Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed. But little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g. synonyms are closer to each other) and fine-tuning further improves its capabilities which require more complicated reasoning (e.g. coreference resolution, entity boundary detection, etc). However, how to verify these arguments analytically and quantitatively is a challenging task, and few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g. coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task. Specifically, we identify five phenomena in MRC. According to pairwise probing tasks, we compare the performance of each layer's hidden representation of pre-trained and fine-tuned BERT. The proposed pairwise probe alleviates the problem of distraction from inaccurate model training and makes a robust and quantitative comparison. Our experimental analysis leads to highly confident conclusions: (1) Fine-tuning has little effect on the fundamental and low-level information and general semantic tasks. (2) For specific abilities required for downstream tasks, fine-tuned BERT is better than pre-trained BERT and such gaps are obvious after the fifth layer.
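A layer-wise pairwise probe of the kind described above might, for example, pool the hidden states of a matched span pair at every layer and compare their similarity; the sketch below uses a generic bert-base-uncased checkpoint, mean pooling, and cosine similarity, all of which are assumptions rather than the paper's exact protocol.

```python
# Sketch of layer-wise pairwise probing: take a matched pair of spans (here an
# entity and a co-referring pronoun), pool their hidden states at every layer,
# and track the pair's similarity across layers. Repeating this for a
# pre-trained and a fine-tuned checkpoint allows a layer-by-layer comparison.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

text = "Marie Curie won two Nobel Prizes because she pioneered research on radioactivity."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states   # tuple: embeddings + one tensor per layer

# token index ranges of the matched pair (found by inspecting the tokenization)
entity_idx = slice(1, 3)    # "marie curie"
pronoun_idx = slice(8, 9)   # "she"

for layer, h in enumerate(hidden_states[1:], start=1):
    a = h[0, entity_idx].mean(dim=0)
    b = h[0, pronoun_idx].mean(dim=0)
    sim = torch.cosine_similarity(a, b, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")
```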
15. A Survey of Neural Networks and Formal Languages [PDF] Back to Contents
Joshua Ackerman, George Cybenko
Abstract: This report is a survey of the relationships between various state-of-the-art neural network architectures and formal languages as, for example, structured by the Chomsky Language Hierarchy. Of particular interest are the abilities of a neural architecture to represent, recognize and generate words from a specific language by learning from samples of the language.
16. Context-based Transformer Models for Answer Sentence Selection [PDF] Back to Contents
Ivano Lauriola, Alessandro Moschitti
Abstract: An important task for the design of Question Answering systems is the selection of the sentence containing (or constituting) the answer from documents relevant to the asked question. Most previous work has only used the target sentence to compute its score with the question as the models were not powerful enough to also effectively encode additional contextual information. In this paper, we analyze the role of the contextual information in the sentence selection task, proposing a Transformer based architecture that leverages two types of contexts, local and global. The former describes the paragraph containing the sentence, aiming at solving implicit references, whereas the latter describes the entire document containing the candidate sentence, providing content-based information. The results on three different benchmarks show that the combination of local and global contexts in a Transformer model significantly improves the accuracy in Answer Sentence Selection.
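One way to feed local or global context alongside the question and candidate sentence is to pack them into a single Transformer input, as in the hedged sketch below; the segment layout and the bert-base-uncased tokenizer are assumptions, not the paper's configuration.

```python
# Sketch of packing a question, a candidate answer sentence, and its local
# (paragraph) context into one Transformer input for answer sentence selection.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

question = "Who painted the Mona Lisa?"
candidate = "Leonardo da Vinci painted it in the early 16th century."
local_context = ("The Mona Lisa hangs in the Louvre. "
                 "It draws millions of visitors every year.")

# question as segment A; candidate followed by its surrounding paragraph as segment B
encoded = tokenizer(question, candidate + " " + local_context,
                    truncation=True, max_length=256, return_tensors="pt")
print(encoded["input_ids"].shape)
```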
17. Leveraging Affective Bidirectional Transformers for Offensive Language Detection [PDF] Back to Contents
AbdelRahim Elmadany, Chiyu Zhang, Muhammad Abdul-Mageed, Azadeh Hashemi
Abstract: Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech. In this work, we report our submission to the Offensive Language and hate-speech Detection shared task organized with the 4th Workshop on Open-Source Arabic Corpora and Processing Tools Arabic (OSACT4). We focus on developing purely deep learning systems, without a need for feature engineering. For that purpose, we develop an effective method for automatic data augmentation and show the utility of training both offensive and hate speech models off (i.e., by fine-tuning) previously trained affective models (i.e., sentiment and emotion). Our best models are significantly better than a vanilla BERT model, with 89.60% acc (82.31% macro F1) for hate speech and 95.20% acc (70.51% macro F1) on official TEST data.
18. An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results [PDF] Back to Contents
Enrique Amigó, Julio Gonzalo, Stefano Mizzaro, Jorge Carrillo-de-Albornoz
Abstract: In Ordinal Classification tasks, items have to be assigned to classes that have a relative ordering, such as positive, neutral, negative in sentiment analysis. Remarkably, the most popular evaluation metrics for ordinal classification tasks either ignore relevant information (for instance, precision/recall on each of the classes ignores their relative ordering) or assume additional information (for instance, Mean Average Error assumes absolute distances between classes). In this paper we propose a new metric for Ordinal Classification, Closeness Evaluation Measure, that is rooted on Measurement Theory and Information Theory. Our theoretical analysis and experimental results over both synthetic data and data from NLP shared tasks indicate that the proposed metric captures quality aspects from different traditional tasks simultaneously. In addition, it generalizes some popular classification (nominal scale) and error minimization (interval scale) metrics, depending on the measurement scale in which it is instantiated.
19. BERT-based Ensembles for Modeling Disclosure and Support in Conversational Social Media Text [PDF] 返回目录
Tanvi Dadu, Kartikey Pant, Radhika Mamidi
Abstract: There is a growing interest in understanding how humans initiate and hold conversations. The affective understanding of conversations focuses on the problem of how speakers use emotions to react to a situation and to each other. In the CL-Aff Shared Task, the organizers released Get it #OffMyChest dataset, which contains Reddit comments from casual and confessional conversations, labeled for their disclosure and supportiveness characteristics. In this paper, we introduce a predictive ensemble model exploiting the finetuned contextualized word embeddings, RoBERTa and ALBERT. We show that our model outperforms the base models in all considered metrics, achieving an improvement of $3\%$ in the F1 score. We further conduct statistical analysis and outline deeper insights into the given dataset while providing a new characterization of impact for the dataset.
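The released abstract does not include code; the following is a minimal sketch, assuming a simple probability-averaging ensemble over two fine-tuned sequence classifiers. The checkpoint names ("roberta-base", "albert-base-v2") are stand-ins for the authors' fine-tuned models, and the averaging rule is an assumption rather than the paper's exact combination scheme.

    # Sketch only: average class probabilities from two fine-tuned classifiers.
    # Checkpoints below are placeholders, not the authors' fine-tuned models.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    def load(name):
        return (AutoTokenizer.from_pretrained(name),
                AutoModelForSequenceClassification.from_pretrained(name))

    tok_r, model_r = load("roberta-base")
    tok_a, model_a = load("albert-base-v2")

    def ensemble_predict(text):
        probs = []
        for tok, model in ((tok_r, model_r), (tok_a, model_a)):
            inputs = tok(text, return_tensors="pt", truncation=True)
            with torch.no_grad():
                probs.append(torch.softmax(model(**inputs).logits, dim=-1))
        return torch.stack(probs).mean(dim=0).argmax(dim=-1).item()

    print(ensemble_predict("I finally told my family how I feel."))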
20. Do All Good Actors Look The Same? Exploring News Veracity Detection Across The U.S. and The U.K [PDF] 返回目录
Benjamin D. Horne, Maurício Gruppi, Sibel Adalı
Abstract: A major concern with text-based news veracity detection methods is that they may not generalize across countries and cultures. In this short paper, we explicitly test news veracity models across news data from the United States and the United Kingdom, demonstrating there is reason for concern of generalizabilty. Through a series of testing scenarios, we show that text-based classifiers perform poorly when trained on one country's news data and tested on another. Furthermore, these same models have trouble classifying unseen, unreliable news sources. In conclusion, we discuss implications of these results and avenues for future work.
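To make the evaluation protocol concrete, here is a minimal sketch of the train-on-one-country, test-on-another setup, assuming a simple TF-IDF plus logistic-regression text classifier; the feature set, classifier, and the tiny placeholder data are assumptions, not the authors' setup.

    # Sketch of a cross-country generalization check for a text-based
    # veracity classifier. All data below are placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.pipeline import make_pipeline

    us_texts = ["officials confirmed the report", "shocking secret they hide from you"]
    us_labels = [0, 1]   # 0 = reliable, 1 = unreliable
    uk_texts = ["the ministry released the figures", "miracle cure doctors hate"]
    uk_labels = [0, 1]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(us_texts, us_labels)                       # train on U.S. data only

    print("US -> US:", accuracy_score(us_labels, clf.predict(us_texts)))
    print("US -> UK:", accuracy_score(uk_labels, clf.predict(uk_texts)))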
21. The 'Letter' Distribution in the Chinese Language [PDF] 返回目录
Qinghua Chen, Yan Wang, Mengmeng Wang, Xiaomeng Li
Abstract: Corpus-based statistical analysis plays a significant role in linguistic research, and ample evidence has shown that different languages exhibit some common laws. Studies have found that letters in some alphabetic writing languages have strikingly similar statistical usage frequency distributions. Does this hold for Chinese, which employs ideogram writing? We obtained letter frequency data of some alphabetic writing languages and found the common law of the letter distributions. In addition, we collected Chinese literature corpora for different historical periods from the Tang Dynasty to the present, and we dismantled the Chinese written language into three kinds of basic particles: characters, strokes and constructive parts. The results of the statistical analysis showed that, in different historical periods, the intensity of the use of basic particles in Chinese writing varied, but the form of the distribution was consistent. In particular, the distributions of the Chinese constructive parts are certainly consistent with those alphabetic writing languages. This study provides new evidence of the consistency of human languages.
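As an illustration of the kind of corpus statistic involved, here is a minimal sketch that computes a rank-frequency distribution over Chinese characters, the simplest of the three "basic particles"; the one-line corpus is a placeholder, and the actual study works with period-specific literary corpora and also with strokes and constructive parts.

    # Sketch: rank-frequency distribution over characters in a placeholder corpus.
    from collections import Counter

    corpus = "床前明月光疑是地上霜举头望明月低头思故乡"   # placeholder text
    counts = Counter(ch for ch in corpus if not ch.isspace())
    total = sum(counts.values())

    for rank, (char, freq) in enumerate(counts.most_common(10), start=1):
        print(rank, char, round(freq / total, 4))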
22. Learning Constraints for Structured Prediction Using Rectifier Networks [PDF] 返回目录
Xingyuan Pan, Maitrey Mehta, Vivek Srikumar
Abstract: Various natural language processing tasks are structured prediction problems where outputs are constructed with multiple interdependent decisions. Past work has shown that domain knowledge, framed as constraints over the output space, can help improve predictive accuracy. However, designing good constraints often relies on domain expertise. In this paper, we study the problem of learning such constraints. We frame the problem as that of training a two-layer rectifier network to identify valid structures or substructures, and show a construction for converting a trained network into a system of linear constraints over the inference variables. Our experiments on several NLP tasks show that the learned constraints can improve the prediction accuracy, especially when the number of training examples is small.
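For orientation, a minimal sketch of the scoring component only: a two-layer rectifier (ReLU) network that maps a candidate structure's feature vector to a validity score. The dimensions and the sigmoid readout are placeholders, and the paper's construction for converting the trained weights into linear constraints over inference variables is not reproduced here.

    # Sketch: two-layer rectifier network scoring candidate (sub)structures.
    import torch
    import torch.nn as nn

    feat_dim, hidden = 64, 10      # placeholder sizes
    net = nn.Sequential(
        nn.Linear(feat_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, 1),
    )

    x = torch.randn(8, feat_dim)                   # 8 candidate structures
    validity = torch.sigmoid(net(x)).squeeze(-1)   # close to 1 -> predicted valid
    print(validity)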
23. Automatic Discovery of Novel Intents & Domains from Text Utterances [PDF] 返回目录
Nikhita Vedula, Rahul Gupta, Aman Alok, Mukund Sridhar
Abstract: One of the primary tasks in Natural Language Understanding (NLU) is to recognize the intents as well as domains of users' spoken and written language utterances. Most existing research formulates this as a supervised classification problem with a closed-world assumption, i.e. the domains or intents to be identified are pre-defined or known beforehand. Real-world applications however increasingly encounter dynamic, rapidly evolving environments with newly emerging intents and domains, about which no information is known during model training. We propose a novel framework, ADVIN, to automatically discover novel domains and intents from large volumes of unlabeled data. We first employ an open classification model to identify all utterances potentially consisting of a novel intent. Next, we build a knowledge transfer component with a pairwise margin loss function. It learns discriminative deep features to group together utterances and discover multiple latent intent categories within them in an unsupervised manner. We finally hierarchically link mutually related intents into domains, forming an intent-domain taxonomy. ADVIN significantly outperforms baselines on three benchmark datasets, and real user utterances from a commercial voice-powered agent.
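The knowledge-transfer component is trained with a pairwise margin loss; as a rough illustration, the sketch below implements a standard contrastive-style pairwise margin loss over embedding distances. The exact loss, encoder, and margin used by ADVIN may differ.

    # Sketch: pull same-intent pairs together, push different-intent pairs
    # at least `margin` apart. Not necessarily the exact ADVIN formulation.
    import torch

    def pairwise_margin_loss(z1, z2, same_intent, margin=1.0):
        # z1, z2: (batch, dim) utterance features; same_intent: (batch,) in {0, 1}
        dist = torch.norm(z1 - z2, dim=1)
        pos = same_intent * dist.pow(2)
        neg = (1 - same_intent) * torch.clamp(margin - dist, min=0).pow(2)
        return (pos + neg).mean()

    z1, z2 = torch.randn(4, 128), torch.randn(4, 128)
    same = torch.tensor([1.0, 0.0, 1.0, 0.0])
    print(pairwise_margin_loss(z1, z2, same))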
24. Word-Emoji Embeddings from large scale Messaging Data reflect real-world Semantic Associations of Expressive Icons [PDF] 返回目录
Jens Helge Reelfs, Oliver Hohlfeld, Markus Strohmaier, Niklas Henckell
Abstract: We train word-emoji embeddings on large scale messaging data obtained from the Jodel online social network. Our data set contains more than 40 million sentences, of which 11 million sentences are annotated with a subset of the Unicode 13.0 standard Emoji list. We explore semantic emoji associations contained in this embedding by analyzing associations between emojis, between emojis and text, and between text and emojis. Our investigations demonstrate anecdotally that word-emoji embeddings trained on large scale messaging data can reflect real-world semantic associations. To enable further research we release the Jodel Emoji Embedding Dataset (JEED1488) containing 1488 emojis and their embeddings along 300 dimensions.
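A minimal sketch of how such joint word-emoji embeddings can be trained: emojis are kept as ordinary tokens and fed to a skip-gram model. Only the 300-dimensional size is taken from the abstract; the toy messages, tokenization, and remaining hyperparameters are assumptions.

    # Sketch: emojis as ordinary tokens in a skip-gram Word2Vec model (gensim 4.x API).
    from gensim.models import Word2Vec

    sentences = [
        ["so", "happy", "today", "😊"],
        ["this", "exam", "was", "brutal", "😭"],
    ]  # placeholder tokenized messages

    model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1)
    print(model.wv.most_similar("😊", topn=3))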
25. A Thousand Words are Worth More Than One Recording: NLP Based Speaker Change Point Detection [PDF] 返回目录
O. H. Anidjar, C. Hajaj, A. Dvir, I. Gilad
Abstract: Speaker Diarization (SD) consists of splitting or segmenting an input audio burst according to speaker identities. In this paper, we focus on the crucial task of the SD problem which is the audio segmenting process and suggest a solution for the Change Point Detection (CPD) problem. We empirically demonstrate the negative correlation between an increase in the number of speakers and the Recall and F1-Score measurements. This negative correlation is shown to be the outcome of a massive experimental evaluation process, which accounts for its superiority over recently developed voice-based solutions. In order to overcome the number-of-speakers issue, we suggest a robust solution based on a novel Natural Language Processing (NLP) technique, as well as a metadata features extraction process, rather than a vocal-based approach alone. To the best of our knowledge, we are the first to propose an intelligent NLP-based solution that (I) tackles the CPD problem with a dataset in Hebrew, and (II) solves the CPD variant of the SD problem. We empirically show, based on two distinct datasets, that our method is able to accurately identify the CPDs in an audio burst with 82.12% and 89.02% success in the Recall and F1-score measurements, respectively.
26. CS-NLP team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP Deep Learning Architectures on Commonsense Reasoning Task [PDF] 返回目录
Sirwe Saeedi, Aliakbar Panahi, Seyran Saeedi, Alvis C Fong
Abstract: In this paper, we investigate a commonsense inference task that unifies natural language understanding and commonsense reasoning. We describe our attempt at the SemEval-2020 Task 4 competition: Commonsense Validation and Explanation (ComVE) challenge. We discuss several state-of-the-art deep learning architectures for this challenge. Our system uses prepared labeled textual datasets that were manually curated for three different natural language inference tasks. The goal of the first subtask is to test whether a model can distinguish between natural language statements that make sense and those that do not make sense. We compare the performance of several language models and fine-tuned classifiers. Then, we propose a method inspired by question/answering tasks to treat a classification problem as a multiple choice question task to boost the performance of our experimental results (96.06%), which is significantly better than the baseline. For the second subtask, which is to select the reason why a statement does not make sense, we stand within the first six teams (93.7%) among 27 participants with very competitive results.
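To illustrate the multiple-choice reformulation of subtask A, here is a minimal sketch that scores two candidate statements with a multiple-choice head; the base checkpoint, the prompt, and the untrained classification head are placeholders, so real use would require fine-tuning on the ComVE training data.

    # Sketch: frame "which statement makes sense?" as two-way multiple choice.
    # The multiple-choice head of roberta-base is untrained here (illustration only).
    import torch
    from transformers import AutoTokenizer, AutoModelForMultipleChoice

    tok = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForMultipleChoice.from_pretrained("roberta-base")

    prompt = "Which statement makes sense?"
    choices = ["He put a turkey in the fridge.", "He put an elephant in the fridge."]

    enc = tok([prompt, prompt], choices, return_tensors="pt", padding=True)
    inputs = {k: v.unsqueeze(0) for k, v in enc.items()}   # (1, num_choices, seq_len)
    with torch.no_grad():
        logits = model(**inputs).logits                    # (1, num_choices)
    print("predicted sensible statement:", choices[logits.argmax(dim=-1).item()])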
27. Automatic Dialogic Instruction Detection for K-12 Online One-on-one Classes [PDF] 返回目录
Shiting Xu, Wenbiao Ding, Zitao Liu
Abstract: Online one-on-one class is created for highly interactive and immersive learning experience. It demands a large number of qualified online instructors. In this work, we develop six dialogic instructions and help teachers achieve the benefits of one-on-one learning paradigm. Moreover, we utilize neural language models, i.e., long short-term memory (LSTM), to detect above six instructions automatically. Experiments demonstrate that the LSTM approach achieves AUC scores from 0.840 to 0.979 among all six types of instructions on our real-world educational dataset.
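A minimal sketch of one such detector, assuming a binary decision per instruction type: embed the utterance tokens, run an LSTM, and classify from the final hidden state. Vocabulary size, dimensions, and the single-layer architecture are placeholders, not the authors' configuration.

    # Sketch: LSTM utterance classifier for a single dialogic-instruction type.
    import torch
    import torch.nn as nn

    class InstructionDetector(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, 2)   # instruction present / absent

        def forward(self, token_ids):
            emb = self.embed(token_ids)            # (batch, seq, embed_dim)
            _, (h_n, _) = self.lstm(emb)           # h_n: (1, batch, hidden_dim)
            return self.out(h_n.squeeze(0))        # (batch, 2) logits

    model = InstructionDetector()
    logits = model(torch.randint(0, 10000, (4, 20)))   # 4 utterances, 20 tokens each
    print(logits.shape)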
28. Hybrid Improved Document-level Embedding (HIDE) [PDF] 返回目录
Satanik Mitra, Mamata Jenamani
Abstract: In recent times, word embeddings are taking a significant role in sentiment analysis. As the generation of word embeddings needs huge corpora, many applications use pretrained embeddings. In spite of this success, word embeddings suffer from certain drawbacks, such as not capturing the sentiment information of a word, contextual information in terms of parts-of-speech tags, or domain-specific information. In this work we propose HIDE, a Hybrid Improved Document-level Embedding which incorporates domain information, parts-of-speech information and sentiment information into existing word embeddings such as GloVe and Word2Vec. It combines the improved word embeddings into document-level embeddings. Further, Latent Semantic Analysis (LSA) has been used to represent documents as vectors. HIDE is generated by combining LSA with document-level embeddings computed from the improved word embeddings. We test HIDE with six different datasets and show considerable improvement over the accuracy of existing pretrained word vectors such as GloVe and Word2Vec. We further compare our work with two existing document-level sentiment analysis approaches. HIDE performs better than existing systems.
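One ingredient of HIDE, the LSA document representation, can be sketched as TF-IDF followed by truncated SVD; the toy corpus and dimensionality below are placeholders, and the combination with the improved word embeddings is not shown.

    # Sketch: LSA document vectors via TF-IDF + truncated SVD.
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "the battery life of this phone is great",
        "the screen cracked after one week, terrible",
        "excellent camera and fast delivery",
    ]  # placeholder reviews

    tfidf = TfidfVectorizer().fit_transform(docs)
    lsa_vectors = TruncatedSVD(n_components=2).fit_transform(tfidf)
    print(lsa_vectors.shape)   # (3, 2) document-level LSA representations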
29. An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features [PDF] 返回目录
Shi-Yan Weng, Tien-Hong Lo, Berlin Chen
Abstract: Tremendous amounts of multimedia associated with speech information are driving an urgent need to develop efficient and effective automatic summarization methods. To this end, we have seen rapid progress in applying supervised deep neural network-based methods to extractive speech summarization. More recently, the Bidirectional Encoder Representations from Transformers (BERT) model was proposed and has achieved record-breaking success on many natural language processing (NLP) tasks such as question answering and language understanding. In view of this, we in this paper contextualize and enhance the state-of-the-art BERT-based model for speech summarization, while its contributions are at least three-fold. First, we explore the incorporation of confidence scores into sentence representations to see if such an attempt could help alleviate the negative effects caused by imperfect automatic speech recognition (ASR). Secondly, we also augment the sentence embeddings obtained from BERT with extra structural and linguistic features, such as sentence position and inverse document frequency (IDF) statistics. Finally, we validate the effectiveness of our proposed method on a benchmark dataset, in comparison to several classic and celebrated speech summarization methods.
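The feature-augmentation step can be pictured as concatenating structural statistics onto each sentence embedding before the summarization classifier; the sketch below assumes 768-dimensional sentence vectors and uses random placeholders for the position and IDF features.

    # Sketch: append sentence position and an IDF statistic to BERT sentence vectors.
    import numpy as np

    sent_embs = np.random.randn(5, 768)                  # 5 sentences x hidden size (placeholder)
    positions = (np.arange(5) / 5).reshape(-1, 1)        # normalized sentence position
    idf_stats = np.random.rand(5, 1)                     # placeholder IDF statistic per sentence

    augmented = np.concatenate([sent_embs, positions, idf_stats], axis=1)
    print(augmented.shape)   # (5, 770)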
30. Lexical Normalization for Code-switched Data and its Effect on POS-tagging [PDF] 返回目录
Rob van der Goot, Özlem Çetinoğlu
Abstract: Social media provides an unfiltered stream of user-generated input, leading to creative language use and many interesting linguistic phenomena, which were previously not available so abundantly. However, this language is harder to process automatically. One particularly challenging phenomenon is the use of multiple languages within one utterance, also called Code-Switching (CS). Whereas monolingual social media data already provides many problems for natural language processing, CS adds another challenging dimension. One solution that is commonly used to improve processing of social media data is to translate input texts to standard language first. This normalization has shown to improve performance of many natural language processing tasks. In this paper, we focus on normalization in the context of code-switching. We introduce a variety of models to perform normalization on CS data, and analyse the impact of word-level language identification on normalization. We show that the performance of the proposed normalization models is generally high, but language labels are only slightly informative. We also carry out POS tagging as extrinsic evaluation and show that automatic normalization of the input leads to 3.2% absolute performance increase, whereas gold normalization leads to an increase of 6.8%.
31. Neural Speaker Diarization with Speaker-Wise Chain Rule [PDF] 返回目录
Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu
Abstract: Speaker diarization is an essential step for processing multi-speaker audio. Although an end-to-end neural diarization (EEND) method achieved state-of-the-art performance, it is limited to a fixed number of speakers. In this paper, we solve this fixed number of speaker issue by a novel speaker-wise conditional inference method based on the probabilistic chain rule. In the proposed method, each speaker's speech activity is regarded as a single random variable, and is estimated sequentially conditioned on previously estimated other speakers' speech activities. Similar to other sequence-to-sequence models, the proposed method produces a variable number of speakers with a stop sequence condition. We evaluated the proposed method on multi-speaker audio recordings of a variable number of speakers. Experimental results show that the proposed method can correctly produce diarization results with a variable number of speakers and outperforms the state-of-the-art end-to-end speaker diarization methods in terms of diarization error rate.
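Under assumed notation (X for the input features, y_s for the frame-wise speech-activity sequence of speaker s), the speaker-wise chain rule described above amounts to the factorization

    P(y_1, y_2, \ldots, y_S \mid X) = \prod_{s=1}^{S} P\bigl(y_s \mid y_1, \ldots, y_{s-1}, X\bigr)

so each speaker's activity is decoded conditioned on the speakers already decoded, and decoding stops when the model emits its stop-sequence condition, which is what allows a variable number of speakers S.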
32. Relational Learning Analysis of Social Politics using Knowledge Graph Embedding [PDF] 返回目录
Bilal Abu-Salih, Marwan Al-Tawil, Ibrahim Aljarah, Hossam Faris, Pornpit Wongthongtham
Abstract: Knowledge Graphs (KGs) have gained considerable attention recently from both academia and industry. In fact, incorporating graph technology and the copious graph datasets now available has led the research community to build sophisticated graph analytics tools. Therefore, the application of KGs has extended to tackle a plethora of real-life problems in dissimilar domains. Despite the abundance of the currently proliferated generic KGs, there is a vital need to construct domain-specific KGs. Further, quality and credibility should be assimilated in the process of constructing and augmenting KGs, particularly those propagated from mixed-quality resources such as social media data. This paper presents a novel credibility domain-based KG Embedding framework. This framework involves capturing a fusion of data obtained from heterogeneous resources into a formal KG representation depicted by a domain ontology. The proposed approach makes use of various knowledge-based repositories to enrich the semantics of the textual contents, thereby facilitating the interoperability of information. The proposed framework also embodies a credibility module to ensure data quality and trustworthiness. The constructed KG is then embedded in a low-dimension semantically-continuous space using several embedding techniques. The utility of the constructed KG and its embeddings is demonstrated and substantiated on link prediction, clustering, and visualisation tasks.
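As an illustration of what embedding a KG in a low-dimension semantically-continuous space means in practice, here is a minimal sketch of one widely used technique, TransE-style translational scoring; this is an illustrative choice rather than necessarily one of the techniques the paper evaluates, and all sizes are placeholders.

    # Sketch: TransE-style scoring, where (head, relation, tail) is plausible
    # if head + relation is close to tail in the embedding space.
    import torch
    import torch.nn as nn

    n_entities, n_relations, dim = 1000, 50, 100   # placeholder sizes
    ent = nn.Embedding(n_entities, dim)
    rel = nn.Embedding(n_relations, dim)

    def score(h, r, t):
        # lower score = more plausible triple
        return torch.norm(ent(h) + rel(r) - ent(t), p=1, dim=-1)

    h, r, t = torch.tensor([0]), torch.tensor([3]), torch.tensor([42])
    print(score(h, r, t))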
33. Question Answering on Scholarly Knowledge Graphs [PDF] 返回目录
Mohamad Yaser Jaradeh, Markus Stocker, Sören Auer
Abstract: Answering questions on scholarly knowledge comprising text and other artifacts is a vital part of any research life cycle. Querying scholarly knowledge and retrieving suitable answers is currently hardly possible due to the following primary reason: machine inactionable, ambiguous and unstructured content in publications. We present JarvisQA, a BERT based system to answer questions on tabular views of scholarly knowledge graphs. Such tables can be found in a variety of shapes in the scholarly literature (e.g., surveys, comparisons or results). Our system can retrieve direct answers to a variety of different questions asked on tabular data in articles. Furthermore, we present a preliminary dataset of related tables and a corresponding set of natural language questions. This dataset is used as a benchmark for our system and can be reused by others. Additionally, JarvisQA is evaluated on two datasets against other baselines and shows an improvement of two to three folds in performance compared to related methods.
34. NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature [PDF] 返回目录
Saif M. Mohammad
Abstract: As part of the NLP Scholar project, we created a single unified dataset of NLP papers and their meta-information (including citation numbers), by extracting and aligning information from the ACL Anthology and Google Scholar. In this paper, we describe several interconnected interactive visualizations (dashboards) that present various aspects of the data. Clicking on an item within a visualization or entering query terms in the search boxes filters the data in all visualizations in the dashboard. This allows users to search for papers in the area of their interest, published within specific time periods, published by specified authors, etc. The interactive visualizations presented here, and the associated dataset of papers mapped to citations, have additional uses as well including understanding how the field is growing (both overall and across sub-areas), as well as quantifying the impact of different types of papers on subsequent publications.
Note: The Chinese text in this digest is machine-translated.