Contents
1. Language-agnostic BERT Sentence Embedding [PDF] Abstract
2. Exploration and Discovery of the COVID-19 Literature through Semantic Visualization [PDF] Abstract
3. TICO-19: the Translation Initiative for Covid-19 [PDF] Abstract
4. Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer [PDF] Abstract
5. Playing with Words at the National Library of Sweden -- Making a Swedish BERT [PDF] Abstract
6. Generating Informative Dialogue Responses with Keywords-Guided Networks [PDF] Abstract
7. On-The-Fly Information Retrieval Augmentation for Language Models [PDF] Abstract
8. Improving Event Detection using Contextual Word and Sentence Embeddings [PDF] Abstract
9. Bayesian multilingual topic model for zero-shot cross-lingual topic identification [PDF] Abstract
10. Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning [PDF] Abstract
11. Visual Question Answering as a Multi-Task Problem [PDF] Abstract
12. Interpretable Sequence Classification Via Prototype Trajectory [PDF] Abstract
13. MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks [PDF] Abstract
14. On the Relation between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation [PDF] Abstract
Abstracts
1. Language-agnostic BERT Sentence Embedding [PDF] Back to Contents
Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang
Abstract: We adapt multilingual BERT to produce language-agnostic sentence embeddings for 109 languages. The state-of-the-art for numerous monolingual and multilingual NLP tasks is masked language model (MLM) pretraining followed by task-specific fine-tuning. While English sentence embeddings have been obtained by fine-tuning a pretrained BERT model, such models have not been applied to multilingual sentence embeddings. Our model combines masked language model (MLM) and translation language model (TLM) pretraining with a translation ranking task using bi-directional dual encoders. The resulting multilingual sentence embeddings improve average bi-text retrieval accuracy over 112 languages to 83.7%, well above the 65.5% achieved by the prior state-of-the-art on Tatoeba. Our sentence embeddings also establish new state-of-the-art results on BUCC and UN bi-text retrieval.
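The translation-ranking objective with bi-directional dual encoders amounts to in-batch softmax ranking of true translation pairs in both directions. A minimal PyTorch sketch, assuming a shared encoder that already produces fixed-size sentence embeddings (the MLM/TLM pretraining stages are separate and omitted here):

```python
import torch
import torch.nn.functional as F

def translation_ranking_loss(src_emb: torch.Tensor, tgt_emb: torch.Tensor) -> torch.Tensor:
    """Bi-directional in-batch ranking loss for a dual encoder.

    src_emb, tgt_emb: [batch, dim] embeddings of aligned translation pairs.
    Every non-matching sentence in the batch serves as a negative.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    scores = src @ tgt.t()                      # cosine similarity matrix [batch, batch]
    labels = torch.arange(scores.size(0), device=scores.device)
    # rank the true translation in both directions (src -> tgt and tgt -> src)
    return F.cross_entropy(scores, labels) + F.cross_entropy(scores.t(), labels)

# toy usage with random stand-in embeddings
loss = translation_ranking_loss(torch.randn(8, 768), torch.randn(8, 768))
```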
2. Exploration and Discovery of the COVID-19 Literature through Semantic Visualization [PDF] Back to Contents
Jingxuan Tu, Marc Verhagen, Brent Cochran, James Pustejovsky
Abstract: We are developing semantic visualization techniques in order to enhance exploration and enable discovery over large datasets of complex networks of relations. Semantic visualization is a method of enabling exploration and discovery over large datasets of complex networks by exploiting the semantics of the relations in them. This involves (i) NLP to extract named entities, relations and knowledge graphs from the original data; (ii) indexing the output and creating representations for all relevant entities and relations that can be visualized in many different ways, e.g., as tag clouds, heat maps, graphs, etc.; (iii) applying parameter reduction operations to the extracted relations, creating "relation containers", or functional entities that can also be visualized using the same methods, allowing the visualization of multiple relations, partial pathways, and exploration across multiple dimensions. Our hope is that this will enable the discovery of novel inferences over relations in complex data that otherwise would go unnoticed. We have applied this to analysis of the recently released CORD-19 dataset.
3. TICO-19: the Translation Initiative for Covid-19 [PDF] Back to Contents
Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federman, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, Sylwia Tur
Abstract: The COVID-19 pandemic is the worst pandemic to strike the world in over a century. Crucial to stemming the tide of the SARS-CoV-2 virus is communicating to vulnerable populations the means by which they can protect themselves. To this end, the collaborators forming the Translation Initiative for COvid-19 (TICO-19) have made test and development data available to AI and MT researchers in 35 different languages in order to foster the development of tools and resources for improving access to information about COVID-19 in these languages. In addition to 9 high-resourced, "pivot" languages, the team is targeting 26 lesser resourced languages, in particular languages of Africa, South Asia and South-East Asia, whose populations may be the most vulnerable to the spread of the virus. The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set. Further, the team is converting the test and development data into translation memories (TMXs) that can be used by localizers from and to any of the languages.
4. Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer [PDF] Back to Contents
Kateřina Macková, Milan Straka
Abstract: Reading comprehension is a well studied task, with huge training datasets in English. This work focuses on building reading comprehension systems for Czech, without requiring any manually annotated Czech training data. First of all, we automatically translated SQuAD 1.1 and SQuAD 2.0 datasets to Czech to create training and development data, which we release at this http URL. We then trained and evaluated several BERT and XLM-RoBERTa baseline models. However, our main focus lies in cross-lingual transfer models. We report that an XLM-RoBERTa model trained on English data and evaluated on Czech achieves very competitive performance, only approximately 2 percentage points worse than a model trained on the translated Czech data. This result is extremely good, considering the fact that the model has not seen any Czech data during training. The cross-lingual transfer approach is very flexible and provides reading comprehension in any language for which we have enough monolingual raw texts.
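The zero-shot transfer setup (fine-tune a multilingual encoder on English SQuAD, then apply it directly to Czech) can be exercised with the HuggingFace question-answering pipeline. The checkpoint name below is a placeholder, not from the paper; substitute any XLM-RoBERTa model fine-tuned on English SQuAD only:

```python
from transformers import pipeline

# hypothetical checkpoint: an XLM-RoBERTa model fine-tuned on English SQuAD only
qa = pipeline("question-answering", model="some-org/xlm-roberta-base-squad-en")

# zero-shot evaluation on Czech text the model never saw during fine-tuning
result = qa(
    question="Kde sídlí Univerzita Karlova?",
    context="Univerzita Karlova byla založena roku 1348 a sídlí v Praze.",
)
print(result["answer"], result["score"])
```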
5. Playing with Words at the National Library of Sweden -- Making a Swedish BERT [PDF] Back to Contents
Martin Malmsten, Love Börjeson, Chris Haffenden
Abstract: This paper introduces the Swedish BERT ("KB-BERT") developed by the KBLab for data-driven research at the National Library of Sweden (KB). Building on recent efforts to create transformer-based BERT models for languages other than English, we explain how we used KB's collections to create and train a new language-specific BERT model for Swedish. We also present the results of our model in comparison with existing models - chiefly that produced by the Swedish Public Employment Service, Arbetsförmedlingen, and Google's multilingual M-BERT - where we demonstrate that KB-BERT outperforms these in a range of NLP tasks from named entity recognition (NER) to part-of-speech tagging (POS). Our discussion highlights the difficulties that continue to exist given the lack of training data and testbeds for smaller languages like Swedish. We release our model for further exploration and research here: this https URL .
6. Generating Informative Dialogue Responses with Keywords-Guided Networks [PDF] Back to Contents
Heng-Da Xu, Xian-Ling Mao, Zewen Chi, Jing-Jing Zhu, Fanshu Sun, Heyan Huang
Abstract: Recently, open-domain dialogue systems have attracted growing attention. Most of them use the sequence-to-sequence (Seq2Seq) architecture to generate responses. However, traditional Seq2Seq-based open-domain dialogue models tend to generate generic and safe responses, which are less informative, unlike human responses. In this paper, we propose a simple but effective keywords-guided Sequence-to-Sequence model (KW-Seq2Seq) which uses keywords information as guidance to generate open-domain dialogue responses. Specifically, KW-Seq2Seq first uses a keywords decoder to predict some topic keywords, and then generates the final response under the guidance of them. Extensive experiments demonstrate that the KW-Seq2Seq model produces more informative, coherent and fluent responses, yielding substantive gain in both automatic and human evaluation metrics.
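The two-stage decoding described above (first predict topic keywords, then generate the response under their guidance) can be sketched as below; module sizes, the mean-pooled keyword guidance vector, and other details are illustrative assumptions, not from the paper:

```python
import torch
import torch.nn as nn

class KWSeq2Seq(nn.Module):
    """Minimal keywords-guided seq2seq skeleton (hypothetical sizes and wiring)."""

    def __init__(self, vocab=5000, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.kw_decoder = nn.GRU(dim, dim, batch_first=True)        # predicts topic keywords
        self.resp_decoder = nn.GRU(2 * dim, dim, batch_first=True)  # guided by the keywords
        self.out = nn.Linear(dim, vocab)

    def forward(self, src_ids, kw_ids, resp_ids):
        # encode the dialogue context
        _, h = self.encoder(self.emb(src_ids))
        # step 1: decode keywords from the context representation
        kw_out, _ = self.kw_decoder(self.emb(kw_ids), h)
        kw_logits = self.out(kw_out)
        # step 2: decode the response, conditioning each step on a pooled keyword vector
        kw_guide = kw_out.mean(dim=1, keepdim=True).expand(-1, resp_ids.size(1), -1)
        resp_in = torch.cat([self.emb(resp_ids), kw_guide], dim=-1)
        resp_out, _ = self.resp_decoder(resp_in, h)
        return kw_logits, self.out(resp_out)

model = KWSeq2Seq()
kw_logits, resp_logits = model(torch.randint(0, 5000, (2, 12)),
                               torch.randint(0, 5000, (2, 4)),
                               torch.randint(0, 5000, (2, 10)))
```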
7. On-The-Fly Information Retrieval Augmentation for Language Models [PDF] Back to Contents
Hai Wang, David McAllester
Abstract: Here we experiment with the use of information retrieval as an augmentation for pre-trained language models. The text corpus used in information retrieval can be viewed as a form of episodic memory which grows over time. By augmenting GPT 2.0 with information retrieval we achieve a zero-shot 15% relative reduction in perplexity on the Gigaword corpus without any re-training. We also validate our IR augmentation on an event co-reference task.
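A minimal sketch of the on-the-fly augmentation: retrieve a passage for the current context, prepend it, and score the target text with a frozen GPT-2, masking the retrieved tokens out of the loss. The retrieve() function is a hypothetical stand-in for whatever IR system is used:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def retrieve(context: str) -> str:
    # placeholder for an IR system (e.g. keyword search over a text corpus); hypothetical
    return "Background passage returned by the retriever. "

def perplexity(text: str, support: str = "") -> float:
    txt_ids = tok(text, return_tensors="pt").input_ids
    if support:
        sup_ids = tok(support, return_tensors="pt").input_ids
        ids = torch.cat([sup_ids, txt_ids], dim=1)
        labels = ids.clone()
        labels[:, : sup_ids.size(1)] = -100     # score only the target text, not the retrieved passage
    else:
        ids, labels = txt_ids, txt_ids
    with torch.no_grad():
        loss = lm(ids, labels=labels).loss      # mean token negative log-likelihood
    return float(torch.exp(loss))

context = "The central bank raised interest rates again on"
print(perplexity(context))                      # plain GPT-2
print(perplexity(context, retrieve(context)))   # retrieval-augmented context
```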
8. Improving Event Detection using Contextual Word and Sentence Embeddings [PDF] Back to Contents
Mariano Maisonnave, Fernando Delbianco, Fernando Tohmé, Ana Maguitman, Evangelos Milios
Abstract: The task of Event Detection (ED) is a subfield of Information Extraction (IE) that consists in recognizing event mentions in natural language texts. Several applications can take advantage of an ED system, including alert systems, text summarization, question-answering systems, and any system that needs to extract structured information about events from unstructured texts. ED is a complex task, which is hampered by two main challenges: the lack of a dataset large enough to train and test the developed models and the variety of event type definitions that exist in the literature. These problems make generalization hard to achieve, resulting in poor adaptation to different domains and targets. The main contribution of this paper is the design, implementation and evaluation of a recurrent neural network model for ED that combines several features. In particular, the paper makes the following contributions: (1) it uses BERT embeddings to define contextual word and contextual sentence embeddings as attributes, which to the best of our knowledge were never used before for the ED task; (2) the proposed model has the ability to use its first layer to learn good feature representations; (3) a new public dataset with a general definition of event; (4) an extensive empirical evaluation that includes (i) the exploration of different architectures and hyperparameters, (ii) an ablation test to study the impact of each attribute, and (iii) a comparison with a replication of a state-of-the-art model. The results offer several insights into the importance of contextual embeddings and indicate that the proposed approach is effective in the ED task, outperforming the baseline models.
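A rough illustration of the attribute construction in contribution (1): per-token BERT embeddings concatenated with a contextual sentence embedding (here the [CLS] vector) and fed to a recurrent tagger. Dimensions, the BiLSTM choice, and [CLS] pooling are assumptions for the sketch:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

class EventDetector(nn.Module):
    """BiLSTM trigger tagger over contextual word + contextual sentence embeddings."""

    def __init__(self, num_labels=2, dim=768, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(2 * dim, hidden, batch_first=True, bidirectional=True)
        self.cls = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_emb, sent_emb):
        # attach the sentence embedding to every token position
        sent = sent_emb.unsqueeze(1).expand(-1, token_emb.size(1), -1)
        out, _ = self.rnn(torch.cat([token_emb, sent], dim=-1))
        return self.cls(out)                    # per-token event/non-event logits

enc = tok("The company announced a merger yesterday.", return_tensors="pt")
with torch.no_grad():
    hidden = bert(**enc).last_hidden_state      # [1, seq_len, 768]
logits = EventDetector()(hidden, hidden[:, 0])  # [CLS] vector as the sentence embedding
```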
9. Bayesian multilingual topic model for zero-shot cross-lingual topic identification [PDF] Back to Contents
Santosh Kesiraju, Sangeet Sagar, Ondřej Glembek, Lukáš Burget, Suryakanth V Gangashetty
Abstract: This paper presents a Bayesian multilingual topic model for learning language-independent document embeddings. Our model learns to represent the documents in the form of Gaussian distributions, thereby encoding the uncertainty in its covariance. We propagate the learned uncertainties through linear classifiers for zero-shot cross-lingual topic identification. Our experiments on the 5-language Europarl and Reuters (MLDoc) corpora show that the proposed model outperforms multi-lingual word embedding and BiLSTM sentence encoder based systems by significant margins in the majority of the transfer directions. Moreover, our system, trained in under a single day on a single GPU with much lower amounts of data, performs competitively as compared to the state-of-the-art universal BiLSTM sentence encoder trained on 93 languages. Our experimental analysis shows that the amount of parallel data improves the overall performance of embeddings. Nonetheless, exploiting the uncertainties is always beneficial.
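One way to propagate a Gaussian document embedding through a linear classifier is Monte Carlo sampling over the embedding distribution; the paper's exact propagation scheme is not given in the abstract, so the following is only a sketch with a diagonal covariance assumption:

```python
import torch

def classify_with_uncertainty(mu, logvar, W, b, n_samples=32):
    """Score a document represented as a diagonal Gaussian N(mu, diag(exp(logvar))).

    Instead of using a point embedding, draw samples and average the class
    probabilities, which propagates the learned uncertainty through the classifier.
    """
    std = torch.exp(0.5 * logvar)
    eps = torch.randn(n_samples, *mu.shape)
    samples = mu + eps * std                        # [n_samples, dim]
    logits = samples @ W.t() + b                    # [n_samples, n_classes]
    return logits.softmax(dim=-1).mean(dim=0)       # averaged class posterior

probs = classify_with_uncertainty(torch.zeros(64), torch.zeros(64),
                                  torch.randn(10, 64), torch.zeros(10))
```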
10. Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning [PDF] Back to Contents
Pavel Denisov, Ngoc Thang Vu
Abstract: Spoken language understanding is typically based on pipeline architectures including speech recognition and natural language understanding steps. Therefore, these components are optimized independently from each other and the overall system suffers from error propagation. In this paper, we propose a novel training method that enables pretrained contextual embeddings such as BERT to process acoustic features. In particular, we extend it with an encoder of pretrained speech recognition systems in order to construct end-to-end spoken language understanding systems. Our proposed method is based on the teacher-student framework across speech and text modalities that aligns the acoustic and the semantic latent spaces. Experimental results in three benchmark datasets show that our system reaches the pipeline architecture performance without using any training data and outperforms it after fine-tuning with only a few examples.
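The cross-modal teacher-student idea can be sketched as a distillation loss that pulls a pooled speech-encoder embedding toward the text teacher's sentence embedding. The pooling, the MSE distance, and the module shapes below are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn as nn

# hypothetical stand-ins for a pretrained ASR encoder and a frozen text teacher (e.g. BERT)
speech_encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 768))
teacher_dim = 768

def teacher_student_loss(acoustic_feats, teacher_sent_emb):
    """Align the utterance-level speech embedding with the teacher's sentence embedding."""
    frame_emb = speech_encoder(acoustic_feats)      # [batch, frames, 768]
    utt_emb = frame_emb.mean(dim=1)                 # simple pooling over time
    return nn.functional.mse_loss(utt_emb, teacher_sent_emb)

# 4 utterances of 200 frames with 80-dim filterbank features; random teacher targets
loss = teacher_student_loss(torch.randn(4, 200, 80), torch.randn(4, teacher_dim))
```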
11. Visual Question Answering as a Multi-Task Problem [PDF] Back to Contents
Amelia Elizabeth Pollard, Jonathan L. Shapiro
Abstract: Visual Question Answering (VQA) is a highly complex problem set, relying on many sub-problems to produce reasonable answers. In this paper, we present the hypothesis that Visual Question Answering should be viewed as a multi-task problem, and provide evidence to support this hypothesis. We demonstrate this by reformatting two commonly used Visual Question Answering datasets, COCO-QA and DAQUAR, into a multi-task format and train these reformatted datasets on two baseline networks, with one designed specifically to eliminate other possible causes for performance changes as a result of the reformatting. Though the networks demonstrated in this paper do not achieve strongly competitive results, we find that the multi-task approach to Visual Question Answering results in increases in performance of 5-9% against the single-task formatting, and that the networks reach convergence much faster than in the single-task case. Finally, we discuss possible reasons for the observed difference in performance, and perform additional experiments which rule out causes not associated with the learning of the dataset as a multi-task problem.
12. Interpretable Sequence Classification Via Prototype Trajectory [PDF] Back to Contents
Dat Hong, Stephen S. Baek, Tong Wang
Abstract: We propose a novel interpretable recurrent neural network (RNN) model, called ProtoryNet, in which we introduce a new concept of prototype trajectories. Motivated by the prototype theory in modern linguistics, ProtoryNet makes a prediction by finding the most similar prototype for each sentence in a text sequence and feeding an RNN backbone with the proximity of each of the sentences to the prototypes. The RNN backbone then captures the temporal pattern of the prototypes, to which we refer as prototype trajectories. The prototype trajectories enable intuitive, fine-grained interpretation of how the model reached to the final prediction, resembling the process of how humans analyze paragraphs. Experiments conducted on multiple public data sets reveal that the proposed method not only is more interpretable but also is more accurate than the current state-of-the-art prototype-based method. Furthermore, we report a survey result indicating that human users find ProtoryNet more intuitive and easier to understand, compared to the other prototype-based methods.
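A minimal sketch of a prototype-trajectory classifier: each sentence embedding is mapped to its proximities to a set of learned prototypes, and an RNN reads that proximity sequence. The dimensions and the exponential proximity function are illustrative choices, not taken from the paper:

```python
import torch
import torch.nn as nn

class ProtoTrajectoryClassifier(nn.Module):
    """Sketch of a prototype-trajectory classifier over per-sentence embeddings."""

    def __init__(self, n_protos=10, dim=384, hidden=64, n_classes=2):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_protos, dim))
        self.rnn = nn.GRU(n_protos, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, sent_emb):                     # [batch, n_sentences, dim]
        # proximity of every sentence to every prototype (via Euclidean distance)
        protos = self.prototypes.unsqueeze(0).expand(sent_emb.size(0), -1, -1)
        proximity = torch.exp(-torch.cdist(sent_emb, protos))   # [batch, n_sentences, n_protos]
        _, h = self.rnn(proximity)                   # the RNN reads the prototype trajectory
        return self.cls(h[-1])

# 3 documents, 7 sentences each, 384-dim sentence embeddings (random stand-ins)
logits = ProtoTrajectoryClassifier()(torch.randn(3, 7, 384))
```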
13. MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks [PDF] Back to Contents
Yusi Zhang, Chuanjie Liu, Angen Luo, Hui Xue, Xuan Shan, Yuxiang Luo, Yiqian Xia, Yuanchi Yan, Haidong Wang
Abstract: We study the problem of deep recall model in industrial web search, which is, given a user query, to retrieve hundreds of the most relevant documents from billions of candidates. The common framework is to train two encoding models based on neural embedding which learn the distributed representations of queries and documents separately and match them in the latent semantic space. However, all the existing encoding models only leverage the information of the document itself, which is often not sufficient in practice when matching with query terms, especially for the hard tail queries. In this work we aim to leverage the additional information for each document from its co-click neighbour to help document retrieval. The challenges include how to effectively extract information and eliminate noise when involving co-click information in deep model while meeting the demands of billion-scale data size for real time online inference. To handle the noise in co-click relations, we firstly propose a web-scale Multi-Intention Co-click document Graph (MICG) which builds the co-click connections between documents on click intention level but not on document level. Then we present an encoding framework MIRA based on Bert and graph attention networks which leverages a two-factor attention mechanism to aggregate neighbours. To meet the online latency requirements, we only involve neighbour information in document side, which can save the time-consuming query neighbor search in real time serving. We conduct extensive offline experiments on both public dataset and private web-scale dataset from two major commercial search engines demonstrating the effectiveness and scalability of the proposed method compared with several baselines. A further case study reveals that co-click relations mainly help improve web search quality from two aspects: key concept enhancing and query term complementary.
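The document-side neighbour aggregation can be sketched as attention over co-click neighbour embeddings added to the document's own embedding. This is only a rough stand-in; the paper's two-factor attention mechanism and BERT encoders are not reproduced here:

```python
import torch
import torch.nn as nn

class NeighbourAggregator(nn.Module):
    """Attention-weighted aggregation of co-click neighbour embeddings (illustrative)."""

    def __init__(self, dim=256):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, doc_emb, neigh_emb):          # [batch, dim], [batch, n_neigh, dim]
        q = self.query(doc_emb).unsqueeze(1)        # [batch, 1, dim]
        k = self.key(neigh_emb)                     # [batch, n_neigh, dim]
        attn = torch.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)
        pooled = (attn.unsqueeze(-1) * neigh_emb).sum(dim=1)
        return doc_emb + pooled                     # enrich the document with neighbour evidence

enriched = NeighbourAggregator()(torch.randn(2, 256), torch.randn(2, 5, 256))
```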
14. On the Relation between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation [PDF] Back to Contents
Jianing Li, Yanyan Lan, Jiafeng Guo, Xueqi Cheng
Abstract: The goal of text generation models is to fit the underlying real probability distribution of text. For performance evaluation, quality and diversity metrics are usually applied. However, it is still not clear to what extent the quality-diversity evaluation can reflect the distribution-fitting goal. In this paper, we try to reveal such a relation in a theoretical approach. We prove that under certain conditions, a linear combination of quality and diversity constitutes a divergence metric between the generated distribution and the real distribution. We also show that the commonly used BLEU/Self-BLEU metric pair fails to match any divergence metric, thus propose CR/NRR as a substitute for the quality/diversity metric pair.
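For reference, the commonly used BLEU/Self-BLEU pair mentioned above can be computed with NLTK as corpus-averaged BLEU against the references (quality) and 1 minus Self-BLEU among the generations (diversity). The toy data below is purely illustrative; this only shows how the pair that the paper criticises is typically measured:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1
refs = [["the", "cat", "sat", "on", "the", "mat"],
        ["a", "dog", "ran", "in", "the", "park"]]
gens = [["the", "cat", "sat", "on", "a", "mat"],
        ["the", "cat", "sat", "on", "the", "mat"]]

# quality: average BLEU of each generated sentence against the reference corpus
quality = sum(sentence_bleu(refs, g, smoothing_function=smooth) for g in gens) / len(gens)

# diversity: 1 - Self-BLEU, scoring each generation against the other generations
self_bleu = sum(
    sentence_bleu([h for j, h in enumerate(gens) if j != i], g, smoothing_function=smooth)
    for i, g in enumerate(gens)
) / len(gens)
diversity = 1.0 - self_bleu

print(round(quality, 3), round(diversity, 3))
```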