Contents
1. SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics [PDF] Abstract
2. Topic Modeling on User Stories using Word Mover's Distance [PDF] Abstract
3. Learn to Use Future Information in Simultaneous Translation [PDF] Abstract
4. Pragmatic information in translation: a corpus-based study of tense and mood in English and German [PDF] Abstract
5. What Can We Learn From Almost a Decade of Food Tweets [PDF] Abstract
6. Handling Collocations in Hierarchical Latent Tree Analysis for Topic Modeling [PDF] Abstract
7. Advances of Transformer-Based Models for News Headline Generation [PDF] Abstract
Abstracts
1. SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics [PDF] Back to Contents
Daniel Deutsch, Dan Roth
Abstract: We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics. SacreROUGE removes many obstacles that researchers face when using or developing metrics: (1) The library provides Python wrappers around the official implementations of existing evaluation metrics so they share a common, easy-to-use interface; (2) it provides functionality to evaluate how well any metric implemented in the library correlates to human-annotated judgments, so no additional code needs to be written for a new evaluation metric; and (3) it includes scripts for loading datasets that contain human judgments so they can easily be used for evaluation. This work describes the design of the library, including the core Metric interface, the command-line API for evaluating summarization models and metrics, and the scripts to load and reformat publicly available datasets. The development of SacreROUGE is ongoing and open to contributions from the community.
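To make design points (1) and (2) concrete, here is a minimal Python sketch of a shared metric interface plus a correlation check against human judgments. The class and function names are illustrative assumptions, not SacreROUGE's actual API; the correlation helper uses standard scipy routines.

```python
# Hypothetical sketch in the spirit of a common Metric interface;
# names are illustrative, not SacreROUGE's actual API.
from abc import ABC, abstractmethod
from typing import Dict, List

from scipy.stats import kendalltau, pearsonr, spearmanr


class Metric(ABC):
    @abstractmethod
    def score(self, summary: str, references: List[str]) -> Dict[str, float]:
        """Score one summary against its reference summaries."""


def correlate(metric_scores: List[float],
              human_scores: List[float]) -> Dict[str, float]:
    """Correlate automatic metric scores with human judgments."""
    return {
        "pearson": pearsonr(metric_scores, human_scores)[0],
        "spearman": spearmanr(metric_scores, human_scores)[0],
        "kendall": kendalltau(metric_scores, human_scores)[0],
    }
```

With an interface like this, a new metric only has to implement `score`; the same correlation code then works for every metric, which is the point the abstract makes.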
2. Topic Modeling on User Stories using Word Mover's Distance [PDF] Back to Contents
Kim Julian Gülle, Nicholas Ford, Patrick Ebel, Florian Brokhausen, Andreas Vogelsang
Abstract: Requirements elicitation has recently been complemented with crowd-based techniques, which continuously involve large, heterogeneous groups of users who express their feedback through a variety of media. Crowd-based elicitation has great potential for engaging with (potential) users early on but also results in large sets of raw and unstructured feedback. Consolidating and analyzing this feedback is a key challenge for turning it into sensible user requirements. In this paper, we focus on topic modeling as a means to identify topics within a large set of crowd-generated user stories and compare three approaches: (1) a traditional approach based on Latent Dirichlet Allocation, (2) a combination of word embeddings and principal component analysis, and (3) a combination of word embeddings and Word Mover's Distance. We evaluate the approaches on a publicly available set of 2,966 user stories written and categorized by crowd workers. We found that a combination of word embeddings and Word Mover's Distance is most promising. Depending on the word embeddings we use in our approaches, we manage to cluster the user stories in two ways: one that is closer to the original categorization and another that allows new insights into the dataset, e.g. to find potentially new categories. Unfortunately, no measure exists to rate the quality of our results objectively. Still, our findings provide a basis for future work towards analyzing crowd-sourced user stories.
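As a rough illustration of approach (3), the sketch below compares tokenized user stories with Word Mover's Distance via gensim's `wmdistance`. The embedding choice and the toy stories are assumptions, not the paper's setup, and WMD support in gensim requires an optional optimal-transport solver package to be installed.

```python
# Minimal sketch of comparing user stories with Word Mover's Distance;
# the pre-trained embeddings and example stories are assumptions.
import gensim.downloader

kv = gensim.downloader.load("glove-wiki-gigaword-100")  # pre-trained word embeddings

story_a = "as a user i want to filter results by date".split()
story_b = "as a visitor i want to sort items by time".split()
story_c = "as an admin i want to delete spam accounts".split()

# Lower WMD means the stories are semantically closer; a pairwise WMD
# matrix can then feed a clustering algorithm over the user stories.
print(kv.wmdistance(story_a, story_b))  # smaller: similar stories
print(kv.wmdistance(story_a, story_c))  # larger: different topic
```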
3. Learn to Use Future Information in Simultaneous Translation [PDF] Back to Contents
Xueqing Wu, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Tie-Yan Liu
Abstract: Simultaneous neural machine translation (briefly, NMT) has attracted much attention recently. In contrast to standard NMT, where the NMT system can utilize the full input sentence, simultaneous NMT is formulated as a prefix-to-prefix problem, where the system can only utilize the prefix of the input sentence and more uncertainty is introduced to decoding. Wait-$k$ is a simple yet effective strategy for simultaneous NMT, where the decoder generates the output sequence $k$ words behind the input words. We observed that training simultaneous NMT systems with future information (i.e., trained with a larger $k$) generally outperforms the standard ones (i.e., trained with the given $k$). Based on this observation, we propose a framework that automatically learns how much future information to use in training for simultaneous NMT. We first build a series of tasks where each one is associated with a different $k$, and then learn a model on these tasks guided by a controller. The controller is jointly trained with the translation model through bi-level optimization. We conduct experiments on four datasets to demonstrate the effectiveness of our method.
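A hedged sketch of the wait-$k$ decoding schedule the abstract refers to: the decoder emits its first word only after reading $k$ source words, then alternates one read with one write, so the output trails the input by $k$ words. The `step` callable stands in for a trained NMT model; this illustrates the schedule only, not the paper's controller or bi-level training.

```python
# Illustrative wait-k decoding loop (schedule only, not the paper's
# training framework); `step` is a stand-in for the NMT model.
from typing import Callable, List


def wait_k_decode(source: List[str], k: int,
                  step: Callable[[List[str], List[str]], str],
                  eos: str = "</s>") -> List[str]:
    """step(prefix, output) predicts the next target word from the
    visible source prefix and the output generated so far."""
    output: List[str] = []
    read = min(k, len(source))          # initial wait: read k source words
    while True:
        token = step(source[:read], output)
        if token == eos:
            break
        output.append(token)
        if read < len(source):          # after each write, read one more word
            read += 1
    return output
```

Training with a larger $k$ than is used at test time is exactly the "future information" the framework learns to exploit.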
4. Pragmatic information in translation: a corpus-based study of tense and mood in English and German [PDF] Back to Contents
Anita Ramm, Ekaterina Lapshinova-Koltunski, Alexander Fraser
Abstract: Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research. We consider the correspondence between English and German tense and mood in translation. Human translators do not find this correspondence easy, and as we will show through careful analysis, there are no simplistic ways to map tense and mood from one language to another. Our observations about the challenges of human translation of tense and mood have important implications for multilingual NLP. Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.
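To give a flavor of the kind of corpus annotation such a study rests on, the sketch below extracts tense and mood features from an English and a German sentence with spaCy's morphology API. This is purely illustrative, not the authors' pipeline, and feature coverage varies by language model.

```python
# Illustrative tense/mood feature extraction with spaCy morphology;
# not the authors' pipeline, and coverage varies by model.
import spacy

nlp_en = spacy.load("en_core_web_sm")
nlp_de = spacy.load("de_core_news_sm")

en = nlp_en("She would have finished the report.")
de = nlp_de("Sie hätte den Bericht beendet.")

for doc in (en, de):
    for tok in doc:
        if tok.pos_ in ("VERB", "AUX"):
            # token.morph holds features such as Tense=Past or Mood=Sub.
            print(tok.text, tok.morph.get("Tense"), tok.morph.get("Mood"))
```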
5. What Can We Learn From Almost a Decade of Food Tweets [PDF] Back to Contents
Uga Sproģis, Matīss Rikters
Abstract: We present the Latvian Twitter Eater Corpus - a set of tweets in the narrow domain related to food, drinks, eating and drinking. The corpus has been collected over a time span of more than 8 years and includes over 2 million tweets enriched with additional useful data. We also separate out two sub-corpora: question-and-answer tweets and sentiment-annotated tweets. We analyse the contents of the corpus and demonstrate use cases for the sub-corpora by training domain-specific question-answering and sentiment-analysis models on data from the corpus.
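A minimal sketch of the kind of domain-specific sentiment model the annotated sub-corpus enables, using a TF-IDF plus logistic-regression baseline from scikit-learn. The tweets and labels below are invented placeholders, not the corpus's actual format, and the paper's own models may differ.

```python
# Placeholder sentiment-classification baseline; the example tweets and
# labels are invented, not taken from the Latvian Twitter Eater Corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["šī kūka ir brīnišķīga", "auksta un bezgaršīga zupa"]  # placeholders
labels = ["positive", "negative"]                                # placeholders

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)
print(model.predict(["garšīga kafija"]))
```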
6. Handling Collocations in Hierarchical Latent Tree Analysis for Topic Modeling [PDF] Back to Contents
Leonard K. M. Poon, Nevin L. Zhang, Haoran Xie, Gary Cheng
Abstract: Topic modeling has been one of the most active research areas in machine learning in recent years. Hierarchical latent tree analysis (HLTA) has been recently proposed for hierarchical topic modeling and has shown superior performance over state-of-the-art methods. However, the models used in HLTA have a tree structure and cannot represent the different meanings of multiword expressions sharing the same word appropriately. Therefore, we propose a method for extracting and selecting collocations as a preprocessing step for HLTA. The selected collocations are replaced with single tokens in the bag-of-words model before running HLTA. Our empirical evaluation shows that the proposed method led to better performance of HLTA on three of the four data sets tested.
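The preprocessing step the abstract describes, merging selected collocations into single tokens before the bag-of-words stage, can be sketched with gensim's Phrases model. gensim is one standard tool for this; the paper's own extraction and selection criteria may differ.

```python
# Sketch of collocation extraction as a preprocessing step: frequent
# word pairs are merged into single tokens (e.g., "machine_learning")
# before building the bag-of-words input. Illustrative, not the
# paper's exact selection method.
from gensim.models.phrases import Phrases

sentences = [
    ["topic", "modeling", "with", "machine", "learning"],
    ["machine", "learning", "for", "text", "analysis"],
    ["hierarchical", "latent", "tree", "analysis"],
]

bigram = Phrases(sentences, min_count=1, threshold=1)
tokenized = [bigram[s] for s in sentences]  # "machine learning" -> "machine_learning"
print(tokenized[0])
```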
7. Advances of Transformer-Based Models for News Headline Generation [PDF] Back to Contents
Alexey Bukhtiyarov, Ilya Gusev
Abstract: Pretrained language models based on the Transformer architecture are the reason for recent breakthroughs in many areas of NLP, including sentiment analysis, question answering, and named entity recognition. Headline generation is a special kind of text summarization task. To succeed at it, models need strong natural language understanding that goes beyond the meaning of individual words and sentences, as well as an ability to distinguish essential information. In this paper, we fine-tune two pretrained Transformer-based models (mBART and BertSumAbs) for that task and achieve new state-of-the-art results on the RIA and Lenta datasets of Russian news. BertSumAbs increases ROUGE on average by 4.6 and 5.9 points respectively over the previous best scores achieved by a Pointer-Generator network.
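A hedged sketch of headline generation with a fine-tuned mBART-style seq2seq model via Hugging Face transformers. The checkpoint path is a placeholder for fine-tuned weights, not a real model name, and the paper's exact fine-tuning recipe is not reproduced here.

```python
# Illustrative headline generation with a seq2seq Transformer; the
# checkpoint path is a placeholder, not the paper's released weights.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "path/to/finetuned-mbart-headlines"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

article = "Полный текст новостной статьи ..."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
ids = model.generate(**inputs, max_length=32, num_beams=5)  # beam-search decoding
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```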