Contents
1. A Python Library for Exploratory Data Analysis and Knowledge Discovery on Twitter Data [PDF] Abstract
2. The ADAPT Enhanced Dependency Parser at the IWPT 2020 Shared Task [PDF] Abstract
3. SRQA: Synthetic Reader for Factoid Question Answering [PDF] Abstract
4. Biomedical named entity recognition using BERT in the machine reading comprehension framework [PDF] Abstract
5. orgFAQ: A New Dataset and Analysis on Organizational FAQs and User Questions [PDF] Abstract
6. Learning to summarize from human feedback [PDF] Abstract
7. A Simple Global Neural Discourse Parser [PDF] Abstract
8. Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading [PDF] Abstract
9. Knowing What to Listen to: Early Attention for Deep Speech Representation Learning [PDF] Abstract
10. Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling [PDF] Abstract
11. HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis [PDF] Abstract
12. Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding [PDF] Abstract
13. Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions [PDF] Abstract
14. Towards Earnings Call and Stock Price Movement [PDF] Abstract
Abstracts
1. A Python Library for Exploratory Data Analysis and Knowledge Discovery on Twitter Data [PDF] Back to Contents
Mario Graff, Daniela Moctezuma, Sabino Miranda-Jiménez, Eric S. Tellez
Abstract: Twitter is perhaps the social media platform most amenable to research. It requires only a few steps to obtain information, and there are plenty of libraries that can help in this regard. Nonetheless, knowing whether a particular event is expressed on Twitter is a challenging task that requires a considerable collection of tweets. This proposal aims to facilitate, for researchers interested in Twitter data, the process of mining events on Twitter. The events could be related to natural disasters, health issues, or people's mobility, among other studies that can be pursued with the proposed library. Different applications are presented in this contribution to illustrate the library's capabilities, starting with an exploratory analysis of the topics discovered in tweets, following it with a study of the similarity among dialects of the Spanish language, and complementing it with a mobility report on different countries. In summary, the Python library presented here retrieves a wealth of information processed from Twitter (since December 2015) in terms of words, bigrams of words, and their daily frequencies for the Arabic, English, Spanish, and Russian languages. Finally, the mobility information considered relates to the number of trips among locations in more than 245 countries or territories.
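The dialect-similarity study mentioned above reduces to comparing the per-region word-frequency vectors the library exposes. A minimal sketch of that comparison, assuming made-up frequency dictionaries in place of the library's real output (the abstract does not describe its API):

    import numpy as np

    # Hypothetical per-region word frequencies, standing in for the per-day
    # counts the library returns.
    mx = {"casa": 120, "coche": 30, "platicar": 25}
    es = {"casa": 110, "coche": 60, "charlar": 20}

    def cosine(a, b):
        """Cosine similarity between two word-frequency dictionaries."""
        vocab = sorted(set(a) | set(b))
        u = np.array([a.get(w, 0) for w in vocab], dtype=float)
        v = np.array([b.get(w, 0) for w in vocab], dtype=float)
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(f"MX vs ES dialect similarity: {cosine(mx, es):.3f}")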
2. The ADAPT Enhanced Dependency Parser at the IWPT 2020 Shared Task [PDF] Back to Contents
James Barry, Joachim Wagner, Jennifer Foster
Abstract: We describe the ADAPT system for the 2020 IWPT Shared Task on parsing enhanced Universal Dependencies in 17 languages. We implement a pipeline approach using UDPipe and UDPipe-future to provide initial levels of annotation. The enhanced dependency graph is either produced by a graph-based semantic dependency parser or is built from the basic tree using a small set of heuristics. Our results show that, for the majority of languages, a semantic dependency parser can be successfully applied to the task of parsing enhanced dependencies. Unfortunately, we did not ensure a connected graph as part of our pipeline approach and our competition submission relied on a last-minute fix to pass the validation script which harmed our official evaluation scores significantly. Our submission ranked eighth in the official evaluation with a macro-averaged coarse ELAS F1 of 67.23 and a treebank average of 67.49. We later implemented our own graph-connecting fix which resulted in a score of 79.53 (language average) or 79.76 (treebank average), which would have placed fourth in the competition evaluation.
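A graph-connecting fix of the kind described above must guarantee that every token is reachable from the root of the enhanced graph. A plausible minimal version, assuming a fallback attachment to the root (the authors' actual heuristic is not specified in the abstract):

    def connect_graph(n_tokens, edges, root=0, default_rel="dep"):
        """Ensure every token 1..n_tokens is reachable from the artificial root.

        edges: (head, dependent, relation) triples, with 0 as the root.
        Unreachable tokens get a fallback attachment to the root.
        """
        children = {}
        for head, dep, _ in edges:
            children.setdefault(head, []).append(dep)

        def reach(start, seen):
            stack = [start]
            while stack:
                for child in children.get(stack.pop(), []):
                    if child not in seen:
                        seen.add(child)
                        stack.append(child)

        seen = {root}
        reach(root, seen)
        fixed = list(edges)
        for tok in range(1, n_tokens + 1):
            if tok not in seen:
                fixed.append((root, tok, default_rel))  # fallback attachment
                seen.add(tok)
                reach(tok, seen)  # its descendants become reachable too
        return fixed

    print(connect_graph(3, [(0, 1, "root"), (2, 3, "obj")]))
    # token 2 gets a fallback edge; token 3 becomes reachable through it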
3. SRQA: Synthetic Reader for Factoid Question Answering [PDF] Back to Contents
Jiuniu Wang, Wenjia Xu, Xingyu Fu, Yang Wei, Li Jin, Ziyan Chen, Guangluan Xu, Yirong Wu
Abstract: Question answering systems can answer questions from various fields and forms with deep neural networks, but they still lack effective ways to handle multiple pieces of evidence. We introduce a new model called SRQA, which stands for Synthetic Reader for Factoid Question Answering. This model enhances the question answering system in the multi-document scenario from three aspects: model structure, optimization goal, and training method, corresponding to Multilayer Attention (MA), Cross Evidence (CE), and Adversarial Training (AT), respectively. First, we propose a multilayer attention network to obtain a better representation of the evidence. The multilayer attention mechanism conducts interaction between the question and the passage within each layer, so that the token representation of the evidence in each layer takes the requirements of the question into account. Second, we design a cross-evidence strategy to choose the answer span across multiple pieces of evidence. We improve the optimization goal by treating all the answers' locations in the evidence as training targets, which leads the model to reason across multiple pieces of evidence. Third, adversarial training is applied to high-level variables besides the word embeddings in our model. A new normalization method is also proposed for adversarial perturbations so that we can jointly add perturbations to several target variables. As an effective regularization method, adversarial training enhances the model's ability to process noisy data. Combining these three strategies, we enhance the contextual representation and locating ability of our model, which can synthetically extract the answer span from several pieces of evidence. We evaluate SRQA on the WebQA dataset, and experiments show that our model outperforms the state-of-the-art models (the best fuzzy score of our model reaches 78.56%, an improvement of about 2%).
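The adversarial-training component perturbs the word embeddings and other high-level variables along the loss gradient under a joint normalization. A generic sketch of such a perturbation step; the eps scale and the whole-tensor norm are assumptions, not the paper's exact normalization:

    import torch

    def adversarial_perturb(variables, loss, eps=0.1):
        """Perturb each target variable along its loss gradient.

        variables: tensors requiring gradients (word embeddings plus other
        high-level variables). Whole-tensor normalization and the eps scale
        are simplifying assumptions.
        """
        grads = torch.autograd.grad(loss, variables, retain_graph=True)
        return [v + eps * g / (g.norm() + 1e-12) for v, g in zip(variables, grads)]

    emb = torch.randn(4, 8, requires_grad=True)    # stand-in word embeddings
    loss = (emb ** 2).sum()                        # stand-in training loss
    (adv_emb,) = adversarial_perturb([emb], loss)  # feed back for a second pass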
4. Biomedical named entity recognition using BERT in the machine reading comprehension framework [PDF] Back to Contents
Cong Sun, Zhihao Yang, Lei Wang, Yin Zhang, Hongfei Lin, Jian Wang
Abstract: Recognizing biomedical entities in the literature is a challenging research focus, and it is the foundation for extracting the large amount of biomedical knowledge present in unstructured texts into structured formats. Using the sequence labeling framework to implement biomedical named entity recognition (BioNER) is currently the conventional method. This method, however, often cannot take full advantage of the semantic information in the dataset, and its performance is not always satisfactory. In this work, instead of treating the BioNER task as a sequence labeling problem, we formulate it as a machine reading comprehension (MRC) problem. This formulation can introduce more prior knowledge through well-designed queries, and it no longer needs decoding processes such as conditional random fields (CRF). We conduct experiments on six BioNER datasets, and the experimental results demonstrate the effectiveness of our method. Our method achieves state-of-the-art (SOTA) performance on the BC4CHEMD, BC5CDR-Chem, BC5CDR-Disease, NCBI Disease, BC2GM and JNLPBA datasets, with F1-scores of 92.38%, 94.19%, 87.36%, 90.04%, 84.98% and 78.93%, respectively.
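In the MRC formulation, each entity type is paired with a natural-language query, and the model extracts answer spans from the passage instead of decoding a tag sequence. A schematic of the input construction; the queries below are illustrative, not the ones used in the paper:

    # Illustrative type-to-query mapping; the paper's actual queries may differ.
    QUERIES = {
        "Chemical": "Which chemical compounds or drugs are mentioned in the text?",
        "Disease":  "Which diseases or disorders are mentioned in the text?",
        "Gene":     "Which genes or gene products are mentioned in the text?",
    }

    def build_mrc_inputs(passage, queries=QUERIES):
        """One BERT-style (query, passage) pair per entity type."""
        return [f"[CLS] {q} [SEP] {passage} [SEP]" for q in queries.values()]

    for pair in build_mrc_inputs("Aspirin reduces the risk of heart disease."):
        print(pair)   # start/end span predictions are made over each pair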
5. orgFAQ: A New Dataset and Analysis on Organizational FAQs and User Questions [PDF] Back to Contents
Guy Lev, Michal Shmueli-Scheuer, Achiya Jerbi, David Konopnicki
Abstract: Frequently Asked Questions (FAQ) webpages are created by organizations for their users. FAQs are used in several scenarios, e.g., to answer user questions. On the other hand, the content of FAQs is, by definition, affected by user questions. In order to promote research in this field, several FAQ datasets exist. However, we claim that, being collected from community websites, they do not correctly represent the challenges associated with FAQs in an organizational context. Thus, we release orgFAQ, a new dataset composed of 6988 user questions and 1579 corresponding FAQs that were extracted from organizations' FAQ webpages in the Jobs domain. In this paper, we provide an analysis of the properties of such FAQs, and demonstrate the usefulness of our new dataset by utilizing it in a relevant task from the Jobs domain. We also show the value of the orgFAQ dataset in a task from a different domain: the COVID-19 pandemic.
6. Learning to summarize from human feedback [PDF] Back to Contents
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
Abstract: As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about: summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models. We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.
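The reward model described above is trained on pairwise human comparisons; the standard objective is to maximize the log-probability that the preferred summary receives the higher reward. A minimal sketch of that loss with stand-in reward values (in practice the rewards come from a learned model applied to the two summaries of each comparison):

    import torch
    import torch.nn.functional as F

    def preference_loss(r_preferred, r_rejected):
        """-log sigmoid(r_preferred - r_rejected), averaged over the batch."""
        return -F.logsigmoid(r_preferred - r_rejected).mean()

    # Stand-in rewards; in the paper these come from a learned reward model.
    r_a = torch.tensor([1.3, 0.2], requires_grad=True)  # preferred summaries
    r_b = torch.tensor([0.9, 0.6], requires_grad=True)  # rejected summaries
    loss = preference_loss(r_a, r_b)
    loss.backward()   # gradients flow back into the reward model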
7. A Simple Global Neural Discourse Parser [PDF] Back to Contents
Yichu Zhou, Omri Koshorek, Vivek Srikumar, Jonathan Berant
Abstract: Discourse parsing is largely dominated by greedy parsers with manually-designed features, while global parsing is rare due to its computational expense. In this paper, we propose a simple chart-based neural discourse parser that does not require any manually-crafted features and is based on learned span representations only. To overcome the computational challenge, we propose an independence assumption between the label assigned to a node in the tree and the splitting point that separates its children, which results in tractable decoding. We empirically demonstrate that our model achieves the best performance among global parsers, and comparable performance to state-of-art greedy parsers, using only learned span representations.
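The independence assumption between a span's label and its splitting point makes decoding a standard CKY-style chart recursion: the best label and the best split are chosen separately for each span. A sketch over caller-supplied span scores; the scoring functions here are toy stand-ins for the learned span representations:

    from functools import lru_cache

    def decode(label_score, split_score, i, j):
        """Best labeled binary tree over span [i, j).

        label_score(i, j) -> (score, label); split_score(i, k, j) -> score.
        Label and split are optimized independently, per the abstract.
        """
        @lru_cache(maxsize=None)
        def best(i, j):
            score, label = label_score(i, j)
            if j - i == 1:
                return score, (label, i, j)
            s, k = max((split_score(i, k, j) + best(i, k)[0] + best(k, j)[0], k)
                       for k in range(i + 1, j))
            return score + s, (label, best(i, k)[1], best(k, j)[1])
        return best(i, j)

    # Toy scorers standing in for learned span representations.
    score, tree = decode(lambda i, j: (1.0, f"S{i}:{j}"),
                         lambda i, k, j: -abs((i + j) / 2 - k), 0, 4)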
8. Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading [PDF] Back to Contents
Sasi Kiran Gaddipati, Deebul Nair, Paul G. Plöger
Abstract: Automatic Short Answer Grading (ASAG) is the process of grading student answers by computational approaches, given a question and the desired answer. Previous works implemented the methods of concept mapping and facet mapping, and some used conventional word embeddings for extracting semantic features. They extracted multiple features manually to train on the corresponding datasets. We use pretrained embeddings of the transfer learning models ELMo, BERT, GPT, and GPT-2 to assess their efficiency on this task. We train with a single feature, cosine similarity, extracted from the embeddings of these models. We compare the RMSE scores and correlation measurements of the four models with previous works on the Mohler dataset. Our work demonstrates that ELMo outperformed the other three models. We also briefly describe the four transfer learning models and conclude with possible causes of the poor results of transfer learning models.
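The single cosine-similarity feature is straightforward to compute once token embeddings are available. A sketch assuming mean-pooled token vectors; the pooling choice and the random stand-in embeddings are assumptions (the paper uses ELMo/BERT/GPT embeddings):

    import numpy as np

    def cosine_feature(student_vecs, reference_vecs):
        """Single grading feature: cosine between mean-pooled token embeddings."""
        u = np.asarray(student_vecs).mean(axis=0)
        v = np.asarray(reference_vecs).mean(axis=0)
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Stand-in token embeddings; in practice these come from ELMo, BERT, etc.
    rng = np.random.default_rng(0)
    student, reference = rng.normal(size=(5, 768)), rng.normal(size=(7, 768))
    print(cosine_feature(student, reference))  # regressor input for the grade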
9. Knowing What to Listen to: Early Attention for Deep Speech Representation Learning [PDF] Back to Contents
Amirhossein Hajavi, Ali Etemad
Abstract: Deep learning techniques have considerably improved speech processing in recent years. Speech representations extracted by deep learning models are being used in a wide range of tasks such as speech recognition, speaker recognition, and speech emotion recognition. Attention models play an important role in improving deep learning models. However, current attention mechanisms are unable to attend to fine-grained information items. In this paper we propose the novel Fine-grained Early Frequency Attention (FEFA) for speech signals. This model is capable of focusing on information items as small as frequency bins. We evaluate the proposed model on two popular tasks, speaker recognition and speech emotion recognition. Two widely used public datasets, VoxCeleb and IEMOCAP, are used for our experiments. The model is implemented on top of several prominent deep models as backbone networks to evaluate its impact on performance compared to the original networks and other related work. Our experiments show that by adding FEFA to different CNN architectures, performance is consistently improved by substantial margins, even setting a new state-of-the-art for the speaker recognition task. We also tested our model against different levels of added noise, showing improvements in robustness and lower sensitivity compared to the backbone networks.
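A fine-grained frequency attention of the kind described would score each frequency bin and reweight the spectrogram before the backbone network sees it. The module below is a plausible minimal parameterization, not the published architecture:

    import torch
    import torch.nn as nn

    class FrequencyAttention(nn.Module):
        """Reweight spectrogram frequency bins with learned attention scores."""
        def __init__(self, n_frames):
            super().__init__()
            self.score = nn.Linear(n_frames, 1)   # one scalar per frequency bin

        def forward(self, spec):                  # spec: (batch, freq, time)
            weights = torch.softmax(self.score(spec).squeeze(-1), dim=-1)
            return spec * weights.unsqueeze(-1)   # bin-wise reweighting

    x = torch.randn(4, 64, 300)                   # batch of 64-bin spectrograms
    print(FrequencyAttention(300)(x).shape)       # torch.Size([4, 64, 300])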
10. Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling [PDF] Back to Contents
Tsendsuren Munkhdalai
Abstract: Training a deep neural network requires a large amount of single-task data and involves a long, time-consuming optimization phase. This is not scalable to complex, realistic environments with new unexpected changes. Humans can perform fast incremental learning on the fly, and memory systems in the brain play a critical role in this ability. We introduce Sparse Meta Networks, a meta-learning approach to learn online sequential adaptation algorithms for deep neural networks, by using deep neural networks. We augment a deep neural network with a layer-specific fast-weight memory. The fast weights are generated sparsely at each time step and accumulated incrementally through time, providing a useful inductive bias for online continual adaptation. We demonstrate strong performance on a variety of sequential adaptation scenarios, from simple online reinforcement learning to large-scale adaptive language modelling.
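The fast weights described above are generated sparsely at each step and accumulated through time. A schematic update assuming top-k magnitude sparsification and exponential decay; the paper's generation network and exact sparsity mechanism are not reproduced here:

    import torch

    def sparse_fast_weight_update(fast_w, generated, k=16, decay=0.95):
        """Accumulate a sparsified fast-weight update into layer memory.

        Only the k largest-magnitude entries of the generated update survive;
        top-k sparsification and exponential decay are simplifying assumptions.
        """
        flat = generated.flatten()
        mask = torch.zeros_like(flat)
        mask[flat.abs().topk(k).indices] = 1.0
        return decay * fast_w + (flat * mask).view_as(generated)

    w = torch.zeros(8, 8)
    w = sparse_fast_weight_update(w, torch.randn(8, 8))
    print((w != 0).sum().item())                  # at most 16 non-zero entries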
11. HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis [PDF] Back to Contents
Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu
Abstract: High-fidelity singing voices usually require a higher sampling rate (e.g., 48kHz) to convey expression and emotion. However, a higher sampling rate causes a wider frequency band and longer waveform sequences, and poses challenges for singing voice synthesis (SVS) in both the frequency and time domains. Conventional SVS systems that adopt a smaller sampling rate cannot well address the above challenges. In this paper, we develop HiFiSinger, an SVS system towards high-fidelity singing voice. HiFiSinger consists of a FastSpeech-based acoustic model and a Parallel WaveGAN-based vocoder to ensure fast training and inference as well as high voice quality. To tackle the difficulty of singing modeling caused by the high sampling rate (wider frequency band and longer waveform), we introduce multi-scale adversarial training in both the acoustic model and the vocoder to improve singing modeling. Specifically, 1) to handle the larger range of frequencies caused by the higher sampling rate, we propose a novel sub-frequency GAN (SF-GAN) for mel-spectrogram generation, which splits the full 80-dimensional mel-frequency range into multiple sub-bands and models each sub-band with a separate discriminator; 2) to model the longer waveform sequences caused by the higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform generation to model different lengths of waveform sequences with separate discriminators; 3) we also introduce several additional designs and findings in HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch) and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate window/hop size for the mel-spectrogram, and increasing the receptive field in the vocoder for long-vowel modeling. Experiment results show that HiFiSinger synthesizes high-fidelity singing voices with much higher quality: a 0.32/0.44 MOS gain over the 48kHz/24kHz baseline and a 0.83 MOS gain over previous SVS systems.
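The SF-GAN's first step is slicing the 80-dimensional mel-spectrogram into sub-bands, each judged by its own discriminator. Only that slicing is sketched below; the band boundaries are illustrative assumptions, not the paper's:

    import torch

    def split_mel_subbands(mel, bands=((0, 40), (20, 60), (40, 80))):
        """Slice an 80-bin mel-spectrogram into (possibly overlapping) sub-bands.

        mel: (batch, 80, time). Each returned slice would feed a separate
        discriminator; the boundaries here are illustrative.
        """
        return [mel[:, lo:hi, :] for lo, hi in bands]

    mel = torch.randn(2, 80, 400)
    for band in split_mel_subbands(mel):
        print(band.shape)   # e.g. torch.Size([2, 40, 400]) per sub-band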
12. Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding [PDF] Back to Contents
Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Wei Liu, Shih-Fu Chang
Abstract: The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expressions with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., expression-agnostic), hoping that the proposals contain all the right instances in the expression (i.e., expression-aware). Due to this mismatch, current two-stage methods suffer from a severe performance drop between detected and ground-truth proposals. To this end, we propose Ref-NMS, which is the first method to yield expression-aware proposals at the first stage. Ref-NMS regards all nouns in the expression as critical objects, and introduces a lightweight module to predict a score for aligning each box with a critical object. These scores can guide the NMS operation to filter out boxes irrelevant to the expression, increasing the recall of critical objects and resulting in significantly improved grounding performance. Since Ref-NMS is agnostic to the grounding step, it can be easily integrated into any state-of-the-art two-stage method. Extensive ablation studies on several backbones, benchmarks, and tasks consistently demonstrate the superiority of Ref-NMS.
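Ref-NMS re-scores each proposal with an expression-relatedness term before suppression so that boxes relevant to the expression survive. A sketch of the score fusion feeding a standard NMS; the product fusion is an assumption, and in the paper the relatedness scores come from a learned module over the expression's nouns:

    import numpy as np

    def expression_aware_scores(det_conf, related_score):
        """Combine detector confidence with expression-relatedness per box.

        det_conf, related_score: (n_boxes,) arrays in [0, 1]. The product
        fusion is a simplifying assumption.
        """
        return det_conf * related_score   # boxes irrelevant to the text sink

    conf = np.array([0.9, 0.8, 0.3])
    rel = np.array([0.1, 0.9, 0.95])      # predicted relatedness per box
    keep_order = np.argsort(-expression_aware_scores(conf, rel))
    print(keep_order)                     # [1 2 0]: relevant boxes ranked first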
13. Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions [PDF] Back to Contents
Sara Evensen, Chang Ge, Dongjin Choi, Çağatay Demiralp
Abstract: Data programming is a programmatic weak supervision approach to efficiently curate large-scale labeled training data. Writing data programs (labeling functions) requires, however, both programming literacy and domain expertise. Many subject matter experts have neither programming proficiency nor time to effectively write data programs. Furthermore, regardless of one's expertise in coding or machine learning, transferring domain expertise into labeling functions by enumerating rules and thresholds is not only time consuming but also inherently difficult. Here we propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users. DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics such as identifying relevant signals for labeling tasks. We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples. We compare Ruler with conventional data programming through a user study conducted with 10 data scientists creating labeling functions for sentiment and spam classification tasks. We find that Ruler is easier to use and learn and offers higher overall satisfaction, while providing discriminative model performances comparable to ones achieved by conventional data programming.
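The labeling functions that Ruler synthesizes are small programs that vote on a label or abstain. A hand-written example of the kind of function DPBD aims to generate from user demonstrations (the keywords are illustrative):

    ABSTAIN, HAM, SPAM = -1, 0, 1

    def lf_contains_urgent_money(text: str) -> int:
        """Vote SPAM when money-pressure keywords co-occur, else abstain."""
        t = text.lower()
        if "urgent" in t and ("wire" in t or "transfer" in t):
            return SPAM
        return ABSTAIN

    print(lf_contains_urgent_money("URGENT: wire the funds today"))  # 1 (SPAM)
    print(lf_contains_urgent_money("Lunch tomorrow?"))               # -1 (abstain)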
14. Towards Earnings Call and Stock Price Movement [PDF] Back to Contents
Zhiqiang Ma, Grace Bang, Chong Wang, Xiaomo Liu
Abstract: Earnings calls are hosted by management of public companies to discuss the company's financial performance with analysts and investors. Information disclosed during an earnings call is an essential source of data for analysts and investors to make investment decisions. Thus, we leverage earnings call transcripts to predict future stock price dynamics. We propose to model the language in transcripts using a deep learning framework, where an attention mechanism is applied to encode the text data into vectors for the discriminative network classifier to predict stock price movements. Our empirical experiments show that the proposed model is superior to the traditional machine learning baselines and earnings call information can boost the stock price prediction performance.
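The attention mechanism described above collapses a transcript's token vectors into a single document vector for the classifier. A generic attention-pooling layer of that kind, offered as a sketch rather than the paper's architecture:

    import torch
    import torch.nn as nn

    class AttentionPool(nn.Module):
        """Collapse a sequence of token vectors into one document vector."""
        def __init__(self, dim):
            super().__init__()
            self.query = nn.Linear(dim, 1)

        def forward(self, h):                       # h: (batch, seq, dim)
            weights = torch.softmax(self.query(h).squeeze(-1), dim=-1)
            return (weights.unsqueeze(-1) * h).sum(dim=1)

    tokens = torch.randn(2, 128, 256)               # encoded transcript tokens
    doc_vec = AttentionPool(256)(tokens)            # input to the classifier
    print(doc_vec.shape)                            # torch.Size([2, 256])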
Note: the cover image is a word cloud of the paper titles.