Contents
1. The Unstoppable Rise of Computational Linguistics in Deep Learning
2. Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
3. Sanskrit Segmentation Revisited
4. End-to-end Semantics-based Summary Quality Assessment for Single-document Summarization
5. BIOMRC: A Dataset for Biomedical Machine Reading Comprehension
6. Towards Hate Speech Detection at Large via Deep Generative Modeling
7. Reasoning with Latent Structure Refinement for Document-Level Relation Extraction
8. Smart To-Do: Automatic Generation of To-Do Items from Emails
9. Mitigating Gender Bias Amplification in Distribution by Posterior Regularization
10. Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
11. Parallel Corpus Filtering via Pre-trained Language Models
12. Response-Anticipated Memory for On-Demand Knowledge Integration in Response Generation
13. Screenplay Quality Assessment: Can We Predict Who Gets Nominated?
14. INFOTABS: Inference on Tables as Semi-structured Data
15. Large Scale Multi-Actor Generative Dialog Modeling
16. Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report
17. A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text
18. That is a Known Lie: Detecting Previously Fact-Checked Claims
19. Cross-Modality Relevance for Reasoning on Language and Vision
20. Which bills are lobbied? Predicting and interpreting lobbying activity in the US
1. The Unstoppable Rise of Computational Linguistics in Deep Learning [PDF] Back to Contents
James Henderson
Abstract: In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of variable binding and its instantiation in attention-based models, and argue that Transformer is not a sequence model but an induced-structure model. This perspective leads to predictions of the challenges facing research in deep learning architectures for natural language understanding.
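The abstract's argument centers on attention-based models; for quick reference, here is a minimal NumPy sketch of standard scaled dot-product attention (the textbook operation, not code from the paper):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Textbook attention: softmax(Q K^T / sqrt(d)) V."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                               # (n_queries, n_keys)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)              # row-wise softmax
        return weights @ V                                          # (n_queries, d_v)

    # toy example: 3 queries attend over 4 key/value vectors of width 8
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    print(scaled_dot_product_attention(Q, K, V).shape)              # (3, 8)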
2. Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA [PDF] Back to Contents
Hyounghun Kim, Zineng Tang, Mohit Bansal
Abstract: Videos convey rich information. Dynamic spatio-temporal relationships between people/objects, and diverse multimodal events are present in a video clip. Hence, it is important to develop automated models that can accurately extract such information from videos. Answering questions on videos is one of the tasks which can evaluate such AI abilities. In this paper, we propose a video question answering model which effectively integrates multi-modal input sources and finds the temporally relevant information to answer questions. Specifically, we first employ dense image captions to help identify objects and their detailed salient regions and actions, and hence give the model useful extra information (in explicit textual format to allow easier matching) for answering questions. Moreover, our model is also comprised of dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates which pass more relevant information to the classifier. Finally, we also cast the frame selection problem as a multi-label classification task and introduce two loss functions, In-and-Out Frame Score Margin (IOFSM) and Balanced Binary Cross-Entropy (BBCE), to better supervise the model with human importance annotations. We evaluate our model on the challenging TVQA dataset, where each of our model components provides significant gains, and our overall model outperforms the state-of-the-art by a large margin (74.09% versus 70.52%). We also present several word, object, and frame level visualization studies. Our code is publicly available at: this https URL
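The frame-selection component above is cast as multi-label classification over frames. A minimal PyTorch sketch of that formulation with a weighted binary cross-entropy loss follows; the positive/negative reweighting is only a rough stand-in for the paper's IOFSM and BBCE losses, and all tensors are made up:

    import torch
    import torch.nn as nn

    # hypothetical scores for 5 frames per clip, e.g. from a video/caption encoder
    frame_logits = torch.randn(2, 5, requires_grad=True)        # (batch, num_frames)
    frame_labels = torch.tensor([[1., 0., 1., 0., 0.],          # human importance annotations
                                 [0., 0., 1., 1., 1.]])

    # plain multi-label BCE; a "balanced" variant reweights positives vs. negatives
    pos_weight = (frame_labels == 0).sum() / (frame_labels == 1).sum()
    loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(frame_logits, frame_labels)
    loss.backward()
    print(float(loss))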
3. Sanskrit Segmentation Revisited [PDF] Back to Contents
Sriram Krishnan, Amba Kulkarni
Abstract: Computationally analyzing Sanskrit texts requires proper segmentation in the initial stages. There have been various tools developed for Sanskrit text segmentation. Of these, Gérard Huet's Reader in the Sanskrit Heritage Engine analyzes the input text and segments it based on the word parameters - phases like iic, ifc, Pr, Subst, etc., and sandhi (or transition) that takes place at the end of a word with the initial part of the next word. And it enlists all the possible solutions differentiating them with the help of the phases. The phases and their analyses have their use in the domain of sentential parsers. In segmentation, though, they are not used beyond deciding whether the words formed with the phases are morphologically valid. This paper tries to modify the above segmenter by ignoring the phase details (except for a few cases), and also proposes a probability function to prioritize the list of solutions to bring up the most valid solutions at the top.
4. End-to-end Semantics-based Summary Quality Assessment for Single-document Summarization [PDF] Back to Contents
Forrest Sheng Bao, Hebi Li, Ge Luo, Cen Chen, Yinfei Yang, Minghui Qiu
Abstract: ROUGE is the de facto criterion for summarization research. However, its two major drawbacks limit the research and application of automated summarization systems. First, ROUGE favors lexical similarity instead of semantic similarity, making it especially unfit for abstractive summarization. Second, ROUGE cannot function without a reference summary, which is expensive or impossible to obtain in many cases. Therefore, we introduce a new end-to-end metric system for summary quality assessment by leveraging the semantic similarities of words and/or sentences in deep learning. Models trained in our framework can evaluate a summary directly against the input document, without the need of a reference summary. The proposed approach exhibits very promising results on gold-standard datasets and suggests its great potential to future summarization research. The scores from our models have correlation coefficients up to 0.54 with human evaluations on machine generated summaries in TAC2010. Its performance is also very close to ROUGE metrics'.
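The reference-free idea above (score a summary directly against its source document, with no gold summary) can be illustrated with a crude lexical stand-in; the paper learns this similarity end-to-end with neural models, so the TF-IDF version below is only a sketch of the interface:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def reference_free_score(document: str, summary: str) -> float:
        """Crude document-vs-summary similarity; the paper replaces this with a learned model."""
        X = TfidfVectorizer().fit_transform([document, summary])
        return float(cosine_similarity(X[0], X[1])[0, 0])

    doc = "The committee approved the new budget after a long debate on spending priorities."
    good = "The committee approved the budget after debating spending."
    bad = "A new species of frog was discovered in the rainforest."
    print(reference_free_score(doc, good), reference_free_score(doc, bad))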
5. BIOMRC: A Dataset for Biomedical Machine Reading Comprehension [PDF] Back to Contents
Petros Stavropoulos, Dimitris Pappas, Ion Androutsopoulos, Ryan McDonald
Abstract: We introduce BIOMRC, a large-scale cloze-style biomedical MRC dataset. Care was taken to reduce noise, compared to the previous BIOREAD dataset of Pappas et al. (2018). Experiments show that simple heuristics do not perform well on the new dataset, and that two neural MRC models that had been tested on BIOREAD perform much better on BIOMRC, indicating that the new dataset is indeed less noisy or at least that its task is more feasible. Non-expert human performance is also higher on the new dataset compared to BIOREAD, and biomedical experts perform even better. We also introduce a new BERT-based MRC model, the best version of which substantially outperforms all other methods tested, reaching or surpassing the accuracy of biomedical experts in some experiments. We make the new dataset available in three different sizes, also releasing our code, and providing a leaderboard.
6. Towards Hate Speech Detection at Large via Deep Generative Modeling [PDF] Back to Contents
Tomer Wullach, Amir Adler, Einat Minkov
Abstract: Hate speech detection is a critical problem in social media platforms, being often accused for enabling the spread of hatred and igniting physical violence. Hate speech detection requires overwhelming resources including high-performance computing for online posts and tweets monitoring as well as thousands of human experts for daily screening of suspected posts or tweets. Recently, Deep Learning (DL)-based solutions have been proposed for automatic detection of hate speech, using modest-sized training datasets of few thousands of hate speech sequences. While these methods perform well on the specific datasets, their ability to detect new hate speech sequences is limited and has not been investigated. Being a data-driven approach, it is well known that DL surpasses other methods whenever a scale-up in train dataset size and diversity is achieved. Therefore, we first present a dataset of 1 million realistic hate and non-hate sequences, produced by a deep generative language model. We further utilize the generated dataset to train a well-studied DL-based hate speech detector, and demonstrate consistent and significant performance improvements across five public hate speech datasets. Therefore, the proposed solution enables high sensitivity detection of a very large variety of hate speech sequences, paving the way to a fully automatic solution.
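A minimal sketch of the augmentation recipe described above: prompt a generative language model for synthetic training sequences, then add them (with labels) to the detector's training data. The model name, prompt, and scale are placeholders rather than the paper's actual setup, which fine-tunes a large generator to produce one million sequences:

    from transformers import pipeline

    # placeholder generator and prompt; the paper's generator is fine-tuned on hate/non-hate data
    generator = pipeline("text-generation", model="gpt2")
    samples = generator("The post said that", max_new_tokens=30,
                        num_return_sequences=3, do_sample=True)
    synthetic_texts = [s["generated_text"] for s in samples]
    print(synthetic_texts)
    # these synthetic texts would then be labeled and appended to the detector's training set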
7. Reasoning with Latent Structure Refinement for Document-Level Relation Extraction [PDF] Back to Contents
Guoshun Nan, Zhijiang Guo, Ivan Sekulić, Wei Lu
Abstract: Document-level relation extraction requires integrating information within and across multiple sentences of a document and capturing complex interactions between inter-sentence entities. However, effective aggregation of relevant information in the document remains a challenging research question. Existing approaches construct static document-level graphs based on syntactic trees, co-references or heuristics from the unstructured text to model the dependencies. Unlike previous methods that may not be able to capture rich non-local interactions for inference, we propose a novel model that empowers the relational reasoning across sentences by automatically inducing the latent document-level graph. We further develop a refinement strategy, which enables the model to incrementally aggregate relevant information for multi-hop reasoning. Specifically, our model achieves an F1 score of 59.05 on a large-scale document-level dataset (DocRED), significantly improving over the previous results, and also yields new state-of-the-art results on the CDR and GDA dataset. Furthermore, extensive analyses show that the model is able to discover more accurate inter-sentence relations.
8. Smart To-Do: Automatic Generation of To-Do Items from Emails [PDF] Back to Contents
Sudipto Mukherjee, Subhabrata Mukherjee, Marcello Hasegawa, Ahmed Hassan Awadallah, Ryen White
Abstract: Intelligent features in email service applications aim to increase productivity by helping people organize their folders, compose their emails and respond to pending tasks. In this work, we explore a new application, Smart-To-Do, that helps users with task management over emails. We introduce a new task and dataset for automatically generating To-Do items from emails where the sender has promised to perform an action. We design a two-stage process leveraging recent advances in neural text generation and sequence-to-sequence learning, obtaining BLEU and ROUGE scores of 0.23 and 0.63 for this task. To the best of our knowledge, this is the first work to address the problem of composing To-Do items from emails.
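For reference, the BLEU and ROUGE scores mentioned above can be computed with off-the-shelf packages (nltk and rouge-score); this covers only the evaluation step, not the paper's two-stage generation model, and the example strings are invented:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from rouge_score import rouge_scorer

    reference = "send the quarterly report to alice by friday"
    candidate = "send quarterly report to alice"

    bleu = sentence_bleu([reference.split()], candidate.split(),
                         smoothing_function=SmoothingFunction().method1)
    rouge_l = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True).score(reference, candidate)
    print(round(bleu, 3), round(rouge_l["rougeL"].fmeasure, 3))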
9. Mitigating Gender Bias Amplification in Distribution by Posterior Regularization [PDF] Back to Contents
Shengyu Jia, Tao Meng, Jieyu Zhao, Kai-Wei Chang
Abstract: Advanced machine learning techniques have boosted the performance of natural language processing. Nevertheless, recent studies, e.g., Zhao et al. (2017) show that these techniques inadvertently capture the societal bias hidden in the corpus and further amplify it. However, their analysis is conducted only on models' top predictions. In this paper, we investigate the gender bias amplification issue from the distribution perspective and demonstrate that the bias is amplified in the view of predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization. With little performance loss, our method can almost remove the bias amplification in the distribution. Our study sheds the light on understanding the bias amplification.
10. Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond [PDF] Back to Contents
Zhuosheng Zhang, Hai Zhao, Rui Wang
Abstract: Machine reading comprehension (MRC) aims to teach machines to read and comprehend human languages, which is a long-standing goal of natural language processing (NLP). With the burst of deep neural networks and the evolution of contextualized language models (CLMs), the research of MRC has experienced two significant breakthroughs. MRC and CLM, as a phenomenon, have a great impact on the NLP community. In this survey, we provide a comprehensive and comparative review on MRC covering overall research topics about 1) the origin and development of MRC and CLM, with a particular focus on the role of CLMs; 2) the impact of MRC and CLM to the NLP community; 3) the definition, datasets, and evaluation of MRC; 4) general MRC architecture and technical methods in the view of two-stage Encoder-Decoder solving architecture from the insights of the cognitive process of humans; 5) previous highlights, emerging topics, and our empirical analysis, among which we especially focus on what works in different periods of MRC researches. We propose a full-view categorization and new taxonomies on these topics. The primary views we have arrived at are that 1) MRC boosts the progress from language processing to understanding; 2) the rapid improvement of MRC systems greatly benefits from the development of CLMs; 3) the theme of MRC is gradually moving from shallow text matching to cognitive reasoning.
11. Parallel Corpus Filtering via Pre-trained Language Models [PDF] Back to Contents
Boliang Zhang, Ajay Nagesh, Kevin Knight
Abstract: Web-crawled data provides a good source of parallel corpora for training machine translation models. It is automatically obtained, but extremely noisy, and recent work shows that neural machine translation systems are more sensitive to noise than traditional statistical machine translation methods. In this paper, we propose a novel approach to filter out noisy sentence pairs from web-crawled corpora via pre-trained language models. We measure sentence parallelism by leveraging the multilingual capability of BERT and use the Generative Pre-training (GPT) language model as a domain filter to balance data domains. We evaluate the proposed method on the WMT 2018 Parallel Corpus Filtering shared task, and on our own web-crawled Japanese-Chinese parallel corpus. Our method significantly outperforms baselines and achieves a new state-of-the-art. In an unsupervised setting, our method achieves comparable performance to the top-1 supervised method. We also evaluate on a web-crawled Japanese-Chinese parallel corpus that we make publicly available.
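A rough sketch of the two signals described above, assuming mean-pooled multilingual BERT embeddings as a stand-in for the paper's parallelism measure and GPT-2 perplexity as a stand-in for its domain/fluency filter; model choices, pooling, and thresholds here are illustrative only:

    import torch
    from transformers import AutoTokenizer, AutoModel, GPT2LMHeadModel, GPT2TokenizerFast

    bert_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
    gpt_tok = GPT2TokenizerFast.from_pretrained("gpt2")
    gpt = GPT2LMHeadModel.from_pretrained("gpt2")

    def embed(sentence: str) -> torch.Tensor:
        # mean-pooled mBERT embedding of one sentence
        inputs = bert_tok(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = bert(**inputs).last_hidden_state              # (1, seq_len, 768)
        return hidden.mean(dim=1).squeeze(0)

    def perplexity(sentence: str) -> float:
        # GPT-2 perplexity, usable as a crude fluency/domain score
        ids = gpt_tok(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = gpt(ids, labels=ids).loss
        return float(torch.exp(loss))

    src, tgt = "The cat sits on the mat.", "Le chat est assis sur le tapis."
    sim = float(torch.cosine_similarity(embed(src), embed(tgt), dim=0))
    print(sim, perplexity(tgt))
    # keep the pair only if similarity is high and perplexity is not extreme; thresholds are data-dependent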
12. Response-Anticipated Memory for On-Demand Knowledge Integration in Response Generation [PDF] Back to Contents
Zhiliang Tian, Wei Bi, Dongkyu Lee, Lanqing Xue, Yiping Song, Xiaojiang Liu, Nevin L. Zhang
Abstract: Neural conversation models are known to generate appropriate but non-informative responses in general. A scenario where informativeness can be significantly enhanced is Conversing by Reading (CbR), where conversations take place with respect to a given external document. In previous work, the external document is utilized by (1) creating a context-aware document memory that integrates information from the document and the conversational context, and then (2) generating responses referring to the memory. In this paper, we propose to create the document memory with some anticipated responses in mind. This is achieved using a teacher-student framework. The teacher is given the external document, the context, and the ground-truth response, and learns how to build a response-aware document memory from three sources of information. The student learns to construct a response-anticipated document memory from the first two sources, and the teacher's insight on memory creation. Empirical results show that our model outperforms the previous state-of-the-art for the CbR task.
13. Screenplay Quality Assessment: Can We Predict Who Gets Nominated? [PDF] Back to Contents
Ming-Chang Chiu, Tiantian Feng, Xiang Ren, Shrikanth Narayanan
Abstract: Deciding which scripts to turn into movies is a costly and time-consuming process for filmmakers. Thus, building a tool to aid script selection, an initial phase in movie production, can be very beneficial. Toward that goal, in this work, we present a method to evaluate the quality of a screenplay based on linguistic cues. We address this in a two-fold approach: (1) we define the task as predicting nominations of scripts at major film awards with the hypothesis that the peer-recognized scripts should have a greater chance to succeed. (2) based on industry opinions and narratology, we extract and integrate domain-specific features into common classification techniques. We face two challenges (1) scripts are much longer than other document datasets (2) nominated scripts are limited and thus difficult to collect. However, with narratology-inspired modeling and domain features, our approach offers clear improvements over strong baselines. Our work provides a new approach for future work in screenplay analysis.
14. INFOTABS: Inference on Tables as Semi-structured Data [PDF] Back to Contents
Vivek Gupta, Maitrey Mehta, Pegah Nokhiz, Vivek Srikumar
Abstract: In this paper, we observe that semi-structured tabulated text is ubiquitous; understanding them requires not only comprehending the meaning of text fragments, but also implicit relationships between them. We argue that such data can prove as a testing ground for understanding how we reason about information. To study this, we introduce a new dataset called INFOTABS, comprising of human-written textual hypotheses based on premises that are tables extracted from Wikipedia info-boxes. Our analysis shows that the semi-structured, multi-domain and heterogeneous nature of the premises admits complex, multi-faceted reasoning. Experiments reveal that, while human annotators agree on the relationships between a table-hypothesis pair, several standard modeling strategies are unsuccessful at the task, suggesting that reasoning about tables can pose a difficult modeling challenge.
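A common baseline for table premises, and plausibly among the "standard modeling strategies" the abstract alludes to, is to linearize the infobox into a flat textual premise for an ordinary sentence-pair NLI model. A toy linearization with a made-up infobox (not an example from the dataset):

    def linearize_infobox(title: str, infobox: dict) -> str:
        # turn key-value pairs into a flat textual premise for an NLI model
        return " ".join(f"The {key} of {title} is {value}." for key, value in infobox.items())

    infobox = {"capital": "Oslo", "population": "5.4 million", "currency": "Norwegian krone"}
    premise = linearize_infobox("Norway", infobox)
    hypothesis = "Norway has fewer than ten million inhabitants."
    print(premise)
    # the premise/hypothesis pair would then go to any sentence-pair entailment classifier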
15. Large Scale Multi-Actor Generative Dialog Modeling [PDF] Back to Contents
Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro
Abstract: Non-goal oriented dialog agents (i.e. chatbots) aim to produce varying and engaging conversations with a user; however, they typically exhibit either inconsistent personality across conversations or the average personality of all users. This paper addresses these issues by controlling an agent's persona upon generation via conditioning on prior conversations of a target actor. In doing so, we are able to utilize more abstract patterns within a person's speech and better emulate them in generated responses. This work introduces the Generative Conversation Control model, an augmented and fine-tuned GPT-2 language model that conditions on past reference conversations to probabilistically model multi-turn conversations in the actor's persona. We introduce an accompanying data collection procedure to obtain 10.3M conversations from 6 months worth of Reddit comments. We demonstrate that scaling model sizes from 117M to 8.3B parameters yields an improvement from 23.14 to 13.14 perplexity on 1.7M held out Reddit conversations. Increasing model scale yielded similar improvements in human evaluations that measure preference of model samples to the held out target distribution in terms of realism (31% increased to 37% preference), style matching (37% to 42%), grammar and content quality (29% to 42%), and conversation coherency (32% to 40%). We find that conditionally modeling past conversations improves perplexity by 0.47 in automatic evaluations. Through human trials we identify positive trends between conditional modeling and style matching and outline steps to further improve persona control.
16. Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report [PDF] Back to Contents
Ali Hürriyetoğlu, Vanni Zavarella, Hristo Tanev, Erdem Yörük, Ali Safaya, Osman Mutlu
Abstract: We describe our effort on automated extraction of socio-political events from news in the scope of a workshop and a shared task we organized at Language Resources and Evaluation Conference (LREC 2020). We believe the event extraction studies in computational linguistics and social and political sciences should further support each other in order to enable large scale socio-political event information collection across sources, countries, and languages. The event consists of regular research papers and a shared task, which is about event sentence coreference identification (ESCI), tracks. All submissions were reviewed by five members of the program committee. The workshop attracted research papers related to evaluation of machine learning methodologies, language resources, material conflict forecasting, and a shared task participation report in the scope of socio-political event information collection. It has shown us the volume and variety of both the data sources and event information collection approaches related to socio-political events and the need to fill the gap between automated text processing techniques and requirements of social and political sciences.
17. A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text [PDF] Back to Contents
Carlo Lipizzi, Dario Borrelli, Fernanda de Oliveira Capela
Abstract: This work introduces a new method to consider subjectivity and general context dependency in text analysis and uses as example the detection of emotions conveyed in text. The proposed method takes into account subjectivity using a computational version of the Framework Theory by Marvin Minsky (1974) leveraging on the Word2Vec approach to text vectorization by Mikolov et al. (2013), used to generate distributed representation of words based on the context where they appear. Our approach is based on three components: 1. a framework/'room' representing the point of view; 2. a benchmark representing the criteria for the analysis - in this case the emotion classification, from a study of human emotions by Robert Plutchik (1980); and 3. the document to be analyzed. By using similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark - intensities of emotions in our case study - for the document to be analyzed. Our method provides a measure that take into account the point of view of the entity reading the document. This method could be applied to all the cases where evaluating subjectivity is relevant to understand the relative value or meaning of a text. Subjectivity can be not limited to human reactions, but it could be used to provide a text with an interpretation related to a given domain ("room"). To evaluate our method, we used a test case in the political domain.
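The scoring step described above (compare document words against a benchmark of emotion terms via word-vector similarity) can be sketched with gensim 4.x; the toy corpus, the two-emotion "benchmark", and the simple averaging rule are placeholders, not the paper's actual "room" construction or Plutchik benchmark:

    from gensim.models import Word2Vec

    # tiny placeholder corpus; the paper's "room" is built from a domain-specific corpus
    sentences = [
        ["the", "crowd", "cheered", "with", "joy", "at", "the", "victory"],
        ["she", "cried", "in", "sadness", "after", "the", "loss"],
        ["the", "team", "celebrated", "the", "happy", "victory"],
        ["he", "mourned", "the", "painful", "loss", "in", "silence"],
    ]
    model = Word2Vec(sentences, vector_size=32, window=3, min_count=1, epochs=200, seed=1)

    benchmark = {"joy": ["joy", "happy"], "sadness": ["sadness", "mourned"]}   # toy benchmark
    document = ["crowd", "celebrated", "victory"]

    for emotion, terms in benchmark.items():
        score = sum(model.wv.similarity(w, t) for w in document for t in terms)
        print(emotion, round(float(score) / (len(document) * len(terms)), 3))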
18. That is a Known Lie: Detecting Previously Fact-Checked Claims [PDF] Back to Contents
Shaden Shaar, Giovanni Da San Martino, Nikolay Babulkov, Preslav Nakov
Abstract: The recent proliferation of "fake news" has triggered a number of responses, most notably the emergence of several manual fact-checking initiatives. As a result and over time, a large number of fact-checked claims have been accumulated, which increases the likelihood that a new claim in social media or a new statement by a politician might have already been fact-checked by some trusted fact-checking organization, as viral claims often come back after a while in social media, and politicians like to repeat their favorite statements, true or false, over and over again. As manual fact-checking is very time-consuming (and fully automatic fact-checking has credibility issues), it is important to try to save this effort and to avoid wasting time on claims that have already been fact-checked. Interestingly, despite the importance of the task, it has been largely ignored by the research community so far. Here, we aim to bridge this gap. In particular, we formulate the task and we discuss how it relates to, but also differs from, previous work. We further create a specialized dataset, which we release to the research community. Finally, we present learning-to-rank experiments that demonstrate sizable improvements over state-of-the-art retrieval and textual similarity approaches.
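A baseline version of the retrieval task described above is to rank a database of already fact-checked claims by textual similarity to the incoming claim; the claims below are invented, and the paper's learning-to-rank models go well beyond this TF-IDF sketch:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    fact_checked = [                                   # hypothetical verified-claims database
        "Drinking bleach does not cure viral infections.",
        "The moon landing took place in 1969.",
        "Vaccines do not cause autism.",
    ]
    new_claim = "A politician said drinking bleach can cure the virus."

    vec = TfidfVectorizer().fit(fact_checked + [new_claim])
    scores = cosine_similarity(vec.transform([new_claim]), vec.transform(fact_checked))[0]
    for score, claim in sorted(zip(scores, fact_checked), reverse=True):
        print(round(float(score), 3), claim)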
19. Cross-Modality Relevance for Reasoning on Language and Vision [PDF] Back to Contents
Chen Zheng, Quan Guo, Parisa Kordjamshidi
Abstract: This work deals with the challenge of learning and reasoning over language and vision data for the related downstream tasks such as visual question answering (VQA) and natural language for visual reasoning (NLVR). We design a novel cross-modality relevance module that is used in an end-to-end framework to learn the relevance representation between components of various input modalities under the supervision of a target task, which is more generalizable to unobserved data compared to merely reshaping the original representation space. In addition to modeling the relevance between the textual entities and visual entities, we model the higher-order relevance between entity relations in the text and object relations in the image. Our proposed approach shows competitive performance on two different language and vision tasks using public benchmarks and improves the state-of-the-art published results. The learned alignments of input spaces and their relevance representations by NLVR task boost the training efficiency of VQA task.
摘要:本文研究在语言与视觉数据上进行学习和推理的挑战,面向视觉问答(VQA)和面向视觉推理的自然语言任务(NLVR)等相关下游任务。我们设计了一个新颖的跨模态相关性模块,将其用于端到端框架中,在目标任务的监督下学习不同输入模态各组成部分之间的相关性表示;与仅仅重塑原始表示空间相比,这种方式对未见数据具有更好的泛化能力。除了建模文本实体与视觉实体之间的相关性外,我们还建模了文本中实体关系与图像中物体关系之间的高阶相关性。在两个不同的语言与视觉任务的公开基准上,所提方法表现出有竞争力的性能,并超越了已发表的最新结果。通过NLVR任务学习到的输入空间对齐及其相关性表示还提升了VQA任务的训练效率。
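One plausible reading of a "cross-modality relevance" representation is a learned, normalized similarity between every textual component and every visual component. The sketch below is an assumption-laden illustration of that idea, not the paper's architecture; the module name, dimensions, and the scaled dot-product form are all hypothetical.

```python
# Minimal sketch (assumptions, not the paper's implementation): a bilinear-style
# relevance map between text-token embeddings and visual-object embeddings.
import torch
import torch.nn as nn

class CrossModalityRelevance(nn.Module):
    def __init__(self, text_dim: int, vis_dim: int, rel_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, rel_dim)  # project text tokens
        self.vis_proj = nn.Linear(vis_dim, rel_dim)    # project visual objects

    def forward(self, text_feats, vis_feats):
        # text_feats: (batch, n_tokens, text_dim); vis_feats: (batch, n_objects, vis_dim)
        t = self.text_proj(text_feats)
        v = self.vis_proj(vis_feats)
        # relevance[b, i, j] = similarity between token i and object j
        relevance = torch.matmul(t, v.transpose(1, 2)) / (t.size(-1) ** 0.5)
        return relevance.softmax(dim=-1)               # normalized over objects

module = CrossModalityRelevance(text_dim=768, vis_dim=2048, rel_dim=512)
rel = module(torch.randn(2, 12, 768), torch.randn(2, 36, 2048))
print(rel.shape)  # torch.Size([2, 12, 36])
```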
20. Which bills are lobbied? Predicting and interpreting lobbying activity in the US [PDF] 返回目录
Ivan Slobozhan, Peter Ormosi, Rajesh Sharma
Abstract: Using lobbying data from this http URL, we offer several experiments applying machine learning techniques to predict if a piece of legislation (US bill) has been subjected to lobbying activities or not. We also investigate the influence of the intensity of the lobbying activity on how discernible a lobbied bill is from one that was not subject to lobbying. We compare the performance of a number of different models (logistic regression, random forest, CNN and LSTM) and text embedding representations (BOW, TF-IDF, GloVe, Law2Vec). We report results of above 0.85% ROC AUC scores, and 78% accuracy. Model performance significantly improves (95% ROC AUC, and 88% accuracy) when bills with higher lobbying intensity are looked at. We also propose a method that could be used for unlabelled data. Through this we show that there is a considerably large number of previously unlabelled US bills where our predictions suggest that some lobbying activity took place. We believe our method could potentially contribute to the enforcement of the US Lobbying Disclosure Act (LDA) by indicating the bills that were likely to have been affected by lobbying but were not filed as such.
摘要:利用来自该http URL的游说数据,我们开展了多组实验,应用机器学习技术预测一项立法(美国法案)是否曾受到游说活动的影响。我们还研究了游说强度如何影响被游说法案与未被游说法案之间的可区分程度。我们比较了多种模型(逻辑回归、随机森林、CNN和LSTM)以及多种文本嵌入表示(BOW、TF-IDF、GloVe、Law2Vec)的性能。我们报告的结果为ROC AUC高于0.85、准确率78%。当只考察游说强度较高的法案时,模型性能显著提升(ROC AUC达95%,准确率达88%)。我们还提出了一种可用于未标注数据的方法,并由此表明,在大量此前未标注的美国法案中,我们的预测显示其中发生过游说活动。我们认为,该方法有望通过指出那些很可能受到游说影响却未按规定申报的法案,为美国《游说公开法》(LDA)的执行提供帮助。
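To make the experimental setup concrete, the sketch below shows the simplest configuration the abstract mentions (TF-IDF features with logistic regression, evaluated by ROC AUC) on invented toy data; it is not the authors' pipeline and uses none of their data.

```python
# Minimal sketch, not the authors' pipeline: TF-IDF + logistic regression to
# label a bill text as lobbied / not lobbied, scored with ROC AUC.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

bill_texts = [
    "A bill to amend the tax treatment of pharmaceutical research expenses.",
    "A bill to rename a post office in a rural county.",
    "A bill regulating emissions standards for heavy-duty trucks.",
    "A bill to designate a national heritage month.",
] * 25                                    # repeated to get enough toy samples
labels = [1, 0, 1, 0] * 25                # 1 = lobbied, 0 = not lobbied (invented)

X_train, X_test, y_train, y_test = train_test_split(
    bill_texts, labels, test_size=0.25, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, probs))
```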
21. Automatic Estimation of Intelligibility Measure for Consonants in Speech [PDF] 返回目录
Ali Abavisani, Mark Hasegawa-Johnson
Abstract: In this article, we provide a model to estimate a real-valued measure of the intelligibility of individual speech segments. We trained regression models based on Convolutional Neural Networks (CNN) for stop consonants \textipa{/p,t,k,b,d,g/} associated with vowel \textipa{/A/}, to estimate the corresponding Signal to Noise Ratio (SNR) at which the Consonant-Vowel (CV) sound becomes intelligible for Normal Hearing (NH) ears. The intelligibility measure for each sound is called SNR$_{90}$, and is defined to be the SNR level at which human participants are able to recognize the consonant at least 90\% correctly, on average, as determined in prior experiments with NH subjects. Performance of the CNN is compared to a baseline prediction based on automatic speech recognition (ASR), specifically, a constant offset subtracted from the SNR at which the ASR becomes capable of correctly labeling the consonant. Compared to baseline, our models were able to accurately estimate the SNR$_{90}$~intelligibility measure with less than 2 [dB$^2$] Mean Squared Error (MSE) on average, while the baseline ASR-defined measure computes SNR$_{90}$~with a variance of 5.2 to 26.6 [dB$^2$], depending on the consonant.
摘要:本文提出一个模型,用于估计单个语音片段可懂度的实值度量。我们基于卷积神经网络(CNN)训练回归模型,针对与元音\textipa{/A/}组合的爆破辅音\textipa{/p,t,k,b,d,g/},估计使该辅音-元音(CV)音节对正常听力(NH)耳朵而言变得可懂的信噪比(SNR)。每个音的可懂度度量称为SNR$_{90}$,定义为在此前以正常听力受试者进行的实验中,人类参与者平均能以至少90%的正确率识别该辅音时的SNR水平。我们将CNN的性能与基于自动语音识别(ASR)的基线预测进行比较;该基线为ASR能够正确标注该辅音时的SNR减去一个固定偏移量。与基线相比,我们的模型能够准确估计SNR$_{90}$可懂度度量,平均均方误差(MSE)小于2 [dB$^2$];而基线ASR定义的度量所计算的SNR$_{90}$的方差为5.2至26.6 [dB$^2$],具体取决于辅音。
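As a hedged illustration of CNN-based regression toward a scalar SNR$_{90}$ target with a mean-squared-error objective, the sketch below trains a tiny model on synthetic spectrogram-shaped tensors; the architecture, input shape, and data are assumptions, not the paper's model.

```python
# Minimal sketch (assumptions, not the paper's model): a small CNN regressor
# mapping a spectrogram-like CV-syllable input to one SNR_90 value, trained
# with mean squared error (reported in dB^2 in the abstract).
import torch
import torch.nn as nn

class SNR90Regressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)       # scalar SNR_90 prediction in dB

    def forward(self, x):                  # x: (batch, 1, freq_bins, time_frames)
        return self.head(self.features(x).flatten(1)).squeeze(-1)

model = SNR90Regressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

spectrograms = torch.randn(8, 1, 64, 40)   # synthetic spectrogram batch
targets = torch.randn(8) * 5               # synthetic SNR_90 labels in dB

for _ in range(3):                         # a few toy optimisation steps
    optimizer.zero_grad()
    loss = criterion(model(spectrograms), targets)
    loss.backward()
    optimizer.step()
print("final MSE:", loss.item())
```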
22. Mega-COV: A Billion-Scale Dataset of 65 Languages For COVID-19 [PDF] 返回目录
Muhammad Abdul-Mageed, AbdelRahim Elmadany, Dinesh Pabbi, Kunal Verma, Rannie Lin
Abstract: We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19. The dataset is diverse (covers 234 countries), longitudinal (goes as back as 2007), multilingual (comes in 65 languages), and has a significant number of location-tagged tweets (~32M tweets). We release tweet IDs from the dataset, hoping it will be useful for studying various phenomena related to the ongoing pandemic and accelerating viable solutions to associated problems.
摘要:我们介绍Mega-COV,这是一个来自Twitter、规模达十亿级的数据集,用于研究COVID-19。该数据集覆盖面广(涉及234个国家和地区)、时间跨度长(最早可追溯到2007年)、语言多样(包含65种语言),并含有大量带位置标签的推文(约3200万条)。我们发布了数据集中的推文ID,希望它有助于研究与当前疫情相关的各类现象,并加速相关问题可行解决方案的出现。
注:中文为机器翻译结果!