
【arXiv Papers】 Computation and Language 2020-04-22

Contents

1. Energy-Based Models for Text [PDF] Abstract
2. Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [PDF] Abstract
3. Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [PDF] Abstract
4. Experience Grounds Language [PDF] Abstract
5. Unsupervised Opinion Summarization with Noising and Denoising [PDF] Abstract
6. Attention Module is Not Only a Weight: Analyzing Transformers with Vector Norms [PDF] Abstract
7. Curriculum Pre-training for End-to-End Speech Translation [PDF] Abstract
8. TD-GIN: Token-level Dynamic Graph-Interactive Network for Joint Multiple Intent Detection and Slot Filling [PDF] Abstract
9. Learning Relation Ties with a Force-Directed Graph in Distant Supervised Relation Extraction [PDF] Abstract
10. BERT-ATTACK: Adversarial Attack Against BERT Using BERT [PDF] Abstract
11. Learning to Encode Evolutionary Knowledge for Automatic Commenting Long Novels [PDF] Abstract
12. DIET: Lightweight Language Understanding for Dialogue Systems [PDF] Abstract
13. Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents [PDF] Abstract
14. Contextual Neural Machine Translation Improves Translation of Cataphoric Pronouns [PDF] Abstract
15. Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling [PDF] Abstract
16. Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions [PDF] Abstract
17. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [PDF] Abstract
18. Keyphrase Generation with Cross-Document Attention [PDF] Abstract
19. Neural Abstractive Summarization with Structural Attention [PDF] Abstract
20. Train No Evil: Selective Masking for Task-guided Pre-training [PDF] Abstract
21. Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness [PDF] Abstract
22. Word Embedding-based Text Processing for Comprehensive Summarization and Distinct Information Extraction [PDF] Abstract
23. The Panacea Threat Intelligence and Active Defense Platform [PDF] Abstract
24. An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com [PDF] Abstract
25. Grounding Conversations with Improvised Dialogues [PDF] Abstract
26. Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance [PDF] Abstract
27. Adaptive Interaction Fusion Networks for Fake News Detection [PDF] Abstract
28. Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation [PDF] Abstract
29. The Ivory Tower Lost: How College Students Respond Differently than the General Public to the COVID-19 Pandemic [PDF] Abstract
30. Discrete Variational Attention Models for Language Generation [PDF] Abstract

Abstracts

1. Energy-Based Models for Text [PDF] Back to Contents
  Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam
Abstract: Current large-scale auto-regressive language models display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be reliably distinguished from real text by statistical discriminators? We find experimentally that the answer is affirmative when we have access to the training data for the model, and guardedly affirmative even if we do not. This suggests that the auto-regressive models can be improved by incorporating the (globally normalized) discriminators into the generative process. We give a formalism for this using the Energy-Based Model framework, and show that it indeed improves the results of the generative models, measured both in terms of perplexity and in terms of human evaluation.
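
The abstract leaves the formalism implicit; one standard way to fold a globally normalized discriminator into an auto-regressive generator is a residual energy-based model. The formulation below is a hedged reconstruction along those lines, where E_theta is our assumed notation for the discriminator's energy and Z_theta the global normalizer:

```latex
% Sketch of a residual EBM over whole sequences x (notation ours, not the paper's):
% the auto-regressive LM p_LM is reweighted by a learned energy E_theta.
p_\theta(x) = \frac{p_{\mathrm{LM}}(x)\, \exp\!\left(-E_\theta(x)\right)}{Z_\theta},
\qquad
Z_\theta = \sum_{x'} p_{\mathrm{LM}}(x')\, \exp\!\left(-E_\theta(x')\right)
```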

2. Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [PDF] Back to Contents
  Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao
Abstract: Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs. However, it can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time. That is, research on multilingual UNMT has been limited. In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder, making use of multilingual data to improve UNMT for all language pairs. On the basis of the empirical findings, we propose two knowledge distillation methods to further enhance multilingual UNMT performance. Our experiments on a dataset with English translated to and from twelve other languages (including three language families and six language branches) show remarkable results, surpassing strong unsupervised individual baselines while achieving promising performance between non-English language pairs in zero-shot translation scenarios and alleviating poor performance in low-resource language pairs.

3. Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [PDF] Back to Contents
  Akari Asai, Hannaneh Hajishirzi
Abstract: Many natural language questions require qualitative, quantitative or logical comparisons between two entities or events. This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions by integrating logic rules and neural models. Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model. Improving the global consistency of predictions, our approach achieves large improvements over previous methods in a variety of question answering (QA) tasks including multiple-choice qualitative reasoning, cause-effect reasoning, and extractive machine reading comprehension. In particular, our method significantly improves the performance of RoBERTa-based models by 1-5% across datasets. We advance the state of the art by around 5-8% on WIQA and QuaRel and reduce consistency violations by 58% on HotpotQA. We further demonstrate that our approach can learn effectively from limited data.

4. Experience Grounds Language [PDF] Back to Contents
  Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian
Abstract: Successful linguistic communication relies on a shared experience of the world, and it is this shared experience that makes utterances meaningful. Despite the incredible effectiveness of language processing models trained on text alone, today's best systems still make mistakes that arise from a failure to relate language to the physical world it describes and to the social interactions it facilitates. Natural Language Processing is a diverse field, and progress throughout its development has come from new representational theories, modeling techniques, data collection paradigms, and tasks. We posit that the present success of representation learning approaches trained on large text corpora can be deeply enriched from the parallel tradition of research on the contextual and social nature of language. In this article, we consider work on the contextual foundations of language: grounding, embodiment, and social interaction. We describe a brief history and possible progression of how contextual information can factor into our representations, with an eye towards how this integration can move the field forward and where it is currently being pioneered. We believe this framing will serve as a roadmap for truly contextual language understanding.

5. Unsupervised Opinion Summarization with Noising and Denoising [PDF] Back to Contents
  Reinald Kim Amplayo, Mirella Lapata
Abstract: The supervised training of high-capacity models on large datasets containing hundreds of thousands of document-summary pairs is critical to the recent success of deep learning techniques for abstractive summarization. Unfortunately, in most domains (other than news) such training data is not available and cannot be easily sourced. In this paper we enable the use of supervised learning for the setting where there are only documents available (e.g., product or business reviews) without ground truth summaries. We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof which we treat as pseudo-review input. We introduce several linguistically motivated noise generation functions and a summarization model which learns to denoise the input and generate the original review. At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise. Extensive automatic and human evaluation shows that our model brings substantial improvements over both abstractive and extractive baselines.
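
The paper's noise generation functions are linguistically motivated; the sketch below only illustrates the generic shape of such a function (token dropping and substitution), with the function name, rates, and vocabulary argument being our own illustrative choices:

```python
import random

def noisy_pseudo_review(tokens, vocab, drop_rate=0.1, sub_rate=0.1):
    """Corrupt a sampled review (the pseudo-summary) into a pseudo-review
    input by randomly dropping and substituting tokens, so the model learns
    to denoise it back into the original review. Illustrative only; not the
    paper's exact noise functions."""
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < drop_rate:
            continue                             # drop this token
        elif r < drop_rate + sub_rate:
            noisy.append(random.choice(vocab))   # substitute a random token
        else:
            noisy.append(tok)                    # keep the token
    return noisy
```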

6. Attention Module is Not Only a Weight: Analyzing Transformers with Vector Norms [PDF] Back to Contents
  Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
Abstract: Because attention modules are core components of Transformer-based models that have recently achieved considerable success in natural language processing, the community has a great deal of interest in why attention modules are successful and what kind of linguistic information they capture. In particular, previous studies have mainly analyzed attention weights to see how much information the attention modules gather from each input to produce an output. In this study, we point out that attention weights alone are only one of the two factors determining the output of self-attention modules, and we propose to incorporate the other factor as well, namely, the transformed input vectors into the analysis. That is, we measure the norm of the weighted vectors as the contribution of each input to an output. Our analysis of self-attention modules in BERT and the Transformer-based neural machine translation system shows that the attention modules behave very intuitively, contrary to previous findings. That is, our analysis reveals that (1) BERT's attention modules do not pay so much attention to special tokens, and (2) Transformer's attention modules capture word alignment quite well.
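
Concretely, the proposal replaces the raw weight alpha_ij with the norm of the attention-weighted, transformed value vector. The sketch below is our reading of that measurement for one multi-head self-attention layer; the tensor shapes and the summation over heads are assumptions, not the authors' exact code:

```python
import torch

def attention_contributions(alpha, value, w_o):
    """Norm-based contribution of input token j to output position i:
    ||sum_h alpha_h[i, j] * f_h(x_j)||, i.e., analyzing the weighted,
    transformed value vectors rather than the raw attention weights.

    alpha: (heads, seq, seq) attention weights
    value: (heads, seq, d_head) per-head value vectors
    w_o:   (heads * d_head, d_model) output projection of the layer
    """
    heads, seq, d_head = value.shape
    w_o = w_o.view(heads, d_head, -1)                       # per-head slice of W_O
    transformed = torch.einsum("hjd,hde->hje", value, w_o)  # f(x_j): (heads, seq, d_model)
    # Weight each transformed vector by its attention weight and sum over heads.
    weighted = alpha.unsqueeze(-1) * transformed.unsqueeze(1)  # (heads, seq_i, seq_j, d_model)
    contrib = weighted.sum(dim=0)                              # (seq_i, seq_j, d_model)
    return contrib.norm(dim=-1)                                # (seq_i, seq_j) contribution norms
```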

7. Curriculum Pre-training for End-to-End Speech Translation [PDF] Back to Contents
  Chengyi Wang, Yu Wu, Shujie Liu, Ming Zhou, Zhenglu Yang
Abstract: End-to-end speech translation poses a heavy burden on the encoder, because it has to transcribe, understand, and learn cross-lingual semantics simultaneously. To obtain a powerful encoder, traditional methods pre-train it on ASR data to capture speech features. However, we argue that pre-training the encoder only through simple speech recognition is not enough and high-level linguistic knowledge should be considered. Inspired by this, we propose a curriculum pre-training method that includes an elementary course for transcription learning and two advanced courses for understanding the utterance and mapping words in two languages. The difficulty of these courses is gradually increasing. Experiments show that our curriculum pre-training method leads to significant improvements on En-De and En-Fr speech translation benchmarks.

8. TD-GIN: Token-level Dynamic Graph-Interactive Network for Joint Multiple Intent Detection and Slot Filling [PDF] Back to Contents
  Libo Qin, Xiao Xu, Wanxiang Che, Ting Liu
Abstract: Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system. Currently, most work on SLU has focused on the single-intent scenario and paid less attention to the multi-intent scenario, which commonly exists in real-world settings. In addition, multi-intent SLU faces a unique challenge: how to effectively incorporate information from multiple intents to guide slot prediction. In this paper, we propose a Token-level Dynamic Graph-Interactive Network (TD-GIN) for joint multiple intent detection and slot filling, where we model the interaction between multiple intents and each token slot in a unified graph architecture. With the graph interaction mechanism, our framework can automatically extract relevant intent information to guide each token's slot prediction, achieving fine-grained intent information integration for token-level slot prediction. Experiments on two multi-intent datasets show that our model achieves state-of-the-art performance and outperforms other previous methods by a large margin. Comprehensive analysis empirically shows that our framework successfully captures multiple relevant intents to improve SLU performance.

9. Learning Relation Ties with a Force-Directed Graph in Distant Supervised Relation Extraction [PDF] Back to Contents
  Yuming Shang, Heyan Huang, Xin Sun, Xianling Mao
Abstract: Relation ties, defined as the correlation and mutual exclusion between different relations, are critical for distant supervised relation extraction. Existing approaches model this property by greedily learning local dependencies. However, they are essentially limited by failing to capture the global topology structure of relation ties. As a result, they may easily fall into a locally optimal solution. To solve this problem, in this paper, we propose a novel force-directed graph based relation extraction model to comprehensively learn relation ties. Specifically, we first build a graph according to the global co-occurrence of relations. Then, we borrow the idea of Coulomb's Law from physics and introduce the concept of attractive force and repulsive force to this graph to learn correlation and mutual exclusion between relations. Finally, the obtained relation representations are applied as an inter-dependent relation classifier. Experimental results on a large scale benchmark dataset demonstrate that our model is capable of modeling global relation ties and significantly outperforms other baselines. Furthermore, the proposed force-directed graph can be used as a module to augment existing relation extraction systems and improve their performance.
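
A minimal reading of the Coulomb analogy: embeddings of correlated relations attract each other while mutually exclusive ones repel, with the force magnitude falling off with distance. The toy update step below is our illustration under that reading, not the paper's exact rule:

```python
import numpy as np

def force_step(emb, ties, lr=0.01, eps=1e-8):
    """One toy force-directed update on relation embeddings.

    emb:  (n, d) relation embeddings
    ties: (n, n) tie scores, positive for correlated relation pairs
          (attraction), negative for mutually exclusive ones (repulsion)
    """
    n = emb.shape[0]
    forces = np.zeros_like(emb)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = emb[j] - emb[i]
            dist2 = float(diff @ diff) + eps
            # Coulomb-style magnitude: sign from the tie, 1/r^2 falloff.
            forces[i] += ties[i, j] * diff / dist2
    return emb + lr * forces
```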

10. BERT-ATTACK: Adversarial Attack Against BERT Using BERT [PDF] Back to Contents
  Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu
Abstract: Adversarial attacks on discrete data (such as text) have proved significantly more challenging than on continuous data (such as images), since it is difficult to generate adversarial samples with gradient-based methods. Currently, successful attack methods for text usually adopt heuristic replacement strategies at the character or word level, and it remains challenging to find the optimal solution in the massive space of possible replacement combinations while preserving semantic consistency and language fluency. In this paper, we propose BERT-Attack, a high-quality and effective method to generate adversarial samples using pre-trained masked language models exemplified by BERT. We turn BERT against its fine-tuned models and other deep neural models for downstream tasks. Our method successfully misleads the target models into predicting incorrectly, outperforming state-of-the-art attack strategies in both success rate and perturbation percentage, while the generated adversarial samples are fluent and semantically preserved. Also, the cost of calculation is low, making large-scale generation possible.
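
A typical first stage of such an attack ranks words by how much masking them hurts the victim model's confidence, and then queries the masked LM for replacements at the top-ranked positions. The helper below sketches only the ranking stage; the `classify` callable and its per-label output format are hypothetical:

```python
def rank_words_by_importance(classify, words, label, mask_token="[MASK]"):
    """Score each word by the confidence drop it causes when masked out, and
    return word indices sorted from most to least important.

    classify: hypothetical callable mapping a sentence string to per-label
    probabilities for the victim model."""
    base = classify(" ".join(words))[label]
    drops = []
    for i in range(len(words)):
        masked = words[:i] + [mask_token] + words[i + 1:]
        drops.append(base - classify(" ".join(masked))[label])
    return sorted(range(len(words)), key=lambda i: -drops[i])
```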

11. Learning to Encode Evolutionary Knowledge for Automatic Commenting Long Novels [PDF] Back to Contents
  Canxiang Yan, Jianhao Yan, Yangyin Xu, Cheng Niu, Jie Zhou
Abstract: Static knowledge graphs have been incorporated extensively into sequence-to-sequence frameworks for text generation. While effectively representing structured context, a static knowledge graph fails to represent knowledge evolution, which is required when modeling dynamic events. In this paper, an automatic commenting task is proposed for long novels, which involves understanding contexts of more than tens of thousands of words. To model the dynamic storyline, especially the transitions of the characters and their relations, an Evolutionary Knowledge Graph (EKG) is proposed and learned within a multi-task framework. Given a specific passage to comment on, sequential modeling is used to incorporate historical and future embeddings for context representation. Further, a graph-to-sequence model is designed to utilize the EKG for comment generation. Extensive experimental results show that our EKG-based method is superior to several strong baselines on both automatic and human evaluations.

12. DIET: Lightweight Language Understanding for Dialogue Systems [PDF] Back to Contents
  Tanja Bunk, Daksh Varshneya, Vladimir Vlasov, Alan Nichol
Abstract: Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction, two common dialogue language understanding tasks. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on other simpler datasets. Surprisingly, we show that there is no clear benefit to using large pre-trained models for this task, and in fact DIET improves upon the current state of the art even in a purely supervised setup without any pre-trained embeddings. Our best performing model outperforms fine-tuning BERT and is about six times faster to train.

13. Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents [PDF] Back to Contents
  Daoyuan Chen, Yaliang Li, Kai Lei, Ying Shen
Abstract: Distant supervision based methods for entity and relation extraction have gained increasing popularity because they require light human annotation effort. In this paper, we consider the problem of shifted label distribution, which is caused by the inconsistency between the noisy-labeled training set derived from an external knowledge graph and the human-annotated test set, and is exacerbated by the pipelined entity-then-relation extraction manner with noise propagation. We propose a joint extraction approach to address this problem by re-labeling noisy instances with a group of cooperative multiagents. To handle noisy instances in a fine-grained manner, each agent in the cooperative group evaluates an instance by calculating a continuous confidence score from its own perspective; to leverage the correlations between the two extraction tasks, a confidence consensus module is designed to gather the wisdom of all agents and re-distribute the noisy training set with confidence-scored labels. Further, the confidences are used to adjust the training losses of the extractors. Experimental results on two real-world datasets verify the benefits of re-labeling noisy instances and show that the proposed model significantly outperforms state-of-the-art entity and relation extraction methods.

14. Contextual Neural Machine Translation Improves Translation of Cataphoric Pronouns [PDF] Back to Contents
  KayYen Wong, Sameen Maruf, Gholamreza Haffari
Abstract: The advent of context-aware NMT has resulted in promising improvements in the overall translation quality and specifically in the translation of discourse phenomena such as pronouns. Previous works have mainly focused on the use of past sentences as context with a focus on anaphora translation. In this work, we investigate the effect of future sentences as context by comparing the performance of a contextual NMT model trained with the future context to the one trained with the past context. Our experiments and evaluation, using generic and pronoun-focused automatic metrics, show that the use of future context not only achieves significant improvements over the context-agnostic Transformer, but also demonstrates comparable and in some cases improved performance over its counterpart trained on past context. We also perform an evaluation on a targeted cataphora test suite and report significant gains over the context-agnostic Transformer in terms of BLEU.

15. Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling [PDF] Back to Contents
  David Harbecke, Christoph Alt
Abstract: Recently, state-of-the-art NLP models have gained an increasing syntactic and semantic understanding of language, and explanation methods are crucial to understanding their decisions. Occlusion is a well-established method that provides explanations on discrete language data, e.g., by removing a language unit from an input and measuring the impact on a model's decision. We argue that current occlusion-based methods often produce invalid or syntactically incorrect language data, neglecting the improved abilities of recent NLP models. Furthermore, gradient-based explanation methods disregard the discrete distribution of data in NLP. Thus, we propose OLM: a novel explanation method that combines occlusion and language models to sample valid and syntactically correct replacements with high likelihood, given the context of the original input. We lay out a theoretical foundation that alleviates these weaknesses of other explanation methods in NLP and provide results that underline the importance of considering data likelihood in occlusion-based explanation.
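
As a rough illustration of the OLM idea, the sketch below occludes one word, asks a masked language model for likely in-context replacements, and averages the classifier's prediction over them; the use of the Hugging Face fill-mask pipeline and the `classify` callable are our assumptions for illustration, not the authors' implementation:

```python
from transformers import pipeline

# Masked LM used to propose in-context replacements (model choice is ours).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def olm_relevance(classify, words, position, target_label, top_k=10):
    """Relevance of words[position]: the original prediction minus the
    expected prediction under LM-likely, syntactically valid replacements.

    classify: hypothetical callable mapping a sentence string to per-label
    probabilities."""
    original = classify(" ".join(words))[target_label]
    masked = words[:position] + [fill_mask.tokenizer.mask_token] + words[position + 1:]
    candidates = fill_mask(" ".join(masked), top_k=top_k)
    total = sum(c["score"] for c in candidates)
    # "sequence" is the input with the mask filled by the candidate token.
    expected = sum(c["score"] * classify(c["sequence"])[target_label]
                   for c in candidates) / total
    return original - expected
```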

16. Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions [PDF] Back to Contents
  Siyu Ren, Kenny Q. Zhu
Abstract: In this paper, we propose a novel configurable framework to automatically generate distractive choices for open-domain cloze-style multiple-choice questions, which incorporates a general-purpose knowledge base to effectively create a small distractor candidate set, and a feature-rich learning-to-rank model to select distractors that are both plausible and reliable. Experimental results on datasets across four domains show that our framework yields distractors that are more plausible and reliable than previous methods. This dataset can also be used as a benchmark for distractor generation in the future.

17. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [PDF] Back to Contents
  Nils Reimers, Iryna Gurevych
Abstract: We present an easy and efficient method to extend existing sentence embedding models to new languages. This allows the creation of multilingual versions of previously monolingual models. The training is based on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence. We use the original (monolingual) model to generate sentence embeddings for the source language and then train a new system on translated sentences to mimic the original model. Compared to other methods for training multilingual sentence embeddings, this approach has several advantages: it is easy to extend existing models to new languages with relatively few samples, it is easier to ensure desired properties for the vector space, and the hardware requirements for training are lower. We demonstrate the effectiveness of our approach for 10 languages from various language families. Code to extend sentence embedding models to more than 400 languages is publicly available.
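
Our reading of the training objective in one function: a frozen monolingual teacher embeds the source sentence, and a multilingual student is trained so that both the source sentence and its translation land on that teacher embedding. The `teacher` and `student` callables are hypothetical stand-ins for sentence-embedding models:

```python
import torch
import torch.nn.functional as F

def multilingual_distillation_loss(teacher, student, src_batch, tgt_batch):
    """Sketch of the distillation objective as we read it from the abstract.

    teacher, student: hypothetical callables mapping a batch of sentences
    to embedding tensors of the same dimensionality.
    src_batch: source-language sentences; tgt_batch: their translations."""
    with torch.no_grad():
        target = teacher(src_batch)                    # frozen monolingual teacher
    loss_src = F.mse_loss(student(src_batch), target)  # mimic teacher on source language
    loss_tgt = F.mse_loss(student(tgt_batch), target)  # map translations to the same point
    return loss_src + loss_tgt
```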

18. Keyphrase Generation with Cross-Document Attention [PDF] Back to Contents
  Shizhe Diao, Yan Song, Tong Zhang
Abstract: Keyphrase generation aims to produce a set of phrases summarizing the essentials of a given document. Conventional methods normally apply an encoder-decoder architecture to generate the output keyphrases for an input document, where they are designed to focus on each current document, so they inevitably omit crucial corpus-level information carried by other similar documents, i.e., cross-document dependency and latent topics. In this paper, we propose CDKGen, a Transformer-based keyphrase generator, which extends the Transformer's attention globally with cross-document attention networks that incorporate available documents as references, so as to generate better keyphrases with the guidance of topic information. On top of the proposed Transformer + cross-document attention architecture, we also adopt a copy mechanism to enhance our model by selecting appropriate words from documents to deal with out-of-vocabulary words in keyphrases. Experiment results on five benchmark datasets illustrate the validity and effectiveness of our model, which achieves state-of-the-art performance on all datasets. Further analyses confirm that the proposed model is able to generate keyphrases consistent with references while keeping sufficient diversity. The code of CDKGen is available at this https URL.

19. Neural Abstractive Summarization with Structural Attention [PDF] Back to Contents
  Tanya Chowdhury, Sachin Kumar, Tanmoy Chakraborty
Abstract: Attentional, RNN-based encoder-decoder architectures have achieved impressive performance on abstractive summarization of news articles. However, these methods fail to account for long term dependencies within the sentences of a document. This problem is exacerbated in multi-document summarization tasks such as summarizing the popular opinion in threads present in community question answering (CQA) websites such as Yahoo! Answers and Quora. These threads contain answers which often overlap or contradict each other. In this work, we present a hierarchical encoder based on structural attention to model such inter-sentence and inter-document dependencies. We set the popular pointer-generator architecture and some of the architectures derived from it as our baselines and show that they fail to generate good summaries in a multi-document setting. We further illustrate that our proposed model achieves significant improvement over the baselines in both single and multi-document summarization settings -- in the former setting, it beats the best baseline by 1.31 and 7.8 ROUGE-1 points on CNN and CQA datasets, respectively; in the latter setting, the performance is further improved by 1.6 ROUGE-1 points on the CQA dataset.

20. Train No Evil: Selective Masking for Task-guided Pre-training [PDF] Back to Contents
  Yuxian Gu, Zhengyan Zhang, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun
Abstract: Recently, pre-trained language models have mostly followed the pre-training-then-fine-tuning paradigm and achieved great performance on various downstream tasks. However, due to the aimlessness of pre-training and the small scale of in-domain supervised data for fine-tuning, two-stage models typically cannot capture domain-specific and task-specific language patterns well. In this paper, we propose a selective masking task-guided pre-training method and add it between the general pre-training and fine-tuning stages. In this stage, we train the masked language modeling task on in-domain unsupervised data, which enables our model to effectively learn domain-specific language patterns. To efficiently learn task-specific language patterns, we adopt a selective masking strategy instead of conventional random masking, meaning we only mask the tokens that are important to the downstream task. Specifically, we define the importance of tokens as their impact on the final classification results and use a neural model to learn the implicit selection rules. Experimental results on two sentiment analysis tasks show that our method can achieve comparable or even better performance with less than 50% of the overall computation cost, which indicates our method is both effective and efficient. The source code will be released in the future.
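
Our interpretation of the selective strategy in one helper: given per-token importance scores (e.g., their measured impact on the final classification), mask the top-scoring positions instead of uniform-random ones; names and the masking rate are illustrative:

```python
def selective_mask(tokens, importance, mask_token="[MASK]", rate=0.15):
    """Mask the tokens deemed most important to the downstream task instead
    of sampling masked positions uniformly at random (our interpretation,
    not the authors' code).

    importance: per-token scores, e.g., each token's measured impact on the
    final classification result."""
    k = max(1, int(rate * len(tokens)))
    top = set(sorted(range(len(tokens)), key=lambda i: -importance[i])[:k])
    return [mask_token if i in top else tok for i, tok in enumerate(tokens)]
```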

21. Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness [PDF] Back to Contents
  Zheng Zhang, Lizi Liao, Xiaoyan Zhu, Tat-Seng Chua, Zitao Liu, Yan Huang, Minlie Huang
Abstract: Most existing approaches for goal-oriented dialogue policy learning use reinforcement learning, which focuses on the target agent's policy and simply treats the opposite agent's policy as part of the environment. In real-world scenarios, however, the behavior of an opposite agent often exhibits certain patterns or is governed by hidden policies, which can be inferred and utilized by the target agent to facilitate its own decision making. This strategy is common in human mental simulation: first imagining a specific action and its probable results before actually acting. We therefore propose an opposite-behavior-aware framework for policy learning in goal-oriented dialogues. We estimate the opposite agent's policy from its behavior and use this estimation to improve the target agent by regarding it as part of the target policy. We evaluate our model on both cooperative and competitive dialogue tasks, showing superior performance over state-of-the-art baselines.

22. Word Embedding-based Text Processing for Comprehensive Summarization and Distinct Information Extraction [PDF] Back to Contents
  Xiangpeng Wan, Hakim Ghazzai, Yehia Massoud
Abstract: In this paper, we propose two automated text processing frameworks specifically designed to analyze online reviews. The objective of the first framework is to summarize a reviews dataset by extracting essential sentences. This is performed by converting sentences into numerical vectors and clustering them, based on their similarity levels, using a community detection algorithm. Afterwards, a correlation score is measured for each sentence to determine its importance level in each cluster and to assign it as a tag for that community. The second framework is based on a question-answering neural network model trained to extract answers to multiple different questions. The collected answers are effectively clustered to find multiple distinct answers to a single question that might be asked by a customer. The proposed frameworks are shown to be more comprehensive than existing review processing solutions.

23. The Panacea Threat Intelligence and Active Defense Platform [PDF] Back to Contents
  Adam Dalton, Ehsan Aghaei, Ehab Al-Shaer, Archna Bhatia, Esteban Castillo, Zhuo Cheng, Sreekar Dhaduvai, Qi Duan, Md Mazharul Islam, Younes Karimi, Amir Masoumzadeh, Brodie Mather, Sashank Santhanam, Samira Shaikh, Tomek Strzalkowski, Bonnie J. Dorr
Abstract: We describe Panacea, a system that supports natural language processing (NLP) components for active defenses against social engineering attacks. We deploy a pipeline of human language technology, including Ask and Framing Detection, Named Entity Recognition, Dialogue Engineering, and Stylometry. Panacea processes modern message formats through a plug-in architecture to accommodate innovative approaches for message analysis, knowledge representation, and dialogue generation. The novelty of the Panacea system is that it uses NLP for cyber defense and engages the attacker using bots to elicit evidence for attributing the attack and to waste the attacker's time and resources.

24. An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com [PDF] Back to Contents
  Shadi Shahsavari, Ehsan Ebrahimzadeh, Behnam Shahbazi, Misagh Falahi, Pavan Holur, Roja Bandari, Timothy R. Tangherlini, Vwani Roychowdhury
Abstract: Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally include only a subset of characters and their relationships, thus offering a limited perspective on that work. Yet in aggregate, these reviews capture an underlying narrative framework comprised of different actants (people, places, things), their roles, and interactions that we label the "consensus narrative framework". We represent this framework in the form of an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as a latent graphical model estimation problem. Posts and reviews are viewed as samples of sub graphs/networks of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative Machine Learning (ML) model where nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on this http URL. We manually derive the ground truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in ground truth networks with our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2900 reviews per novel, we report average coverage/recall of important relationships of over 80% and an average edge detection rate of over 89%. These extracted narrative frameworks can generate insight into how people (or classes of people) read and how they recount what they have read to others.

25. Grounding Conversations with Improvised Dialogues [PDF] Back to Contents
  Hyundong Cho, Jonathan May
Abstract: Effective dialogue involves grounding, the process of establishing mutual knowledge that is essential for communication between people. Modern dialogue systems are not explicitly trained to build common ground, and therefore overlook this important aspect of communication. Improvisational theater (improv) intrinsically contains a high proportion of dialogue focused on building common ground, and makes use of the yes-and principle, a strong grounding speech act, to establish coherence and an actionable objective reality. We collect a corpus of more than 26,000 yes-and turns, transcribing them from improv dialogues and extracting them from larger, but more sparsely populated movie script dialogue corpora, via a bootstrapped classifier. We fine-tune chit-chat dialogue systems with our corpus to encourage more grounded, relevant conversation and confirm these findings with human evaluations.

26. Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance [PDF] Back to Contents
  Bhawani Selvaretnam, Mohammed Belkhatir
Abstract: The search for information in large text repositories has been plagued by the so-called document-query vocabulary gap, i.e., the semantic discordance between the contents of stored document entities on the one hand and human queries on the other. Over the past two decades, a significant body of work has advanced technical retrieval prowess, while several studies have shed light on issues pertaining to human search behavior. We believe that these efforts should be conjoined, in the sense that automated retrieval systems have to fully emulate human search behavior and thus consider the procedure by which users incrementally enhance their initial query. To this end, cognitive reformulation patterns that mimic user search behaviour are highlighted, and enhancement terms that are statistically collocated with, or lexical-semantically related to, the original terms are adopted in the retrieval process. We formalize the application of these patterns by considering a conceptual representation of the query and introducing a set of operations that allow modifications of the initial query. A genetic algorithm-based weighting process places emphasis on terms according to their conceptual role-type. An experimental evaluation on real-world datasets against relevance, language, conceptual, and knowledge-based models is conducted. We also show better performance in terms of mean average precision than a word embedding-based model instantiation when compared to language and relevance models.

27. Adaptive Interaction Fusion Networks for Fake News Detection [PDF] Back to Contents
  Lianwei Wu, Yuan Rao
Abstract: The majority of existing methods for fake news detection focus on learning and fusing various features for detection. However, the learning of various features is independent, which leads to a lack of cross-interaction fusion between features on social media, especially between posts and comments. Generally, in fake news, there are emotional associations and semantic conflicts between posts and comments. How to represent and fuse the cross-interaction between the two is a key challenge. In this paper, we propose Adaptive Interaction Fusion Networks (AIFN) to achieve cross-interaction fusion among features for fake news detection. In AIFN, to discover semantic conflicts, we design gated adaptive interaction networks (GAIN) to capture adaptively similar semantics and conflicting semantics between posts and comments. To establish feature associations, we devise semantic-level fusion self-attention networks (SFSN) to enhance semantic correlations and fusion among features. Extensive experiments on two real-world datasets, i.e., RumourEval and PHEME, demonstrate that AIFN achieves state-of-the-art performance and boosts accuracy by more than 2.05% and 1.90%, respectively.

28. Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation [PDF] Back to Contents
  Feng Lu, Anca Dumitrache, David Graus
Abstract: With the uptake of algorithmic personalization in the news domain, news organizations increasingly trust automated systems with previously considered editorial responsibilities, e.g., prioritizing news to readers. In this paper we study an automated news recommender system in the context of a news organization's editorial values. We conduct and present two online studies with a news recommender system, which span one and a half months and involve over 1,200 users. In our first study we explore how our news recommender steers reading behavior in the context of editorial values such as serendipity, dynamism, diversity, and coverage. Next, we present an intervention study where we extend our news recommender to steer our readers to more dynamic reading behavior. We find that (i) our recommender system yields more diverse reading behavior and yields a higher coverage of articles compared to non-personalized editorial rankings, and (ii) we can successfully incorporate dynamism in our recommender system as a re-ranking method, effectively steering our readers to more dynamic articles without hurting our recommender system's accuracy.

29. The Ivory Tower Lost: How College Students Respond Differently than the General Public to the COVID-19 Pandemic [PDF] Back to Contents
  Viet Duong, Phu Pham, Tongyu Yang, Yu Wang, Jiebo Luo
Abstract: Recently, the pandemic of the novel Coronavirus Disease-2019 (COVID-19) has presented governments with ultimate challenges. In the United States, the country with the highest confirmed COVID-19 infection cases, a nationwide social distancing protocol has been implemented by the President. For the first time in a hundred years since the 1918 flu pandemic, the US population is mandated to stay in their households and avoid public contact. As a result, the majority of public venues and services have ceased their operations. Following the closure of the University of Washington on March 7th, more than a thousand colleges and universities in the United States have cancelled in-person classes and campus activities, impacting millions of students. This paper aims to discover the social implications of this unprecedented disruption in our interactive society regarding both the general public and higher education populations by mining people's opinions on social media. We discover several topics embedded in a large number of COVID-19 tweets that represent the most central issues related to the pandemic, which are of great concerns for both college students and the general public. Moreover, we find significant differences between these two groups of Twitter users with respect to the sentiments they expressed towards the COVID-19 issues. To our best knowledge, this is the first social media-based study which focuses on the college student community's demographics and responses to prevalent social issues during a major crisis.

30. Discrete Variational Attention Models for Language Generation [PDF] Back to Contents
  Xianghong Fang, Haoli Bai, Zenglin Xu, Michael Lyu, Irwin King
Abstract: Variational autoencoders have been widely applied for natural language generation, however, there are two long-standing problems: information under-representation and posterior collapse. The former arises from the fact that only the last hidden state from the encoder is transformed to the latent space, which is insufficient to summarize data. The latter comes as a result of the imbalanced scale between the reconstruction loss and the KL divergence in the objective function. To tackle these issues, in this paper we propose the discrete variational attention model with categorical distribution over the attention mechanism owing to the discrete nature in languages. Our approach is combined with an auto-regressive prior to capture the sequential dependency from observations, which can enhance the latent space for language generation. Moreover, thanks to the property of discreteness, the training of our proposed approach does not suffer from posterior collapse. Furthermore, we carefully analyze the superiority of discrete latent space over the continuous space with the common Gaussian distribution. Extensive experiments on language generation demonstrate superior advantages of our proposed approach in comparison with the state-of-the-art counterparts.
