
[arXiv papers] Computation and Language 2020-02-11

Contents

1. End-to-End Multi-speaker Speech Recognition with Transformer [PDF] Abstract
2. A Probabilistic Formulation of Unsupervised Text Style Transfer [PDF] Abstract
3. A Study of Human Summaries of Scientific Articles [PDF] Abstract
4. What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process [PDF] Abstract
5. Multilingual Alignment of Contextual Word Representations [PDF] Abstract
6. Limits of Detecting Text Generated by Large-Scale Language Models [PDF] Abstract
7. Abstractive Summarization for Low Resource Data using Domain Transfer and Data Synthesis [PDF] Abstract
8. Attend to the beginning: A study on using bidirectional attention for extractive summarization [PDF] Abstract
9. Short Text Classification via Knowledge powered Attention with Similarity Matrix based CNN [PDF] Abstract
10. Rough Set based Aggregate Rank Measure & its Application to Supervised Multi Document Summarization [PDF] Abstract
11. Mining Commonsense Facts from the Physical World [PDF] Abstract
12. HHH: An Online Medical Chatbot System based on Knowledge Graph and Hierarchical Bi-Directional Attention [PDF] Abstract
13. LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention [PDF] Abstract
14. Blank Language Models [PDF] Abstract
15. Description Based Text Classification with Reinforcement Learning [PDF] Abstract
16. autoNLP: NLP Feature Recommendations for Text Analytics Applications [PDF] Abstract
17. Snippext: Semi-supervised Opinion Mining with Augmented Data [PDF] Abstract
18. Pre-training Tasks for Embedding-based Large-scale Retrieval [PDF] Abstract
19. A Novel Kuhnian Ontology for Epistemic Classification of STM Scholarly Articles [PDF] Abstract
20. SPA: Verbal Interactions between Agents and Avatars in Shared Virtual Environments using Propositional Planning [PDF] Abstract
21. Time-aware Large Kernel Convolutions [PDF] Abstract

Abstracts

1. End-to-End Multi-speaker Speech Recognition with Transformer [PDF] Back to contents
  Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
Abstract: Recently, fully recurrent neural network (RNN) based end-to-end models have been proven to be effective for multi-speaker speech recognition in both the single-channel and multi-channel scenarios. In this work, we explore the use of Transformer models for these tasks by focusing on two aspects. First, we replace the RNN-based encoder-decoder in the speech recognition model with a Transformer architecture. Second, in order to use the Transformer in the masking network of the neural beamformer in the multi-channel case, we modify the self-attention component to be restricted to a segment rather than the whole sequence in order to reduce computation. Besides the model architecture improvements, we also incorporate an external dereverberation preprocessing, the weighted prediction error (WPE), enabling our model to handle reverberated signals. Experiments on the spatialized wsj1-2mix corpus show that the Transformer-based models achieve 40.9% and 25.6% relative WER reduction, down to 12.1% and 6.4% WER, under the anechoic condition in single-channel and multi-channel tasks, respectively, while in the reverberant case, our methods achieve 41.5% and 13.8% relative WER reduction, down to 16.5% and 15.2% WER.
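As a rough illustration of the segment-restricted self-attention described above, here is a toy NumPy sketch under my own assumptions (the function and parameter names are hypothetical, and a real implementation would compute only the in-window scores rather than masking a full score matrix):

    import numpy as np

    def segment_restricted_attention(q, k, v, segment=16):
        # Toy single-head self-attention in which each frame may only attend to
        # keys within +/- `segment` positions, standing in for restricting
        # self-attention to a segment rather than the whole sequence.
        n, d = q.shape
        scores = q @ k.T / np.sqrt(d)                         # (n, n) scaled dot products
        offsets = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
        scores = np.where(offsets <= segment, scores, -1e9)   # mask out-of-segment keys
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)        # softmax over allowed keys
        return weights @ v                                    # (n, d) attended values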

2. A Probabilistic Formulation of Unsupervised Text Style Transfer [PDF] Back to contents
  Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick
Abstract: We present a deep generative model for unsupervised text style transfer that unifies previously proposed non-generative techniques. Our probabilistic approach models non-parallel data from two domains as a partially observed parallel corpus. By hypothesizing a parallel latent sequence that generates each observed sequence, our model learns to transform sequences from one domain to another in a completely unsupervised fashion. In contrast with traditional generative sequence models (e.g. the HMM), our model makes few assumptions about the data it generates: it uses a recurrent language model as a prior and an encoder-decoder as a transduction distribution. While computation of marginal data likelihood is intractable in this model class, we show that amortized variational inference admits a practical surrogate. Further, by drawing connections between our variational objective and other recent unsupervised style transfer and machine translation techniques, we show how our probabilistic view can unify some known non-generative objectives such as backtranslation and adversarial loss. Finally, we demonstrate the effectiveness of our method on a wide range of unsupervised style transfer tasks, including sentiment transfer, formality transfer, word decipherment, author imitation, and related language translation. Across all style transfer tasks, our approach yields substantial gains over state-of-the-art non-generative baselines, including the state-of-the-art unsupervised machine translation techniques that our approach generalizes. Further, we conduct experiments on a standard unsupervised machine translation task and find that our unified approach matches the current state-of-the-art.
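As a hedged reading of the probabilistic formulation (schematic notation of mine, not the paper's), the amortized variational objective for an observed sequence $x$ in one domain with an unobserved parallel sequence $y$ in the other domain takes the familiar ELBO form

    \log p(x) \;\ge\; \mathbb{E}_{q_\phi(y \mid x)}\big[\log p_\theta(x \mid y)\big] \;-\; \mathrm{KL}\big(q_\phi(y \mid x)\,\|\,p_{\mathrm{LM}}(y)\big),

where $p_{\mathrm{LM}}$ is the recurrent language-model prior over the other domain, $p_\theta(x \mid y)$ is the encoder-decoder transduction distribution, and $q_\phi(y \mid x)$ is the amortized inference network; how the two domains are tied together and how the bound is optimized are detailed in the paper.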

3. A Study of Human Summaries of Scientific Articles [PDF] Back to contents
  Odellia Boni, Guy Feigenblat, Doron Cohen, Haggai Roitman, David Konopnicki
Abstract: Researchers and students face an explosion of newly published papers which may be relevant to their work. This has led to a trend of sharing human-written summaries of scientific papers. We analyze the summaries shared on one of these platforms (this http URL). The goal is to characterize human summaries of scientific papers, and to use some of the insights obtained to improve and adapt existing automatic summarization systems to the domain of scientific papers.

4. What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process [PDF] Back to contents
  Jichuan Zeng, Jing Li, Yulan He, Cuiyun Gao, Michael R. Lyu, Irwin King
Abstract: In our world full of uncertainty, debates and argumentation contribute to the progress of science and society. Despite the increasing attention to characterizing human arguments, most progress made so far focuses on the debate outcome, largely ignoring the dynamic patterns in argumentation processes. This paper presents a study that automatically analyzes the key factors in argument persuasiveness, beyond simply predicting who will persuade whom. Specifically, we propose a novel neural model that is able to dynamically track the changes of latent topics and discourse in argumentative conversations, allowing the investigation of their roles in influencing the outcomes of persuasion. Extensive experiments have been conducted on argumentative conversations from both social media and the Supreme Court. The results show that our model outperforms state-of-the-art models in identifying persuasive arguments by explicitly exploring dynamic factors of topic and discourse. We further analyze the effects of topics and discourse on persuasiveness, and find that both are useful: topics provide concrete evidence, while superior discourse styles may bias participants, especially in social media arguments. In addition, we draw some findings from our empirical results which will help people better engage in future persuasive conversations.

5. Multilingual Alignment of Contextual Word Representations [PDF] Back to contents
  Steven Cao, Nikita Kitaev, Dan Klein
Abstract: We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in analyzing and improving multilingual BERT. In particular, after our proposed alignment procedure, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model, remarkably matching pseudo-fully-supervised translate-train models for Bulgarian and Greek. Further, to measure the degree of alignment, we introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer. Using this word retrieval task, we also analyze BERT and find that it exhibits systematic deficiencies, e.g. worse alignment for open-class parts of speech and word pairs written in different scripts, which are corrected by the alignment procedure. These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
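A minimal sketch of the contextual word-retrieval evaluation mentioned above, under my own assumptions (contextual vectors for aligned word pairs are taken as precomputed inputs; names are illustrative):

    import numpy as np

    def retrieval_accuracy(src_vecs, tgt_vecs):
        # src_vecs[i] and tgt_vecs[i] are contextual embeddings of the i-th
        # aligned word pair (e.g., from word-aligned parallel sentences).
        # A source word is retrieved correctly when its nearest target vector
        # by cosine similarity is its own aligned counterpart.
        src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
        tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
        nearest = (src @ tgt.T).argmax(axis=1)
        return float((nearest == np.arange(len(src))).mean())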

6. Limits of Detecting Text Generated by Large-Scale Language Models [PDF] Back to contents
  Lav R. Varshney, Nitish Shirish Keskar, Richard Socher
Abstract: Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
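Schematically (my notation, not necessarily the paper's), the detection problem is a binary hypothesis test on a text $w_{1:n}$: under $H_0$ the text is genuine (human distribution $P$), under $H_1$ it was generated by the language model $Q$, and a likelihood-ratio detector thresholds

    \Lambda(w_{1:n}) \;=\; \frac{1}{n}\sum_{t=1}^{n} \log \frac{Q(w_t \mid w_{<t})}{P(w_t \mid w_{<t})},

declaring the text generated when $\Lambda$ exceeds a threshold; the abstract's claim is that the resulting error exponents can be bounded in terms of the model's perplexity.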

7. Abstractive Summarization for Low Resource Data using Domain Transfer and Data Synthesis [PDF] Back to contents
  Ahmed Magooda, Diane Litman
Abstract: Training abstractive summarization models typically requires large amounts of data, which can be a limitation for many domains. In this paper we explore using domain transfer and data synthesis to improve the performance of recent abstractive summarization methods when applied to small corpora of student reflections. First, we explored whether tuning a state-of-the-art model trained on newspaper data could boost performance on student reflection data. Evaluations demonstrated that summaries produced by the tuned model achieved higher ROUGE scores than a model trained on just student reflection data or just newspaper data. The tuned model also achieved higher scores than extractive summarization baselines, and additionally was judged to produce more coherent and readable summaries in human evaluations. Second, we explored whether synthesizing summaries of student data could additionally boost performance. We proposed a template-based model to synthesize new data, which, when incorporated into training, further increased ROUGE scores. Finally, we showed that combining data synthesis with domain transfer achieved higher ROUGE scores than using only one of the two approaches.

8. Attend to the beginning: A study on using bidirectional attention for extractive summarization [PDF] Back to contents
  Ahmed Magooda, Cezary Marcjan
Abstract: Forum discussion data differ in both structure and properties from generic forms of textual data such as news. Hence, summarization techniques should, in turn, make use of such differences and craft models that can benefit from the structural nature of discussion data. In this work, we propose attending to the beginning of a document to improve the performance of extractive summarization models when applied to forum discussion data. Evaluations demonstrated that, with the help of a bidirectional attention mechanism, attending to the beginning of a document (the initial comment/post) in a discussion thread introduces a consistent boost in ROUGE scores, as well as new state-of-the-art (SOTA) ROUGE scores on the forum discussions dataset. Additionally, we explored whether this hypothesis is extendable to other generic forms of textual data. We make use of the tendency to introduce important information early in the text by attending to the first few sentences in generic textual data. Evaluations demonstrated that attending to introductory sentences using bidirectional attention improves the performance of extractive summarization models even when applied to more generic forms of textual data.

9. Short Text Classification via Knowledge powered Attention with Similarity Matrix based CNN [PDF] Back to contents
  Mingchen Li, Gabtone.Clinton, Yijia Miao, Feng Gao
Abstract: Short text is becoming more and more common on the web, for example in chat messages, SMS and product reviews. Accurately classifying short text is an important and challenging task. A number of studies have difficulties in addressing this problem because of word ambiguity and data sparsity. To address this issue, we propose a knowledge-powered attention with similarity matrix based convolutional neural network (KASM) model, which can compute comprehensive information by utilizing knowledge and a deep neural network. We use a knowledge graph (KG) to enrich the semantic representation of short text; in particular, the information of the parent entity is introduced into our model. Meanwhile, we consider the word interaction at the literal level between the short text and the representation of the label, and utilize a similarity matrix based convolutional neural network (CNN) to extract it. For the purpose of measuring the importance of knowledge, we introduce attention mechanisms to choose the important information. Experimental results on five standard datasets show that our model significantly outperforms state-of-the-art methods.

10. Rough Set based Aggregate Rank Measure & its Application to Supervised Multi Document Summarization [PDF] Back to contents
  Nidhika Yadav, Niladri Chatterjee
Abstract: Most problems in machine learning cater to classification, and the objects of the universe are classified into a relevant class. Ranking the classified objects of the universe per decision class is a challenging problem. In this paper we propose a novel rough-set-based membership called the Rank Measure to solve this problem. It is utilized for ranking the elements of a particular class. It differs from the Pawlak rough-set-based membership function, which gives an equivalent characterization of the rough-set-based approximations. It becomes paramount to look beyond the traditional approach of computing memberships while handling the inconsistent, erroneous and missing data that is typically present in real-world problems. This led us to propose the aggregate Rank Measure. The contribution of the paper is threefold. Firstly, it proposes a rough-set-based measure to be utilized for numerical characterization of within-class ranking of objects. Secondly, it proposes and establishes the properties of the Rank Measure and the aggregate Rank Measure based membership. Thirdly, we apply the concept of membership and aggregate ranking to the problem of supervised multi-document summarization, wherein the important class of sentences is first determined using various supervised learning techniques and then post-processed using the proposed ranking measure. The results show a significant improvement in accuracy.
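For context, the classical Pawlak rough membership function that the proposed Rank Measure is contrasted with assigns to an object $x$, for a concept $X$ and an indiscernibility (equivalence) relation $R$,

    \mu_X^R(x) \;=\; \frac{\bigl|[x]_R \cap X\bigr|}{\bigl|[x]_R\bigr|},

i.e. the fraction of objects indiscernible from $x$ that belong to $X$; the Rank Measure and its aggregate version modify this idea to rank objects within a decision class, and their exact definitions are given in the paper.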

11. Mining Commonsense Facts from the Physical World [PDF] Back to contents
  Yanyan Zou
Abstract: Textual descriptions of the physical world implicitly mention commonsense facts, while commonsense knowledge bases explicitly represent such facts as triples. Compared to the dramatically increasing amount of text data, the coverage of existing knowledge bases is far from complete. Most of the prior studies on populating knowledge bases mainly focus on Freebase. Automatically completing commonsense knowledge bases to improve their coverage is under-explored. In this paper, we propose a new task of mining commonsense facts from raw text that describes the physical world. We build an effective new model that fuses information from both sequence text and existing knowledge base resources. We then create two large annotated datasets, each with approximately 200k instances, for commonsense knowledge base completion. Empirical results demonstrate that our model significantly outperforms baselines.

12. HHH: An Online Medical Chatbot System based on Knowledge Graph and Hierarchical Bi-Directional Attention [PDF] Back to contents
  Qiming Bao, Lin Ni, Jiamou Liu
Abstract: This paper proposes a chatbot framework that adopts a hybrid model consisting of a knowledge graph and a text similarity model. Based on this chatbot framework, we build HHH, an online question-and-answer (QA) Healthcare Helper system for answering complex medical questions. HHH maintains a knowledge graph constructed from medical data collected from the Internet. HHH also implements a novel text representation and similarity deep learning model, the Hierarchical BiLSTM Attention Model (HBAM), to find the most similar question in a large QA dataset. We compare HBAM with other state-of-the-art language models such as Bidirectional Encoder Representations from Transformers (BERT) and the Manhattan LSTM model (MaLSTM). We train and test the models with a subset of the Quora duplicate questions dataset in the medical area. The experimental results show that our model is able to achieve superior performance compared with these existing methods.
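One of the baselines mentioned, the Manhattan LSTM (MaLSTM), scores a question pair with an exponentiated negative L1 distance between the two sentence encodings; a minimal sketch (the encoder producing the hidden states is a stand-in, and names are illustrative):

    import numpy as np

    def malstm_similarity(h_left, h_right):
        # h_left, h_right: final LSTM hidden states for the two questions.
        # Returns a similarity in (0, 1]: identical encodings give 1.0,
        # increasingly distant encodings approach 0.
        return float(np.exp(-np.abs(h_left - h_right).sum()))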

13. LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention [PDF] Back to contents
  Xiaoya Li, Yuxian Meng, Arianna Yuan, Fei Wu, Jiwei Li
Abstract: Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass and are highly efficient at the inference stage compared with autoregressive translation (AT) methods. However, NAT models often suffer from the multimodality problem, i.e., generating duplicated tokens or missing tokens. In this paper, we propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism. The Look-Around strategy predicts the neighbor tokens in order to predict the current token, and the Vocabulary Attention models long-term token dependencies inside the decoder by attending to the whole vocabulary for each position to acquire knowledge of which token is about to be generated. We also propose a dynamic bidirectional decoding approach to accelerate the inference process of the LAVA model while preserving the high quality of the generated output. Our proposed model uses significantly less time during inference compared with autoregressive models and most other NAT models. Our experiments on four benchmarks (WMT14 En→De, WMT14 De→En, WMT16 Ro→En and IWSLT14 De→En) show that the proposed model achieves competitive performance compared with the state-of-the-art non-autoregressive and autoregressive models while significantly reducing the time cost in the inference phase.
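A hedged sketch of how the described Vocabulary Attention could look: each decoder position attends over the whole embedding matrix to form a soft preview of the token it is about to emit, which is mixed back into the hidden state (this is my reading of the abstract, not the authors' code; names are illustrative):

    import numpy as np

    def vocabulary_attention(h, emb):
        # h:   (n, d) decoder states, one per target position
        # emb: (V, d) embedding matrix of the whole target vocabulary
        scores = h @ emb.T                              # (n, V) affinity with every token
        scores -= scores.max(axis=-1, keepdims=True)
        probs = np.exp(scores)
        probs /= probs.sum(axis=-1, keepdims=True)      # soft distribution over the vocabulary
        preview = probs @ emb                           # (n, d) expected token embedding
        return h + preview                              # inject the preview into each state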

14. Blank Language Models [PDF] Back to contents
  Tianxiao Shen, Victor Quach, Regina Barzilay, Tommi Jaakkola
Abstract: We propose Blank Language Model (BLM), a model that generates sequences by dynamically creating and filling in blanks. Unlike previous masked language models or the Insertion Transformer, BLM uses blanks to control which part of the sequence to expand. This fine-grained control of generation is ideal for a variety of text editing and rewriting tasks. The model can start from a single blank or partially completed text with blanks at specified locations. It iteratively determines which word to place in a blank and whether to insert new blanks, and stops generating when no blanks are left to fill. BLM can be efficiently trained using a lower bound of the marginal data likelihood, and achieves perplexity comparable to traditional left-to-right language models on the Penn Treebank and WikiText datasets. On the task of filling missing text snippets, BLM significantly outperforms all other baselines in terms of both accuracy and fluency. Experiments on style transfer and damaged ancient text restoration demonstrate the potential of this framework for a wide range of applications.
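A minimal sketch of the generation loop implied by the abstract, with a hypothetical `model` interface (assumed here, not the authors' API): the canvas starts as a single blank, and at each step the model picks a blank, fills it with a word, and decides whether to create new blanks to its left and/or right, stopping when no blanks remain.

    BLANK = "___"

    def generate(model, canvas=None, max_steps=100):
        canvas = list(canvas) if canvas else [BLANK]    # start from a single blank
        for _ in range(max_steps):
            blanks = [i for i, tok in enumerate(canvas) if tok == BLANK]
            if not blanks:
                break                                   # nothing left to fill
            i = model.choose_blank(canvas, blanks)      # hypothetical: pick a blank to fill
            word, blank_left, blank_right = model.fill(canvas, i)  # hypothetical: word + new blanks
            patch = ([BLANK] if blank_left else []) + [word] + ([BLANK] if blank_right else [])
            canvas = canvas[:i] + patch + canvas[i + 1:]
        return canvas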

15. Description Based Text Classification with Reinforcement Learning [PDF] Back to contents
  Duo Chai, Wei Wu, Qinghong Han, Fei Wu, Jiwei Li
Abstract: The task of text classification is usually divided into two stages: text feature extraction and classification. In this standard formalization, categories are merely represented as indexes in the label vocabulary, and the model lacks explicit instructions on what to classify. Inspired by the current trend of formalizing NLP problems as question answering tasks, we propose a new framework for text classification, in which each category label is associated with a category description. Descriptions are generated by hand-crafted templates or by abstractive/extractive models trained with reinforcement learning. The concatenation of the description and the text is fed to the classifier to decide whether or not the current label should be assigned to the text. The proposed strategy forces the model to attend to the most salient texts with respect to the label, which can be regarded as a hard version of attention, leading to better performance. We observe significant performance boosts over strong baselines on a wide range of text classification tasks including single-label classification, multi-label classification and multi-aspect sentiment analysis.
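A small sketch of the inference-time recipe described above, assuming a hypothetical binary scorer `entails(description, text)` that consumes the concatenation of a label description and the input text (the actual model, the description construction, and the reinforcement-learning training are in the paper):

    def classify(text, label_descriptions, entails, threshold=0.5):
        # label_descriptions: {label: natural-language description of that label}
        # entails(description, text) -> probability that the label applies to the text
        return [label
                for label, desc in label_descriptions.items()
                if entails(desc, text) >= threshold]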

16. autoNLP: NLP Feature Recommendations for Text Analytics Applications [PDF] Back to contents
  Janardan Misra
Abstract: While designing machine learning based text analytics applications, NLP data scientists often manually determine which NLP features to use based upon their knowledge and experience with related problems. This results in increased effort during the feature engineering process and renders automated reuse of features across semantically related applications inherently difficult. In this paper, we argue for standardization in feature specification by outlining the structure of a language for specifying NLP features, and present an approach for their reuse across applications to increase the likelihood of identifying optimal features.

17. Snippext: Semi-supervised Opinion Mining with Augmented Data [PDF] Back to contents
  Zhengjie Miao, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan
Abstract: Online services are interested in solutions to opinion mining, which is the problem of extracting aspects, opinions, and sentiments from text. One method to mine opinions is to leverage the recent success of pre-trained language models, which can be fine-tuned to obtain high-quality extractions from reviews. However, fine-tuning language models still requires a non-trivial amount of training data. In this paper, we study the problem of how to significantly reduce the amount of labeled training data required to fine-tune language models for opinion mining. We describe Snippext, an opinion mining system developed over a language model that is fine-tuned through semi-supervised learning with augmented data. A novelty of Snippext is its clever use of a two-pronged approach to achieve state-of-the-art (SOTA) performance with little labeled training data: (1) data augmentation to automatically generate more labeled training data from existing ones, and (2) a semi-supervised learning technique to leverage the massive amount of unlabeled data in addition to the (limited amount of) labeled data. We show with extensive experiments that Snippext performs comparably to, and can even exceed, previous SOTA results on several opinion mining tasks with only half the training data required. Furthermore, it achieves new SOTA results when all training data are leveraged. Compared to a baseline pipeline, we found that Snippext extracts significantly more fine-grained opinions, which enable new opportunities for downstream applications.

18. Pre-training Tasks for Embedding-based Large-scale Retrieval [PDF] Back to contents
  Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar
Abstract: We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus. This problem is often solved in two steps. The retrieval phase first reduces the solution space, returning a subset of candidate documents. The scoring phase then re-ranks the documents. Critically, the retrieval algorithm not only desires high recall but also needs to be highly efficient, returning candidates in time sublinear in the number of documents. Unlike the scoring phase, which has recently witnessed significant advances due to BERT-style pre-training tasks on cross-attention models, the retrieval phase remains less well studied. Most previous works rely on classic Information Retrieval (IR) methods such as BM-25 (token matching + TF-IDF weights). These models only accept sparse handcrafted features and cannot be optimized for different downstream tasks of interest. In this paper, we conduct a comprehensive study on embedding-based retrieval models. We show that the key ingredient in learning a strong embedding-based Transformer model is the set of pre-training tasks. With adequately designed paragraph-level pre-training tasks, the Transformer models can remarkably improve over the widely-used BM-25 as well as embedding models without Transformers. The paragraph-level pre-training tasks we studied are the Inverse Cloze Task (ICT), Body First Selection (BFS), Wiki Link Prediction (WLP), and the combination of all three.
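Two of the pieces described above are easy to sketch: the two-tower (embedding-based) scorer, and Inverse Cloze Task pseudo-labels that treat one sentence of a passage as the query and the remaining sentences as its relevant document. The encoders producing the vectors are stand-ins (assumptions, not the paper's code):

    import numpy as np

    def retrieval_scores(query_vec, doc_vecs):
        # Dual-encoder retrieval: relevance is the inner product between the query
        # embedding and each pre-computed document embedding, so candidates can be
        # returned with (approximate) nearest-neighbour search in sublinear time.
        return doc_vecs @ query_vec

    def inverse_cloze_pairs(passage_sentences):
        # ICT pre-training pairs: each sentence plays the query; the remaining
        # sentences of the same passage form its positive "document".
        return [(s, " ".join(passage_sentences[:i] + passage_sentences[i + 1:]))
                for i, s in enumerate(passage_sentences)]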

19. A Novel Kuhnian Ontology for Epistemic Classification of STM Scholarly Articles [PDF] Back to contents
  Khalid M. Saqr, Abdelrahman Elsharawy
Abstract: Thomas Kuhn proposed his paradigmatic view of scientific discovery five decades ago. The concept of a paradigm has not only explained the progress of science, but has also become the central epistemic concept among STM scientists. Here, we adopt the principles of Kuhnian philosophy to construct a novel ontology aimed at classifying and evaluating the impact of STM scholarly articles. First, we explain how the Kuhnian cycle of science describes research at different epistemic stages. Second, we show how the Kuhnian cycle can be reconstructed into modular ontologies which classify scholarly articles according to their contribution to paradigm-centred knowledge. The proposed ontology and its scenarios are discussed. To the best of the authors' knowledge, this is the first attempt at creating an ontology for describing scholarly articles based on the Kuhnian paradigmatic view of science.

20. SPA: Verbal Interactions between Agents and Avatars in Shared Virtual Environments using Propositional Planning [PDF] Back to contents
  Andrew Best, Sahil Narang, Dinesh Manocha
Abstract: We present a novel approach for generating plausible verbal interactions between virtual human-like agents and user avatars in shared virtual environments. Sense-Plan-Ask, or SPA, extends prior work in propositional planning and natural language processing to enable agents to plan with uncertain information, and to leverage question-and-answer dialogue with other agents and avatars to obtain the needed information and complete their goals. The agents are additionally able to respond to questions from the avatars and other agents using natural language, enabling real-time multi-agent multi-avatar communication environments. Our algorithm can simulate tens of virtual agents at interactive rates interacting, moving, communicating, planning, and replanning. We find that our algorithm adds only a small runtime cost and enables agents to complete their goals more effectively than agents without the ability to leverage natural-language communication. We demonstrate quantitative results on a set of simulated benchmarks and detail the results of a preliminary user study conducted to evaluate the plausibility of the virtual interactions generated by SPA. Overall, we find that participants prefer SPA to prior techniques in 84% of responses, including significant benefits in terms of the plausibility of natural-language interactions and the positive impact of those interactions.

21. Time-aware Large Kernel Convolutions [PDF] Back to contents
  Vasileios Lioutas, Yuhong Guo
Abstract: To date, most state-of-the-art sequence modelling architectures use attention to build generative models for language-based tasks. Some of these models use all the available sequence tokens to generate an attention distribution, which results in time complexity of $O(n^2)$. Alternatively, they utilize depthwise convolutions with softmax-normalized kernels of size $k$ acting as a limited-window self-attention, resulting in time complexity of $O(k \cdot n)$. In this paper, we introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel instead of using a fixed-sized kernel matrix. This method yields a time complexity of $O(n)$, effectively making the sequence encoding process linear in the number of tokens. We evaluate the proposed method on large-scale standard machine translation and language modelling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention/convolution based approaches.
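A hedged sketch of the adaptive summation kernel (my reading of the abstract, not the released implementation): with prefix sums, each output token is a difference of two cumulative sums over its own predicted window, so the pass stays $O(n)$ even though every token gets a different kernel size.

    import numpy as np

    def talk_convolution_1d(x, left, right):
        # x:     (n, d) token representations
        # left:  (n,) predicted left window size per position (non-negative ints)
        # right: (n,) predicted right window size per position (non-negative ints)
        n, d = x.shape
        csum = np.concatenate([np.zeros((1, d)), np.cumsum(x, axis=0)])  # (n+1, d) prefix sums
        lo = np.clip(np.arange(n) - left, 0, n)           # window start per token
        hi = np.clip(np.arange(n) + right + 1, 0, n)      # window end (exclusive) per token
        out = csum[hi] - csum[lo]                         # adaptive summation kernel in O(n)
        return out / np.maximum(hi - lo, 1)[:, None]      # normalize by window length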
