Contents
1. HAT: Hardware-Aware Transformers for Efficient Natural Language Processing [PDF] Abstract
2. Language Models are Few-Shot Learners [PDF] Abstract
3. Cats climb entails mammals move: preserving hyponymy in compositional distributional semantics [PDF] Abstract
4. Adversarial Attacks and Defense on Textual Data: A Review [PDF] Abstract
5. Language (Technology) is Power: A Critical Survey of "Bias" in NLP [PDF] Abstract
6. Joint Modelling of Emotion and Abusive Language Detection [PDF] Abstract
7. Variational Neural Machine Translation with Normalizing Flows [PDF] Abstract
8. A Corpus for Large-Scale Phonetic Typology [PDF] Abstract
9. Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [PDF] Abstract
10. Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search [PDF] Abstract
11. Contextual Dialogue Act Classification for Open-Domain Conversational Agents [PDF] Abstract
12. Would you Like to Talk about Sports Now? Towards Contextual Topic Suggestion for Open-Domain Conversational Agents [PDF] Abstract
13. ConCET: Entity-Aware Topic Classification for Open-Domain Conversational Agents [PDF] Abstract
14. The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion [PDF] Abstract
15. Phone Features Improve Speech Translation [PDF] Abstract
16. Language Representation Models for Fine-Grained Sentiment Classification [PDF] Abstract
17. In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology [PDF] Abstract
18. Variational Autoencoder with Embedded Student-$t$ Mixture Model for Authorship Attribution [PDF] Abstract
22. JointMap: Joint Query Intent Understanding For Modeling Intent Hierarchies in E-commerce Search [PDF] Abstract

Abstracts
1. HAT: Hardware-Aware Transformers for Efficient Natural Language Processing [PDF] Back to Contents
Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
Abstract: Transformers are ubiquitous in Natural Language Processing (NLP) tasks, but they are difficult to deploy on hardware due to their intensive computation. To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search. We first construct a large design space with $\textit{arbitrary encoder-decoder attention}$ and $\textit{heterogeneous layers}$. Then we train a $\textit{SuperTransformer}$ that covers all candidates in the design space, and efficiently produces many $\textit{SubTransformers}$ with weight sharing. Finally, we perform an evolutionary search with a hardware latency constraint to find a specialized $\textit{SubTransformer}$ dedicated to run fast on the target hardware. Extensive experiments on four machine translation tasks demonstrate that HAT can discover efficient models for different hardware (CPU, GPU, IoT device). When running the WMT'14 translation task on a Raspberry Pi-4, HAT can achieve $\textbf{3}\times$ speedup and $\textbf{3.7}\times$ smaller size over the baseline Transformer; $\textbf{2.7}\times$ speedup and $\textbf{3.6}\times$ smaller size over the Evolved Transformer with $\textbf{12,041}\times$ less search cost and no performance loss. HAT code is available at this https URL
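As a rough, hypothetical sketch of the latency-constrained search described above (not the authors' implementation), the toy code below samples SubTransformer configurations from a stand-in design space, discards any that exceed a latency budget, and keeps the best-scoring survivor. HAT itself uses an evolutionary search, a latency predictor trained on real hardware measurements, and the validation loss of weight-shared SubTransformers as the fitness signal.

```python
# Minimal stand-in for latency-constrained architecture search in the spirit of HAT.
# The design space, latency model, and fitness function are illustrative placeholders,
# not the paper's actual search space or hardware measurements.
import random

DESIGN_SPACE = {
    "num_decoder_layers": [1, 2, 3, 4, 5, 6],
    "embed_dim": [384, 512, 640],
    "ffn_dim": [1024, 2048, 3072],
    "attn_heads": [4, 8],
}

def sample_subtransformer():
    """Sample one SubTransformer configuration from the SuperTransformer design space."""
    return {k: random.choice(v) for k, v in DESIGN_SPACE.items()}

def estimated_latency_ms(cfg):
    """Placeholder latency predictor; HAT fits a regressor on real on-device measurements."""
    return 0.02 * cfg["embed_dim"] + 0.01 * cfg["ffn_dim"] + 15 * cfg["num_decoder_layers"]

def estimated_quality(cfg):
    """Placeholder fitness (bigger scores better here); HAT uses the SubTransformer's
    validation loss evaluated with weights inherited from the SuperTransformer."""
    return cfg["embed_dim"] + 0.5 * cfg["ffn_dim"] + 100 * cfg["num_decoder_layers"]

def search(latency_budget_ms, num_samples=500):
    """Random-search stand-in for the paper's evolutionary search."""
    best = None
    for _ in range(num_samples):
        cfg = sample_subtransformer()
        if estimated_latency_ms(cfg) > latency_budget_ms:
            continue  # violates the hardware latency constraint
        if best is None or estimated_quality(cfg) > estimated_quality(best):
            best = cfg
    return best

print(search(latency_budget_ms=100.0))
```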
2. Language Models are Few-Shot Learners [PDF] Back to Contents
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
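To make the "purely via text interaction" point concrete, a few-shot task specification is just a prompt assembled from demonstrations followed by a new input, with no gradient updates. The toy translation demonstrations below are made up for illustration and are not from the paper's evaluation suites.

```python
# Build a few-shot prompt: task description, a handful of demonstrations, then the query.
demonstrations = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("dog", "chien"),
]

def build_few_shot_prompt(new_word):
    lines = ["Translate English to French:"]
    for en, fr in demonstrations:
        lines.append(f"{en} => {fr}")
    lines.append(f"{new_word} =>")
    return "\n".join(lines)

print(build_few_shot_prompt("cat"))
# The text completion produced by the language model (e.g. "chat") is taken as the answer.
```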
3. Cats climb entails mammals move: preserving hyponymy in compositional distributional semantics [PDF] Back to Contents
Gemma De las Cuevas, Andreas Klinger, Martha Lewis, Tim Netzer
Abstract: To give vector-based representations of meaning more structure, one approach is to use positive semidefinite (psd) matrices. These allow us to model similarity of words as well as the hyponymy or is-a relationship. Psd matrices can be learnt relatively easily in a given vector space $M\otimes M^*$, but to compose words to form phrases and sentences, we need representations in larger spaces. In this paper, we introduce a generic way of composing the psd matrices corresponding to words. We propose that psd matrices for verbs, adjectives, and other functional words be lifted to completely positive (CP) maps that match their grammatical type. This lifting is carried out by our composition rule called Compression, Compr. In contrast to previous composition rules like Fuzz and Phaser (a.k.a. KMult and BMult), Compr preserves hyponymy. Mathematically, Compr is itself a CP map, and is therefore linear and generally non-commutative. We give a number of proposals for the structure of Compr, based on spiders, cups and caps, and generate a range of composition rules. We test these rules on a small sentence entailment dataset, and see some improvements over the performance of Fuzz and Phaser.
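A hedged numpy sketch of the ingredients mentioned in the abstract, using toy matrices rather than learned representations: words as psd matrices, hyponymy read off a matrix ordering (the Loewner order is one common choice in this literature, though the abstract does not fix the encoding), and a completely positive map applied in Kraus form, which necessarily sends psd inputs to psd outputs. The specific matrices and Kraus operators are toy choices, not the paper's Compr construction.

```python
import numpy as np

def is_psd(m, tol=1e-9):
    """Check positive semidefiniteness via the eigenvalues of the symmetrized matrix."""
    return np.all(np.linalg.eigvalsh((m + m.T) / 2) >= -tol)

def hyponym_of(a, b):
    """One common encoding: 'a' is a hyponym of 'b' if b - a is psd (Loewner order)."""
    return is_psd(b - a)

def cp_map(rho, kraus_ops):
    """Apply a completely positive map in Kraus form: rho -> sum_i K_i rho K_i^T."""
    return sum(k @ rho @ k.T for k in kraus_ops)

cat    = np.array([[1.0, 0.0], [0.0, 0.0]])   # psd, rank 1 ("narrow" meaning)
mammal = np.array([[1.0, 0.0], [0.0, 1.0]])   # psd ("broader" meaning)

print(hyponym_of(cat, mammal))                # True for this toy pair

kraus = [np.array([[0.8, 0.0], [0.0, 0.5]])]
out = cp_map(cat, kraus_ops=kraus)
print(is_psd(out))                            # CP maps preserve psd-ness
```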
4. Adversarial Attacks and Defense on Textual Data: A Review [PDF] Back to Contents
Aminul Huq, Mst. Tasnim Pervin
Abstract: Deep learning models have been used widely for various purposes in recent years in object recognition, self-driving cars, face recognition, speech recognition, sentiment analysis and many others. However, in recent years it has been shown that these models possess weakness to noises which forces the model to misclassify. This issue has been studied profoundly in image and audio domain. Very little has been studied on this issue with respect to textual data. Even less survey on this topic has been performed to understand different types of attacks and defense techniques. In this manuscript we accumulated and analyzed different attacking techniques, various defense models on how to overcome this issue in order to provide a more comprehensive idea. Later we point out some of the interesting findings of all papers and challenges that need to be overcome in order to move forward in this field.
5. Language (Technology) is Power: A Critical Survey of "Bias" in NLP [PDF] Back to Contents
Su Lin Blodgett, Solon Barocas, Hal Daumé III, Hanna Wallach
Abstract: We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process. We further find that these papers' proposed quantitative techniques for measuring or mitigating "bias" are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing "bias" in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of "bias"---i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements---and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.
6. Joint Modelling of Emotion and Abusive Language Detection [PDF] Back to Contents
Santhosh Rajamanickam, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova
Abstract: The rise of online communication platforms has been accompanied by some undesirable effects, such as the proliferation of aggressive and abusive behaviour online. Aiming to tackle this problem, the natural language processing (NLP) community has experimented with a range of techniques for abuse detection. While achieving substantial success, these methods have so far only focused on modelling the linguistic properties of the comments and the online communities of users, disregarding the emotional state of the users and how this might affect their language. The latter is, however, inextricably linked to abusive behaviour. In this paper, we present the first joint model of emotion and abusive language detection, experimenting in a multi-task learning framework that allows one task to inform the other. Our results demonstrate that incorporating affective features leads to significant improvements in abuse detection performance across datasets.
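Schematically, the multi-task setup can be summarized as a shared encoder feeding an abuse-detection head and an emotion head, with the two losses combined by a weighting term. The weight and the exact loss forms below are illustrative assumptions, not the paper's formulation.

```latex
% Schematic multi-task objective with shared encoder parameters \theta_{shared};
% \lambda and the loss decomposition are illustrative assumptions only.
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{abuse}}\bigl(\theta_{\text{shared}}, \theta_{\text{abuse}}\bigr)
  + \lambda \, \mathcal{L}_{\text{emotion}}\bigl(\theta_{\text{shared}}, \theta_{\text{emotion}}\bigr)
```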
7. Variational Neural Machine Translation with Normalizing Flows [PDF] Back to Contents
Hendra Setiawan, Matthias Sperber, Udhay Nallasamy, Matthias Paulik
Abstract: Variational Neural Machine Translation (VNMT) is an attractive framework for modeling the generation of target translations, conditioned not only on the source sentence but also on some latent random variables. The latent variable modeling may introduce useful statistical dependencies that can improve translation accuracy. Unfortunately, learning informative latent variables is non-trivial, as the latent space can be prohibitively large, and the latent codes are prone to be ignored by many translation models at training time. Previous works impose strong assumptions on the distribution of the latent code and limit the choice of the NMT architecture. In this paper, we propose to apply the VNMT framework to the state-of-the-art Transformer and introduce a more flexible approximate posterior based on normalizing flows. We demonstrate the efficacy of our proposal under both in-domain and out-of-domain conditions, significantly outperforming strong baselines.
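For readers unfamiliar with normalizing flows, the sketch below shows the generic mechanism (a planar flow, Rezende and Mohamed, 2015): a sample from a simple base posterior is pushed through an invertible transform, and its log-density is corrected by the log-determinant of the Jacobian. This is only an illustration of flow-based approximate posteriors, not the specific flow architecture used in the paper.

```python
# Generic planar-flow illustration of a flow-based approximate posterior (toy values).
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # latent dimensionality (toy value)
u, w, b = rng.normal(size=d), rng.normal(size=d), 0.1  # flow parameters (invertibility not enforced here)

def planar_flow(z):
    """f(z) = z + u * tanh(w.z + b); returns f(z) and log|det df/dz|."""
    a = np.tanh(w @ z + b)
    psi = (1.0 - a ** 2) * w             # derivative of tanh times w
    log_det = np.log(np.abs(1.0 + u @ psi))
    return z + u * a, log_det

z0 = rng.normal(size=d)                  # sample from the simple base posterior q0 = N(0, I)
log_q0 = -0.5 * (z0 @ z0 + d * np.log(2 * np.pi))
z1, log_det = planar_flow(z0)
log_q1 = log_q0 - log_det                # log-density of the transformed sample
print(z1, log_q1)
```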
8. A Corpus for Large-Scale Phonetic Typology [PDF] Back to Contents
Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W Black, Jason Eisner
Abstract: A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions. We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology, with aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants. Access to such data can greatly facilitate investigation of phonetic typology at a large scale and across many languages. However, it is non-trivial and computationally intensive to obtain such alignments for hundreds of languages, many of which have few to no resources presently available. We describe the methodology to create our corpus, discuss caveats with current methods and their impact on the utility of this data, and illustrate possible research directions through a series of case studies on the 48 highest-quality readings. Our corpus and scripts are publicly available for non-commercial use at this https URL.
9. Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [PDF] Back to Contents
Dong Bok Lee, Seanie Lee, Woo Tae Jeong, Donghwan Kim, Sung Ju Hwang
Abstract: One of the most crucial challenges in question answering (QA) is the scarcity of labeled data, since it is costly to obtain question-answer (QA) pairs for a target text domain with human annotation. An alternative approach to tackle the problem is to use automatically generated QA pairs from either the problem context or from large amount of unstructured texts (e.g. Wikipedia). In this work, we propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts, while maximizing the mutual information between generated QA pairs to ensure their consistency. We validate our Information Maximizing Hierarchical Conditional Variational AutoEncoder (Info-HCVAE) on several benchmark datasets by evaluating the performance of the QA model (BERT-base) using only the generated QA pairs (QA-based evaluation) or by using both the generated and human-labeled pairs (semi-supervised learning) for training, against state-of-the-art baseline models. The results show that our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
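Schematically, the objective described above combines conditional VAE terms with a mutual-information term that rewards consistency between a generated question and its answer; the decomposition and the weight below are assumptions for illustration only, not the paper's exact loss.

```latex
% Schematic training objective: hierarchical conditional VAE terms plus an
% estimated mutual-information reward between the generated question \hat{q}
% and answer \hat{a}; \lambda and the decomposition are illustrative assumptions.
\mathcal{L}
  = \mathcal{L}_{\mathrm{ELBO}}^{\text{answer}}
  + \mathcal{L}_{\mathrm{ELBO}}^{\text{question}}
  + \lambda \, \widehat{I}\bigl(\hat{q}\,;\, \hat{a} \mid \text{context}\bigr)
```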
10. Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search [PDF] Back to Contents
Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo
Abstract: In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system. Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, but even the subword n-gram LMs suffer from data sparsity. Recurrent Neural Network (RNN) LMs alleviate the sparsity problems but are not suitable for first-pass recognition as such. One way to solve this is to approximate the RNNLMs by back-off n-gram models. In this paper, we propose to interpolate the conventional n-gram models and the RNNLM approximation for better OOV recognition. Furthermore, we develop a new RNNLM approximation method suitable for subword units: It produces variable-order n-grams to include long-span approximations and considers also n-grams that were not originally observed in the training corpus. To evaluate these models on OOVs, we setup Arabic and Finnish Keyword Search tasks concentrating only on OOV words. On these tasks, interpolating the baseline RNNLM approximation and a conventional LM outperforms the conventional LM in terms of the Maximum Term Weighted Value for single-character subwords. Moreover, replacing the baseline approximation with the proposed method achieves the best performance on both multi- and single-character subwords.
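The proposed interpolation, written out in its standard linear form; the weight is a free parameter (typically tuned on held-out data), and only the general form is shown, not the task-specific values.

```latex
% Linear interpolation of the conventional n-gram LM and the n-gram approximation
% of the RNNLM for first-pass decoding; \lambda is a tunable mixture weight.
P(w \mid h)
  = \lambda \, P_{\text{n-gram}}(w \mid h)
  + (1 - \lambda) \, P_{\text{RNNLM-approx}}(w \mid h),
  \qquad 0 \le \lambda \le 1
```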
11. Contextual Dialogue Act Classification for Open-Domain Conversational Agents [PDF] Back to Contents
Ali Ahmadvand, Jason Ingyu Choi, Eugene Agichtein
Abstract: Classifying the general intent of the user utterance in a conversation, also known as Dialogue Act (DA), e.g., open-ended question, statement of opinion, or request for an opinion, is a key step in Natural Language Understanding (NLU) for conversational agents. While DA classification has been extensively studied in human-human conversations, it has not been sufficiently explored for the emerging open-domain automated conversational agents. Moreover, despite significant advances in utterance-level DA classification, full understanding of dialogue utterances requires conversational context. Another challenge is the lack of available labeled data for open-domain human-machine conversations. To address these problems, we propose a novel method, CDAC (Contextual Dialogue Act Classifier), a simple yet effective deep learning approach for contextual dialogue act classification. Specifically, we use transfer learning to adapt models trained on human-human conversations to predict dialogue acts in human-machine dialogues. To investigate the effectiveness of our method, we train our model on the well-known Switchboard human-human dialogue dataset, and fine-tune it for predicting dialogue acts in human-machine conversation data, collected as part of the Amazon Alexa Prize 2018 competition. The results show that the CDAC model outperforms an utterance-level state of the art baseline by 8.0% on the Switchboard dataset, and is comparable to the latest reported state-of-the-art contextual DA classification results. Furthermore, our results show that fine-tuning the CDAC model on a small sample of manually labeled human-machine conversations allows CDAC to more accurately predict dialogue acts in real users' conversations, suggesting a promising direction for future improvements.
12. Would you Like to Talk about Sports Now? Towards Contextual Topic Suggestion for Open-Domain Conversational Agents [PDF] Back to Contents
Ali Ahmadvand, Harshita Sahijwani, Eugene Agichtein
Abstract: To hold a true conversation, an intelligent agent should be able to occasionally take initiative and recommend the next natural conversation topic. This is a challenging task. A topic suggested by the agent should be relevant to the person, appropriate for the conversation context, and the agent should have something interesting to say about it. Thus, a scripted, or one-size-fits-all, popularity-based topic suggestion is doomed to fail. Instead, we explore different methods for a personalized, contextual topic suggestion for open-domain conversations. We formalize the Conversational Topic Suggestion problem (CTS) to more clearly identify the assumptions and requirements. We also explore three possible approaches to solve this problem: (1) model-based sequential topic suggestion to capture the conversation context (CTS-Seq), (2) Collaborative Filtering-based suggestion to capture previous successful conversations from similar users (CTS-CF), and (3) a hybrid approach combining both conversation context and collaborative filtering. To evaluate the effectiveness of these methods, we use real conversations collected as part of the Amazon Alexa Prize 2018 Conversational AI challenge. The results are promising: the CTS-Seq model suggests topics with 23% higher accuracy than the baseline, and incorporating collaborative filtering signals into a hybrid CTS-Seq-CF model further improves recommendation accuracy by 12%. Together, our proposed models, experiments, and analysis significantly advance the study of open-domain conversational agents, and suggest promising directions for future improvements.
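A hypothetical sketch of the hybrid scoring idea: a contextual model scores candidate topics given the dialogue so far (CTS-Seq), a collaborative-filtering component scores them from similar users' successful conversations (CTS-CF), and the hybrid mixes the two. The scoring functions and the mixing weight below are stand-ins, not the trained CTS-Seq or CTS-CF models.

```python
# Toy hybrid topic-suggestion scorer combining a contextual score and a CF score.
def cts_seq_score(topic, conversation_context):
    """Stand-in for a sequence model conditioned on the dialogue context."""
    return 1.0 if topic in ("movies", "music") and "bored" in conversation_context else 0.2

def cts_cf_score(topic, similar_user_topic_counts):
    """Stand-in for collaborative filtering over topics accepted by similar users."""
    total = sum(similar_user_topic_counts.values()) or 1
    return similar_user_topic_counts.get(topic, 0) / total

def hybrid_score(topic, context, neighbor_counts, alpha=0.6):
    """Mix the two signals; alpha is an assumed weight, not a tuned value."""
    return alpha * cts_seq_score(topic, context) + (1 - alpha) * cts_cf_score(topic, neighbor_counts)

context = "user: i am bored today"
neighbors = {"sports": 12, "movies": 30, "travel": 8}
candidates = ["sports", "movies", "travel", "music"]
print(max(candidates, key=lambda t: hybrid_score(t, context, neighbors)))
```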
13. ConCET: Entity-Aware Topic Classification for Open-Domain Conversational Agents [PDF] Back to Contents
Ali Ahmadvand, Harshita Sahijwani, Jason Ingyu Choi, Eugene Agichtein
Abstract: Identifying the topic (domain) of each user's utterance in open-domain conversational systems is a crucial step for all subsequent language understanding and response tasks. In particular, for complex domains, an utterance is often routed to a single component responsible for that domain. Thus, correctly mapping a user utterance to the right domain is critical. To address this problem, we introduce ConCET: a Concurrent Entity-aware conversational Topic classifier, which incorporates entity-type information together with the utterance content features. Specifically, ConCET utilizes entity information to enrich the utterance representation, combining character, word, and entity-type embeddings into a single representation. However, for rich domains with millions of available entities, unrealistic amounts of labeled training data would be required. To complement our model, we propose a simple and effective method for generating synthetic training data, to augment the typically limited amounts of labeled training data, using commonly available knowledge bases to generate additional labeled utterances. We extensively evaluate ConCET and our proposed training method first on an openly available human-human conversational dataset called Self-Dialogue, to calibrate our approach against previous state-of-the-art methods; second, we evaluate ConCET on a large dataset of human-machine conversations with real users, collected as part of the Amazon Alexa Prize. Our results show that ConCET significantly improves topic classification performance on both datasets, including 8-10% improvements over state-of-the-art deep learning methods. We complement our quantitative results with detailed analysis of system performance, which could be used for further improvements of conversational agents.
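The combined utterance representation can be pictured as a simple concatenation of character-level, word-level, and entity-type embeddings. The pseudo-random embeddings, dimensions, mean-pooling, and entity tag below are illustrative choices only; the actual model learns these jointly with the topic classifier.

```python
# Toy illustration of fusing character, word, and entity-type embeddings into one vector.
import numpy as np

CHAR_DIM, WORD_DIM, ENT_DIM = 16, 64, 8

def embed(tokens, dim):
    """Stand-in lookup table: map each token to a pseudo-random vector, then mean-pool."""
    vecs = [np.random.default_rng(abs(hash(t)) % (2 ** 32)).normal(size=dim) for t in tokens]
    return np.mean(vecs, axis=0)

utterance = "tell me about the warriors game"
chars = list(utterance.replace(" ", ""))
words = utterance.split()
entity_types = ["SportsTeam"]            # hypothetical tag, e.g. produced by an entity linker

utterance_vec = np.concatenate([
    embed(chars, CHAR_DIM),
    embed(words, WORD_DIM),
    embed(entity_types, ENT_DIM),
])
print(utterance_vec.shape)               # (88,) -- a single vector fed to the topic classifier
```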
14. The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion [PDF] Back to Contents
Katharina Kann, Arya McCarthy, Garrett Nicolai, Mans Hulden
Abstract: In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology. Participants were asked to submit systems which take raw text and a list of lemmas as input, and output all inflected forms, i.e., the entire morphological paradigm, of each lemma. In order to simulate a realistic use case, we first released data for 5 development languages. However, systems were officially evaluated on 9 surprise languages, which were only revealed a few days before the submission deadline. We provided a modular baseline system, which is a pipeline of 4 components. 3 teams submitted a total of 7 systems, but, surprisingly, none of the submitted systems was able to improve over the baseline on average over all 9 test languages. Only on 3 languages did a submitted system obtain the best results. This shows that unsupervised morphological paradigm completion is still largely unsolved. We present an analysis here, so that this shared task will ground further research on the topic.
15. Phone Features Improve Speech Translation [PDF] Back to Contents
Elizabeth Salesky, Alan W Black
Abstract: End-to-end models for speech translation (ST) more tightly couple speech recognition (ASR) and machine translation (MT) than a traditional cascade of separate ASR and MT models, with simpler model architectures and the potential for reduced error propagation. Their performance is often assumed to be superior, though in many conditions this is not yet the case. We compare cascaded and end-to-end models across high, medium, and low-resource conditions, and show that cascades remain stronger baselines. Further, we introduce two methods to incorporate phone features into ST models. We show that these features improve both architectures, closing the gap between end-to-end models and cascades, and outperforming previous academic work -- by up to 9 BLEU on our low-resource setting.
16. Language Representation Models for Fine-Grained Sentiment Classification [PDF] Back to Contents
Brian Cheang, Bailey Wei, David Kogan, Howey Qiu, Masud Ahmed
Abstract: Sentiment classification is a quickly advancing field of study with applications in almost any field. While various models and datasets have shown high accuracy in the task of binary classification, the task of fine-grained sentiment classification is still an area with room for significant improvement. Analyzing the SST-5 dataset, previous work by Munikar et al. (2019) showed that the embedding tool BERT allowed a simple model to achieve state-of-the-art accuracy. Since that paper, several BERT alternatives have been published, with three primary ones being ALBERT (Lan et al., 2019), DistilBERT (Sanh et al., 2019), and RoBERTa (Liu et al., 2019). While these models report some improvement over BERT on the popular benchmarks GLUE, SQuAD, and RACE, they have not been applied to the fine-grained classification task. In this paper, we examine whether the improvements hold true when applied to a novel task, by replicating the BERT model from Munikar et al., and swapping the embedding layer to the alternative models. Over the experiments, we found that ALBERT suffers significantly more accuracy loss than reported on other tasks, DistilBERT has accuracy loss similar to their reported loss on other tasks while being the fastest model to train, and RoBERTa reaches a new state-of-the-art accuracy for prediction on the SST-5 root level (60.2%).
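A hedged sketch of this kind of encoder swap using the Hugging Face Transformers library: the same fine-grained (5-class) classification head is placed on top of interchangeable pretrained encoders. The checkpoint names are the library's public ones; the SST-5 data loading, training loop, and hyperparameters are omitted and are not the paper's exact setup.

```python
# Swap pretrained encoders behind a common 5-class sentiment head (illustrative only).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = [
    "bert-base-uncased",
    "albert-base-v2",
    "distilbert-base-uncased",
    "roberta-base",
]

def build_classifier(checkpoint, num_labels=5):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=num_labels)
    return tokenizer, model

tokenizer, model = build_classifier(CHECKPOINTS[0])
batch = tokenizer(["A gorgeous, witty, seductive movie."], return_tensors="pt")
logits = model(**batch).logits            # shape (1, 5): one score per sentiment class
print(logits.shape)
```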
17. In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology [PDF] 返回目录
Chundra A. Cathcart, Florian Wandl
Abstract: This paper investigates the ability of neural network architectures to effectively learn diachronic phonological generalizations in a multilingual setting. We employ models using three different types of language embedding (dense, sigmoid, and straight-through). We find that the Straight-Through model outperforms the other two in terms of accuracy, but the Sigmoid model's language embeddings show the strongest agreement with the traditional subgrouping of the Slavic languages. We find that the Straight-Through model has learned coherent, semi-interpretable information about sound change, and outline directions for future research.
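The three embedding types contrasted above differ only in how the per-language vector is produced. Below is a hedged PyTorch sketch of the straight-through variant as it is commonly implemented (hard 0/1 values on the forward pass, the sigmoid's gradient on the backward pass); the module name and dimensions are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class StraightThroughLanguageEmbedding(nn.Module):
    """Per-language binary feature vector trained with the straight-through
    estimator: the forward pass uses hard 0/1 values, the backward pass uses
    the sigmoid's gradient. A sigmoid-only or dense embedding is the same
    module without the binarization step."""

    def __init__(self, n_languages: int, n_features: int):
        super().__init__()
        self.logits = nn.Embedding(n_languages, n_features)

    def forward(self, lang_ids: torch.Tensor) -> torch.Tensor:
        soft = torch.sigmoid(self.logits(lang_ids))   # continuous values in (0, 1)
        hard = (soft > 0.5).float()                   # discrete 0/1 features
        # Straight-through: value of `hard`, gradient of `soft`.
        return hard + (soft - soft.detach())

emb = StraightThroughLanguageEmbedding(n_languages=13, n_features=32)
vec = emb(torch.tensor([0, 5]))   # embeddings for two languages
print(vec.shape)                  # torch.Size([2, 32])
```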
18. Variational Autoencoder with Embedded Student-$t$ Mixture Model for Authorship Attribution [PDF] 返回目录
Benedikt Boenninghoff, Steffen Zeiler, Robert M. Nickel, Dorothea Kolossa
Abstract: Traditional computational authorship attribution describes a classification task in a closed-set scenario. Given a finite set of candidate authors and corresponding labeled texts, the objective is to determine which of the authors has written another set of anonymous or disputed texts. In this work, we propose a probabilistic autoencoding framework to deal with this supervised classification task. More precisely, we are extending a variational autoencoder (VAE) with embedded Gaussian mixture model to a Student-$t$ mixture model. Autoencoders have had tremendous success in learning latent representations. However, existing VAEs are currently still bound by limitations imposed by the assumed Gaussianity of the underlying probability distributions in the latent space. In this work, we are extending the Gaussian model for the VAE to a Student-$t$ model, which allows for an independent control of the "heaviness" of the respective tails of the implied probability densities. Experiments over an Amazon review dataset indicate superior performance of the proposed method.
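The substantive change relative to a Gaussian-mixture VAE is the latent distribution family: Student-$t$ components add a degrees-of-freedom parameter that controls tail heaviness, with the Gaussian recovered in the limit. The following is a small sketch, using PyTorch distributions rather than the authors' code, of evaluating a Student-$t$ mixture log-density of the kind such a model would plug into its ELBO; all shapes and values are illustrative.

```python
import torch
from torch.distributions import StudentT

def t_mixture_log_prob(z, weights, dfs, locs, scales):
    """log p(z) under a mixture of diagonal Student-t components.
    z:       (batch, dim)   latent codes
    weights: (K,)           mixture weights, summing to 1
    dfs:     (K,)           degrees of freedom (smaller => heavier tails)
    locs:    (K, dim)       component means
    scales:  (K, dim)       component scales
    """
    comp = StudentT(dfs[:, None], locs, scales)            # batch shape (K, dim)
    log_pz_k = comp.log_prob(z[:, None, :]).sum(dim=-1)    # (batch, K), factorized over dims
    log_w = torch.log(weights)                              # (K,)
    return torch.logsumexp(log_w + log_pz_k, dim=-1)        # (batch,)

# Two components in a 4-dimensional latent space; nu = 3 gives much heavier
# tails than a Gaussian, which is approached as nu grows large.
K, D = 2, 4
z = torch.randn(8, D)
lp = t_mixture_log_prob(
    z,
    weights=torch.tensor([0.6, 0.4]),
    dfs=torch.tensor([3.0, 10.0]),
    locs=torch.zeros(K, D),
    scales=torch.ones(K, D),
)
print(lp.shape)  # torch.Size([8])
```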
19. When Can Self-Attention Be Replaced by Feed Forward Layers? [PDF] 返回目录
Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Abstract: Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition. The key factor for the outstanding performance of self-attention models is their ability to capture temporal relationships without being limited by the distance between two related events. However, we note that the range of the learned context progressively increases from the lower to upper self-attention layers, whilst acoustic events often happen within short time spans in a left-to-right order. This leads to a question: for speech recognition, is a global view of the entire sequence still important for the upper self-attention layers in the encoder of Transformers? To investigate this, we replace these self-attention layers with feed forward layers. In our speech recognition experiments (Wall Street Journal and Switchboard), we indeed observe an interesting result: replacing the upper self-attention layers in the encoder with feed forward layers leads to no performance drop, and even minor gains. Our experiments offer insights to how self-attention layers process the speech signal, leading to the conclusion that the lower self-attention layers of the encoder encode a sufficiently wide range of inputs, hence learning further contextual information in the upper layers is unnecessary.
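The architectural change being tested is easy to state: keep self-attention in the lower encoder layers and replace the upper layers with position-wise feed-forward blocks. A hedged PyTorch sketch follows; the layer counts, dimensions, and dropout are placeholders, not the configurations used in the experiments.

```python
import torch
import torch.nn as nn

class FeedForwardLayer(nn.Module):
    """Position-wise feed-forward block with residual connection and layer norm,
    i.e. a Transformer encoder layer with its self-attention removed."""
    def __init__(self, d_model=256, d_ff=1024, dropout=0.1):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(d_ff, d_model),
        )
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.norm(x + self.dropout(self.ff(x)))

def build_encoder(n_attn_layers=6, n_ff_layers=6, d_model=256, n_heads=4):
    """Lower layers: standard self-attention; upper layers: feed-forward only."""
    lower = [nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=1024,
                                        batch_first=True)
             for _ in range(n_attn_layers)]
    upper = [FeedForwardLayer(d_model) for _ in range(n_ff_layers)]
    return nn.Sequential(*lower, *upper)

encoder = build_encoder()
frames = torch.randn(2, 100, 256)   # (batch, time, features), e.g. acoustic frames
print(encoder(frames).shape)        # torch.Size([2, 100, 256])
```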
20. Learning Various Length Dependence by Dual Recurrent Neural Networks [PDF] 返回目录
Chenpeng Zhang, Shuai Li, Mao Ye, Ce Zhu, Xue Li
Abstract: Recurrent neural networks (RNNs) are widely used as a memory model for sequence-related problems. Many variants of the RNN have been proposed to solve the gradient problems of training RNNs and to process long sequences. Although some classical models have been proposed, capturing long-term dependence while responding to short-term changes remains a challenge. To address this problem, we propose a new model named Dual Recurrent Neural Networks (DuRNN). The DuRNN consists of two parts that learn the short-term dependence and progressively learn the long-term dependence. The first part is a recurrent neural network with constrained full recurrent connections to deal with short-term dependence in the sequence and generate short-term memory. The other part is a recurrent neural network with independent recurrent connections, which helps to learn long-term dependence and generate long-term memory. A selection mechanism is added between the two parts to help transfer the needed long-term information to the independent neurons. Multiple modules can be stacked to form a multi-layer model for better performance. Our contributions are: 1) a new recurrent model developed based on the divide-and-conquer strategy to learn long- and short-term dependence separately, and 2) a selection mechanism to enhance the separation and learning of different temporal scales of dependence. Both theoretical analysis and extensive experiments are conducted to validate the performance of our model, and we also conduct simple visualization experiments and ablation analyses for model interpretability. Experimental results indicate that the proposed DuRNN model can handle not only very long sequences (over 5000 time steps) but also short sequences very well. Compared with many state-of-the-art RNN models, our model demonstrates better efficiency and performance.
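The contrast drawn above is between full recurrent connections (a dense hidden-to-hidden matrix) and independent recurrent connections (one recurrent weight per neuron, in the style of IndRNN). The sketch below illustrates only those two cell types; DuRNN's selection mechanism and stacking are omitted, and all names and sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FullRecurrentCell(nn.Module):
    """Dense hidden-to-hidden connections: every unit sees every other unit.
    Suited to short-range interactions, but prone to gradient problems."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.w_ih = nn.Linear(d_in, d_hidden)
        self.w_hh = nn.Linear(d_hidden, d_hidden, bias=False)

    def forward(self, x_t, h):
        return torch.tanh(self.w_ih(x_t) + self.w_hh(h))

class IndependentRecurrentCell(nn.Module):
    """Element-wise recurrent weights: each unit only feeds back into itself,
    which makes long-range credit assignment easier."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.w_ih = nn.Linear(d_in, d_hidden)
        self.u = nn.Parameter(torch.ones(d_hidden) * 0.9)  # per-unit recurrent weight

    def forward(self, x_t, h):
        return torch.relu(self.w_ih(x_t) + self.u * h)

# Run both cells over a toy sequence of 50 steps.
seq = torch.randn(50, 2, 8)                    # (time, batch, features)
full, indep = FullRecurrentCell(8, 16), IndependentRecurrentCell(8, 16)
h1 = h2 = torch.zeros(2, 16)
for x_t in seq:
    h1, h2 = full(x_t, h1), indep(x_t, h2)
print(h1.shape, h2.shape)                      # short-term vs. long-term memory states
```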
21. User Intent Inference for Web Search and Conversational Agents [PDF] 返回目录
Ali Ahmadvand
Abstract: User intent understanding is a crucial step in designing both conversational agents and search engines. Detecting or inferring user intent is challenging, since user utterances or queries can be short, ambiguous, and contextually dependent. To address these research challenges, my thesis work focuses on: 1) utterance topic and intent classification for conversational agents, and 2) query intent mining and classification for Web search engines, focusing on the e-commerce domain. To address the first topic, I proposed novel models that incorporate entity information and conversation-context clues to predict both the topic and the intent of the user's utterances. For the second research topic, I plan to extend existing state-of-the-art methods in Web search intent prediction to the e-commerce domain, via: 1) developing a joint learning model to predict search queries' intents and the product categories associated with them, and 2) discovering new hidden user intents. All the models will be evaluated on real queries available from a major e-commerce site search engine. The results from these studies can be leveraged to improve the performance of various tasks such as natural language understanding, query scoping, query suggestion, and ranking, resulting in an enriched user experience.
22. JointMap: Joint Query Intent Understanding For Modeling Intent Hierarchies in E-commerce Search [PDF] 返回目录
Ali Ahmadvand, Surya Kallumadi, Faizan Javed, Eugene Agichtein
Abstract: An accurate understanding of a user's query intent can help improve the performance of downstream tasks such as query scoping and ranking. In the e-commerce domain, recent work in query understanding focuses on the query to product-category mapping. But, a small yet significant percentage of queries (in our website 1.5% or 33M queries in 2019) have non-commercial intent associated with them. These intents are usually associated with non-commercial information seeking needs such as discounts, store hours, installation guides, etc. In this paper, we introduce Joint Query Intent Understanding (JointMap), a deep learning model to simultaneously learn two different high-level user intent tasks: 1) identifying a query's commercial vs. non-commercial intent, and 2) associating a set of relevant product categories in taxonomy to a product query. JointMap model works by leveraging the transfer bias that exists between these two related tasks through a joint-learning process. As curating a labeled data set for these tasks can be expensive and time-consuming, we propose a distant supervision approach in conjunction with an active learning model to generate high-quality training data sets. To demonstrate the effectiveness of JointMap, we use search queries collected from a large commercial website. Our results show that JointMap significantly improves both "commercial vs. non-commercial" intent prediction and product category mapping by 2.3% and 10% on average over state-of-the-art deep learning methods. Our findings suggest a promising direction to model the intent hierarchies in an e-commerce search engine.
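Below is a hedged sketch of the joint-learning setup described above: a shared query encoder feeding a binary commercial/non-commercial head and a multi-label product-category head, trained with a summed loss. The bag-of-tokens encoder, sizes, and loss weighting are assumptions; the abstract does not specify the actual architecture.

```python
import torch
import torch.nn as nn

class JointIntentModel(nn.Module):
    """Shared query encoder plus two task heads, so the commercial/non-commercial
    signal and the category-mapping signal can transfer to each other."""
    def __init__(self, vocab_size=30000, d_model=128, n_categories=500):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, d_model)      # simple bag-of-tokens encoder
        self.commercial_head = nn.Linear(d_model, 1)           # binary intent
        self.category_head = nn.Linear(d_model, n_categories)  # multi-label categories

    def forward(self, token_ids, offsets):
        h = self.embed(token_ids, offsets)
        return self.commercial_head(h).squeeze(-1), self.category_head(h)

model = JointIntentModel()
bce = nn.BCEWithLogitsLoss()

# Two toy queries, already mapped to token ids and packed for EmbeddingBag.
token_ids = torch.tensor([11, 42, 7, 903, 55])
offsets = torch.tensor([0, 3])                  # query 1: ids[0:3], query 2: ids[3:]
is_commercial = torch.tensor([1.0, 0.0])        # e.g. a product query vs. "store hours"
category_targets = torch.zeros(2, 500)
category_targets[0, 37] = 1.0                   # query 1 maps to category 37

commercial_logit, category_logits = model(token_ids, offsets)
loss = bce(commercial_logit, is_commercial) + bce(category_logits, category_targets)
loss.backward()
print(float(loss))
```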
23. Complex networks for event detection in heterogeneous high volume news streams [PDF] 返回目录
Iraklis Moutidis, Hywel T.P. Williams
Abstract: Detecting important events in high volume news streams is an important task for a variety of purposes. The volume and rate of online news increases the need for automated event detection methods that can operate in real time. In this paper we develop a network-based approach that makes the working assumption that important news events always involve named entities (such as persons, locations and organizations) that are linked in news articles. Our approach uses natural language processing techniques to detect these entities in a stream of news articles and then creates a time-stamped series of networks in which the detected entities are linked by co-occurrence in articles and sentences. In this prototype, weighted node degree is tracked over time and change-point detection used to locate important events. Potential events are characterized and distinguished using community detection on KeyGraphs that relate named entities and informative noun-phrases from related articles. This methodology already produces promising results and will be extended in future to include a wider variety of complex network analysis techniques.
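A hedged sketch of the graph-building step using networkx (an assumed dependency): entities detected in each article are linked by co-occurrence, edge weights count co-mentions, and weighted node degree per snapshot is the signal a change-point detector would then monitor. Entity extraction itself is stubbed out with pre-extracted lists; names and values are illustrative.

```python
from itertools import combinations
import networkx as nx

def build_snapshot(articles):
    """Build one time-stamped co-occurrence network from a batch of articles.
    `articles` is a list of entity lists (NER output, assumed already extracted)."""
    g = nx.Graph()
    for entities in articles:
        for a, b in combinations(sorted(set(entities)), 2):
            w = g[a][b]["weight"] + 1 if g.has_edge(a, b) else 1
            g.add_edge(a, b, weight=w)
    return g

# Two snapshots; the jump in "ACME Corp"'s weighted degree is the kind of
# change a change-point detector would flag as a potential event.
hour_1 = [["ACME Corp", "London"], ["Jane Doe", "Berlin"]]
hour_2 = [["ACME Corp", "London", "Jane Doe"], ["ACME Corp", "Berlin"],
          ["ACME Corp", "Jane Doe"]]

for t, batch in enumerate([hour_1, hour_2], start=1):
    g = build_snapshot(batch)
    degree = dict(g.degree(weight="weight"))     # weighted node degree per entity
    print(f"t={t}", degree.get("ACME Corp", 0))
```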
Note: the Chinese abstracts in this digest are machine-translated.