Contents
1. Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions [PDF] Abstract
4. MedLatin1 and MedLatin2: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts [PDF] Abstract
7. Exploiting Non-Taxonomic Relations for Measuring Semantic Similarity and Relatedness in WordNet [PDF] Abstract
10. Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage [PDF] Abstract
11. A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets [PDF] Abstract
20. AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss [PDF] Abstract
21. Learning aligned embeddings for semi-supervised word translation using Maximum Mean Discrepancy [PDF] Abstract
32. Examination of community sentiment dynamics due to covid-19 pandemic: a case study from Australia [PDF] Abstract
38. Towards a self-organizing pre-symbolic neural model representing sensorimotor primitives [PDF] Abstract
Abstracts
1. Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions [PDF] Back to Contents
Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson
Abstract: We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet. We present a biased view, focusing on work done by our own group, while citing related work in each area. In particular, we discuss in detail the properties of continual learning, providing engaging content, and being well-behaved -- and how to measure success in providing them. We end with a discussion of our experience and learnings, and our recommendations to the community.
2. A Step Towards Interpretable Authorship Verification [PDF] Back to Contents
Oren Halvani, Lukas Graner, Roey Regev
Abstract: A central problem that has been researched for many years in the field of digital text forensics is the question of whether two documents were written by the same author. Authorship verification (AV) is the research branch in this field that deals with this question. Over the years, research activity in the context of AV has steadily increased, which has led to a variety of approaches trying to solve this problem. Many of these approaches, however, make use of features that are related to or influenced by the topic of the documents. Therefore, it may accidentally happen that their verification results are based not on the writing style (the actual focus of AV), but on the topic of the documents. To address this problem, we propose an alternative AV approach that considers only topic-agnostic features in its classification decision. In addition, we present a post-hoc interpretation method that allows one to understand which particular features have contributed to the prediction of the proposed AV method. To evaluate the performance of our AV method, we compared it with ten competing baselines (including the current state of the art) on four challenging data sets. The results show that our approach outperforms all baselines in two cases (with a maximum accuracy of 84%), while in the other two cases it performs as well as the strongest baseline.
3. Dirichlet-Smoothed Word Embeddings for Low-Resource Settings [PDF] Back to Contents
Jakob Jungmaier, Nora Kassner, Benjamin Roth
Abstract: Nowadays, classical count-based word embeddings using positive pointwise mutual information (PPMI) weighted co-occurrence matrices have been widely superseded by machine-learning-based methods like word2vec and GloVe. But these methods are usually applied using very large amounts of text data. In many cases, however, there is not much text data available, for example for specific domains or low-resource languages. This paper revisits PPMI by adding Dirichlet smoothing to correct its bias towards rare words. We evaluate on standard word similarity data sets and compare to word2vec and the recent state of the art for low-resource settings: Positive and Unlabeled (PU) Learning for word embeddings. The proposed method outperforms PU-Learning for low-resource settings and obtains competitive results for Maltese and Luxembourgish.
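The Dirichlet smoothing described here amounts to adding a pseudo-count to every cell of the word-context co-occurrence matrix before computing PPMI, so that rare and unseen pairs are no longer mis-weighted. A minimal NumPy sketch, assuming a simple add-alpha formulation (the pseudo-count value and the toy counts are illustrative, not the paper's exact setup):

import numpy as np

def ppmi_dirichlet(counts, alpha=0.1):
    """PPMI from a word-context co-occurrence matrix with
    Dirichlet (add-alpha) smoothing of every cell."""
    c = counts.astype(float) + alpha           # smooth rare (and zero) counts
    total = c.sum()
    p_wc = c / total                           # joint probabilities
    p_w = c.sum(axis=1, keepdims=True) / total
    p_c = c.sum(axis=0, keepdims=True) / total
    pmi = np.log(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0.0)                # keep positive PMI only

# toy 3-word x 4-context count matrix
counts = np.array([[4, 0, 1, 0],
                   [2, 3, 0, 1],
                   [0, 0, 5, 2]])
print(ppmi_dirichlet(counts))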
4. MedLatin1 and MedLatin2: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts [PDF] Back to Contents
Silvia Corbara, Alejandro Moreo, Fabrizio Sebastiani, Mirko Tavoni
Abstract: We present and make available MedLatin1 and MedLatin2, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatin1 and MedLatin2 consist of 294 and 30 curated texts, respectively, labelled by author, with MedLatin1 texts being of an epistolary nature and MedLatin2 texts consisting of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification.
5. Shared Task on Evaluating Accuracy in Natural Language Generation [PDF] Back to Contents
Ehud Reiter, Craig Thomson
Abstract: We propose a shared task on methodologies and algorithms for evaluating the accuracy of generated texts. Participants will measure the accuracy of basketball game summaries produced by NLG systems from basketball box score data.
6. ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion [PDF] Back to Contents
Bingning Wang, Ting Yao, Qi Zhang, Jingfang Xu, Xiaochuan Wang
Abstract: This paper presents ReCO, a human-curated Chinese Reading Comprehension dataset on Opinion. The questions in ReCO are opinion-based queries issued to a commercial search engine. The passages are provided by crowdworkers who extract the supporting snippet from the retrieved documents. Finally, an abstractive yes/no/uncertain answer is given by the crowdworkers. The release of ReCO consists of 300k questions, which to our knowledge makes it the largest dataset in Chinese reading comprehension. A prominent characteristic of ReCO is that in addition to the original context paragraph, we also provide the supporting evidence that can be directly used to answer the question. Quality analysis demonstrates the challenge of ReCO, which requires various types of reasoning skills, such as causal inference and logical reasoning. Current QA models that perform very well on many question answering problems, such as BERT, achieve only 77% accuracy on this dataset, a large margin behind the nearly 92% performance of humans, indicating that ReCO presents a good challenge for machine reading comprehension. The code and datasets are freely available at this https URL.
7. Exploiting Non-Taxonomic Relations for Measuring Semantic Similarity and Relatedness in WordNet [PDF] Back to Contents
Mohannad AlMousa, Rachid Benlamri, Richard Khoury
Abstract: Various applications in the areas of computational linguistics and artificial intelligence employ semantic similarity to solve challenging tasks, such as word sense disambiguation, text classification, information retrieval, machine translation, and document clustering. Previous work on semantic similarity followed a mono-relational approach using mostly the taxonomic relation "ISA". This paper explores the benefits of using all types of non-taxonomic relations in large linked data, such as WordNet knowledge graph, to enhance existing semantic similarity and relatedness measures. We propose a holistic poly-relational approach based on a new relation-based information content and non-taxonomic-based weighted paths to devise a comprehensive semantic similarity and relatedness measure. To demonstrate the benefits of exploiting non-taxonomic relations in a knowledge graph, we used three strategies to deploy non-taxonomic relations at different granularity levels. We conducted experiments on four well-known gold standard datasets, and the results demonstrated the robustness and scalability of the proposed semantic similarity and relatedness measure, which significantly improves existing similarity measures.
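For reference, the taxonomic (IS-A) baseline that this paper moves beyond, and the kind of non-taxonomic WordNet links it exploits, can be inspected with NLTK's WordNet interface. A small sketch (requires the WordNet data via nltk.download('wordnet'); the synset choices are illustrative only):

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

car, fuel = wn.synset('car.n.01'), wn.synset('fuel.n.01')

# taxonomic (IS-A) path similarity: the mono-relational baseline
print(car.path_similarity(fuel))

# non-taxonomic relations WordNet also exposes, e.g. meronymy,
# which a poly-relational measure can weight along paths
print(car.part_meronyms()[:3])   # parts of a car
print(fuel.hypernyms())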
8. Clinical Predictive Keyboard using Statistical and Neural Language Modeling [PDF] Back to Contents
John Pavlopoulos, Panagiotis Papapetrou
Abstract: A language model can be used to predict the next word during authoring, to correct spelling, or to accelerate writing (e.g., in SMS or emails). Language models, however, have only been applied on a very small scale to assist physicians during authoring (e.g., of discharge summaries or radiology reports). Yet besides assisting the physician, computer-based systems that expedite the patient's discharge also help to decrease hospital infections. We employed statistical and neural language modeling to predict the next word of a clinical text and assessed all the models in terms of accuracy and keystroke discount on two datasets of radiology reports. We show that a neural language model can achieve as high as 51.3% accuracy on radiology reports (one out of two words predicted correctly). We also show that even when the models are employed only for frequent words, the physician can save valuable time.
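The statistical side of such a predictive keyboard can be as simple as an n-gram language model over clinical text. A toy bigram sketch, assuming word-level prediction (the miniature corpus is an invented placeholder, not clinical data):

from collections import Counter, defaultdict

corpus = "no acute cardiopulmonary process . no acute fracture .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(prev):
    # most frequent continuation; a real system would back off
    # to lower-order n-grams or use a neural LM instead
    return bigrams[prev].most_common(1)[0][0] if prev in bigrams else None

print(predict("no"))     # -> 'acute'
print(predict("acute"))  # -> 'cardiopulmonary' (tie broken by first occurrence)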
9. Efficient text generation of user-defined topic using generative adversarial networks [PDF] Back to Contents
Chenhan Yuan, Yi-chin Huang, Cheng-Hung Tsai
Abstract: This study focuses on efficient text generation using generative adversarial networks (GANs). Assuming that the goal is to generate a paragraph with a user-defined topic and sentimental tendency, conventionally the whole network has to be re-trained to obtain new results each time a user changes the topic. This would be time-consuming and impractical. Therefore, we propose a User-Defined GAN (UD-GAN) with two-level discriminators to solve this problem. The first discriminator aims to guide the generator to learn paragraph-level information and sentence syntactic structure, and is constructed from multiple LSTMs. The second one copes with higher-level information, such as the user-defined sentiment and topic for text generation. Cosine similarity based on TF-IDF, together with a length penalty, is adopted to determine the relevance of the topic. The second discriminator is then re-trained with the generator whenever the topic or sentiment for text generation is modified. System evaluations were conducted to compare the performance of the proposed method with other GAN-based methods. The objective results show that the proposed method generates texts in less time than the others, and that the generated text is related to the user-defined topic and sentiment. We will further investigate the possibility of incorporating more detailed paragraph information, such as semantics, into text generation to enhance the result.
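The topic-relevance check (TF-IDF cosine similarity combined with a length penalty) can be sketched with scikit-learn; the exact penalty form below is an assumption for illustration, not the paper's formula:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

topic = "electric cars and battery technology"
generated = "the new electric car uses a larger battery pack"

vec = TfidfVectorizer()
tfidf = vec.fit_transform([topic, generated])
sim = cosine_similarity(tfidf[0], tfidf[1])[0, 0]   # TF-IDF cosine relevance

# illustrative length penalty: discount texts shorter than a target length
target_len, length = 10, len(generated.split())
penalty = min(1.0, length / target_len)
print(sim * penalty)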
10. Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage [PDF] Back to Contents
Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, David Dov, Guoyin Wang, Ricardo Henao, Lawrence Carin
Abstract: Small and imbalanced datasets commonly seen in healthcare represent a challenge when training classifiers based on deep learning models. So motivated, we propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining). Specifically, (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim to reduce overfitting and model size when working on small datasets. As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent. Experiments demonstrate that our approach can outperform several strong baseline classifiers by a significant margin of 4.3% in terms of macro F1 score. The code for this project is publicly available at \url{this https URL}.
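For the imbalance problem, a "class balanced loss" commonly means the effective-number-of-samples weighting of Cui et al. (2019), which would then scale a per-class cross-entropy term. A sketch of the per-class weights under that assumed formulation:

import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Weights ~ (1 - beta) / (1 - beta^n_c), normalized to sum
    to the number of classes; rare classes get larger weights."""
    counts = np.asarray(counts, dtype=float)
    effective_num = 1.0 - np.power(beta, counts)
    w = (1.0 - beta) / effective_num
    return w / w.sum() * len(counts)

# e.g. hypothetical non-urgent / medium / urgent message counts
print(class_balanced_weights([5000, 800, 120]))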
11. A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets [PDF] Back to Contents
Chengchang Zeng, Shaobo Li, Qin Li, Jie Hu, Jianjun Hu
Abstract: Machine Reading Comprehension (MRC) is a challenging NLP research field with wide real-world applications. The great progress of this field in recent years is mainly due to the emergence of large-scale datasets and deep learning. At present, a lot of MRC models have already surpassed human performance on many datasets, despite the obvious giant gap between existing MRC models and genuine human-level reading comprehension. This shows the need to improve existing datasets, evaluation metrics, and models to move MRC models toward 'real' understanding. To address the current lack of a comprehensive survey of existing MRC tasks, evaluation metrics, and datasets, herein, (1) we analyzed 57 MRC tasks and datasets and proposed a more precise classification method of MRC tasks with 4 different attributes; (2) we summarized 9 evaluation metrics of MRC tasks and (3) 7 attributes and 10 characteristics of MRC datasets; and (4) we also discussed some open issues in MRC research and highlighted some future research directions. In addition, to help the community, we have collected, organized, and published our data on a companion website (this https URL) where MRC researchers can directly access each MRC dataset, papers, and baseline projects, and browse the leaderboard.
12. Labeling Explicit Discourse Relations using Pre-trained Language Models [PDF] Back to Contents
Murathan Kurfalı
Abstract: Labeling explicit discourse relations is one of the most challenging sub-tasks of shallow discourse parsing, where the goal is to identify the discourse connectives and the boundaries of their arguments. The state-of-the-art models achieve slightly above 45% F-score by using hand-crafted features. The current paper investigates the efficacy of pre-trained language models in this task. We find that the pre-trained language models, when fine-tuned, are powerful enough to replace the linguistic features. We evaluate our model on PDTB 2.0 and report state-of-the-art results in the extraction of the full relation. This is the first time that a model outperforms the knowledge-intensive models without employing any linguistic features.
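Fine-tuning a pre-trained language model for this kind of labeling typically follows the standard sequence-classification recipe. A hedged Hugging Face sketch of the classification setup (the model name and the two-way label set are placeholders, not the paper's configuration; an actual run would fine-tune on PDTB examples):

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2)  # e.g. connective vs. not (placeholder)

# candidate connective "so" in context (illustrative example sentence)
enc = tok("It was raining, so the match was cancelled.", return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
print(logits.softmax(-1))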
13. AdvAug: Robust Adversarial Augmentation for Neural Machine Translation [PDF] Back to Contents
Yong Cheng, Lu Jiang, Wolfgang Macherey, Jacob Eisenstein
Abstract: In this paper, we propose a new adversarial augmentation method for Neural Machine Translation (NMT). The main idea is to minimize the vicinal risk over virtual sentences sampled from two vicinity distributions, of which the crucial one is a novel vicinity distribution for adversarial sentences that describes a smooth interpolated embedding space centered around observed training sentence pairs. We then discuss our approach, AdvAug, to train NMT models using the embeddings of virtual sentences in sequence-to-sequence learning. Experiments on Chinese-English, English-French, and English-German translation benchmarks show that AdvAug achieves significant improvements over the Transformer (up to 4.9 BLEU points), and substantially outperforms other data augmentation techniques (e.g. back-translation) without using extra corpora.
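Sampling a virtual sentence from the vicinity of two observed sentences reduces, in its simplest mixup-style form, to interpolating their embedding sequences. A minimal sketch of that interpolation step (the shapes and the Beta parameter are illustrative assumptions, not the paper's exact vicinity distribution):

import numpy as np

def virtual_embedding(emb_a, emb_b, alpha=0.2):
    """Sample a virtual sentence on the segment between two observed
    sentences' embedding sequences (padded to the same length)."""
    lam = np.random.beta(alpha, alpha)
    return lam * emb_a + (1.0 - lam) * emb_b

emb_a = np.random.randn(12, 512)  # 12 tokens x 512-dim embeddings
emb_b = np.random.randn(12, 512)
print(virtual_embedding(emb_a, emb_b).shape)  # (12, 512)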
14. The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2 [PDF] Back to Contents
Assaf Singer, Katharina Kann
Abstract: We describe the NYU-CUBoulder systems for the SIGMORPHON 2020 Task 0 on typologically diverse morphological inflection and Task 2 on unsupervised morphological paradigm completion. The former consists of generating morphological inflections from a lemma and a set of morphosyntactic features describing the target form. The latter requires generating entire paradigms for a set of given lemmas from raw text alone. We model morphological inflection as a sequence-to-sequence problem, where the input is the sequence of the lemma's characters with morphological tags, and the output is the sequence of the inflected form's characters. First, we apply a transformer model to the task. Second, as inflected forms share most characters with the lemma, we further propose a pointer-generator transformer model to allow easy copying of input characters. Our best performing system for Task 0 is placed 6th out of 23 systems. We further use our inflection systems as subcomponents of approaches for Task 2. Our best performing system for Task 2 is the 2nd best out of 7 submissions.
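A pointer-generator head mixes a vocabulary distribution with a copy distribution induced by attention over the input characters, which is what makes copying lemma characters easy. A NumPy sketch of the mixing step with assumed toy quantities:

import numpy as np

def pointer_generator_mix(p_vocab, attn, src_ids, p_gen, vocab_size):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on
    source positions holding w), letting the model copy input characters."""
    p_copy = np.zeros(vocab_size)
    np.add.at(p_copy, src_ids, attn)   # scatter attention mass onto char ids
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy

vocab_size = 6
p_vocab = np.full(vocab_size, 1.0 / vocab_size)
attn = np.array([0.7, 0.2, 0.1])       # attention over 3 source characters
src_ids = np.array([2, 4, 2])          # their vocabulary ids
print(pointer_generator_mix(p_vocab, attn, src_ids, 0.6, vocab_size))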
15. Enriching Large-Scale Eventuality Knowledge Graph with Entailment Relations [PDF] Back to Contents
Changlong Yu, Hongming Zhang, Yangqiu Song, Wilfred Ng, Lifeng Shang
Abstract: Computational and cognitive studies suggest that the abstraction of eventualities (activities, states, and events) is crucial for humans to understand daily eventualities. In this paper, we propose a scalable approach to model the entailment relations between eventualities ("eat an apple" entails "eat fruit"). As a result, we construct a large-scale eventuality entailment graph (EEG), which has 10 million eventuality nodes and 103 million entailment edges. Detailed experiments and analysis demonstrate the effectiveness of the proposed approach and the quality of the resulting knowledge graph. Our datasets and code are available at this https URL.
16. The Importance of Category Labels in Grammar Induction with Child-directed Utterances [PDF] Back to Contents
Lifeng Jin, William Schuler
Abstract: Recent progress in grammar induction has shown that grammar induction is possible without explicit assumptions of language-specific knowledge. However, evaluation of induced grammars usually has ignored phrasal labels, an essential part of a grammar. Experiments in this work using a labeled evaluation metric, RH, show that linguistically motivated predictions about grammar sparsity and use of categories can only be revealed through labeled evaluation. Furthermore, depth-bounding as an implementation of human memory constraints in grammar inducers is still effective with labeled evaluation on multilingual transcribed child-directed utterances.
17. MDR Cluster-Debias: A Nonlinear Word Embedding Debiasing Pipeline [PDF] Back to Contents
Yuhao Du, Kenneth Joseph
Abstract: Existing methods for debiasing word embeddings often do so only superficially, in that words that are stereotypically associated with, e.g., a particular gender in the original embedding space can still be clustered together in the debiased space. However, there has yet to be a study that explores why this residual clustering exists, and how it might be addressed. The present work fills this gap. We identify two potential reasons for which residual bias exists and develop a new pipeline, MDR Cluster-Debias, to mitigate this bias. We explore the strengths and weaknesses of our method, finding that it significantly outperforms other existing debiasing approaches on a variety of upstream bias tests but achieves limited improvement on decreasing gender bias in a downstream task. This indicates that word embeddings encode gender bias in still other ways, not necessarily captured by upstream tests.
18. Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [PDF] Back to Contents
Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-wei Chang, Xuanjing Huang
Abstract: Although neural networks have achieved prominent performance on many natural language processing (NLP) tasks, they are vulnerable to adversarial examples. In this paper, we propose Dirichlet Neighborhood Ensemble (DNE), a randomized smoothing method for training a robust model to defend against substitution-based attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data. In this way, the model is robust to adversarial attacks while maintaining performance on the original clean data. DNE is agnostic to the network architecture and scales to large models for NLP applications. We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
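Sampling an embedding from the convex hull spanned by a word and its synonyms is exactly a Dirichlet-weighted average of their vectors, which presumably motivates the method's name. A minimal sketch with toy vectors (the concentration parameter is an illustrative assumption):

import numpy as np

def dirichlet_neighborhood_sample(vectors, alpha=1.0):
    """Sample a point in the convex hull of a word's embedding
    and its synonyms' embeddings (the rows of `vectors`)."""
    weights = np.random.dirichlet(alpha * np.ones(len(vectors)))
    return weights @ vectors   # convex combination: weights are nonnegative, sum to 1

# toy: embedding of a word plus two synonyms
vecs = np.random.randn(3, 300)
virtual = dirichlet_neighborhood_sample(vecs)
print(virtual.shape)  # (300,)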
19. Studying Attention Models in Sentiment Attitude Extraction Task [PDF] Back to Contents
Nicolay Rusnachenko, Natalia Loukachevitch
Abstract: In the sentiment attitude extraction task, the aim is to identify <<attitudes>> -- sentiment relations between entities mentioned in text. In this paper, we provide a study on attention-based context encoders in the sentiment attitude extraction task. For this task, we adapt attentive context encoders of two types: (i) feature-based; (ii) self-based. Our experiments with a corpus of Russian analytical texts, RuSentRel, illustrate that the models trained with attentive encoders outperform ones that were trained without them and achieve a 1.5-5.9% increase in F1. We also provide an analysis of attention weight distributions in dependence on the term type.
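As a generic illustration of attentive context encoding (not the paper's exact architecture), dot-product attention weights the context-term encodings before pooling; the resulting weights are what an analysis like this one inspects per term type:

import numpy as np

def attention_pool(H, q):
    """Weight context encodings H (n x d) by similarity to a query q (d,),
    then pool; the weights themselves are the object of analysis."""
    scores = H @ q
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over context positions
    return weights, weights @ H

H = np.random.randn(5, 64)                 # 5 context-term encodings (toy)
q = np.random.randn(64)                    # e.g. an entity-pair query (assumed)
w, pooled = attention_pool(H, q)
print(w.round(3), pooled.shape)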
20. AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss [PDF] Back to Contents
Mahmoud Daif, Shunsuke Kitada, Hitoshi Iyatomi
Abstract: Classical and some deep learning techniques for Arabic text classification often depend on complex morphological analysis, word segmentation, and hand-crafted feature engineering. These could be eliminated by using character-level features. We propose a novel end-to-end Arabic document classification framework, Arabic document image-based classifier (AraDIC), inspired by the work on image-based character embeddings. AraDIC consists of an image-based character encoder and a classifier. They are trained in an end-to-end fashion using the class balanced loss to deal with the long-tailed data distribution problem. To evaluate the effectiveness of AraDIC, we created and published two datasets, the Arabic Wikipedia title (AWT) dataset and the Arabic poetry (AraP) dataset. To the best of our knowledge, this is the first image-based character embedding framework addressing the problem of Arabic text classification. We also present the first deep learning-based text classifier widely evaluated on modern standard Arabic, colloquial Arabic and classical Arabic. AraDIC shows performance improvement over classical and deep learning baselines by 12.29% and 23.05% for the micro and macro F-score, respectively.
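Image-based character embeddings start by rendering each character to a small bitmap that a CNN-style encoder then consumes, sidestepping morphological analysis entirely. A hedged Pillow sketch (the bitmap size and font are assumptions; a real Arabic system would load an Arabic-capable font rather than the default, which may render a placeholder glyph):

from PIL import Image, ImageDraw, ImageFont
import numpy as np

def char_to_image(ch, size=24):
    """Render one character to a size x size grayscale array,
    the kind of input an image-based character encoder expects."""
    img = Image.new("L", (size, size), color=0)
    draw = ImageDraw.Draw(img)
    draw.text((2, 2), ch, fill=255, font=ImageFont.load_default())
    return np.asarray(img, dtype=np.float32) / 255.0

print(char_to_image("ض").shape)  # (24, 24)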
21. Learning aligned embeddings for semi-supervised word translation using Maximum Mean Discrepancy [PDF] Back to Contents
Antonio H. O. Fonseca, David van Dijk
Abstract: Word translation is an integral part of language translation. In machine translation, each language is considered a domain with its own word embedding. The alignment between word embeddings allows linking semantically equivalent words in multilingual contexts. Moreover, it offers a way to infer cross-lingual meaning for words without a direct translation. Current methods for word embedding alignment are either supervised, i.e., they require known word pairs, or learn a cross-domain transformation on fixed embeddings in an unsupervised way. Here we propose an end-to-end approach for word embedding alignment that does not require known word pairs. Our method, termed Word Alignment through MMD (WAM), learns embeddings that are aligned during sentence translation training using a localized Maximum Mean Discrepancy (MMD) constraint between the embeddings. We show that our method outperforms not only unsupervised methods, but also supervised methods that train on known word translations.
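An MMD constraint penalizes the distance between the distributions of source- and target-language embeddings. A NumPy sketch of the (biased) squared-MMD estimator under an assumed Gaussian kernel, which is near zero when the two distributions align:

import numpy as np

def mmd_rbf(X, Y, sigma=1.0):
    """Squared Maximum Mean Discrepancy between samples X (n x d)
    and Y (m x d) under a Gaussian (RBF) kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

X = np.random.randn(100, 50)        # e.g. source-language embeddings (toy)
Y = np.random.randn(100, 50) + 0.5  # target-language embeddings, shifted
print(mmd_rbf(X, Y))                # positive: distributions differ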
22. SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection [PDF] Back to Contents
Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
Abstract: A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.
23. Seq2Seq and Joint Learning Based Unix Command Line Prediction System [PDF] 返回目录
Thoudam Doren Singh, Abdullah Faiz Ur Rahman Khilji, Divyansha, Apoorva Vikram Singh, Surmila Thokchom, Sivaji Bandyopadhyay
Abstract: Despite being an open-source operating system pioneered in the early 90s, UNIX-based platforms have not been able to garner an overwhelming reception from amateur end users. One of the rationales for the limited popularity of UNIX-based systems is their steep learning curve, due to the extensive use of a command line interface instead of the usual interactive graphical user interface. In past years, most attempts to address this concern have centered around the notion of utilizing the user's long-term command history to predict the successive command, and the approaches built on this notion predominantly rely on probabilistic inference models. The techniques employed in the past, however, have not been competent enough to address the predicament as well as anticipated. Instead of deploying the usual mechanisms of recommendation systems, we employ a simple yet novel Seq2seq approach, leveraging continuous representations of a self-curated exhaustive Knowledge Base (KB) to enhance the embeddings employed in the model. This work describes an assistive, adaptive and dynamic way of enhancing UNIX command line prediction systems. Our experiments show that the model achieves accuracy surpassing the mixture of other techniques and adaptive command line interface mechanisms acclaimed in the past.
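As a concrete illustration of next-command prediction, here is a minimal sketch that frames the task as classification over a command vocabulary with a GRU encoder; the paper's actual system is a Seq2seq model enhanced with knowledge-base embeddings, so the architecture, sizes, and toy vocabulary below are simplifying assumptions.

```python
import torch
import torch.nn as nn

class NextCommandPredictor(nn.Module):
    """GRU over the user's recent command history; predicts the next command.

    A simplified stand-in for the paper's Seq2seq + knowledge-base model:
    prediction is framed here as classification over a fixed command vocabulary.
    """
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, history):          # history: (batch, seq_len) command ids
        _, h = self.gru(self.embed(history))
        return self.out(h[-1])           # logits over the next command

# Toy vocabulary and usage (illustrative only):
vocab = {"ls": 0, "cd": 1, "grep": 2, "git": 3}
model = NextCommandPredictor(len(vocab))
logits = model(torch.tensor([[1, 0, 2]]))  # history "cd ls grep" -> next command
```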
24. Named Entity Extraction with Finite State Transducers [PDF] 返回目录
Diego Alexander Huérfano Villalba, Elizabeth León Guzmán
Abstract: We describe a named entity tagging system that requires minimal linguistic knowledge and can be applied to more target languages without substantial changes. The system is based on the ideas of Brill's tagger, which makes it really simple. Using supervised machine learning, we construct a series of automatons (or transducers) in order to tag a given text. The final model is composed entirely of automatons, and it requires linear time for tagging. It was tested with the Spanish data set provided in CoNLL-$2002$, attaining an overall $F_{\beta = 1}$ measure of $60\%$. Also, we present an algorithm for the construction of the final transducer used to encode all the learned contextual rules.
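The flavor of such contextual rules can be sketched as a single left-to-right pass over the tokens, which is what gives the linear tagging time; the rule format and predicates below are illustrative stand-ins, not the learned transducers themselves.

```python
def apply_rules(tokens, rules):
    """tokens: list of words; rules: list of (predicate, tag) pairs.

    Each token starts with the default tag 'O'; the first matching rule fires.
    A Brill-style rule set compiled into a transducer behaves analogously.
    """
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        prev = tags[i - 1] if i > 0 else "<s>"
        for predicate, tag in rules:
            if predicate(tok, prev):
                tags[i] = tag
                break
    return tags

# Two toy contextual rules for person names:
rules = [
    (lambda tok, prev: prev in ("B-PER", "I-PER") and tok[0].isupper(), "I-PER"),
    (lambda tok, prev: tok.istitle() and prev == "<s>", "B-PER"),
]
print(apply_rules(["Gabriel", "García", "Márquez", "wrote", "novels"], rules))
# ['B-PER', 'I-PER', 'I-PER', 'O', 'O']
```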
25. Memory Transformer [PDF] 返回目录
Mikhail S. Burtsev, Grigory V. Sapunov
Abstract: Transformer-based models have achieved state-of-the-art results in many natural language processing (NLP) tasks. The self-attention architecture allows us to combine information from all elements of a sequence into context-aware representations. However, all-to-all attention severely hurts the scaling of the model to large sequences. Another limitation is that information about the context is stored in the same element-wise representations. This makes the processing of properties related to the sequence as a whole more difficult. Adding trainable memory to selectively store local as well as global representations of a sequence is a promising direction to improve the Transformer model. Memory-augmented neural networks (MANNs) extend traditional neural architectures with general-purpose memory for representations. MANNs have demonstrated the capability to learn simple algorithms like Copy or Reverse and can be successfully trained via backpropagation on diverse tasks from question answering to language modeling, outperforming RNNs and LSTMs of comparable complexity. In this work, we propose and study two extensions of the Transformer baseline: (1) adding memory tokens to store non-local representations, and (2) creating a memory bottleneck for the global information. We evaluate these memory-augmented Transformers on a machine translation task and demonstrate that memory size positively correlates with model performance. Attention patterns over the memory suggest that it improves the model's ability to process a global context. We expect that the application of Memory Transformer architectures to the tasks of language modeling, reading comprehension, and text summarization, as well as other NLP tasks that require the processing of long contexts, will contribute to solving challenging problems of natural language understanding and generation.
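A minimal sketch of extension (1), prepending trainable memory tokens to the input sequence before standard self-attention, might look as follows; the dimensions, layer counts, and readout are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class MemoryTransformerEncoder(nn.Module):
    """Prepends trainable [mem] tokens to the input, in the spirit of the
    paper's extension (1); all sizes here are illustrative assumptions."""
    def __init__(self, d_model=256, n_mem=8, n_layers=4, n_heads=8):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.n_mem = n_mem

    def forward(self, x):                      # x: (batch, seq, d_model)
        mem = self.memory.expand(x.size(0), -1, -1)
        h = self.encoder(torch.cat([mem, x], dim=1))
        # Memory slots can be read off separately from the token states.
        return h[:, :self.n_mem], h[:, self.n_mem:]

mem_states, token_states = MemoryTransformerEncoder()(torch.randn(2, 16, 256))
```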
26. Sarcasm Detection in Tweets with BERT and GloVe Embeddings [PDF] 返回目录
Akshay Khatri, Pranav P, Dr. Anand Kumar M
Abstract: Sarcasm is a form of communication in which the person states the opposite of what they actually mean. It is ambiguous in nature. In this paper, we propose using machine learning techniques with BERT and GloVe embeddings to detect sarcasm in tweets. The dataset is preprocessed before extracting the embeddings. The proposed model also uses the context to which the user is reacting, along with the actual response.
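As a rough sketch of one half of this pipeline, the snippet below extracts BERT sentence vectors and fits a simple classifier on them; the checkpoint name, classifier choice, and toy data are assumptions, and the paper's full model additionally uses GloVe features and the conversational context.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Tokenize, run BERT, and keep the [CLS] vector of each tweet.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    return out.last_hidden_state[:, 0].numpy()

tweets = ["Oh great, another Monday.", "I love sunny days."]
labels = [1, 0]                                   # 1 = sarcastic (toy labels)
clf = LogisticRegression().fit(embed(tweets), labels)
```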
27. Improving Query Safety at Pinterest [PDF] 返回目录
Abhijit Mahabal, Yinrui Li, Rajat Raina, Daniel Sun, Revati Mahajan, Jure Leskovec
Abstract: Query recommendation in search engines is a double-edged sword, with undeniable benefits but also potential for harm. Identifying unsafe queries is necessary to protect users from inappropriate query suggestions. However, identifying these is non-trivial because of the linguistic diversity resulting from large vocabularies, social-group-specific slang and typos, and because the inappropriateness of a term depends on the context. Here we formulate the problem as query-set expansion, where we are given a small and potentially biased seed set and the aim is to identify a diverse set of semantically related queries. We present PinSets, a system for query-set expansion, which applies a simple yet powerful mechanism to search user sessions, expanding a tiny seed set into thousands of related queries at nearly perfect precision, deep into the tail, along with explanations that are easy to interpret. PinSets owes its high-quality expansion to using a hybrid of textual and behavioral techniques (i.e., treating queries both as compositional and as black boxes). Experiments show that, for the domain of drugs-related queries, PinSets expands 20 seed queries into 15,670 positive training examples at over 99\% precision. The generated expansions have diverse vocabulary and correctly handle words with ambiguous safety. PinSets decreased unsafe query suggestions at Pinterest by 90\%.
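The behavioral half of such a system can be sketched as simple session co-occurrence scoring, as below; the scoring rule and toy sessions are illustrative assumptions, and PinSets additionally uses compositional (textual) signals and precision controls.

```python
from collections import Counter

def expand_seed_set(seed, sessions, top_k=10):
    """Rank candidate queries by how often they co-occur in user sessions
    with any seed query; a simplified behavioral stand-in for PinSets."""
    seed = set(seed)
    scores = Counter()
    for session in sessions:                 # session: list of query strings
        if seed & set(session):
            for q in session:
                if q not in seed:
                    scores[q] += 1
    return [q for q, _ in scores.most_common(top_k)]

sessions = [
    ["healthy dinner", "meal prep ideas", "quick recipes"],
    ["meal prep ideas", "lunch boxes"],
]
print(expand_seed_set(["meal prep ideas"], sessions))
```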
28. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations [PDF] 返回目录
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
Abstract: We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. We set a new state of the art on both the 100 hour subset of Librispeech as well as on TIMIT phoneme recognition. When lowering the amount of labeled data to one hour, our model outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 5.7/10.1 WER on the noisy/clean test sets of Librispeech. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. Fine-tuning on all of Librispeech achieves 1.9/3.5 WER using a simple baseline model architecture. We will release code and models.
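The contrastive objective can be sketched as an InfoNCE-style loss in which each context vector must pick out its true quantized latent from sampled distractors; the temperature, similarity measure, and negative-sampling scheme below are illustrative assumptions rather than the exact wav2vec 2.0 recipe.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(context, quantized, negatives, temperature=0.1):
    """Each context vector must identify its true quantized latent among
    distractors. context, quantized: (batch, dim); negatives: (batch, n_neg, dim)."""
    pos = F.cosine_similarity(context, quantized, dim=-1)                 # (batch,)
    neg = F.cosine_similarity(context.unsqueeze(1), negatives, dim=-1)    # (batch, n_neg)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1) / temperature
    targets = torch.zeros(len(logits), dtype=torch.long)                  # index 0 = positive
    return F.cross_entropy(logits, targets)

ctx, q = torch.randn(4, 256), torch.randn(4, 256)
negs = torch.randn(4, 10, 256)
print(contrastive_loss(ctx, q, negs))
```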
29. SqueezeBERT: What can computer vision teach NLP about efficient neural networks? [PDF] 返回目录
Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer
Abstract: Humans read and write hundreds of billions of messages every day. Further, due to the availability of large datasets, large computing systems, and better neural network models, natural language processing (NLP) technology has made significant strides in understanding, proofreading, and organizing these messages. Thus, there is a significant opportunity to deploy NLP in myriad applications to help web users, social networks, and businesses. In particular, we consider smartphones and other mobile devices as crucial platforms for deploying NLP models at scale. However, today's highly-accurate NLP neural network models such as BERT and RoBERTa are extremely computationally expensive, with BERT-base taking 1.7 seconds to classify a text snippet on a Pixel 3 smartphone. In this work, we observe that methods such as grouped convolutions have yielded significant speedups for computer vision networks, but many of these techniques have not been adopted by NLP neural network designers. We demonstrate how to replace several operations in self-attention layers with grouped convolutions, and we use this technique in a novel network architecture called SqueezeBERT, which runs 4.3x faster than BERT-base on the Pixel 3 while achieving competitive accuracy on the GLUE test set. The SqueezeBERT code will be released.
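The core trick can be illustrated by noting that a position-wise fully-connected layer is equivalent to a 1x1 convolution, which can then be grouped to cut parameters and FLOPs; the group count below is an arbitrary example, not SqueezeBERT's configuration.

```python
import torch
import torch.nn as nn

d = 768
dense = nn.Linear(d, d)
grouped = nn.Conv1d(d, d, kernel_size=1, groups=4)   # group count is illustrative

x = torch.randn(2, 128, d)                 # (batch, seq, features)
y_dense = dense(x)
y_grouped = grouped(x.transpose(1, 2)).transpose(1, 2)  # Conv1d wants (B, C, L)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(dense), params(grouped))      # grouped uses roughly 1/4 the weights
```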
30. Limits to Depth Efficiencies of Self-Attention [PDF] 返回目录
Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua
Abstract: Self-attention architectures, which are rapidly pushing the frontier in natural language processing, demonstrate a surprising depth-inefficient behavior: Empirical signals indicate that increasing the internal representation (network width) is just as useful as increasing the number of self-attention layers (network depth). In this paper, we theoretically study the interplay between depth and width in self-attention, and shed light on the root of the above phenomenon. We invalidate the seemingly plausible hypothesis by which widening is as effective as deepening for self-attention, and show that in fact stacking self-attention layers is so effective that it quickly saturates a capacity of the network width. Specifically, we pinpoint a "depth threshold" that is logarithmic in $d_x$, the network width: $L_{\textrm{th}}=\log_{3}(d_x)$. For networks of depth that is below the threshold, we establish a double-exponential depth-efficiency of the self-attention operation, while for depths over the threshold we show that depth-inefficiency kicks in. Our predictions strongly accord with extensive empirical ablations in Kaplan et al. (2020), accounting for the different behaviors in the two depth-(in)efficiency regimes. By identifying network width as a limiting factor, our analysis indicates that solutions for dramatically increasing the width can facilitate the next leap in self-attention expressivity.
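As a quick worked example of the depth threshold $L_{\textrm{th}}=\log_{3}(d_x)$, the snippet below evaluates it for some common network widths.

```python
import math

# Depth threshold L_th = log_3(d_x) for common Transformer widths: the point
# beyond which, per the paper, extra depth stops paying off double-exponentially.
for d_x in (256, 768, 1024, 4096):
    print(d_x, round(math.log(d_x, 3), 2))
# 256 -> 5.05, 768 -> 6.05, 1024 -> 6.31, 4096 -> 7.57
```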
31. Deep Job Understanding at LinkedIn [PDF] 返回目录
Shan Li, Baoxu Shi, Jaewon Yang, Ji Yan, Shuai Wang, Fei Chen, Qi He
Abstract: As the world's largest professional network, LinkedIn wants to create economic opportunity for everyone in the global workforce. One of its most critical missions is matching jobs with professionals. Improving job targeting accuracy and hiring efficiency aligns with LinkedIn's Member First motto. To achieve those goals, we need to understand unstructured job postings with noisy information. We applied deep transfer learning to create domain-specific job understanding models. After this, jobs are represented by professional entities, including titles, skills, companies, and assessment questions. To continuously improve LinkedIn's job understanding ability, we designed an expert feedback loop in which we integrated job understanding models into LinkedIn's products to collect job posters' feedback. In this demonstration, we present LinkedIn's job posting flow and demonstrate how the integrated deep job understanding work improves job posters' satisfaction and provides significant metric lifts in LinkedIn's job recommendation system.
32. Examination of community sentiment dynamics due to covid-19 pandemic: a case study from Australia [PDF] 返回目录
Jianlong Zhou, Shuiqiao Yang, Chun Xiao, Fang Chen
Abstract: The outbreak of the novel Coronavirus Disease 2019 (COVID-19) has caused unprecedented impacts on people's daily lives around the world. Various measures and policies, such as lockdown and social distancing, have been implemented by governments to combat the disease during the pandemic period. These measures and policies, as well as the virus itself, may cause different mental health issues for people, such as depression, anxiety, and sadness. In this paper, we exploit the massive text data posted by Twitter users to analyse the sentiment dynamics of people living in the state of New South Wales (NSW) in Australia during the pandemic period. Different from existing work that mostly focuses on country-level and static sentiment analysis, we analyse sentiment dynamics at the fine-grained level of local government areas (LGAs). Based on the analysis of around 94 million tweets posted by around 183 thousand users located in different LGAs in NSW over five months, we found that people in NSW showed an overall positive sentiment polarity, and that the COVID-19 pandemic decreased the overall positive sentiment polarity during the pandemic period. The fine-grained analysis of sentiment in LGAs found that, despite the dominant positive sentiment on most days during the study period, some LGAs experienced significant sentiment changes from positive to negative. This study also analysed the sentiment dynamics associated with hot topics on Twitter, such as government policies (e.g. Australia's JobKeeper program, lockdown, social distancing) as well as focused social events (e.g. the Ruby Princess cruise). The results showed that the policies and events did affect people's overall sentiment, and that they affected it differently at different stages.
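A minimal sketch of the fine-grained aggregation step, mean sentiment per LGA per month, might look like the following; the sentiment scorer here is a toy stand-in, since the study's actual sentiment model is not detailed in the abstract.

```python
import pandas as pd

def polarity(text):
    # Toy lexicon scorer; a stand-in for the study's actual sentiment model.
    pos, neg = {"great", "support", "safe"}, {"worried", "sad", "lockdown"}
    words = set(text.lower().split())
    return len(words & pos) - len(words & neg)

tweets = pd.DataFrame({
    "lga":  ["Sydney", "Sydney", "Newcastle"],
    "date": pd.to_datetime(["2020-03-01", "2020-04-02", "2020-03-05"]),
    "text": ["great support", "worried about the lockdown", "stay safe"],
})
tweets["polarity"] = tweets["text"].map(polarity)
tweets["month"] = tweets["date"].dt.to_period("M")
# Fine-grained sentiment dynamics: mean polarity per LGA per month.
print(tweets.groupby(["lga", "month"])["polarity"].mean())
```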
33. Self-Supervised Representations Improve End-to-End Speech Translation [PDF] 返回目录
Anne Wu, Changhan Wang, Juan Pino, Jiatao Gu
Abstract: End-to-end speech-to-text translation can provide a simpler and smaller system but is facing the challenge of data scarcity. Pre-training methods can leverage unlabeled data and have been shown to be effective in data-scarce settings. In this work, we explore whether self-supervised pre-trained speech representations can benefit the speech translation task in both high- and low-resource settings, whether they can transfer well to other languages, and whether they can be effectively combined with other common methods that help improve low-resource end-to-end speech translation, such as using a pre-trained high-resource speech recognition system. We demonstrate that self-supervised pre-trained features can consistently improve translation performance, and that cross-lingual transfer allows extending to a variety of languages with little or no tuning.
34. A Self-Attention Network based Node Embedding Model [PDF] 返回目录
Dai Quoc Nguyen, Tu Dinh Nguyen, Dinh Phung
Abstract: Although several signs of progress have been made recently, limited research has been conducted for an inductive setting where embeddings are required for newly unseen nodes -- a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performance of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE -- a novel unsupervised embedding model -- whose central idea is to employ a transformer self-attention network to iteratively aggregate vector representations of nodes in random walks. Our SANNE aims to produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that the proposed SANNE obtains state-of-the-art results for the node classification task on well-known benchmark datasets.
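To make the central idea concrete, here is a minimal sketch that samples random walks and aggregates node embeddings along each walk with a transformer encoder; the dimensions, mean-pooling readout, and toy graph are illustrative assumptions, not SANNE's exact architecture.

```python
import random
import torch
import torch.nn as nn

def random_walk(adj, start, length):
    # adj: dict mapping node id -> list of neighbour ids.
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

class WalkEncoder(nn.Module):
    """Self-attention over node embeddings along a random walk, in the spirit
    of SANNE; dimensions and readout are illustrative assumptions."""
    def __init__(self, n_nodes, dim=64, heads=4, layers=2):
        super().__init__()
        self.node_embed = nn.Embedding(n_nodes, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)

    def forward(self, walks):                  # walks: (batch, walk_len) node ids
        h = self.encoder(self.node_embed(walks))
        return h.mean(dim=1)                   # one vector per walk

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
walks = torch.tensor([random_walk(adj, 0, 5), random_walk(adj, 2, 5)])
print(WalkEncoder(n_nodes=3)(walks).shape)     # torch.Size([2, 64])
```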
35. Improving Image Captioning with Better Use of Captions [PDF] 返回目录
Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu
Abstract: Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision communities. In this paper, we present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation. Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning. The representation is then enhanced with neighbouring and contextual nodes together with their textual and visual features. During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences. We perform extensive experiments on the MSCOCO dataset, showing that the proposed framework significantly outperforms the baselines, resulting in state-of-the-art performance under a wide range of evaluation metrics.
36. Match$^2$: A Matching over Matching Model for Similar Question Identification [PDF] 返回目录
Zizhen Wang, Yixing Fan, Jiafeng Guo, Liu Yang, Ruqing Zhang, Yanyan Lan, Xueqi Cheng, Hui Jiang, Xiaozhao Wang
Abstract: Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification becomes a core task in CQA, which aims to find a similar question from the archived repository whenever a new question is asked. However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there can be different ways to ask the same question, or different questions sharing similar expressions. To alleviate this problem, it is natural to involve the existing answers in enriching the archived questions. Traditional methods typically take a one-sided usage, which leverages the answer as an expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation, since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-sided usage, which leverages the answer as a bridge between the two questions. The key idea is based on our observation that similar questions can be addressed by similar parts of the answer while different questions may not. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. In this way, we propose a novel matching over matching model, namely Match$^2$, which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.
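A heavily simplified sketch of comparing two questions through their matching patterns over the same answer is shown below; the token-level cosine patterns and the mean-pooled comparison are illustrative assumptions, whereas Match$^2$ itself learns this comparison end-to-end.

```python
import torch
import torch.nn.functional as F

def match_pattern(question, answer):
    # Token-level matching pattern between a question and the shared answer:
    # a (q_len, a_len) matrix of cosine similarities.
    q = F.normalize(question, dim=-1)
    a = F.normalize(answer, dim=-1)
    return q @ a.T

def similarity_via_answer(q1, q2, answer):
    """Compare two questions by comparing their matching patterns over the
    same answer (the 'matching over matching' idea, heavily simplified)."""
    p1 = match_pattern(q1, answer).mean(dim=0)   # answer-token profile for q1
    p2 = match_pattern(q2, answer).mean(dim=0)
    return F.cosine_similarity(p1, p2, dim=0)

q1, q2, ans = torch.randn(6, 32), torch.randn(8, 32), torch.randn(50, 32)
print(similarity_via_answer(q1, q2, ans))
```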
37. IQA: Interactive Query Construction in Semantic Question Answering Systems [PDF] 返回目录
Hamid Zafar, Mohnish Dubey, Jens Lehmann, Elena Demidova
Abstract: Semantic Question Answering (QA) systems automatically interpret user questions expressed in a natural language in terms of semantic queries. This process involves uncertainty, such that the resulting queries do not always accurately match the user intent, especially for more complex and less common questions. In this article, we aim to empower users in guiding QA systems towards the intended semantic queries by means of interaction. We introduce IQA - an interaction scheme for semantic QA pipelines. This scheme facilitates seamless integration of user feedback in the question answering process and relies on Option Gain - a novel metric that enables efficient and intuitive user interaction. Our evaluation shows that using the proposed scheme, even a small number of user interactions can lead to significant improvements in the performance of semantic Question Answering systems.
38. Towards a self-organizing pre-symbolic neural model representing sensorimotor primitives [PDF] 返回目录
Junpei Zhong, Angelo Cangelosi, Stefan Wermter
Abstract: The acquisition of symbolic and linguistic representations of sensorimotor behavior is a cognitive process performed by an agent when it is executing and/or observing its own and others' actions. According to Piaget's theory of cognitive development, these representations develop during the sensorimotor stage and the pre-operational stage. We propose a model that relates the conceptualization of higher-level information from visual stimuli to the development of the ventral/dorsal visual streams. This model employs a neural network architecture incorporating a predictive sensory module based on an RNNPB (Recurrent Neural Network with Parametric Biases) and a horizontal product model. We exemplify this model through a robot passively observing an object to learn its features and movements. During the learning process of observing sensorimotor primitives, i.e. observing a set of trajectories of arm movements and its oriented object features, the pre-symbolic representation is self-organized in the parametric units. These representational units act as bifurcation parameters, guiding the robot to recognize and predict various learned sensorimotor primitives. The pre-symbolic representation also accounts for the learning of sensorimotor primitives in a latent learning context.
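The parametric-bias mechanism can be sketched as a small trainable vector per observed sequence, appended to every input step of a recurrent predictor; the sizes and training setup below are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class RNNPB(nn.Module):
    """Recurrent network with Parametric Bias: each observed sensorimotor
    sequence gets a small trainable PB vector appended to every input step,
    which self-organizes into a pre-symbolic code. Sizes are illustrative."""
    def __init__(self, n_sequences, input_dim=4, pb_dim=2, hidden_dim=32):
        super().__init__()
        self.pb = nn.Parameter(torch.zeros(n_sequences, pb_dim))
        self.rnn = nn.RNN(input_dim + pb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, input_dim)

    def forward(self, x, seq_ids):             # x: (batch, T, input_dim)
        pb = self.pb[seq_ids].unsqueeze(1).expand(-1, x.size(1), -1)
        h, _ = self.rnn(torch.cat([x, pb], dim=-1))
        return self.out(h)                     # one-step-ahead prediction

model = RNNPB(n_sequences=3)
x = torch.randn(2, 10, 4)                      # two toy trajectories
pred = model(x[:, :-1], torch.tensor([0, 1]))  # predict x[:, 1:]
loss = nn.functional.mse_loss(pred, x[:, 1:])  # trains weights and PB vectors
```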
39. M2P2: Multimodal Persuasion Prediction using Adaptive Fusion [PDF] 返回目录
Chongyang Bai, Haipeng Chen, Srijan Kumar, Jure Leskovec, V.S. Subrahmanian
Abstract: Identifying persuasive speakers in an adversarial environment is a critical task. In a national election, politicians would like to have persuasive speakers campaign on their behalf. When a company faces adverse publicity, they would like to engage persuasive advocates for their position in the presence of adversaries who are critical of them. Debates represent a common platform for these forms of adversarial persuasion. This paper solves two problems: the Debate Outcome Prediction (DOP) problem predicts who wins a debate while the Intensity of Persuasion Prediction (IPP) problem predicts the change in the number of votes before and after a speaker speaks. Though DOP has been previously studied, we are the first to study IPP. Past studies on DOP fail to leverage two important aspects of multimodal data: 1) multiple modalities are often semantically aligned, and 2) different modalities may provide diverse information for prediction. Our M2P2 (Multimodal Persuasion Prediction) framework is the first to use multimodal (acoustic, visual, language) data to solve the IPP problem. To leverage the alignment of different modalities while maintaining the diversity of the cues they provide, M2P2 devises a novel adaptive fusion learning framework which fuses embeddings obtained from two modules -- an alignment module that extracts shared information between modalities and a heterogeneity module that learns the weights of different modalities with guidance from three separately trained unimodal reference models. We test M2P2 on the popular IQ2US dataset designed for DOP. We also introduce a new dataset called QPS (from Qipashuo, a popular Chinese debate TV show ) for IPP. M2P2 significantly outperforms 3 recent baselines on both datasets. Our code and QPS dataset can be found at this http URL.
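A heavily simplified sketch of adaptive late fusion over the three modalities is given below; the learned scalar weights and classification head are illustrative assumptions, and M2P2's actual fusion additionally uses alignment and heterogeneity modules guided by unimodal reference models.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Learned weighting of modality embeddings; a heavily simplified stand-in
    for M2P2's fusion, which also learns weights with guidance from three
    separately trained unimodal reference models."""
    def __init__(self, dim=128, n_modalities=3):
        super().__init__()
        self.weight_logits = nn.Parameter(torch.zeros(n_modalities))
        self.head = nn.Linear(dim, 1)

    def forward(self, modality_embeds):        # (n_modalities, batch, dim)
        w = torch.softmax(self.weight_logits, dim=0)        # weights sum to 1
        fused = (w.view(-1, 1, 1) * modality_embeds).sum(dim=0)
        return self.head(fused)                # persuasion score per example

acoustic, visual, language = (torch.randn(4, 128) for _ in range(3))
score = AdaptiveFusion()(torch.stack([acoustic, visual, language]))
```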