
[arXiv Papers] Computation and Language 2020-12-18

Contents

1. Continual Lifelong Learning in Natural Language Processing: A Survey [PDF] Abstract
2. BERT Goes Shopping: Comparing Distributional Models for Product Representations [PDF] Abstract
3. MIX: a Multi-task Learning Approach to Solve Open-Domain Question Answering [PDF] Abstract
4. Benchmarking Automatic Detection of Psycholinguistic Characteristics for Better Human-Computer Interaction [PDF] Abstract
5. Hate Speech detection in the Bengali language: A dataset and its baseline evaluation [PDF] Abstract
6. Ultra-Fast, Low-Storage, Highly Effective Coarse-grained Selection in Retrieval-based Chatbot by Using Deep Semantic Hashing [PDF] Abstract
7. ReferentialGym: A Nomenclature and Framework for Language Emergence & Grounding in (Visual) Referential Games [PDF] Abstract
8. CIF-based Collaborative Decoding for End-to-End Contextual Speech Recognition [PDF] Abstract
9. Unsupervised Learning of Discourse Structures using a Tree Autoencoder [PDF] Abstract
10. Interactive Question Clarification in Dialogue via Reinforcement Learning [PDF] Abstract
11. InSRL: A Multi-view Learning Framework Fusing Multiple Information Sources for Distantly-supervised Relation Extraction [PDF] Abstract
12. Assessing COVID-19 Impacts on College Students via Automated Processing of Free-form Text [PDF] Abstract
13. Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization [PDF] Abstract
14. Do You Do Yoga? Understanding Twitter Users' Types and Motivations using Social and Textual Information [PDF] Abstract
15. MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification [PDF] Abstract
16. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning [PDF] Abstract
17. The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks [PDF] Abstract
18. The voice of COVID-19: Acoustic correlates of infection [PDF] Abstract
19. MASKER: Masked Keyword Regularization for Reliable Text Classification [PDF] Abstract

Abstracts

1. Continual Lifelong Learning in Natural Language Processing: A Survey [PDF] Back to Contents
  Magdalena Biesialska, Katarzyna Biesialska, Marta R. Costa-jussà
Abstract: Continual learning (CL) aims to enable information systems to learn from a continuous data stream across time. However, it is difficult for existing deep learning architectures to learn a new task without largely forgetting previously acquired knowledge. Furthermore, CL is particularly challenging for language learning, as natural language is ambiguous: it is discrete, compositional, and its meaning is context-dependent. In this work, we look at the problem of CL through the lens of various NLP tasks. Our survey discusses major challenges in CL and current methods applied in neural network models. We also provide a critical review of the existing CL evaluation methods and datasets in NLP. Finally, we present our outlook on future research directions.

2. BERT Goes Shopping: Comparing Distributional Models for Product Representations [PDF] Back to Contents
  Federico Bianchi, Bingqing Yu, Jacopo Tagliabue
Abstract: Word embeddings (e.g., word2vec) have been applied successfully to eCommerce products through prod2vec. Inspired by the recent performance improvements on several NLP tasks brought by contextualized embeddings, we propose to transfer BERT-like architectures to eCommerce: our model -- ProdBERT -- is trained to generate representations of products through masked session modeling. Through extensive experiments over multiple shops, different tasks, and a range of design choices, we systematically compare the accuracy of ProdBERT and prod2vec embeddings: while ProdBERT is found to be superior to traditional methods in several scenarios, we highlight the importance of resources and hyperparameters in the best performing models. Finally, we conclude by providing guidelines for training embeddings under a variety of computational and data constraints.
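
The core pre-training idea, masked session modeling, parallels masked language modeling with product IDs in place of word tokens. Below is a minimal PyTorch sketch of that objective; the model size, vocabulary, and 15% masking rate are illustrative assumptions, not the paper's actual ProdBERT configuration.

```python
import torch
import torch.nn as nn

# Hypothetical setup: each shopping session is a sequence of product IDs.
VOCAB, PAD_ID, MASK_ID = 1000, 0, 1

class TinySessionEncoder(nn.Module):
    """BERT-like encoder over product-ID sessions (illustrative scale;
    positional embeddings omitted for brevity)."""
    def __init__(self, d=64, heads=4, layers=2):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, d, padding_idx=PAD_ID)
        layer = nn.TransformerEncoderLayer(d, heads, dim_feedforward=128,
                                           batch_first=True)
        self.enc = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(d, VOCAB)  # predicts the masked product IDs

    def forward(self, x):
        return self.head(self.enc(self.emb(x)))

def mask_sessions(batch, rate=0.15):
    """Mask ~rate of the products; unmasked positions get target -100."""
    mask = (torch.rand_like(batch.float()) < rate) & (batch != PAD_ID)
    targets = torch.where(mask, batch, torch.full_like(batch, -100))
    corrupted = torch.where(mask, torch.full_like(batch, MASK_ID), batch)
    return corrupted, targets

model = TinySessionEncoder()
sessions = torch.randint(2, VOCAB, (8, 20))   # 8 sessions of 20 products
inputs, targets = mask_sessions(sessions)
logits = model(inputs)
loss = nn.functional.cross_entropy(
    logits.view(-1, VOCAB), targets.view(-1), ignore_index=-100)
loss.backward()
```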

3. MIX: a Multi-task Learning Approach to Solve Open-Domain Question Answering [PDF] Back to Contents
  Sofian Chaybouti, Achraf Saghe, Aymen Shabou
Abstract: In this paper, we introduce MIX: a multi-task deep learning approach to solve Open-Domain Question Answering. First, we design our system as a multi-stage pipeline made of 3 building blocks: a BM25-based Retriever, to reduce the search space, and a RoBERTa-based Scorer and Extractor, to rank retrieved documents and extract relevant spans of text, respectively. Finally, we further improve the computational efficiency of our system to deal with the scalability challenge: thanks to multi-task learning, we parallelize the closely related tasks solved by the Scorer and the Extractor. Our system outperforms the previous state-of-the-art by 12 points in both F1-score and exact-match on the SQuAD-open benchmark.
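
To make the pipeline concrete, here is a toy sketch of the first stage: Okapi BM25 retrieval over a tiny in-memory corpus, with a placeholder where the RoBERTa-based Scorer and Extractor would rerank documents and pull answer spans. The corpus, query, and hyperparameters (k1=1.5, b=0.75) are illustrative, not from the paper.

```python
import math
from collections import Counter

# Toy corpus standing in for the open-domain document collection.
docs = ["the eiffel tower is in paris",
        "the great wall is in china",
        "paris is the capital of france"]
tok_docs = [d.split() for d in docs]
N = len(tok_docs)
avgdl = sum(len(d) for d in tok_docs) / N
df = Counter(t for d in tok_docs for t in set(d))  # document frequencies

def bm25_scores(query, k1=1.5, b=0.75):
    """Okapi BM25 score of every document against the query."""
    scores = []
    for d in tok_docs:
        tf = Counter(d)
        s = 0.0
        for t in query.split():
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

query = "where is the eiffel tower"
scores = bm25_scores(query)
top = sorted(range(N), key=scores.__getitem__, reverse=True)[:2]
# A RoBERTa-based Scorer/Extractor would rerank `top` and extract spans;
# here we simply show the reduced search space.
print("retrieved:", [docs[i] for i in top])
```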

4. Benchmarking Automatic Detection of Psycholinguistic Characteristics for Better Human-Computer Interaction [PDF] Back to Contents
  Sanja Stajner, Seren Yenikent, Marc Franco-Salvador
Abstract: When two people pay attention to each other and are interested in what the other has to say or write, they almost instantly adapt their writing/speaking style to match the other's. For a successful interaction with a user, chatbots and dialog systems should be able to do the same. We propose a framework consisting of five psycholinguistic textual characteristics for better human-computer interaction. We describe the annotation processes used for collecting the data, and benchmark five binary classification tasks, experimenting with different training sizes and model architectures. We perform experiments in English, Spanish, German, Chinese, and Arabic. The best architectures noticeably outperform several baselines and achieve macro-averaged F1-scores between 72% and 96%, depending on the language and the task. Similar results are achieved even with a small amount of training data. The proposed framework proved fairly easy to model for various languages, even with a small amount of manually annotated data, provided the right architectures are used. At the same time, it showed potential for improving user satisfaction if applied in existing commercial chatbots.

5. Hate Speech detection in the Bengali language: A dataset and its baseline evaluation [PDF] Back to Contents
  Nauros Romim, Mosahed Ahmed, Hriteshwar Talukder, Md Saiful Islam
Abstract: Social media sites such as YouTube and Facebook have become an integral part of everyone's life, and in the last few years hate speech in social media comment sections has increased rapidly. Detection of hate speech on social media websites faces a variety of challenges, including small imbalanced datasets, the choice of an appropriate model, and the choice of feature analysis method. Furthermore, this problem is more severe for the Bengali-speaking community due to the lack of gold-standard labelled datasets. This paper presents a new dataset of 30,000 user comments tagged via crowdsourcing and verified by experts. All the comments were collected from YouTube and Facebook comment sections and classified into seven categories: sports, entertainment, religion, politics, crime, celebrity, and TikTok & meme. A total of 50 annotators took part; each comment was annotated three times, and the majority vote was taken as the final annotation. We conducted baseline experiments with several deep learning models, along with extensive pre-trained Bengali word embeddings such as Word2Vec, FastText, and BengFastText, on this dataset to facilitate future research opportunities. The experiments illustrated that although all deep learning models performed well, SVM achieved the best result with 87.5% accuracy. Our core contribution is to make this benchmark dataset available and accessible to facilitate further research in the field of Bengali hate speech detection.
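
A brief sketch of two pieces of the described setup: the majority-vote aggregation of the three annotations per comment, and a simple SVM baseline. The TF-IDF features and toy data below are assumptions for illustration; the paper's SVM baseline may use different features, such as the pre-trained embeddings it mentions.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def majority_vote(labels):
    """Final label for a comment annotated three times, as in the paper."""
    return Counter(labels).most_common(1)[0][0]

assert majority_vote(["hate", "hate", "not_hate"]) == "hate"

# Hypothetical stand-in data; the real dataset has 30,000 Bengali comments.
texts = ["example hateful comment", "example neutral comment",
         "another hateful comment", "another neutral comment"]
labels = ["hate", "not_hate", "hate", "not_hate"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["a hateful comment"]))
```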

6. Ultra-Fast, Low-Storage, Highly Effective Coarse-grained Selection in Retrieval-based Chatbot by Using Deep Semantic Hashing [PDF] Back to Contents
  Tian Lan, Xian-Ling Mao, Xiao-yan Gao, He-Yan Huang
Abstract: We study the coarse-grained selection module in retrieval-based chatbots. Coarse-grained selection is a basic module in a retrieval-based chatbot, constructing a rough candidate set from the whole database to speed up the interaction with customers. So far, there are two kinds of approaches for coarse-grained selection modules: (1) sparse representation; (2) dense representation. To the best of our knowledge, there is no systematic comparison between these two approaches in retrieval-based chatbots, and which kind of method is better in real scenarios is still an open question. In this paper, we first systematically compare these two methods. Extensive experimental results demonstrate that the dense representation method significantly outperforms the sparse representation, but at the cost of more time and storage. To overcome these fatal weaknesses of the dense representation method, we also propose an ultra-fast, low-storage, and highly effective Deep Semantic Hashing Coarse-grained selection method, called the DSHC model. Specifically, in our proposed DSHC model, a hashing optimization module consisting of two auto-encoder models is stacked on a well-trained dense representation model, and three loss functions are designed to optimize it. The hash codes provided by the hashing optimization module effectively preserve the rich semantic and similarity information of the dense vectors. Extensive experimental results prove that our proposed DSHC model achieves much faster speed and lower storage than sparse representation, with very little performance loss compared with dense representation. Besides, our source code has been publicly released for future research.
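
As an illustration of the general deep-semantic-hashing idea, the sketch below binarizes dense candidate vectors with a small auto-encoder and performs coarse-grained selection by Hamming distance. The architecture and code length are hypothetical, and the three DSHC loss functions are omitted; this is not the paper's exact model.

```python
import numpy as np
import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    """Maps dense response vectors to near-binary codes (illustrative)."""
    def __init__(self, dim=768, bits=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, bits), nn.Tanh())
        self.dec = nn.Linear(bits, dim)  # reconstruction preserves semantics

    def forward(self, x):
        code = self.enc(x)               # in (-1, 1); binarized at index time
        return code, self.dec(code)

encoder = HashEncoder()
dense = torch.randn(1000, 768)           # precomputed dense candidate vectors
with torch.no_grad():
    codes, _ = encoder(dense)
index = (codes.numpy() > 0).astype(np.uint8)   # 1000 x 128 bit matrix

def hamming_topk(query_bits, k=10):
    """Coarse-grained selection: smallest Hamming distance wins."""
    dist = np.count_nonzero(index != query_bits, axis=1)
    return np.argsort(dist)[:k]

print(hamming_topk(index[0]))             # the query's own row ranks first
```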

7. ReferentialGym: A Nomenclature and Framework for Language Emergence & Grounding in (Visual) Referential Games [PDF] Back to Contents
  Kevin Denamganaï, James Alfred Walker
Abstract: Natural languages are powerful tools wielded by human beings to communicate information and co-operate towards common goals. Their value lies in key properties such as compositionality, hierarchy, and recurrent syntax; computational linguists have been researching how these properties emerge in artificial languages induced by language games. Only relatively recently has the AI community started to investigate language emergence and grounding, working towards better human-machine interfaces, for instance, interactive/conversational AI assistants that are able to relate their vision to the ongoing conversation. This paper provides two contributions to this research field. Firstly, a nomenclature is proposed to understand the main initiatives in studying language emergence and grounding, accounting for the variations in assumptions and constraints. Secondly, a PyTorch-based deep learning framework is introduced, entitled ReferentialGym, which is dedicated to furthering the exploration of language emergence and grounding. By providing baseline implementations of major algorithms and metrics, in addition to many different features and approaches, ReferentialGym attempts to lower the entry barrier to the field and provide the community with common implementations.

8. CIF-based Collaborative Decoding for End-to-End Contextual Speech Recognition [PDF] Back to Contents
  Minglun Han, Linhao Dong, Shiyu Zhou, Bo Xu
Abstract: End-to-end (E2E) models have achieved promising results on multiple speech recognition benchmarks and shown the potential to become the mainstream. However, the unified structure and the E2E training hamper injecting contextual information into them for contextual biasing. Though contextual LAS (CLAS) gives an excellent all-neural solution, the degree of biasing to given context information is not explicitly controllable. In this paper, we focus on incorporating context information into the continuous integrate-and-fire (CIF) based model, which supports contextual biasing in a more controllable fashion. Specifically, an extra context processing network is introduced to extract contextual embeddings, integrate acoustically relevant context information, and decode the contextual output distribution, thus forming a collaborative decoding with the decoder of the CIF-based model. Evaluated on the named-entity-rich evaluation sets of HKUST/AISHELL-2, our method brings a relative character error rate (CER) reduction of 8.83%/21.13% and a relative named entity character error rate (NE-CER) reduction of 40.14%/51.50% when compared with a strong baseline. Besides, it maintains the performance on the original evaluation set without degradation.

9. Unsupervised Learning of Discourse Structures using a Tree Autoencoder [PDF] Back to Contents
  Patrick Huber, Giuseppe Carenini
Abstract: Discourse information, as postulated by popular discourse theories such as RST and PDTB, has been shown to improve an increasing number of downstream NLP tasks, showing positive effects and synergies of discourse with important real-world applications. While methods for incorporating discourse become more and more sophisticated, the growing need for robust and general discourse structures has not been sufficiently met by current discourse parsers, usually trained on small-scale datasets in a strictly limited number of domains. This makes the prediction for arbitrary tasks noisy and unreliable. The resulting lack of high-quality, high-quantity discourse trees poses a severe limitation to further progress. In order to alleviate this shortcoming, we propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective. The proposed approach can be applied to any tree-structured objective, such as syntactic parsing, discourse parsing, and others. However, due to the especially difficult annotation process for generating discourse trees, we initially develop a method to generate larger and more diverse discourse treebanks. In this paper, we infer general tree structures of natural text in multiple domains, showing promising results on a diverse set of tasks.

10. Interactive Question Clarification in Dialogue via Reinforcement Learning [PDF] Back to Contents
  Xiang Hu, Zujie Wen, Yafang Wang, Xiaolong Li, Gerard de Melo
Abstract: Coping with ambiguous questions has been a perennial problem in real-world dialogue systems. Although clarification by asking questions is a common form of human interaction, it is hard to define appropriate questions to elicit more specific intents from a user. In this work, we propose a reinforcement learning model to clarify ambiguous questions by suggesting refinements of the original query. We first formulate a collection partitioning problem to select a set of labels enabling us to distinguish potential unambiguous intents. We list the chosen labels as intent phrases to the user for further confirmation. The selected label, along with the original user query, then serves as a refined query, for which a suitable response can more easily be identified. The model is trained using reinforcement learning with a deep policy network. We evaluate our model based on real-world user clicks and demonstrate significant improvements across several different experiments.
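
A minimal sketch of the kind of policy-gradient update such a system could use: a policy network scores candidate refinement labels, one is sampled and shown to the user, and a simulated click reward drives a REINFORCE step. The feature dimensions and reward scheme are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

# One illustrative REINFORCE step: a policy scores candidate refinement
# labels for an ambiguous query; reward = 1 if the user confirms an intent.
policy = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

label_feats = torch.randn(5, 32)          # 5 candidate labels (toy features)
logits = policy(label_feats).squeeze(-1)  # one score per candidate label
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()                    # the label shown to the user
reward = 1.0                              # simulated user click feedback

loss = -dist.log_prob(action) * reward    # policy-gradient objective
opt.zero_grad()
loss.backward()
opt.step()
```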

11. InSRL: A Multi-view Learning Framework Fusing Multiple Information Sources for Distantly-supervised Relation Extraction [PDF] Back to Contents
  Zhendong Chu, Haiyun Jiang, Yanghua Xiao, Wei Wang
Abstract: Distant supervision makes it possible to automatically label bags of sentences for relation extraction by leveraging knowledge bases, but suffers from sparse and noisy bags. Additional information sources are urgently needed to supplement the training data and overcome these issues. In this paper, we introduce two widely existing sources in knowledge bases, namely entity descriptions and multi-grained entity types, to enrich the distantly supervised data. We see information sources as multiple views and fuse them to construct an intact space with sufficient information. An end-to-end multi-view learning framework is proposed for relation extraction via Intact Space Representation Learning (InSRL), and the representations of single views are jointly learned simultaneously. Moreover, inner-view and cross-view attention mechanisms are used to highlight important information on different levels on an entity-pair basis. The experimental results on a popular benchmark dataset demonstrate the necessity of additional information sources and the effectiveness of our framework. We will release the implementation of our model and dataset with multiple information sources after the anonymized review phase.
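
The view-fusion step can be illustrated with a generic cross-view attention layer that weights per-view entity-pair representations and sums them into a single intact-space vector; the sketch below is a simplified stand-in for InSRL, with hypothetical dimensions.

```python
import torch
import torch.nn as nn

class ViewFusion(nn.Module):
    """Attention-weighted fusion of per-view entity-pair representations
    into one intact-space vector (a simplified stand-in for InSRL)."""
    def __init__(self, d=128):
        super().__init__()
        self.score = nn.Linear(d, 1)      # cross-view attention scorer

    def forward(self, view_reprs):        # (batch, n_views, d)
        weights = torch.softmax(self.score(view_reprs), dim=1)
        return (weights * view_reprs).sum(dim=1)   # (batch, d)

fusion = ViewFusion()
# Three views: sentence bag, entity descriptions, entity types (toy tensors).
reprs = torch.randn(4, 3, 128)
print(fusion(reprs).shape)                # torch.Size([4, 128])
```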

12. Assessing COVID-19 Impacts on College Students via Automated Processing of Free-form Text [PDF] Back to Contents
  Ravi Sharma, Sri Divya Pagadala, Pratool Bharti, Sriram Chellappan, Trine Schmidt, Raj Goyal
Abstract: In this paper, we report experimental results on assessing the impact of COVID-19 on college students by processing free-form texts generated by them. By free-form texts, we mean textual entries posted by college students (enrolled in a four-year US college) via an app specifically designed to assess and improve their mental health. Using a dataset comprising more than 9000 textual entries from 1451 students collected over four months (split between pre and post COVID-19) and established NLP techniques, a) we assess how the topics of most interest to students changed between pre and post COVID-19, and b) we assess the sentiments that students exhibited in each topic between pre and post COVID-19. Our analysis reveals that topics like Education became noticeably less important to students post COVID-19, while Health became much more trending. We also found that across all topics, negative sentiment among students post COVID-19 was much higher compared to pre COVID-19. We expect our study to have an impact on policy-makers in higher education across several spectra, including college administrators, teachers, parents, and mental health counselors.

13. Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization [PDF] Back to Contents
  Jiho Noh, Ramakanth Kavuluru
Abstract: Information retrieval (IR) for precision medicine (PM) often involves looking for multiple pieces of evidence that characterize a patient case. This typically includes at least the name of a condition and a genetic variation that applies to the patient. Other factors such as demographic attributes, comorbidities, and social determinants may also be pertinent. As such, the retrieval problem is often formulated as ad hoc search but with multiple facets (e.g., disease, mutation) that may need to be incorporated. In this paper, we present a document reranking approach that combines neural query-document matching and text summarization toward such retrieval scenarios. Our architecture builds on the basic BERT model with three specific components for reranking: (a) document-query matching, (b) keyword extraction, and (c) facet-conditioned abstractive summarization. The outcomes of (b) and (c) are used to essentially transform a candidate document into a concise summary that can be compared with the query at hand to compute a relevance score. Component (a) directly generates a matching score of a candidate document for a query. The full architecture benefits from the complementary potential of document-query matching and the novel document transformation approach based on summarization along PM facets. Evaluations using NIST's TREC-PM track datasets (2017--2019) show that our model achieves state-of-the-art performance. To foster reproducibility, our code is made available here: this https URL.

14. Do You Do Yoga? Understanding Twitter Users' Types and Motivations using Social and Textual Information [PDF] Back to Contents
  Tunazzina Islam, Dan Goldwasser
Abstract: Leveraging social media data to understand people's lifestyle choices is an exciting domain to explore, but requires a multiview formulation of the data. In this paper, we propose a joint embedding model based on the fusion of neural networks with an attention mechanism, incorporating social and textual information of users to understand their activities and motivations. We use well-being related tweets from Twitter, focusing on 'Yoga'. We demonstrate our model on two downstream tasks: (i) finding the user type, such as practitioner, promotional (promoting a yoga studio/gym), or other; (ii) finding the user motivation, i.e., health benefit, spirituality, or loving to tweet/retweet about yoga without practicing it.

15. MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification [PDF] Back to Contents
  Te-Lin Wu, Shikhar Singh, Sayan Paul, Gully Burns, Nanyun Peng
Abstract: We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification. The dataset is collected in a fully automated distant supervision manner, where the labels are obtained from an existing curated database, and the actual contents are extracted from papers associated with each of the records in the database. We benchmark various state-of-the-art NLP and computer vision models, including unimodal models which only take either caption texts or images as inputs, and multimodal models. Extensive experiments and analysis show that multimodal models, despite outperforming unimodal ones, still need improvements especially on a less-supervised way of grounding visual concepts with languages, and better transferability to low resource domains. We release our dataset and the benchmarks to facilitate future research in multimodal learning, especially to motivate targeted improvements for applications in scientific domains.

16. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning [PDF] Back to Contents
  Hanrui Wang, Zhekai Zhang, Song Han
Abstract: The attention mechanism is becoming increasingly popular in Natural Language Processing (NLP) applications, showing superior performance to convolutional and recurrent architectures. However, general-purpose platforms such as CPUs and GPUs are inefficient when performing attention inference due to complicated data movement and low arithmetic intensity. Moreover, existing NN accelerators mainly focus on optimizing convolutional or recurrent models and cannot efficiently support attention. In this paper, we present SpAtten, an efficient algorithm-architecture co-design that leverages token sparsity, head sparsity, and quantization opportunities to reduce attention computation and memory access. Inspired by the high redundancy of human languages, we propose novel cascade token pruning to prune away unimportant tokens in the sentence. We also propose cascade head pruning to remove unessential heads. Cascade pruning is fundamentally different from weight pruning, since there is no trainable weight in the attention mechanism and the pruned tokens and heads are selected on the fly. To efficiently support them on hardware, we design a novel top-k engine to rank token and head importance scores with high throughput. Furthermore, we propose progressive quantization that first fetches MSBs only and performs the computation; if the confidence is low, it fetches LSBs and recomputes the attention outputs, trading computation for memory reduction. Extensive experiments on 30 benchmarks show that, on average, SpAtten reduces DRAM access by 10.0x with no accuracy loss, and achieves 1.6x, 3.0x, 162x, and 347x speedup, and 1.4x, 3.2x, 1193x, and 4059x energy savings over the A3 accelerator, MNNFast accelerator, TITAN Xp GPU, and Xeon CPU, respectively.
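
Cascade token pruning can be sketched in a few lines: score each token by the total attention it receives (summed over heads and query positions), then keep only the top-k tokens. The PyTorch sketch below illustrates the idea in software; SpAtten itself implements it on dedicated hardware, and the shapes and keep ratio here are illustrative.

```python
import torch

def prune_tokens(hidden, attn_probs, keep_ratio=0.5):
    """Keep the tokens that receive the most attention, drop the rest.
    hidden: (batch, seq, d); attn_probs: (batch, heads, seq, seq)."""
    # Cumulative importance: total attention each token receives, summed
    # over heads and query positions (the token-score idea in SpAtten).
    importance = attn_probs.sum(dim=(1, 2))              # (batch, seq)
    k = max(1, int(hidden.size(1) * keep_ratio))
    keep = importance.topk(k, dim=1).indices.sort(dim=1).values  # keep order
    batch_idx = torch.arange(hidden.size(0)).unsqueeze(1)
    return hidden[batch_idx, keep]                       # (batch, k, d)

hidden = torch.randn(2, 16, 64)
attn = torch.softmax(torch.randn(2, 4, 16, 16), dim=-1)
print(prune_tokens(hidden, attn).shape)                  # torch.Size([2, 8, 64])
```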

17. The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks [PDF] Back to Contents
  Siyuan Feng, Odette Scharenborg
Abstract: This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations that can distinguish between subword units of a language. We propose a two-stage learning framework that combines self-supervised learning and cross-lingual knowledge transfer. The framework consists of autoregressive predictive coding (APC) as the front-end and a cross-lingual deep neural network (DNN) as the back-end. Experiments on the ABX subword discriminability task conducted with the Libri-light and ZeroSpeech 2017 databases showed that our approach is competitive with or superior to state-of-the-art studies. Comprehensive and systematic analyses at the phoneme and articulatory feature (AF) level showed that our approach was better at capturing diphthong than monophthong vowel information, while differences in the amount of information captured for different types of consonants were also observed. Moreover, a positive correlation was found between the effectiveness of the back-end in capturing a phoneme's information and the quality of the cross-lingual phone labels assigned to the phoneme. The AF-level analysis, together with t-SNE visualization results, showed that the proposed approach is better than MFCC and APC features in capturing manner and place of articulation information, vowel height, and backness information. Taken together, the analyses showed that the two stages in our approach are both effective in capturing phoneme and AF information. Nevertheless, monophthong vowel information is less well captured than consonant information, which suggests that future research should focus on improving the capture of monophthong vowel information.
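
The APC front-end can be summarized as: encode past acoustic frames with an autoregressive network and predict a frame a few steps ahead under an L1 loss. A minimal sketch, with illustrative layer sizes and prediction shift rather than the paper's configuration:

```python
import torch
import torch.nn as nn

# Minimal APC sketch: a GRU reads acoustic frames and predicts the frame
# n_shift steps ahead under an L1 loss.
n_shift, dim = 3, 40                      # 40-dim features; shift is illustrative
rnn = nn.GRU(dim, 256, num_layers=2, batch_first=True)
proj = nn.Linear(256, dim)

frames = torch.randn(8, 100, dim)         # a batch of 100-frame utterances
out, _ = rnn(frames[:, :-n_shift])        # encode the past autoregressively
pred = proj(out)                          # predict n_shift frames ahead
loss = nn.functional.l1_loss(pred, frames[:, n_shift:])
loss.backward()
```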

18. The voice of COVID-19: Acoustic correlates of infection [PDF] Back to Contents
  Katrin D. Bartl-Pokorny, Florian B. Pokorny, Anton Batliner, Shahin Amiriparian, Anastasia Semertzidou, Florian Eyben, Elena Kramer, Florian Schmidt, Rainer Schönweiler, Markus Wehler, Björn W. Schuller
Abstract: COVID-19 is a global health crisis that has been affecting many aspects of our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. A considerable proportion of symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the very first time, the present study aims to investigate voice acoustic correlates of an infection with COVID-19 on the basis of a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /o:/, /u:/, and /a:/ produced by 11 symptomatic COVID-19 positive and 11 COVID-19 negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify features with the most prominent group differences. The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels indicating discontinuities in the pulmonic airstream during phonation in COVID-19 positive participants. Group differences in the front vowels /i:/ and /e:/ are additionally reflected in the variation of the fundamental frequency and the harmonics-to-noise ratio, group differences in back vowels /o:/ and /u:/ in statistics of the Mel-frequency cepstral coefficients and the spectral slope. Findings of this study can be considered an important proof-of-concept contribution for a potential future voice-based identification of individuals infected with COVID-19.
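
The statistical procedure, Mann-Whitney U tests with effect sizes over per-group feature values, can be sketched directly with SciPy. The data below are simulated stand-ins for one acoustic feature across the 11 positive and 11 negative speakers, and the rank-biserial correlation is used here as one common effect-size choice; the study's exact measure may differ.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Simulated per-speaker values of one feature (e.g., mean voiced segment
# length) for 11 COVID-19 positive and 11 negative speakers.
positive = rng.normal(0.18, 0.04, 11)
negative = rng.normal(0.24, 0.04, 11)

u, p = mannwhitneyu(positive, negative, alternative="two-sided")
# Rank-biserial correlation: a simple effect size for the U statistic.
effect = 1 - 2 * u / (len(positive) * len(negative))
print(f"U={u:.1f}, p={p:.4f}, rank-biserial r={effect:.2f}")
```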

19. MASKER: Masked Keyword Regularization for Reliable Text Classification [PDF] Back to Contents
  Seung Jun Moon, Sangwoo Mo, Kimin Lee, Jaeho Lee, Jinwoo Shin
Abstract: Pre-trained language models have achieved state-of-the-art accuracies on various text classification tasks, e.g., sentiment analysis, natural language inference, and semantic textual similarity. However, the reliability of the fine-tuned text classifiers is an often overlooked performance criterion. For instance, one may desire a model that can detect out-of-distribution (OOD) samples (drawn far from the training distribution) or be robust against domain shifts. We claim that one central obstacle to reliability is the over-reliance of the model on a limited number of keywords, instead of looking at the whole context. In particular, we find that (a) OOD samples often contain in-distribution keywords, while (b) cross-domain samples may not always contain keywords; over-relying on keywords can be problematic in both cases. In light of this observation, we propose a simple yet effective fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction. MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions without enough context. When applied to various pre-trained language models (e.g., BERT, RoBERTa, and ALBERT), we demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy. Code is available at this https URL.
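
The two regularizers can be sketched as follows: mask out model-reliant keywords so the classifier must use the context, and penalize confident predictions on keyword-only (context-free) inputs by pulling them toward the uniform distribution. The keyword set and the KL-based low-confidence regularizer below are a simplified illustration; MASKER's full method also includes a keyword reconstruction loss.

```python
import torch
import torch.nn.functional as F

def mask_keywords(tokens, keywords, mask_token="[MASK]"):
    """Hide model-reliant keywords so training must use the context."""
    return [mask_token if t in keywords else t for t in tokens]

def low_confidence_regularizer(logits):
    """Push predictions on keyword-only (context-free) inputs toward the
    uniform distribution, penalizing over-confidence (KL form)."""
    log_probs = F.log_softmax(logits, dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / logits.size(-1))
    return F.kl_div(log_probs, uniform, reduction="batchmean")

print(mask_keywords(["the", "movie", "was", "awful"], {"awful"}))
logits = torch.randn(4, 2)   # classifier outputs on keyword-only inputs
print(low_confidence_regularizer(logits))
```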
