Contents
1. An Evaluation of Recent Neural Sequence Tagging Models in Turkish Named Entity Recognition [PDF] Abstract
2. Movement Pruning: Adaptive Sparsity by Fine-Tuning [PDF] Abstract
3. Recent Advances in SQL Query Generation: A Survey [PDF] Abstract
4. Analyzing Temporal Relationships between Trending Terms on Twitter and Urban Dictionary Activity [PDF] Abstract
5. Challenges in Emotion Style Transfer: An Exploration with a Lexical Substitution Pipeline [PDF] Abstract
6. Neural Entity Linking on Technical Service Tickets [PDF] Abstract
7. Parallel Data Augmentation for Formality Style Transfer [PDF] Abstract
8. Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre [PDF] Abstract
9. COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter [PDF] Abstract
10. Adaptive Transformers for Learning Multimodal Representations [PDF] Abstract
11. Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space [PDF] Abstract
12. Spelling Error Correction with Soft-Masked BERT [PDF] Abstract
13. Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model [PDF] Abstract
14. Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation [PDF] Abstract
15. A chatbot architecture for promoting youth resilience [PDF] Abstract
16. OSACT4 Shared Task on Offensive Language Detection: Intensive Preprocessing-Based Approach [PDF] Abstract
17. VirAAL: Virtual Adversarial Active Learning [PDF] Abstract
18. A pre-training technique to localize medical BERT and enhance BioBERT [PDF] Abstract
19. Benchmarking neural embeddings for link prediction in knowledge graphs under semantic and structural changes [PDF] Abstract
20. Grounding Language in Play [PDF] Abstract
21. Finding Experts in Transformer Models [PDF] Abstract
25. Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario [PDF] Abstract
Abstracts
1. An Evaluation of Recent Neural Sequence Tagging Models in Turkish Named Entity Recognition [PDF] Back to Contents
Gizem Aras, Didem Makaroglu, Seniz Demir, Altan Cakir
Abstract: Named entity recognition (NER) is an extensively studied task that extracts and classifies named entities in a text. NER is crucial not only in downstream language processing applications such as relation extraction and question answering but also in large scale big data operations such as real-time analysis of online digital media content. Recent research efforts on Turkish, a less studied language with morphologically rich nature, have demonstrated the effectiveness of neural architectures on well-formed texts and yielded state-of-the-art results by formulating the task as a sequence tagging problem. In this work, we empirically investigate the use of recent neural architectures (Bidirectional long short-term memory and Transformer-based networks) proposed for Turkish NER tagging in the same setting. Our results demonstrate that transformer-based networks, which can model long-range context, overcome the limitations of BiLSTM networks where different input features at the character, subword, and word levels are utilized. We also propose a transformer-based network with a conditional random field (CRF) layer that leads to the state-of-the-art result (95.95% F-measure) on a common dataset. Our study contributes to the literature that quantifies the impact of transfer learning on processing morphologically rich languages.
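As a concrete illustration of the best-performing configuration, the sketch below stacks a CRF layer on a pretrained Transformer encoder. This is a minimal sketch, not the authors' code: the Turkish checkpoint name, the tag count, and the use of the pytorch-crf package are assumptions.

    import torch
    from transformers import AutoModel
    from torchcrf import CRF  # pip install pytorch-crf

    class TransformerCRFTagger(torch.nn.Module):
        """A Transformer encoder with a CRF output layer for sequence tagging."""
        def __init__(self, model_name="dbmdz/bert-base-turkish-cased", num_tags=9):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(model_name)
            self.emissions = torch.nn.Linear(self.encoder.config.hidden_size, num_tags)
            self.crf = CRF(num_tags, batch_first=True)

        def forward(self, input_ids, attention_mask, tags=None):
            h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
            e = self.emissions(h)
            if tags is not None:  # training: negative CRF log-likelihood as the loss
                return -self.crf(e, tags, mask=attention_mask.bool())
            return self.crf.decode(e, mask=attention_mask.bool())  # inference: best tag paths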
2. Movement Pruning: Adaptive Sparsity by Fine-Tuning [PDF] Back to Contents
Victor Sanh, Thomas Wolf, Alexander M. Rush
Abstract: Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.
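The core mechanism is compact enough to sketch: each weight gets a learned importance score, the forward pass keeps only the top-scoring weights, and a straight-through estimator lets gradients reach the scores, so weights that move away from zero during fine-tuning tend to be kept. A minimal PyTorch sketch, with illustrative sizes and keep ratio:

    import torch
    import torch.nn.functional as F

    class MovementPrunedLinear(torch.nn.Module):
        """Linear layer whose weights are masked by learned importance scores."""
        def __init__(self, in_features, out_features, keep_ratio=0.1):
            super().__init__()
            self.weight = torch.nn.Parameter(torch.empty(out_features, in_features))
            torch.nn.init.xavier_uniform_(self.weight)
            self.scores = torch.nn.Parameter(torch.zeros(out_features, in_features))
            self.keep_ratio = keep_ratio

        def forward(self, x):
            # Hard top-v mask over the learned scores.
            k = max(1, int(self.scores.numel() * self.keep_ratio))
            flat = self.scores.detach().flatten()
            threshold = flat.kthvalue(flat.numel() - k + 1).values
            hard_mask = (self.scores >= threshold).float()
            # Straight-through estimator: the forward pass uses the hard mask,
            # the backward pass treats it as identity so the scores get gradients.
            mask = hard_mask.detach() + self.scores - self.scores.detach()
            return F.linear(x, self.weight * mask)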
3. Recent Advances in SQL Query Generation: A Survey [PDF] Back to Contents
Jovan Kalajdjieski, Martina Toshevska, Frosina Stojanovska
Abstract: Natural language is hypothetically the best user interface for many domains. However, general models that provide an interface between natural language and any other domain still do not exist. Providing natural language interface to relational databases could possibly attract a vast majority of users that are or are not proficient with query languages. With the rise of deep learning techniques, there is extensive ongoing research in designing a suitable natural language interface to relational databases. This survey aims to overview some of the latest methods and models proposed in the area of SQL query generation from natural language. We describe models with various architectures such as convolutional neural networks, recurrent neural networks, pointer networks, reinforcement learning, etc. Several datasets intended to address the problem of SQL query generation are interpreted and briefly overviewed. In the end, evaluation metrics utilized in the field are presented mainly as a combination of execution accuracy and logical form accuracy.
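The two metrics mentioned in the closing sentence are simple to state precisely. A sketch of both (the string normalization applied before comparison is an assumption; benchmarks such as WikiSQL define their own canonicalization):

    def execution_accuracy(pred_results, gold_results):
        """Fraction of queries whose execution results match the gold results."""
        return sum(p == g for p, g in zip(pred_results, gold_results)) / len(gold_results)

    def logical_form_accuracy(pred_queries, gold_queries):
        """Fraction of predicted SQL strings that exactly match the gold SQL
        after light normalization; stricter than execution accuracy, since
        different queries can return identical results."""
        norm = lambda q: " ".join(q.lower().split())
        return sum(norm(p) == norm(g) for p, g in zip(pred_queries, gold_queries)) / len(gold_queries)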
4. Analyzing Temporal Relationships between Trending Terms on Twitter and Urban Dictionary Activity [PDF] Back to Contents
Steven R. Wilson, Walid Magdy, Barbara McGillivray, Gareth Tyson
Abstract: As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, more popular social media platforms. In this research, we study the temporal activity trends on Urban Dictionary and provide the first analysis of how this activity relates to content being discussed on a major social network: Twitter. By collecting the whole of Urban Dictionary, as well as a large sample of tweets over seven years, we explore the connections between the words and phrases that are defined and searched for on Urban Dictionary and the content that is talked about on Twitter. Through a series of cross-correlation calculations, we identify cases in which Urban Dictionary activity closely reflects the larger conversation happening on Twitter. Then, we analyze the types of terms that have a stronger connection to discussions on Twitter, finding that Urban Dictionary activity that is positively correlated with Twitter is centered around terms related to memes, popular public figures, and offline events. Finally, we explore the relationship between periods of time when terms are trending on Twitter and the corresponding activity on Urban Dictionary, revealing that new definitions are more likely to be added to Urban Dictionary for terms that are currently trending on Twitter.
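The cross-correlation analysis reduces to computing the correlation between two activity time series at a range of temporal lags. A minimal NumPy sketch (standardizing over the full series rather than per-window is a simplification):

    import numpy as np

    def lagged_cross_correlation(x, y, max_lag=30):
        """Correlation between two equal-length daily series at integer lags.
        A positive lag pairs x[t] with y[t + lag], i.e. x leading y."""
        x = (x - x.mean()) / x.std()
        y = (y - y.mean()) / y.std()
        corrs = {}
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                a, b = x[:len(x) - lag], y[lag:]
            else:
                a, b = x[-lag:], y[:lag]
            corrs[lag] = float(np.mean(a * b))
        return corrs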
5. Challenges in Emotion Style Transfer: An Exploration with a Lexical Substitution Pipeline [PDF] Back to Contents
David Helbig, Enrica Troiano, Roman Klinger
Abstract: We propose the task of emotion style transfer, which is particularly challenging, as emotions (here: anger, disgust, fear, joy, sadness, surprise) are on the fence between content and style. To understand the particular difficulties of this task, we design a transparent emotion style transfer pipeline based on three steps: (1) select the words that are promising to be substituted to change the emotion (with a brute-force approach and selection based on the attention mechanism of an emotion classifier), (2) find sets of words as candidates for substituting the words (based on lexical and distributional semantics), and (3) select the most promising combination of substitutions with an objective function which consists of components for content (based on BERT sentence embeddings), emotion (based on an emotion classifier), and fluency (based on a neural language model). This comparably straightforward setup enables us to explore the task and understand in what cases lexical substitution can vary the emotional load of texts, how changes in content and style interact, and whether they are at odds. We further evaluate our pipeline quantitatively in an automated and an annotation study based on Tweets and find, indeed, that simultaneous adjustments of content and emotion are conflicting objectives: as we show in a qualitative analysis motivated by Scherer's emotion component model, this is particularly the case for implicit emotion expressions based on cognitive appraisal or descriptions of bodily reactions.
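Step (3), choosing among candidate substitutions, amounts to maximizing a weighted objective over the three component scores. A schematic sketch; the field names and equal default weights are illustrative, not the authors' exact formulation:

    def rank_substitutions(candidates, alpha=1.0, beta=1.0, gamma=1.0):
        """Rank candidate sentences by a weighted sum of content preservation,
        target-emotion probability, and fluency. Each candidate is a dict with
        precomputed scores, e.g. "content" (cosine of BERT sentence embeddings),
        "emotion" (classifier probability of the target emotion), and
        "fluency" (length-normalized language-model log-probability)."""
        def objective(c):
            return alpha * c["content"] + beta * c["emotion"] + gamma * c["fluency"]
        return sorted(candidates, key=objective, reverse=True)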
6. Neural Entity Linking on Technical Service Tickets [PDF] Back to Contents
Nadja Kurz, Felix Hamann, Adrian Ulges
Abstract: Entity linking, the task of mapping textual mentions to known entities, has recently been tackled using contextualized neural networks. We address the question whether these results -- reported for large, high-quality datasets such as Wikipedia -- transfer to practical business use cases, where labels are scarce, text is low-quality, and terminology is highly domain-specific. Using an entity linking model based on BERT, a popular transformer network in natural language processing, we show that a neural approach outperforms and complements hand-coded heuristics, with improvements of about 20% in top-1 accuracy. Also, the benefits of transfer learning on a large corpus are demonstrated, while fine-tuning proves difficult. Finally, we compare different BERT-based architectures and show that a simple sentence-wise encoding (Bi-Encoder) offers a fast yet efficient search in practice.
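The Bi-Encoder mentioned at the end encodes mentions and entities independently, so all candidate entities can be embedded once offline and ranked with a single matrix-vector product at query time. A minimal sketch of that retrieval step (shapes and names are illustrative):

    import numpy as np

    def rank_entities(mention_vec, entity_matrix, entity_ids, top_k=5):
        """Bi-Encoder retrieval: score pre-encoded entities (n_entities x dim)
        against one encoded mention (dim,) by dot product; return the top-k."""
        scores = entity_matrix @ mention_vec
        order = np.argsort(-scores)[:top_k]
        return [(entity_ids[i], float(scores[i])) for i in order]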
7. Parallel Data Augmentation for Formality Style Transfer [PDF] Back to Contents
Yi Zhang, Tao Ge, Xu Sun
Abstract: The main barrier to progress in the task of Formality Style Transfer is the inadequacy of training data. In this paper, we study how to augment parallel data and propose novel and simple data augmentation methods for this task to obtain useful sentence pairs with easily accessible models and systems. Experiments demonstrate that our augmented parallel data largely helps improve formality style transfer when it is used to pre-train the model, leading to the state-of-the-art results in the GYAFC benchmark dataset.
8. Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre [PDF] Back to Contents
Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero
Abstract: This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. The use of a recent lemmatiser based on neural networks and a CRF tagger allows us to achieve accuracies beyond the current state-of-the-art on the in-domain test, and proves to be robust during out-of-domain tests, i.e. up to 20th-century novels.
9. COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter [PDF] Back to Contents
Martin Müller, Marcel Salathé, Per E Kummervold
Abstract: In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19. Our model shows a 10-30% marginal improvement compared to its base model, BERT-Large, on five different classification datasets. The largest improvements are on the target domain. Pretrained transformer models, such as CT-BERT, are trained on a specific target domain and can be used for a wide variety of natural language processing tasks, including classification, question-answering and chatbots. CT-BERT is optimised to be used on COVID-19 content, in particular social media posts from Twitter.
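Because CT-BERT is a released pretrained checkpoint, applying it to a downstream classification task is a standard Hugging Face fine-tuning setup. A sketch; the checkpoint identifier is assumed from the authors' release on the model hub and should be verified:

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Checkpoint id assumed from the authors' release on the Hugging Face hub.
    MODEL_ID = "digitalepidemiologylab/covid-twitter-bert"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

    inputs = tokenizer("Stay home, stay safe.", return_tensors="pt")
    logits = model(**inputs).logits  # fine-tune on labeled tweets before relying on these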
10. Adaptive Transformers for Learning Multimodal Representations [PDF] Back to Contents
Prajjwal Bhargava
Abstract: The usage of transformers has grown from learning about language semantics to forming meaningful visiolinguistic representations. These architectures are often over-parametrized, requiring large amounts of computation. In this work, we extend adaptive approaches to learn more about model interpretability and computational efficiency. Specifically, we study attention spans, sparse, and structured dropout methods to help understand how their attention mechanism extends for vision and language tasks. We further show that these approaches can help us learn more about how the network perceives the complexity of input sequences, sparsity preferences for different modalities, and other related phenomena.
11. Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space [PDF] Back to Contents
Marko Robnik-Sikonja, Igor Mozetic
Abstract: Word embeddings represent words in a numeric space in such a way that semantic relations between words are encoded as distances and directions in the vector space. Cross-lingual word embeddings map words from one language to the vector space of another language, or words from multiple languages to the same vector space where similar words are aligned. Cross-lingual embeddings can be used to transfer machine learning models between languages and thereby compensate for insufficient data in less-resourced languages. We use cross-lingual word embeddings to transfer machine learning prediction models for Twitter sentiment between 13 languages. We focus on two transfer mechanisms using the joint numerical space for many languages as implemented in the LASER library: the transfer of trained models, and expansion of training sets with instances from other languages. Our experiments show that the transfer of models between similar languages is sensible, while dataset expansion did not increase the predictive performance.
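The first transfer mechanism, training a classifier on embeddings of one language and applying it to another, fits in a few lines. A sketch assuming the community laserembeddings package (any wrapper around the LASER encoder works the same way; the toy data is illustrative):

    from laserembeddings import Laser  # pip install laserembeddings
    from sklearn.linear_model import LogisticRegression

    english_tweets = ["I love this!", "This is awful."]
    english_labels = [1, 0]  # toy sentiment labels for illustration
    slovene_tweets = ["To je super!"]

    laser = Laser()
    X_en = laser.embed_sentences(english_tweets, lang="en")  # shared 1024-d space
    clf = LogisticRegression(max_iter=1000).fit(X_en, english_labels)

    # Sentences from another language land in the same vector space,
    # so the English-trained model transfers without retraining.
    X_sl = laser.embed_sentences(slovene_tweets, lang="sl")
    print(clf.predict(X_sl))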
12. Spelling Error Correction with Soft-Masked BERT [PDF] Back to Contents
Shaohua Zhang, Haoran Huang, Jicong Liu, Hang Li
Abstract: Spelling error correction is an important yet challenging task because a satisfactory solution of it essentially needs human-level language understanding ability. Without loss of generality we consider Chinese spelling error correction (CSC) in this paper. A state-of-the-art method for the task selects a character from a list of candidates for correction (including non-correction) at each position of the sentence on the basis of BERT, the language representation model. The accuracy of the method can be sub-optimal, however, because BERT does not have sufficient capability to detect whether there is an error at each position, apparently due to the way of pre-training it using mask language modeling. In this work, we propose a novel neural architecture to address the aforementioned issue, which consists of a network for error detection and a network for error correction based on BERT, with the former being connected to the latter with what we call the soft-masking technique. Our method of using 'Soft-Masked BERT' is general, and it may be employed in other language detection-correction problems. Experimental results on two datasets demonstrate that the performance of our proposed method is significantly better than the baselines, including the one solely based on BERT.
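The soft-masking connection between detector and corrector has a simple closed form: each token embedding is mixed with the [MASK] embedding in proportion to the detector's per-token error probability. A sketch of that operation (tensor shapes are assumptions):

    import torch

    def soft_mask(token_embs, mask_emb, error_probs):
        """Soft-masking: e'_i = p_i * e_mask + (1 - p_i) * e_i, where p_i is
        the detection network's probability that token i contains an error.
        token_embs: (batch, seq, dim); mask_emb: (dim,); error_probs: (batch, seq)."""
        p = error_probs.unsqueeze(-1)
        return p * mask_emb + (1.0 - p) * token_embs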
13. Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model [PDF] Back to Contents
Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig
Abstract: Videos uploaded on social media are often accompanied with textual descriptions. In building automatic speech recognition (ASR) systems for videos, we can exploit the contextual information provided by such video metadata. In this paper, we explore ASR lattice rescoring by selectively attending to the video descriptions. We first use an attention based method to extract contextual vector representations of video metadata, and use these representations as part of the inputs to a neural language model during lattice rescoring. Secondly, we propose a hybrid pointer network approach to explicitly interpolate the word probabilities of the word occurrences in metadata. We perform experimental evaluations on both language modeling and ASR tasks, and demonstrate that both proposed methods provide performance improvements by selectively leveraging the video metadata.
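The hybrid pointer network's explicit interpolation is the familiar pointer-generator mixture: a gate blends the language model's vocabulary distribution with a copy distribution over words occurring in the metadata. A sketch (how p_copy is built from attention over metadata tokens is elided):

    def hybrid_word_distribution(p_gen, p_vocab, p_copy):
        """Mix the LM vocabulary distribution with a copy distribution over
        metadata words, gated per decoding step by p_gen.
        p_gen: (batch, 1); p_vocab, p_copy: (batch, vocab), each summing to 1."""
        return p_gen * p_vocab + (1.0 - p_gen) * p_copy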
14. Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation [PDF] Back to Contents
Ryuichi Takanobu, Qi Zhu, Jinchao Li, Baolin Peng, Jianfeng Gao, Minlie Huang
Abstract: There is a growing interest in developing goal-oriented dialog systems which serve users in accomplishing complex tasks through multi-turn conversations. Although many methods are devised to evaluate and improve the performance of individual dialog components, there is a lack of comprehensive empirical study on how different components contribute to the overall performance of a dialog system. In this paper, we perform a system-wise evaluation and present an empirical analysis on different types of dialog systems which are composed of different modules in different settings. Our results show that (1) a pipeline dialog system trained using fine-grained supervision signals at different component levels often obtains better performance than the systems that use joint or end-to-end models trained on coarse-grained labels, (2) component-wise, single-turn evaluation results are not always consistent with the overall performance of a dialog system, and (3) despite the discrepancy between simulators and human users, simulated evaluation is still a valid alternative to the costly human evaluation especially in the early stage of development.
15. A chatbot architecture for promoting youth resilience [PDF] Back to Contents
Chester Holt-Quick, Jim Warren, Karolina Stasiak, Ruth Williams, Grant Christie, Sarah Hetrick, Sarah Hopkins, Tania Cargo, Sally Merry
Abstract: E-health technologies have the potential to provide scalable and accessible interventions for youth mental health. As part of a developing an ecosystem of e-screening and e-therapy tools for New Zealand young people, a dialog agent, Headstrong, has been designed to promote resilience with methods grounded in cognitive behavioral therapy and positive psychology. This paper describes the architecture underlying the chatbot. The architecture supports a range of over 20 activities delivered in a 4-week program by relatable personas. The architecture provides a visual authoring interface to its content management system. In addition to supporting the original adolescent resilience chatbot, the architecture has been reused to create a 3-week 'stress-detox' intervention for undergraduates, and subsequently for a chatbot to support young people with the impacts of the COVID-19 pandemic, with all three systems having been used in field trials. The Headstrong architecture illustrates the feasibility of creating a domain-focused authoring environment in the context of e-therapy that supports non-technical expert input and rapid deployment.
16. OSACT4 Shared Task on Offensive Language Detection: Intensive Preprocessing-Based Approach [PDF] Back to Contents
Fatemah Husain
Abstract: The preprocessing phase is one of the key phases within the text classification pipeline. This study aims at investigating the impact of the preprocessing phase on text classification, specifically on offensive language and hate speech classification for Arabic text. The Arabic language used in social media is informal and written using Arabic dialects, which makes the text classification task very complex. Preprocessing helps in dimensionality reduction and removing useless content. We apply intensive preprocessing techniques to the dataset before processing it further and feeding it into the classification model. An intensive preprocessing-based approach demonstrates its significant impact on the offensive language detection and hate speech detection shared tasks of the fourth workshop on Open-Source Arabic Corpora and Corpora Processing Tools (OSACT). Our team placed third in Sub-Task A (Offensive Language Detection) and first in Sub-Task B (Hate Speech Detection), with F1 scores of 89% and 95%, respectively, achieving state-of-the-art performance in terms of F1, accuracy, recall, and precision for Arabic hate speech detection.
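Intensive preprocessing for dialectal Arabic typically includes character-level normalization before classification. A sketch of common steps; the exact operations in the paper's pipeline may differ:

    import re

    def normalize_arabic(text):
        """Light Arabic normalization: unify alif/yaa/taa-marbuta variants,
        strip diacritics and the tatweel (kashida) elongation character."""
        text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alif variants -> bare alif
        text = re.sub("\u0649", "\u064A", text)                # alif maqsura -> yaa
        text = re.sub("\u0629", "\u0647", text)                # taa marbuta -> haa
        text = re.sub("[\u064B-\u065F]", "", text)             # diacritics (harakat)
        return text.replace("\u0640", "")                      # tatweel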
17. VirAAL: Virtual Adversarial Active Learning [PDF] Back to Contents
Gregory Senay, Badr Youbi Idrissi, Marine Haziza
Abstract: This paper presents VirAAL, an Active Learning framework based on Adversarial Training. VirAAL aims to reduce the effort of annotation in Natural Language Understanding (NLU). VirAAL is based on Virtual Adversarial Training (VAT), a semi-supervised approach that regularizes the model through Local Distributional Smoothness. With that, adversarial perturbations are added to the inputs making the posterior distribution more consistent. Therefore, entropy-based Active Learning becomes robust by querying more informative samples without requiring additional components. The first set of experiments studies the impact of VAT on NLU tasks (joint or not) within low labeled data regimes. The second set shows the effect of VirAAL in an Active Learning (AL) process. Results demonstrate that VAT is robust even on multitask training where the adversarial noise is computed from multiple loss functions. Substantial improvements are observed with entropy-based AL with VirAAL for querying data to annotate. VirAAL is an inexpensive method in terms of AL computation with a positive impact on data sampling. Furthermore, VirAAL decreases annotations in AL up to 80%.
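At the core of VirAAL is the standard VAT penalty: find the small perturbation of the input (for NLU, of the token embeddings) that most changes the model's output distribution, and penalize that change. A PyTorch sketch with one power-iteration step; `model` is assumed to map embeddings directly to logits:

    import torch
    import torch.nn.functional as F

    def vat_loss(model, embeds, xi=1e-6, eps=1.0):
        """VAT penalty: KL divergence between predictions on the clean
        embeddings and on embeddings shifted by the most output-changing
        perturbation of norm eps (one power-iteration step)."""
        with torch.no_grad():
            clean = F.softmax(model(embeds), dim=-1)
        # Refine a random direction by one gradient step (power iteration).
        d = xi * F.normalize(torch.randn_like(embeds).flatten(1), dim=1).view_as(embeds)
        d.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(embeds + d), dim=-1), clean,
                      reduction="batchmean")
        d = torch.autograd.grad(kl, d)[0]
        r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(embeds)
        return F.kl_div(F.log_softmax(model(embeds + r_adv), dim=-1), clean,
                        reduction="batchmean")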
18. A pre-training technique to localize medical BERT and enhance BioBERT [PDF] Back to Contents
Shoya Wada, Toshihiro Takeda, Shiro Manabe, Shozo Konishi, Jun Kamohara, Yasushi Matsumura
Abstract: Bidirectional Encoder Representations from Transformers (BERT) models for biomedical specialties such as BioBERT and clinicalBERT have significantly improved biomedical text-mining tasks and enabled us to extract valuable information from biomedical literature. However, these benefits have so far been limited to English, owing to the scarcity of high-quality medical corpora, such as PubMed, in other languages. Therefore, we propose a method that realizes a high-performance BERT model by using a small corpus. We introduce the method to train a BERT model on a small medical corpus in English and in Japanese, respectively, and then evaluate each of them in terms of the biomedical language understanding evaluation (BLUE) benchmark and a medical-document-classification task in Japanese, respectively. After confirming their satisfactory performance, we apply our method to develop a model that outperforms the pre-existing models. Bidirectional Encoder Representations from Transformers for Biomedical Text Mining by Osaka University (ouBioBERT) achieves the best scores on 7 of the 10 datasets in terms of the BLUE benchmark. The total score is 1.0 points above that of BioBERT.
19. Benchmarking neural embeddings for link prediction in knowledge graphs under semantic and structural changes [PDF] Back to Contents
Asan Agibetov, Matthias Samwald
Abstract: Recently, link prediction algorithms based on neural embeddings have gained tremendous popularity in the Semantic Web community, and are extensively used for knowledge graph completion. While algorithmic advances have strongly focused on efficient ways of learning embeddings, less attention has been paid to the different ways their performance and robustness can be evaluated. In this work we propose an open-source evaluation pipeline, which benchmarks the accuracy of neural embeddings in situations where knowledge graphs may experience semantic and structural changes. We define relation-centric connectivity measures that allow us to connect the link prediction capacity to the structure of the knowledge graph. Such an evaluation pipeline is especially important to simulate the accuracy of embeddings for knowledge graphs that are expected to be frequently updated.
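Link-prediction accuracy in such a pipeline is usually reported as ranking metrics over scored candidate entities: hits@k and mean reciprocal rank. A sketch for one test triple (whether ranks are "filtered" against other known true triples is left out):

    import numpy as np

    def rank_metrics(candidate_scores, true_index, ks=(1, 3, 10)):
        """Ranking metrics for one test triple: the rank of the true entity
        among all scored candidates, its reciprocal rank, and hits@k flags."""
        rank = int(np.sum(candidate_scores > candidate_scores[true_index])) + 1
        out = {"rank": rank, "reciprocal_rank": 1.0 / rank}
        out.update({f"hits@{k}": float(rank <= k) for k in ks})
        return out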
20. Grounding Language in Play [PDF] Back to Contents
Corey Lynch, Pierre Sermanet
Abstract: Natural language is perhaps the most versatile and intuitive way for humans to communicate tasks to a robot. Prior work on Learning from Play (LfP) [Lynch et al., 2019] provides a simple approach for learning a wide variety of robotic behaviors from general sensors. However, each task must be specified with a goal image---something that is not practical in open-world environments. In this work we present a simple and scalable way to condition policies on human language instead. We extend LfP by pairing short robot experiences from play with relevant human language after-the-fact. To make this efficient, we introduce multicontext imitation, which allows us to train a single agent to follow image or language goals, then use just language conditioning at test time. This reduces the cost of language pairing to less than 1% of collected robot experience, with the majority of control still learned via self-supervised imitation. At test time, a single agent trained in this manner can perform many different robotic manipulation skills in a row in a 3D environment, directly from images, and specified only with natural language (e.g. "open the drawer...now pick up the block...now press the green button..."). Finally, we introduce a simple technique that transfers knowledge from large unlabeled text corpora to robotic learning. We find that transfer significantly improves downstream robotic manipulation. It also allows our agent to follow thousands of novel instructions at test time, zero-shot, in 16 different languages. See videos of our experiments at this http URL.
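The following PyTorch sketch illustrates the multicontext imitation idea described above: a single policy is trained to follow goals given in either modality by mapping image goals and language goals into a shared latent goal space. All module sizes and names are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class MulticontextPolicy(nn.Module):
    def __init__(self, obs_dim=64, img_goal_dim=64, lang_goal_dim=32,
                 latent_dim=16, act_dim=8):
        super().__init__()
        # Separate encoders project each goal modality into one latent space.
        self.img_goal_enc = nn.Linear(img_goal_dim, latent_dim)
        self.lang_goal_enc = nn.Linear(lang_goal_dim, latent_dim)
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim))

    def forward(self, obs, goal, modality):
        enc = self.img_goal_enc if modality == "image" else self.lang_goal_enc
        return self.policy(torch.cat([obs, enc(goal)], dim=-1))

# Behavioral-cloning step over both goal modalities: most batches carry image
# goals (cheap, relabeled from play); a small fraction carry language goals
# (the <1% pairing cost noted above). Data here is random placeholder tensors.
policy = MulticontextPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
for modality, goal_dim in [("image", 64), ("language", 32)]:
    obs, goal, expert_act = torch.randn(4, 64), torch.randn(4, goal_dim), torch.randn(4, 8)
    loss = ((policy(obs, goal, modality) - expert_act) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At test time only the language branch would be used, which is what makes the cheap language pairing sufficient.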
21. Finding Experts in Transformer Models [PDF] 返回目录
Xavier Suau, Luca Zappella, Nicholas Apostoloff
Abstract: In this work we study the presence of expert units in pre-trained Transformer Models (TM) and how they impact a model's performance. We define expert units to be neurons that are able to classify a concept with a given average precision, where a concept is represented by a binary set of sentences that contain the concept (or not). Leveraging the OneSec dataset (Scarlini et al., 2019), we compile a dataset of 1641 concepts that allows diverse expert units in TM to be discovered. We show that expert units are important in several ways: (1) The presence of expert units is correlated ($r^2=0.833$) with the generalization power of TM, which allows ranking TMs without fine-tuning them on suites of downstream tasks. We further propose an empirical method to decide how accurate such experts should be in order to evaluate generalization. (2) The overlap of top experts between concepts provides a sensible way to quantify concept co-learning, which can be used for explainability of unknown concepts. (3) We show how to self-condition off-the-shelf pre-trained language models to generate text with a given concept by forcing the top experts to be active, without requiring re-training of the model or additional parameters.
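The expert-unit definition above is straightforward to instantiate. A minimal sketch, assuming each unit's activation on a sentence is already reduced to a scalar score (e.g., its maximum over tokens, an assumption on our part) and that a unit counts as an expert when its average precision against the concept labels exceeds an illustrative threshold:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def find_expert_units(activations, labels, ap_threshold=0.9):
    """activations: (n_sentences, n_units); labels: binary (n_sentences,)."""
    experts = []
    for unit in range(activations.shape[1]):
        ap = average_precision_score(labels, activations[:, unit])
        if ap >= ap_threshold:
            experts.append((unit, ap))
    return experts

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)      # concept present / absent
acts = rng.normal(size=(200, 50))          # 50 candidate units
acts[:, 7] += 3.0 * labels                 # plant one near-perfect expert unit
print(find_expert_units(acts, labels))     # unit 7 should dominate
```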
22. History for Visual Dialog: Do we really need it? [PDF] 返回目录
Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser
Abstract: Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, in order to generate the correct response. In this paper, we show that co-attention models which explicitly encode dialog history outperform models that don't, achieving state-of-the-art performance (72% NDCG on the val set). However, we also expose shortcomings of the crowd-sourced dataset collection procedure by showing that history is indeed only required for a small fraction of the data, and that the current evaluation metric encourages generic replies. To that end, we propose a challenging subset (VisDialConv) of the VisDial val set and provide a benchmark of 63% NDCG.
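For readers unfamiliar with the NDCG figures quoted above: in VisDial-style evaluation each candidate answer carries a dense relevance score, and NDCG compares the discounted gain of the model's ranking to that of the ideal ranking. A generic NDCG implementation (not the official evaluation script):

```python
import numpy as np

def ndcg(relevances, predicted_order, k=None):
    """relevances: true relevance per candidate; predicted_order: candidate
    indices sorted by model score, best first."""
    k = k or len(predicted_order)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    rel = np.asarray(relevances, dtype=float)
    dcg = float(np.sum(rel[predicted_order[:k]] * discounts))
    ideal = float(np.sum(np.sort(rel)[::-1][:k] * discounts))
    return dcg / ideal if ideal > 0 else 0.0

rel = [0.0, 1.0, 0.5, 0.0]
print(ndcg(rel, [1, 2, 0, 3]))  # ideal ordering -> 1.0
```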
23. Predicting User Emotional Tone in Mental Disorder Online Communities [PDF] 返回目录
Bárbara Silveira, Fabricio Murai, Ana Paula Couto da Silva
Abstract: Online Social Networks have become an important medium of communication among people who suffer from mental disorders, allowing them to share moments of hardship and to seek support. Here we analyze how Reddit discussions can help improve the health conditions of their users. Using the emotional tone of a user's publications as a proxy for their emotional state, we uncover relationships between state changes and the interactions they have in a given community. We observe that authors of negative posts often write more positive comments after engaging in discussions. Second, we build models based on state-of-the-art embedding techniques and RNNs to predict shifts in emotional tone. We show that it is possible to predict with good accuracy how users of mental disorder online communities react to the interactions experienced on these platforms. Our models could assist in interventions promoted by health care professionals to provide support to people suffering from mental health illnesses.
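A minimal sketch of the kind of embedding-plus-RNN model described above, assuming each post in a user's thread has already been embedded as a fixed-size vector and the target is a binary "tone shifts positive after interaction" label. The architecture and problem framing here are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class ToneShiftPredictor(nn.Module):
    def __init__(self, emb_dim=128, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # logit for a positive tone shift

    def forward(self, post_embeddings):   # (batch, seq_len, emb_dim)
        _, h = self.rnn(post_embeddings)
        return self.head(h[-1]).squeeze(-1)

model = ToneShiftPredictor()
logits = model(torch.randn(2, 10, 128))  # two users, ten posts each
print(torch.sigmoid(logits))             # predicted shift probabilities
```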
24. Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models [PDF] 返回目录
Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu
Abstract: Recent Transformer-based large-scale pre-trained models have revolutionized vision-and-language (V+L) research. Models such as ViLBERT, LXMERT and UNITER have significantly lifted the state of the art across a wide range of V+L benchmarks with joint image-text pre-training. However, little is known about the inner mechanisms that underlie their impressive success. To reveal the secrets behind the scene of these powerful models, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e.g., Visual Coreference Resolution, Visual Relation Detection, Linguistic Probing Tasks) generalizable to standard pre-trained V+L models, aiming to decipher the inner workings of multimodal pre-training (e.g., the implicit knowledge garnered in individual attention heads, the inherent cross-modal alignment learned through contextualized multimodal embeddings). Through extensive analysis of each archetypal model architecture via these probing tasks, our key observations are: (i) Pre-trained models exhibit a propensity for attending over text rather than images during inference. (ii) There exists a subset of attention heads that are tailored for capturing cross-modal interactions. (iii) Learned attention matrices in pre-trained models exhibit patterns coherent with the latent alignment between image regions and textual words. (iv) Plotted attention patterns reveal visually-interpretable relations among image regions. (v) Pure linguistic knowledge is also effectively encoded in the attention heads. These are valuable insights serving to guide future work towards designing better model architectures and objectives for multimodal pre-training.
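To give a flavor of observation (i) above, the following numpy sketch measures, for each attention head, the share of attention mass that lands on text tokens versus image-region tokens. The tensor shapes and modality mask are illustrative assumptions, not the VALUE probing code.

```python
import numpy as np

def text_attention_share(attn, is_text):
    """attn: (heads, query_tokens, key_tokens), each row summing to 1;
    is_text: boolean (key_tokens,) mask marking textual positions."""
    mass_on_text = attn[:, :, is_text].sum(axis=-1)  # (heads, query_tokens)
    return mass_on_text.mean(axis=-1)                # per-head average share

rng = np.random.default_rng(0)
raw = rng.random((12, 20, 20))
attn = raw / raw.sum(axis=-1, keepdims=True)         # normalize rows
is_text = np.arange(20) < 12                         # first 12 tokens: text
print(text_attention_share(attn, is_text))          # ~0.6 per head here
```

A head whose share is systematically far above the text-token fraction of the input would be evidence of the text-attending propensity the paper reports.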
25. Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario [PDF] 返回目录
Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko
Abstract: Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used clustering-based diarization approaches perform rather poorly in such conditions, mainly due to their limited ability to handle overlapping speech. We propose a novel Target-Speaker Voice Activity Detection (TS-VAD) approach, which directly predicts the activity of each speaker on each time frame. The TS-VAD model takes conventional speech features (e.g., MFCCs) along with i-vectors for each speaker as inputs. A set of binary classification output layers produces the activity of each speaker. The i-vectors can be estimated iteratively, starting from a strong clustering-based diarization. We also extend the TS-VAD approach to the multi-microphone case, using a simple attention mechanism on top of hidden representations extracted from the single-channel TS-VAD model. Moreover, post-processing strategies for the predicted speaker activity probabilities are investigated. Experiments on the CHiME-6 unsegmented data show that TS-VAD achieves state-of-the-art results, outperforming the baseline x-vector-based system by more than 30% absolute Diarization Error Rate (DER).
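The following PyTorch sketch illustrates the single-channel TS-VAD idea described above: frame-level acoustic features are concatenated with a target speaker's i-vector, and a shared network emits per-frame activity probabilities for each speaker. The layer sizes, and the per-speaker loop in place of the paper's joint set of binary output layers, are simplifying assumptions.

```python
import torch
import torch.nn as nn

class TSVAD(nn.Module):
    def __init__(self, feat_dim=40, ivec_dim=100, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim + ivec_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)  # binary activity logit per frame

    def forward(self, feats, ivectors):
        """feats: (batch, frames, feat_dim); ivectors: (batch, speakers, ivec_dim).
        Returns activity probabilities of shape (batch, speakers, frames)."""
        outs = []
        for s in range(ivectors.shape[1]):
            # Broadcast this speaker's i-vector over all frames and encode.
            iv = ivectors[:, s:s + 1, :].expand(-1, feats.shape[1], -1)
            h, _ = self.encoder(torch.cat([feats, iv], dim=-1))
            outs.append(torch.sigmoid(self.out(h)).squeeze(-1))
        return torch.stack(outs, dim=1)

model = TSVAD()
probs = model(torch.randn(2, 200, 40), torch.randn(2, 4, 100))
print(probs.shape)  # torch.Size([2, 4, 200]): per-speaker, per-frame activity
```

Because each speaker is scored independently on every frame, overlapping speech simply yields several speakers active at once, which is exactly the case clustering-based diarization struggles with.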