摘要

1. Towards the Study of Morphological Processing of the Tangkhul Language [PDF] 返回目录
Mirinso Shadang, Navanath Saharia, Thoudam Doren Singh
Abstract: There is no or little work on natural language processing of Tangkhul language. The current work is a humble beginning of morphological processing of this language using an unsupervised approach. We use a small corpus collected from different sources of text books, short stories and articles of other topics. Based on the experiments carried out, the morpheme identification task using morphessor gives reasonable and interesting output despite using a small corpus.
摘要：上Tangkhul语言的自然语言处理没有或很少的工作。目前的工作是使用无监督方法这门语言的形态学处理的一个不起眼的开始。我们使用的课本，短篇小说和其他主题的文章不同来源收集小语料库。根据进行的实验中，使用morphessor语素识别任务给予合理的和有趣的输出虽然使用小语料库。

2. Natural Backdoor Attack on Text Data [PDF] 返回目录
Moon Sun
Abstract: Deep learning has been widely adopted in natural language processing applications in recent years. Many existing studies show the vulnerabilities of machine learning and deep learning models against adversarial examples. However, most existing works currently focus on evasion attack on text data instead of positioning attack, also named \textit{backdoor attack}. In this paper, we systematically study the backdoor attack against models on text data. First, we define the backdoor attack on text data. Then, we propose the different attack strategies to generate trigger on text data. Next, we propose different types of triggers based on modification scope, human recognition and special cases. Last, we evaluate the backdoor attack and the results show the excellent performance of with 100\% backdoor attack rate and sacrificing of 0.71\% on text classification text.
摘要：深学习，近年来被广泛采用自然语言处理的应用程序。许多现有的研究表明，机器学习的反对对抗的例子漏洞和深的学习模式。然而，大多数现有的工作目前集中在对文本数据，而不是定位攻击，也称为\ textit {后门攻击}逃避攻击。在本文中，我们系统地研究对车型的后门攻击的文本数据。首先，我们定义的文本数据的后门攻击。然后，我们提出了不同的攻击策略生成的文本数据触发。接下来，我们提出了不同类型的基础上改造的范围，人类识别和特殊情况下触发的。最后，我们评估了后门攻击，结果显示有100 \％后门攻击速度的优异性能和0.71 \％文本分类文本牺牲。

3. Multichannel CNN with Attention for Text Classification [PDF] 返回目录
Zhenyu Liu, Haiwei Huang, Chaohong Lu, Shengfei Lyu
Abstract: Recent years, the approaches based on neural networks have shown remarkable potential for sentence modeling. There are two main neural network structures: recurrent neural network (RNN) and convolution neural network (CNN). RNN can capture long term dependencies and store the semantics of the previous information in a fixed-sized vector. However, RNN is a biased model and its ability to extract global semantics is restricted by the fixed-sized vector. Alternatively, CNN is able to capture n-gram features of texts by utilizing convolutional filters. But the width of convolutional filters restricts its performance. In order to combine the strengths of the two kinds of networks and alleviate their shortcomings, this paper proposes Attention-based Multichannel Convolutional Neural Network (AMCNN) for text classification. AMCNN utilizes a bi-directional long short-term memory to encode the history and future information of words into high dimensional representations, so that the information of both the front and back of the sentence can be fully expressed. Then the scalar attention and vectorial attention are applied to obtain multichannel representations. The scalar attention can calculate the word-level importance and the vectorial attention can calculate the feature-level importance. In the classification task, AMCNN uses a CNN structure to cpture word relations on the representations generated by the scalar and vectorial attention mechanism instead of calculating the weighted sums. It can effectively extract the n-gram features of the text. The experimental results on the benchmark datasets demonstrate that AMCNN achieves better performance than state-of-the-art methods. In addition, the visualization results verify the semantic richness of multichannel representations.
摘要：近年来，基于神经网络的方法已经显示了句建模潜力惊人。有两个主要的神经网络结构：递归神经网络（RNN）和卷积神经网络（CNN）。 RNN可以捕获长期依赖关系和存储在一个固定大小的矢量的先前信息的语义。然而，RNN是偏置模型及其提取全局语义能力由固定大小的矢量的限制。另外，CNN是通过利用卷积过滤器能够文本捕获正语法特征。但是卷积过滤器的宽度限制了其性能。为了将两种网络的优势结合，减轻自己的缺点，本文提出了基于注意机制的多通道卷积神经网络（AMCNN）进行文本分类。 AMCNN利用双向长短期记忆编码的话转化为高维表示，历史和未来的信息，从而使前部和句子的后面两者的信息可以得到充分的体现。然后标量的关注和重视向量应用于得到的多声道表示。标量注意力可以计算出字级重要性和矢量注意力可以计算特征级重要性。在分类任务，AMCNN使用CNN结构由标量和矢量关注机制，而不是计算的加权和产生的表示cpture字的关系。它可以有效地提取文本的正语法特征。基准的数据集上的实验结果表明，AMCNN实现了比国家的最先进的方法更好的性能。此外，可视化结果验证了多通道表示的语义丰富性。

4. Leveraging Subword Embeddings for Multinational Address Parsing [PDF] 返回目录
Marouane Yassine, David Beauchemin, François Laviolette, Luc Lamontagne
Abstract: Address parsing consists of identifying the segments that make up an address such as a street name or a postal code. Because of its importance for tasks like record linkage, address parsing has been approached with many techniques. Neural network methods defined a new state-of-the-art for address parsing. While this approach yielded notable results, previous work has only focused on applying neural networks to achieve address parsing of addresses from one source country. We propose an approach in which we employ subword embeddings and a Recurrent Neural Network architecture to build a single model capable of learning to parse addresses from multiple countries at the same time while taking into account the difference in languages and address formatting systems. We achieved accuracies around 99 % on the countries used for training with no pre-processing nor post-processing needed. In addition, we explore the possibility of transferring the address parsing knowledge attained by training on some countries' addresses to others with no further training. This setting is also called zero-shot transfer learning. We achieve good results for 80 % of the countries (34 out of 41), almost 50 % of which (19 out of 41) is near state-of-the-art performance.
摘要：地址解析包括识别构成一个地址段例如街道名称或邮递区号的。因为它像记录链接任务的重要性，地址解析已经接洽了许多技术。神经网络的方法中定义的新的国家的最先进的用于地址解析。虽然这种方法收效明显，以前的工作只集中在应用神经网络来实现地址从一个来源国地址的解析。我们建议中，我们使用的嵌入子字的方法和递归神经网络体系结构来构建能够学习的同时，以解析来自多个国家的地址，考虑到差异语言和地址格式化系统，而单一模式。我们所使用的训练没有预先处理，也不是国家实现了99％左右的精度后期处理需要。此外，我们探索，没有进一步的培训上一些国家的地址通过转移培训获得的地址解析知识给他人的可能性。此设置也称为零次转让的学习。我们实现了80％的国家（34出来的41）良好的结果，约50％，其（19出来的41）的是国家的最先进的近性能。

5. Want to Identify, Extract and Normalize Adverse Drug Reactions in Tweets? Use RoBERTa [PDF] 返回目录
Katikapalli Subramanyam Kalyan, S.Sangeetha
Abstract: This paper presents our approach for task 2 and task 3 of Social Media Mining for Health (SMM4H) 2020 shared tasks. In task 2, we have to differentiate adverse drug reaction (ADR) tweets from nonADR tweets and is treated as binary classification. Task3 involves extracting ADR mentions and then mapping them to MedDRA codes. Extracting ADR mentions is treated as sequence labeling and normalizing ADR mentions is treated as multi-class classification. Our system is based on pre-trained language model RoBERTa and it achieves a) F1-score of 58% in task2 which is 12% more than the average score b) relaxed F1-score of 70.1% in ADR extraction of task 3 which is 13.7% more than the average score and relaxed F1-score of 35% in ADR extraction + normalization of task3 which is 5.8% more than the average score. Overall, our models achieve promising results in both the tasks with significant improvements over average scores.
摘要：本文介绍了我们对社会化媒体的挖掘任务2和任务3健康（SMM4H）2020共享任务的方法。在任务2中，我们必须区分nonADR鸣叫药物不良反应（ADR）鸣叫和被视为二值分类。 TASK3涉及提取ADR提到，然后将它们映射到MedDRA的代码。提取ADR提到被视为序列标签和正火ADR提到被视为多类分类。我们的系统是基于预先训练语言模型罗伯塔，它实现了）的58％，在TASK2 F1-评分的12％，比平均得分为B以上）放宽了70.1％的F1-得分任务3的ADR提取这是13.7％比平均得分和中TASK3的ADR萃取+正常化的35％松弛F1分数是5.8％，比平均得分更多。总体而言，我们的模型实现承诺均超过平均分数显著改进的任务的结果。

6. Measuring Memorization Effect in Word-Level Neural Networks Probing [PDF] 返回目录
Rudolf Rosa, Tomáš Musil, David Mareček
Abstract: Multiple studies have probed representations emerging in neural networks trained for end-to-end NLP tasks and examined what word-level linguistic information may be encoded in the representations. In classical probing, a classifier is trained on the representations to extract the target linguistic information. However, there is a threat of the classifier simply memorizing the linguistic labels for individual words, instead of extracting the linguistic abstractions from the representations, thus reporting false positive results. While considerable efforts have been made to minimize the memorization problem, the task of actually measuring the amount of memorization happening in the classifier has been understudied so far. In our work, we propose a simple general method for measuring the memorization effect, based on a symmetric selection of comparable sets of test words seen versus unseen in training. Our method can be used to explicitly quantify the amount of memorization happening in a probing setup, so that an adequate setup can be chosen and the results of the probing can be interpreted with a reliability estimate. We exemplify this by showcasing our method on a case study of probing for part of speech in a trained neural machine translation encoder.
摘要：多项研究探讨交涉新兴训练结束到终端的NLP任务，神经网络和要检查什么字级语言信息可以在交涉进行编码。在传统的探测，分类训练的交涉提取目标的语言信息。但是，分类的简单记忆个别单词的语言的标签，而不是从陈述中提取语言的抽象，因而报告假阳性结果的威胁。虽然已经取得了相当大的努力，以尽量减少记忆问题，实际测量的分类记忆发生量的任务，迄今已充分研究。在我们的工作中，我们提出了测量记忆效应的基础上，可比套看出与培训看不见的测试的话对称选择一个简单的一般方法。我们的方法可以用来明确地量化记忆在探测设置发生的量，从而使适当的设置可以被选择和探测的结果可以用一个可靠性估值来解释。我们通过展示我们的方法在探查在训练神经机器翻译编码器词性为例举例说明这一点。

7. Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models [PDF] 返回目录
Viet Bui The, Oanh Tran Thi, Phuong Le-Hong
Abstract: This paper describes our study on using mutilingual BERT embeddings and some new neural models for improving sequence tagging tasks for the Vietnamese language. We propose new model architectures and evaluate them extensively on two named entity recognition datasets of VLSP 2016 and VLSP 2018, and on two part-of-speech tagging datasets of VLSP 2010 and VLSP 2013. Our proposed models outperform existing methods and achieve new state-of-the-art results. In particular, we have pushed the accuracy of part-of-speech tagging to 95.40% on the VLSP 2010 corpus, to 96.77% on the VLSP 2013 corpus; and the F1 score of named entity recognition to 94.07% on the VLSP 2016 corpus, to 90.31% on the VLSP 2018 corpus. Our code and pre-trained models viBERT and vELECTRA are released as open source to facilitate adoption and further research.
摘要：本文介绍了我们在使用mutilingual BERT的嵌入和提高序列标记任务越南语一些新的神经模型的研究。我们提出了新的模型架构和VLSP 2016和2018 VLSP两个命名实体识别数据集的广泛评估它们，并于2010年VLSP和VLSP 2013年的两个部分词性标注的数据集我们提出的模型优于现有的方法和实现新的国有的最先进的结果。特别是，我们已经推动部分词性标注的准确性，95.40％的VLSP 2010语料库，对VLSP 2013语料库96.77％;和F1分数命名实体识别，以94.07％的VLSP 2016语料库，以90.31％的VLSP 2018语料库。我们的代码和预先训练模型维贝尔和vELECTRA被释放作为开源促进采用和进一步研究。

8. A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis [PDF] 返回目录
Jean-Benoit Delbrouck, Noé Tits, Mathilde Brousmiche, Stéphane Dupont
Abstract: Understanding expressed sentiment and emotions are two crucial factors in human multimodal language. This paper describes a Transformer-based joint-encoding (TBJE) for the task of Emotion Recognition and Sentiment Analysis. In addition to use the Transformer architecture, our approach relies on a modular co-attention and a glimpse layer to jointly encode one or more modalities. The proposed solution has also been submitted to the ACL20: Second Grand-Challenge on Multimodal Language to be evaluated on the CMU-MOSEI dataset. The code to replicate the presented experiments is open-source: this https URL.
摘要：了解表达情绪和情感是人类语言多式联运两个关键因素。本文介绍的联合编码基于变压器的（TBJE）的情感识别和情感分析的任务。除了使用变压器的架构，我们的方法依赖于模块化的共同关注和一瞥层联合编码的一种或多种方式。所提出的解决方案也已提交给ACL20：第二大挑战赛的多模态语言对CMU-MOSEI数据集进行评估。复制展示实验的代码是开源：此HTTPS URL。

9. Hinting Semantic Parsing with Statistical Word Sense Disambiguation [PDF] 返回目录
Ritwik Bose, Siddharth Vashishtha, James Allen
Abstract: The task of Semantic Parsing can be approximated as a transformation of an utterance into a logical form graph where edges represent semantic roles and nodes represent word senses. The resulting representation should be capture the meaning of the utterance and be suitable for reasoning. Word senses and semantic roles are interdependent, meaning errors in assigning word senses can cause errors in assigning semantic roles and vice versa. While statistical approaches to word sense disambiguation outperform logical, rule-based semantic parsers for raw word sense assignment, these statistical word sense disambiguation systems do not produce the rich role structure or detailed semantic representation of the input. In this work, we provide hints from a statistical WSD system to guide a logical semantic parser to produce better semantic type assignments while maintaining the soundness of the resulting logical forms. We observe an improvement of up to 10.5% in F-score, however we find that this improvement comes at a cost to the structural integrity of the parse
摘要：语义分析的任务可以近似为一个变换发声的成逻辑形式图，其中边表示语义角色和节点表示单词感官。将得到的表示应捕获发声的含义，并适用于推理。词义和语义角色是相互依存的，在分配词义可以反之亦然指派的语义角色和副导致错误的意思的错误。虽然统计方法对多义超越了原始词义分配逻辑，以规则为基础的语义分析程序，这些统计词义消歧系统不产生丰富的角色结构或输入的语义详细表示。在这项工作中，我们提供从统计WSD系统提示引导逻辑语义解析，以产生更好的语义类型分配，同时保持产生逻辑形式的合理性。我们观察到了一个改进的F-得分10.5％，但是我们发现，这种改善是有代价的解析的结构完整性

10. Is Japanese gendered language used on Twitter ? A large scale study [PDF] 返回目录
Tiziana Carpi, Stefano Maria Iacus
Abstract: This study analyzes the usage of Japanese gendered language on Twitter. Starting from a collection of 408 million Japanese tweets from 2015 till 2019 and an additional sample of 2355 manually classified Twitter accounts timelines into gender and categories (politicians, musicians, etc). A large scale textual analysis is performed on this corpus to identify and examine sentence-final particles (SFPs) and first-person pronouns appearing in the texts. It turns out that gendered language is in fact used also on Twitter, in about 6% of the tweets, and that the prescriptive classification into "male" and "female" language does not always meet the expectations, with remarkable exceptions. Further, SFPs and pronouns show increasing or decreasing trends, indicating an evolution of the language used on Twitter.
摘要：本研究分析在Twitter日本性别语言的使用。从408万个日本鸣叫从2015年的集合直到2019和额外的样品开始2355手动分类Twitter帐户时间表成性别和类别（政治家，音乐家等）。关于这个语料库进行一个大规模的文本分析，以识别和检查句末颗粒（SFP）收发和第一人称代词出现在文本。事实证明，性别语言其实也用在Twitter上，在微博的约6％，而该规范分类为“男性”和“女性”的语言并不总是符合预期，具有显着的例外。此外，SFP和代词表现出递增或递减的趋势，说明在Twitter上使用的语言的演变。

11. A Framework for Pre-processing of Social Media Feeds based on Integrated Local Knowledge Base [PDF] 返回目录
Taiwo Kolajo, Olawande Daramola, Ayodele Adebiyi, Seth Aaditeshwar
Abstract: Most of the previous studies on the semantic analysis of social media feeds have not considered the issue of ambiguity that is associated with slangs, abbreviations, and acronyms that are embedded in social media posts. These noisy terms have implicit meanings and form part of the rich semantic context that must be analysed to gain complete insights from social media feeds. This paper proposes an improved framework for pre-processing of social media feeds for better performance. To do this, the use of an integrated knowledge base (ikb) which comprises a local knowledge source (Naijalingo), urban dictionary and internet slang was combined with the adapted Lesk algorithm to facilitate semantic analysis of social media feeds. Experimental results showed that the proposed approach performed better than existing methods when it was tested on three machine learning models, which are support vector machines, multilayer perceptron, and convolutional neural networks. The framework had an accuracy of 94.07% on a standardized dataset, and 99.78% on localised dataset when used to extract sentiments from tweets. The improved performance on the localised dataset reveals the advantage of integrating the use of local knowledge sources into the process of analysing social media feeds particularly in interpreting slangs/acronyms/abbreviations that have contextually rooted meanings.
摘要：大多数社交媒体供稿的语义分析以往的研究中没有考虑到与俚语，缩写词和缩略词嵌入在社交媒体帖子相关的不确定性的问题。这嘈杂的术语隐含意义，必须进行分析，以获取来自社交媒体资讯完整的见解丰富的语义上下文的一部分。本文提出了一种社交媒体供稿的前处理中更好的性能改进的框架。要做到这一点，利用一个集成知识库（IKB），它包括本地知识源（Naijalingo）的，城市词典和网络语言合并与调整Lesk算法，以促进社交媒体供稿语义分析。实验结果表明，该方法进行比现有方法更好的时候，是在三个机器学习模型，这是支持向量机，多层感知和卷积神经网络进行测试。从鸣叫用于提取情绪当框架具有94.07％的标准化的数据集上的局部数据集的准确度，和99.78％。对局部数据集的改进的性能尤其是显示在解释有上下文根植含义俚语/首字母缩略词/缩写整合利用当地的知识来源为分析社会媒体供稿的过程的优点。

12. Answering Questions on COVID-19 in Real-Time [PDF] 返回目录
Jinhyuk Lee, Sean S. Yi, Minbyul Jeong, Mujeen Sung, Wonjin Yoon, Yonghwa Choi, Miyoung Ko, Jaewoo Kang
Abstract: The recent outbreak of the novel coronavirus is wreaking havoc on the world and researchers are struggling to effectively combat it. One reason why the fight is difficult is due to the lack of information and knowledge. In this work, we outline our effort to contribute to shrinking this knowledge vacuum by creating covidAsk, a question answering (QA) system that combines biomedical text mining and QA techniques to provide answers to questions in real-time. Our system leverages both supervised and unsupervised approaches to provide informative answers using DenSPI (Seo et al., 2019) and BEST (Lee et al., 2016). Evaluation of covidAsk is carried out by using a manually created dataset called COVID-19 Questions which is based on facts about COVID-19. We hope our system will be able to aid researchers in their search for knowledge and information not only for COVID-19 but for future pandemics as well.
摘要：新型冠状病毒的最近爆发是对世界造成严重破坏和研究人员都在努力有效地打击它。原因之一拼的是困难的是，由于缺乏信息和知识。在这项工作中，我们列出了我们的努力，通过创建covidAsk，一个问答（QA）系统，结合生物医学文本挖掘和质量保证技术，提供答案的实时问题，有助于缩小这方面的知识真空。我们的系统同时利用监督和无监督的方法来提供使用DenSPI翔实的答案（Seo等人。，2019）和最佳（Lee等，2016）。 covidAsk评价使用名为COVID-19问题手动创建的数据集是基于约COVID-19的事实进行。我们希望我们的系统将能够帮助研究人员在对知识和信息不仅对COVID-19，但对于未来的流感大流行，以及。

13. Combine Convolution with Recurrent Networks for Text Classification [PDF] 返回目录
Shengfei Lyu, Jiaqi Liu
Abstract: Convolutional neural network (CNN) and recurrent neural network (RNN) are two popular architectures used in text classification. Traditional methods to combine the strengths of the two networks rely on streamlining them or concatenating features extracted from them. In this paper, we propose a novel method to keep the strengths of the two networks to a great extent. In the proposed model, a convolutional neural network is applied to learn a 2D weight matrix where each row reflects the importance of each word from different aspects. Meanwhile, we use a bi-directional RNN to process each word and employ a neural tensor layer that fuses forward and backward hidden states to get word representations. In the end, the weight matrix and word representations are combined to obtain the representation in a 2D matrix form for the text. We carry out experiments on a number of datasets for text classification. The experimental results confirm the effectiveness of the proposed method.
摘要：卷积神经网络（CNN）和递归神经网络（RNN）在文本分类中使用两种流行的架构。传统方法对两个网络的优势结合依靠简化他们或串联从中提取的特征。在本文中，我们提出了一种新的方法，以保持两个网络的优势，在很大的程度。在该模型，卷积神经网络应用到学习2D权重矩阵，其中每一行反映来自不同方面的每个单词的重要性。同时，我们采用了双向RNN来处理每一个字，并采用神经张层保险丝向前和向后隐藏状态得到词表示。在结束时，将权重矩阵和字表示被组合以获得在文本2D矩阵形式表示。我们进行实验在多个数据集进行文本分类的。实验结果证实了该方法的有效性。

14. Mapping Topic Evolution Across Poetic Traditions [PDF] 返回目录
Petr Plechac, Thomas N. Haider
Abstract: Poetic traditions across languages evolved differently, but we find that certain semantic topics occur in several of them, albeit sometimes with temporal delay, or with diverging trajectories over time. We apply Latent Dirichlet Allocation (LDA) to poetry corpora of four languages, i.e. German (52k poems), English (85k poems), Russian (18k poems), and Czech (80k poems). We align and interpret salient topics, their trend over time (1600--1925 A.D.), showing similarities and disparities across poetic traditions with a few select topics, and use their trajectories over time to pinpoint specific literary epochs.
摘要：跨语言诗学传统发展方式不同，但我们发现，发生在几个人的某些语义的话题，虽然有时时间上的延迟，或随着时间的推移发散轨迹。我们采用隐含狄利克雷分布（LDA），以四种语言诗全集，即德语（52K诗），英语（85K诗），俄罗斯（18K诗），和捷克（80K诗）。我们调整和解释突出主题，随着时间的推移（公元1600--1925）的趋势，表现出相似性和差异跨越诗学传统与一些特定的主题，用自己的轨迹，随着时间的推移，以确定具体的文学时代。

15. Progressive Generation of Long Text [PDF] 返回目录
Bowen Tan, Zichao Yang, Maruan AI-Shedivat, Eric P. Xing, Zhiting Hu
Abstract: Large-scale language models pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators. However, as our systematic examination reveals, it is still challenging for such models to generate coherent long passages of text ($>$1000 tokens), especially when the models are fine-tuned to the target domain on a small corpus. To overcome the limitation, we propose a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution. Our method first produces domain-specific content keywords and then progressively refines them into complete passages in multiple stages. The simple design allows our approach to take advantage of pretrained language models at each stage and effectively adapt to any target domain given only a small set of examples. We conduct a comprehensive empirical study with a broad set of evaluation metrics, and show that our approach significantly improves upon the fine-tuned GPT-2 in terms of domain-specific quality and sample efficiency. The coarse-to-fine nature of progressive generation also allows for a higher degree of control over the generated content.
摘要：预先训练文本的语料库庞大，如GPT-2大型语言模型，是功能强大的开放域文本发电机。然而，作为我们系统的检查表明，它仍然是具有挑战性的此类模型生成文本（$> $ 1000个标记）的连贯的长通道，尤其是当模型微调，以在一个小语料库目标域。为了克服的限制，我们提出以渐进的方式简单而生成文本的有效的方法，通过生成从低图像以高分辨率启发。我们的方法首先产生特定领域的内容的关键字，然后逐步细化，使它们成为多个阶段完成通道。简单的设计使我们的方法利用预训练的语言模型的每个阶段，并有效地适应只给一小部分的示例中的任何目标域。我们有一个广泛的评估指标进行了全面的实证研究，并表明我们的方法显著在特定领域的质量和样本效率方面于微调GPT-2提高。渐进生成的粗到细的性质还允许较高程度的过所生成的内容的控制。

16. Rethinking the Positional Encoding in Language Pre-training [PDF] 返回目录
Guolin Ke, Di He, Tie-Yan Liu
Abstract: How to explicitly encode positional information into neural networks is an important problem in natural language processing. In the Transformer model, the positional information is simply encoded as embedding vectors, which are used in the input layer, or encoded as a bias term in the self-attention module. In this work, we investigate the problems in the previous formulations and propose a new positional encoding method for BERT called Transformer with Untied Positional Encoding (TUPE). Different from all other works, TUPE only uses the word embedding as input. In the self-attention module, the word correlation and positional correlation are computed separately with different parameterizations and then added together. This design removes the noisy word-position correlation and gives more expressiveness to characterize the relationship between words/positions by using different projection matrices. Furthermore, TUPE unties the \texttt{[CLS]} symbol from other positions to provide it with a more specific role to capture the global representation of the sentence. Extensive experiments and ablation studies on GLUE benchmark demonstrate the effectiveness and efficiency of the proposed method: TUPE outperforms several baselines on almost all tasks by a large margin. In particular, it can achieve a higher score than baselines while only using 30\% pre-training computational costs. We release our code at this https URL.
摘要：如何位置信息明确地编码为神经网络在自然语言处理的一个重要问题。在变压器模型中，位置信息被简单地编码为嵌入载体中，在输入层中使用，或作为自注意模块中的偏差项编码。在这项工作中，我们探讨在以前的配方的问题，并提出了一个新的位置编码方法BERT称为互感器解开阵地编码（TUPE）。从所有其他作品不同的是，TUPE只能使用单词嵌入作为输入。在自注意模块，字相关和位置关系与不同的参数分别计算，然后加在一起。这种设计消除了噪声的字位置相关，并给出更表现通过使用不同的投影矩阵来表征字/位置之间的关系。此外，TUPE解开从其他位置\ texttt {[CLS]}符号为它提供捕捉到句子的全球代表性更具体的角色。胶水基准广泛的实验和消融研究证明的有效性和该方法的效率：TUPE大幅度优于几乎所有的任务数的基线。特别是，它可以在使用仅30 \％前培训计算成本实现更高的分数比基线。我们在此HTTPS URL释放我们的代码。

17. Self-Attention Networks for Intent Detection [PDF] 返回目录
Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth
Abstract: Self-attention networks (SAN) have shown promising performance in various Natural Language Processing (NLP) scenarios, especially in machine translation. One of the main points of SANs is the strength of capturing long-range and multi-scale dependencies from the data. In this paper, we present a novel intent detection system which is based on a self-attention network and a Bi-LSTM. Our approach shows improvement by using a transformer model and deep averaging network-based universal sentence encoder compared to previous solutions. We evaluate the system on Snips, Smart Speaker, Smart Lights, and ATIS datasets by different evaluation metrics. The performance of the proposed model is compared with LSTM with the same datasets.
摘要：自重视网络（SAN）已经显示出不同的自然语言处理（NLP）的方案有前途的性能，尤其是在机器翻译。一个SAN的要点是从数据捕获远距离和多尺度相关性的强度。在本文中，我们提出了一种基于自我关注网络和双LSTM一种新型意图检测系统。我们的做法表明，通过使用比以前的解决方案的变压器模型和深基于网络的平均万能句子编码器的改进。我们通过不同的评价指标来评估的零星消息，智能音箱，聪明灯和ATIS数据集的系统。该模型的性能与具有相同的数据集LSTM比较。

18. BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision [PDF] 返回目录
Chen Liang, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, Tuo Zhao, Chao Zhang
Abstract: We study the open-domain named entity recognition (NER) problem under distant supervision. The distant supervision, though does not require large amounts of manual annotations, yields highly incomplete and noisy distant labels via external knowledge bases. To address this challenge, we propose a new computational framework -- BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm: In the first stage, we adapt the pre-trained language model to the NER tasks using the distant labels, which can significantly improve the recall and precision; In the second stage, we drop the distant labels, and propose a self-training approach to further improve the model performance. Thorough experiments on 5 benchmark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released in this https URL.
摘要：我们研究一个名为遥远的监督下，实体识别（NER）问题开域。远处的监督，但并不需要大量的人工注释，高度不完备的产量和嘈杂遥远的标签通过外部的知识基础。为了应对这一挑战，我们提出了一个新的计算框架 - 邦德，它利用的预先训练语言模型（例如，BERT和罗伯塔）的功率，以提高NER模型的预测性能。具体来说，我们提出了两个阶段的训练算法：在第一阶段，我们适应预先训练的语言模型使用遥远的标签，它可以显著提高召回率和准确的NER任务;在第二阶段，我们放弃了遥远的标签，并提出了一个自我训练的方法来进一步提高模型的性能。 5个基准数据集彻底的实验表明债券在现有的远亲监督NER方法的优越性。代码和标记较远的数据已经公布在该HTTPS URL。

19. A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards [PDF] 返回目录
Zi-Yi Dou, Sachin Kumar, Yulia Tsvetkov
Abstract: Cross-lingual text summarization aims at generating a document summary in one language given input in another language. It is a practically important but under-explored task, primarily due to the dearth of available data. Existing methods resort to machine translation to synthesize training data, but such pipeline approaches suffer from error propagation. In this work, we propose an end-to-end cross-lingual text summarization model. The model uses reinforcement learning to directly optimize a bilingual semantic similarity metric between the summaries generated in a target language and gold summaries in a source language. We also introduce techniques to pre-train the model leveraging monolingual summarization and machine translation objectives. Experimental results in both English--Chinese and English--German cross-lingual summarization settings demonstrate the effectiveness of our methods. In addition, we find that reinforcement learning models with bilingual semantic similarity as rewards generate more fluent sentences than strong baselines.
摘要：跨语种文本摘要的目的是产生在另一种语言给定输入一个语言编写的文件摘要。这是一个重要的现实意义，但充分开发的任务，主要是由于可用数据的缺乏。现有的方法求助于机器翻译合成训练数据，但这样的管道的方法，从错误传播受到影响。在这项工作中，我们提出了一个终端到终端跨语种文本摘要模型。该模型使用强化学习直接优化双语语义相似度在源语言的目标语言和金摘要产生的摘要之间度量。我们还引进技术预先训练模型利用单语汇总和机器翻译的目标。中国和英国 - - 中英文实验结果德国跨语言概括的设置表明了我们方法的有效性。此外，我们发现，强化学习双语语义相似模型作为奖励产生更流畅的句子比强基线。

20. Mind The Facts: Knowledge-Boosted Coherent Abstractive Text Summarization [PDF] 返回目录
Beliz Gunel, Chenguang Zhu, Michael Zeng, Xuedong Huang
Abstract: Neural models have become successful at producing abstractive summaries that are human-readable and fluent. However, these models have two critical shortcomings: they often don't respect the facts that are either included in the source article or are known to humans as commonsense knowledge, and they don't produce coherent summaries when the source article is long. In this work, we propose a novel architecture that extends Transformer encoder-decoder architecture in order to improve on these shortcomings. First, we incorporate entity-level knowledge from the Wikidata knowledge graph into the encoder-decoder architecture. Injecting structural world knowledge from Wikidata helps our abstractive summarization model to be more fact-aware. Second, we utilize the ideas used in Transformer-XL language model in our proposed encoder-decoder architecture. This helps our model with producing coherent summaries even when the source article is long. We test our model on CNN/Daily Mail summarization dataset and show improvements on ROUGE scores over the baseline Transformer model. We also include model predictions for which our model accurately conveys the facts, while the baseline Transformer model doesn't.
摘要：神经模型已在生产抽象的总结是人类可读的，流利获得成功。然而，这些模型有两个重要的缺点：他们往往不尊重要么包含在源的文章或已知人类的常识知识的事实，他们不产生相干摘要当源文章很长。在这项工作中，我们提出了延长变压器编码器，解码器架构，以改善这些缺点的新型架构。首先，我们结合从维基数据知识图的实体层次的知识到编码器，解码器架构。从维基数据结构注入世界的知识有助于我们的抽象总结的模型更加事实上感知。其次，我们利用我们提出的编码器，解码器架构变压器-XL语言模型中使用的想法。这有助于我们的模型与生产连贯摘要即使源文章很长。我们测试我们对ROUGE得分超过基线变压器模型CNN /每日邮报汇总数据集和显示的改进型号。我们还包括，我们的模型准确地传达事实，而基线变压器模型没有模型预测。

21. String-based methods for tonal harmony: A corpus study of Haydn's string quartets [PDF] 返回目录
David R. W. Sears
Abstract: This chapter considers how string-based methods might be adapted to address music-analytic questions related to the discovery of musical organization, with particular attention devoted to the analysis of tonal harmony. I begin by applying the taxonomy of mental organization proposed by Mandler (1979) to the concept of musical organization. Using this taxonomy as a guide, I then present evidence for three principles of tonal harmony -- recurrence, syntax, and recursion -- using a corpus of Haydn string quartets.
摘要：本章考虑基于字符串的方法可能会如何适应所涉及的音乐组织的发现地址音乐解析的问题，尤其要注意致力于色调和谐的分析。我就开始采用心理组织提出的曼德勒（1979年），以音乐团体的概念分类。使用这种分类法为指导，然后我礼物色调和谐的三大原则的证据 - 使用海顿弦乐四重奏全集 - 复发，语法和递归。

22. Video-Grounded Dialogues with Pretrained Generation Language Models [PDF] 返回目录
Hung Le, Steven C.H. Hoi
Abstract: Pre-trained language models have shown remarkable success in improving various downstream NLP tasks due to their ability to capture dependencies in textual data and generate natural responses. In this paper, we leverage the power of pre-trained language models for improving video-grounded dialogue, which is very challenging and involves complex features of different dynamics: (1) Video features which can extend across both spatial and temporal dimensions; and (2) Dialogue features which involve semantic dependencies over multiple dialogue turns. We propose a framework by extending GPT-2 models to tackle these challenges by formulating video-grounded dialogue tasks as a sequence-to-sequence task, combining both visual and textual representation into a structured sequence, and fine-tuning a large pre-trained GPT-2 network. Our framework allows fine-tuning language models to capture dependencies across multiple modalities over different levels of information: spatio-temporal level in video and token-sentence level in dialogue context. We achieve promising improvement on the Audio-Visual Scene-Aware Dialogues (AVSD) benchmark from DSTC7, which supports a potential direction in this line of research.
摘要：预先训练语言模型已经在文本数据显示在改善各种下游NLP任务，显着的成就是由于他们有能力捕获的依赖性和产生的自然反应。在本文中，我们利用的预先训练语言模型的功率提高视频接地对话，这是非常具有挑战性的，涉及不同的动态的复杂的功能：（1）视频功能，它可以跨越空间和时间维度延伸;和（2）的对话特征，其涉及在多个对话回合语义相关性。我们通过扩展GPT-2车型，以解决通过制定视频接地对话任务序列到序列任务，无论是视觉和文字表述组合成一个结构化的序列这些挑战提出了一个框架，并微调了大量预先训练GPT-2网络。我们的框架允许微调语言模型，以便在不同层次的信息在多个方式捕获的依赖关系：在对话背景视频和令牌句子层面的时空水平。我们实现了从DSTC7的视听场景感知对话（AVSD）的基准，它支持在这一行的研究潜力有前途的方向改进。

23. Uncertainty-aware Self-training for Text Classification with Few Labels [PDF] 返回目录
Subhabrata Mukherjee, Ahmed Hassan Awadallah
Abstract: Recent success of large-scale pre-trained language models crucially hinge on fine-tuning them on large amounts of labeled data for the downstream task, that are typically expensive to acquire. In this work, we study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck by making use of large-scale unlabeled data for the target task. Standard self-training mechanism randomly samples instances from the unlabeled pool to pseudo-label and augment labeled data. In this work, we propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network leveraging recent advances in Bayesian deep learning. Specifically, we propose (i) acquisition functions to select instances from the unlabeled pool leveraging Monte Carlo (MC) Dropout, and (ii) learning mechanism leveraging model confidence for self-training. As an application, we focus on text classification on five benchmark datasets. We show our methods leveraging only 20-30 labeled samples per class for each task for training and for validation can perform within 3% of fully supervised pre-trained language models fine-tuned on thousands of labeled instances with an aggregate accuracy of 91% and improving by upto 12% over baselines.
摘要：大型预训练的语言模型最近的成功上微调它们关键取决于对大量的下游任务标记数据，这通常是昂贵的收购。在这项工作中，我们研究的自我训练作为国内最早半监督学习的一个方法通过利用大规模的标签数据的目标任务，以减少注释瓶颈。标准自我培训机构从非标记池伪标签和标记扩充数据随机样本的实例。在这项工作中，我们提出通过将利用贝叶斯深度学习最新进展的基本神经网络的不确定性估算，以提高自我训练的方法。具体来说，我们建议（一）采集功能，选择从非标记池利用蒙特卡洛（MC）差的情况下，以及（ii）学习机制利用自我训练模式的信心。作为一个应用，我们专注于文本分类的五个基准数据集。我们证明我们的方法利用只有20％至30级标记的样品，用于培训和验证每个任务可以完全监控预先训练语言模型上千种标记实例的微调有91％的总精度和3％内执行通过高达12％以上的基线改善。

24. BERTology Meets Biology: Interpreting Attention in Protein Language Models [PDF] 返回目录
Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani
Abstract: Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. Through the lens of attention, we analyze the inner workings of the Transformer and explore how the model discerns structural and functional properties of proteins. We show that attention (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We also present a three-dimensional visualization of the interaction between attention and protein structure. Our findings align with known biological processes and provide a tool to aid discovery in protein engineering and synthetic biology. The code for visualization and analysis is available at this https URL.
摘要：变压器的架构已经证明学习对蛋白质的分类和生成任务用的表现。然而，这些表象呈现解释性的挑战。通过关注的镜头，我们分析了变压器的内部运作和探索模型是如何辨别蛋白质的结构和功能特性。我们表明，注意（1）捕获的蛋白质的折叠结构，连接该相距很远的基础序列的氨基酸，但在空间上接近所述三维结构，（2）目标的结合位点，蛋白的一个关键功能部件，和（3）集中于与增加层深度越来越复杂的生物物理特性。我们还提出关注和蛋白质结构之间的相互作用的三维可视化。我们的研究结果与已知的生物工艺调整和提供蛋白质工程和合成生物学的工具来帮助用户找到。对于可视化和分析的代码可在此HTTPS URL。

25. Data augmentation versus noise compensation for x- vector speaker recognition systems in noisy environments [PDF] 返回目录
Mohammad Mohammadamini, Driss Matrouf
Abstract: The explosion of available speech data and new speaker modeling methods based on deep neural networks (DNN) have given the ability to develop more robust speaker recognition systems. Among DNN speaker modelling techniques, x-vector system has shown a degree of robustness in noisy environments. Previous studies suggest that by increasing the number of speakers in the training data and using data augmentation more robust speaker recognition systems are achievable in noisy environments. In this work, we want to know if explicit noise compensation techniques continue to be effective despite the general noise robustness of these systems. For this study, we will use two different x-vector networks: the first one is trained on Voxceleb1 (Protocol1), and the second one is trained on Voxceleb1+Voxveleb2 (Protocol2). We propose to add a denoising x-vector subsystem before scoring. Experimental results show that, the x-vector system used in Protocol2 is more robust than the other one used Protocol1. Despite this observation we will show that explicit noise compensation gives almost the same EER relative gain in both protocols. For example, in the Protocol2 we have 21% to 66% improvement of EER with denoising techniques.
摘要：基于深层神经网络（DNN）提供语音数据和新的扬声器建模方法的涌现也给开发更强大的说话人识别系统的能力。间DNN扬声器建模技术，X - 载体系统已经显示出在嘈杂环境中一定程度的鲁棒性。以往的研究表明，通过增加训练数据的扬声器数量和使用数据的增强更强大的说话人识别系统在嘈杂的环境中实现的。在这项工作中，我们想知道，如果明确的噪音补偿技术继续是，尽管这些系统的一般噪声鲁棒有效。对于本研究，我们将使用两个不同的x矢量网络：第一个是上训练Voxceleb1（协议1），而第二个是在Voxceleb1 + Voxveleb2（协议2）训练。我们建议进球前增加降噪的X矢量子系统。实验结果表明，在协议2使用的X - 载体系统比用于协议1的另一个更稳健。尽管这样的观察，我们将表明明确的噪声补偿给出了几乎两个协议相同的能效比相对增益。例如，在协议2，我们有EER的21％至66％的改善与降噪技术。

注：中文为机器翻译结果！

WITH LOVE OF WORLD

【arxiv论文】 Computation and Language 2020-06-30

目录

摘要