Abstracts

1. Example-Based Named Entity Recognition
Morteza Ziyadi, Yuting Sun, Abhishek Goswami, Jade Huang, Weizhu Chen
Abstract: We present a novel approach to named entity recognition (NER) in the presence of scarce data that we call example-based NER. Our train-free few-shot learning approach takes inspiration from question-answering to identify entity spans in a new and unseen domain. In comparison with the current state-of-the-art, the proposed method performs significantly better, especially when using a low number of support examples.

2. Prediction of ICD Codes with Clinical BERT Embeddings and Text Augmentation with Label Balancing using MIMIC-III
Brent Biseda, Gaurav Desai, Haifeng Lin, Anish Philip
Abstract: This paper achieves state-of-the-art results for the ICD code prediction task using the MIMIC-III dataset. This was achieved through the use of Clinical BERT (Alsentzer et al., 2019) embeddings, text augmentation, and label balancing to improve F1 scores for both ICD chapter and ICD disease codes. We attribute the improved performance mainly to the use of novel text augmentation that shuffles the order of sentences during training. In comparison to the Top-32 ICD code prediction (Keyang Xu et al.) with an F1 score of 0.76, we achieve a final F1 score of 0.75, but on the top 50 ICD codes.
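
The sentence-order augmentation mentioned in the abstract for entry 2 can be illustrated with a minimal sketch. The abstract does not say how sentences are split or how shuffling is scheduled during training, so the regex splitter, the function names, and the sample note below are all assumptions made for illustration, not the authors' implementation.

```python
import random
import re


def split_sentences(text):
    """Naive sentence splitter (an assumption): break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def shuffle_sentences(text, seed=None):
    """Return a copy of `text` with its sentences in a random order."""
    sentences = split_sentences(text)
    rng = random.Random(seed)  # a local RNG keeps the shuffle reproducible
    rng.shuffle(sentences)
    return " ".join(sentences)


# Hypothetical clinical note, invented for this example.
note = "Patient admitted with chest pain. ECG was normal. Discharged after two days."
augmented = shuffle_sentences(note, seed=0)
print(augmented)
```

Each call with a different seed yields another ordering of the same sentences, so one labeled note can produce several training variants without changing its label.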

3. How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics
Prasanna Parthasarathi, Joelle Pineau, Sarath Chandar
Abstract: Though generative dialogue modeling is widely seen as a language modeling task, the task demands that an agent have a complex natural language understanding of its input text to carry on a meaningful interaction with a user. The automatic metrics used evaluate the quality of the generated text as a proxy for the holistic interaction of the agent. Such metrics were earlier shown not to correlate with human judgement. In this work, we observe that human evaluation of dialogue agents can be inconclusive due to the lack of sufficient information for appropriate evaluation. The automatic metrics are deterministic yet shallow, and human evaluation can be relevant yet inconclusive. To bridge this gap in evaluation, we propose designing a set of probing tasks to evaluate dialogue models. The hand-crafted tasks are aimed at quantitatively evaluating a generative dialogue model's understanding beyond the token-level evaluation on the generated text. The probing tasks are deterministic like automatic metrics and require human judgement in their design, benefiting from the best of both worlds. With experiments on probe tasks we observe that, unlike RNN-based architectures, transformer models may not be learning to comprehend the input text despite their generated text having higher overlap with the target text.

4. End to End Dialogue Transformer
Ondřej Měkota, Memduh Gökırmak, Petr Laitoch
Abstract: Dialogue systems attempt to facilitate conversations between humans and computers, for purposes as diverse as small talk and booking a vacation. We are inspired here by the performance of the recurrent neural network-based model Sequicity, which, when conducting a dialogue, uses a sequence-to-sequence architecture to first produce a textual representation of what is going on in the dialogue and, in a further step, uses this along with database findings to produce a reply to the user. We propose a dialogue system based on the Transformer architecture instead of Sequicity's RNN-based architecture that works similarly in an end-to-end, sequence-to-sequence fashion.

5. Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources
Taolin Zhang, Chengyu Wang, Minghui Qiu, Bite Yang, Xiaofeng He, Jun Huang
Abstract: Machine Reading Comprehension (MRC) aims to extract answers to questions given a passage. It has been widely studied recently, especially in open domains. However, few efforts have been made on closed-domain MRC, mainly due to the lack of large-scale training data. In this paper, we introduce a multi-target MRC task for the medical domain, whose goal is to predict answers to medical questions and the corresponding support sentences from medical information sources simultaneously, in order to ensure the high reliability of medical knowledge serving. A high-quality dataset is manually constructed for the purpose, named Multi-task Chinese Medical MRC dataset (CMedMRC), with detailed analysis conducted. We further propose the Chinese medical BERT model for the task (CMedBERT), which fuses medical knowledge into pre-trained language models by the dynamic fusion mechanism of heterogeneous features and the multi-task learning strategy. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.

6. Cross-lingual Semantic Role Labeling with Model Transfer
Hao Fei, Meishan Zhang, Fei Li, Donghong Ji
Abstract: Prior studies show that cross-lingual semantic role labeling (SRL) can be achieved by model transfer under the help of universal features. In this paper, we fill the gap of cross-lingual SRL by proposing an end-to-end SRL model that incorporates a variety of universal features and transfer methods. We study both the bilingual transfer and multi-source transfer, under gold or machine-generated syntactic inputs, pre-trained high-order abstract features, and contextualized multilingual word representations. Experimental results on the Universal Proposition Bank corpus indicate that performances of the cross-lingual SRL can vary by leveraging different cross-lingual features. In addition, whether the features are gold-standard also has an impact on performances. Precisely, we find that gold syntax features are much more crucial for cross-lingual SRL, compared with the automatically-generated ones. Moreover, universal dependency structure features are able to give the best help, and both pre-trained high-order features and contextualized word representations can further bring significant improvements.

7. YNU-HPCC at SemEval-2020 Task 11: LSTM Network for Detection of Propaganda Techniques in News Articles
Jiaxu Dao
Abstract: This paper summarizes our studies on propaganda detection techniques for news articles in SemEval-2020 Task 11. This task is divided into the SI and TC subtasks. We implemented the GloVe word representation, the BERT pretraining model, and the LSTM model architecture to accomplish this task. Our approach achieved good results for both the SI and TC subtasks. The macro-F1-score for the SI subtask is 0.406, and the micro-F1-score for the TC subtask is 0.505. Our method significantly outperforms the officially released baseline method, and the SI and TC subtasks rank 17th and 22nd, respectively, for the test set. This paper also compares the performances of different deep learning model architectures, such as the Bi-LSTM, LSTM, BERT, and XGBoost models, on the detection of news promotion techniques. The code of this paper is available at: this https URL.
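
Entry 7 reports one macro-averaged and one micro-averaged F1 score, and the two are not directly comparable. The sketch below shows the generic single-label classification definitions of both averages. It is not the official SemEval-2020 Task 11 scorer, whose span-level evaluation is not described in the abstract; the function names and toy labels are assumptions.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred):
    """Return (macro-F1, micro-F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p wrongly
            fn[g] += 1  # missed gold label g
    labels = set(gold) | set(pred)
    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts first, so every example counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy labels, invented for this example.
macro, micro = macro_micro_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print(f"macro-F1={macro:.3f} micro-F1={micro:.3f}")
```

On imbalanced label sets the macro average is pulled down by rare classes that are predicted poorly, while the micro average tracks overall accuracy in the single-label case.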

8. syrapropa at SemEval-2020 Task 11: BERT-based Models Design For Propagandistic Technique and Span Detection
Jinfen Li, Lu Xiao
Abstract: This paper describes the BERT-based models proposed for two subtasks in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. We first build the model for Span Identification (SI) based on SpanBERT, and facilitate the detection with a deeper model and a sentence-level representation. We then develop a hybrid model for Technique Classification (TC). The hybrid model is composed of three submodels: two BERT models with different training methods and a feature-based Logistic Regression model. We deal with the imbalanced dataset by adjusting the cost function. We are in seventh place in the SI subtask (F1-measure of 0.4711) and in third place in the TC subtask (F1-measure of 0.6783) on the development set.

9. Predicting Helpfulness of Online Reviews
Abdalraheem Alsmadi, Shadi AlZu'bi, Mahmoud Al-Ayyoub, Yaser Jararweh
Abstract: E-commerce dominates a large part of the world's economy, with many websites dedicated to selling products online. The vast majority of e-commerce websites provide their customers with the ability to express their opinions about the products/services they purchase. This feedback in the form of reviews represents a rich source of information about the users' experiences and level of satisfaction, which is of great benefit to both the producer and the consumer. However, not all of these reviews are helpful/useful. The traditional way of determining the helpfulness of a review is through the feedback from human users. However, such a method does not necessarily cover all reviews. Moreover, it has many issues like bias, high cost, etc. Thus, there is a need to automate this process. This paper presents a set of machine learning (ML) models to predict the helpfulness of online reviews. Mainly, three approaches are used: a supervised learning approach (using ML as well as deep learning (DL) models), a semi-supervised approach (that combines DL models with word embeddings), and pre-trained word embedding models that use transfer learning (TL). The latter two approaches are among the unique aspects of this paper as they follow the recent trend of utilizing unlabeled text. The results show that the proposed DL approaches outperform the traditional existing ones. Moreover, the semi-supervised approach performs remarkably well compared with the other ones.

10. Deep Bayes Factor Scoring for Authorship Verification
Benedikt Boenninghoff, Julian Rupp, Robert M. Nickel, Dorothea Kolossa
Abstract: The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic/closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is quite challenging because authors of texts across multiple/different fandom topics are included. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure: A deep metric learning framework at the bottom aims to learn a pseudo-metric that maps a document of variable length onto a fixed-sized feature vector. At the top, we incorporate a probabilistic layer to perform Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.

11. COVID-19 Pandemic: Identifying Key Issues using Social Media and Natural Language Processing
Oladapo Oyebode, Chinenye Ndulue, Dinesh Mulchandani, Banuchitra Suruliraj, Ashfaq Adib, Fidelia Anulika Orji, Evangelos Milios, Stan Matwin, Rita Orji
Abstract: The COVID-19 pandemic has affected people's lives in many ways. Social media data can reveal public perceptions and experience with respect to the pandemic, and also reveal factors that hamper or support efforts to curb global spread of the disease. In this paper, we analyzed COVID-19-related comments collected from six social media platforms using Natural Language Processing (NLP) techniques. We identified relevant opinionated keyphrases and their respective sentiment polarity (negative or positive) from over 1 million randomly selected comments, and then categorized them into broader themes using thematic analysis. Our results uncover 34 negative themes out of which 17 are economic, socio-political, educational, and political issues. 20 positive themes were also identified. We discuss the negative issues and suggest interventions to tackle them based on the positive themes and research evidence.

12. An automated pipeline for the discovery of conspiracy and conspiracy theory narrative frameworks: Bridgegate, Pizzagate and storytelling on the web
Timothy R. Tangherlini, Shadi Shahsavari, Behnam Shahbazi, Ehsan Ebrahimzadeh, Vwani Roychowdhury
Abstract: Although a great deal of attention has been paid to how conspiracy theories circulate on social media and their factual counterpart conspiracies, there has been little computational work done on describing their narrative structures. We present an automated pipeline for the discovery and description of the generative narrative frameworks of conspiracy theories on social media, and actual conspiracies reported in the news media. We base this work on two separate repositories of posts and news articles describing the well-known conspiracy theory Pizzagate from 2016, and the New Jersey conspiracy Bridgegate from 2013. We formulate a graphical generative machine learning model where nodes represent actors/actants, and multi-edges and self-loops among nodes capture context-specific relationships. Posts and news items are viewed as samples of subgraphs of the hidden narrative network. The problem of reconstructing the underlying structure is posed as a latent model estimation problem. We automatically extract and aggregate the actants and their relationships from the posts and articles. We capture context specific actants and interactant relationships by developing a system of supernodes and subnodes. We use these to construct a network, which constitutes the underlying narrative framework. We show how the Pizzagate framework relies on the conspiracy theorists' interpretation of "hidden knowledge" to link otherwise unlinked domains of human interaction, and hypothesize that this multi-domain focus is an important feature of conspiracy theories. While Pizzagate relies on the alignment of multiple domains, Bridgegate remains firmly rooted in the single domain of New Jersey politics. We hypothesize that the narrative framework of a conspiracy theory might stabilize quickly in contrast to the narrative framework of an actual one, which may develop more slowly as revelations come to light.

13. Quantum Language Model with Entanglement Embedding for Question Answering
Yiwei Chen, Yu Pan, Daoyi Dong
Abstract: Quantum Language Models (QLMs) in which words are modelled as quantum superposition of sememes have demonstrated a high level of model transparency and good post-hoc interpretability. Nevertheless, in the current literature word sequences are basically modelled as a classical mixture of word states, which cannot fully exploit the potential of a quantum probabilistic description. A full quantum model is yet to be developed to explicitly capture the non-classical correlations within the word sequences. We propose a neural network model with a novel Entanglement Embedding (EE) module, whose function is to transform the word sequences into entangled pure states of many-body quantum systems. Strong quantum entanglement, which is the central concept of quantum information and an indication of parallelized correlations among the words, is observed within the word sequences. Numerical experiments show that the proposed QLM with EE (QLM-EE) achieves superior performance compared with the classical deep neural network models and other QLMs on Question Answering (QA) datasets. In addition, the post-hoc interpretability of the model can be improved by quantizing the degree of entanglement among the words.

14. DUTH at SemEval-2020 Task 11: BERT with Entity Mapping for Propaganda Classification
Anastasios Bairaktaris, Symeon Symeonidis, Avi Arampatzis
Abstract: This report describes the methods employed by the Democritus University of Thrace (DUTH) team for participating in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. Our team dealt with Subtask 2: Technique Classification. We used shallow Natural Language Processing (NLP) preprocessing techniques to reduce the noise in the dataset, feature selection methods, and common supervised machine learning algorithms. Our final model is based on using the BERT system with entity mapping. To improve our model's accuracy, we mapped certain words into five distinct categories by employing word-classes and entity recognition.

15. UTMN at SemEval-2020 Task 11: A Kitchen Solution to Automatic Propaganda Detection
Elena Mikhalkova, Nadezhda Ganzherli, Anna Glazkova, Yuliya Bidulya
Abstract: The article describes a fast solution to propaganda detection at SemEval-2020 Task 11, based on feature adjustment. We use per-token vectorization of features and a simple Logistic Regression classifier to quickly test different hypotheses about our data. We come up with what seems to us the best solution; however, we are unable to align it with the result of the metric suggested by the organizers of the task. We test how our system handles class and feature imbalance by varying the number of samples of two classes (Propaganda and None) in the training set, the size of a context window in which a token is vectorized, and the combination of vectorization means. The result of our system at SemEval-2020 Task 11 is F-score=0.37.

16. CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection
Verena Blaschke, Maxim Korniyenko, Sam Tureski
Abstract: This paper describes our participation in the SemEval-2020 task Detection of Propaganda Techniques in News Articles. We participate in both subtasks: Span Identification (SI) and Technique Classification (TC). We use a bi-LSTM architecture in the SI subtask and train a complex ensemble model for the TC subtask. Our architectures are built using embeddings from BERT in combination with additional lexical features and extensive label post-processing. Our systems achieve a rank of 8 out of 35 teams in the SI subtask (F1-score: 43.86%) and 8 out of 31 teams in the TC subtask (F1-score: 57.37%).

17. HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange, Nirant Kasliwal
Abstract: Sentiment analysis for code-mixed social media text continues to be an under-explored area. This work adds two common approaches: fine-tuning large transformer models and sample efficient methods like ULMFiT. Prior work demonstrates the efficacy of classical ML methods for polarity detection. Fine-tuned general-purpose language representation models, such as those of the BERT family are benchmarked along with classical machine learning and ensemble methods. We show that NB-SVM beats RoBERTa by 6.2% (relative) F1. The best performing model is a majority-vote ensemble which achieves an F1 of 0.707. The leaderboard submission was made under the codalab username nirantk, with F1 of 0.689.
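
The best model in entry 17 is described as a majority-vote ensemble. A minimal sketch of that idea follows; the abstract does not say which models vote or how ties are resolved, so the three prediction lists and the tie handling here are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    ensembled = []
    for votes in zip(*predictions):  # one tuple of votes per example
        # most_common(1) returns the label with the highest count; on a tie,
        # Counter keeps the label it encountered first (the earliest model).
        ensembled.append(Counter(votes).most_common(1)[0][0])
    return ensembled


# Hypothetical outputs of three sentiment classifiers on three posts.
model_a = ["pos", "neg", "neu"]
model_b = ["pos", "pos", "neg"]
model_c = ["neg", "pos", "neg"]
print(majority_vote([model_a, model_b, model_c]))  # ['pos', 'pos', 'neg']
```

Using an odd number of voters, as here, avoids ties in the two-label case; with three or more labels a tie is still possible, so the tie rule should be chosen deliberately.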

18. Applications of BERT Based Sequence Tagging Models on Chinese Medical Text Attributes Extraction
Gang Zhao, Teng Zhang, Chenxiao Wang, Ping Lv, Ji Wu
Abstract: We convert the Chinese medical text attributes extraction task into a sequence tagging or machine reading comprehension task. Based on BERT pre-trained models, we have tried not only the widely used LSTM-CRF sequence tagging model, but also other sequence models, such as CNN, UCNN, WaveNet, SelfAttention, etc., which reach performance similar to LSTM+CRF. This sheds light on the traditional sequence tagging models. Since the aspect of emphasis for different sequence tagging models varies substantially, ensembling these models adds diversity to the final system. By doing so, our system achieves good performance on the task of Chinese medical text attributes extraction (subtask 2 of CCKS 2019 task 1).

19. Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology
Yangjun Zhang, Pengjie Ren, Maarten de Rijke
Abstract: Conversational interfaces are increasingly popular as a way of connecting people to information. Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence or on single sentences instead of an entire dialogue. In this paper, we define the task of Malevolent Dialogue Response Detection and Classification (MDRDC). We make three contributions to advance research on this task. First, we present a Hierarchical Malevolent Dialogue Taxonomy (HMDT). Second, we create a labelled multi-turn dialogue dataset and formulate the MDRDC task as a hierarchical classification task over this taxonomy. Third, we apply state-of-the-art text classification methods to the MDRDC task and report on extensive experiments aimed at assessing the performance of these approaches.

20. Team DoNotDistribute at SemEval-2020 Task 11: Features, Finetuning, and Data Augmentation in Neural Models for Propaganda Detection in News Articles
Michael Kranzlein, Shabnam Behzad, Nazli Goharian
Abstract: This paper presents our systems for SemEval 2020 Shared Task 11: Detection of Propaganda Techniques in News Articles. We participate in both the span identification and technique classification subtasks and report on experiments using different BERT-based models along with handcrafted features. Our models perform well above the baselines for both tasks, and we contribute ablation studies and discussion of our results to dissect the effectiveness of different features and techniques with the goal of aiding future studies in propaganda detection.

21. Abstractive Summarization of Spoken and Written Instructions with BERT
Alexandra Savelieva, Bryan Au-Yeung, Vasanth Ramani
Abstract: Summarization of speech is a difficult problem due to the spontaneity of the flow, disfluencies, and other issues that are not usually encountered in written texts. Our work presents the first application of the BERTSum model to conversational language. We generate abstractive summaries of narrated instructional videos across a wide variety of topics, from gardening and cooking to software configuration and sports. In order to enrich the vocabulary, we use transfer learning and pretrain the model on a few large cross-domain datasets in both written and spoken English. We also do preprocessing of transcripts to restore sentence segmentation and punctuation in the output of an ASR system. The results are evaluated with ROUGE and Content-F1 scoring for the How2 and WikiHow datasets. We engage human judges to score a set of summaries randomly selected from a dataset curated from HowTo100M and YouTube. Based on blind evaluation, we achieve a level of textual fluency and utility close to that of summaries written by human content creators. The model beats current SOTA when applied to WikiHow articles that vary widely in style and topic, while showing no performance regression on the canonical CNN/DailyMail dataset. Due to the high generalizability of the model across different styles and domains, it has great potential to improve accessibility and discoverability of internet content. We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.

22. Machine Semiotics
Peter Klimczak, Günther Wirsching, Peter beim Graben
Abstract: Despite their satisfactory speech recognition capabilities, current speech assistive devices still lack suitable automatic semantic analysis capabilities as well as useful representation of pragmatic world knowledge. Instead, current technologies require users to learn keywords necessary to effectively operate and work with a machine. Such a machine-centered approach can be frustrating for users. However, recognizing a basic difference between the semiotics of humans and machines presents a possibility to overcome this shortcoming: For the machine, the meaning of a (human) utterance is defined by its own scope of actions. Machines, thus, do not need to understand the meanings of individual words, nor the meaning of phrasal and sentence semantics that combine individual word meanings with additional implicit world knowledge. For speech assistive devices, the learning of machine-specific meanings of human utterances by trial and error should be sufficient. Using the trivial example of a cognitive heating device, we show that, based on dynamic semantics, this process can be formalized as the learning of utterance-meaning pairs (UMP). This is followed by a detailed semiotic contextualization of the previously generated signs.

23. Cross-Cultural Polarity and Emotion Detection Using Sentiment Analysis and Deep Learning -- a Case Study on COVID-19
Ali Shariq Imran, Sher Mohammad Doudpota, Zenun Kastrati, Rakhi Bhatra
Abstract: How different cultures react and respond given a crisis is predominant in a society's norms and political will to combat the situation. Often the decisions made are necessitated by events, social pressure, or the need of the hour, which may not represent the will of the nation. While some are pleased with it, others might show resentment. Coronavirus (COVID-19) brought a mix of similar emotions from the nations towards the decisions taken by their respective governments. Social media was bombarded with posts containing both positive and negative sentiments on COVID-19, the pandemic, the lockdown, and related hashtags over the past couple of months. Despite being geographically close, many neighboring countries reacted differently to one another. For instance, Denmark and Sweden, which share many similarities, stood poles apart on the decisions taken by their respective governments. Yet, their nations' support was mostly unanimous, unlike the South Asian neighboring countries where people showed a lot of anxiety and resentment. This study aims to detect and analyze sentiment polarity and emotions demonstrated during the initial phase of the pandemic and the lockdown period, employing natural language processing (NLP) and deep learning techniques on Twitter posts. Deep long short-term memory (LSTM) models used for estimating the sentiment polarity and emotions from extracted tweets have been trained to achieve state-of-the-art accuracy on the sentiment140 dataset. The use of emoticons showed a unique and novel way of validating the supervised deep learning models on tweets extracted from Twitter.
Summary: The study analyzes how people in different countries reacted on Twitter to their governments' COVID-19 decisions during the initial phase of the pandemic and the lockdown period. It contrasts neighboring countries such as Denmark and Sweden with South Asian neighbors, where people showed considerable anxiety and resentment. Deep LSTM models trained on the sentiment140 dataset estimate sentiment polarity and emotions from the extracted tweets, and emoticons are used to validate the supervised models.

24. Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [PDF]
Cuiyun Gao, Jichuan Zeng, Zhiyuan Wen, David Lo, Xin Xia, Irwin King, Michael R. Lyu
Abstract: Millions of mobile apps are available in app stores, such as Apple's App Store and Google Play. For a mobile app, it is increasingly challenging to stand out from the enormous number of competitors and become popular among users. Good user experience and well-designed functionalities are the keys to a successful app. To achieve this, popular apps usually schedule their updates frequently. If we can capture the critical app issues faced by users in a timely and accurate manner, developers can make timely updates, and good user experience can be ensured. There exist prior studies on analyzing reviews for detecting emerging app issues. These studies are usually based on topic modeling or clustering techniques. However, the short length and the sentiment of user reviews have not been considered. In this paper, we propose a novel emerging issue detection approach named MERIT that takes these two characteristics into consideration. Specifically, we propose an Adaptive Online Biterm Sentiment-Topic (AOBST) model that jointly models topics and their corresponding sentiments while taking app versions into consideration. Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version, and automatically interpret the meaning of the topics with the most relevant phrases and sentences. Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT in identifying emerging app issues, improving on the state-of-the-art method by 22.3% in terms of F1-score. In terms of efficiency, MERIT can return results within acceptable time.
Summary: Prior work on detecting emerging app issues from user reviews relies on topic modeling or clustering and ignores both the short length and the sentiment of reviews. The paper proposes MERIT, built on an Adaptive Online Biterm Sentiment-Topic (AOBST) model that jointly models topics and sentiments while accounting for app versions. MERIT infers the topics negatively reflected in the reviews of one app version and interprets them with the most relevant phrases and sentences. On popular apps from Google Play and Apple's App Store, it improves on the state-of-the-art method by 22.3% in F1-score and returns results within acceptable time.

25. Fine-tune BERT for E-commerce Non-Default Search Ranking [PDF]
Yunjiang Jiang, Yue Shang, Hongwei Shen, Wen-Yun Yang, Yun Xiao
Abstract: The quality of non-default ranking on e-commerce platforms, such as ranking by ascending item price or descending historical sales volume, often suffers from acute relevance problems, since irrelevant items are much more easily exposed at the top of the ranking results. In this work, we propose a two-stage ranking scheme, which first recalls a wide range of candidate items through refined query/title keyword matching, and then classifies the recalled items using BERT-Large fine-tuned on human-labeled data. We also implemented parallel prediction on multiple GPU hosts and a custom C++ tokenization op for TensorFlow. In this data challenge, our model won 1st place in the supervised phase (based on overall F1 score) and 2nd place in the final phase (based on average per-query F1 score).
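Summary: Non-default rankings on e-commerce platforms, such as sorting by ascending price or descending sales volume, tend to surface irrelevant items at the top. The paper proposes a two-stage scheme: refined query/title keyword matching recalls a wide range of candidates, and BERT-Large fine-tuned on human-labeled data then classifies the recalled items. The authors also implemented parallel prediction on multiple GPU hosts and a custom C++ tokenization op for TensorFlow. The model placed 1st in the supervised phase (overall F1 score) and 2nd in the final phase (average per-query F1 score) of the data challenge.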

26. Efficient neural speech synthesis for low-resource languages through multilingual modeling [PDF]
Marcel de Korte, Jaebok Kim, Esther Klabbers
Abstract: Recent advances in neural TTS have led to models that can produce high-quality synthetic speech. However, these models typically require large amounts of training data, which can make it costly to produce a new voice with the desired quality. Although multi-speaker modeling can reduce the data requirements necessary for a new voice, this approach is usually not viable for many low-resource languages for which abundant multi-speaker data is not available. In this paper, we therefore investigated to what extent multilingual multi-speaker modeling can be an alternative to monolingual multi-speaker modeling, and explored how data from foreign languages may best be combined with low-resource language data. We found that multilingual modeling can increase the naturalness of low-resource language speech, showed that multilingual models can produce speech with a naturalness comparable to monolingual multi-speaker models, and saw that the target language naturalness was affected by the strategy used to add foreign language data.
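Summary: Neural TTS models produce high-quality synthetic speech but typically need large amounts of training data, and multi-speaker modeling is often not viable for low-resource languages that lack abundant multi-speaker data. The paper investigates to what extent multilingual multi-speaker modeling can replace monolingual multi-speaker modeling, and how foreign-language data is best combined with low-resource language data. Multilingual modeling increased the naturalness of low-resource language speech and reached naturalness comparable to monolingual multi-speaker models, and target-language naturalness was affected by the strategy used to add foreign-language data.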

Note: the summaries are machine translations of the abstracts. The cover image is a word cloud of the paper titles.

[arXiv papers] Computation and Language 2020-08-25
