
[arXiv Papers] Computation and Language 2020-09-01

Contents

1. SemEval-2020 Task 6: Definition extraction from free text with the DEFT corpus [PDF] [Abstract]
2. Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation [PDF] [Abstract]
3. Classifier Combination Approach for Question Classification for Bengali Question Answering System [PDF] [Abstract]
4. C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering [PDF] [Abstract]
5. I-AID: Identifying Actionable Information from Disaster-related Tweets [PDF] [Abstract]
6. Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study [PDF] [Abstract]
7. Discovering Bilingual Lexicons in Polyglot Word Embeddings [PDF] [Abstract]
8. A Bidirectional Tree Tagging Scheme for Jointly Extracting Overlapping Entities and Relations [PDF] [Abstract]
9. SEEC: Semantic Vector Federation across Edge Computing Environments [PDF] [Abstract]
10. LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis [PDF] [Abstract]
11. QMUL-SDS at CheckThat! 2020: Determining COVID-19 Tweet Check-Worthiness Using an Enhanced CT-BERT with Numeric Expressions [PDF] [Abstract]
12. Temporal Mental Health Dynamics on Social Media [PDF] [Abstract]
13. SocCogCom at SemEval-2020 Task 11: Characterizing and Detecting Propaganda using Sentence-Level Emotional Salience Features [PDF] [Abstract]
14. Efficient Computation of Expectations under Spanning Tree Distributions [PDF] [Abstract]
15. Zero-Resource Knowledge-Grounded Dialogue Generation [PDF] [Abstract]
16. Knowledge Efficient Deep Learning for Natural Language Processing [PDF] [Abstract]
17. TATL at W-NUT 2020 Task 2: A Transformer-based Baseline System for Identification of Informative COVID-19 English Tweets [PDF] [Abstract]
18. HeteGCN: Heterogeneous Graph Convolutional Networks for Text Classification [PDF] [Abstract]
19. HittER: Hierarchical Transformers for Knowledge Graph Embeddings [PDF] [Abstract]
20. Rethinking the objectives of extractive question answering [PDF] [Abstract]
21. Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs [PDF] [Abstract]
22. Neural Sinkhorn Topic Model [PDF] [Abstract]
23. Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition [PDF] [Abstract]
24. Data augmentation using prosody and false starts to recognize non-native children's speech [PDF] [Abstract]
25. Perla: A Conversational Agent for Depression Screening in Digital Ecosystems. Design, Implementation and Validation [PDF] [Abstract]

Abstracts

1. SemEval-2020 Task 6: Definition extraction from free text with the DEFT corpus [PDF] [Back to Contents]
  Sasha Spala, Nicholas A Miller, Franck Dernoncourt, Carl Dockhorn
Abstract: Research on definition extraction has been conducted for well over a decade, largely with significant constraints on the type of definitions considered. In this work, we present DeftEval, a SemEval shared task in which participants must extract definitions from free text using a term-definition pair corpus that reflects the complex reality of definitions in natural language. Definitions and glosses in free text often appear without explicit indicators, across sentence boundaries, or in an otherwise complex linguistic manner. DeftEval involved three distinct subtasks: 1) sentence classification, 2) sequence labeling, and 3) relation extraction.

2. Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation [PDF] [Back to Contents]
  Sourav Das, Anup Kumar Kolya
Abstract: With the ever-increasing volume of digital music and the wealth of track features exposed by popular online music streaming software and apps, neural-network-based feature recognition has recently been used to produce a wide range of experimental results. In this work, the authors extract such feature information from a popular open-source music corpus and explore new recognition techniques by applying unsupervised Hebbian learning to a single-layer neural network on the same dataset. They present detailed empirical findings showing how such an algorithm can help a single-layer feedforward network learn music features as patterns. The unsupervised training algorithm enables the proposed network to achieve an accuracy of 90.36% on music feature detection. For comparative analysis, they place their results alongside several previous benchmark works on similar tasks. They further discuss the limitations of their work and provide a thorough error analysis. The authors hope to discover and gather new information about this particular classification technique and its performance, and to better understand future directions that could improve the art of computational music feature recognition.
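
The abstract does not give the update rule, but a plain Hebbian rule for a single-layer feedforward network is short enough to sketch. The sketch below is illustrative only: the layer sizes, learning rate, weight renormalization, and random stand-in inputs are all assumptions, not the authors' configuration.

```python
import numpy as np

# Illustrative unsupervised Hebbian training of a single-layer feedforward network.
rng = np.random.default_rng(0)
n_features, n_units = 128, 16                     # hypothetical layer sizes
W = rng.normal(scale=0.01, size=(n_features, n_units))
eta = 0.01                                        # learning rate

def hebbian_step(x, W, eta):
    """One unsupervised update: strengthen weights between co-active units."""
    y = x @ W                                     # linear activations of the layer
    W = W + eta * np.outer(x, y)                  # dW_ij = eta * x_i * y_j
    return W / np.linalg.norm(W, axis=0, keepdims=True)  # renormalize to keep weights bounded

for x in rng.normal(size=(100, n_features)):      # stand-ins for music feature vectors
    W = hebbian_step(x, W, eta)
```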

3. Classifier Combination Approach for Question Classification for Bengali Question Answering System [PDF] [Back to Contents]
  Somnath Banerjee, Sudip Kumar Naskar, Sivaji Bandyopadhyay, Paolo Rosso
Abstract: Question classification (QC) is a prime constituent of an automated question answering system. The work presented here demonstrates that combining multiple models achieves better classification performance than existing individual models on the question classification task in Bengali. We exploit state-of-the-art multiple-model combination techniques, i.e., ensembling, stacking and voting, to increase QC accuracy. Lexical, syntactic and semantic features of Bengali questions are used for four well-known classifiers, namely Naïve Bayes, kernel Naïve Bayes, Rule Induction, and Decision Tree, which serve as our base learners. The single-layer question-class taxonomy with 8 coarse-grained classes is extended to a two-layer taxonomy by adding 69 fine-grained classes. We carried out experiments on both the single-layer and two-layer taxonomies. Experimental results confirm that classifier combination approaches outperform single-classifier approaches by 4.02% on coarse-grained question classes. Overall, the stacking approach produces the best results for fine-grained classification and achieves 87.79% accuracy. The approach presented here could be used in other Indo-Aryan or Indic languages to develop a question answering system.
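
As a rough illustration of the stacking combination described above, here is a minimal scikit-learn sketch. The toy questions, the coarse-grained labels, and the choice of TF-IDF lexical features are invented for the example, and kernel Naïve Bayes and Rule Induction are replaced with classifiers available in scikit-learn.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for Bengali questions and coarse-grained question classes.
questions = ["who wrote this book", "who is the author", "where is the river",
             "where was he born", "when did it happen", "when was it built"]
labels = ["PERSON", "PERSON", "LOCATION", "LOCATION", "TEMPORAL", "TEMPORAL"]

X = TfidfVectorizer().fit_transform(questions)    # lexical features only, for brevity

stack = StackingClassifier(
    estimators=[("nb", MultinomialNB()), ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(),         # meta-learner over base predictions
    cv=2,                                         # small fold count to fit the toy data
)
stack.fit(X, labels)
print(stack.predict(X))
```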

4. C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering [PDF] [Back to Contents]
  Laksh Advani, Clement Lu, Suraj Maharjan
Abstract: In today's interconnected and multilingual world, code-mixing of languages on social media is a common occurrence. While many Natural Language Processing (NLP) tasks like sentiment analysis are mature and well designed for monolingual text, techniques to apply these tasks to code-mixed text still warrant exploration. This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix. We tackle this problem by leveraging a set of hand-engineered lexical, sentiment, and metadata features to design a classifier that can disambiguate between "positive", "negative" and "neutral" sentiment. With this model, we are able to obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" task.

5. I-AID: Identifying Actionable Information from Disaster-related Tweets [PDF] [Back to Contents]
  Hamada M. Zahera, Rricha Jalota, Mohamed A. Sherif, Axel N. Ngomo
Abstract: Social media data plays a significant role in modern disaster management by providing valuable data about affected people, donations, help requests, and advice. Recent studies highlight the need to filter information on social media into fine-grained content categories. However, identifying useful information from massive amounts of social media posts during a crisis is a challenging task. Automatically categorizing the information (e.g., reports on affected individuals, donations, and volunteers) contained in these posts is vital for their efficient handling and consumption by the communities affected and organizations concerned. In this paper, we propose a system, dubbed I-AID, to automatically filter tweets with critical or actionable information from the enormous volume of social media data. Our system combines state-of-the-art approaches to process and represent textual data in order to capture its underlying semantics. In particular, we use 1) Bidirectional Encoder Representations from Transformers (commonly known as BERT) to learn a contextualized vector representation of a tweet, and 2) a graph-based architecture to compute semantic correlations between the entities and hashtags in tweets and their corresponding labels. We conducted our experiments on a real-world dataset of disaster-related tweets. Our experimental results indicate that our model outperforms state-of-the-art baselines in terms of F1-score by +11%.
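
The first component of the described pipeline, obtaining a contextualized vector for a tweet from BERT, can be sketched with the Hugging Face transformers library as below. The checkpoint name and the use of the [CLS] vector are assumptions; the graph-based component over entities and hashtags is not reproduced.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# "bert-base-uncased" is a stand-in checkpoint, not necessarily the one used in I-AID.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

tweet = "Volunteers needed at the downtown shelter for flood relief"
inputs = tokenizer(tweet, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
tweet_vec = outputs.last_hidden_state[:, 0]       # [CLS] vector as the tweet representation
print(tweet_vec.shape)                            # torch.Size([1, 768])
```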

6. Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study [PDF] [Back to Contents]
  Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins
Abstract: Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning. Our work is twofold: firstly we demonstrate via human evaluation that classifiers trained to discriminate between human and machine-generated text emerge as unsupervised predictors of "page quality", able to detect low quality content without any training. This enables fast bootstrapping of quality indicators in a low-resource setting. Secondly, curious to understand the prevalence and nature of low quality pages in the wild, we conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.

7. Discovering Bilingual Lexicons in Polyglot Word Embeddings [PDF] [Back to Contents]
  Ashiqur R. KhudaBukhsh, Shriphani Palakodety, Tom M. Mitchell
Abstract: Bilingual lexicons and phrase tables are critical resources for modern Machine Translation systems. Although recent results show that without any seed lexicon or parallel data, highly accurate bilingual lexicons can be learned using unsupervised methods, such methods rely on the existence of large, clean monolingual corpora. In this work, we utilize a single Skip-gram model trained on a multilingual corpus yielding polyglot word embeddings, and present a novel finding that a surprisingly simple constrained nearest-neighbor sampling technique in this embedding space can retrieve bilingual lexicons, even in harsh social media data sets predominantly written in English and Romanized Hindi and often exhibiting code switching. Our method does not require monolingual corpora, seed lexicons, or any other such resources. Additionally, across three European language pairs, we observe that polyglot word embeddings indeed learn a rich semantic representation of words and substantial bilingual lexicons can be retrieved using our constrained nearest neighbor sampling. We investigate potential reasons and downstream applications in settings spanning both clean texts and noisy social media data sets, and in both resource-rich and under-resourced language pairs.
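
A minimal sketch of the constrained nearest-neighbor idea: given polyglot embeddings in one shared space, a word's translation candidate is its nearest neighbor restricted to the other language. The vocabulary, language tags, and random embeddings below are toy stand-ins, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["water", "food", "paani", "khaana"]      # toy mixed English/Hindi vocabulary
lang  = ["en",    "en",   "hi",    "hi"]
E = rng.normal(size=(len(vocab), 50))             # stand-in polyglot embeddings
E /= np.linalg.norm(E, axis=1, keepdims=True)     # unit norm, so dot product = cosine

def translate(word):
    """Nearest neighbor of `word`, constrained to words of the other language."""
    i = vocab.index(word)
    sims = E @ E[i]
    candidates = [j for j in range(len(vocab)) if lang[j] != lang[i]]
    return vocab[max(candidates, key=lambda j: sims[j])]

print(translate("water"))                         # best Hindi candidate in this toy space
```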

8. A Bidirectional Tree Tagging Scheme for Jointly Extracting Overlapping Entities and Relations [PDF] [Back to Contents]
  Xukun Luo, Weijie Liu, Meng Ma, Ping Wang
Abstract: Joint extraction refers to extracting triples, composed of entities and relations, from text simultaneously with a single model, but existing methods rarely work well on sentences with the overlapping issue, i.e., where the same entity is included in multiple triples. In this paper, we propose a novel Bidirectional Tree Tagging (BiTT) scheme to label overlapping triples in text. In a sentence, the triples with the same relation category are represented as two binary trees, each of which is converted into a word-level tag sequence to label each word. Based on our BiTT scheme, we develop an end-to-end classification framework to predict the BiTT tags. We adopt Bi-LSTM layers and a pre-trained BERT encoder respectively as its encoder module, and obtain promising results on a public English dataset as well as a Chinese one. The source code is publicly available at https://anonymous/for/review.

9. SEEC: Semantic Vector Federation across Edge Computing Environments [PDF] [Back to Contents]
  Shalisha Witherspoon, Dean Steuer, Graham Bent, Nirmit Desai
Abstract: Semantic vector embedding techniques have proven useful in learning semantic representations of data across multiple domains. A key application enabled by such techniques is the ability to measure semantic similarity between given data samples and find data most similar to a given sample. State-of-the-art embedding approaches assume all data is available on a single site. However, in many business settings, data is distributed across multiple edge locations and cannot be aggregated due to a variety of constraints. Hence, the applicability of state-of-the-art embedding approaches is limited to freely shared datasets, leaving out applications with sensitive or mission-critical data. This paper addresses this gap by proposing novel unsupervised algorithms called SEEC for learning and applying semantic vector embedding in a variety of distributed settings. Specifically, for scenarios where multiple edge locations can engage in joint learning, we adapt the recently proposed federated learning techniques for semantic vector embedding. Where joint learning is not possible, we propose novel semantic vector translation algorithms to enable semantic query across multiple edge locations, each with its own semantic vector-space. Experimental results on natural language as well as graph datasets show that this may be a promising new direction.

10. LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis [PDF] [Back to Contents]
  Somnath Banerjee, Sahar Ghannay, Sophie Rosset, Anne Vilnat, Paolo Rosso
Abstract: This paper describes the participation of the LIMSI UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text. The proposed approach competed in the SentiMix Hindi-English subtask, which addresses the problem of predicting the sentiment of a given Hindi-English code-mixed tweet. We propose a Recurrent Convolutional Neural Network that combines a recurrent neural network and a convolutional network to better capture the semantics of the text for code-mixed sentiment analysis. The proposed system obtained 0.69 (best run) in terms of F1 score on the given test data and achieved 9th place (Codalab username: somban) in the SentiMix Hindi-English subtask.

11. QMUL-SDS at CheckThat! 2020: Determining COVID-19 Tweet Check-Worthiness Using an Enhanced CT-BERT with Numeric Expressions [PDF] [Back to Contents]
  Rabab Alkhalifa, Theodore Yoong, Elena Kochkina, Arkaitz Zubiaga, Maria Liakata
Abstract: This paper describes the participation of the QMUL-SDS team for Task 1 of the CLEF 2020 CheckThat! shared task. The purpose of this task is to determine the check-worthiness of tweets about COVID-19 to identify and prioritise tweets that need fact-checking. The overarching aim is to further support ongoing efforts to protect the public from fake news and help people find reliable information. We describe and analyse the results of our submissions. We show that a CNN using COVID-Twitter-BERT (CT-BERT) enhanced with numeric expressions can effectively boost performance from baseline results. We also show results of training data augmentation with rumours on other topics. Our best system ranked fourth in the task with encouraging outcomes showing potential for improved results in the future.

12. Temporal Mental Health Dynamics on Social Media [PDF] [Back to Contents]
  Tom Tabak, Matthew Purver
Abstract: We describe a set of experiments for building a temporal mental health dynamics system. We utilise a pre-existing methodology for distant-supervision of mental health data mining from social media platforms and deploy the system during the global COVID-19 pandemic as a case study. Despite the challenging nature of the task, we produce encouraging results, both explicit to the global pandemic and implicit to a global phenomenon, Christmas Depression, supported by the literature. We propose a methodology for providing insight into temporal mental health dynamics to be utilised for strategic decision-making.

13. SocCogCom at SemEval-2020 Task 11: Characterizing and Detecting Propaganda using Sentence-Level Emotional Salience Features [PDF] [Back to Contents]
  Gangeshwar Krishnamurthy, Raj Kumar Gupta, Yinping Yang
Abstract: This paper describes a system developed for detecting propaganda techniques in news articles. We focus on examining how emotional salience features extracted from a news segment can help characterize and predict the presence of propaganda techniques. Correlation analyses surfaced interesting patterns: for instance, the "loaded language" and "slogan" techniques are negatively associated with valence and joy intensity but positively associated with anger, fear and sadness intensity. In contrast, "flag waving" and "appeal to fear-prejudice" have the exact opposite pattern. Through predictive experiments, results further indicate that whereas BERT-only features obtained an F1-score of 0.548, hybrid features combining emotion intensity and BERT obtained an F1-score of 0.570, when a simple feedforward network was used as the classifier in both settings. On gold test data, our system obtained a micro-averaged F1-score of 0.558 for overall detection across fourteen propaganda techniques. It performed relatively well in detecting "loaded language" (F1 = 0.772), "name calling and labeling" (F1 = 0.673), "doubt" (F1 = 0.604) and "flag waving" (F1 = 0.543).

14. Efficient Computation of Expectations under Spanning Tree Distributions [PDF] [Back to Contents]
  Ran Zmigrod, Tim Vieira, Ryan Cotterell
Abstract: We give a general framework for inference in spanning tree models. We propose unified algorithms for the important cases of first-order and second-order expectations in edge-factored, non-projective spanning-tree models. Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms. These algorithms are easy to implement given the prevalence of automatic differentiation software. We motivate the development of our framework with several cautionary tales from previous research, which has developed numerous less-than-optimal algorithms for computing expectations and their gradients. We demonstrate how our framework efficiently computes several quantities with known algorithms, including the expected attachment score, entropy, and generalized expectation criteria. As a bonus, we give algorithms for quantities that are missing in the literature, including the KL divergence. In all cases, our approach matches the efficiency of existing algorithms and, in several cases, reduces the runtime complexity by a factor (or two) of the sentence length. We validate the implementation of our framework through runtime experiments. We find our algorithms are up to $12$ and $26$ times faster than previous algorithms for computing the Shannon entropy and the gradient of the generalized expectation objective, respectively.
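
The gradient-expectation connection the abstract refers to is the standard exponential-family identity. For an edge-factored spanning-tree model over trees $t$ with parameters $\theta$ and feature function $f$,

$$p(t) = \frac{\exp(\theta^\top f(t))}{Z(\theta)}, \qquad \nabla_\theta \log Z(\theta) = \sum_t p(t)\, f(t) = \mathbb{E}_{t \sim p}[f(t)],$$

so first-order expectations can be obtained by differentiating $\log Z(\theta)$, which for non-projective spanning trees is available via the Matrix-Tree theorem. This is the textbook identity the framework builds on, not a reproduction of the paper's full derivation.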

15. Zero-Resource Knowledge-Grounded Dialogue Generation [PDF] [Back to Contents]
  Linxiao Li, Can Xu, Wei Wu, Yufan Zhao, Xueliang Zhao, Chongyang Tao
Abstract: While neural conversation models have shown great potential for generating informative and engaging responses by introducing external knowledge, learning such a model often requires knowledge-grounded dialogues that are difficult to obtain. To overcome the data challenge and reduce the cost of building a knowledge-grounded dialogue system, we explore the problem under a zero-resource setting by assuming no context-knowledge-response triples are needed for training. To this end, we propose representing the knowledge that bridges a context and a response, as well as the way that knowledge is expressed, as latent variables, and devise a variational approach that can effectively estimate a generation model from a dialogue corpus and a knowledge corpus that are independent of each other. Evaluation results on three benchmarks of knowledge-grounded dialogue generation indicate that our model can achieve performance comparable with state-of-the-art methods that rely on knowledge-grounded dialogues for training, and exhibits good generalization ability over different topics and different datasets.

16. Knowledge Efficient Deep Learning for Natural Language Processing [PDF] [Back to Contents]
  Hai Wang
Abstract: Deep learning has become the workhorse for a wide range of natural language processing applications. But much of the success of deep learning relies on annotated examples, and annotation is time-consuming and expensive to produce at scale. Here we are interested in methods for reducing the required quantity of annotated data, by making the learning methods more knowledge-efficient so as to make them more applicable in low-annotation (low-resource) settings. There are various classical approaches to making models more knowledge-efficient, such as multi-task learning, transfer learning, and weakly supervised and unsupervised learning. This thesis focuses on adapting such classical methods to modern deep learning models and algorithms, and describes four works aimed at making machine learning models more knowledge-efficient. First, we propose a knowledge-rich deep learning model (KRDL) as a unifying learning framework for incorporating prior knowledge into deep models. In particular, we apply KRDL built on Markov logic networks to denoise weak supervision. Second, we apply a KRDL model to assist machine reading models in finding the correct evidence sentences that can support their decisions. Third, we investigate knowledge transfer techniques in the multilingual setting, where we propose a method that can improve pre-trained multilingual BERT based on a bilingual dictionary. Fourth, we present an episodic memory network for language modelling, in which we encode large external knowledge for the pre-trained GPT.

17. TATL at W-NUT 2020 Task 2: A Transformer-based Baseline System for Identification of Informative COVID-19 English Tweets [PDF] [Back to Contents]
  Anh Tuan Nguyen
Abstract: As the COVID-19 outbreak continues to spread throughout the world, more and more information about the pandemic is being shared publicly on social media. For example, a huge number of COVID-19 English Tweets are posted daily on Twitter. However, the majority of those Tweets are uninformative, and hence it is important to be able to automatically select only the informative ones for downstream applications. In this short paper, we present our participation in the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets. Inspired by recent advances in pretrained Transformer language models, we propose a simple yet effective baseline for the task. Despite its simplicity, our proposed approach shows very competitive results on the leaderboard, where we ranked 8th among the 56 participating teams.

18. HeteGCN: Heterogeneous Graph Convolutional Networks for Text Classification [PDF] [Back to Contents]
  Rahul Ragesh, Sundararajan Sellamanickam, Arun Iyer, Ram Bairi, Vijay Lingam
Abstract: We consider the problem of learning efficient and inductive graph convolutional networks for text classification with a large number of examples and features. Existing state-of-the-art graph-embedding-based methods such as predictive text embedding (PTE) and TextGCN have shortcomings in terms of predictive performance, scalability and inductive capability. To address these limitations, we propose a heterogeneous graph convolutional network (HeteGCN) modeling approach that unites the best aspects of PTE and TextGCN. The main idea is to learn feature embeddings and derive document embeddings using a HeteGCN architecture with different graphs used across layers. We simplify TextGCN by dissecting it into several HeteGCN models, which (a) helps to study the usefulness of individual models and (b) offers flexibility in fusing learned embeddings from different models. In effect, the number of model parameters is reduced significantly, enabling faster training and improving performance in the small labeled training set scenario. Our detailed experimental studies demonstrate the efficacy of the proposed approach.
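
For reference, the layer-wise propagation that HeteGCN composes across heterogeneous graphs follows the standard GCN rule (Kipf and Welling, 2017):

$$H^{(l+1)} = \sigma\big(\tilde{A}\, H^{(l)}\, W^{(l)}\big),$$

where $\tilde{A}$ is a normalized adjacency matrix, $H^{(l)}$ holds the node representations at layer $l$, and $W^{(l)}$ is a learned weight matrix. Per the abstract, the HeteGCN-specific choice is to use a different graph $\tilde{A}$ (e.g., feature-feature or document-feature) at different layers; the exact normalization and graph schedule are not specified here and should be taken from the paper.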

19. HittER: Hierarchical Transformers for Knowledge Graph Embeddings [PDF] [Back to Contents]
  Sanxing Chen, Xiaodong Liu, Jianfeng Gao, Jian Jiao, Ruofei Zhang, Yangfeng Ji
Abstract: This paper examines the challenging problem of learning representations of entities and relations in a complex multi-relational knowledge graph. We propose HittER, a Hierarchical Transformer model to jointly learn Entity-relation composition and Relational contextualization based on a source entity's neighborhood. Our proposed model consists of two different Transformer blocks: the bottom block extracts features of each entity-relation pair in the local neighborhood of the source entity and the top block aggregates the relational information from the outputs of the bottom block. We further design a masked entity prediction task to balance information from the relational context and the source entity itself. Evaluated on the task of link prediction, our approach achieves new state-of-the-art results on two standard benchmark datasets FB15K-237 and WN18RR.

20. Rethinking the objectives of extractive question answering [PDF] [Back to Contents]
  Martin Fajcik, Josef Jon, Santosh Kesiraju, Pavel Smrz
Abstract: This paper describes two generally applicable approaches towards significantly improving the performance of state-of-the-art extractive question answering (EQA) systems. Firstly, contrary to a common belief, it demonstrates that using the objective with the independence assumption $P(a_s,a_e) = P(a_s)P(a_e)$ for the probability of a span starting at position $a_s$ and ending at position $a_e$ may have adverse effects. Therefore we propose a new compound objective that models the joint probability $P(a_s,a_e)$ directly, while still keeping the objective with the independence assumption as an auxiliary objective. Our second approach shows the beneficial effect of the distantly semi-supervised shared-normalization objective known from (Clark and Gardner, 2017). We show that normalizing over a set of documents similar to the golden passage, and marginalizing over all ground-truth answer string positions, leads to improved results for smaller statistical models. Our results are supported by experiments with three QA models (BidAF, BERT, ALBERT) over six datasets. The proposed approaches do not use any additional data. Our code, analysis, pretrained models, and individual results will be available online.
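
To make the two objectives concrete: the independence objective trains two softmaxes over positions so that $P(a_s,a_e) = P(a_s)P(a_e)$, whereas a compound objective normalizes a joint score over all valid spans, e.g.

$$P(a_s, a_e) = \frac{\exp\big(s(a_s, a_e)\big)}{\sum_{i \le j} \exp\big(s(i, j)\big)}.$$

A decomposition such as $s(i,j) = s_{\text{start}}(i) + s_{\text{end}}(j)$ is one common parameterization and an assumption here; the paper's exact compound objective may differ.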

21. Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs [PDF] [Back to Contents]
  Clara H. McCreery, Namit Katariya, Anitha Kannan, Manish Chablani, Xavier Amatriain
Abstract: People increasingly search online for answers to their medical questions but the rate at which medical questions are asked online significantly exceeds the capacity of qualified people to answer them. This leaves many questions unanswered or inadequately answered. Many of these questions are not unique, and reliable identification of similar questions would enable more efficient and effective question answering schema. COVID-19 has only exacerbated this problem. Almost every government agency and healthcare organization has tried to meet the informational need of users by building online FAQs, but there is no way for people to ask their question and know if it is answered on one of these pages. While many research efforts have focused on the problem of general question similarity, these approaches do not generalize well to domains that require expert knowledge to determine semantic similarity, such as the medical domain. In this paper, we show how a double fine-tuning approach of pretraining a neural network on medical question-answer pairs followed by fine-tuning on medical question-question pairs is a particularly useful intermediate task for the ultimate goal of determining medical question similarity. While other pretraining tasks yield an accuracy below 78.7% on this task, our model achieves an accuracy of 82.6% with the same number of training examples, an accuracy of 80.0% with a much smaller training set, and an accuracy of 84.5% when the full corpus of medical question-answer data is used. We also describe a currently live system that uses the trained model to match user questions to COVID-related FAQs.

22. Neural Sinkhorn Topic Model [PDF] [Back to Contents]
  He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine
Abstract: In this paper, we present a new topic modelling approach via the theory of optimal transport (OT). Specifically, we represent a document with two distributions: a distribution over the words (doc-word distribution) and a distribution over the topics (doc-topic distribution). For a given document, the doc-word distribution is the observed, sparse, low-level representation of the content, while the doc-topic distribution is the latent, dense, high-level representation of the same content. Learning a topic model can then be viewed as a process of minimising the cost of transporting the semantic information from one distribution to the other. This new viewpoint leads to a novel OT-based topic modelling framework, which enjoys appealing simplicity, effectiveness, and efficiency. Extensive experiments show that our framework significantly outperforms several state-of-the-art models in terms of both topic quality and document representations.
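
For intuition, entropy-regularized optimal transport between the two document distributions can be computed with the classic Sinkhorn iterations, sketched below. The histogram sizes and the random cost matrix are toy stand-ins; the paper's neural parameterization of topics and transport costs is not reproduced.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan between histograms a and b under cost matrix C."""
    K = np.exp(-C / eps)                          # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                           # rescale rows toward marginal a
        v = b / (K.T @ u)                         # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]            # plan P = diag(u) K diag(v)

rng = np.random.default_rng(0)
a = np.full(5, 1 / 5)                             # doc-word distribution (5 words)
b = np.full(3, 1 / 3)                             # doc-topic distribution (3 topics)
C = rng.random((5, 3))                            # stand-in transport costs
P = sinkhorn(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))               # marginals approximately equal a and b
```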

23. Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition [PDF] [Back to Contents]
  Wei Li, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He
Abstract: Recent advances in end-to-end models have outperformed conventional models through employing a two-pass approach. The two-pass model provides better speed-quality trade-offs for on-device speech recognition, where a 1st-pass model generates hypotheses in a streaming fashion and a 2nd-pass model re-scores the hypotheses with full audio sequence context. The 2nd-pass model plays a key role in enabling the end-to-end model to surpass the conventional model in quality. One main challenge of the two-pass model is the computation latency introduced by the 2nd-pass model. Specifically, the original design of the two-pass model uses LSTMs for the 2nd-pass model, which incur long latency because their recurrent nature forces inference to run sequentially. In this work we explore replacing the LSTM layers in the 2nd-pass rescorer with Transformer layers, which can process entire hypothesis sequences in parallel and can therefore utilize on-device computation resources more efficiently. Compared with an LSTM-based baseline, our proposed Transformer rescorer achieves more than 50% latency reduction along with quality improvement.

24. Data augmentation using prosody and false starts to recognize non-native children's speech [PDF] [Back to Contents]
  Hemant Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo
Abstract: This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition (ASR) for non-native children's speech. The task is to recognize non-native speech from children of various age groups given a limited amount of speech. Moreover, because the speech is spontaneous, false starts are transcribed as partial words, which leads to unseen partial words in the test transcriptions. To cope with these two challenges, we investigate a data augmentation-based approach. Firstly, we apply prosody-based data augmentation to supplement the audio data. Secondly, we simulate false starts by introducing partial-word noise into the language modeling corpora, creating new words. Acoustic models trained on prosody-based augmented data outperform the models using the baseline recipe or SpecAugment-based augmentation. The partial-word noise also helps to improve the baseline language model. Our ASR system, a combination of these schemes, placed third in the evaluation period and achieves a word error rate of 18.71%. Post-evaluation, we observe that increasing the amount of prosody-based augmented data leads to better performance. Furthermore, removing low-confidence-score words from hypotheses can lead to further gains. These two improvements lower the ASR error rate to 17.99%.
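
A rough sketch of the two augmentation ideas with off-the-shelf tools: prosody-style perturbation of audio (pitch and tempo) via librosa, and partial-word noise injected into language-model text. The perturbation ranges, the truncation rule, and the file name are assumptions, not the paper's recipe.

```python
import random
import librosa

# Prosody-style perturbation of an utterance (pitch and tempo).
y, sr = librosa.load("child_utterance.wav", sr=16000)        # hypothetical file
y_pitch = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # shift up 2 semitones
y_tempo = librosa.effects.time_stretch(y, rate=1.1)          # 10% faster

# Partial-word ("false start") noise for language-model text.
def add_false_starts(words, p=0.1, seed=0):
    """Occasionally prepend a truncated fragment of the upcoming word."""
    rng, out = random.Random(seed), []
    for w in words:
        if len(w) > 3 and rng.random() < p:
            out.append(w[: rng.randint(1, len(w) - 1)] + "-")  # partial-word token
        out.append(w)
    return out

print(add_false_starts("the children went to the playground".split(), p=0.5))
```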

25. Perla: A Conversational Agent for Depression Screening in Digital Ecosystems. Design, Implementation and Validation [PDF] [Back to Contents]
  Raúl Arrabales
Abstract: Most depression assessment tools are based on self-report questionnaires, such as the Patient Health Questionnaire (PHQ-9). These psychometric instruments can be easily adapted to an online setting by means of electronic forms. However, this approach lacks the interactive and engaging features of modern digital environments. With the aim of making depression screening more available, attractive and effective, we developed Perla, a conversational agent able to perform an interview based on the PHQ-9. We also conducted a validation study in which we compared the results obtained by the traditional self-report questionnaire with Perla's automated interview. Analyzing the results of this study, we draw two significant conclusions: firstly, Perla is much preferred by Internet users, achieving more than 2.5 times the reach of a traditional form-based questionnaire; secondly, her psychometric properties (Cronbach's α of 0.81, sensitivity of 96% and specificity of 90%) are excellent and comparable to traditional, well-established depression screening questionnaires.
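
Since Perla's interview is grounded in the PHQ-9, the underlying scoring is worth making explicit: nine items scored 0-3 give a 0-27 total, mapped to standard published severity bands. A minimal sketch of that scoring follows; the conversational layer itself is not reproduced.

```python
def phq9_severity(item_scores):
    """Total a PHQ-9 response (9 items, each 0-3) and map it to a severity band."""
    assert len(item_scores) == 9 and all(0 <= s <= 3 for s in item_scores)
    total = sum(item_scores)
    bands = [(5, "minimal"), (10, "mild"), (15, "moderate"),
             (20, "moderately severe"), (28, "severe")]
    band = next(label for cutoff, label in bands if total < cutoff)
    return total, band

print(phq9_severity([1, 2, 0, 1, 3, 0, 1, 2, 0]))  # -> (10, 'moderate')
```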
